
[Experience Sharing] Distributed Message System

Posted on 2015-9-17 09:40:26
  http://dongxicheng.org/search-engine/log-systems/
  Covers Facebook's Scribe, Apache's Chukwa, LinkedIn's Kafka, and Cloudera's Flume.
  
  Kafka
  http://www.cnblogs.com/fxjwind/archive/2013/03/22/2975573.html
  http://www.cnblogs.com/fxjwind/archive/2013/03/19/2969655.html
  
  Flume
  Flume User Guide, http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html
  1.1. Architecture
  Flume’s architecture is simple, robust, and flexible.
[Figure: a typical Flume deployment collecting log data from application servers through three tiers: agents, collectors, and storage]
  The graph above shows a typical deployment of Flume that collects log data from a set of application servers. The deployment consists of a number of logical nodes, arranged into three tiers. The first tier is the agent tier. Agent nodes are typically installed on the machines that generate the logs and are your data’s initial point of contact with Flume. They forward data to the next tier of collector nodes, which aggregate the separate data flows and forward them to the final storage tier.
  
  Logical nodes are a very flexible abstraction. Every logical node has just two components - a source and a sink.
  Both source and sink can additionally be configured with decorators which perform some simple processing on data as it passes through.
  
  The source tells a logical node where to collect data.
  The sink tells it where to send the data.
  The only difference between any two logical nodes is how their source and sink are configured.
  The source, sink, and optional decorators are a powerful set of primitives.
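These primitives can be written down directly in Flume's dataflow spec language. A minimal sketch of the agent/collector pattern (node names, file path, and port are hypothetical; the `#` lines are annotations for the reader, not spec syntax):

```
# agent node: tail a local log file and forward events downstream
agent1 : tail("/var/log/app/access.log") | agentSink("collector1", 35853) ;
# collector node: receive the agents' streams and write them to HDFS
collector1 : collectorSource(35853) | collectorSink("hdfs://namenode/flume/", "access-") ;
```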
  
  Logical and Physical Nodes
  It’s important to make the distinction between logical nodes and physical nodes. A physical node corresponds to a single Java process running on one machine in a single JVM instance. Usually there is just one physical node per machine.
  Physical nodes act as containers for logical nodes, which are wired together to form data flows. Each physical node can play host to many logical nodes, and takes care of arbitrating the assignment of machine resources between them.
  So, although the agents and the collectors in the preceding example are logically separate processes, they could be running on the same physical node.
  The Master assigns a configuration to each logical node at run-time - all components of a node’s configuration are instantiated dynamically at run-time, and therefore configurations can be changed many times throughout the lifetime of a Flume service without having to restart any Java processes or log into the machines themselves. In fact, logical nodes themselves can be created and deleted dynamically.
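This dynamic wiring is typically driven from the scriptable Flume command shell rather than by logging into machines. A hedged sketch (host and node names are hypothetical; check your Flume version for the exact shell syntax):

```
$ flume shell -c localhost
exec map host1 agentA
exec config agentA 'tail("/var/log/app.log")' 'agentSink("collector1", 35853)'
exec decommission agentA
```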
  
  1.2. Reliability
  Flume can guarantee that all data received by an agent node will eventually make it to the collector at the end of its flow as long as the agent node keeps running. That is, data can be reliably delivered to its eventual destination.
  Flume seems to do better than Kafka here, and the guarantee is configurable: it comes in the following levels, and users can choose whichever fits their needs:
  However, reliable delivery can be very resource intensive and is often a stronger guarantee than some data sources require. Therefore, Flume allows the user to specify, on a per-flow basis, the level of reliability required. There are three supported reliability levels:
  The end-to-end reliability level,
  The first thing the agent does in this setting is write the event to disk in a 'write-ahead log' (WAL) so that, if the agent crashes and restarts, knowledge of the event is not lost.
  After the event has successfully made its way to the end of its flow, an acknowledgment is sent back to the originating agent so that it knows it no longer needs to store the event on disk.
  This reliability level can withstand any number of failures downstream of the initial agent.
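The WAL-plus-acknowledgment scheme described above can be sketched in a few lines of Python. This is a toy illustration of the idea, not Flume's actual implementation; the class, file layout, and method names are all invented:

```python
import os

class WalAgent:
    """Toy write-ahead log: persist each event to disk before sending,
    and delete the local copy only after an end-to-end acknowledgment."""

    def __init__(self, wal_dir):
        self.wal_dir = wal_dir
        os.makedirs(wal_dir, exist_ok=True)
        self.seq = 0

    def send(self, event, transport):
        # 1. Write the event to disk first, so a crash cannot lose it.
        path = os.path.join(self.wal_dir, f"{self.seq}.wal")
        with open(path, "w") as f:
            f.write(event)
        self.seq += 1
        # 2. Ship it downstream; the WAL entry stays until the end of
        #    the flow acknowledges receipt.
        transport(event)
        return path

    def on_ack(self, path):
        # 3. End-to-end ack received: the event is durable downstream,
        #    so the local copy is no longer needed.
        os.remove(path)

    def pending(self):
        # Events still awaiting acknowledgment (replayed after a restart).
        return sorted(os.listdir(self.wal_dir))
```

The key property is that a crash between steps 1 and 3 leaves the event in `pending()`, so a restarted agent can resend it.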
  
  The store on failure reliability level requires an acknowledgement only from the node one hop downstream.
  If the sending node detects a failure, it will store data on its local disk until the downstream node is repaired, or an alternate downstream destination can be selected.
  Data can be lost if a compound or silent failure occurs.
  
  The best-effort reliability level sends data to the next hop with no attempts to confirm or retry delivery. If nodes fail, any data that they were in the process of transmitting or receiving can be lost. This is the weakest reliability level, but also the most lightweight.
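In the dataflow spec, the three levels correspond to three agent sink variants (node names, path, and port below are hypothetical; the `#` lines are annotations, not spec syntax):

```
# end-to-end: write-ahead log plus ack from the end of the flow
agentA : tail("/var/log/app.log") | agentE2ESink("collector1", 35853) ;
# store on failure: buffer to local disk until the next hop recovers
agentB : tail("/var/log/app.log") | agentDFOSink("collector1", 35853) ;
# best effort: fire and forget
agentC : tail("/var/log/app.log") | agentBESink("collector1", 35853) ;
```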
  
  1.3. Scalability
  Scalability is the ability to increase system performance linearly - or better - by adding more resources to the system. Flume’s goal is horizontal scalability — the ability to incrementally add more machines to the system to increase throughput.
  
  1.4. Manageability
  Manageability is the ability to control data flows, monitor nodes, modify settings, and control outputs of a large system.
  The Flume Master is the point where global state such as the data flows can be managed, by a web interface or the scriptable Flume command shell.
  Via the Flume Master, users can monitor flows on the fly and react to changes such as load imbalances, partial failures, or newly provisioned hardware.
  You can dynamically reconfigure nodes by using the Flume Master. You can reconfigure nodes by using small scripts written in a flexible dataflow specification language, which can be submitted via the Flume Master interface.
  
  1.5. Extensibility
  Extensibility is the ability to add new functionality to a system. For example, you can extend Flume by adding connectors to existing storage layers or data platforms.
  Some general sources include files from the file system, syslog and syslog-ng emulation, or the standard output of a process. More specific sources such as IRC channels and Twitter streams can also be added.
  Similarly, there are many output destinations for events. Although HDFS is the primary output destination, events can be sent to local files, or to monitoring and alerting applications such as Ganglia or communication channels such as IRC.
  
  3. Pseudo-distributed Mode
  Flume is intended to be run as a distributed system with processes spread out across many machines. It can also be run as several processes on a single machine, which is called “pseudo-distributed” mode.
  3.1. Starting Pseudo-distributed Flume Daemons
  There are two kinds of processes in the system: the Flume master and the Flume node.
  The Flume Master is the central management point and controls the data flows of the nodes. It is the single logical entity that holds global state data and controls the Flume node data flows and monitors Flume nodes.
  Flume nodes serve as the data path for streams of events. They can be the sources, conduits, and consumers of event data. The nodes periodically contact the Master to transmit a heartbeat and to get their data flow configuration.
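The heartbeat-and-reconfigure loop can be sketched abstractly. This is toy code, not Flume's protocol; the dictionary shape and the `heartbeat` reply fields are invented:

```python
def heartbeat_loop(node, master, rounds):
    """Toy node loop: heartbeat to the master each round, and adopt
    the new data flow configuration when its version has advanced."""
    for _ in range(rounds):
        reply = master.heartbeat(node["name"])          # report liveness
        if reply["version"] > node["config_version"]:   # master pushed a change?
            node["source"] = reply["source"]            # re-instantiate source/sink
            node["sink"] = reply["sink"]
            node["config_version"] = reply["version"]
    return node
```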
  3.1.1. The Master
  The Master can be manually started by executing the following command:
$ flume master
  After the Master is started, you can access it by pointing a web browser to http://localhost:35871/. This web page displays the status of all Flume nodes that have contacted the Master, and shows each node’s currently assigned configuration. When you start this up without Flume nodes running, the status and configuration tables will be empty.
  Nice: monitoring is provided through a web UI...

3.1.2. The Flume Node
  To start a Flume node, invoke the following command in another terminal.

$ flume node_nowatch
  To check whether a Flume node is up, point your browser to the Flume Node status page at http://localhost:35862/.

3.2. Configuring a Node via the Master
  Requiring nodes to contact the Master to get their configuration enables you to dynamically change the configuration of nodes without having to log in to the remote machine to restart the daemon. You can quickly change the node’s previous data flow configuration to a new one.
  The following describes how to "wire" nodes using the Master’s web interface.
  On the Master’s web page, click on the config link. You are presented with two forms. These are web interfaces for setting the node’s data flows. When Flume nodes contact the Master, they will notice that the data flow version has changed, instantiate, and activate the configuration.
  This is really convenient: open the web UI and you can freely configure each node's name, source, and sink. At the node's next heartbeat, it automatically picks up its new configuration.
  If you enter:
  Node name:  host
  Source:  text("/etc/services")
  Sink:  console("avrojson")
  You get the file with each record in JSON format displayed to the console.
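The same configuration can also be submitted from the Flume command shell instead of the web form (a sketch using the node name `host` from above; shell prompts omitted):

```
$ flume shell -c localhost
exec config host 'text("/etc/services")' 'console("avrojson")'
```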
  

3.5. Tiering Flume Nodes: Agents and Collectors
  A simple network connection is abstractly just another sink. It would be great if sending events over the network was easy, efficient, and reliable. In reality, collecting data from a distributed set of machines and relying on networking connectivity greatly increases the likelihood and kinds of failures that can occur. The bottom line is that providing reliability guarantees introduces complexity and many tradeoffs.
  Why tier the Flume nodes at all? Why not read and write straight to the storage layer; why split them into agents and collectors?
  First, the machines that agents run on are often not very stable and can fail in all sorts of ways; moreover, doing some pre-processing and aggregation of the data before storage should be more efficient.

4. Fully-distributed Mode

  Steps to Deploy Flume On a Cluster


  • Install Flume on each machine.
  • Select one or more nodes to be the Master.
  • Modify a static configuration file to use site specific properties.
  • Start the Flume Master node on at least one machine.
  • Start a Flume node on each machine.


4.2. Multiple Collectors

4.2.1. Partitioning Agents across Multiple Collectors
  The preceding graph and dataflow spec show a typical topology for Flume nodes. For reliable delivery, in the event that the collector stops operating or disconnects from the agents, the agents would need to store their events to their respective disks locally. The agents would then periodically attempt to recontact a collector. Because the collector is down, any analysis or processing downstream is blocked.
  When a collector fails, the agent can buffer data on local disk and resume sending once the collector recovers.
  That is obviously a bit dumb: aren't there other collectors? If this one is down, just use another. You can adjust this by hand via Manually Specifying Failover Chains.
  Automatic adjustment would of course be even better, but it does not currently work when using multiple masters.
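A manually specified failover chain amounts to trying collectors in a fixed order and spilling to local disk only when every one of them is down. A toy sketch (names and the in-memory `spool` stand in for Flume's real disk buffering):

```python
def deliver(event, collectors, spool):
    """Try each collector in the failover chain in order; spool the
    event locally for later retry if the whole chain is down."""
    for send in collectors:
        try:
            send(event)
            return "sent"
        except ConnectionError:
            continue          # this collector is down, try the next in the chain
    spool.append(event)       # every collector failed: buffer locally
    return "spooled"
```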
  

4.4. Multiple Masters
  The Master has two main jobs to perform. The first is to keep track of all the nodes in a Flume deployment and to keep them informed of any changes to their configuration. The second is to track acknowledgements from the end of a Flume flow that is operating in reliable mode so that the source at the top of that flow knows when to stop transmitting an event.
  This is an obvious single point of failure: if the Master goes down, everything goes down.

4.4.3. Running in Distributed Mode
  Running the Flume Master in distributed mode provides better fault tolerance than in standalone mode, and scalability for hundreds of nodes.
  Configuring machines to run as part of a distributed Flume Master is nearly as simple as standalone mode. As before, flume.master.servers needs to be set, this time to a list of machines:

<property>
<name>flume.master.servers</name>
<value>masterA,masterB,masterC</value>
</property>
  How many machines do I need? The distributed Flume Master will continue to work correctly as long as more than half the physical machines running it are still working and haven’t crashed. Therefore if you want to survive one fault, you need three machines (because 3-1 = 2 > 3/2).
  Why require a majority? Even with only one machine left, couldn't it keep running as in standalone mode? I don't understand this for now...
  Each Master process will initially try and contact all other nodes in the ensemble. Until more than half (in this case, two) nodes are alive and contactable, the configuration store will be unable to start, and the Flume Master will not be able to read or write configuration data.
  Below a majority, it refuses to work...
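The majority rule above is just the usual quorum arithmetic (a small check of the math, not Flume code):

```python
def machines_needed(faults):
    """Smallest ensemble that still has a majority alive after `faults` crashes."""
    return 2 * faults + 1

def has_quorum(alive, total):
    """The ensemble can make progress only with a strict majority alive."""
    return alive > total // 2
```

This also shows why a lone survivor cannot simply carry on: it can never tell whether the other machines are dead or merely unreachable, so acting alone could split the configuration state.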
  

4.4.4. Configuration Stores
  The Flume Master stores all its data in a configuration store. Flume has a pluggable configuration store architecture, and supports two implementations.


  • The Memory-Backed Config Store (MBCS) stores configurations temporarily in memory. If the master node fails and reboots, all the configuration data will be lost. The MBCS is incompatible with distributed masters. However, it is very easy to administer, computationally lightweight, and good for testing and experimentation.
  • The ZooKeeper-Backed Config Store (ZBCS) stores configurations persistently and takes care of synchronizing them between multiple masters.
  Flume and Apache ZooKeeper . Flume relies on the Apache ZooKeeper coordination platform to provide reliable, consistent, and persistent storage for node configuration data. A ZooKeeper ensemble is made up of two or more nodes which communicate regularly with each other to make sure each is up to date. Flume embeds a ZooKeeper server inside the Master process, so starting and maintaining the service is taken care of. However, if you have an existing ZooKeeper service running, Flume supports using that external cluster as well.
  It still comes down to ZooKeeper; that thing really is extraordinarily useful...
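Selecting the store is done in the site configuration, in the same property style as `flume.master.servers` above (the property name follows the Flume OG guide; verify it against your version):

```
<property>
<name>flume.master.store</name>
<value>zookeeper</value>
</property>
```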

Original thread: https://www.yunweiku.com/thread-114781-1-1.html