HDFSEventSink takes data out of the channel (an active pull) and writes it to HDFS. When it starts, HDFSEventSink creates two thread pools, callTimeoutPool and timedRollerPool: callTimeoutPool runs the tasks that perform HDFS operations such as append/flush (invoked through the callWithTimeout method, which also implements the timeout behaviour), and timedRollerPool runs the scheduled tasks that roll files:
callTimeoutPool = Executors.newFixedThreadPool(threadsPoolSize,
new ThreadFactoryBuilder().setNameFormat(timeoutName).build());
timedRollerPool = Executors.newScheduledThreadPool(rollTimerPoolSize,
new ThreadFactoryBuilder().setNameFormat(rollerName).build());
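Although the exact helper lives inside BucketWriter, the call-with-timeout pattern itself is easy to picture. Below is a minimal, hypothetical sketch (class, field and method names are illustrative, not the Flume source): the HDFS operation is wrapped in a Callable, submitted to callTimeoutPool, and awaited for at most callTimeout milliseconds.

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Minimal sketch of the call-with-timeout pattern used for HDFS operations.
// Field and method names here are illustrative, not the exact Flume source.
class CallWithTimeoutSketch {
  private final ExecutorService callTimeoutPool = Executors.newFixedThreadPool(10);
  private final long callTimeout = 10000L; // hdfs.callTimeout default, in milliseconds

  <T> T callWithTimeout(Callable<T> hdfsOperation) throws Exception {
    Future<T> future = callTimeoutPool.submit(hdfsOperation); // run append/flush/close asynchronously
    try {
      return future.get(callTimeout, TimeUnit.MILLISECONDS);  // wait at most callTimeout ms
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the hung HDFS call so the worker thread is freed
      throw new IOException("HDFS call timed out after " + callTimeout + " ms", e);
    }
  }
}

If the operation does not finish in time, the Future is cancelled so the worker thread can be interrupted instead of blocking the pool indefinitely.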
The flow from channel to sink ultimately goes through the sink's process method (invoked by a SinkProcessor implementation), in this case HDFSEventSink.process. Each call to process runs inside a single transaction, which provides atomicity: process calls the Channel's take method to pull Events out of the Channel, and the maximum number of Events handled in one transaction is set by hdfs.batchSize (default 100). For each Event, the following operations take place:
1. Work out the full path and file name of the target file (lookupPath).
2. Obtain a BucketWriter object and an HDFSWriter object. The HDFSWriter implementation is selected according to hdfs.fileType and performs the actual data writes; the BucketWriter can be thought of as a wrapper around an HDFS file and the way it is written. Each lookupPath corresponds to one BucketWriter, and the mapping is recorded in sfWriters. sfWriters is a WriterLinkedHashMap, a subclass of LinkedHashMap (private static class WriterLinkedHashMap extends LinkedHashMap) that holds the file-to-BucketWriter mapping and is initialized in the start method (a sketch of this cache follows after the list):
this.sfWriters = new WriterLinkedHashMap( maxOpenFiles);
Its capacity is hdfs.maxOpenFiles (default 5000), i.e. the maximum number of files that may be open at the same time.
3. Call the BucketWriter's append method to write the data.
4. Once the number of Events handled reaches hdfs.batchSize, loop over the BucketWriter objects calling flush on each, and commit the transaction.
5. If an exception occurs, roll the transaction back.
6. Finally, close the transaction.
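As a rough sketch of the sfWriters cache from step 2, assuming the real class behaves like an LRU map capped at maxOpenFiles (HdfsWriterHandle below is a stand-in for BucketWriter):

import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for Flume's BucketWriter in this sketch.
interface HdfsWriterHandle { void close(); }

// Sketch of an LRU cache of file path -> writer, capped at maxOpenFiles.
class WriterCacheSketch extends LinkedHashMap<String, HdfsWriterHandle> {
  private final int maxOpenFiles;

  WriterCacheSketch(int maxOpenFiles) {
    super(16, 0.75f, true);   // accessOrder = true gives LRU behaviour on get()/put()
    this.maxOpenFiles = maxOpenFiles;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, HdfsWriterHandle> eldest) {
    if (size() > maxOpenFiles) {
      eldest.getValue().close(); // close the least-recently-used file before evicting it
      return true;
    }
    return false;
  }
}

Access ordering plus removeEldestEntry is what lets the sink keep at most hdfs.maxOpenFiles files open, closing the least recently used writer when a new path would exceed the limit.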
process finally returns a Status object (BACKOFF or READY) describing the state of the Sink. Callers can use it to judge the Sink's health; for example, the failover SinkProcessor relies on it to decide whether a Sink can still serve traffic.
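Putting steps 1 to 6 together, a simplified skeleton of such a process method could look like the sketch below. Channel, Event, Transaction, Status and AbstractSink are the real Flume API types; Bucket, pathFor and openBucket are illustrative placeholders, and the real HDFSEventSink additionally handles timeouts, file rolling and retries.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.Sink.Status;
import org.apache.flume.Transaction;
import org.apache.flume.sink.AbstractSink;

// Simplified, batch-per-transaction skeleton of a process() implementation.
public class SketchHdfsSink extends AbstractSink {
  interface Bucket { void append(Event e) throws Exception; void flush() throws Exception; }

  private final Map<String, Bucket> writers = new HashMap<>();
  private final int batchSize = 100;                           // hdfs.batchSize default

  private String pathFor(Event e) { return "/flume/events"; }  // placeholder path resolution
  private Bucket openBucket(String path) { throw new UnsupportedOperationException(); } // placeholder

  @Override
  public Status process() {
    Channel channel = getChannel();
    Transaction txn = channel.getTransaction();
    txn.begin();
    try {
      List<Bucket> touched = new ArrayList<>();
      int taken = 0;
      for (; taken < batchSize; taken++) {
        Event event = channel.take();
        if (event == null) break;                              // channel is drained
        String path = pathFor(event);                          // step 1: resolve the target file
        Bucket bucket = writers.computeIfAbsent(path, this::openBucket); // step 2: one writer per path
        bucket.append(event);                                  // step 3: write the event
        if (!touched.contains(bucket)) touched.add(bucket);
      }
      for (Bucket bucket : touched) bucket.flush();            // step 4: flush every touched writer
      txn.commit();
      return taken == 0 ? Status.BACKOFF : Status.READY;
    } catch (Exception e) {
      txn.rollback();                                          // step 5: roll back on failure
      return Status.BACKOFF;
    } finally {
      txn.close();                                             // step 6: close the transaction
    }
  }
}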
Analysis of the main methods:
1. The constructor creates an HDFSWriterFactory object.
Later on, HDFSWriterFactory's getWriter method is used to return the HDFSWriter implementation that matches the configured file type.
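Conceptually the factory is a simple switch on the file type string. The sketch below only illustrates the idea; the writer classes are placeholders rather than Flume's actual implementations, although the type names SequenceFile, DataStream and CompressedStream are the documented values of hdfs.fileType.

import java.io.IOException;

// Placeholder for the interface Flume's HDFSWriter implementations share.
interface SketchHdfsWriter { /* open/append/sync/close would live here */ }

// Illustrative sketch of a type-keyed writer factory.
class SketchWriterFactory {
  static final String SEQUENCE_FILE = "SequenceFile";
  static final String DATA_STREAM = "DataStream";
  static final String COMPRESSED_STREAM = "CompressedStream";

  SketchHdfsWriter getWriter(String fileType) throws IOException {
    switch (fileType) {
      case SEQUENCE_FILE:     return new SequenceFileWriterSketch();
      case DATA_STREAM:       return new DataStreamWriterSketch();
      case COMPRESSED_STREAM: return new CompressedStreamWriterSketch();
      default: throw new IOException("File type " + fileType + " not supported");
    }
  }

  static class SequenceFileWriterSketch implements SketchHdfsWriter {}
  static class DataStreamWriterSketch implements SketchHdfsWriter {}
  static class CompressedStreamWriterSketch implements SketchHdfsWriter {}
}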
2. configure
1) The configure method reads the various settings from the Context, for example:
inUseSuffix = context.getString( "hdfs.inUseSuffix", defaultInUseSuffix ); // suffix of files currently being written, default ".tmp"
rollInterval = context.getLong( "hdfs.rollInterval", defaultRollInterval ); // roll interval in seconds, default 30
rollSize = context.getLong( "hdfs.rollSize", defaultRollSize ); // roll size in bytes, default 1024
rollCount = context.getLong( "hdfs.rollCount", defaultRollCount ); // default 10
batchSize = context.getLong( "hdfs.batchSize", defaultBatchSize ); // default 100
idleTimeout = context.getInteger( "hdfs.idleTimeout", 0); // default 0 (disabled)
String codecName = context.getString( "hdfs.codeC"); // compression codec
fileType = context.getString( "hdfs.fileType", defaultFileType ); // default HDFSWriterFactory.SequenceFileType, i.e. SequenceFile
maxOpenFiles = context.getInteger( "hdfs.maxOpenFiles", defaultMaxOpenFiles ); // default 5000
callTimeout = context.getLong( "hdfs.callTimeout", defaultCallTimeout ); // BucketWriter call timeout, default 10000
threadsPoolSize = context.getInteger( "hdfs.threadsPoolSize",
    defaultThreadPoolSize); // size of the thread pool running append/open/close/flush tasks, default 10
rollTimerPoolSize = context.getInteger( "hdfs.rollTimerPoolSize",
    defaultRollTimerPoolSize); // size of the file-roll timer thread pool, default 1
tryCount = context.getInteger( "hdfs.closeTries", defaultTryCount ); // number of attempts to close a file (must be > 0)
retryInterval = context.getLong( "hdfs.retryInterval", defaultRetryInterval); // interval between close attempts (must be > 0)
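These fields map one-to-one onto hdfs.* properties of the sink in the agent configuration. A hypothetical example (the agent name a1, sink name k1 and the path are made up for illustration):

# hypothetical sink configuration illustrating the options above
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = SequenceFile
a1.sinks.k1.hdfs.batchSize = 100
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.rollCount = 10
a1.sinks.k1.hdfs.maxOpenFiles = 5000
a1.sinks.k1.hdfs.callTimeout = 10000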
2) Obtaining the compression format
if (codecName == null) { // hdfs.codeC is not set
    codeC = null; // so no compression is applied
    compType = CompressionType.NONE;
} else {
    codeC = getCodec(codecName); // resolve the compression codec via getCodec
    // TODO : set proper compression type
    compType = CompressionType.BLOCK; // BLOCK compression is used
}
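How getCodec resolves the name can be sketched with standard Hadoop APIs; the code below is an approximation of the idea (scan the codec classes Hadoop knows about and match on the simple class name), not a copy of the Flume method.

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.util.ReflectionUtils;

// Rough sketch of resolving a codec by name (e.g. "gzip" or "GzipCodec").
class CodecLookupSketch {
  static CompressionCodec getCodec(String codecName) {
    Configuration conf = new Configuration();
    List<Class<? extends CompressionCodec>> codecs =
        CompressionCodecFactory.getCodecClasses(conf);
    for (Class<? extends CompressionCodec> cls : codecs) {
      String simple = cls.getSimpleName();                // e.g. "GzipCodec"
      if (simple.equalsIgnoreCase(codecName)
          || simple.equalsIgnoreCase(codecName + "Codec")) {
        return ReflectionUtils.newInstance(cls, conf);    // instantiate the matching codec
      }
    }
    throw new IllegalArgumentException("Unsupported compression codec: " + codecName);
  }
}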
3) File-rolling (time rounding) settings, used later when BucketWriter objects are instantiated:
needRounding = context.getBoolean( "hdfs.round", false );
if(needRounding) {
String unit = context.getString( "hdfs.roundUnit", "second" );
if (unit.equalsIgnoreCase( "hour")) {
this.roundUnit = Calendar.HOUR_OF_DAY;
} else if (unit.equalsIgnoreCase("minute" )) {
this.roundUnit = Calendar.MINUTE;
} else if (unit.equalsIgnoreCase("second" )){
this.roundUnit = Calendar.SECOND;
} else {
LOG.warn("Rounding unit is not valid, please set one of" +
"minute, hour, or second. Rounding will be disabled" );
needRounding = false ;
}
this.roundValue = context.getInteger("hdfs.roundValue", 1);
if(roundUnit == Calendar.SECOND || roundUnit == Calendar.MINUTE){
    Preconditions.checkArgument(roundValue > 0 && roundValue <= 60,
        "Round value must be > 0 and <= 60");
} else if (roundUnit == Calendar.HOUR_OF_DAY){
    Preconditions.checkArgument(roundValue > 0 && roundValue <= 24,
        "Round value must be > 0 and <= 24");
}
}
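The round/roundUnit/roundValue trio is later used to round event timestamps down when building the bucket path. A simplified sketch of that rounding (not the exact BucketPath code) might look like this:

import java.util.Calendar;

// Round a timestamp down to the nearest 'roundValue' units of 'roundUnit'
// (Calendar.SECOND, Calendar.MINUTE or Calendar.HOUR_OF_DAY), similar in spirit
// to what happens when hdfs.round is enabled.
class RoundDownSketch {
  static long roundDown(long timestampMillis, int roundUnit, int roundValue) {
    Calendar cal = Calendar.getInstance();
    cal.setTimeInMillis(timestampMillis);
    int current = cal.get(roundUnit);
    cal.set(roundUnit, (current / roundValue) * roundValue); // snap to the start of the bucket
    // zero out the finer-grained fields below the rounding unit
    if (roundUnit == Calendar.HOUR_OF_DAY || roundUnit == Calendar.MINUTE) {
      cal.set(Calendar.SECOND, 0);
    }
    if (roundUnit == Calendar.HOUR_OF_DAY) {
      cal.set(Calendar.MINUTE, 0);
    }
    cal.set(Calendar.MILLISECOND, 0);
    return cal.getTimeInMillis();
  }
}

With roundUnit = Calendar.MINUTE and roundValue = 10, for example, an event stamped 11:47:23 would be bucketed under 11:40:00.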