设为首页 收藏本站
查看: 1542|回复: 0

[经验分享] Apache Kafka源码分析 – ReplicaManager

[复制链接]

尚未签到

发表于 2015-8-3 03:55:24 | 显示全部楼层 |阅读模式
  如果说controller作为master,负责全局的事情,比如选取leader,reassignment等
那么ReplicaManager就是worker,负责接收controller的command以完成replica的管理工作

  而command主要有两种,LeaderAndISRCommand和StopReplicaCommand
  StopReplicaCommand
  处理很简单,主要就是停止fetcher线程,并删除partition目录
  stopReplicas

DSC0000.gif DSC0001.gif   def stopReplicas(stopReplicaRequest: StopReplicaRequest): (mutable.Map[TopicAndPartition, Short], Short) = {
replicaStateChangeLock synchronized { // 加锁
val responseMap = new collection.mutable.HashMap[TopicAndPartition, Short]
if(stopReplicaRequest.controllerEpoch < controllerEpoch) { // 检查Epoch,防止收到过期的request
(responseMap, ErrorMapping.StaleControllerEpochCode)
} else {
controllerEpoch = stopReplicaRequest.controllerEpoch // 更新Epoch
// First stop fetchers for all partitions, then stop the corresponding replicas
replicaFetcherManager.removeFetcherForPartitions(stopReplicaRequest.partitions.map(r => TopicAndPartition(r.topic, r.partition))) // 先通过FetcherManager停止相关partition的Fetcher线程
for(topicAndPartition
leaderPartitionsLock synchronized {
leaderPartitions -= partition
}
if(deletePartition) { // 仅仅在deletePartition=true时,才会真正删除该partition
val removedPartition = allPartitions.remove((topic, partitionId))
if (removedPartition != null)
removedPartition.delete() // this will delete the local log
}
case None => //do nothing if replica no longer exists. This can happen during delete topic retries
}
}  
  LeaderAndISRCommand
  becomeLeaderOrFollower
做些epoch和valid的检查,然后区分出leader和follows,分别调用makeLeaders,makeFollowers


  def becomeLeaderOrFollower(leaderAndISRRequest: LeaderAndIsrRequest): (collection.Map[(String, Int), Short], Short) = {
replicaStateChangeLock synchronized {// 加锁
val responseMap = new collection.mutable.HashMap[(String, Int), Short]
if(leaderAndISRRequest.controllerEpoch < controllerEpoch) { // 检查requset epoch
(responseMap, ErrorMapping.StaleControllerEpochCode)
} else {
val controllerId = leaderAndISRRequest.controllerId
val correlationId = leaderAndISRRequest.correlationId
controllerEpoch = leaderAndISRRequest.controllerEpoch
// First check partition's leader epoch
        // 前面只是检查了request的epoch,但是还要检查其中的每个partitionStateInfo中的leader epoch
val partitionState = new HashMap[Partition, PartitionStateInfo]()
leaderAndISRRequest.partitionStateInfos.foreach{ case ((topic, partitionId), partitionStateInfo) =>
val partition = getOrCreatePartition(topic, partitionId, partitionStateInfo.replicationFactor) // get或创建partition
val partitionLeaderEpoch = partition.getLeaderEpoch()
// If the leader epoch is valid record the epoch of the controller that made the leadership decision.
// This is useful while updating the isr to maintain the decision maker controller's epoch in the zookeeper path
if (partitionLeaderEpoch < partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leaderEpoch) { // local的partitionLeaderEpoch要小于request中的leaderEpoch,否则就是过时的request
if(partitionStateInfo.allReplicas.contains(config.brokerId)) // 判断该partition是否被assigned给当前的broker
partitionState.put(partition, partitionStateInfo)
else { }
} else { // Received invalid LeaderAndIsr request
// Otherwise record the error code in response
responseMap.put((topic, partitionId), ErrorMapping.StaleLeaderEpochCode)
}
}
val partitionsTobeLeader = partitionState
.filter{ case (partition, partitionStateInfo) => partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leader == config.brokerId}
val partitionsToBeFollower = (partitionState -- partitionsTobeLeader.keys)
if (!partitionsTobeLeader.isEmpty) makeLeaders(controllerId, controllerEpoch, partitionsTobeLeader, leaderAndISRRequest.correlationId, responseMap)
if (!partitionsToBeFollower.isEmpty) makeFollowers(controllerId, controllerEpoch, partitionsToBeFollower, leaderAndISRRequest.leaders, leaderAndISRRequest.correlationId, responseMap)
// we initialize highwatermark thread after the first leaderisrrequest. This ensures that all the partitions
// have been completely populated before starting the checkpointing there by avoiding weird race conditions
if (!hwThreadInitialized) {
startHighWaterMarksCheckPointThread() // 启动HighWaterMarksCheckPointThread
hwThreadInitialized = true
}
replicaFetcherManager.shutdownIdleFetcherThreads()
(responseMap, ErrorMapping.NoError)
}
}
}  
makeLeaders
停止Fetcher,调用partition.makeLeader,把这些partition加到leaderPartitions中


  /*
* Make the current broker to become leader for a given set of partitions by:
*
* 1. Stop fetchers for these partitions
* 2. Update the partition metadata in cache
* 3. Add these partitions to the leader partitions set
*
* If an unexpected error is thrown in this function, it will be propagated to KafkaApis where
* the error message will be set on each partition since we do not know which partition caused it
*  TODO: the above may need to be fixed later
*/
private def makeLeaders(controllerId: Int, epoch: Int,
partitionState: Map[Partition, PartitionStateInfo],
correlationId: Int, responseMap: mutable.Map[(String, Int), Short]) = {
try {
// First stop fetchers for all the partitions
replicaFetcherManager.removeFetcherForPartitions(partitionState.keySet.map(new TopicAndPartition(_)))
// Update the partition information to be the leader
partitionState.foreach{ case (partition, partitionStateInfo) =>
partition.makeLeader(controllerId, partitionStateInfo, correlationId)}
// Finally add these partitions to the list of partitions for which the leader is the current broker
leaderPartitionsLock synchronized {
leaderPartitions ++= partitionState.keySet
}
} catch {
}
}  
makeFollowers
除了修改leaderPartitions和Mark as followers以外
作为followers,需要truncated log到highWatermark,然后启动fetcher去catch leader
  
  /*
* Make the current broker to become follower for a given set of partitions by:
*
* 1. Remove these partitions from the leader partitions set.
* 2. Mark the replicas as followers so that no more data can be added from the producer clients.
* 3. Stop fetchers for these partitions so that no more data can be added by the replica fetcher threads.
* 4. Truncate the log and checkpoint offsets for these partitions.
* 5. If the broker is not shutting down, add the fetcher to the new leaders.
*
* The ordering of doing these steps make sure that the replicas in transition will not
* take any more messages before checkpointing offsets so that all messages before the checkpoint
* are guaranteed to be flushed to disks
*
* If an unexpected error is thrown in this function, it will be propagated to KafkaApis where
* the error message will be set on each partition since we do not know which partition caused it
*/
private def makeFollowers(controllerId: Int, epoch: Int, partitionState: Map[Partition, PartitionStateInfo],
leaders: Set[Broker], correlationId: Int, responseMap: mutable.Map[(String, Int), Short]) {
try {
leaderPartitionsLock synchronized {
leaderPartitions --= partitionState.keySet
}
partitionState.foreach{ case (partition, leaderIsrAndControllerEpoch) =>
partition.makeFollower(controllerId, leaderIsrAndControllerEpoch, leaders, correlationId)}
replicaFetcherManager.removeFetcherForPartitions(partitionState.keySet.map(new TopicAndPartition(_)))
logManager.truncateTo(partitionState.map{ case(partition, leaderISRAndControllerEpoch) => // 将当前replica的log truncate到highWatermark,因为只有committed的数据是可以保证和leader一致的
new TopicAndPartition(partition) -> partition.getOrCreateReplica().highWatermark
})
if (!isShuttingDown.get()) { // 如果该broker没有shutting down
val partitionAndOffsets = mutable.Map[TopicAndPartition, BrokerAndInitialOffset]()
partitionState.foreach {
case (partition, partitionStateInfo) =>
val leader = partitionStateInfo.leaderIsrAndControllerEpoch.leaderAndIsr.leader // 找到leader
leaders.find(_.id == leader) match {
case Some(leaderBroker) =>
partitionAndOffsets.put(new TopicAndPartition(partition),   // get当前replica的logEndOffset
BrokerAndInitialOffset(leaderBroker, partition.getReplica().get.logEndOffset))
case None =>
}
}
replicaFetcherManager.addFetcherForPartitions(partitionAndOffsets) // 启动Fetcher去catch leader
}
else { }
}
} catch {
}
}

  
checkpointHighWatermarks
对于每个replica而已,HighWatermarks是很重要的,因为只有通过它可以知道到底哪些数据是一致的,这样就算broker crash,恢复的时候只需要基于HighWatermarks继续catch就可以
所以对于HighWatermarks,需要做cp


  /**
* Flushes the highwatermark value for all partitions to the highwatermark file
*/
def checkpointHighWatermarks() {
val replicas = allPartitions.values.map(_.getReplica(config.brokerId)).collect{case Some(replica) => replica}
val replicasByDir = replicas.filter(_.log.isDefined).groupBy(_.log.get.dir.getParent)
for((dir, reps)  (new TopicAndPartition(r) -> r.highWatermark)).toMap
try {
highWatermarkCheckpoints(dir).write(hwms)
} catch {
case e: IOException =>
fatal("Error writing to highwatermark file: ", e)
Runtime.getRuntime().halt(1)
}
}
}  

运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-93413-1-1.html 上篇帖子: Apache Kafka源码分析 – Log Management 下篇帖子: Apache Kafka源码分析 – Replica and Partition
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表