Lesson 14: Spark Streaming Source Code Walkthrough, State Management: updateStateByKey and mapWithState Demystified

zhouu · Posted on 2019-1-31 06:49:02

package org.apache.spark.streaming.rdd

import scala.reflect.ClassTag

import org.apache.spark.{OneToOneDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{State, Time}

/**
 * RDD storing the keyed states of the `mapWithState` operation and the corresponding mapped data.
 * Each partition of this RDD has a single record of type `MapWithStateRDDRecord`. This contains a
 * `StateMap` (containing the keyed states) and the sequence of records returned by the mapping
 * function of `mapWithState`.
 *
 * @param prevStateRDD The previous MapWithStateRDD on whose StateMap data `this` RDD
 *                     will be created
 * @param partitionedDataRDD The partitioned data RDD which is used to update the previous StateMaps
 *                           in the `prevStateRDD` to create `this` RDD
 * @param mappingFunction The function that will be used to update state and return new data
 * @param batchTime The time of the batch to which this RDD belongs; used as the update time
 *                  of the states modified in this batch
 * @param timeoutThresholdTime The threshold time; keys whose state was last updated before it
 *                             are treated as timed out
 */
private[streaming] class MapWithStateRDD[K: ClassTag, V: ClassTag, S: ClassTag, E: ClassTag](
    private var prevStateRDD: RDD[MapWithStateRDDRecord[K, S, E]],
    private var partitionedDataRDD: RDD[(K, V)],
    mappingFunction: (Time, K, Option[V], State[S]) => Option[E],
    batchTime: Time,
    timeoutThresholdTime: Option[Long]
  ) extends RDD[MapWithStateRDDRecord[K, S, E]](
    partitionedDataRDD.sparkContext,
    List(
      new OneToOneDependency[MapWithStateRDDRecord[K, S, E]](prevStateRDD),
      new OneToOneDependency(partitionedDataRDD))
  ) {

  // Set to true by checkpoint(); when true, compute() also removes timed-out state
  @volatile private var doFullScan = false

  // Both parents must share the same partitioner so that partition i of one lines up with
  // partition i of the other (they are combined through OneToOneDependency)
  require(prevStateRDD.partitioner.nonEmpty)
  require(partitionedDataRDD.partitioner == prevStateRDD.partitioner)

  override val partitioner = prevStateRDD.partitioner

  override def checkpoint(): Unit = {
    super.checkpoint()
    doFullScan = true
  }

  override def compute(
      partition: Partition, context: TaskContext): Iterator[MapWithStateRDDRecord[K, S, E]] = {

    val stateRDDPartition = partition.asInstanceOf[MapWithStateRDDPartition]
    // Each partition of the previous state RDD holds at most one MapWithStateRDDRecord
    val prevStateRDDIterator = prevStateRDD.iterator(
      stateRDDPartition.previousSessionRDDPartition, context)
    // New (key, value) data of the current batch for the same partition
    val dataIterator = partitionedDataRDD.iterator(
      stateRDDPartition.partitionedDataRDDPartition, context)

    val prevRecord = if (prevStateRDDIterator.hasNext) Some(prevStateRDDIterator.next()) else None
    val newRecord = MapWithStateRDDRecord.updateRecordWithData(
      prevRecord,
      dataIterator,
      mappingFunction,
      batchTime,
      timeoutThresholdTime,
      removeTimedoutData = doFullScan // remove timed-out data only when full scan is enabled
    )
    Iterator(newRecord)
  }

  // ... remaining members of MapWithStateRDD are omitted in this excerpt
}
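The `compute` method above takes the single previous `MapWithStateRDDRecord` of a partition, hands it to `MapWithStateRDDRecord.updateRecordWithData` together with the batch's new data, and drops timed-out keys only when `doFullScan` has been set by `checkpoint()`. The following is a minimal, Spark-free Scala sketch of that per-partition flow, not Spark's implementation. It assumes a plain immutable `Map` in place of Spark's `StateMap`, a simplified mapping function that sees `Option[S]` instead of a `State[S]` object, and that a key times out when its last update time is strictly below the threshold. `MapWithStateSketch`, `Record` and `updateRecord` are hypothetical names used only for illustration.

```scala
// Hypothetical, Spark-free model of how one MapWithStateRDD partition is updated.
object MapWithStateSketch {

  // Hypothetical stand-in for Spark's StateMap: key -> (state, last update time in ms)
  type StateMap[K, S] = Map[K, (S, Long)]

  // Hypothetical stand-in for MapWithStateRDDRecord: state map + mapped output of this batch
  final case class Record[K, S, E](stateMap: StateMap[K, S], mappedData: Seq[E])

  // Hypothetical analogue of updateRecordWithData (assumption, not the real signature).
  // mappingFunction(key, newValue, currentState) returns (newState, output);
  // newState = None removes the key; newValue = None signals a timing-out key.
  def updateRecord[K, V, S, E](
      prevRecord: Option[Record[K, S, E]],
      data: Iterator[(K, V)],
      mappingFunction: (K, Option[V], Option[S]) => (Option[S], Option[E]),
      batchTimeMs: Long,
      timeoutThresholdMs: Option[Long],
      removeTimedoutData: Boolean): Record[K, S, E] = {
    // Start from the previous record's states (empty for the first batch)
    var states: StateMap[K, S] = prevRecord.map(_.stateMap).getOrElse(Map.empty[K, (S, Long)])
    val mapped = Seq.newBuilder[E]

    // Only keys that appear in this batch's data are visited here
    data.foreach { case (key, value) =>
      val (newState, out) = mappingFunction(key, Some(value), states.get(key).map(_._1))
      newState match {
        case Some(s) => states = states.updated(key, (s, batchTimeMs)) // stamp with batch time
        case None    => states = states - key
      }
      out.foreach(mapped += _)
    }

    // Timed-out keys are searched for only when a full scan is requested
    if (removeTimedoutData) {
      timeoutThresholdMs.foreach { threshold =>
        val timingOut = states.filter { case (_, (_, updatedAt)) => updatedAt < threshold }
        timingOut.foreach { case (key, (s, _)) =>
          val (_, out) = mappingFunction(key, None, Some(s)) // last call for this key
          out.foreach(mapped += _)
          states = states - key
        }
      }
    }
    Record(states, mapped.result())
  }
}
```

With `removeTimedoutData = false`, keys older than the threshold simply stay in the map; the sketch mirrors why `compute` passes `doFullScan` here: in the excerpt above, the whole state is scanned for timeouts only after `checkpoint()` has been called, so ordinary batches only pay for the keys they actually receive.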

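The title contrasts `updateStateByKey` with `mapWithState`. One commonly cited difference is how much state each touches per batch: `updateStateByKey` calls its update function for every key that has existing state or new values (keys with no new data get an empty `Seq`), whereas `mapWithState`, as the `compute` method above suggests, calls the mapping function for the records of the current batch, plus keys that are timing out. The sketch below illustrates only that difference in plain Scala; the names `StateApiContrast`, `updateStateByKeyStyle` and `mapWithStateStyle` are hypothetical, and timeouts and checkpointing are deliberately left out.

```scala
// Hypothetical, Spark-free comparison of per-batch state work (no timeouts modeled).
object StateApiContrast {

  // updateStateByKey-style: one call per key present in the old state OR in the batch;
  // the function receives all new values for that key, and None drops the key.
  // Returns (new state, number of function calls).
  def updateStateByKeyStyle[K, V, S](
      state: Map[K, S],
      batch: Seq[(K, V)],
      update: (Seq[V], Option[S]) => Option[S]): (Map[K, S], Int) = {
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    val keys = state.keySet ++ newValues.keySet // every known key is visited
    val next = keys.flatMap { k =>
      update(newValues.getOrElse(k, Seq.empty), state.get(k)).map(s => k -> s)
    }.toMap
    (next, keys.size)
  }

  // mapWithState-style: one call per incoming record; keys without new data are untouched.
  // Returns (new state, number of function calls).
  def mapWithStateStyle[K, V, S](
      state: Map[K, S],
      batch: Seq[(K, V)],
      f: (K, V, Option[S]) => S): (Map[K, S], Int) = {
    val next = batch.foldLeft(state) { case (acc, (k, v)) => acc.updated(k, f(k, v, acc.get(k))) }
    (next, batch.size)
  }
}
```

With three keys already in state and a single new record, both styles end in the same state, but the first makes three calls and the second makes one, which illustrates why `mapWithState` is usually described as cheaper when only a small fraction of keys change in each batch.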