At most once—this handles the first case described. Messages are immediately marked as consumed, so they can't be given out twice, but many failure scenarios may lead to losing messages.
At least once—this is the second case where we guarantee each message will be delivered at least once, but in failure cases may be delivered twice.
Exactly once—this is what people actually want, each message is delivered once and only once.
In the first case, the consumption state is stored on the broker side: the state is flipped as soon as a message is handed out, so network failures can cause messages to be lost. In the second case, state lives on both ends: the broker first marks the message as sent, and only after the consumer has finished processing it does it get marked as consumed. But an exception while processing the message can leave the state marked incorrectly, and the extra round trip costs performance. The third is of course the ideal outcome.
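The difference between the first two cases comes down to whether the "consumed" marker is advanced before or after processing. A toy simulation (assumed names, not Kafka code) of a consumer that crashes once between the two steps makes the failure modes concrete:

```java
import java.util.ArrayList;
import java.util.List;

public class DeliverySemantics {

    // At-most-once: mark consumed first, then process. A crash between the
    // two steps loses the message, because restart resumes past it.
    static List<String> atMostOnce(List<String> broker, boolean crashMidway) {
        List<String> processed = new ArrayList<>();
        int offset = 0;
        while (offset < broker.size()) {
            String msg = broker.get(offset);
            offset++;                                    // mark consumed first
            if (crashMidway && processed.isEmpty() && offset == 1) {
                continue;                                // simulated crash: msg never processed
            }
            processed.add(msg);
        }
        return processed;
    }

    // At-least-once: process first, then mark consumed. A crash between the
    // two steps redelivers the message, because restart re-reads the old offset.
    static List<String> atLeastOnce(List<String> broker, boolean crashMidway) {
        List<String> processed = new ArrayList<>();
        int offset = 0;
        boolean crashed = false;
        while (offset < broker.size()) {
            processed.add(broker.get(offset));           // process first
            if (crashMidway && !crashed && offset == 0) {
                crashed = true;                          // simulated crash before the
                continue;                                // offset is advanced
            }
            offset++;                                    // then mark consumed
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> broker = List.of("m0", "m1");
        System.out.println(atMostOnce(broker, true));    // [m1] -- m0 lost
        System.out.println(atLeastOnce(broker, true));   // [m0, m0, m1] -- m0 duplicated
    }
}
```

With a crash injected at the first message, the at-most-once loop drops `m0` entirely, while the at-least-once loop processes it twice.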
Kafka approaches this problem with a "high water mark" marker, i.e. by tracking an offset:
Kafka does two unusual things with respect to metadata. First the stream is partitioned on the brokers into a set of distinct partitions. The semantic meaning of these partitions is left up to the producer and the producer specifies which partition a message belongs to. Within a partition messages are stored in the order in which they arrive at the broker, and will be given out to consumers in that same order. This means that rather than store metadata for each message (marking it as consumed, say), we just need to store the "high water mark" for each combination of consumer, topic, and partition. Hence the total metadata required to summarize the state of the consumer is actually quite small. In Kafka we refer to this high-water mark as "the offset" for reasons that will become clear in the implementation section.
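The bookkeeping the passage describes can be sketched in a few lines: instead of per-message state, one offset is stored per (consumer, topic, partition). This is a minimal illustration with assumed names, not Kafka's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one "high water mark" offset per
// (consumer group, topic, partition) summarizes all consumed messages.
public class OffsetTracker {
    private final Map<String, Long> highWaterMarks = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    // Advance the mark after a batch of messages has been processed.
    public void commit(String group, String topic, int partition, long nextOffset) {
        highWaterMarks.put(key(group, topic, partition), nextOffset);
    }

    // Where the next fetch for this consumer/topic/partition should start.
    public long fetchOffset(String group, String topic, int partition) {
        return highWaterMarks.getOrDefault(key(group, topic, partition), 0L);
    }
}
```

Because messages within a partition are delivered in arrival order, this single number is enough to mark everything before it as consumed, which is why the log lines below show only an advancing offset per partition.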
So each time messages are consumed, log4j prints a different offset:
[FetchRunnable-0] INFO : kafka.consumer.FetcherRunnable#info : FetchRunnable-0 start fetching topic: test part: 0 offset: 0 from 192.168.181.128:9092
[FetchRunnable-0] INFO : kafka.consumer.FetcherRunnable#info : FetchRunnable-0 start fetching topic: test part: 0 offset: 15 from 192.168.181.128:9092