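Analysis of ZooKeeper Storage File Formats
ZooKeeper persists two kinds of files: snapshots and logs. A snapshot is a dump of the in-memory data tree. A log is similar to MySQL's binlog and records every operation that modifies data. The directories for both kinds of files can be set in the configuration file.
The sections below use a few typical scenarios to analyze the storage format of the two file types.
Snapshot file format
See ZooKeeperServer.takeSnapshot for details.
One simple scenario is enough to illustrate the format.
Scenario: ZooKeeper has just been installed, and starting the server produces a snapshot file.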
00000000  5a 4b 53 4e 00 00 00 02  ff ff ff ff ff ff ff ff  |ZKSN............|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  ff ff ff ff ff ff ff ff  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000060  00 00 00 00 00 00 00 0a  2f 7a 6f 6f 6b 65 65 70  |......../zookeep|
00000070  65 72 00 00 00 00 ff ff  ff ff ff ff ff ff 00 00  |er..............|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 10 2f 7a  |............../z|
000000c0  6f 6f 6b 65 65 70 65 72  2f 71 75 6f 74 61 00 00  |ookeeper/quota..|
000000d0  00 00 ff ff ff ff ff ff  ff ff 00 00 00 00 00 00  |................|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000110  00 00 00 00 00 00 00 00  00 01 2f 00 00 00 00 ab  |........../.....|
00000120  10 2b d2 00 00 00 01 2f                           |.+...../|
00000128
The file has four parts.
a) header
[*]magic: 4 bytes, the int value of "ZKSN", 0x 5a 4b 53 4e (offsets 0x00000000-0x00000003)
[*]version: 4 bytes, 2 by default, 0x 00 00 00 02 (offsets 0x00000004-0x00000007)
[*]dbid: 8 bytes, -1 by default, 0x ff ff ff ff ff ff ff ff (offsets 0x00000008-0x0000000f)
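The header layout above can be checked with a few lines of Java. This is a minimal sketch, not ZooKeeper's own reader: the class name SnapshotHeaderSketch is hypothetical, and the 16 input bytes are copied from the first row of the dump. ByteBuffer reads big-endian by default, which matches the byte order seen in the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of ZooKeeper): decodes the 16-byte snapshot header. */
final class SnapshotHeaderSketch {
    final int magic;
    final int version;
    final long dbid;

    SnapshotHeaderSketch(int magic, int version, long dbid) {
        this.magic = magic;
        this.version = version;
        this.dbid = dbid;
    }

    /** Reads magic (int), version (int) and dbid (long) in big-endian order. */
    static SnapshotHeaderSketch parse(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        return new SnapshotHeaderSketch(buf.getInt(), buf.getInt(), buf.getLong());
    }

    /** Turns the magic int back into its four ASCII characters. */
    String magicText() {
        byte[] raw = ByteBuffer.allocate(4).putInt(magic).array();
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // First row of the snapshot dump above.
        byte[] first16 = {
            0x5a, 0x4b, 0x53, 0x4e, 0, 0, 0, 2,
            (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff,
            (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff
        };
        SnapshotHeaderSketch h = parse(first16);
        // Prints: ZKSN version=2 dbid=-1
        System.out.println(h.magicText() + " version=" + h.version + " dbid=" + h.dbid);
    }
}
```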
b) data
[*]count: number of sessions, 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000010-0x00000013). When it is non-zero, the id and timeout of each session follow.
[*]In-memory tree:
[*]map: number of ACL mappings, 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000014-0x00000017)
[*]The nodes are then written recursively
[*]The first node has the path "", i.e. the root node
[*]path
[*]len: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000018-0x0000001b)
[*]node
[*]data
[*]len: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x0000001c-0x0000001f)
[*]acl: 8 bytes, -1 here, 0x ff ff ff ff ff ff ff ff (offsets 0x00000020-0x00000027)
[*]statpersisted: the persisted stat
[*]czxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000028-0x0000002f)
[*]mzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000030-0x00000037)
[*]ctime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000038-0x0000003f)
[*]mtime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000040-0x00000047)
[*]version: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000048-0x0000004b)
[*]cversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x0000004c-0x0000004f)
[*]aversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000050-0x00000053)
[*]ephemeralOwner: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000054-0x0000005b)
[*]pzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x0000005c-0x00000063)
[*]Serialization of the second node begins, the root's child (/zookeeper)
[*]path
[*]len: 4 bytes, here the length of "/zookeeper", 10, 0x 00 00 00 0a (offsets 0x00000064-0x00000067)
[*]content: 10 bytes, the ASCII encoding of "/zookeeper", 0x 2f 7a 6f 6f 6b 65 65 70 65 72 (offsets 0x00000068-0x00000071)
[*]node: same as the root node here, so the bytes below match those of the root node
[*]data
[*]len: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000072-0x00000075)
[*]acl: 8 bytes, -1 here, 0x ff ff ff ff ff ff ff ff (offsets 0x00000076-0x0000007d)
[*]statpersisted: the persisted stat
[*]czxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x0000007e-0x00000085)
[*]mzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000086-0x0000008d)
[*]ctime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x0000008e-0x00000095)
[*]mtime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000096-0x0000009d)
[*]version: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x0000009e-0x000000a1)
[*]cversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x000000a2-0x000000a5)
[*]aversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x000000a6-0x000000a9)
[*]ephemeralOwner: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000aa-0x000000b1)
[*]pzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000b2-0x000000b9)
[*]Serialization of the third node begins (/zookeeper/quota, the child of /zookeeper)
[*]path
[*]len: 4 bytes, here the length of "/zookeeper/quota", 16, 0x 00 00 00 10 (offsets 0x000000ba-0x000000bd)
[*]content: 16 bytes, the ASCII encoding of "/zookeeper/quota", 0x 2f 7a 6f 6f 6b 65 65 70 65 72 2f 71 75 6f 74 61 (offsets 0x000000be-0x000000cd)
[*]node: same as the root node here, so the bytes below match those of the root node
[*]data
[*]len: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x000000ce-0x000000d1)
[*]acl: 8 bytes, -1 here, 0x ff ff ff ff ff ff ff ff (offsets 0x000000d2-0x000000d9)
[*]statpersisted: the persisted stat
[*]czxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000da-0x000000e1)
[*]mzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000e2-0x000000e9)
[*]ctime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000ea-0x000000f1)
[*]mtime: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x000000f2-0x000000f9)
[*]version: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x000000fa-0x000000fd)
[*]cversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x000000fe-0x00000101)
[*]aversion: 4 bytes, 0 here, 0x 00 00 00 00 (offsets 0x00000102-0x00000105)
[*]ephemeralOwner: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000106-0x0000010d)
[*]pzxid: 8 bytes, 0 here, 0x 00 00 00 00 00 00 00 00 (offsets 0x0000010e-0x00000115)
[*]The tree ends with "/"
[*]5 bytes in total: the first 4 give the length 1 and the last is the ASCII code of "/", 0x 2f, so the sequence is 0x 00 00 00 01 2f (offsets 0x00000116-0x0000011a)
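The per-node layout described above (length-prefixed path, length-prefixed data, an 8-byte acl value and a 60-byte stat) can be walked with a small reader. The following is a sketch under those assumptions only: SnapshotNodeSketch and its methods are hypothetical names, the stat fields are skipped rather than decoded, and the sample bytes are rebuilt from the "/zookeeper" record listed above instead of being read from a real file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for one path + node record of the snapshot data section. */
final class SnapshotNodeSketch {
    /** Stat size as listed above: 4 longs, 3 ints, then 2 more longs. */
    static final int STAT_BYTES = 4 * 8 + 3 * 4 + 2 * 8; // 60

    /** Reads a 4-byte length followed by that many bytes; a negative length is treated as empty. */
    static byte[] readBuffer(ByteBuffer buf) {
        int len = buf.getInt();
        byte[] out = new byte[Math.max(len, 0)];
        buf.get(out);
        return out;
    }

    /** Consumes one record and returns its path; data, acl and stat are skipped. */
    static String readNode(ByteBuffer buf) {
        String path = new String(readBuffer(buf), StandardCharsets.UTF_8);
        readBuffer(buf);                             // node data
        buf.getLong();                               // acl value (-1 in the dump)
        buf.position(buf.position() + STAT_BYTES);   // czxid ... pzxid
        return path;
    }

    /** Rebuilds the bytes of the "/zookeeper" record (offsets 0x64-0xb9 in the dump). */
    static byte[] sampleRecord() {
        byte[] path = "/zookeeper".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + path.length + 4 + 8 + STAT_BYTES);
        buf.putInt(path.length).put(path).putInt(0).putLong(-1L);
        return buf.array();                          // the stat bytes stay zero
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(sampleRecord());
        String path = readNode(buf);
        // The record starts at 0x64, so the next one starts at 0x64 + 86 = 0xba.
        System.out.println(path + " next record at 0x" + Integer.toHexString(0x64 + buf.position()));
    }
}
```

The computed start of the next record, 0xba, agrees with the offset at which the length of "/zookeeper/quota" appears in the dump.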
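c) checksum
An Adler32 checksum computed over the preceding bytes, stored as 8 bytes
0x 00 00 00 00 ab 10 2b d2 (offsets 0x0000011b-0x00000122)
d) terminator
As with the in-memory tree, "/" serves as the terminator
5 bytes in total: the first 4 give the length 1 and the last is the ASCII code of "/", 0x 2f, so the sequence is 0x 00 00 00 01 2f (offsets 0x00000123-0x00000127)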
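The checksum in part c) always starts with four zero bytes because Adler32 yields a 32-bit value that is then written as an 8-byte long. A minimal sketch using the JDK's java.util.zip.Adler32 shows this. The class name AdlerSketch is hypothetical, and the input string is an arbitrary test value rather than data from the snapshot, so the resulting bytes differ from those in the dump.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

/** Hypothetical helper: Adler32 over some bytes, stored as an 8-byte big-endian long. */
final class AdlerSketch {
    static byte[] checksumAsLong(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 0, data.length);
        // getValue() returns the 32-bit checksum in a long, so the upper 4 bytes are zero.
        return ByteBuffer.allocate(8).putLong(adler.getValue()).array();
    }

    public static void main(String[] args) {
        byte[] stored = checksumAsLong("Wikipedia".getBytes(StandardCharsets.US_ASCII));
        StringBuilder hex = new StringBuilder();
        for (byte b : stored) {
            hex.append(String.format("%02x ", b));
        }
        // Prints: 00 00 00 00 11 e6 03 98
        System.out.println(hex.toString().trim());
    }
}
```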
Log file format
See FileTxnLog.append for details.
Scenario 1) Start a client
The log file is named after the current transaction id, which is 1 here, so the file log.1 is created.
1) fileheader
[*]magic: 4 bytes, the int value of "ZKLG", 0x 5a 4b 4c 47 (offsets 0x00000000-0x00000003)
[*]version: 4 bytes, 2 by default, 0x 00 00 00 02 (offsets 0x00000004-0x00000007)
[*]dbid: 8 bytes, 0 by default, 0x 00 00 00 00 00 00 00 00 (offsets 0x00000008-0x0000000f)
2) request content
[*]txnEntryCRC (checksum of the txnEntry below)
[*]A long (8 bytes) computed with Adler32, the same algorithm used for the snapshot, 0x 00 00 00 00 59 27 08 06 (offsets 0x00000010-0x00000017)
[*]txnEntry
[*]content length: 4 bytes, 0x 00 00 00 24 (offsets 0x00000018-0x0000001b)
[*]hdr
[*]clientId: a long, 8 bytes, 0x 01 3a 69 4e 19 1a 00 00 (offsets 0x0000001c-0x00000023)
[*]cxid: the int 0 here, 4 bytes, 0x 00 00 00 00 (offsets 0x00000024-0x00000027)
[*]zxid: the long 1 here, 8 bytes, 0x 00 00 00 00 00 00 00 01 (offsets 0x00000028-0x0000002f)
[*]time: a long, 8 bytes, 0x 00 00 01 3a 69 4e ab af (offsets 0x00000030-0x00000037)
[*]type: the opcode (see org.apache.zookeeper.ZooDefs.OpCode for the table), the int -10 here, 4 bytes, 0x ff ff ff f6 (offsets 0x00000038-0x0000003b)
[*]txn
[*]timeOut: the int 400000 here, 4 bytes, 0x 00 06 1a 80 (offsets 0x0000003c-0x0000003f)
[*]EOR
[*]A fixed byte is written as the end marker: 0x 42 (offset 0x00000040)
The file now looks like this:
00000000  5a 4b 4c 47 00 00 00 02  00 00 00 00 00 00 00 00  |ZKLG............|
00000010  00 00 00 00 59 27 08 06  00 00 00 24 01 3a 69 4e  |....Y'.....$.:iN|
00000020  19 1a 00 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000030  00 00 01 3a 69 4e ab af  ff ff ff f6 00 06 1a 80  |...:iN..........|
00000040  42 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |B...............|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000010
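The field list for scenario 1 can be replayed against the first 65 bytes of the dump. The sketch below is an illustration only, not FileTxnLog's reader: TxnLogEntrySketch, hex and describe are hypothetical names, and the field order is assumed to be exactly the one listed above (file header, CRC, length, hdr, timeOut, EOR).

```java
import java.nio.ByteBuffer;

/** Hypothetical decoder for the file header and the first entry of log.1. */
final class TxnLogEntrySketch {
    /** Bytes 0x00-0x40 of the dump above. */
    static final String DUMP =
          "5a 4b 4c 47 00 00 00 02 00 00 00 00 00 00 00 00 "
        + "00 00 00 00 59 27 08 06 00 00 00 24 01 3a 69 4e "
        + "19 1a 00 00 00 00 00 00 00 00 00 00 00 00 00 01 "
        + "00 00 01 3a 69 4e ab af ff ff ff f6 00 06 1a 80 "
        + "42";

    /** Converts space-separated hex pairs into bytes. */
    static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    static String describe(byte[] file) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        if (buf.getInt() != 0x5a4b4c47) {            // "ZKLG"
            throw new IllegalArgumentException("not a log file");
        }
        buf.getInt();                                // version
        buf.getLong();                               // dbid
        buf.getLong();                               // txnEntryCRC, not verified here
        int len = buf.getInt();                      // content length
        int start = buf.position();
        buf.getLong();                               // clientId
        buf.getInt();                                // cxid
        long zxid = buf.getLong();
        buf.getLong();                               // time
        int type = buf.getInt();
        int timeOut = buf.getInt();                  // body of a createSession entry
        boolean lengthOk = buf.position() - start == len;
        byte eor = buf.get();
        return "zxid=" + zxid + " type=" + type + " timeOut=" + timeOut
            + " len=" + len + " lengthOk=" + lengthOk
            + " eor=0x" + Integer.toHexString(eor);
    }

    public static void main(String[] args) {
        // Prints: zxid=1 type=-10 timeOut=400000 len=36 lengthOk=true eor=0x42
        System.out.println(describe(hex(DUMP)));
    }
}
```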
Scenario 2) Add a node
zk.create("/root", "mydata".getBytes(), Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);
A new txnEntry is appended:
[*]txnEntryCRC (checksum)
A long (8 bytes) computed with Adler32, the same algorithm used for the snapshot,
0x 00 00 00 00 ba 26 0c 5e (offsets 0x00000041-0x00000048)
[*]txnEntry
[*]content length: 4 bytes, 0x 00 00 00 53 (offsets 0x00000049-0x0000004c)
[*]hdr: the entry header
[*]clientId: a long, 8 bytes, 0x 01 3a 69 4e 19 1a 00 00 (offsets 0x0000004d-0x00000054)
[*]cxid: the int 2 here, 4 bytes, 0x 00 00 00 02 (offsets 0x00000055-0x00000058)
[*]zxid: the long 2 here, 8 bytes, 0x 00 00 00 00 00 00 00 02 (offsets 0x00000059-0x00000060)
[*]time: a long, 8 bytes, 0x 00 00 01 3a 69 4f 3f a5 (offsets 0x00000061-0x00000068)
[*]type: the int 1 here, 4 bytes, 0x 00 00 00 01 (offsets 0x00000069-0x0000006c)
[*]txn: the node content
[*]path: "/root" here, 9 bytes: the first 4 give the length 5 and the last 5 are the ASCII codes of "/root", 0x 00 00 00 05 2f 72 6f 6f 74 (offsets 0x0000006d-0x00000075)
[*]data: the byte array of "mydata" here, 10 bytes: the first 4 give the length 6 and the last 6 are the bytes of "mydata",
0x 00 00 00 06 6d 79 64 61 74 61 (offsets 0x00000076-0x0000007f)
[*]The ACL information follows
[*]acl: the number of ACL entries, 4 bytes, 1 here, 0x 00 00 00 01 (offsets 0x00000080-0x00000083)
[*]e1: the details of one ACL entry
[*]perms: 4 bytes, the int 31 here, 0x 00 00 00 1f (offsets 0x00000084-0x00000087)
[*]id
[*]scheme: the string "world" here, 9 bytes: the first 4 give the length 5 and the last 5 are the ASCII codes of "world",
0x 00 00 00 05 77 6f 72 6c 64 (offsets 0x00000088-0x00000090)
[*]id: the string "anyone" here, 10 bytes: the first 4 give the length 6 and the last 6 are the ASCII codes of "anyone",
0x 00 00 00 06 61 6e 79 6f 6e 65 (offsets 0x00000091-0x0000009a)
[*]ephemeral: false here, 1 byte (1 is written for true, 0 for false), 0x 00 (offset 0x0000009b)
[*]parentCVersion: the int 1 here, 4 bytes, 0x 00 00 00 01 (offsets 0x0000009c-0x0000009f)
[*]EOR
[*]A fixed byte is written as the end marker: 0x 42 (offset 0x000000a0)
00000000  5a 4b 4c 47 00 00 00 02  00 00 00 00 00 00 00 00  |ZKLG............|
00000010  00 00 00 00 59 27 08 06  00 00 00 24 01 3a 69 4e  |....Y'.....$.:iN|
00000020  19 1a 00 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000030  00 00 01 3a 69 4e ab af  ff ff ff f6 00 06 1a 80  |...:iN..........|
00000040  42 00 00 00 00 ba 26 0c  5e 00 00 00 53 01 3a 69  |B.....&.^...S.:i|
00000050  4e 19 1a 00 00 00 00 00  02 00 00 00 00 00 00 00  |N...............|
00000060  02 00 00 01 3a 69 4f 3f  a5 00 00 00 01 00 00 00  |....:iO?........|
00000070  05 2f 72 6f 6f 74 00 00  00 06 6d 79 64 61 74 61  |./root....mydata|
00000080  00 00 00 01 00 00 00 1f  00 00 00 05 77 6f 72 6c  |............worl|
00000090  64 00 00 00 06 61 6e 79  6f 6e 65 00 00 00 00 01  |d....anyone.....|
000000a0  42 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |B...............|
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000010
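The txn body of the create entry (offsets 0x6d-0x9f) can be decoded field by field in the order given in scenario 2. This is a sketch with hypothetical names (CreateTxnSketch, describe, permNames), not ZooKeeper's own deserializer. The permission names follow the bit values of ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16), which is why perms 31 expands to all five.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for the body of the create entry shown above. */
final class CreateTxnSketch {
    /** Bytes 0x6d-0x9f of the dump: path, data, ACL list, ephemeral, parentCVersion. */
    static final String BODY =
          "00 00 00 05 2f 72 6f 6f 74 00 00 00 06 6d 79 64 61 74 61 "
        + "00 00 00 01 00 00 00 1f 00 00 00 05 77 6f 72 6c 64 "
        + "00 00 00 06 61 6e 79 6f 6e 65 00 00 00 00 01";

    static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Reads a 4-byte length followed by that many bytes of text. */
    static String readString(ByteBuffer buf) {
        byte[] raw = new byte[buf.getInt()];
        buf.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Expands the perms bit mask into names, lowest bit first. */
    static List<String> permNames(int perms) {
        String[] names = {"READ", "WRITE", "CREATE", "DELETE", "ADMIN"};
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < names.length; bit++) {
            if ((perms & (1 << bit)) != 0) {
                out.add(names[bit]);
            }
        }
        return out;
    }

    static String describe(byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        String path = readString(buf);
        String data = readString(buf);
        StringBuilder acls = new StringBuilder();
        int aclCount = buf.getInt();
        for (int i = 0; i < aclCount; i++) {
            int perms = buf.getInt();
            String scheme = readString(buf);
            String id = readString(buf);
            acls.append(scheme).append(':').append(id).append(permNames(perms));
        }
        boolean ephemeral = buf.get() != 0;
        int parentCVersion = buf.getInt();
        return path + " " + data + " " + acls
            + " ephemeral=" + ephemeral + " parentCVersion=" + parentCVersion;
    }

    public static void main(String[] args) {
        // Prints: /root mydata world:anyone[READ, WRITE, CREATE, DELETE, ADMIN] ephemeral=false parentCVersion=1
        System.out.println(describe(hex(BODY)));
    }
}
```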
Scenario 3) Add another node
String realPath = zk.create("/root/childone",
"childone".getBytes(), Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);
00000000  5a 4b 4c 47 00 00 00 02  00 00 00 00 00 00 00 00  |ZKLG............|
00000010  00 00 00 00 59 27 08 06  00 00 00 24 01 3a 69 4e  |....Y'.....$.:iN|
00000020  19 1a 00 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000030  00 00 01 3a 69 4e ab af  ff ff ff f6 00 06 1a 80  |...:iN..........|
00000040  42 00 00 00 00 ba 26 0c  5e 00 00 00 53 01 3a 69  |B.....&.^...S.:i|
00000050  4e 19 1a 00 00 00 00 00  02 00 00 00 00 00 00 00  |N...............|
00000060  02 00 00 01 3a 69 4f 3f  a5 00 00 00 01 00 00 00  |....:iO?........|
00000070  05 2f 72 6f 6f 74 00 00  00 06 6d 79 64 61 74 61  |./root....mydata|
00000080  00 00 00 01 00 00 00 1f  00 00 00 05 77 6f 72 6c  |............worl|
00000090  64 00 00 00 06 61 6e 79  6f 6e 65 00 00 00 00 01  |d....anyone.....|
000000a0  42 00 00 00 00 bc 21 10  aa 00 00 00 5e 01 3a 69  |B.....!.....^.:i|
000000b0  4e 19 1a 00 00 00 00 00  04 00 00 00 00 00 00 00  |N...............|
000000c0  03 00 00 01 3a 69 6a 30  9c 00 00 00 01 00 00 00  |....:ij0........|
000000d0  0e 2f 72 6f 6f 74 2f 63  68 69 6c 64 6f 6e 65 00  |./root/childone.|
000000e0  00 00 08 63 68 69 6c 64  6f 6e 65 00 00 00 01 00  |...childone.....|
000000f0  00 00 1f 00 00 00 05 77  6f 72 6c 64 00 00 00 06  |.......world....|
00000100  61 6e 79 6f 6e 65 00 00  00 00 01 42 00 00 00 00  |anyone.....B....|
00000110  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000010
This entry also creates a node. Its layout is the same as in scenario 2, so the field-by-field breakdown is not repeated; it can be worked out in the same way.
The entry occupies bytes 0x000000a1-0x0000010b.
This time cxid is 4, zxid is 3, and type is still 1.
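The content lengths 0x53 and 0x5e, and the byte range quoted for scenario 3, follow from simple arithmetic over the field sizes listed in scenario 2. The sketch below spells that out. CreateEntrySizeSketch and its methods are hypothetical names, and the formula assumes a create entry with exactly one ACL, as in both scenarios.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical size calculator for a create entry with a single ACL. */
final class CreateEntrySizeSketch {
    /** hdr: clientId (8) + cxid (4) + zxid (8) + time (8) + type (4). */
    static final int HEADER_BYTES = 8 + 4 + 8 + 8 + 4;

    /** A length-prefixed string or byte array: 4-byte length plus the payload. */
    static int prefixed(String s) {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Content length as stored in the 4-byte length field of the entry. */
    static int createContentLength(String path, String data, String scheme, String id) {
        return HEADER_BYTES
            + prefixed(path) + prefixed(data)
            + 4                                  // ACL count
            + 4                                  // perms
            + prefixed(scheme) + prefixed(id)
            + 1                                  // ephemeral flag
            + 4;                                 // parentCVersion
    }

    /** Offset of the first byte after an entry: CRC (8) + length (4) + content + EOR (1). */
    static int nextOffset(int start, int contentLength) {
        return start + 8 + 4 + contentLength + 1;
    }

    public static void main(String[] args) {
        int first = createContentLength("/root", "mydata", "world", "anyone");
        int second = createContentLength("/root/childone", "childone", "world", "anyone");
        // Prints: 0x53 0x5e next=0x10c
        System.out.printf("0x%x 0x%x next=0x%x%n", first, second, nextOffset(0xa1, second));
    }
}
```

The next offset, 0x10c, is one past the last byte of the range 0x000000a1-0x0000010b given above.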
Scenario 4) Modify node data
zk.setData("/root/childone", "childonemodify".getBytes(), -1);
00000000  5a 4b 4c 47 00 00 00 02  00 00 00 00 00 00 00 00  |ZKLG............|
00000010  00 00 00 00 59 27 08 06  00 00 00 24 01 3a 69 4e  |....Y'.....$.:iN|
00000020  19 1a 00 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000030  00 00 01 3a 69 4e ab af  ff ff ff f6 00 06 1a 80  |...:iN..........|
00000040  42 00 00 00 00 ba 26 0c  5e 00 00 00 53 01 3a 69  |B.....&.^...S.:i|
00000050  4e 19 1a 00 00 00 00 00  02 00 00 00 00 00 00 00  |N...............|
00000060  02 00 00 01 3a 69 4f 3f  a5 00 00 00 01 00 00 00  |....:iO?........|
00000070  05 2f 72 6f 6f 74 00 00  00 06 6d 79 64 61 74 61  |./root....mydata|
00000080  00 00 00 01 00 00 00 1f  00 00 00 05 77 6f 72 6c  |............worl|
00000090  64 00 00 00 06 61 6e 79  6f 6e 65 00 00 00 00 01  |d....anyone.....|
000000a0  42 00 00 00 00 bc 21 10  aa 00 00 00 5e 01 3a 69  |B.....!.....^.:i|
000000b0  4e 19 1a 00 00 00 00 00  04 00 00 00 00 00 00 00  |N...............|
000000c0  03 00 00 01 3a 69 6a 30  9c 00 00 00 01 00 00 00  |....:ij0........|
000000d0  0e 2f 72 6f 6f 74 2f 63  68 69 6c 64 6f 6e 65 00  |./root/childone.|
000000e0  00 00 08 63 68 69 6c 64  6f 6e 65 00 00 00 01 00  |...childone.....|
000000f0  00 00 1f 00 00 00 05 77  6f 72 6c 64 00 00 00 06  |.......world....|
00000100  61 6e 79 6f 6e 65 00 00  00 00 01 42 00 00 00 00  |anyone.....B....|
00000110  af 4a 0f 23 00 00 00 48  01 3a 69 4e 19 1a 00 00  |.J.#...H.:iN....|
00000120  00 00 00 07 00 00 00 00  00 00 00 04 00 00 01 3a  |...............:|
00000130  69 74 8f f3 00 00 00 05  00 00 00 0e 2f 72 6f 6f  |it........../roo|
00000140  74 2f 63 68 69 6c 64 6f  6e 65 00 00 00 0e 63 68  |t/childone....ch|
00000150  69 6c 64 6f 6e 65 6d 6f  64 69 66 79 00 00 00 01  |ildonemodify....|
00000160  42 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |B...............|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
04000010
This entry modifies a node's data. It can be analyzed in the same way as the entries above, so the field-by-field breakdown is not repeated.
The entry occupies bytes 0x0000010c-0x00000160.
This time cxid is 7, zxid is 4, and type is 5 (org.apache.zookeeper.ZooDefs.OpCode shows that 5 is setData).
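Since the setData entry is not broken down field by field above, the following sketch decodes its txn body (offsets 0x138-0x15f in the dump). It assumes the body is a length-prefixed path, length-prefixed data, and a 4-byte version, which is what the bytes in the dump suggest. SetDataTxnSketch, opName and describe are hypothetical names, and opName covers only the three opcodes that occur in this article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical decoder for the body of the setData entry shown above. */
final class SetDataTxnSketch {
    /** Bytes 0x138-0x15f of the dump: path, data, version. */
    static final String BODY =
          "00 00 00 0e 2f 72 6f 6f 74 2f 63 68 69 6c 64 6f 6e 65 "
        + "00 00 00 0e 63 68 69 6c 64 6f 6e 65 6d 6f 64 69 66 79 "
        + "00 00 00 01";

    static byte[] hex(String s) {
        String[] parts = s.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Reads a 4-byte length followed by that many bytes of text. */
    static String readString(ByteBuffer buf) {
        byte[] raw = new byte[buf.getInt()];
        buf.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Names for the opcodes seen in this article; other values are reported numerically. */
    static String opName(int type) {
        switch (type) {
            case -10: return "createSession";
            case 1:   return "create";
            case 5:   return "setData";
            default:  return "opcode " + type;
        }
    }

    static String describe(int type, byte[] body) {
        ByteBuffer buf = ByteBuffer.wrap(body);
        String path = readString(buf);
        String data = readString(buf);
        int version = buf.getInt();
        return opName(type) + " path=" + path + " data=" + data + " version=" + version;
    }

    public static void main(String[] args) {
        // Prints: setData path=/root/childone data=childonemodify version=1
        System.out.println(describe(5, hex(BODY)));
    }
}
```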