glusterfs 中的字典查询

hongleimi 发表于 2015-9-10 00:40:47

　　glusterfs文件系统是一个分布式的文件系统，但是与很多分布式文件系统不一样，它没有元数服务器，听说swift上也是应用了这个技术的。glusterfs中每个xlator的配置信息都是用dict进行管理的。dict这玩意儿，说白了就是一个hash表，是一个key/value的内存数据库。今天花了点时间慢慢研究了glusterfs中的设计，觉得还是挺有意思的。
　　上篇博客介绍了glusterfs文件系统的内存池的设计，而glusterfs的内存池正应用在这项技术上。首先，glusterfsd在程序初始化时，就建立了三个池dict_pool、dict_pair_pool、dict_data_pool。接下来看看它是怎么玩这三个内存池的呢！
　　1、在使用dict之前，首先是建立dict对象，这点是面向对象的思想吧。

1 dict_t *
2 get_new_dict (void)
3 {
4       return get_new_dict_full (1);
5 }

　　glusterfs调用get_new_dict来建立一个dict对象，接下来看看get_new_dict又做了什么呢？

1 dict_t *
2 get_new_dict_full (int size_hint)
3 {
4       dict_t *dict = mem_get0 (THIS->ctx->dict_pool);
5
6       if (!dict) {
7             return NULL;
8       }
9
10       dict->hash_size = size_hint;
11       if (size_hint == 1) {
12             /*
13                * This is the only case we ever see currently.If we ever
14                * need to support resizing the hash table, the resize function
15                * will have to take into account the possibility that
16                * "members" is not separately allocated (i.e. don't just call
17                * realloc() blindly.
18                */
19             dict->members = &dict->members_internal;
20       }
21       else {
22             /*
23                * We actually need to allocate space for size_hint *pointers*
24                * but we actually allocate space for one *structure*.Since
25                * a data_pair_t consists of five pointers, we're wasting four
26                * pointers' worth for N=1, and will overrun what we allocated
27                * for N>5.If anybody ever starts using size_hint, we'll need
28                * to fix this.
29                */
30             GF_ASSERT (size_hint <=
31                         (sizeof(data_pair_t) / sizeof(data_pair_t *)));
32             dict->members = mem_get0 (THIS->ctx->dict_pair_pool);
33             if (!dict->members) {
34                      mem_put (dict);
35                      return NULL;
36             }
37       }
38
39       LOCK_INIT (&dict->lock);
40
41       return dict;
42 }
　　size_hint是要分配的字典的大小。当 size_hint为1时，字典内的数据将是一个链表（用链表解决HASH冲突问题）。

　　　　接下来看看程序又将是如何向字典中添加一项数据的呢？首先还是来看看dict_t 的数据结构吧：

　　

1 struct _dict {
2       unsigned char is_static:1;
3       int32_t       hash_size;
4       int32_t       count;
5       int32_t       refcount;
6       data_pair_t **members;
7       data_pair_t *members_list;
8       char       *extra_free;
9       char       *extra_stdfree;
10       gf_lock_t    lock;
11       data_pair_t *members_internal;
12       data_pair_t free_pair;
13       gf_boolean_t free_pair_in_use;
14 };
　　
　　在dict_t中有一个lock子成员，每次操作dict_t对象时，首先要对它进行加锁：
　　
　　

int32_t
dict_add (dict_t *this, char *key, data_t *value)
{
int32_t ret;
if (!this || !value) {
gf_log_callingfn ("dict", GF_LOG_WARNING,
"!this || !value for key=%s", key);
return -1;
}
LOCK (&this->lock);
ret = _dict_set (this, key, value, 0);
UNLOCK (&this->lock);
return ret;
}
　　
　　不得不说glusterfs的编码风格还是挺漂亮的，它把一些细节与核心点分的很清楚，代码看上去那个爽啊！！看上面的代码：打日志与加锁放一在一块，核心的处理将在_dict_set中处理。

1 static int32_t
2 _dict_set (dict_t *this, char *key, data_t *value, gf_boolean_t replace)
3 {
4       int hashval;
5       data_pair_t *pair;
6       char key_free = 0;
7       int tmp = 0;
8       int ret = 0;
9
10       if (!key) {
11             ret = gf_asprintf (&key, "ref:%p", value);
12             if (-1 == ret) {
13                      gf_log ("dict", GF_LOG_WARNING, "asprintf failed %s", key);
14                      return -1;
15             }
16             key_free = 1;
17       }
18
19       tmp = SuperFastHash (key, strlen (key));
20       hashval = (tmp % this->hash_size);
21
22       /* Search for a existing key if 'replace' is asked for */
23       if (replace) {
24             pair = _dict_lookup (this, key);
25
26             if (pair) {
27                      data_t *unref_data = pair->value;
28                      pair->value = data_ref (value);
29                      data_unref (unref_data);
30                      if (key_free)
31                               GF_FREE (key);
32                      /* Indicates duplicate key */
33                      return 0;
34             }
35       }
36
37       if (this->free_pair_in_use) {
38             pair = mem_get0 (THIS->ctx->dict_pair_pool);
39             if (!pair) {
40                      if (key_free)
41                               GF_FREE (key);
42                      return -1;
43             }
44       }
45       else {
46             pair = &this->free_pair;
47             this->free_pair_in_use = _gf_true;
48       }
49
50       if (key_free) {
51             /* It's ours.Use it. */
52             pair->key = key;
53             key_free = 0;
54       }
55       else {
56             pair->key = (char *) GF_CALLOC (1, strlen (key) + 1,
57                                              gf_common_mt_char);
58             if (!pair->key) {
59                      if (pair == &this->free_pair) {
60                               this->free_pair_in_use = _gf_false;
61                      }
62                      else {
63                               mem_put (pair);
64                      }
65                      return -1;
66             }
67             strcpy (pair->key, key);
68       }
69       pair->value = data_ref (value);
70
71       pair->hash_next = this->members;
72       this->members = pair;
73
74       pair->next = this->members_list;
75       pair->prev = NULL;
76       if (this->members_list)
77             this->members_list->prev = pair;
78       this->members_list = pair;
79       this->count++;
80
81       if (key_free)
82             GF_FREE (key);
83       return 0;
84 }
　　19行利用HASH算法计算HASH值，20行缩小HASH值的范围，23行到了35行为替换处理。37-48行是让我最难受的代码，这个地方不知道是不是设计的问题。55行之后是插入新的HASH键值的操作。
　　再看看查询的操作吧。

1 data_t *
2 dict_get (dict_t *this, char *key)
3 {
4       data_pair_t *pair;
5
6       if (!this || !key) {
7             gf_log_callingfn ("dict", GF_LOG_INFO,
8                               "!this || key=%s", (key) ? key : "()");
9             return NULL;
10       }
11
12       LOCK (&this->lock);
13
14       pair = _dict_lookup (this, key);
15
16       UNLOCK (&this->lock);
17
18       if (pair)
19             return pair->value;
20
21       return NULL;
22 }
　　同样是先处理锁之类的杂项操作，_dict_lookup才是真正的始作俑者。

1 static data_pair_t *
2 _dict_lookup (dict_t *this, char *key)
3 {
4       if (!this || !key) {
5             gf_log_callingfn ("dict", GF_LOG_WARNING,
6                               "!this || !key (%s)", key);
7             return NULL;
8       }
9
10       int hashval = SuperFastHash (key, strlen (key)) % this->hash_size;
11       data_pair_t *pair;
12
13       for (pair = this->members; pair != NULL; pair = pair->hash_next) {
14             if (pair->key && !strcmp (pair->key, key))
15                      return pair;
16       }
17
18       return NULL;
19 }
　　查询的代码相当的简单吧，计算一个哈希值，再查询一个链表就OK了。
　　查看了glusterfs中的所有代码，glusterfs_new_dict_full调用时几乎都是传入参数1，只有dict_copy接口比较特别：

1 dict_t *
2 dict_copy (dict_t *dict,
3          dict_t *new)
4 {
5       if (!dict) {
6             gf_log_callingfn ("dict", GF_LOG_WARNING, "dict is NULL");
7             return NULL;
8       }
9
10       if (!new)
11             new = get_new_dict_full (dict->hash_size);
12
13       dict_foreach (dict, _copy, new);
14
15       return new;
16 }
　　从代码上看，只有此处才发挥了HASH表的作用，其它的都只是把dict_t当成链表来使用。而且这个地方也并不是用HASH表的思想，只是把一个链表转换成了HASH表。这个是我在glusterfs中见到的一处最不明智的地方。
　　

页: [1]

运维网's Archiver

glusterfs 中的字典查询