【备份】redis源码分析-如何rehash

chenqb 发表于 2016-12-20 11:20:51

　　
原文地址：redis源码分析-如何rehash
dict实现中主要用到如下结构体,其实就是个典型的链式hash。
一个dict会有2个hash table，由dictht结构管理，编号为0和1.
使用是优先使用0号hash table，当空间不足时会调用dictExpand来扩展hash table，此时准备1号hash table用于增量的rehash使用。rehash完成后把0号释放，1号保存到0号。
rehashidx是下一个需要rehash的项在ht中的索引，不需要rehash时置为-1。也就是说-1时，表示不进行rehash。
iterators记录当前dict中的迭代器数，主要是为了避免在有迭代器时rehash，在有迭代器时rehash可能会造成值的丢失或重复，
dictht中的table是一个数组+指针形式的hash表，size表hash数组(桶)的大小，used表示hash表的元素个数，这两个值与rehash、resize过程密切相关。sizemask等于size-1，这是为了方便将hash值映射到数组中。
typedef struct dictEntry {
void *key;
void *val;
struct dictEntry *next;
} dictEntry;
typedef struct dictht {
dictEntry **table;
unsigned long size;//hash桶的个数
unsigned long sizemask;//hash取模的用到
unsigned long used;//元素个数
} dictht;
typedef struct dict {
dictType *type;
void *privdata;
dictht ht;
int rehashidx; /* rehashing not in progress if rehashidx == -1 */
int iterators; /* number of iterators currently running */
} dict;
typedef struct dictIterator {
dict *d;
int table;
int index;
dictEntry *entry, *nextEntry;
} dictIterator;
rehash有2种工作模式
lazy rehashing：在每次对dict进行操作的时候执行一个slot的rehash
active rehashing：每100ms里面使用1ms时间进行rehash。
什么时候dict做扩容
在数据插入的时候会调用dictKeyIndex,该方法里会调用_dictExpandIfNeeded，判断dict是否需要rehash，当dict中桶的个数大于元素的个数时，调用dictExpand扩展hash
/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
/* If the hash table is empty expand it to the intial size,
* if the table is “full” dobule its size. */
if (dictIsRehashing(d)) return DICT_OK;
if (d->ht.size == 0)
return dictExpand(d, DICT_HT_INITIAL_SIZE);
if (d->ht.used >= d->ht.size && dict_can_resize)
return dictExpand(d, ((d->ht.size > d->ht.used) ?
d->ht.size : d->ht.used)*2);
return DICT_OK;
}
dictExpand的工作主要是初始化hash表，默认是扩大两倍(并不单纯是桶的两倍)，然后赋值给ht，然后状态改为rehashing,此时该dict开始rehashing
扩容过程如何进行
rehash主要在dictRehash中完成。先看下什么时候进行rehash。
active rehashing ：serverCron中，当没有后台子线程时，会调用incrementallyRehash，最终调用dictRehashMilliseconds。incrementallyRehash的时间较长，rehash的个数也比较多。这里每次执行 1 millisecond rehash 操作；如果未完成 rehash，会在下一个 loop 里面继续执行。
/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms) {
long long start = timeInMilliseconds();
int rehashes = 0;
while(dictRehash(d,100)) {
rehashes += 100;
if (timeInMilliseconds()-start > ms) break;
}
return rehashes;
}
lazy rehashing：_dictRehashStep中，也会调用dictRehash，而_dictRehashStep每次仅会rehash一个值从ht到 ht，但由于_dictRehashStep是被dictGetRandomKey、dictFind、 dictGenericDelete、dictAdd调用的，因此在每次dict增删查改时都会被调用，这无疑就加快rehash了过程。
我们再来看看做rehash的方法。dictRehash每次增量rehash n个元素，由于在自动调整大小时已设置好了ht的大小，因此rehash的主要过程就是遍历ht，取得key，然后将该key按ht的桶的大小重新rehash，并在rehash完后将ht指向ht,然后将ht清空。在这个过程中rehashidx非常重要，它表示上次rehash时在ht的下标位置。
int dictRehash(dict *d, int n) {
if (!dictIsRehashing(d)) return 0;
while(n–) {
dictEntry *de, *nextde;
/* Check if we already rehashed the whole table… */
if (d->ht.used == 0) {
_dictFree(d->ht.table);
d->ht = d->ht;//1幅值给0
_dictReset(&d->ht);//rehash完成后把0号释放，1号保存到0号
d->rehashidx = -1;
return 0;
}
/* Note that rehashidx can’t overflow as we are sure there are more
* elements because ht.used != 0 */
while(d->ht.table == NULL) d->rehashidx++;//在当前桶为null，找下一个
de = d->ht.table;
/* Move all the keys in this bucket from the old to the new hash HT */
while(de) {
unsigned int h;
nextde = de->next;
/* Get the index in the new hash table */
h = dictHashKey(d, de->key) & d->ht.sizemask;
de->next = d->ht.table;
d->ht.table = de;
d->ht.used–;
d->ht.used++;
de = nextde;
}
d->ht.table = NULL;//rehash过的ht桶置为null
d->rehashidx++;//偏移+1
}
return 1;
}
可以看到，redis对dict的rehash是分批进行的，这样不会阻塞请求，设计的比较优雅。
但是在调用dictFind的时候，可能需要对两张dict表做查询。唯一的优化判断是，当key在ht不存在且不在rehashing状态时，可以速度返回空。如果在rehashing状态，当在ht没值的时候，还需要在ht里查找。
dictAdd的时候，如果状态是rehashing，则把值插入到ht，否则ht

页: [1]

运维网's Archiver

【备份】redis源码分析-如何rehash