Source-Level Analysis of Oracle's Latch Mechanism: Using PostgreSQL to Guess at Oracle's Latch

万子 · posted 2016-7-30 01:00:26

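I don't have Oracle's source code (and my friends who do are unlikely to risk jail by showing it to me), so I can only use PostgreSQL to analyze Oracle's latch mechanism (why a latch rather than a mutex?).
The PostgreSQL 7.4.30 source code can be downloaded here: http://www.postgresql.org/ftp/source/v7.4.30/
Why use PostgreSQL to analyze Oracle's latch mechanism?
Oracle and PostgreSQL both use shared memory and a multi-process model, so they face nearly identical problems when synchronizing access to data structures in shared memory and enforcing mutual exclusion on them.
PostgreSQL's spinlock resembles the latch mechanism and solves a similar problem; only the name and the implementation differ.
PostgreSQL's spinlock is implemented in postgresql-7.4.30\src\backend\storage\lmgr\s_lock.c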

/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
void
s_lock(volatile slock_t *lock, const char *file, int line)
{
    /*
     * We loop tightly for awhile, then delay using select() and try
     * again. Preferably, "awhile" should be a small multiple of the
     * maximum time we expect a spinlock to be held.  100 iterations seems
     * about right.  In most multi-CPU scenarios, the spinlock is probably
     * held by a process on another CPU and will be released before we
     * finish 100 iterations.  However, on a uniprocessor, the tight loop
     * is just a waste of cycles, so don't iterate thousands of times.
     *
     * Once we do decide to block, we use randomly increasing select()
     * delays. The first delay is 10 msec, then the delay randomly
     * increases to about one second, after which we reset to 10 msec and
     * start again.  The idea here is that in the presence of heavy
     * contention we need to increase the delay, else the spinlock holder
     * may never get to run and release the lock.  (Consider situation
     * where spinlock holder has been nice'd down in priority by the
     * scheduler --- it will not get scheduled until all would-be
     * acquirers are sleeping, so if we always use a 10-msec sleep, there
     * is a real possibility of starvation.)  But we can't just clamp the
     * delay to an upper bound, else it would take a long time to make a
     * reasonable number of tries.
     *
     * We time out and declare error after NUM_DELAYS delays (thus, exactly
     * that many tries).  With the given settings, this will usually take
     * 3 or so minutes.  It seems better to fix the total number of tries
     * (and thus the probability of unintended failure) than to fix the
     * total time spent.
     *
     * The select() delays are measured in centiseconds (0.01 sec) because 10
     * msec is a common resolution limit at the OS level.
     */
#define SPINS_PER_DELAY     100
#define NUM_DELAYS          1000
#define MIN_DELAY_CSEC      1
#define MAX_DELAY_CSEC      100

    int         spins = 0;
    int         delays = 0;
    int         cur_delay = MIN_DELAY_CSEC;
    struct timeval delay;

    while (TAS(lock))
    {
        if (++spins > SPINS_PER_DELAY)
        {
            if (++delays > NUM_DELAYS)
                s_lock_stuck(lock, file, line);

            delay.tv_sec = cur_delay / 100;
            delay.tv_usec = (cur_delay % 100) * 10000;
            (void) select(0, NULL, NULL, NULL, &delay);

#if defined(S_LOCK_TEST)
            fprintf(stdout, "*");
            fflush(stdout);
#endif

            /* increase delay by a random fraction between 1X and 2X */
            cur_delay += (int) (cur_delay *
              (((double) random()) / ((double) MAX_RANDOM_VALUE)) + 0.5);
            /* wrap back to minimum delay when max is exceeded */
            if (cur_delay > MAX_DELAY_CSEC)
                cur_delay = MIN_DELAY_CSEC;

            spins = 0;
        }
    }
}
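The backoff step inside the loop above can be pulled out into a small pure function to make the arithmetic easier to follow. This is only a sketch: `next_delay_csec` and its `frac` parameter (standing in for `random() / MAX_RANDOM_VALUE`) are names invented here for illustration and do not exist in PostgreSQL.

```c
#include <assert.h>

#define MIN_DELAY_CSEC 1
#define MAX_DELAY_CSEC 100

/*
 * Hypothetical helper (not in PostgreSQL): given the current delay in
 * centiseconds and a fraction in [0, 1], compute the next delay the same
 * way the s_lock loop does.
 */
static int
next_delay_csec(int cur_delay, double frac)
{
    assert(cur_delay >= MIN_DELAY_CSEC && frac >= 0.0 && frac <= 1.0);

    /* grow by between 0 and cur_delay, rounded to the nearest integer */
    cur_delay += (int) (cur_delay * frac + 0.5);

    /* wrap back to the minimum once the maximum is exceeded */
    if (cur_delay > MAX_DELAY_CSEC)
        cur_delay = MIN_DELAY_CSEC;

    return cur_delay;
}
```

With `frac` fixed at 1.0 the delay doubles each time (1, 2, 4, ... 64 centiseconds) and then wraps back to 1, which matches the "10 msec up to about one second, then reset" behavior the source comment describes.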
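The select function in the code is the select from I/O multiplexing. You read that right: the very same select (recent versions have switched to pg_usleep()). Here select does nothing but sleep and has nothing to do with I/O multiplexing. All it does is put the current process on a wait queue in the OS kernel until the time described by the delay struct has elapsed, after which the process is taken off the wait queue and put back on the run queue.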
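As a minimal sketch of this sleep-only use of select(), the two helpers below convert a centisecond count to a struct timeval and then call select() with no file descriptors at all. The names `csec_to_timeval` and `sleep_csec` are made up for this illustration; only the select() call itself mirrors the source.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: convert centiseconds (0.01 s) to a struct timeval. */
static struct timeval
csec_to_timeval(int csec)
{
    struct timeval tv;

    assert(csec >= 0);
    tv.tv_sec = csec / 100;
    tv.tv_usec = (csec % 100) * 10000;
    return tv;
}

/*
 * Hypothetical helper: sleep for csec centiseconds by calling select()
 * with no descriptor sets. Returns select()'s result: 0 when the timeout
 * expires, -1 if the call fails (for example, interrupted by a signal).
 */
static int
sleep_csec(int csec)
{
    struct timeval delay = csec_to_timeval(csec);

    return select(0, NULL, NULL, NULL, &delay);
}
```

Calling `sleep_csec(1)` blocks the caller for roughly 10 ms; the exact wake-up time depends on the kernel's timer resolution and scheduler.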
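However, walking the two lists (time complexity O(N)) and then, once the queue pointers have been updated, performing the context switch costs a lot of CPU, and all of this happens on a millisecond timescale. So functions like select, which cause a context switch, should be called as rarely as possible. This is also the advantage a spinlock has over a traditional queue-based lock.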
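The test-and-set and spin operations of such a spinlock are easy in themselves: conceptually you just write a while loop that keeps executing a C test such as "==". (In real code the test and the set have to happen as one atomic operation, which is what the TAS macro in s_lock provides.) But a spinlock also includes a sleep step: when a process has spun for a long time and still cannot get the resource, it is put to sleep.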
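A rough sketch of the spin-then-sleep idea, written with C11 atomics instead of PostgreSQL's platform-specific TAS macro. Everything named `demo_*` is hypothetical and exists only for this illustration; the fixed 10 ms sleep is also a simplification of the randomized backoff in s_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define DEMO_SPINS_PER_DELAY 100

/* Hypothetical lock type for illustration; not PostgreSQL's slock_t. */
typedef struct
{
    atomic_flag held;
} demo_spinlock;

/* Atomic test-and-set: returns 1 if we took the lock, 0 if it was held. */
static int
demo_try_lock(demo_spinlock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

static void
demo_unlock(demo_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Spin a bounded number of times, then sleep 10 ms and start spinning again. */
static void
demo_lock(demo_spinlock *lock)
{
    int spins = 0;

    assert(lock != NULL);
    while (!demo_try_lock(lock))
    {
        if (++spins > DEMO_SPINS_PER_DELAY)
        {
            struct timeval delay = {0, 10000};  /* 10 ms */

            (void) select(0, NULL, NULL, NULL, &delay);
            spins = 0;
        }
    }
}
```

A lock is declared as `demo_spinlock lock = { ATOMIC_FLAG_INIT };`. The spinning part runs entirely in user space; only the select() call asks the kernel for anything.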
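And here is the key point. Oracle or PostgreSQL is just application software; it has no ability of its own to put a process to sleep. Application software (written in C or something higher-level than C) can at most manipulate memory and implement logic. It is either running or it has finished and exited; it has no other state. (In fact, the so-called process state is something the OS kernel invents: on Linux it is simply the value of the state field in the task_struct structure. The CPU does not care about it.) To get any other state, the program must call a system API, enter kernel code through a trap (int 0x80 on 32-bit x86 Linux), and have the kernel change the state value (0, 1, 2, 3, 4...) in the task_struct of that process (or of another process, given sufficient privileges).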
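To make the "sleeping requires a trip into the kernel" point concrete, here is a Linux-only sketch that skips the C library wrapper and requests the sleep through the generic syscall() entry point. It assumes Linux with glibc on a 64-bit platform; `raw_sleep_ms` is a made-up name. On x86-64 the trap is made with the `syscall` instruction rather than int 0x80.

```c
#define _GNU_SOURCE             /* for syscall() in <unistd.h> (glibc) */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, Linux-only: sleep by invoking the nanosleep system
 * call directly. Returns 0 once the full interval has elapsed, -1 on error
 * (for example, when interrupted by a signal).
 */
static long
raw_sleep_ms(long ms)
{
    struct timespec req;

    assert(ms >= 0);
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* the process is descheduled inside the kernel until the timer fires */
    return syscall(SYS_nanosleep, &req, NULL);
}
```

Nothing in the function itself stops the process from running; the kernel does that once the trap is taken.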

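So whether it is Oracle, PostgreSQL, some other database, or any other server software, if it wants a locking mechanism that can sleep, it has to borrow the operating system's API. In other words, such a lock is a derivative built on top of the OS API, and by that reasoning even Oracle is a derivative built on top of the operating system.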

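Of course, in theory you could write a database that does not sit on an operating system at all, one that manages tasks and file systems itself. It would be invincible. Yes, in theory. But anyone who actually did that would have to be out of their mind.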
