由mysql timestamp字段引发的一个系统bug

shangban · 发表于 2015-12-21 09:35:27

2015-06-30 13:11:00左右，memcache由于硬件故障，莫名重启，导致数据库访问量突然增大(按照之前的测试结果，远未到达系统瓶颈)，且当时syscpu 远高于正常值。
监控信息如下:

当时的mysqld堆栈信息如下:

Thread 50 (Thread 0x2afb6591a700 (LWP 7020)):
#0 0x0000003c34ef808f in __lll_unlock_wake_private () from /lib64/libc.so.6
#1 0x0000003c34e9dccc in _L_unlock_2226 () from /lib64/libc.so.6
#2 0x0000003c34e9db01 in __tz_convert () from /lib64/libc.so.6
#3 0x0000000000783012 in Time_zone_system::gmt_sec_to_TIME(st_mysql_time*, long) const ()
#4 0x00000000007c177b in Field_timestampf::get_date_internal(st_mysql_time*) ()
#5 0x00000000007c23dd in Field_temporal_with_date::val_str(String*, String*) ()
#6 0x0000000000672fa8 in Protocol_text::store(Field*) ()
#7 0x00000000006722f5 in Protocol::send_result_set_row(List<Item>*) ()
#8 0x00000000006c77af in select_send::send_data(List<Item>&) ()
#9 0x00000000006db6b0 in end_send(JOIN*, st_join_table*, bool) ()
#10 0x00000000006d6c4f in evaluate_join_record(JOIN*, st_join_table*) ()
#11 0x00000000006d6fbb in sub_select(JOIN*, st_join_table*, bool) ()
#12 0x00000000006d61e8 in JOIN::exec() ()
#13 0x000000000071f0d5 in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, SQL_I_List<st_order>*, SQL_I_List<st_order>*, Item*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
#14 0x000000000071f935 in handle_select(THD*, select_result*, unsigned long) ()
#15 0x000000000057fe02 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
#16 0x00000000006f935c in mysql_execute_command(THD*) ()
#17 0x00000000006fe5e8 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
#18 0x00000000006ffdc9 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
#19 0x0000000000789a08 in threadpool_process_request(THD*) ()
#20 0x000000000078a9dd in worker_main(void*) ()
#21 0x0000000000b4a8c3 in pfs_spawn_thread ()
#22 0x0000003c352079d1 in start_thread () from /lib64/libpthread.so.0
#23 0x0000003c34ee88fd in clone () from /lib64/libc.so.6

从堆栈可以看到大部分请求都在调用glibc的时区转换函数，但该函数由于全局锁的原因，在并发环境下产生了热点竞争。

深入分析mysql时区相关的函数:

void
Time_zone_system::gmt_sec_to_TIME(MYSQL_TIME *tmp, my_time_t t) const
{
struct tm tmp_tm;
time_t tmp_t= (time_t)t;
localtime_r(&tmp_t, &tmp_tm);
localtime_to_TIME(tmp, &tmp_tm);
tmp->time_type= MYSQL_TIMESTAMP_DATETIME;
adjust_leap_second(tmp);
}

继续进入localtime_r函数

struct tm *
__tz_convert (const time_t *timer, int use_localtime, struct tm *tp)
{
long int leap_correction;
int leap_extra_secs;
if (timer == NULL)
{
__set_errno (EINVAL);
return NULL;
}
__libc_lock_lock (tzset_lock);
/* Update internal database according to current TZ setting.
POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
This is a good idea since this allows at least a bit more parallelism. */
tzset_internal (tp == &_tmbuf && use_localtime, 1);
if (__use_tzfile)
__tzfile_compute (*timer, use_localtime, &leap_correction,
&leap_extra_secs, tp);
else
{
if (! __offtime (timer, 0, tp))
tp = NULL;
else
__tz_compute (*timer, tp, use_localtime);
leap_correction = 0L;
leap_extra_secs = 0;
}
if (tp)
{
if (! use_localtime)
{
tp->tm_isdst = 0;
tp->tm_zone = "GMT";
tp->tm_gmtoff = 0L;
}
if (__offtime (timer, tp->tm_gmtoff - leap_correction, tp))
tp->tm_sec += leap_extra_secs;
else
tp = NULL;
}
__libc_lock_unlock (tzset_lock);
return tp;
}

很清楚的看到，在调用系统时区转换时，有全局锁__libc_lock_lock的保护，导致线程并发环境下，系统性能受限。

而在time_zone=system时，会调用该函数从而获取系统的时区。如果将time_zone='+8:00'则不会调用系统时区.
对此，真实的模拟了线上的case(分别对time_zone=system和time_zone='+8:00'做了压测):

mysqlslap --no-defaults -u$user -p$password --create-schema=DianPingTS -h$host -P$port --number-of-queries=1000000000 --concurrency=30 --query="select AddTime,UpdateTime from $table where addtime>='2015-07-01 15:21:35' and addtime<='2015-07-01 17:33:35';"

以下是测试结果:

由此可见，在time_zone='+8:00'在timestamp类型的大量转换情况下，性能能得到质的提升..

小伙伴们，如果也遇到该问题，只要将time_zone='+8:00' (注意地区在东8区哦)就可以了..^_^

账号		自动登录	找回密码
密码			立即注册

Centos6.5×64安装配置openmeetings3.0.3详

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

[经验分享] 由mysql timestamp字段引发的一个系统bug

浏览过的版块

扫码加入运维网微信交流群