|
2015-06-30 13:11:00左右,memcache由于硬件故障,莫名重启,导致数据库访问量突然增大(按照之前的测试结果,远未到达系统瓶颈),且当时syscpu 远高于正常值。
监控信息如下:
当时的mysqld堆栈信息如下:
- Thread 50 (Thread 0x2afb6591a700 (LWP 7020)):
- #0 0x0000003c34ef808f in __lll_unlock_wake_private () from /lib64/libc.so.6
- #1 0x0000003c34e9dccc in _L_unlock_2226 () from /lib64/libc.so.6
- #2 0x0000003c34e9db01 in __tz_convert () from /lib64/libc.so.6
- #3 0x0000000000783012 in Time_zone_system::gmt_sec_to_TIME(st_mysql_time*, long) const ()
- #4 0x00000000007c177b in Field_timestampf::get_date_internal(st_mysql_time*) ()
- #5 0x00000000007c23dd in Field_temporal_with_date::val_str(String*, String*) ()
- #6 0x0000000000672fa8 in Protocol_text::store(Field*) ()
- #7 0x00000000006722f5 in Protocol::send_result_set_row(List<Item>*) ()
- #8 0x00000000006c77af in select_send::send_data(List<Item>&) ()
- #9 0x00000000006db6b0 in end_send(JOIN*, st_join_table*, bool) ()
- #10 0x00000000006d6c4f in evaluate_join_record(JOIN*, st_join_table*) ()
- #11 0x00000000006d6fbb in sub_select(JOIN*, st_join_table*, bool) ()
- #12 0x00000000006d61e8 in JOIN::exec() ()
- #13 0x000000000071f0d5 in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, SQL_I_List<st_order>*, SQL_I_List<st_order>*, Item*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) ()
- #14 0x000000000071f935 in handle_select(THD*, select_result*, unsigned long) ()
- #15 0x000000000057fe02 in execute_sqlcom_select(THD*, TABLE_LIST*) ()
- #16 0x00000000006f935c in mysql_execute_command(THD*) ()
- #17 0x00000000006fe5e8 in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
- #18 0x00000000006ffdc9 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
- #19 0x0000000000789a08 in threadpool_process_request(THD*) ()
- #20 0x000000000078a9dd in worker_main(void*) ()
- #21 0x0000000000b4a8c3 in pfs_spawn_thread ()
- #22 0x0000003c352079d1 in start_thread () from /lib64/libpthread.so.0
- #23 0x0000003c34ee88fd in clone () from /lib64/libc.so.6
从堆栈可以看到大部分请求都在调用glibc的时区转换函数,但该函数由于全局锁的原因,在并发环境下产生了热点竞争。
深入分析mysql时区相关的函数:
- void
- Time_zone_system::gmt_sec_to_TIME(MYSQL_TIME *tmp, my_time_t t) const
- {
- struct tm tmp_tm;
- time_t tmp_t= (time_t)t;
- localtime_r(&tmp_t, &tmp_tm);
- localtime_to_TIME(tmp, &tmp_tm);
- tmp->time_type= MYSQL_TIMESTAMP_DATETIME;
- adjust_leap_second(tmp);
- }
继续进入localtime_r函数
- struct tm *
- __tz_convert (const time_t *timer, int use_localtime, struct tm *tp)
- {
- long int leap_correction;
- int leap_extra_secs;
- if (timer == NULL)
- {
- __set_errno (EINVAL);
- return NULL;
- }
- __libc_lock_lock (tzset_lock);
- /* Update internal database according to current TZ setting.
- POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
- This is a good idea since this allows at least a bit more parallelism. */
- tzset_internal (tp == &_tmbuf && use_localtime, 1);
- if (__use_tzfile)
- __tzfile_compute (*timer, use_localtime, &leap_correction,
- &leap_extra_secs, tp);
- else
- {
- if (! __offtime (timer, 0, tp))
- tp = NULL;
- else
- __tz_compute (*timer, tp, use_localtime);
- leap_correction = 0L;
- leap_extra_secs = 0;
- }
- if (tp)
- {
- if (! use_localtime)
- {
- tp->tm_isdst = 0;
- tp->tm_zone = "GMT";
- tp->tm_gmtoff = 0L;
- }
- if (__offtime (timer, tp->tm_gmtoff - leap_correction, tp))
- tp->tm_sec += leap_extra_secs;
- else
- tp = NULL;
- }
- __libc_lock_unlock (tzset_lock);
- return tp;
- }
很清楚的看到,在调用系统时区转换时,有全局锁__libc_lock_lock的保护,导致线程并发环境下,系统性能受限。
而在time_zone=system时,会调用该函数从而获取系统的时区。如果将time_zone='+8:00'则不会调用系统时区.
对此,真实的模拟了线上的case(分别对time_zone=system和time_zone='+8:00'做了压测):
- mysqlslap --no-defaults -u$user -p$password --create-schema=DianPingTS -h$host -P$port --number-of-queries=1000000000 --concurrency=30 --query="select AddTime,UpdateTime from $table where addtime>='2015-07-01 15:21:35' and addtime<='2015-07-01 17:33:35';"
以下是测试结果:
由此可见,在time_zone='+8:00'在timestamp类型的大量转换情况下,性能能得到质的提升..
小伙伴们,如果也遇到该问题,只要将time_zone='+8:00' (注意地区在东8区哦)就可以了..^_^ |
|
|