select from_unixtime(clock,'%Y%m%d %H:%i:%S'),value from history_uint where itemid in ('53855');
| 20140314 11:12:02 | 0 |
| 20140314 11:26:10 | 0 |
| 20140314 11:27:11 | 0 |
| 20140314 11:28:12 | 0 |
| 20140314 11:29:13 | 10 |
| 20140314 11:30:28 | 0 |
| 20140314 11:31:29 | 0 |
| 20140314 11:32:29 | 0 |
| 20140314 11:33:30 | 0 |
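| 20140314 11:34:46 | 0 |
5. Querying the corresponding proxy_history table on the proxy shows that the value column contains messages of the form "Received value xxx is not suitable for value type", so the problem appears to be related to the data type.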
select id,from_unixtime(clock,'%Y%m%d %H:%i:%S'),value from proxy_history where itemid in ('53855');
+------------+----------------------------------------+--------------------------------------------------------------------------------------------------------+
| id | from_unixtime(clock,'%Y%m%d %H:%i:%S') | value |
+------------+----------------------------------------+--------------------------------------------------------------------------------------------------------+
| 4792921942 | 20140314 11:12:02 | 0 |
| 4792948233 | 20140314 11:13:19 | Received value [283.33334] is not suitable for value type [Numeric (unsigned)] and data type [Decimal] |
| 4792967862 | 20140314 11:14:19 | Received value [266.33334] is not suitable for value type [Numeric (unsigned)] and data type [Decimal] |
| 4792987031 | 20140314 11:15:19 | Received value [315.33334] is not suitable for value type [Numeric (unsigned)] and data type [Decimal] |
| 4793199599 | 20140314 11:26:10 | 0 |
| 4793219166 | 20140314 11:27:11 | 0 |
| 4793239212 | 20140314 11:28:12 | 0 |
| 4793258721 | 20140314 11:29:13 | 10 |
| 4793283508 | 20140314 11:30:28 | 0 |
| 4793303560 | 20140314 11:31:29 | 0 |
| 4793322826 | 20140314 11:32:29 | 0 |
| 4793342173 | 20140314 11:33:30 | 0 |
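+------------+----------------------------------------+--------------------------------------------------------------------------------------------------------+
6. Checking the item configuration shows that the value type is set to Numeric (unsigned), yet the item can produce float values. The value column of proxy_history is longtext, while the value column of the server-side history_uint table is bigint, so data is lost when the value is converted for storage. The items table also has an error column that records any error encountered while collecting the item's value, and it can be used to locate this kind of problem directly.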
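The pattern in the two tables above (0.0 and 10.0 are stored as 0 and 10, while 283.33334 is rejected) can be illustrated with a small Python sketch. This is not Zabbix code: the function name `to_unsigned` is hypothetical, and the rule that trailing zeros after the decimal point are trimmed before validation is an assumption inferred only from the data shown here.

```python
def to_unsigned(raw: str):
    """Return the integer that would be stored, or None if the value is rejected.

    Hypothetical model of a Numeric (unsigned) check; the real Zabbix
    logic may differ.
    """
    text = raw.strip()
    # Assumption: trailing zeros after the decimal point are trimmed first,
    # which would explain why "0.0" and "10.0" were stored as 0 and 10.
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    # Only plain ASCII digits are acceptable for an unsigned integer column.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


# Values taken from the agent log in this post.
for raw in ["0.0", "10.0", "283.33334"]:
    print(raw, "->", to_unsigned(raw))
```

Under this model, every sample with a non-zero fractional part is dropped, which matches the rows missing from history_uint.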
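With the root cause found, the quickest fix is to update the database directly, changing value_type from 3 (Numeric (unsigned)) to 0 (Numeric (float)) for the affected items.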
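update items set value_type=0 where value_type=3 and (key_ like 'hadoop_stats[regionserver%' or key_ like 'hadoop_stats[hmaster%');
One odd thing I noticed, though: while the value type was wrong, the agent did not collect data according to the configured interval, and sometimes left a gap of about 10 minutes. This makes the data loss in the history table even worse. I don't know whether this is caused by some internal mechanism of the Zabbix agent; I will need to read the code when I have time.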
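To spot such gaps without reading the log by eye, a short Python sketch like the following could be used. It assumes the agent log prefix has the form `pid:YYYYMMDD:HHMMSS.mmm`, as in the log lines in this post, and an expected interval of 60 seconds (inferred from the spacing of the samples, not read from the item configuration). The function name `find_gaps` and its parameters are hypothetical.

```python
from datetime import datetime


def find_gaps(lines, interval_s=60, tolerance=2.0):
    """Yield (previous, current) timestamps spaced more than tolerance * interval_s apart."""
    prev = None
    for line in lines:
        # Expected prefix: pid:YYYYMMDD:HHMMSS.mmm followed by the message.
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue  # not a log entry (for example, a comment line)
        _pid, day, rest = parts
        try:
            ts = datetime.strptime(day + rest[:10], "%Y%m%d%H%M%S.%f")
        except ValueError:
            continue  # prefix did not match the assumed format
        if prev is not None and (ts - prev).total_seconds() > tolerance * interval_s:
            yield prev, ts
        prev = ts


# Three entries copied from the agent log in this post.
sample = [
    "7964:20140314:111419.454 For key [hadoop_stats[regionserver,requests]] received value [266.33334]",
    "7964:20140314:111519.532 For key [hadoop_stats[regionserver,requests]] received value [315.33334]",
    "7964:20140314:112610.705 For key [hadoop_stats[regionserver,requests]] received value [0.0]",
]
for before, after in find_gaps(sample):
    print("gap from", before, "to", after, "=", after - before)
```

The tolerance factor is there only to ignore the few seconds of jitter visible between normal samples; it would need tuning for other intervals.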
Corresponding agent-side log:
7964:20140314:111202.800 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:111319.053 For key [hadoop_stats[regionserver,requests]] received value [283.33334]
7964:20140314:111419.454 For key [hadoop_stats[regionserver,requests]] received value [266.33334]
7964:20140314:111519.532 For key [hadoop_stats[regionserver,requests]] received value [315.33334]
# no values were produced for about 10 minutes here
7964:20140314:112610.705 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:112711.308 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:112812.375 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:112913.086 For key [hadoop_stats[regionserver,requests]] received value [10.0]
7964:20140314:113028.703 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:113129.180 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:113229.941 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:113330.568 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:113446.343 For key [hadoop_stats[regionserver,requests]] received value [0.0]
7964:20140314:113546.977 For key [hadoop_stats[regionserver,requests]] received value [0.0]
Finally, here is the graph after the update: