xy123321 posted on 2015-12-23 14:26:43

Using sed to extract Apache access log entries for a given time range

  http://www.linuxyunwei.com/2013/05/sed-%E5%8F%96%E6%9F%90%E6%97%B6%E6%AE%B5%E5%86%85apache%E7%9A%84%E8%AE%BF%E9%97%AE%E6%97%A5%E5%BF%97/

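The development team asked for the following:
export the Apache access log entries between 2013-05-24 15:00:00 and 2013-05-28 16:00:00.
The Apache log lines look like this (the commands below match on each entry's timestamp field, which Apache writes in the form [day/month/year:hour:minute:second zone]):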
222.92.115.194 - - "GET /media/js/jquery.eislideshow.js HTTP/1.1" 304 -
222.92.115.194 - - "GET /media/style/tpl/tpl_buy_left/tpl_buy_left.js HTTP/1.1" 304 -
222.92.115.194 - - "GET /favicon.ico HTTP/1.1" 404 17846
222.92.115.194 - - "GET /large-display/interactive/ HTTP/1.1" 200 21382
222.92.115.194 - - "GET /favicon.ico HTTP/1.1" 404 17846
222.92.115.194 - - "GET /large-display/single/ HTTP/1.1" 200 21386
222.92.115.194 - - "GET /favicon.ico HTTP/1.1" 404 17846
222.92.115.195 - - "GET /dsc/ HTTP/1.1" 200 34530
222.92.115.195 - - "GET /media/img/channel_icon.jpg HTTP/1.1" 404 17846
222.92.115.195 - - "GET /favicon.ico HTTP/1.1" 404 17846   
Extraction command:
# sed -n '/24\/May\/2013:15:00:01/,/28\/May\/2013:16:59:58/p' xxxx-access_log > 20130524.15-20130528.16-access_log.txt   
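For a large log, a variant of the same command can stop reading as soon as the end line has been printed instead of scanning the rest of the file. This is a sketch that assumes GNU or BSD sed; the sample file name and its entries are made up for illustration, with the timestamp in Apache's bracketed form:

```shell
#!/bin/sh
# Hypothetical sample log; real entries come from xxxx-access_log.
cat > sample-access_log <<'EOF'
1.1.1.1 - - [24/May/2013:14:59:59 +0800] "GET /a HTTP/1.1" 200 1
1.1.1.1 - - [24/May/2013:15:00:01 +0800] "GET /b HTTP/1.1" 200 1
1.1.1.1 - - [26/May/2013:09:00:00 +0800] "GET /c HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:16:59:58 +0800] "GET /d HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:17:00:05 +0800] "GET /e HTTP/1.1" 200 1
EOF

# Same /start/,/end/ range as above, but inside the range run {p;/end/q;}:
# print each line, and on the end line quit right after printing it.
# With -n, q does not print the line a second time.
sed -n '/24\/May\/2013:15:00:01/,/28\/May\/2013:16:59:58/{p;/28\/May\/2013:16:59:58/q;}' \
    sample-access_log > quit-range.txt
cat quit-range.txt
```

The output is the same as the plain range command; the only difference is that sed exits after the end line, which matters when the log is much larger than the requested window.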
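PS: Note that if the start timestamp does not appear in the log, the extraction returns 0 lines, and if the end timestamp does not appear, sed keeps printing through the last line of the log. So before extracting, you need to find the start and end points in the log that best match the requested range.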
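The behaviour described in the PS can be reproduced on a small, made-up log (the file name and entries are hypothetical). The third case shows a related pitfall: a /start/,/end/ range closes at the first line matching the end pattern, so later entries logged in that same second are left out.

```shell
#!/bin/sh
# Hypothetical sample log with two entries in the same final second.
cat > demo-access_log <<'EOF'
1.1.1.1 - - [24/May/2013:15:00:01 +0800] "GET /a HTTP/1.1" 200 1
1.1.1.1 - - [26/May/2013:09:00:00 +0800] "GET /b HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:16:59:58 +0800] "GET /c HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:16:59:58 +0800] "GET /d HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:17:10:00 +0800] "GET /e HTTP/1.1" 200 1
EOF

# 1. Start timestamp never logged: the range never opens, so nothing is printed.
missing_start=$(sed -n '/24\/May\/2013:15:00:00/,/28\/May\/2013:16:59:58/p' demo-access_log | grep -c .)
# 2. End timestamp never logged: the range never closes, so sed prints to end of file.
missing_end=$(sed -n '/24\/May\/2013:15:00:01/,/28\/May\/2013:16:00:00/p' demo-access_log | grep -c .)
# 3. Both present: the range closes at the FIRST 16:59:58 line, so the /d entry is dropped.
both=$(sed -n '/24\/May\/2013:15:00:01/,/28\/May\/2013:16:59:58/p' demo-access_log | grep -c .)

echo "missing start: $missing_start lines"   # 0
echo "missing end:   $missing_end lines"     # 5
echo "both present:  $both lines"            # 3
```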
My approach is to locate the two points with grep first, then extract with sed:
# Find the timestamp of the first entry in the 15:00 hour on 2013-05-24
# grep '24/May/2013:15' xxxx-access_log | head -1
10.200.114.183 - - "GET /gp10/pic_259_218_1368781965.png HTTP/1.0" 401 484
# Find the timestamp of the last entry in the 16:00 hour on 2013-05-28
# grep '28/May/2013:16' xxxx-access_log | tail -1
222.92.115.195 - - "GET /favicon.ico HTTP/1.1" 404 17846
# Then extract the entries between these two timestamps
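The grep-then-sed steps above can also be scripted. The sketch below is a variation rather than the original method: instead of copying the two timestamps into a sed regex, it takes the line numbers reported by grep -n and hands them to sed as a numeric range. It assumes the log is in chronological order; extract_range and the sample file are hypothetical names.

```shell
#!/bin/sh
# extract_range LOG START_PATTERN END_PATTERN   (hypothetical helper)
# Prints LOG from the first line matching START_PATTERN
# through the last line matching END_PATTERN.
extract_range() {
    er_log=$1
    er_start=$2
    er_end=$3
    # Line number of the first entry of the start hour.
    er_first=$(grep -n -- "$er_start" "$er_log" | head -n 1 | cut -d: -f1)
    # Line number of the last entry of the end hour.
    er_last=$(grep -n -- "$er_end" "$er_log" | tail -n 1 | cut -d: -f1)
    if [ -z "$er_first" ] || [ -z "$er_last" ] || [ "$er_first" -gt "$er_last" ]; then
        echo "extract_range: no usable start/end line in $er_log" >&2
        return 1
    fi
    # Numeric range; quit after the last line so the rest of the file is not read.
    sed -n "${er_first},${er_last}p;${er_last}q" "$er_log"
}

# Hypothetical sample log for a quick check.
cat > hourly-access_log <<'EOF'
1.1.1.1 - - [24/May/2013:14:59:59 +0800] "GET /a HTTP/1.1" 200 1
1.1.1.1 - - [24/May/2013:15:00:01 +0800] "GET /b HTTP/1.1" 200 1
1.1.1.1 - - [26/May/2013:09:00:00 +0800] "GET /c HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:16:59:58 +0800] "GET /d HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:16:59:58 +0800] "GET /e HTTP/1.1" 200 1
1.1.1.1 - - [28/May/2013:17:00:05 +0800] "GET /f HTTP/1.1" 200 1
EOF
extract_range hourly-access_log '24/May/2013:15' '28/May/2013:16' > hour-range.txt
cat hour-range.txt
```

Against the real log it would be called as extract_range xxxx-access_log '24/May/2013:15' '28/May/2013:16' > 20130524.15-20130528.16-access_log.txt. Because the end point is a line number rather than a pattern, every entry of the last hour is kept, including entries that share the final second.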