How InternalInputBuffer Handles the HTTP Request Line - Tomcat Source Code - "How Far Can We Really Go" Series (11)

mianfeis · posted on 2015-8-7 07:01:26

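"How Far Can We Really Go" Series (11)
Chit-chat:
Is the job market bad lately? Not many people seem to be switching jobs, haha. Some people always resist changing jobs because they see many downsides: lost domain experience, lost seniority, others thinking you are unreliable, and so on. My view: life is short, so why give up the chance to walk through the forest for the sake of one tree that offers shade? Good luck to everyone on the job-hopping road! (PS: I personally enjoy the thrill of interviews.)
Nothing is more comfortable than lying down, and nothing is more beautiful than a sunset. The autumn sunset is the most gorgeous of the year, so don't miss it.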
Topic:
In Tomcat, an HTTP request is handed to the Http11Processor class, whose process(Socket theSocket) method handles the incoming Socket; the Socket carries the HTTP message.
For how Tomcat ends up calling Http11Processor's process method, see: http://blog.iyunv.com/woorh/article/details/8017323
Http11Processor lives in the org.apache.coyote.http11 package.
Inside Http11Processor's process method, the call inputBuffer.parseRequestLine(); parses the request line of the HTTP message. Here inputBuffer is Tomcat's own InternalInputBuffer.
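Things to know first:
1. org.apache.coyote.Request is the data structure Tomcat uses internally to hold the data of a request message.
2. org.apache.tomcat.util.buf.MessageBytes holds a piece of the message; org.apache.coyote.Request uses it heavily to hold the parsed bytes.
3. org.apache.tomcat.util.buf.ByteChunk is the structure that actually holds the data, as a byte[]; org.apache.tomcat.util.buf.MessageBytes uses it.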
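Overall flow:
After inputBuffer parses the HTTP message, the results go into Request; Request stores them in the corresponding MessageBytes, and MessageBytes in turn stores them in a ByteChunk.
Each of these steps is carried out through ordinary method calls.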
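To make the layering concrete, here is a minimal sketch of the same idea. The classes ByteSlice, Message and SimpleRequest are hypothetical stand-ins written for this post, not Tomcat's ByteChunk, MessageBytes and Request; they only illustrate that each layer passes along a buffer reference plus an offset and a length instead of copying bytes.

```java
import java.nio.charset.StandardCharsets;

public class SliceDemo {
    /** Hypothetical stand-in for ByteChunk: a view (buffer, offset, length), no copying. */
    static final class ByteSlice {
        private byte[] buf;
        private int start;
        private int length;

        void setBytes(byte[] buf, int start, int length) {
            this.buf = buf;       // keep a reference to the shared buffer
            this.start = start;   // where the interesting bytes begin
            this.length = length; // how many bytes belong to this slice
        }

        @Override
        public String toString() {
            // Bytes are only turned into a String when someone asks for it.
            return new String(buf, start, length, StandardCharsets.ISO_8859_1);
        }
    }

    /** Hypothetical stand-in for MessageBytes: delegates storage to a ByteSlice. */
    static final class Message {
        private final ByteSlice slice = new ByteSlice();

        void setBytes(byte[] buf, int start, int length) {
            slice.setBytes(buf, start, length);
        }

        @Override
        public String toString() {
            return slice.toString();
        }
    }

    /** Hypothetical stand-in for Request: exposes one Message per request-line field. */
    static final class SimpleRequest {
        private final Message method = new Message();

        Message method() {
            return method;
        }
    }

    static String demo() {
        byte[] buf = "GET /index.html HTTP/1.1\r\n".getBytes(StandardCharsets.ISO_8859_1);
        SimpleRequest request = new SimpleRequest();
        // Same call shape as request.method().setBytes(buf, start, pos - start) in the post.
        request.method().setBytes(buf, 0, 3);
        return request.method().toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints GET
    }
}
```

The point of the design is that the parser touches each byte once and the read buffer is shared by every field of the request.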
The main focus here is the parsing source code. Before reading it you need to know the structure of an HTTP request line; see: http://www.iyunv.com/killbug/archive/2012/10/10/2719142.html
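Preparation before reading:
1. Whenever the method throws an exception, it calls getString, which is StringManager's method for looking up the message text for a key; this is how Tomcat manages its messages in one place. A detailed explanation was given in the previous post: Reading and Learning Tomcat StringManager
2. The common escape characters and their ASCII codes:

Escape sequence	Meaning	ASCII code (decimal)
\a	Bell (BEL)	007
\b	Backspace (BS), moves the current position back one column	008
\f	Form feed (FF), moves the current position to the start of the next page	012
\n	Line feed (LF), moves the current position to the start of the next line	010
\r	Carriage return (CR), moves the current position to the start of the current line	013
\t	Horizontal tab (HT), jumps to the next tab stop	009
\v	Vertical tab (VT)	011
\\	A backslash character '\'	092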
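The parser below compares raw bytes against a handful of these codes. As a quick check of the values in the table, here is a small sketch; the constants are local stand-ins named after the Constants.CR, Constants.LF, Constants.SP, Constants.HT and Constants.QUESTION fields used in the listing, and the two helper methods are hypothetical, added only for illustration.

```java
public class DelimiterBytes {
    // Local stand-ins for the delimiter constants referenced by the parser.
    static final byte CR = (byte) '\r';      // carriage return
    static final byte LF = (byte) '\n';      // line feed
    static final byte SP = (byte) ' ';       // space
    static final byte HT = (byte) '\t';      // horizontal tab
    static final byte QUESTION = (byte) '?'; // start of the query string

    /** Hypothetical helper: true for the separators tolerated between request-line fields. */
    static boolean isBlank(byte b) {
        return b == SP || b == HT;
    }

    /** Hypothetical helper: true for the bytes that end the request line. */
    static boolean isLineEnd(byte b) {
        return b == CR || b == LF;
    }

    public static void main(String[] args) {
        // prints CR=13 LF=10 SP=32 HT=9 ?=63
        System.out.println("CR=" + CR + " LF=" + LF + " SP=" + SP + " HT=" + HT + " ?=" + QUESTION);
    }
}
```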

Source code:


public void parseRequestLine()
        throws IOException {
    int start = 0;
    //
    // Skipping blank lines
    // (ignore any empty lines before the request line)
    //

    byte chr = 0;
    do {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        chr = buf[pos++];
    } while ((chr == Constants.CR) || (chr == Constants.LF));
    pos--;
    // Mark the current buffer position
    start = pos;
    //
    // Reading the method name
    // Method name is always US-ASCII
    //
    // space works like a switch: while it is false we scan content,
    // while it is true we skip over separators
    boolean space = false;
    while (!space) {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        // Spec says no CR or LF in method name
        if (buf[pos] == Constants.CR || buf[pos] == Constants.LF) {
            throw new IllegalArgumentException(
                    sm.getString("iib.invalidmethod"));
        }
        // Spec says single SP but it also says be tolerant of HT
        // Find the first space; a tab is accepted here too
        if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
            space = true; // leave the loop
            // Record the offsets; method() returns one of Request's MessageBytes: methodMB
            request.method().setBytes(buf, start, pos - start);
        }
        pos++;
    }
    // Spec says single SP but also says be tolerant of multiple and/or HT
    // Skip any further spaces or tabs after the first one; this content is
    // discarded, so no start offset is needed
    while (space) {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
            pos++; // skipping simply means advancing the index
        } else {
            space = false;
        }
    }
    // Mark the current buffer position
    start = pos; // start shows up again, so what follows will have its offsets recorded
    int end = 0;
    int questionPos = -1;
    //
    // Reading the URI
    // (the line above is the original source comment; the URI is the
    // resource identifier in the request line)
    //

    boolean eol = false;
    while (!space) {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        // Spec says single SP but it also says be tolerant of HT
        // Look for the second space; the URI sits between the first and second space
        if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
            space = true;
            end = pos;
        } else if ((buf[pos] == Constants.CR)
                || (buf[pos] == Constants.LF)) {
            // HTTP/0.9 style request
            // (kept for compatibility with the HTTP/0.9 format)
            eol = true;
            space = true;
            end = pos;
        } else if ((buf[pos] == Constants.QUESTION) // found a '?'
                && (questionPos == -1)) {
            // Remember the position of the first question mark
            questionPos = pos;
        }
        pos++;
    }
    // Record the start and end of the full URI, which may contain a question mark
    request.unparsedURI().setBytes(buf, start, end - start);
    if (questionPos >= 0) { // there is a question mark
        // Record the query string (everything after the question mark)
        request.queryString().setBytes(buf, questionPos + 1,
                end - questionPos - 1);
        // Record the URI (everything before the question mark)
        request.requestURI().setBytes(buf, start, questionPos - start);
    } else {
        request.requestURI().setBytes(buf, start, end - start);
    }
    // Spec says single SP but also says be tolerant of multiple and/or HT
    // This is more or less duplicated code: it skips spaces and tabs again
    while (space) {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        if (buf[pos] == Constants.SP || buf[pos] == Constants.HT) {
            pos++;
        } else {
            space = false;
        }
    }
    // Mark the current buffer position
    start = pos;
    end = 0;
    //
    // Reading the protocol
    // Protocol is always US-ASCII
    //
    // The eol flag marks an HTTP/0.9 style request, as mentioned above
    // Last part: the protocol (HTTP/1.1 or HTTP/1.0)
    while (!eol) {
        // Read new bytes if needed
        if (pos >= lastValid) {
            if (!fill())
                throw new EOFException(sm.getString("iib.eof.error"));
        }
        // Look for \r\n (CRLF)
        if (buf[pos] == Constants.CR) {
            end = pos;
        } else if (buf[pos] == Constants.LF) {
            if (end == 0)
                end = pos;
            eol = true;
        }
        pos++;
    }
    // At this point the request line has been split into three parts and
    // stored in the MessageBytes fields defined by Request
    if ((end - start) > 0) {
        request.protocol().setBytes(buf, start, end - start);
    } else {
        request.protocol().setString("");
    }
}
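To try the same steps outside Tomcat, here is a simplified, self-contained sketch. It is not the Tomcat implementation: it assumes the whole request line is already in one byte array (so there is no fill()), it copies the fields into Strings for readability instead of recording offsets in MessageBytes, and the class RequestLineSketch and its Parts holder are hypothetical names made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class RequestLineSketch {
    /** Hypothetical result holder; the real code stores into Request's MessageBytes. */
    static final class Parts {
        String method;
        String uri;
        String query;    // stays null when the URI has no '?'
        String protocol; // empty for an HTTP/0.9 style request
    }

    static Parts parse(byte[] buf) {
        Parts p = new Parts();
        int pos = 0;
        // Step 1: skip leading blank lines (CR or LF).
        while (pos < buf.length && (buf[pos] == '\r' || buf[pos] == '\n')) pos++;
        // Step 2: the method runs up to the first space or tab.
        int start = pos;
        while (pos < buf.length && buf[pos] != ' ' && buf[pos] != '\t') {
            if (buf[pos] == '\r' || buf[pos] == '\n') {
                throw new IllegalArgumentException("invalid method");
            }
            pos++;
        }
        if (pos >= buf.length) throw new IllegalArgumentException("truncated request line");
        p.method = str(buf, start, pos - start);
        pos = skipBlanks(buf, pos);
        // Step 3: the URI runs up to the next blank, or to CR/LF for HTTP/0.9.
        start = pos;
        int questionPos = -1;
        boolean eol = false;
        while (pos < buf.length && buf[pos] != ' ' && buf[pos] != '\t') {
            if (buf[pos] == '\r' || buf[pos] == '\n') {
                eol = true;
                break;
            }
            if (buf[pos] == '?' && questionPos == -1) questionPos = pos; // first '?' only
            pos++;
        }
        int end = pos;
        if (questionPos >= 0) {
            p.uri = str(buf, start, questionPos - start);
            p.query = str(buf, questionPos + 1, end - questionPos - 1);
        } else {
            p.uri = str(buf, start, end - start);
        }
        if (eol) {
            p.protocol = "";
            return p;
        }
        // Step 4: the protocol runs up to the end of the line.
        pos = skipBlanks(buf, pos);
        start = pos;
        while (pos < buf.length && buf[pos] != '\r' && buf[pos] != '\n') pos++;
        p.protocol = str(buf, start, pos - start);
        return p;
    }

    /** Advances past any run of spaces and tabs and returns the new index. */
    static int skipBlanks(byte[] buf, int pos) {
        while (pos < buf.length && (buf[pos] == ' ' || buf[pos] == '\t')) pos++;
        return pos;
    }

    static String str(byte[] buf, int off, int len) {
        return new String(buf, off, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] line = "GET /index.jsp?a=1&b=2 HTTP/1.1\r\n".getBytes(StandardCharsets.ISO_8859_1);
        Parts p = parse(line);
        // prints GET | /index.jsp | a=1&b=2 | HTTP/1.1
        System.out.println(p.method + " | " + p.uri + " | " + p.query + " | " + p.protocol);
    }
}
```

The offset arithmetic for the query string is the same as in the listing: the query starts at questionPos + 1 and is end - questionPos - 1 bytes long.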
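Takeaways:
1. Use a flag to control what each while loop does during parsing; the code above uses space.
2. Instead of copying out the content you need, record only its start and end offsets.

Let's keep moving forward

----------------------------------------------------------------------

Hard work does not guarantee success, but without hard work you will certainly not succeed.
Let's encourage each other.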
　　
