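Background
I once heard the advice that the TIME_WAIT state should be kept on the client whenever possible (the client sends the first FIN) rather than on the server, because a connection in TIME_WAIT is only released after 2MSL and ties up resources in the meantime. That sounds reasonable, but while reading the WebSocket RFC (RFC 6455) I found that it recommends the opposite: the server should send the first FIN and hold the TIME_WAIT state, because TIME_WAIT does not stop the server from handling new connections. The two claims contradict each other, so let's see where the discrepancy comes from.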
The underlying TCP connection, in most normal cases, SHOULD be closed
first by the server, so that it holds the TIME_WAIT state and not the
client (as this would prevent it from re-opening the connection for 2
maximum segment lifetimes (2MSL), while there is no corresponding
server impact as a TIME_WAIT connection is immediately reopened upon
a new SYN with a higher seq number). In abnormal cases (such as not
having received a TCP Close from the server after a reasonable amount
of time) a client MAY initiate the TCP Close. As such, when a server
is instructed to Close the WebSocket Connection it SHOULD initiate
a TCP Close immediately, and when a client is instructed to do the
same, it SHOULD wait for a TCP Close from the server.
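Server-side TIME_WAIT experiment
Let's start with a simple program to observe how the server behaves with TIME_WAIT. The kernel version is 3.10. The experiment code is listed at the end of this post.
Start the server
LISTEN 0 1 127.0.0.1:8888 *:* users:(("server",pid=656524,fd=3))
Use nc with a fixed source port to connect to the server. The server writes hello to the client and then closes the connection first. Press Ctrl-C to end nc (the kernel replies with the ACK automatically)
nc -p 59818 127.0.0.1 8888
The connection is now in the TIME_WAIT state
LISTEN 0 1 127.0.0.1:8888 *:* users:(("server",pid=656524,fd=3))
Connect to the server again from the same source port. A packet capture shows that the handshake succeeds: the connection that was in TIME_WAIT is reused
IP 127.0.0.1.59818 > 127.0.0.1.8888: Flags [S], seq 2318980489, win 43690, options [mss 65495,nop,nop,sackOK,nop,wscale 7], length 0
In practice, the TIME_WAIT state did not prevent a new TCP connection with the same four-tuple from being established, so on the server it appears to have no "side effects".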
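Kernel source
In the 3.10 kernel source I am running, a connection in TIME_WAIT is reused when it receives a valid SYN. The core of the check is whether the sequence number of the incoming SYN, or its timestamp if timestamps are enabled, is newer than what the old connection last saw, using comparisons that account for wraparound. According to the comment, checking only the sequence number is safe at rates below 40Mbit/sec; with TCP timestamps (PAWS) enabled, the whole mechanism is more reliable and carries less risk.
/* Out of window segment.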
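The check described above can be modeled in a few lines of Go. This is a simplified sketch of the idea, not the kernel code: the names after and acceptSYNInTimeWait are mine, and the model ignores the other conditions the kernel also checks (for example that the segment is a pure SYN and was not rejected by PAWS).

```go
package main

import "fmt"

// after reports whether seq1 is later than seq2 in 32-bit sequence space.
// Taking the signed difference keeps the comparison correct across wraparound.
func after(seq1, seq2 uint32) bool {
	return int32(seq2-seq1) < 0
}

// acceptSYNInTimeWait is a simplified, hypothetical model of the decision.
// rcvNxt and tsRecent are values remembered from the old connection;
// synSeq and tsVal come from the incoming SYN.
func acceptSYNInTimeWait(synSeq, rcvNxt uint32, sawTimestamp bool, tsVal, tsRecent uint32) bool {
	if after(synSeq, rcvNxt) {
		return true // sequence number is beyond what the old connection expected
	}
	// Otherwise fall back to the timestamp: it must be strictly newer.
	return sawTimestamp && int32(tsRecent-tsVal) < 0
}

func main() {
	fmt.Println(acceptSYNInTimeWait(2000, 1000, false, 0, 0))      // true: newer sequence number
	fmt.Println(acceptSYNInTimeWait(500, 1000, false, 0, 0))       // false: older sequence number, no timestamp
	fmt.Println(acceptSYNInTimeWait(500, 1000, true, 9001, 9000))  // true: newer timestamp
	fmt.Println(acceptSYNInTimeWait(5, 0xFFFFFFF0, false, 0, 0))   // true: sequence number wrapped past rcvNxt
}
```

The last case shows why the signed difference matters: 5 is numerically smaller than 0xFFFFFFF0, yet it is later in sequence space.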
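Client-side TIME_WAIT experiment
Can the client also reuse a connection in TIME_WAIT? Let's check with nc.
# Start the server
The experiment shows that by default the client cannot reuse a connection in TIME_WAIT.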
There is a way to let the client reuse a TIME_WAIT connection sooner: the tcp_tw_reuse kernel parameter.
net.ipv4.tcp_tw_reuse
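With this option enabled, when the client calls connect() to a remote server and the kernel finds a connection with the same four-tuple that has been in TIME_WAIT for more than 1s, it reuses that connection.
# tcp_tw_reuse only takes effect when tcp_timestamps is enabled, which it is by default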
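Although the client can reuse TIME_WAIT connections through tcp_tw_reuse, this kind of "reuse" is not as safe as reuse on the server side. See the references at the end for details.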
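Summary
The recommendation in RFC 6455 is reasonable. Both the server and the client have mechanisms for reusing TIME_WAIT connections, but reuse on the server side is safer. Server resources are also limited, so the server should not trust the client: it should take the initiative in closing connections rather than waste resources waiting for the client to disconnect.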
Experiment code
server
package main
- https://www.rfc-editor.org/rfc/rfc1122#page-88
- https://www.rfc-editor.org/rfc/rfc1323.html
- https://xiaolincoding.com/network/3_tcp/time_wait_recv_syn.html
- https://serverfault.com/questions/693529/how-does-server-side-time-wait-really-work
- https://web.archive.org/web/20141029223553/http://blogs.technet.com/b/networking/archive/2010/08/11/how-tcp-time-wait-assassination-works.aspx