Using TCP for Low-Latency Applications

Last week I ran into a nasty little problem while implementing an application with soft real-time requirements. I was aiming at 1 ms or less for a TCP-based request-response roundtrip on a local network. Should be trivial, but why did my tests indicate that I wasn't even getting close?

The scenario was simple: A server (my part) gets a request from a client. Before it can answer, it has to ask a backend system for some information. The backend system listens on a TCP port and answers in a query-response fashion (processing time is far below 1 ms). Both query and response typically fit in a single TCP segment. The response may sometimes be larger, which is one reason why TCP was an adequate choice for the backend system.

As a first optimization I used a TCP connection pool to get around TCP's three-way handshake (SYN, SYN-ACK, ACK) and thus cut down latency. However, tracing showed response times of around 40 ms on the local network. That was actually worse than without the connection pool! Admittedly, the conventional use for connection pools is to circumvent TCP's slow start mechanism. But still, something was very wrong.
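For illustration, here is a minimal sketch of such a pool. The class name, pool size, and backend address are hypothetical; a production pool would also handle broken connections and reconnects:

```java
import java.io.IOException;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Minimal connection pool: keeps a fixed number of sockets to the backend
 * open so each query-response exchange skips the three-way handshake.
 */
public class BackendConnectionPool {
    private final BlockingQueue<Socket> pool;

    public BackendConnectionPool(String host, int port, int size) throws IOException {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new Socket(host, port));  // connect once, reuse afterwards
        }
    }

    public Socket borrow() throws InterruptedException {
        return pool.take();  // blocks if all connections are currently in use
    }

    public void release(Socket socket) {
        pool.add(socket);
    }
}
```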

I quickly guessed that this was a buffering issue, either in the TCP stack or in Java's socket wrapper. Since calling flush() on the socket's output stream didn't seem to help either, I turned to the Socket FAQ. As usual, the FAQ provides two pieces of information: a) the solution that works well in practice, and b) the impression that things are more complicated than one would guess.

The relevant section of the FAQ is "2.11. How can I force a socket to send the data in its buffer?". It starts discouragingly with a quote from Richard Stevens himself:

"You can't force it. Period. TCP makes up its own mind as to when it can send data. Now, normally when you call write() on a TCP socket, TCP will indeed send a segment, but there's no guarantee and no way to force this."

Sounds bad. But wait, wasn't there a PUSH flag in the TCP header for speeding things up a bit? From RFC 793:

"A sending TCP is allowed to collect data from the sending user and to send that data in segments at its own convenience, until the push function is signaled, then it must send all unsent data. When a receiving TCP sees the PUSH flag, it must not wait for more data from the sending TCP before passing the data to the receiving process."

Too bad: According to the FAQ, the socket API gives you no way to trigger this magic push functionality! It's not connected to flush(), as one might have guessed.

So, game over? Fortunately not. One thing you can do is disable Nagle's Algorithm by setting TCP_NODELAY on the socket (in Java, that's Socket.setTcpNoDelay(true)). Nagle's Algorithm is basically a buffering facility inside the kernel's TCP stack. It ensures that data from multiple write() calls doesn't necessarily result in TCP segments being sent immediately. Instead, small chunks of data are collected and coalesced, reducing network overhead for applications that sometimes send just a single byte of user data over the network (telnet-style applications come to mind). Unfortunately, it got in our way here, causing the 40 ms delays mentioned earlier.
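In Java that's a one-line change per socket. Here is a minimal sketch of a single query-response roundtrip with Nagle's Algorithm disabled; the hostname, port, and query bytes are placeholders, not the actual backend protocol:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class NoDelayQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical backend address; adjust to the real host and port.
        try (Socket socket = new Socket("backend.example.com", 4711)) {
            socket.setTcpNoDelay(true);  // disable Nagle's Algorithm on this socket

            OutputStream out = socket.getOutputStream();
            InputStream in = socket.getInputStream();

            byte[] query = "QUERY\n".getBytes("US-ASCII");  // placeholder query
            out.write(query);
            out.flush();  // hands the data to the TCP stack; with TCP_NODELAY it goes out right away

            byte[] response = new byte[1024];
            int n = in.read(response);  // typically arrives in a single small segment
            System.out.println(new String(response, 0, n, "US-ASCII"));
        }
    }
}
```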

With TCP_NODELAY set, response times to the backend system dropped below the 1 ms threshold. Great!

The moral of this story: If your TCP-based real-time application suffers from bad latency, try setting TCP_NODELAY. And remember: even if you know TCP fairly well, you still don't get the full picture unless you also know how your operating system's TCP stack and socket APIs work.
