Motivation

So far we have used active techniques to place ourselves in the middle of connections, but have acted only as observers. We can do more. From our vantage point in the middle of a connection, we can alter the traffic that we are forwarding, or choose to replace or drop the channel content entirely.

This opens up a large number of opportunities to attack systems and gain access that cannot be achieved simply by observing the existing network flows.

TCP Sessions

After you log in to a service, such as Telnet, you don’t need to re-authenticate for every command you send and every action you take. Instead, your authenticated status is tied to your session.

In the case of Telnet, as well as more modern secure protocols such as SSH and mTLS, your session is bound to your network connection [1]. As long as you maintain the connection, you are authenticated and any actions you take on that connection are authorized by your identity.

In Telnet, the authenticated terminal session is tied to the TCP session. It starts with the three-way handshake and ends when the connection closes, whether by either party sending FIN (polite), sending RST (rude), or simply closing the connection (passive aggressive).

This means that if an attacker can take control of the connection, they take control of the authenticated session and gain access as the user. [2]

Flows

So then, what defines a TCP connection? Each party only sees a bunch of packets, and there is no concept of a connection native to layers 1, 2, or 3. Instead, the receiver determines which TCP connection a given packet belongs to by looking at the IP and TCP header information. A particular “flow” is defined by the source IP address, source port, destination IP address, and destination port.

Conceptually, this means that if you send a packet to a machine using the same header information as an existing connection, it will be interpreted as part of that flow. Now, it is not quite that simple with TCP. (But you bet it is that simple with UDP!)
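To make the flow lookup concrete, here is a minimal sketch in Python of a receiver that files incoming payloads under their four-tuple. The names FlowKey and FlowTable, and the addresses and ports, are made up for illustration. The sketch models only the lookup step (which is all UDP does) and deliberately ignores the extra TCP checks described later.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The four header fields that identify a flow (hypothetical helper type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FlowTable:
    """Toy receiver: groups payloads by four-tuple and checks nothing else."""

    def __init__(self):
        self.flows = {}

    def deliver(self, src_ip, src_port, dst_ip, dst_port, payload):
        key = FlowKey(src_ip, src_port, dst_ip, dst_port)
        # Create the flow on first sight, then append the payload to it.
        self.flows.setdefault(key, bytearray()).extend(payload)
        return key


table = FlowTable()
# Illustrative addresses: a client talking to a Telnet server on port 23.
table.deliver("10.0.0.5", 40000, "10.0.0.9", 23, b"ls\n")
# A packet that merely claims the same header values lands in the same flow,
# no matter which machine actually put it on the wire.
table.deliver("10.0.0.5", 40000, "10.0.0.9", 23, b"whoami\n")
# Changing any one of the four fields yields a different flow.
table.deliver("10.0.0.5", 40001, "10.0.0.9", 23, b"other\n")
```

After these three calls the table holds two flows, and the first one contains both payloads back to back.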

In protocols where authentication is tied to the connection, injecting additional data into the flow causes the injected messages to be interpreted in the authenticated context, as if they really came from the authenticated user. If we are in the middle of a connection, we can also remove data from the stream. In other words, we can hijack an ongoing session to put it to whatever use we like!

Sequence and acknowledgment numbers

In addition to sender and receiver information, for a packet to be considered a valid member of a TCP flow, it needs to have the correct sequence (seq) number and acknowledgment (ack) number. These respectively count the number of payload bytes sent and received by the sender as part of the connection.
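The bookkeeping can be sketched as follows. This is a simplified model, not a TCP implementation: the Endpoint class and the starting values 1000 and 5000 are invented for illustration, and the receiver accepts only a segment whose seq exactly matches the next byte it expects, whereas a real stack accepts anything inside its receive window.

```python
MOD = 2 ** 32  # seq and ack are 32-bit counters that wrap around


class Endpoint:
    """Toy endpoint tracking seq (bytes sent) and ack (bytes received)."""

    def __init__(self, seq, ack):
        self.seq = seq  # number of the next byte we will send
        self.ack = ack  # number of the next byte we expect from the peer
        self.received = bytearray()

    def send(self, payload):
        segment = {"seq": self.seq, "ack": self.ack, "data": payload}
        # Sending advances our seq by the payload length.
        self.seq = (self.seq + len(payload)) % MOD
        return segment

    def receive(self, segment):
        # Simplification: require an exact match with the expected byte.
        if segment["seq"] != self.ack:
            return False
        self.received.extend(segment["data"])
        # Receiving advances our ack by the payload length.
        self.ack = (self.ack + len(segment["data"])) % MOD
        return True


# Hypothetical mid-connection state: each side's ack equals the other's seq.
client = Endpoint(seq=1000, ack=5000)
server = Endpoint(seq=5000, ack=1000)

accepted = server.receive(client.send(b"hello"))

# A segment that reuses the old seq value no longer matches what the
# server expects, so it is not treated as part of the stream.
stale = {"seq": 1000, "ack": 5000, "data": b"extra"}
rejected = server.receive(stale)
```

Here the first segment is accepted and moves both counters to 1005, while the stale segment is refused.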

You might imagine that for a new connection, the sequence and acknowledgment numbers both start at 0 (i.e., indicating that zero bytes have been sent and received). However, this would introduce security and reliability issues.

On security, setting the sequence and acknowledgment numbers to 0 at the start makes it reasonably easy to guess the correct numbers to include in a packet and, again, execute a blind injection attack on a TCP session. Furthermore, if an attacker is able to inject any extra data into the stream, the legitimate peers will lose synchronization and drop the connection. Effectively, a single blindly injected packet is enough for the connection to be taken over.

Instead, the peer that initiates the connection selects its sequence number in the first SYN packet. The responder uses that sequence number to determine its acknowledgment number (the initiator's sequence number plus one) and selects its own sequence number to include in the SYN-ACK packet. Any decent implementation of TCP will select the sequence number at random. Because the sequence and acknowledgment numbers are each 32 bits, it is difficult (although not always impossible) to guess these numbers. As a result, blind injection attacks against TCP connections are rare.
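The number exchange in the handshake can be sketched like this. The function name three_way_handshake and the dictionary layout are invented for illustration, and drawing the initial sequence number from Python's secrets module is only a stand-in for whatever a given TCP stack actually does. The one protocol detail relied on is that a SYN consumes one sequence number, so each side acknowledges the other's initial value plus one.

```python
import secrets

MOD = 2 ** 32  # sequence numbers are 32 bits wide


def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three handshake segments as plain dicts (toy model)."""
    # Assumption: an unpredictable 32-bit value stands in for the stack's
    # real initial sequence number generator.
    if client_isn is None:
        client_isn = secrets.randbelow(MOD)
    if server_isn is None:
        server_isn = secrets.randbelow(MOD)

    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN-ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % MOD,  # derived from the initiator's seq
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % MOD,
        "ack": (server_isn + 1) % MOD,
    }
    return [syn, syn_ack, ack]


# Fixed values make the arithmetic visible, including the 32-bit wraparound.
segments = three_way_handshake(client_isn=100, server_isn=2 ** 32 - 1)

# A blind attacker with no other information has a 1 in 2**32 chance of
# matching one specific 32-bit value on a single guess.
single_guess_probability = 1 / MOD
```

With the fixed values above, the SYN-ACK acknowledges 101 and the final ACK acknowledges 0, because the server's initial value plus one wraps around.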

However! If we are in the middle of a connection, there is no need to guess! We can simply use the sequence and acknowledgment numbers that we see coming across the wire.

If we want to modify the contents of a TCP stream, it is important to ensure that the sequence and acknowledgment numbers are consistent for each peer. That is, the server should see a consistent view of how many bytes have been sent and received, and the client should have its own consistent view, even though the attacker in the middle is modifying the data flow such that those views are different.
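One way to keep both views consistent is to track, for each direction, how many bytes the attacker has added or removed so far, and to shift the numbers on every forwarded segment by that amount. The sketch below is a simplified illustration under several assumptions: the StreamRewriter class and its forward method are hypothetical, segments arrive in order with no retransmissions, and checksum recomputation and TCP options such as selective acknowledgment are ignored.

```python
MOD = 2 ** 32  # seq and ack wrap around at 32 bits


class StreamRewriter:
    """Toy seq/ack translator for an attacker sitting between two peers."""

    def __init__(self):
        # Net bytes added (positive) or removed (negative) in each direction.
        self.delta = {"c2s": 0, "s2c": 0}

    def forward(self, direction, seq, ack, payload, new_payload=None):
        """Translate one segment; direction is "c2s" or "s2c"."""
        if new_payload is None:
            new_payload = payload  # forward the data unmodified
        other = "s2c" if direction == "c2s" else "c2s"
        # The receiver has seen delta[direction] more bytes than the
        # sender really sent, so shift seq forward by that amount.
        out_seq = (seq + self.delta[direction]) % MOD
        # The sender acknowledges bytes in its own inflated or deflated
        # view of the reverse stream, so shift ack back.
        out_ack = (ack - self.delta[other]) % MOD
        self.delta[direction] += len(new_payload) - len(payload)
        return out_seq, out_ack, new_payload


mitm = StreamRewriter()
# The client sends 3 bytes; the attacker swaps in 7 bytes (illustrative values).
first = mitm.forward("c2s", seq=1000, ack=5000, payload=b"ls\n",
                     new_payload=b"whoami\n")
# The server acknowledges 1007, but the client only ever sent up to 1003.
reply = mitm.forward("s2c", seq=5000, ack=1007, payload=b"root\n")
# The client's next segment starts at 1003; the server expects 1007.
second = mitm.forward("c2s", seq=1003, ack=5005, payload=b"id\n")
```

In this run the server sees a client stream that is four bytes longer than the one the client believes it sent, yet every number each peer receives matches its own count.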