Thursday, October 11, 2007

Hardware TCP implementations

One way to overcome the processing power requirements of TCP is building hardware implementations of it, widely known as TCP Offload Engines (TOE). The main problem of TOEs is that they are hard to integrate into computing systems, requiring extensive changes in the operating system of the computer or device. The first company to develop such a device was Alacritech.
Debugging TCP
A packet sniffer, which intercepts TCP traffic on a network link, can be useful in debugging networks, network stacks and applications which use TCP by showing the user what packets are passing through a link. Some networking stacks support the SO_DEBUG socket option, which can be enabled on the socket using setsockopt. That option dumps all the packets, TCP states and events on that socket which will be helpful in debugging. netstat is another utility that can be used for debugging.
Alternatives to TCP
For many applications TCP is not appropriate. One big problem (at least with normal implementations) is that the application cannot get at the packets coming after a lost packet until the retransmitted copy of the lost packet is received. This causes problems for real-time applications such as streaming multimedia (such as Internet radio), real-time multiplayer games and voice over IP (VoIP) where it is sometimes more useful to get most of the data in a timely fashion than it is to get all of the data in order.
Also for embedded systems, network booting and servers that serve simple requests from huge numbers of clients (e.g. DNS servers) the complexity of TCP can be a problem. Finally some tricks such as transmitting data between two hosts that are both behind NAT (using STUN or similar systems) are far simpler without a relatively complex protocol like TCP in the way.
Generally where TCP is unsuitable the User Datagram Protocol (UDP) is used. This provides the application multiplexing and checksums that TCP does, but does not handle building streams or retransmission giving the application developer the ability to code those in a way suitable for the situation and/or to replace them with other methods like forward error correction or interpolation.
SCTP is another IP protocol that provides reliable stream oriented services not so dissimilar from TCP. It is newer and considerably more complex than TCP so has not yet seen widespread deployment, however it is especially designed to be used in situations where reliability and near-real-time considerations are important.
Venturi Transport Protocol (VTP) is a patented proprietary protocol that is designed to replace TCP transparently in order to overcome perceived inefficiencies related to wireless data transport.
TCP also has some issues in high bandwidth utilization environments. The TCP congestion avoidance algorithm works very well for ad-hoc environments where it is not known who will be sending data, but if the environment is predictable, a timing based protocol such as ATM can avoid the overhead of the retransmits that TCP needs.
TCP segment structure
A TCP segment consists of two sections:
header
data
The header consists of 11 fields, of which only 10 are required. The eleventh field is optional (pink background in table) and aptly named: options.
TCP Header
Source port – identifies the sending port
Destination port – identifies the receiving port
Sequence number – has a dual role
If the SYN flag is present then this is the initial sequence number and the first data byte is the sequence number plus 1
if the SYN flag is not present then the first data byte is the sequence number
Acknowledgment number – if the ACK flag is set then the value of this field is the sequence number that the sender of the acknowledgment expects next.
Data offset – specifies the size of the TCP header in 32-bit words. The minimum size header is 5 words and the maximum is 15 words thus giving the minimum size of 20 bytes and maximum of 60 bytes. This field gets its name from the fact that it is also the offset from the start of the TCP packet to the data.
Reserved – for future use and should be set to zero
Flags (aka Control bits) – contains 8 bit flags
CWR – Congestion Window Reduced (CWR) flag is set by the sending host to indicate that it received a TCP segment with the ECE flag set (added to header by RFC 3168).
ECE (ECN-Echo) – indicate that the TCP peer is ECN capable during 3-way handshake (added to header by RFC 3168).
URG – indicates that the URGent pointer field is significant
ACK – indicates that the ACKnowledgment field is significant
PSH – Push function
RST – Reset the connection
SYN – Synchronize sequence numbers
FIN – No more data from sender
Window – the number of bytes that may be received on the receiving side before being halted from sliding any further and receiving any more bytes as a result of a packet at the beginning of the sliding window not having been acknowledged or received. Starts at acknowledgement field.
Checksum – The 16-bit checksum field is used for error-checking of the header and data
Urgent pointer – if the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte
Options – the total length of the option field must be a multiple of a 32-bit word and the data offset field adjusted appropriately
Fields used to compute the checksum
TCP checksum using IPv4
When TCP runs over IPv4, the method used to compute the checksum is defined in RFC 793:
The checksum field is the 16 bit one's complement of the one's complement sum of all 16-bit words in the header and text. If a segment contains an odd number of header and text octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted as part of the segment. While computing the checksum, the checksum field itself is replaced with zeros.
In other words, all 16-bit words are summed together using one's complement (with the checksum field set to zero). The sum is then one's complemented. This final value is then inserted as the checksum field. Algorithmically speaking, this is the same as for IPv6. The difference is in the data used to make the checksum. When computing the checksum, a pseudo-header that mimics the IPv4 header is shown in the table below.
The source and destination addresses are those in the IPv4 header. The protocol is that for TCP (see List of IPv4 protocol numbers): 6. The TCP length field is the length of the TCP header and data.
TCP checksum using IPv6
When TCP runs over IPv6, the method used to compute the checksum is changed, as per RFC 2460:
Any transport or other upper-layer protocol that includes the addresses from the IP header in its checksum computation must be modified for use over IPv6, to include the 128-bit IPv6 addresses instead of 32-bit IPv4 addresses.
Source address – the one in the IPv6 header
Destination address – the final destination; if the IPv6 packet doesn't contain a Routing header, that will be the destination address in the IPv6 header, otherwise, at the originating node, it will be the address in the last element of the Routing header, and, at the receiving node, it will be the destination address in the IPv6 header.
TCP length – the length of the TCP header and data;
Next Header – the protocol value for TCP
Data
The last field is not a part of the header. The contents of this field are whatever the upper layer protocol wants but this protocol is not set in the header and is presumed based on the port selection.

No comments: