Larry Osterman's WebLog

Confessions of an Old Fogey

Making stuff go fast on a network, part 1


So yesterday, I commented on how I was glossing over lots of details about how to make a client/server protocol operate efficiently on a network.  After I wrote it, I realized that while the topic is out of scope for a single post, making a client/server protocol work efficiently is a fine subject for a series.

So what does it mean to make a program "go fast" on a network?  Well, as I intimated the last time, it's "complicated".  To explain why, I've got to explain how traffic flows on a network.  First off, a caveat: The last time I did any serious work in network performance analysis, it was on 10base-T networks, and the performance characteristics of gigabit networks are likely to be somewhat different.  On the other hand, as far as I know, all the principles I'm thinking about still apply.

 

Let's start with some networking basics.  For the purposes of this discussion, I'm going to limit my discussion to connection oriented protocols (like TCP, SPX or NetBEUI).   I'm not going to discuss datagram based protocols (like IP, UDP and IPX), because (in general) datagram based protocols don't have any of the "interesting" semantics I'm describing.

First, some definitions:

  1. When you're transferring data on a connection oriented protocol, there are two principals involved, the sender and the receiver (I'm not going to use the words "client" and "server" because they imply a set of semantics associated with a higher level protocol).
  2. A "Frame" is a unit of data sent on the wire. 
  3. A "Packet" comprises one or more frames of data, depending on the size of the data being sent.
  4. A "Message" comprises one or more packets of data and typically corresponds to a send() call on the sender.

And some axioms:

  1. Networks are unreliable.  The connection oriented protocols attempt to represent this unreliable network as a reliable communication channel, but there are some "interesting" semantics that arise from this.
  2. LAN Networks (Token Ring, Ethernet, etc) all transmit messages in "frames", on Ethernet the maximum frame size (or MSS, Maximum Segment Size) is 1500 bytes.  Some of that frame is used to hold protocol overhead (around 50 bytes or so), so in general, you've got about 1400 bytes of user payload available in each frame (the numbers vary depending on the protocol used and the networking options used, but 1400 is a reasonable number).
  3. A connection oriented protocol provides certain guarantees:
    1. Reliable delivery - A sender can guarantee that one of two things occurred when sending data: either the receiver received the data being sent, or an error has occurred.
    2. Data ordering - If the sender sends three packets in the order A-B-C, the receiver needs to receive the packets in the order A-B-C.

Now then, given axiom #1 (Networks are unreliable), how do you ensure reliable communication?

Well, the answer to that question really depends on the answer to a different question: "How does the sender know that the receiver has received the data?"  Remember that the underlying network is unreliable - there are no guarantees that the data is received (although the network promises to do its best to guarantee that the data's been received).  To ensure that the receiver has received the data, the receiver tells the sender that it's received the data with a special packet called an ACK (for acknowledgment).

If a sender doesn't receive an ACK within a timeout period, it will retransmit the data.  It's ok to lose an ACK: the cost of losing an ACK is a retransmission of the data - which is exactly the same as if the receiver hadn't received the ack in the first place.  The receiver knows that the sender has received the ACK because it stops retransmitting the packet (each packet sent by the sender has a tag (a sequence number or index of some form) that allows the receiver to identify which packet is being sent).
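That retransmission loop can be sketched as a toy simulation (my own illustration, not how any real transport is written - the `lost_events` set simply stands in for the unreliable network, naming which wire transmissions get dropped):

```python
def stop_and_wait_send(n_packets, lost_events):
    """Toy sketch of stop-and-wait delivery over an unreliable link.

    `lost_events` is a set of transmission indices (counting every data
    frame and every ACK put on the wire) that get lost in transit.  The
    sender retransmits a packet until the matching ACK makes it back,
    then moves on to the next one.  Returns the wire transmission log.
    """
    event = 0                          # index of the next wire transmission
    log = []
    for seq in range(n_packets):
        while True:
            log.append(("DATA", seq))  # (re)transmit the packet
            data_lost = event in lost_events
            event += 1
            if data_lost:              # frame dropped: the sender's timeout
                continue               # fires and we resend
            log.append(("ACK", seq))   # receiver acknowledges receipt
            ack_lost = event in lost_events
            event += 1
            if not ack_lost:           # ACK arrived: this packet is done
                break
            # ACK lost: the sender times out and resends; the receiver sees
            # a duplicate sequence number and simply re-ACKs it.
    return log
```

With no losses at all, sending 3 packets still puts 6 transmissions on the wire - one ACK for every data frame.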

So the data flow is (more-or-less):

Sender                        Receiver
Send Packet A
                              Acknowledge "A"
Send Packet B
                              Acknowledge "B"
Send Packet C
                              Acknowledge "C"

You'll notice that to send 3 packets of user data on the wire, you end up sending 6 packets!  Remember from my comment yesterday: It takes effectively the exact same amount of time to send 1 byte of data as it does 1400 bytes.  So even though the ACK packet is very small, it means that you spend twice the time sending a packet - the data and the ACK.

Now then, what happens when you want to send more data than fits in a single packet?  1400ish bytes isn't very much; protocols often send far more data than that in a single send() call, and the transport needs to deal with that.  Well, the sender breaks the request into frames and sends them in turn:

Sender                        Receiver
Send Packet A, Frame 1
                              Acknowledge "A.1"
Send Packet A, Frame 2
                              Acknowledge "A.2"
Send Packet A, Frame 3
                              Acknowledge "A.3"

There is an interesting consequence to that last guarantee:  It turns out that the sender can't start transmitting B until after it knows that the receiver has received A.  Why is that?

Well, remember our third axiom?  Its second part says that a connection oriented transport guarantees data ordering.  This directly means that the sender can't start transmitting B until it knows A has been received.  If that wasn't the case (the sender started transmitting B before A was acknowledged), then the receiver might receive B before A, which would violate the 3rd axiom.  Now it's possible that the receiver might hold onto B and not hand it to the application until it has received A.  But that means that some of the receiver's data buffers will be tied up holding the data for B.  If A never arrives, you've just wasted those buffers.  It's easier to simply never transmit B before A's been acknowledged, and that way you guarantee ordering.
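The buffering alternative in that paragraph can be sketched as a toy out-of-order receiver (again my own illustration): data is handed to the application only once every earlier packet has arrived, and anything stuck waiting for a gap to fill is memory the receiver can't reclaim.

```python
def in_order_receiver(arrivals):
    """Toy sketch of a receiver that buffers out-of-order packets.

    `arrivals` is a list of (seq, data) pairs in whatever order the
    network delivered them.  Returns (delivered, held): the data handed
    to the application in order, and whatever is still tied up in the
    receiver's buffers waiting for a gap to fill.
    """
    next_seq = 0
    held = {}                  # out-of-order data waiting for the gap
    delivered = []
    for seq, data in arrivals:
        held[seq] = data
        while next_seq in held:            # a contiguous run is now ready
            delivered.append(held.pop(next_seq))
            next_seq += 1
    return delivered, held
```

If packet A (sequence 0) never shows up, everything after it sits in `held` forever - which is exactly the resource waste the article describes.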

 

Ok, so far so good, that's the basics of data transmission.  But why do you have to acknowledge each packet within the larger transmission?   Surely there's a better way.

And there is, it's called a sliding window algorithm.  With a sliding window algorithm, the transmission looks like:

Sender                        Receiver
Send Packet A, Frame 1
Send Packet A, Frame 2
Send Packet A, Frame 3
                              Acknowledge "A.1, A.2, and A.3"

Now the actual algorithm for the sliding window is way beyond the scope of this already too long article, but you get the idea - the receiver waits for a couple of packets to be received and acknowledges all of them.
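A loss-free exchange under such a scheme might be sketched like this (a toy model of my own; a real sliding window also handles losses, timeouts, and window updates, none of which appear here):

```python
def sliding_window_send(n_frames, window=3):
    """Toy sketch of a sliding-window exchange on a loss-free link.

    The sender keeps up to `window` frames in flight, and the receiver
    answers each burst with a single cumulative ACK instead of ACKing
    frame by frame.  Returns the wire transmission log.
    """
    log = []
    sent = 0
    while sent < n_frames:
        burst = min(window, n_frames - sent)
        for i in range(burst):             # fill the window
            log.append(("DATA", sent + i))
        sent += burst
        log.append(("ACK", sent - 1))      # one ACK covers the whole burst
    return log
```

Three frames now cost 4 transmissions instead of 6 - the saving that makes the sliding window worthwhile.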

Again, this is remarkably simplified; if you care about the details, consult your nearest protocol manual.

Tomorrow: Ok, that's cool, now we know how to send data.  But what good is that?

Edit: Rearranged some text because I accidentally left a really important thought hanging.

  • There's a minor typo:

    "which is exactly the same as if the receiver hadn't received the ack in the first place"

    s/ack/data/
  • Personally, the single most enlightening thing I've ever heard about networking and congestion control is Raymond's "the pipe (in bits) is bandwidth (in bits/second) multiplied by latency (in seconds); you want your window to be as large as your pipe". Not even Stevens could have put it more clearly and concisely
  • Larry,

    "There is an interesting consequence to that last guarantee:  It turns out that the sender can't start transmitting B until after it knows that the receiver has received A.  Why is that?"

    Probably a minor mix up. This makes no sense until you read the following paragraph.


  • Apologies, I should learn how to read properly or work on my attention span.
  • "Personally, the single most enlightening thing I've ever heard about networking and congestion control is Raymond's "the pipe (in bits) is bandwidth (in bits/second) multiplied by latency (in seconds); you want your window to be as large as your pipe". Not even Stevens could have put it more clearly and concisely"

    It's also known as the bandwidth-delay product; refer to
    http://www.speedguide.net/bdp.php for a simple BDP calculator.
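For anyone who wants to plug in numbers, the rule of thumb works out like this (illustrative figures of my own, not from the post):

```python
def bandwidth_delay_product_bits(bandwidth_bps, rtt_seconds):
    """The 'pipe size' from the rule of thumb above: bandwidth (bits per
    second) multiplied by round-trip latency (seconds) gives the number
    of bits that can be in flight at once; the send window should be at
    least that large to keep the link busy."""
    return bandwidth_bps * rtt_seconds

# Illustrative numbers: a 100 Mbit/s link with a 50 ms round trip.
pipe_bits = bandwidth_delay_product_bits(100e6, 0.050)  # about 5,000,000 bits
window_bytes = pipe_bits / 8                            # about 625,000 bytes
```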

  • "Remember from my comment yesterday: It takes effectively the exact same amount of time to send 1 byte of data as it does 1400 bytes."

    Just a note there for anybody reading that... that's true, as you said, on an ethernet network. Not so on the Internet. In other words, we're talking about (as you rightly titled this post) making stuff go fast on "a" network, not a bunch of networks connecting two endpoints.

  • Another minor typo:

    "on Ethernet the maximum frame size (or MSS, Maximum Segment Size) is 1500 bytes"

    This is the Maximum Transmission Unit (MTU).
  • Hi Larry.  I think you should perhaps expand or write another entry explaining why SMB performance over a WAN is so degraded, and maybe you could provide solutions/hacks on how to fix it.  Thx.

    -jc
  • "If the sender sends three packets in the order A-B-C"

    You should probably say messages instead of packets.  For TCP at least the sender has little if any control over how the message gets broken up into packets.

    It seems like a minor nit but network newbies have immense trouble understanding this point and write lots of buggy networking code because of it.
  • nick: Really? Ok, we can have other fragmentation limits, but the latency should still stay much higher than the actual transmission time for the packet. (Of course, if everyone decides to inflate their bandwidth requirement by 139900 %, performance will suffer for all.)
  • cnettel:

    Yup, was talking about fragmentation limits...what happens when you hit a network which can't handle large packets?
    [ refer http://en.wikipedia.org/wiki/Maximum_segment_size ]

    Fragmentation occurs, and while that doesn't affect TCP delay since you aren't waiting for acks, the time taken to send the data now depends on the kind of network being transited. So the guarantee mentioned (1 byte vs 1400 bytes taking the same amount of time) is no longer absolutely valid.

  • "It turns out that the sender can't start transmitting B until after it knows that the receiver has received A."

    I'm not sure I believe this. Surely the axiom applies only to the interface to the connection oriented protocol? That is, the protocol guarantees to the layer above that it (the layer above) will receive the "message" in the correct order - but whether the connection protocol /itself/, internally, sends and receives everything in order or not is irrelevant.
  • Mu,
     That's right, but remember my comment about buffering - the stack *could* send out of order and complete out of order IF AND ONLY IF it could guarantee that the receiver had resources to buffer up packet B until packet A is received.  Packet A might never be received, in which case the stack would have wasted the resources.

     That means you have an easy way of mounting a denial of service attack against the server.  Send A.1, B.1-B.20, C.1-C.20, ... Z.1-Z.20.  The server would have to keep all those packets in memory waiting on A.2 to arrive.

     All the connection oriented protocols I know of solve the out-of-order arrival by simply waiting on the ack of A before B is sent.

    It turns out that this ordering behavior is critical, and I'll be talking about this in the 3rd article in this series (I got nailed with a horrible cold so haven't written anything after this post).
  • Ok, I sit corrected. I'd still change your "can't" to a "shouldn't" or "must not" though :)