May, 2006

Larry Osterman's WebLog

Confessions of an Old Fogey
  • Larry Osterman's WebLog

    Making stuff go fast on a network, part 1

    • 20 Comments
    So yesterday, I commented on how I was glossing over lots of details about how to make a client/server protocol operate efficiently on a network.  After I wrote it, I realized that while the topic is out of scope for a single post, it could work as a series.

    So what does it mean to make a program "go fast" on a network?  Well, as I intimated the last time, it's "complicated".  To explain why, I've got to explain how traffic flows on a network.  First off, a caveat: The last time I did any serious work in network performance analysis, it was on 10base-T networks, and the performance characteristics of gigabit networks are likely to be somewhat different.  On the other hand, as far as I know, all the principles I'm thinking about still apply.

     

    Let's start with some networking basics.  For the purposes of this discussion, I'm going to limit my discussion to connection oriented protocols (like TCP, SPX or NetBEUI).   I'm not going to discuss datagram based protocols (like IP, UDP and IPX), because (in general) datagram based protocols don't have any of the "interesting" semantics I'm describing.

    First, some definitions:

    1. When you're transferring data on a connection oriented protocol, there are two principals involved, the sender and the receiver (I'm not going to use the words "client" and "server" because they imply a set of semantics associated with a higher level protocol).
    2. A "Frame" is a unit of data sent on the wire. 
    3. A "Packet" is comprised of one or more frames of data depending on the size of the data being sent.
    4. A "Message" is comprised of one or more packets of data and typically corresponds to a send() call on the sender.

    And some axioms:

    1. Networks are unreliable.  The connection oriented protocols attempt to represent this unreliable network as a reliable communication channel, but there are some "interesting" semantics that arise from this.
    2. LAN Networks (Token Ring, Ethernet, etc) all transmit messages in "frames"; on Ethernet the maximum frame payload (the MTU, or Maximum Transmission Unit) is 1500 bytes.  Some of that is used to hold protocol overhead (around 50 bytes or so), so in general you've got about 1400 bytes of user payload available in each frame (the numbers vary depending on the protocol and the networking options used, but 1400 is a reasonable number).
    3. A connection oriented protocol provides certain guarantees:
      1. Reliable delivery - A sender can guarantee one of two things occurred when sending data - the receiver received the data being sent or an error has occurred.
      2. Data ordering - If the sender sends three packets in the order A-B-C, the receiver needs to receive the packets in the order A-B-C.

    Now then, given axiom #1 (Networks are unreliable), how do you ensure reliable communication?

    Well, the answer to that question really depends on the answer to a different question: "How does the sender know that the receiver has received the data?"  Remember that the underlying network is unreliable - there are no guarantees that the data will be received (the network only promises to do its best to deliver it).  To ensure that the receiver has received the data, the receiver tells the sender that it's received the data with a special packet called an ACK (for acknowledgment).

    If a sender doesn't receive an ACK within a timeout period, it retransmits the data.  It's OK to lose an ACK - the cost of losing an ACK is a retransmission of the data, which is exactly what happens when the receiver doesn't receive the data in the first place.  The receiver knows that the sender has received the ACK because the sender stops retransmitting the packet (each packet sent by the sender has a tag - a sequence number or index of some form - that allows the receiver to identify which packet is being sent).

    So the data flow is (more-or-less):

    Sender                          Receiver
    Send Packet A
                                    Acknowledge "A"
    Send Packet B
                                    Acknowledge "B"
    Send Packet C
                                    Acknowledge "C"

    You'll notice that to send 3 packets of user data on the wire, you end up sending 6 packets!  Remember from my comment yesterday: it takes effectively the same amount of time to send 1 byte of data as it does to send 1400 bytes.  So even though the ACK packet is very small, you spend twice the wire time for every packet of data - once for the data and once for the ACK.
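    To make the retransmission logic concrete, here's a minimal sketch of that stop-and-wait exchange, simulated in C.  The 30% drop rate and the single-process "network" are assumptions for illustration, not any real transport's code:

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define DROP_RATE 30 /* percent of transmissions the "network" loses */

        /* Returns 1 if the unreliable network delivered the transmission, 0 if lost. */
        static int network_delivers(void)
        {
            return (rand() % 100) >= DROP_RATE;
        }

        int main(void)
        {
            int seq;

            srand((unsigned)time(NULL));

            for (seq = 0; seq < 3; seq++) {
                for (;;) {
                    printf("Sender:   send packet %d\n", seq);
                    if (!network_delivers()) {
                        printf("          (packet %d lost - timeout, retransmit)\n", seq);
                        continue;  /* the retransmission timer fires */
                    }
                    printf("Receiver: got packet %d, send ACK %d\n", seq, seq);
                    if (!network_delivers()) {
                        printf("          (ACK %d lost - timeout, retransmit)\n", seq);
                        continue;  /* sender can't tell data loss from ACK loss */
                    }
                    break;  /* ACK arrived; move on to the next packet */
                }
            }
            return 0;
        }

    Note that when an ACK is lost, the receiver sees the same packet twice; that's one of the things the sequence number is for - it lets the receiver recognize and discard the duplicate.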

    Now then, what happens when you want to send more data than fits in a single packet?  1400ish bytes isn't very much; protocols often send far more data than that in a single send() call, and the transport needs to deal with that.  Well, the sender breaks the request into frames and sends them in turn:

    Sender                          Receiver
    Send Packet A, Frame 1
                                    Acknowledge "A.1"
    Send Packet A, Frame 2
                                    Acknowledge "A.2"
    Send Packet A, Frame 3
                                    Acknowledge "A.3"

    There is an interesting consequence to that last guarantee:  It turns out that the sender can't start transmitting B until after it knows that the receiver has received A.  Why is that?

    Well, remember our third axiom?  Part B says that a connection oriented transport guarantees data ordering.  This directly means that the sender can't start transmitting B until it knows A has been received.  If that wasn't the case (if the sender started transmitting B before A was acknowledged), then the receiver might receive B before A, which would violate the 3rd axiom.  Now it's possible that the receiver might hold onto B and not hand it to the application until it has received A.  But that means that some of the receiver's data buffers will be tied up holding B's data, and if A never arrives, that memory has been wasted.  It's easier to simply never transmit B before A's been acknowledged; that way you guarantee ordering.

     

    Ok, so far so good, that's the basics of data transmission.  But why do you have to acknowledge each packet within the larger transmission?   Surely there's a better way.

    And there is: it's called a sliding window algorithm.  With a sliding window algorithm, the transmission looks like:

    Sender                          Receiver
    Send Packet A, Frame 1
    Send Packet A, Frame 2
    Send Packet A, Frame 3
                                    Acknowledge "A.1, A.2, and A.3"

    Now the actual algorithm for the sliding window is way beyond the scope of this already too long article, but you get the idea - the receiver waits for a couple of frames to arrive and acknowledges all of them at once.
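    Still, to give a feel for it, here's a minimal sketch of a sliding window sender simulated in C.  The window size of 3, the frame count, and the lossless channel are all assumptions for illustration - a real implementation also handles loss, retransmission, and window updates:

        #include <stdio.h>

        #define WINDOW 3  /* maximum unacknowledged frames in flight */
        #define FRAMES 7  /* total frames in the message */

        int main(void)
        {
            int next = 0;   /* next frame to transmit */
            int acked = 0;  /* frames the receiver has acknowledged so far */

            while (acked < FRAMES) {
                /* Send until the window fills up or the message is finished. */
                while (next < FRAMES && next - acked < WINDOW) {
                    printf("Sender:   send frame A.%d\n", next + 1);
                    next++;
                }
                /* One cumulative ACK covers everything received so far. */
                acked = next;
                printf("Receiver: ACK frames through A.%d\n", acked);
            }
            return 0;
        }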

    Again, this is remarkably simplified, if you care about the details, consult your nearest protocol manual.

    Tomorrow: Ok, that's cool, now we know how to send data.  But what good is that?

    Edit: Rearranged some text because I accidentally left a really important thought hanging.

  • Larry Osterman's WebLog

    Making things go fast on a network, part 2

    • 13 Comments

    Sorry about the delay - I got nailed by a nasty sinus infection on Tuesday night last week that took me out until today.

     

    In my last post I started discussing some of the aspects of networking that need to be understood before you can make things "go fast" on a network.

    As a quick recap, here are the definitions and axioms from that post (note: I've changed the definitions of packet and message because they make the subsequent articles easier):

    First, some definitions:

    1. When you're transferring data on a connection oriented protocol, there are two principals involved, the sender and the receiver (I'm not going to use the words "client" and "server" because they imply a set of semantics associated with a higher level protocol).
    2. A "Frame" is a unit of data sent on the wire. 
    3. A "Packet" is comprised of one or more frames of data depending on the size of the data being sent; it typically corresponds to a send() call on the sender.
    4. A "Message" is comprised of one or more packets of data and typically corresponds to a higher level protocol verb or response.

    And some axioms:

    1. Networks are unreliable.  The connection oriented protocols attempt to represent this unreliable network as a reliable communication channel, but there are some "interesting" semantics that arise from this.
    2. LAN Networks (Token Ring, Ethernet, etc) all transmit messages in "frames"; on Ethernet the maximum frame payload (the MTU, or Maximum Transmission Unit) is 1500 bytes.  Some of that is used to hold protocol overhead (around 50 bytes or so), so in general you've got about 1400 bytes of user payload available in each frame (the numbers vary depending on the protocol and the networking options used, but 1400 is a reasonable number).
    3. A connection oriented protocol provides certain guarantees:
      1. Reliable delivery - A sender can guarantee one of two things occurred when sending data - the receiver received the data being sent or an error has occurred.
      2. Data ordering - If the sender sends three packets in the order A-B-C, the receiver needs to receive the packets in the order A-B-C.

    At the end of the last post, I introduced one consequence of these axioms:  When sending packets A, B, and C, the sender can't transmit packet B until the receiver has acknowledged receipt of packet A.  This was because of axiom 3.b.

    There's one thing I forgot in my last post:

    What happens when the receiver isn't ready to receive data from the sender?

    Well, it's not very pretty, and the answer depends on the semantics of the protocol, but in general, if the receiver doesn't have room for the packet, it sends a "NACK" to the sender (NACK stands for Negative ACKnowledgement).  A NACK tells the sender that there's no storage for the request, and the sender now needs to decide what to do.  Sometimes the NACK contains a hint as to the reason for the failure; for instance, the NetBEUI protocol's NACK response includes reasons like "no remote resources", "unexpected request", and "out of sequence".  The sender can use this information to decide whether it should hang up the connection or retry (for "no remote resources", for instance, it should retry).

    Sender                          Receiver
    Send Packet A.1
                                    Send ACK A.1
    Send Packet A.2
                                    Send NACK A.2 (No Memory)
    Send Packet A.2
                                    Send NACK A.2 (No Memory)
    Send Packet A.2
                                    Send ACK A.2
    Send Packet A.3
                                    Send ACK A.3
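    Here's a minimal sketch of the sender's side of that decision.  The reason codes and the retry limit are illustrative assumptions loosely modeled on the NetBEUI reasons above, not any real transport's definitions:

        #include <stdio.h>

        enum send_result {
            DELIVERED,           /* got an ACK */
            NO_REMOTE_RESOURCES, /* NACK: transient - receiver has no buffer yet */
            PROTOCOL_ERROR       /* NACK: unexpected request, out of sequence, ... */
        };

        #define MAX_RETRIES 5

        /* Stand-in for the transport; here it always reports the transient NACK. */
        static enum send_result transmit_packet(int seq)
        {
            printf("send packet %d\n", seq);
            return NO_REMOTE_RESOURCES;
        }

        /* Returns 0 on success, -1 if the connection should be torn down. */
        static int send_with_retry(int seq)
        {
            int attempt;

            for (attempt = 0; attempt < MAX_RETRIES; attempt++) {
                switch (transmit_packet(seq)) {
                case DELIVERED:
                    return 0;
                case NO_REMOTE_RESOURCES:
                    printf("NACK (no remote resources) - retrying\n");
                    break;  /* transient; send the same packet again */
                default:
                    printf("NACK (protocol error) - hanging up\n");
                    return -1;
                }
            }
            printf("too many retries - hanging up\n");
            return -1;
        }

        int main(void)
        {
            return send_with_retry(1) == 0 ? 0 : 1;
        }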

    All of this retransmission goes on under the covers; applications don't typically need to know about it.  But there's a potential perf pitfall here.

     

    If you're analyzing network traces, you often see this pattern:

    Sender                          Receiver
    Send Packet A.1
                                    Send NACK A.1 (No Memory)
    Send Packet A.1
                                    Send NACK A.1 (No Memory)
    Send Packet A.1
                                    Send NACK A.1 (No Memory)
    Send Packet A.2
                                    Send ACK A.2
    Send Packet A.3
                                    Send ACK A.3

    What happened here?  Well, most likely the receiver didn't have a receive buffer down waiting for the sender's data, so the sender had to keep retransmitting until the receiver got around to being able to receive it.

    So here's "Making things go fast on a network" perf rule number 1:

            Always make sure that you have a receive request down BEFORE someone tries to send you data.

    In traditional client/server networking, this rule applies to clients as well as servers - a client that doesn't have a receive outstanding when the server sends the response to a request will stall in exactly the same way, waiting for the client to get its receive down.
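    Here's a minimal sketch of the pattern using POSIX socketpair() and a thread: the receive is posted (the receiver thread is started and blocks in recv) before the peer sends.  On Windows servers the same idea is typically an overlapped WSARecv posted up front; this portable simulation and its made-up request string are just for illustration:

        #include <stdio.h>
        #include <pthread.h>
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <unistd.h>

        static void *receiver(void *arg)
        {
            int sock = *(int *)arg;
            char buffer[1400];  /* the buffer is waiting before any data arrives */
            ssize_t got = recv(sock, buffer, sizeof(buffer), 0);

            if (got > 0)
                printf("receiver: got %zd bytes: %.*s\n", got, (int)got, buffer);
            return NULL;
        }

        int main(void)
        {
            int socks[2];
            pthread_t thread;
            const char request[] = "GROUP comp.lang.c";  /* made-up NNTP-ish verb */

            if (socketpair(AF_UNIX, SOCK_STREAM, 0, socks) < 0)
                return 1;

            /* Perf rule 1: get the receive down FIRST... */
            pthread_create(&thread, NULL, receiver, &socks[1]);

            /* ...and only then let the peer send; the data lands in a waiting
               buffer instead of stalling while a receive gets posted. */
            send(socks[0], request, sizeof(request) - 1, 0);

            pthread_join(thread, NULL);
            close(socks[0]);
            close(socks[1]);
            return 0;
        }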

     

    Btw, a piece of silly trivia: J Allard, of Xbox fame, used to have two iguanas in his office named ACK and NACK, back when he was the PM for NT networking.

     

  • Larry Osterman's WebLog

    Riffing on Raymond - Network performance...

    • 12 Comments

    I keep on doing this, clearly it's evidence of a lack of imagination on my part...

    Raymond's post a while ago discussed some of the problems with network latency (no, I'm not going to touch that particular can of worms).

    It's amazing how many people don't understand how big a deal this problem is.  When I joined the Exchange team back in the mid 1990s, the perf team was spending a HUGE amount of time analyzing the Exchange store RPC traces trying to figure out ways of squeezing out every single byte from the RPC traffic.

    They'd defined compressed forms of Exchange EntryIDs, and they were considering encoding Unicode strings using some neutral encoding (UTF8 hadn't been invented at that point, so they were trying to roll their own).

    I came on the team and looked at what they were doing and was astounded.  They were sweating bricks trying to figure out how to squeeze out individual bytes of data from each packet.

    The reality was that for the vast majority of cases, all that work didn't actually make a difference.

    The reason has to do with the basic nature of Ethernet based networking (token ring and ATM have different characteristics, but my comments here apply to them as well; it's just that the numbers and behavior characteristics are slightly different).

    In general for all LAN networks, it takes essentially the same time to send one byte of data as it does to send 1K of data.  When you start sending more than 1K of data, then the numbers will start to grow (because you're sending more than one packet), but even then, the overhead of sending 10K of data isn't significantly higher than sending 1K.

    On the other hand, round trips will KILL your performance.  So if you've got a choice between sending 100 messages with 1K in each message and 1 message with a 100K payload, you want to send the one 100K message every time.

    Needless to say, I'm MASSIVELY glossing over the issues associated with sending data across a network, the above is simply a reasonable rule of thumb - the round trips are what matters, not the bytes being sent.
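    To put rough numbers behind that rule of thumb, here's a back-of-the-envelope calculation in C.  The 1 millisecond round trip and 100 megabit bandwidth figures are made-up illustrative values, not measurements of any particular network:

        #include <stdio.h>

        #define ROUND_TRIP_MS 1.0   /* assumed LAN round-trip latency */
        #define KB_PER_MS     12.5  /* ~100 megabit: about 12.5 KB per millisecond */

        /* Total time to move `kbytes` of payload using `round_trips` exchanges. */
        static double transfer_ms(double kbytes, int round_trips)
        {
            return round_trips * ROUND_TRIP_MS + kbytes / KB_PER_MS;
        }

        int main(void)
        {
            printf("100 x 1K messages: %5.1f ms\n", transfer_ms(100.0, 100));
            printf("1 x 100K message:  %5.1f ms\n", transfer_ms(100.0, 1));
            return 0;
        }

    With those assumptions, the 100-message version spends nearly all of its time waiting out round trips rather than moving bytes: 108 milliseconds versus 9.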

     

    Now, having said all that, when you're dealing with dial-up networks, the rules are completely different.  On a 9600 baud connection, it takes one millisecond to send one byte, which means that every single byte counts.  In the Exchange case, since Exchange was designed for corporations with wired networks, it made sense to design the client/server protocol for the LAN environment.  But when you're designing a feature that's intended to be used over dialup, the rules are totally different.  Among other things to consider, on a dial up network, the modems themselves do compression, so compressing the data before transmission isn't always a benefit (compressing already compressed data tends to increase the size of the compressed data (assuming the compression algorithm's worth its salt)).

  • Larry Osterman's WebLog

    Why are computer games like restaurants?

    • 12 Comments
    I was chatting with one of the devs in my group a couple of minutes ago, and we got to talking about the economics of movie computer-game tie-ins.

    As we were chatting, I had this sudden realization.  The economics of a computer video game are almost completely identical to those of a new restaurant.

    It's not as far-fetched an analogy as you might think.

    The vast majority of computer games published aren't particularly profitable.  For every Halo, there are a dozen Universal Studios Theme Park Adventures (no offense to the people who wrote that game; I came upon it with a search for "worst video game").  For every Olive Garden, there are a dozen mom and pop restaurants (and major chains) that fail.

    There are a ton of similarities between video games and restaurants.  For instance, consider just some of the factors that lead to the success of a game or a restaurant:

    Computer Game                                         Restaurant
    Game-Play (quality of play)                           Chef (quality of food)
    Shelf-Space in retailers (more prominent is better)   Location (more prominent is better)
    Reviews                                               Reviews
    Story                                                 Story (yes, restaurants tell stories)
    Celebrity Game Designer (think Sid Meier)             Celebrity Chef (think Emeril or Bobby Flay)

    One item in that table above that might puzzle people is "Story".  As I mentioned, restaurants tell stories - it's particularly true for chain restaurants like Claim Jumper, Morton's, or The Olive Garden.  The story of a restaurant is told via the decor, the exterior decoration, the uniforms of the wait staff - all the things that make up your experience in the restaurant.

    Other intangibles that play into the analogy: With very few exceptions (Half-Life, Halo, etc), computer games make most of their money in the first couple of months they are sold.  Similarly, most restaurants will fail in the first couple of years of their existence.

    The other aspect of this is money.  Both restaurants and computer games are optional expenses - they come from discretionary spending, which means that when times are lean sales go down.

    The more I think about the analogy, the more I like it.

    I don't have a ton of facts to back this analogy, but I think it works. 

    ETA: discussion of money and fix the table width.

  • Larry Osterman's WebLog

    Riffing on Raymond - When caches hurt

    • 12 Comments

    On Tuesday, Raymond posted "A cache with a bad policy is another name for a memory leak".

    Of course that immediately reminded me of a war story from back in the days of Exchange 5.0.  One of the major features of Exchange 5.0 was an NNTP server. 

    NNTP (and POP3) are rather interesting protocols.  In general, clients don't tend to stay connected to the server for a long time; instead they tend to connect, authenticate, perform a couple of operations, and terminate the connection.  Many protocols have similar usage patterns (HTTP comes to mind immediately).

    Of course, when you're building a server, you need to do performance analysis to determine how fast the server can go (or, in the case of an email/news server, how many clients it can support).  For the purposes of performance analysis, we measure requests in terms of "transactions".  For the POP3 or NNTP server, each of these connect/authenticate/operations/terminate sequences is considered a "transaction".  By measuring the amount of time it takes for each of these transactions to execute, you can easily determine the scalability of your server.  If each transaction takes <n> milliseconds, then you can only perform 1000/<n> transactions per second.  Obviously if you can perform <m> transactions at the same time, then the number of transactions per second becomes (1000/<n>)*<m>.  But <m> is typically a relatively small number (if you're CPU bound, then <m> is typically the number of CPUs on your machine, for instance).  Calculating <m> can be quite complicated, and is way out of scope for this post.

    Obviously if you increase <m> or decrease <n>, you'll improve your performance, and when doing perf analysis, you do both.  When we started breaking down <n>, we realized that there were a couple of huge items that were affecting it.  The first cost was authentication; the other was reading information about the newsgroup from the Exchange DS (that was required during the "operations" step).  The combined cost of these two items was about 500 milliseconds of the 510 millisecond total cost per transaction.
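    To put numbers on that arithmetic, here's a quick calculation in C.  The 510 and 500 millisecond figures come from the measurements above; the concurrency value <m> = 4 is a made-up number for illustration:

        #include <stdio.h>

        int main(void)
        {
            double n = 510.0;  /* milliseconds per transaction, measured above */
            double m = 4.0;    /* concurrent transactions - an illustrative guess */

            printf("before caching: %7.2f transactions/sec\n", (1000.0 / n) * m);

            n -= 500.0;  /* the caches remove the auth + DS lookup cost */
            printf("after caching:  %7.2f transactions/sec\n", (1000.0 / n) * m);
            return 0;
        }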

    There were clear benefits here, so I built two separate caches, one for the directory lookups and the other for the authentications.  Both worked flawlessly, and our performance jumped dramatically.

    Things went great until we started getting client stress tests run against the NNTP server.  All of a sudden, our testers started reporting that the NNTP server was leaking memory like crazy.  It was consuming hundreds of megabytes (on a machine with 64M of physical memory), and the machine was essentially unusable.

    Not a problem - the Exchange store had extensive memory leak tracking logic, so we just stopped the server to let the leak detection logic do its stuff and tell us where the leak was.

    Of course you know what happened: The leak detection logic reported that there were no leaks.  So, being the typical arrogant developer that I am, I went back to the testers and said "There are no leaks, you must be dreaming".  Needless to say, that didn't go over too well, so I dug deeper.

    I hooked a debugger up to the process and started poking randomly at allocated memory, and discovered that the DS cache was huge.  I dug a bit deeper and discovered that the problem was buried deep in the DS cache logic.  It turns out that there was an uninitialized variable in the hash function that I was using in my cache, which caused all the entries in the DS cache to have their own buckets.
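    Here's a sketch of that class of bug (NOT the actual Exchange code - the hash function and the key are made up).  Because the accumulator is never initialized, the same key can land in a different bucket on each call, so lookups miss and the cache inserts a fresh, still-reachable duplicate every time - unbounded growth that leak detection can't see:

        #include <stdio.h>

        #define BUCKET_COUNT 127

        /* The bug: "hash" starts as whatever garbage is on the stack, so the
           same key doesn't reliably hash to the same bucket. */
        static unsigned hash_key(const char *key)
        {
            unsigned hash;  /* should be: unsigned hash = 0; */
            while (*key)
                hash = hash * 31 + (unsigned char)*key++;
            return hash % BUCKET_COUNT;
        }

        /* Call the hash through a deeper stack frame so the garbage differs. */
        static unsigned hash_via_deeper_frame(const char *key)
        {
            volatile unsigned filler[16];
            unsigned i;
            for (i = 0; i < 16; i++)
                filler[i] = i * 0x9e3779b9u;
            return hash_key(key);
        }

        int main(void)
        {
            /* With the bug, these two bucket numbers will usually disagree, so a
               lookup after an insert misses and the cache adds a duplicate. */
            printf("bucket: %u\n", hash_key("comp.lang.c"));
            printf("bucket: %u\n", hash_via_deeper_frame("comp.lang.c"));
            return 0;
        }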

    So yes, a cache with a bad policy (or a bad hash function) is indistinguishable from a memory leak.

     

  • Larry Osterman's WebLog

    "Pippin"at the Fifth

    • 7 Comments

    I went to see the 5th Avenue Theater's production of "Pippin" last night.

    Pretty good production, wonderful minimalist sets and the typical great 5th Avenue talent (Keith Byron Kirk as the Leading Player was especially good).

    Pippin's an adult show, with adult themes (though not as adult as Hair).  I wouldn't take small kids to it, but I loved it (Pippin's one of my all time favorite shows; I saw it in the 1970s at Proctor's Theater in Schenectady with Ben Vereen and William Katt (the same cast as is on the video)).

    I utterly loved "War is a Science"; the image of generals dancing in office chairs (including Condoleezza Rice) was perfect.

     

    In this production, they made one change to the ending that totally blew me away.

    SPOILER WARNING (for a 30 year old show):

    Normally at the end of Pippin, Catherine turns to Pippin and asks "How do you feel?", and Pippin responds "Trapped but happy".

    In this production, Theo (Catherine's son) walks to the center of the empty stage and starts singing a reprise of "Corner of the Sky".  Upon hearing this, the Leading Player and the rest of the cast come out and surround Theo singing a chorus of "Magic to do".

     

    I was astounded.  This one change to the ending totally changed the flavor of the show, making it significantly darker - in the original ending, you feel like Pippin has compromised his belief in being extraordinary for a simpler life, but with the new ending, the Players lie in wait to undermine whoever feels their life should be extraordinary.

  • Larry Osterman's WebLog

    The blog goes dark for a while

    • 7 Comments

    I'm chaperoning my son's trip to Washington DC and NYC for the next 10 days, so I'm not going to have any time to blog.  I'll see what I can do about getting internet access to keep up with the moderation, but no guarantees. 

     

    I'll be back after Memorial Day to finish up the "how to make stuff go fast" series.

     

     

  • Larry Osterman's WebLog

    What's up with Larry not blogging?

    • 5 Comments

    <me>Must make time to blog</me>

    You know it's bad when your co-workers stop by and ask why you've not written anything on your blog in a while.

    Mea culpa.  I've been utterly swamped working on putting Vista Beta2 to bed, and I've not made time to blog (recovering from a cold hasn't helped either).  The crises never seemed to stop - mostly tiny little ones, but they added up.  As Valorie keeps on saying: You just have to sit down at the keyboard and write.

     

    And I'm an uncle for the first time!!!  On Monday, my brother Jeff and his wife Susan had their twin baby girls!  Very, very exciting :).

    But I didn't even blog that.  I promise, I'll try to be better.

     
