<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Networking, Games, and Virtual Environments : Programming</title><link>http://blogs.msdn.com/johnmil/archive/tags/Programming/default.aspx</link><description>Tags: Programming</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>'Managed Prototypes'</title><link>http://blogs.msdn.com/johnmil/archive/2007/07/27/managed-prototypes.aspx</link><pubDate>Fri, 27 Jul 2007 19:02:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4082652</guid><dc:creator>John L. Miller</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/johnmil/comments/4082652.aspx</comments><wfw:commentRss>http://blogs.msdn.com/johnmil/commentrss.aspx?PostID=4082652</wfw:commentRss><wfw:comment>http://blogs.msdn.com/johnmil/rsscomments.aspx?PostID=4082652</wfw:comment><description>&lt;P&gt;&lt;A class="" href="http://www.microsoft.com/downloads/details.aspx?FamilyID=9a927cf6-16e4-4e21-9608-77f06d2156bb&amp;amp;DisplayLang=en" mce_href="http://www.microsoft.com/downloads/details.aspx?FamilyID=9a927cf6-16e4-4e21-9608-77f06d2156bb&amp;amp;DisplayLang=en"&gt;MSCD&lt;/A&gt; has a &lt;A class="" href="http://research.microsoft.com/news/featurestories/publish/MSCD.docx.aspx?0hp=n1" mce_href="http://research.microsoft.com/news/featurestories/publish/MSCD.docx.aspx?0hp=n1"&gt;front-page story&lt;/A&gt; on &lt;A class="" href="http://research.microsoft.com/" mce_href="http://research.microsoft.com"&gt;research.microsoft.com&lt;/A&gt;. A friend of mine asked me about a quote in the article which could perhaps be misunderstood:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;“It is as much as eight times faster than our original managed prototype, and it’s great that customers will have a chance to experience the benefits for themselves.”&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The fact that our prototype was managed is orthogonal to the performance gains we've seen in our MSCD CTP. The speed-up is instead the result of algorithmic and architectural improvements that came out of our lengthy design and optimization efforts. &lt;/P&gt;
&lt;P&gt;For the record, managed code is awesome. It does run a little slower (5% - 20%) for some things - especially if you're new to it and write your code in a way that makes the system do unnecessary work - for example, appending to a string 50 times rather than using StringBuilder. Managed code also runs some things a little faster than C++, which surprised me at first. One thing that I think is incontrovertible: developing in C# and managed code is much, MUCH quicker than doing the same job in C or C++. My experience has been a factor of two or three speed-up in development for the same quality results.&lt;/P&gt;
&lt;P&gt;So, please don't misinterpret the quote in the MSCD story: we're patting ourselves on the back for our algorithmic improvement ingenuity, not dissing managed code.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=4082652" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/johnmil/archive/tags/P2P/default.aspx">P2P</category><category domain="http://blogs.msdn.com/johnmil/archive/tags/Networking/default.aspx">Networking</category><category domain="http://blogs.msdn.com/johnmil/archive/tags/Programming/default.aspx">Programming</category></item><item><title>Developing Distributed Systems</title><link>http://blogs.msdn.com/johnmil/archive/2007/01/01/developing-distributed-systems.aspx</link><pubDate>Mon, 01 Jan 2007 17:23:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1393907</guid><dc:creator>John L. Miller</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/johnmil/comments/1393907.aspx</comments><wfw:commentRss>http://blogs.msdn.com/johnmil/commentrss.aspx?PostID=1393907</wfw:commentRss><wfw:comment>http://blogs.msdn.com/johnmil/rsscomments.aspx?PostID=1393907</wfw:comment><description>&lt;P&gt;Over the last five years I've had the fortune to do research and development on several different distributed systems for Microsoft. I've learned a lot from these efforts, including some things I embarrassingly knew a decade ago and promptly forgot. Here are the three most important lessons:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Know what you're building. As a corollary, make sure you can tell when you're done.&lt;/LI&gt;
&lt;LI&gt;Establish&amp;nbsp;'correctness' criteria to let you know when things work.&lt;/LI&gt;
&lt;LI&gt;Have a way to figure out what's wrong when your system doesn't work.&lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;The first two aren't too hard as long as you think of them, but the third can be a real bear, especially for distributed system developers.&lt;/P&gt;
&lt;P&gt;Take developing &lt;A class="" href="http://en.wikipedia.org/wiki/Peer_Name_Resolution_Protocol" mce_href="http://en.wikipedia.org/wiki/Peer_Name_Resolution_Protocol"&gt;PNRP (the Peer Name Resolution Protocol)&lt;/A&gt;, for example.&amp;nbsp;This is a complicated P2P service which is a special case of a distributed hash table (DHT). To the untrained eye, PNRP appears to route messages randomly through the set of all people running the service, making it difficult to know if it's working correctly except by final transaction results. By mid-2002 the PeerNet team had PNRP implemented and running in the lab. We did a simple test out in the wild with a couple dozen participants. The moment of truth came, and we started our services. Each of us registered our email name, and then we tried to resolve&amp;nbsp;each other.&lt;/P&gt;
&lt;P&gt;To our horror, it didn't work. Even worse, we didn't have any obvious way to diagnose the problems. &lt;/P&gt;
&lt;P&gt;In the absence of custom tools, everyone ran &lt;A class="" href="http://technet2.microsoft.com/WindowsServer/en/library/ad2b59d1-0fb8-45e3-9055-a5aeba8817a91033.mspx?mfr=true" mce_href="http://technet2.microsoft.com/WindowsServer/en/library/ad2b59d1-0fb8-45e3-9055-a5aeba8817a91033.mspx?mfr=true"&gt;NetMon&lt;/A&gt; (Microsoft's network sniffer) and saved a five-minute trace each time we all brought the system up and tried something. The traces were collected and handed to a very talented network developer to examine. To his credit, after a few days of work he had at least some answers for the first five-minute session. You'll never &lt;A class="" href="http://en.wikipedia.org/wiki/Grok" mce_href="http://en.wikipedia.org/wiki/Grok"&gt;grok&lt;/A&gt; just how big an achievement this was unless you happen to be one of the handful of people who have developed P2P protocols.&lt;/P&gt;
&lt;P&gt;The next time I helped develop a P2P protocol from scratch was as part of a research project at MSRC. The researchers working on it were already familiar with the woes of debugging distributed systems, so from the very beginning we ensured we could debug any issues that arose. There are two good ways of doing this.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;U&gt;Simulation&lt;/U&gt;. This will help you catch a healthy chunk of systemic problems, but ultimately it's only as clever at finding problems as you are at manufacturing them. There's no way to get away from the need for #2.&lt;/LI&gt;
&lt;LI&gt;&lt;U&gt;Centrally coordinated logging&lt;/U&gt;. Servers are anathema to P2P, but the truth is you should never shy away from a great solution on a purely idealistic&amp;nbsp;basis. &lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;Centralized logging is easily the most important tool we had for validation and troubleshooting. It let us determine which nodes had connectivity, whether they were working the way they should, throughput and consistency numbers, and so on.&lt;/P&gt;
&lt;P&gt;If you're developing a distributed system, I highly recommend implementing a logger with the following properties:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Have a centralized log-server which receives all log data. Use a 'cloud ID'&amp;nbsp;and 'node ID' for each party logging data&amp;nbsp;to separate their activities. &lt;/LI&gt;
&lt;LI&gt;For each message you generate and transmit, log both local generation time and central server receive time. This will let you establish a global ordering of logged messages, critical for understanding the system's behavior.&lt;/LI&gt;
&lt;LI&gt;Have a (removable) logger for the P2P client which logs activities to the central server. &lt;/LI&gt;
&lt;OL&gt;
&lt;LI&gt;Logger transport should be reliable, to prevent message loss. In other words, even when everything else breaks, your logger should still be able to get all of its messages to the central log server. This can be a serious amount of work, but is well worth it.&lt;/LI&gt;
&lt;LI&gt;Logs should be machine-parsable, to enable automated analysis and processing.&lt;/LI&gt;
&lt;LI&gt;Logs should be human-readable, or translatable into human-readable form. Once you work with the logs for a while you'll get an intuition for reading them, and find problems that would have been impractical to find with automated processing.&lt;/LI&gt;&lt;/OL&gt;&lt;/OL&gt;
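&lt;P&gt;To make the list above concrete, here is a minimal Python sketch of what such a log record and central server might look like. Everything in it is an assumption for illustration, not the logger we built: the names LogRecord, LogServer, encode and ordered are hypothetical, the server is an in-memory stand-in, and the reliable transport between client and server is left out entirely. It shows the 'cloud ID' and 'node ID' fields, the two timestamps, and a one-line JSON encoding that is machine-parsable while staying readable.&lt;/P&gt;
&lt;PRE&gt;
```python
import json
import time
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class LogRecord:
    cloud_id: str       # which cloud (overlay) the node belongs to
    node_id: str        # which node produced the record
    event: str          # short machine-readable event name
    detail: str         # free-form human-readable text
    local_time: float   # stamped by the node when the record is generated
    server_time: Optional[float] = None  # stamped by the server on receipt


def encode(record: LogRecord) -> str:
    """Serialize a record as one JSON line: parsable and still readable."""
    return json.dumps(asdict(record), sort_keys=True)


class LogServer:
    """Hypothetical in-memory stand-in for the central log server."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self.records: List[LogRecord] = []

    def receive(self, line: str) -> None:
        record = LogRecord(**json.loads(line))
        record.server_time = self._clock()  # central receive time
        self.records.append(record)

    def ordered(self, cloud_id: str) -> List[LogRecord]:
        """Records for one cloud, ordered by central receive time."""
        mine = [r for r in self.records if r.cloud_id == cloud_id]
        return sorted(mine, key=lambda r: (r.server_time, r.local_time))
```
&lt;/PRE&gt;
&lt;P&gt;Sorting on the server's receive time gives every record a position on a single clock, which is what makes a global ordering possible even when the nodes' own clocks disagree. The local generation time is kept alongside it so you can see how long each record took to arrive.&lt;/P&gt;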
&lt;P&gt;Figuring out the right granularity of messages to log can be tricky. Here's what was useful for us:&lt;/P&gt;
&lt;P&gt;Log messages for each of the following:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Client startup, client shut down&lt;/LI&gt;
&lt;LI&gt;Connection data: connection initiated, succeeded / failed, remotely initiated connection received, connection torn down. You'll be amazed how many problems boil down to the right peers not communicating with each other.&lt;/LI&gt;
&lt;LI&gt;Periodic snapshots of activity. For example, every 60 seconds log bytes sent / received, connections received / rejected / initiated in that 60-second interval.&lt;/LI&gt;
&lt;LI&gt;Significant events, or summaries of significant events. If you're doing file sharing, a significant event might be transferring a file block to a remote party. If you're doing name resolution, it could be receiving and retransmitting a name resolution request. &lt;/LI&gt;&lt;/OL&gt;
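&lt;P&gt;The periodic snapshot in item 3 can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name ActivitySnapshot, the counter names, and the shape of the snapshot dictionary are all hypothetical, and the caller is assumed to pass in the current time and to forward any returned snapshot to the logger.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter
from typing import Dict, Optional


class ActivitySnapshot:
    """Accumulates named counters and emits them once per interval."""

    def __init__(self, start: float, interval_seconds: float = 60.0):
        self.interval = interval_seconds
        self._window_start = start
        self._counts = Counter()

    def count(self, name: str, amount: int = 1) -> None:
        """Record activity, e.g. count("bytes_sent", 1200)."""
        self._counts[name] += amount

    def maybe_flush(self, now: float) -> Optional[Dict]:
        """Return a snapshot if the interval has elapsed, else None."""
        if now - self._window_start >= self.interval:
            snapshot = {
                "event": "snapshot",
                "window_start": self._window_start,
                "window_end": now,
                "counts": dict(self._counts),
            }
            # Start a fresh window so each snapshot covers one interval only.
            self._counts.clear()
            self._window_start = now
            return snapshot
        return None
```
&lt;/PRE&gt;
&lt;P&gt;Resetting the counters on every flush keeps each snapshot self-contained, so a lost or delayed snapshot does not distort the ones that follow it.&lt;/P&gt;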
&lt;P&gt;There's much more you can log, but even these essentials will go a long way toward helping you understand your system's behavior and validating it.&lt;/P&gt;
&lt;P&gt;Have you developed a distributed system? If so, did you use logging? Or, if you're just getting ready to start developing, are you planning on logging? I'll be very interested to hear...&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1393907" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/johnmil/archive/tags/P2P/default.aspx">P2P</category><category domain="http://blogs.msdn.com/johnmil/archive/tags/Programming/default.aspx">Programming</category></item><item><title>Threading models for network services</title><link>http://blogs.msdn.com/johnmil/archive/2006/11/23/threading-models.aspx</link><pubDate>Fri, 24 Nov 2006 02:08:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1131622</guid><dc:creator>John L. Miller</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/johnmil/comments/1131622.aspx</comments><wfw:commentRss>http://blogs.msdn.com/johnmil/commentrss.aspx?PostID=1131622</wfw:commentRss><wfw:comment>http://blogs.msdn.com/johnmil/rsscomments.aspx?PostID=1131622</wfw:comment><description>&lt;P&gt;One of the first steps in writing a multi-layered network service is determining a threading model. Common wisdom for a performant network service is that the socket layer, at the very least, should use some form of overlapped IO, such as async winsock calls or IO completion ports. &lt;/P&gt;
&lt;P&gt;I do a lot of interviewing, and in the course of talking to people about code they've written, I'm always astonished at the variety of approaches to solving the same problem. For example, there are people whose idea of asynchronous programming for a network service is spinning up a dedicated thread for each active connection. This is fine, to a point, but if you expect to have thousands of clients, it's probably not the best way to go.&lt;/P&gt;
&lt;P&gt;One of my team's projects right now is a multi-layered network service. Our lowest layer sits on top of winsock. The next highest level uses logical transactions (send / receive a message to a neighbor) exposed by the lowest layer, and exposes additional multi-step transactions. Another layer builds on top of this, combining several sets of transactions into meta-transactions. And, of course, a user application sits on top of the whole shebang.&lt;/P&gt;
&lt;P&gt;The question is, how do we thread this? It's a given that the lowest level needs a thread pool to manage socket IO, but what about the higher levels? Each of the higher levels needs maintenance timers of one sort or another. They also issue transactions which are too lengthy to do in a single go on the IO thread when a packet is received, lest the socket buffers overflow.&lt;/P&gt;
&lt;P&gt;We were evaluating three alternatives:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Per-level threads&lt;/STRONG&gt;. Have a dedicated thread or thread pool at each level. Use events to notify each higher level that the level beneath it has data or transactions ready for consumption, then have that higher level use its own thread to retrieve the data from the lower level and process it. The cleanest and easiest design-wise IMO, but it isn't elegant, and it has performance implications from context switches.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;A single thread pool&lt;/STRONG&gt;. Have a worker thread pool at the lowest level. Higher levels get work done by the lowest level thread pool, registering work items and time-based events for processing as appropriate. In some cases, process a received packet through every layer of the service on the thread which originally received the data. Attractive, but it requires the lowest level to call into and do work in every other level. It's also possible to get into a situation where all worker threads are busy on long-lived tasks, to the detriment of high-priority tasks such as socket IO.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;A combined approach&lt;/STRONG&gt;. Do both. Use the receiving thread (or the application thread in the case of calling down) to do any easy work to minimize context switches, and queue any significant work items for later processing. The most efficient, but a potential bug farm as you try to keep straight how work is done in each layer, and what work is safe to do in a given context.&lt;/LI&gt;&lt;/OL&gt;
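&lt;P&gt;As a rough illustration of the second alternative, here is a Python sketch of a single worker pool that any layer can hand work items and time-based events to. It is not our service's code: the class WorkerPool and its methods submit, submit_after, drain and shutdown are hypothetical names, and socket IO is left out so that only the queueing idea remains.&lt;/P&gt;
&lt;PRE&gt;
```python
import queue
import threading


class WorkerPool:
    """A fixed set of worker threads draining one shared work queue."""

    def __init__(self, worker_count: int = 4):
        self._items = queue.Queue()
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(worker_count)
        ]
        for worker in self._workers:
            worker.start()

    def submit(self, fn, *args) -> None:
        """Queue a work item; any layer of the service may call this."""
        self._items.put((fn, args))

    def submit_after(self, delay_seconds: float, fn, *args) -> threading.Timer:
        """Queue a work item once a timer fires (a time-based event)."""
        timer = threading.Timer(delay_seconds, self.submit, args=(fn, *args))
        timer.daemon = True
        timer.start()
        return timer

    def _run(self) -> None:
        while True:
            item = self._items.get()
            try:
                if item is None:  # shutdown marker, one per worker
                    return
                fn, args = item
                fn(*args)
            finally:
                self._items.task_done()

    def drain(self) -> None:
        """Block until every item queued so far has been processed."""
        self._items.join()

    def shutdown(self) -> None:
        """Ask each worker to exit, then wait for all of them."""
        for _ in self._workers:
            self._items.put(None)
        for worker in self._workers:
            worker.join()
```
&lt;/PRE&gt;
&lt;P&gt;The weakness described above is easy to see here: with a fixed worker_count, a few long-running work items can occupy every worker, and anything queued behind them simply waits.&lt;/P&gt;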
&lt;P&gt;In the end we chose a variant of the single thread pool. We have two logical thread pools, one dedicated to servicing any winsock events, the other available for higher-level work. Timed events are serviced by another thread that will, when the timer fires, queue a work item to the front of the application queue to be serviced by the next worker thread. It's a fairly clean design, and though there'll be some tricky parts, it's the right way to go for us.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1131622" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/johnmil/archive/tags/Networking/default.aspx">Networking</category><category domain="http://blogs.msdn.com/johnmil/archive/tags/Programming/default.aspx">Programming</category></item></channel></rss>