When I talk to people about P2P content distribution, I often run into a common misperception: they assume that the more people are downloading a file, the faster the download goes. As I'll explain below, this usually isn't true. What is true is that a peer-to-peer system in which the servers also participate should always be faster than using the servers alone.

The following formula holds for all three kinds of system described below, assuming that everyone's download capacity exceeds the speed at which they can actually receive the content:

[per-user average download speed] = [total upload speed] / [number of downloaders]
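The formula translates directly into code. Here's a minimal Python sketch; the function name and the choice of Kbps as the unit are our own, and the optional `download_capacity_kbps` cap is an added assumption that only comes into play when the formula's precondition doesn't hold.

```python
def per_user_download_kbps(total_upload_kbps: float, downloaders: int,
                           download_capacity_kbps: float = float("inf")) -> float:
    """Average download speed per user, in Kbps.

    The total upload is shared evenly among the active downloaders. The
    optional cap (an assumption, not part of the original formula) models
    the case where a user's own download link is the bottleneck.
    """
    if downloaders <= 0:
        raise ValueError("need at least one downloader")
    return min(total_upload_kbps / downloaders, download_capacity_kbps)


# A 10 Mbps (10,000 Kbps) seed shared by 10 downloaders:
print(per_user_download_kbps(10_000, 10))  # 1000.0
```

With the default cap of infinity, the function is exactly the formula above.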

Let's call a system with no peer contribution, such as a traditional web download, 'client-server.' We'll call P2P systems in which peers serve files even when they aren't actively downloading them 'always on.' Finally, if peers serve a file only while they are actively downloading that specific file, we'll call the system 'greedy.'

All three systems will usually have servers (called 'seeds' in the peer-to-peer world), so there is always at least one node with a full copy of the file to make sure people can complete their downloads.

To illustrate the download speeds for these systems, we need a few more numbers. Let's say that:

  • An average user has 4 Mbps download and 250 Kbps upload.
  • The seed has 10 Mbps upload, 40x as fast as an average user's upload. 
  • The always-on system has two upload-only users for every user who is both uploading and downloading, for a total of 3x as many uploading nodes as the greedy system.
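These assumptions determine the formula's numerator, the total upload capacity. The sketch below assumes a single seed and counts uploading peers per active downloader as 0 for client-server, 1 for greedy and 3 for always-on, matching the 3x ratio above; the constant and function names are illustrative only.

```python
SEED_UPLOAD_KBPS = 10_000  # seed: 10 Mbps upload
PEER_UPLOAD_KBPS = 250     # average user: 250 Kbps upload

# Uploading peers per active downloader in each system.
UPLOADERS_PER_DOWNLOADER = {
    "client-server": 0,  # peers never upload
    "greedy": 1,         # each downloader uploads only while downloading
    "always on": 3,      # each downloader plus two upload-only peers
}


def total_upload_kbps(system: str, downloaders: int) -> int:
    """Seed upload plus the upload of every participating peer, in Kbps."""
    uploading_peers = UPLOADERS_PER_DOWNLOADER[system] * downloaders
    return SEED_UPLOAD_KBPS + uploading_peers * PEER_UPLOAD_KBPS


assert SEED_UPLOAD_KBPS == 40 * PEER_UPLOAD_KBPS  # "40x as fast"
for name in UPLOADERS_PER_DOWNLOADER:
    print(name, total_upload_kbps(name, 10))
```

For 10 downloaders this gives 10,000, 12,500 and 17,500 Kbps of total upload for client-server, greedy and always-on respectively.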

Here's a table that shows the average per-user download speed for 10, 100, 1,000, and 10,000 users in each of the three systems:

Clients   Client-server (Kbps)   Always on (Kbps)   Greedy (Kbps)
10        1,000                  1,750              1,250
100       100                    850                350
1,000     10                     760                260
10,000    1                      751                251
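The table can be reproduced from the assumptions listed earlier. This is a sketch, assuming one 10 Mbps seed and 250 Kbps per uploading peer, with the same hypothetical "uploaders per downloader" counts (0, 3 and 1) used to model each system; it relies on Python 3.7+ dictionaries keeping insertion order so the columns print in table order.

```python
SEED_UPLOAD_KBPS = 10_000  # 10 Mbps seed
PEER_UPLOAD_KBPS = 250     # 250 Kbps per uploading peer

# Uploading peers per downloader: none, the downloader plus two
# upload-only peers, or just the downloader itself.
SYSTEMS = {"client-server": 0, "always on": 3, "greedy": 1}


def average_speed_kbps(uploaders_per_downloader: int, downloaders: int) -> float:
    """Total upload divided evenly among the downloaders."""
    total = SEED_UPLOAD_KBPS + uploaders_per_downloader * downloaders * PEER_UPLOAD_KBPS
    return total / downloaders


def table_rows(client_counts=(10, 100, 1_000, 10_000)):
    """Yield (clients, [speed per system]) in the table's column order."""
    for n in client_counts:
        yield n, [average_speed_kbps(k, n) for k in SYSTEMS.values()]


print(f"{'Clients':>8}" + "".join(f"{name:>16}" for name in SYSTEMS))
for n, speeds in table_rows():
    print(f"{n:>8,}" + "".join(f"{s:>16,.0f}" for s in speeds))
```

Running it prints the same four rows as the table, with every column in Kbps.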

From the table above, you can see that the P2P downloads *should* always be faster than the seed server on its own. However, the average download speed still drops as the number of clients grows, because the seed's fixed upload capacity is divided among more and more downloaders.
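Writing the formula with symbols makes this trend explicit. The symbols here are introduced for illustration: $U_s$ is the seed's upload (10 Mbps), $u$ an average peer's upload (250 Kbps), $N$ the number of downloaders, and $k$ the number of uploading peers per downloader (0 for client-server, 1 for greedy, 3 for always-on).

```latex
\bar{d}(N) = \frac{U_s + k N u}{N} = \frac{U_s}{N} + k u
\qquad\Longrightarrow\qquad
\lim_{N \to \infty} \bar{d}(N) = k u =
\begin{cases}
0 & \text{client-server} \\
250\ \text{Kbps} & \text{greedy} \\
750\ \text{Kbps} & \text{always on}
\end{cases}
```

The seed's share $U_s/N$ shrinks toward zero, so adding downloaders never pushes the average above $ku$, which matches the 251 Kbps and 751 Kbps entries in the 10,000-client row.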

The always-on system has a significantly faster download speed than the greedy system. This looks great on the surface, but it comes at a price: users in always-on systems donate their bandwidth even when they aren't immediately benefiting. As long as they're OK with this, the system can usually offer faster downloads when the user does want content. However, it imposes on the user for longer: the upload bandwidth being consumed can affect their other activities, such as web browsing or playing network games. In an always-on system, uploading a file retrieved earlier is also more likely to interfere with the system's ability to deliver a file the user wants to download NOW, at least where users tend to want newer files rather than older ones.

The framework we use in MSCD leaves the choice between always-on and greedy behavior to the programmer. For the MSCD CTP, which allows users to download Visual Studio 2008 Beta-2 images, we've configured the client to behave greedily. In other words, you share with other peers only until your download finishes, and then you disconnect from the cloud.
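To make the distinction concrete, here is a hypothetical sketch of the decision a client faces once its download completes. MSCD's actual programming interface isn't described in this post, so `SharingMode` and `should_stay_in_cloud` are illustrative names only, not part of any real API.

```python
from enum import Enum


class SharingMode(Enum):
    GREEDY = "greedy"        # share only while downloading
    ALWAYS_ON = "always on"  # keep sharing after the download finishes


def should_stay_in_cloud(mode: SharingMode, download_complete: bool) -> bool:
    """Hypothetical policy: does this peer keep serving other peers?"""
    if not download_complete:
        return True  # both modes upload to other peers while downloading
    return mode is SharingMode.ALWAYS_ON


# A greedy client, like the one configured for the CTP, leaves once it is done.
print(should_stay_in_cloud(SharingMode.GREEDY, download_complete=True))  # False
```

The only difference between the two modes is what happens after the download finishes; while downloading, both contribute their upload to the cloud.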

Please post a comment if you have any questions in this area; I'm always on the lookout for new reasons to blog :)