AKA: How I spent last week :).
On Tuesday morning last week, I got an email from "firstname.lastname@example.org":
You've probably already seen this article, but just in case I'd love to hear your response. http://it.slashdot.org/article.pl?sid=07/08/21/1441240 Playing Music Slows Vista Network Performance?
In fact, I'd not seen this until it was pointed out to me. It seemed surprising, so I went to talk to our perf people, and I ran some experiments on my own.
They didn't know what was up, and I was unable to reproduce the failure on any of my systems, so I figured it was a false alarm (we get them regularly). It turns out that at the same time, the networking team had heard about the same problem and they WERE able to reproduce the problem. I also kept on digging and by lunchtime, I'd also generated a clean reproduction of the problem in my office.
At the same time, Adrian Kingsley-Hughes over at ZDNet Blogs picked up the issue and started writing about it.
By Friday, we'd pretty much figured out what was going on and why different groups were seeing different results. It turns out that the issue was highly dependent on your network topology and the amount of data you were pumping through your network adapter. The reason I hadn't been able to reproduce it is that I only have a 100mbit Ethernet adapter in my office: you can get the problem to reproduce on 100mbit networks, but you've really got to work at it to make it visible. Some of the people working on the problem sent a private email to Adrian Kingsley-Hughes on Friday evening reporting the results of our investigation, and Mark Russinovich (a Technical Fellow, and all-around insanely smart guy) wrote up a post explaining what's going on in insane detail, which he posted this morning.
Essentially, the root of the problem is that in Vista, when you're playing multimedia content, the system throttles incoming network packets to prevent them from overwhelming the multimedia rendering path - the system will only process 10,000 network frames per second (this is a hideously simplistic explanation, see Mark's post for the details).
For 100mbit networks, this isn't a problem - it's pretty hard to get a 100mbit network to generate 10,000 frames in a second (you need to have a hefty CPU and send LOTS of tiny packets), but on a gigabit network, it's really easy to hit the limit.
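To put rough numbers on that, here is a back-of-the-envelope sketch. The 10,000 frames per second figure is the one quoted above; the 1,500-byte frame size is my assumption (a full-size standard Ethernet payload), and headers and inter-frame gaps are ignored, so treat the results as ballpark only. The function names are made up for illustration.

```python
# Figure quoted in the post; see Mark's write-up for the real mechanism.
THROTTLE_FRAMES_PER_SEC = 10_000
# Assumption: full-size standard Ethernet frames carrying 1,500 bytes each.
ASSUMED_FRAME_BYTES = 1_500


def line_rate_frames_per_sec(link_bits_per_sec, frame_bytes=ASSUMED_FRAME_BYTES):
    """Frames per second a saturated link delivers (headers and gaps ignored)."""
    return link_bits_per_sec / (frame_bytes * 8)


def throttled_ceiling_bits_per_sec(frame_bytes=ASSUMED_FRAME_BYTES):
    """Most data per second that fits under the frame cap at this frame size."""
    return THROTTLE_FRAMES_PER_SEC * frame_bytes * 8


if __name__ == "__main__":
    for name, bits in (("100 Mbit", 100_000_000), ("1 Gbit", 1_000_000_000)):
        fps = line_rate_frames_per_sec(bits)
        over = fps > THROTTLE_FRAMES_PER_SEC
        print(f"{name}: about {fps:,.0f} frames/s at line rate, over the cap: {over}")
    print(f"ceiling: {throttled_ceiling_bits_per_sec() / 1e6:.0f} Mbit/s")
```

Under these assumptions a saturated 100mbit link stays below the cap with full-size frames (which is why it takes lots of tiny packets to hit it), while a gigabit link blows past it many times over, and the cap works out to roughly 120 Mbit/s of throughput.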
One of the comments that came up on Adrian's blog was a comment from George Ou (another zdnet blogger):
""The connection between media playback and networking is not immediately obvious. But as you know, the drivers involved in both activities run at extremely high priority. As a result, the network driver can cause media playback to degrade." I can't believe we have to put up with this in the era of dual core and quad core computers. Slap the network driver on one CPU core and put the audio playback on another core and problem solved. But even single core CPUs are so fast that this shouldn't ever be a problem even if audio playback gets priority over network-related CPU usage. It's not like network-related CPU consumption uses more than 50% CPU on a modern dual-core processor even when throughput hits 500 mbps. There’s just no excuse for this."
At some level, George is right - machines these days are really fast and they can do a lot. But George is missing one of the critical differences between multimedia processing and other processing.
Multimedia playback is fundamentally different from most of the day-to-day operations that occur on your computer. The core of the problem is that multimedia playback is inherently isochronous. For instance, in Vista, the audio engine runs with a periodicity of 10 milliseconds. That means that every 10 milliseconds, it MUST wake up and process the next set of audio samples, or the user will hear a "pop" or “stutter” in their audio playback. It doesn’t matter how fast your processor is, or how many CPU cores it has, the engine MUST wake up every 10 milliseconds, or you get a “glitch”.
For almost everything else in the system, if the system locked up for even as long as 50 milliseconds, you’d never notice it. But for multimedia content (especially for audio content), you absolutely will notice the problem. The core reason has to do with the physics of sound: whenever there’s a discontinuity in the audio stream, a high frequency transient is generated. The human ear is quite sensitive to these high frequency transients (they sound like "clicks" or "pops").
Anything that stops the audio engine from getting to run every 10 milliseconds (like a flurry of high priority network interrupts) will be clearly perceptible. So it doesn’t matter how much horsepower your machine has, it’s about how many interrupts have to be processed.
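A toy model may make the point clearer. This is a sketch of the deadline idea only, not of how the Vista audio engine is actually implemented: it assumes no extra buffering, and the wake-up timestamps are invented. The one number taken from the text is the 10 millisecond period.

```python
AUDIO_PERIOD_MS = 10.0  # periodicity quoted in the post


def count_glitches(wake_times_ms, period_ms=AUDIO_PERIOD_MS):
    """Count wake-ups that came more than one period after the previous one."""
    glitches = 0
    for previous, current in zip(wake_times_ms, wake_times_ms[1:]):
        if current - previous > period_ms:
            glitches += 1
    return glitches


# Hypothetical timelines, in milliseconds.
on_time = [0.0, 10.0, 20.0, 30.0, 40.0]
# A burst of high priority work delays the third wake-up by 15 ms.
delayed = [0.0, 10.0, 35.0, 45.0, 55.0]

if __name__ == "__main__":
    print(count_glitches(on_time), count_glitches(delayed))
```

Notice that CPU speed appears nowhere in the model. The only thing that matters is whether each wake-up lands inside its period, so a single late wake-up is a glitch no matter how quickly the samples get processed once the engine finally runs.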
We had a meeting the other day with the networking people where we demonstrated the magnitude of the problem - it was pretty dramatic, even on the top-of-the-line laptop. On a lower-end machine it's even more dramatic. On some machines, heavy networking can turn video rendering to a slideshow.
Any car buffs will immediately want to shoot me for this analogy, because I’m sure it’s highly inaccurate (I am NOT a car person), but I think it works: You could almost think of this as an engine with a slip in the timing belt – you’re fine when you’re running the engine at low revs, because the slip doesn’t affect things enough to notice. But when you run the engine at high RPM, the slip becomes catastrophic – the engine requires that the timing be totally accurate, but because it isn’t, valves don’t open when they have to and the engine melts down.
Anyway, that's a long-winded discussion. The good news is that the right people are actively working to ensure that a fix is made available for the problem.
A book on PBX systems (phone systems) once explained it like this.
When dealing with data traffic, you need 100% accuracy but can tolerate some latency.
When dealing with voice (i.e. sound), you can tolerate little or no latency but can degrade quality (more lossy compression or lower sampling bit sizes).
The two are diametrically opposed. The Vista issue is almost the exact same scenario playing out on a computer.
(I just noticed Larry even mentioned voice communications)
There is a workaround for this problem already. My previous roommate and friend found out you can work around it by removing a false service dependency:
The obvious solution would be to give the ability to the running app to specify if it needs low latency or glitch-free audio.
Chris: Sure, you can hack the system and remove the MMCSS dependency. And your audio and video will glitch like crazy when you do just about anything with your machine.
If you don't care about multimedia performance, that may be an acceptable solution, but one of the important goals for Vista was that the system provide a dramatically better multimedia experience than XP did (it's pretty much trivial to get XP to glitch - just running a CPU intensive process will do it).
DaddyMac: We do. You can opt into using MMCSS.
Of course the Vista multimedia playback infrastructure opts in because we figure that the user wants a good experience.
If you want a crappy multimedia experience, you can set the MMCSS "SystemResponsiveness" parameter to 100 - that'll turn off most of the CPU boost.
Unfortunately because someone screwed up badly, it doesn't turn off the network throttling. Needless to say, some fairly senior people were a bit peeved when they learned this.
That's a small part of the fix for this problem.
Still, I have zero problems playing audio or video while pushing or pulling 60 MB/sec or 480 mbps. It's obvious the throttling simply needs to be dynamic and take into account how fast the user's CPU is.
Doesn't this issue call for a hardware audio/video implementation with local buffer/memory, and with better DMA and driver models, instead of the recent trend of relying on host-based audio/video processing?
George, it's not "obvious" at all. There are fixes that don't require the kind of dynamic throttling you're describing.
I hate having to say this, but.... Trust us. The people working on this have literally decades of experience in designing extremely high performance systems (there are two distinguished engineers and a technical fellow involved in these discussions - you don't get any more senior developers at Microsoft than that). The teams working on the solution fully understand the problem and they believe they've got a solution that will address the issues that have been reported.
We screwed up in Vista and implemented a throttling system that introduced a serious performance issue for certain classes of hardware. We've acknowledged that and the teams involved are working hard at coming up with a resolution for this issue.
Sebastien: Hardware acceleration of audio wouldn't help this situation at all.
I tested this in our office: gigabit connection between client and server with two gigabit switches in between.
Downloading a large file from the server without WMP playing: about 40% usage of the gigabit NIC.
Downloading a large file from the server with WMP playing: about 12% usage of the gigabit NIC.
Obviously it's quite annoying when you have to move large files around often. However, this rarely happens while WMP is playing (it's an office, remember ;-) ).
No, it isn't obvious, and I haven't seen the "slide show" effect even when I'm pulling in 400 mbps of data while playing DVDs. I don't even see a glitch when I'm playing back 10 videos at the same time.
I don't doubt your Sr. Engineers, but maybe we're not communicating clearly here. I just don't see the problems on my hardware that you're describing.
"George, it's not "obvious" at all. There are fixes that don't require the kind of dynamic throttling you're describing."
I'm sure there are and I wouldn't dare suggest I know more about this issue than Microsoft's engineers. But here is why I'm having a hard time with the explanation if you'll look at the following screenshot.
I'm receiving data at around 300 mbps and I'm playing back a high-quality high-bitrate DVD. I saw zero glitches in the video and heard zero glitches in the audio. There were some intermittent glitches in the DVD when certain processes I have yet to identify in my computer kicked in, but they did not occur while I was receiving data over the network.
There seems to be some glitch in the system and I don't know if that's third party software messing up or some glitches in the OS, but playing back DVDs and pulling in more than 300 mbps of data at the same time didn't seem to be a problem at all. Note that I am using jumbo frames to increase my throughput because of the 10K throttling.
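The jumbo frame remark can be checked with some quick arithmetic. This is only a sketch: the 10,000 frames per second cap and the 300 mbps figure come from the thread, while the frame sizes are assumptions (1,500 bytes for standard Ethernet and 9,000 bytes as a common jumbo size, since the actual setting isn't stated), and headers are ignored.

```python
THROTTLE_FRAMES_PER_SEC = 10_000  # the "10K" figure from the thread
STANDARD_FRAME_BYTES = 1_500      # assumption: standard Ethernet payload
JUMBO_FRAME_BYTES = 9_000         # assumption: a common jumbo frame size


def frames_needed_per_sec(throughput_bits_per_sec, frame_bytes):
    """Frames per second required to carry a throughput at a given frame size."""
    return throughput_bits_per_sec / (frame_bytes * 8)


if __name__ == "__main__":
    observed = 300_000_000  # roughly the 300 mbps reported above
    for label, size in (("standard", STANDARD_FRAME_BYTES), ("jumbo", JUMBO_FRAME_BYTES)):
        fps = frames_needed_per_sec(observed, size)
        fits = fps <= THROTTLE_FRAMES_PER_SEC
        print(f"{label}: about {fps:,.0f} frames/s, fits under the cap: {fits}")
```

With those assumed sizes, 300 mbps needs about 25,000 standard frames per second, well over the cap, but only a little over 4,000 jumbo frames per second, which fits comfortably. Because the cap counts frames rather than bytes, bigger frames carry more data through the same limit.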
Interesting to read all the comments, can I just say a big thanks to Larry for actually being here and answering questions? Most employees, Microsoft or not, would probably be hiding in a hole of silence and denial by now, and that's even on official communication channels, let alone answering comments about closely guarded implementation details on their blog. Whether Microsoft stuffed up or not on gigabit ethernet transfers while playing audio is fairly immaterial to the vast majority of users out there who got Vista on their WalMart PC, it's reassuring that Larry is willing to engage in technical discussions about it.
Well done! :)
There is one thing I find funny about this whole "We've already got a fix."
How many people internal to MS have been using Vista?
How many people beta tested it?
How many people have been using it since release?
Now it is reported that there is a network slowdown. How many people does this issue really affect?
But someone posts a "fix" and 20-30 people say it worked for them but a few said it made the issue worse. But as far as the people it worked for, they consider the issue fixed.
If you don't get what I am saying, here it is.
With such a poor install and test group for the "bug fix", chances are that all they are doing is causing one issue to go away while creating a totally different issue for other people.
With something as complicated as audio and networking, I'll trust Microsoft over some dude and his roommate.