Wednesday, June 07, 2006 1:24 PM
MikeCal
Some Compaction Answers
Recently one of the filesystem guys and I spent some time investigating the Compaction Thread issue on an upgraded iPaq 4700. We are both supposed to be working on things other than this, but we keep seeing all the pain you folks are feeling and wanted to try to help.
(A quick bit of history for people unaware of the issue. While most WM5 devices perform fine, two devices that were upgraded from WM2003 to WM5 have been seeing strange performance issues. People have tracked this performance issue to the "Compaction Thread" running in filesys.exe on these devices. For more information, read this.)
I don't know that we found every issue, but we did find three:
1) Compaction happens more often than it should.
2) Compaction lasts longer than we'd expect it to.
3) Sometimes compaction never stops.
More often than it should
Flash blocks tend to be 256K in size. If you write data to a block and later want to change it, you need to erase the whole block first. But you might be just changing a tiny bit of data. Erasing 256K to change a few bytes is excessive. For this reason, blocks are subdivided into sectors. A good sector size for a system with persistent storage is 512 bytes. However, if you're using NOR flash and you want the system to XIP, you need to use 4K sectors. (If words like NOR and XIP don't mean anything to you, read this.) It's extremely difficult to XIP and do persistent storage on the same flash part. So, both of the upgrade devices did XIP in WM2003, but don't in WM5. However, we found that the sector size was still set to 4K, not the more appropriate 512 bytes.
The end result of having sectors 8 times as large as they should be is that there are only one eighth as many sectors per block: 64 in a 256K block instead of 512. The sector is the smallest amount of data that can be written at a time. So, if something does a lot of small writes, this configuration issue will use up the free sectors, and start the compaction thread, 8 times sooner than it should.
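To make the arithmetic concrete, here is a minimal sketch. It assumes the 256K block size mentioned above and the simplification that every small write consumes exactly one whole sector. The function and constant names are made up for illustration and don't come from the filesystem code.

```python
BLOCK_SIZE = 256 * 1024  # bytes per flash block, the typical size mentioned above


def sectors_per_block(sector_size):
    """Number of sectors that fit in one flash block."""
    return BLOCK_SIZE // sector_size


def small_writes_per_block(sector_size):
    """Small writes one block can absorb before it has no free sectors left.

    Assumption: each small write uses up one whole sector, because the
    sector is the smallest unit that can be written.
    """
    return sectors_per_block(sector_size)


xip_sectors = sectors_per_block(4096)        # the setting left over from WM2003
persistent_sectors = sectors_per_block(512)  # the recommended setting
ratio = persistent_sectors // xip_sectors

print(xip_sectors, persistent_sectors, ratio)
```

With 4K sectors a block holds 64 sectors. With 512 byte sectors it holds 512, which is where the factor of 8 comes from.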
We clearly didn't do a good enough job of teaching our OEMs that this value needs to be changed when you switch from XIP to Persistent Storage. We're going to review our documentation and sample code to try to make this clearer for future devices.
Unfortunately, there's no way for end users to change this setting. Even if you find a sector size value in the registry, any changes you make to it won't have any effect. Remember that it's the filesystem that loads the registry, and the sector size is read before the registry is loaded. The only way to get this changed is with an update from the OEM.
Longer than we'd expect
The compaction thread is doing two jobs at the same time. First, it's freeing up dirty sectors. Second, it's trying to free them up in a way that spreads the writing of sectors out. The goal is that all blocks get erased about the same number of times as all others (wear leveling). There's a tradeoff between wear leveling and performance, and our compaction algorithm is heavily on the "correct wear leveling" side of that equation. This probably isn't the right decision on a NOR system where block erases take a long time. And it's especially problematic on a system with a whole lot of NOR, where there are more blocks to spread across.
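Here is a rough illustration of that tradeoff. This is not our actual compaction algorithm. It's a hypothetical scoring scheme, and the names Block, pick_block_to_compact, and wear_weight are all invented for the example. It only shows how leaning toward wear leveling can pick a block that frees very few sectors per (slow) erase.

```python
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    dirty_sectors: int  # sectors that erasing this block would reclaim
    erase_count: int    # how many times this block has been erased so far


def pick_block_to_compact(blocks, wear_weight):
    """Pick the next block to erase (hypothetical scoring, not the WM5 code).

    wear_weight = 0.0 always takes the block with the most dirty sectors.
    wear_weight = 1.0 always takes the block that has been erased the least.
    """
    max_dirty = max(b.dirty_sectors for b in blocks) or 1
    max_erase = max(b.erase_count for b in blocks) or 1

    def score(b):
        reclaim = b.dirty_sectors / max_dirty         # higher frees more space
        freshness = 1.0 - b.erase_count / max_erase   # higher means less worn
        return (1.0 - wear_weight) * reclaim + wear_weight * freshness

    return max(blocks, key=score)


blocks = [
    Block(0, dirty_sectors=60, erase_count=900),
    Block(1, dirty_sectors=5, erase_count=100),
    Block(2, dirty_sectors=30, erase_count=500),
]

fast = pick_block_to_compact(blocks, wear_weight=0.0)
even = pick_block_to_compact(blocks, wear_weight=0.9)
print(fast.index, fast.dirty_sectors)
print(even.index, even.dirty_sectors)
```

In this made-up data, the performance-minded choice reclaims 60 sectors with one erase, while the wear-minded choice reclaims only 5. When each erase is slow, as it is on NOR, that difference adds up.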
We're investigating changes to our compaction algorithm. However, this isn't the kind of thing we can just change easily. If we do it incorrectly, we'll corrupt your data. You definitely want us to test any changes heavily and make sure that they really work. I can't tell you when any such changes would be ready for OEMs to use.
The end result, though, is that when compaction starts on these devices, it usually takes a few minutes to stop. We didn't see this behavior during development because our devices either had a lot less NOR to compact or used NAND, which compacts much faster.
Sometimes never stops
We found two things happening here. First, compaction stops when the device suspends. So you could notice your device going slowly, hit the power button, put it in your pocket, pull it out 12 hours later, turn it on, and find that it's still compacting. You'd rightfully think that it had been compacting for the last 12 hours. It hadn't really, though. While it was suspended, nothing was running. And, when you started it again, the compaction thread just started back up.
The only way to get the compaction thread to stop is to let it run its course. And the only way to do that is to leave the device on. However, we found that there's a third-party picture frame/screen saver program on the HP device called DockWare that runs when you leave the device idle. That program inadvertently writes a bunch of temporary files to flash every time it changes the picture. Doing this dirties enough sectors that the compaction thread can't get ahead of it. So, when this program is running, the compaction thread will never stop.
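A toy model shows why the thread can't finish. The numbers and the function name ticks_until_clean are invented for illustration, and real compaction is not this simple. The only point is that if sectors get dirtied as fast as they're reclaimed, the dirty count never reaches zero.

```python
def ticks_until_clean(dirty, reclaimed_per_tick, dirtied_per_tick, max_ticks=10_000):
    """Return how many ticks compaction needs to clear all dirty sectors.

    Returns None if it hasn't finished after max_ticks.
    Toy model: each tick the compactor reclaims a fixed number of sectors,
    then a background writer dirties a fixed number more.
    """
    for tick in range(1, max_ticks + 1):
        dirty = max(0, dirty - reclaimed_per_tick)
        if dirty == 0:
            return tick
        dirty += dirtied_per_tick
    return None


# Device left idle with nothing writing to flash: compaction runs its course.
print(ticks_until_clean(1000, reclaimed_per_tick=10, dirtied_per_tick=0))

# A background writer that is slower than the compactor: it takes longer, but ends.
print(ticks_until_clean(1000, reclaimed_per_tick=10, dirtied_per_tick=4))

# A background writer that keeps pace with the compactor: it never ends.
print(ticks_until_clean(1000, reclaimed_per_tick=10, dirtied_per_tick=10))
```

The last case is the situation described above: the writer dirties sectors at least as fast as the compactor can clean them, so the function gives up and returns None.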
I don't want to assign blame here. What the application is doing seems reasonable on the surface. And it's really unfortunate that temporary files are written to flash instead of RAM. But writing them to RAM would require a RAM disk, and WM5 doesn't come with one.
The good news is that DockWare has already been updated to not write temporary files. You can download a new version for free from http://www.iliumsoft.com/download.php?prodcode=20120. If you don't want to install a new version, you can also disable DockWare. Hopefully, any updated ROM that HP might release will come with the updated version.
That's all folks
So, those were the answers Andrew and I were able to find. We've sent this information to the OEM/ODM and we'll see where it goes from there.
Mike Calligaro