A teaser on how OneNote storage and replication works


The other day someone internally asked how OneNote stores its files and how often it actually saves. In other words, if you pulled the power cord on your computer, what would you lose and what wouldn't you? Irina Yatsenko from the OneNote Test team wrote up the following to answer the question and wanted me to post it for all to see:

Now, I'll describe in more detail what we do in OneNote 2007:

  1. Internally, all data, from a single paragraph on a page up to a whole notebook, is represented in a graph, which is split into areas we call "graph spaces". This allows us to load and save incrementally, one graph space at a time, so when you open a notebook you see all the section tabs pop up almost immediately even though the pages inside those sections aren't loaded yet. When saving, we can also choose which piece to save rather than saving everything.
  2. We never save directly to the server hosting the files (even if it's the local machine). First we save into a local cache file. Because the cache is local and OneNote has exclusive access to it, we can guarantee that the save always succeeds (if it doesn't, OneNote will force an exit, because running without a cache means users might lose data, and we think it's better to exit than to lose data). Saving into the cache happens every 30 seconds or on exit. ([descapa] I have found this to be faster at times, though I am not pulling my power cord out.)
  3. To propagate the data from the cache back to the original location of the sections, we use a background process called replication (= sync). The sync schedule depends on the actual store: UNC servers and the local machine replicate every 30 seconds, but for SharePoint the default is 10 minutes. If replication fails (e.g. because the machine has lost power), the cache will still have the data and OneNote will try to replicate again after it is restarted.
  4. The actual mechanics of the incremental save are rather technical. The bottom line is that we have our own binary format and all changes are stored in the form of "revisions", a sort of diff between the current state and the previously saved state. As these revisions accumulate, OneNote runs an optimization to clean them up and update the main base state.
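
To make points 2 to 4 a little more concrete, here is a minimal Python sketch of the general idea: save diffs into a local cache, replicate them to the real store later, and occasionally fold the diffs into a base state. This is only an illustration and not OneNote's actual code or file format. Every name in it (`GraphSpace`, `Cache`, `sync_interval`, the dictionary-based "store") is hypothetical, and the rule for guessing the store type from the path is an assumption. Only the 30 second and 10 minute intervals come from the description above.

```python
from dataclasses import dataclass, field

# Intervals quoted in the post; everything else in this sketch is hypothetical.
CACHE_SAVE_INTERVAL_SEC = 30
SYNC_INTERVAL_SEC = {"unc": 30, "local": 30, "sharepoint": 600}


def sync_interval(location: str) -> int:
    """Guess the replication interval from the kind of store (assumed rule)."""
    if location.lower().startswith(("http://", "https://")):
        return SYNC_INTERVAL_SEC["sharepoint"]
    if location.startswith("\\\\"):
        return SYNC_INTERVAL_SEC["unc"]
    return SYNC_INTERVAL_SEC["local"]


@dataclass
class GraphSpace:
    """One independently saved piece of a notebook, e.g. a page."""
    base: dict = field(default_factory=dict)       # last optimized state
    revisions: list = field(default_factory=list)  # diffs recorded since the base

    def save(self, changes: dict) -> None:
        """Incremental save: append only what changed."""
        if changes:
            self.revisions.append(dict(changes))

    def current(self) -> dict:
        """Replay the revisions over the base to get the current state."""
        state = dict(self.base)
        for rev in self.revisions:
            state.update(rev)
        return state

    def optimize(self) -> None:
        """Fold all revisions into the base state and clear them."""
        self.base = self.current()
        self.revisions.clear()


@dataclass
class Cache:
    """Local cache: saves always land here first, replication happens later."""
    spaces: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (space name, diff) not yet replicated

    def save(self, name: str, changes: dict) -> None:
        self.spaces.setdefault(name, GraphSpace()).save(changes)
        if changes:
            self.pending.append((name, dict(changes)))

    def replicate(self, store: dict, online: bool = True) -> bool:
        """Push pending diffs to the store; on failure keep them for a retry."""
        if not online:
            return False
        for name, rev in self.pending:
            store.setdefault(name, {}).update(rev)
        self.pending.clear()
        return True
```

In this sketch a failed `replicate` call leaves `pending` untouched, which mirrors the behaviour described in point 3: the cache still has the data and the next attempt picks up where the last one left off.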

 

Hope it clears things up a bit; let me know if you have any questions.

Thanks Irina! So I hope this explains things like why we have a cache (which allows OneNote to go offline, merge changes and more) as well as why our app works in certain ways. The storage tech is actually quite complex and innovative; I didn't really appreciate it until I dealt with other sync technologies that make me choose which copy is the most up-to-date, etc. There is still a lot more going on under the covers, but this is a good overview. If you have more questions, please let us know.

  • As a OneNote 2003 junkie I have great interest in this new version.

    So for the "incremental save"  does that mean I can use a USB drive to just carry the "incremental change" data around and not have to worry about having to carry the entire notebook file(s) around?

  • Dave - As you can see on this blog post:

    http://blogs.msdn.com/descapa/archive/2006/08/02/686087.aspx

    with OneNote 2007 you can store all of your notes on a USB drive and sync between two computers.

  • Perhaps the periodic "Optimization of revisions" described above explains my biggest problem with OneNote 2007: sometimes OneNote disk utilization jumps to constant (hard drive light full on) and stays that way for a couple of hours. During this time, CPU utilization hovers between 90 and 100 percent, although Task Manager claims that OneNote's CPU usage is very low. However, if I terminate OneNote, the disk and CPU usage immediately return to normal. If this optimization really is a possible culprit, I would be interested to know if there is anything I can do to "force" the optimization to be done at a certain time, so that it doesn't happen when I'm trying to take notes in a meeting, for instance.

  • I did not make my point clear.   Do just the "deltas"  go onto the USB drive?

    Like:

    Computer-A's OneNote  ---->User makes change--->delta goes to USB---->USB plugged into Computer-B------>Computer-B's OneNote synched.

    Without this if your OneNote is larger than the USB's capacity then there will be a problem.

  • Dave - No, the whole file is stored on USB (which will include the deltas and the base).  If you store your notes on USB then all of your notes will be there, but you can make changes on either computer and when you plug in the USB drive OneNote will sync the changes to the device.  If there are too many deltas then OneNote will optimize the files.

    More clear now?

  • Blair - You can look in Tools-->Options under Save and there are some options in there.  You can tell OneNote to run all of your optimizations when you click a button and it should clean everything up.  In most cases I never have problems with optimization except for when I ran the beta release.  In RTM I haven't had problems.

    Here is what I suggest: click on the Optimize Now button and let OneNote finish.  Then see if you get those errors again.  Let us know if this fixes your problem.

  • It's clear now.   With the USB synch method the size of your Notebook is limited by the size of your USB drive.

    Ugh.   Why not take a snapshot of the notebook at startup, let one do their work and then push a button to create the delta file?    then that delta file can be used to synch the notebook on the other machine.

    In any case I am a OneNote 2003 junkie and I'll probably upgrade to OneNote 2007.   I'm also keeping my fingers crossed for Zoho, as a hosted solution for notes would be mighty cool.

    Great blog!   Subscribed!!!!   :-)

  • Hi Dan,

    Thanks for suggesting the Optimize Now option. I tried it, and it does seem to exhibit the behavior (high disk and CPU usage, PC is much less responsive, goes on for more than 2 hours) that I was unhappy about.

    I can understand what Irina described about there being a certain "threshold" of unsaved revisions beyond which OneNote decides to synchronize/optimize them. It would be helpful, though, if I had some option, when the sync starts up automatically, to tell OneNote "Now is not a good time!" and have it back off and try again later. (I noticed that I had the option to cancel when I manually invoked the Optimize.)

  • By the way, I should have mentioned that these problems have been experienced using 2007 RTM.

  • Dan,

    Do you know if there's a way to tailor the UNC sync interval?  Here's why.

    I tried running against a non-IIS WebDAV server to share some work with buddies of mine.  Despite all my best efforts to configure it to work correctly, the whole setup is just unstable.  So now I've set up VPNed access to a samba share on a personal machine.

    Suffice to say, due to cable upload speeds, access to the SMB shares is pretty slow.  So slow that OneNote ALWAYS indicates that it is synching with the share.  It'd be nice if I could tweak the registry or something to tell it only to ping the server every 10 minutes or so.

    Is this possible?

    Evan

  • Evan - I just looked and there are no policies/reg keys for the UNC sync interval, only on SharePoint.  If you were to connect via http:// then it would be 10 minutes instead of 30 seconds.

    How about telling OneNote to work offline and then go back online.  You can do this by going to File-->Sync-->Sync Status.  You can choose Work Offline and then go online when you are ready to sync.  Will this work for you?

  • Dan, that's (going offline and back on later) exactly what I and the others I'm working with are doing for the moment.  I just don't have access to a SharePoint service (or IIS WebDAV) and am not too keen on investing in a pay-for service at this point.  

    Of course, every once in a while someone forgets to go online when we're collaborating and has a doh moment after wondering why they're not seeing updates.

    So it would be nice if the OneNote team could consider adding per-notebook sync schedule tailoring regardless of the type of share used.

    Thanks.

  • Evan - Good feedback...have you thought about having just a simple account with Office Live?  I believe they have a free service that will let you do SharePoint over the Internet.  Perfect for what you are doing.  Maybe this doesn't work for you but it is a solution.

    Otherwise good feedback

  • I'll take a look into it.  I did do a quick search on "free sharepoint" and "free webdav" a while ago but most outfits had ridiculously small disk space offerings.  500MB for the Office Live Basics might do me for a while.

  • Can we change cache location programmatically?
