Deep dive: how Groove Forms synchronizes data

As I mentioned earlier, Groove's essential magic is that all members of a workspace see the same information in the space.  That holds even though members can work offline, or be in separate companies behind firewalls.  (Each member has a copy of each of their workspaces, so all the information is available when you're disconnected, and you can work in the space regardless of network connectivity.)  And there's no central server; none of your workspace data is permanently stored anywhere other than on the machines of the workspace members.  The synchronization happens automatically and immediately, and is extremely resilient; it's designed to be "sufficiently advanced".

There's lots of depth in the mechanics, and I don't plan to dive into all of it.  (If you want reading material, try the product docs, or even look up the patent numbers on Groove's About box: 6446113, 6640241, 6859821).  But it'll be helpful to understand the overall structure of Groove and how it works.  (Standard "how-internals-work" disclaimer goes here.  I'm just writing down what I believe to be the case about how things work; it may not all be correct, and of course might change in the future too, so don't rely on this the way you'd use official documentation.)

Physically, each workspace is a database file with an .XSS extension (plus corresponding transaction logs), usually found under <user>\Local Settings\Application Data\Groove Networks\Groove\accounts\<id>\telespaces.  These databases are encrypted (with your account key).  The file format is Groove-developed; essentially each XSS file is a collection of binary and/or XML documents.  A Groove subsystem called Storage Manager is responsible for reading and writing the databases on disk, managing transaction integrity, and so on.

Your Groove account is also stored in an XSS database file (id.xss).  Actually, the "account telespace" shares lots of characteristics with regular Groove workspaces; it's a distributed database in exactly the same way, so you can have your whole Groove account installed on multiple devices, which will all keep in sync.

Synchronization is performed by a subsystem called Dynamics Manager, reading and writing to the network via a subsystem called Communications Manager.  But to understand Dynamics, we should first take apart a workspace and peek inside.

The workspace is a container for Tools (there's a ToolContainer COM component which implements tool-containing-stuff).  We've already seen some of the user-visible tools, and there are several invisible tools too: one to manage the member list, one for managing the tabs which display UI (the RootDisplay tool), and a few others.

Tools themselves are quite complex.  There's a tool template: an XML document which defines all the components and their connections (the components, at this level, are really COM components which implement parts of the tool's functionality).  Most tools have the same sorts of components, in a Model-View-Controller structure:  there's a data-management piece (one or more Engines, and a DataDelegate providing programming interfaces to the outside world), a user interface (with its widgets usually being COM components arranged according to the spec in the tool template, and a ViewContainer managing the layout), and glue between them (UIDelegate code of some sort, usually written in C++, but historically also plenty of JavaScript).  A tool also has a ToolDescriptor, which is a small XML document describing the tool: its name, version, the location of the template, and so on.  (If you invite someone to a workspace, they're usually sent each tool descriptor and the engine data but not the tool template; the descriptor tells the client where to get the template.)

Internally, every component -- every component -- is addressable by URL.  In the case of an Engine, this URL is really important; it's used as the name of a message queue.

 

Engines are where work happens on a tool's data.  Let's take the example of a Forms tool; the tool has an engine called RDB (for "record database"), and RDB knows how to maintain reasonably large data sets with multiple schemas and indexes.  (RDB uses a btree-type structure; maybe Ed can expand on the details sometime, but honestly the details aren't very important.  One change in RDB over its predecessor RSE (recordset engine) is that RSE stored XML, whereas RDB is binary; Jack Ozzie and Jon Udell discussed this a while back.)

The outside world talks to an Engine by sending it transactions, or commands.  So in the Forms tool and the RDB underneath it, there are a few basic transaction commands: "CreateRecord", "DeleteRecord", "SetField".  When a user (at the Groove user interface) saves a Forms record, the View code tells the DataDelegate to create a record with certain field values, and the Engine receives a "CreateRecord" command with those field values inside.  Or, if the user edits an existing record to change a field value, the engine sees one or more "SetField" commands (possibly wrapped inside an outer transaction, so the field changes all apply atomically).
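To make the command idea concrete, here's a toy sketch in Python.  It is not Groove's API: the ToyEngine class, the tuple-shaped commands and the argument names are all made up for illustration.  Only the three command names come from the description above.  It shows an engine that changes its records solely in response to commands, plus an outer transaction that applies a batch of commands atomically.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an engine such as RDB (not Groove's real API)."""

    records: dict = field(default_factory=dict)  # record id -> {field name: value}

    def apply(self, command):
        """Apply one (name, args) command to the record store."""
        name, args = command
        if name == "CreateRecord":
            self.records[args["id"]] = dict(args["fields"])
        elif name == "SetField":
            self.records[args["id"]][args["field"]] = args["value"]
        elif name == "DeleteRecord":
            del self.records[args["id"]]
        else:
            raise ValueError(f"unknown command: {name}")

    def apply_transaction(self, commands):
        """Apply every command, or none of them if any one fails."""
        snapshot = {rid: dict(fields) for rid, fields in self.records.items()}
        try:
            for command in commands:
                self.apply(command)
        except Exception:
            self.records = snapshot  # restore the state from before the batch
            raise


engine = ToyEngine()
engine.apply(("CreateRecord", {"id": "r1", "fields": {"title": "Draft"}}))
engine.apply_transaction([
    ("SetField", {"id": "r1", "field": "title", "value": "Final"}),
    ("SetField", {"id": "r1", "field": "status", "value": "open"}),
])
```

Taking a full snapshot is the crudest possible way to get atomicity; a real engine would presumably lean on its storage layer's transaction log instead.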

Synchronization between workspace members happens at this level:  the transaction (a distributed transaction in Groove is called a Delta).  Dynamics Manager sees the command to the local engine, creates a copy of that command for each other member of the workspace, wraps each copy inside a routing packet (XML), and puts the packets into a queue.  The queue is "targeted" at the same engine URL, but for each other device in the workspace.  Communications Manager can then route the data in the queues (encrypted) across the network to every other member.  Depending on your communications state, the transactions can be sent directly to the receiving device (peer to peer), or be sent to the recipient's relay server to sit in a queue.  (If you have access to a relay server's management UI you can see the queues; the relay admin guide has lots more detail, but unfortunately no screenshots.)  Or you might be completely offline, in which case the transactions sit in a queue on your own machine until they can be offloaded.
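The fan-out can be pictured with a few lines of Python.  Again this is only a sketch under assumptions: fan_out and drain are hypothetical names, a plain dict stands in for the XML routing packet, the engine URL is invented, and encryption and relay servers are left out entirely.  The point is just that each other device gets its own queue, keyed by the engine URL, and that packets wait in the queue until the destination can be reached.

```python
from collections import defaultdict, deque


def fan_out(delta, engine_url, local_device, member_devices, queues):
    """Queue one copy of a delta for every device except the local one."""
    for device in member_devices:
        if device == local_device:
            continue
        # A dict stands in for the XML routing packet (an assumption).
        packet = {"to": device, "engine": engine_url, "delta": delta}
        queues[(device, engine_url)].append(packet)


def drain(queues, reachable):
    """Deliver queued packets for reachable devices; the rest stay queued."""
    delivered = []
    for (device, _engine_url), queue in queues.items():
        if device in reachable:
            while queue:
                delivered.append(queue.popleft())
    return delivered


queues = defaultdict(deque)
url = "engine://forms/rdb"  # hypothetical engine URL, for illustration only
fan_out({"command": "SetField"}, url, "laptop", ["laptop", "desktop", "phone"], queues)
sent = drain(queues, reachable={"desktop"})  # the phone's copy stays in its queue
```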

Meanwhile, Communications Manager is also listening to connections from other devices, and dequeueing transactions from your own relay server, and sending those transactions to Dynamics Manager to tell your tools' Engines that their state must change.  Dynamics Manager is the middleman, and for good reason: it takes care of transaction sequencing.

 

Since you can work offline, there's no guarantee that your machine will receive transactions in anything like the same order as other members of the workspace.  You might create some deltas yourself offline, which then need to be "later" than work done by other members while you were disconnected.  In an extreme (but common) case, for example, you could be working offline for a long period, then drive by a WiFi connection and manage to send a fraction of your outgoing queue data, then disconnect again, do some more work, and reconnect.  So it's entirely possible that other workspace members see a different set of deltas than you do.

Each delta is assigned a sequence number, and Dynamics Manager maintains a table of which deltas are known to you, which of those you know are also known to all other members of the space (so they can be purged from the log), which might be fetchable from other online members, and so on.  If you have deltas A and C, then receive delta B from another member, Dynamics will decide (based on the dependency nesting of the transactions) whether it can just insert B right away, or whether it should tell the engine to undo transaction C, then apply B, then reapply C.
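The undo-C, apply-B, reapply-C case is easier to see in code.  This Python sketch is deliberately simplified and is not how Dynamics Manager is implemented: ToyDynamics is a made-up class, each delta just sets one field, sequence numbers are plain integers assumed to be unique, and it always rolls back later deltas rather than checking dependencies to see whether a straight insert would do.

```python
import bisect

_MISSING = object()  # marks "this field had no value before the delta"


class ToyDynamics:
    """Toy sequencer: keeps deltas in sequence order via undo and redo."""

    def __init__(self):
        self.state = {}  # field name -> current value
        self.log = []    # (seq, field, value, previous value), sorted by seq

    def _apply(self, seq, field, value):
        previous = self.state.get(field, _MISSING)
        self.state[field] = value
        return (seq, field, value, previous)

    def _undo(self, entry):
        _seq, field, _value, previous = entry
        if previous is _MISSING:
            self.state.pop(field, None)
        else:
            self.state[field] = previous

    def receive(self, seq, field, value):
        """Insert a delta at its sequence position; return how many were redone."""
        position = bisect.bisect_left([entry[0] for entry in self.log], seq)
        later = self.log[position:]
        del self.log[position:]
        for entry in reversed(later):      # roll back the newest delta first
            self._undo(entry)
        self.log.append(self._apply(seq, field, value))
        for later_seq, later_field, later_value, _ in later:  # roll forward
            self.log.append(self._apply(later_seq, later_field, later_value))
        return len(later)


replica = ToyDynamics()
replica.receive(1, "title", "Plan")    # delta A
replica.receive(3, "status", "final")  # delta C arrives before B
redone = replica.receive(2, "status", "draft")  # B: undo C, apply B, reapply C
```

A second replica that receives the same three deltas in order 1, 2, 3 ends up with the same state and the same log order, which is the property the sequencing is there to provide.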

The magic of this is twofold.  First, the Dynamics Manager sequencing doesn't care about the actual data, or the implementation of the particular Engine; it's only concerned with the transaction sequencing and dependencies, and filling in any gaps in your transaction history.  Second, everything happens completely transparently to the user (even if there's a fairly large rollback and rollforward -- you might notice a performance glitch there, but big rollbacks are very rare), and the result is that all members of the workspace will always see a consistent transaction sequence, and they'll all see the same sequence (eventually).

 

So, back to the Forms tool.  The RDB engine underneath Forms manages distributed transactions (insert, update, delete) against records, and other things such as schema changes, within your tool.  If a record changes because another user changed it, the engine will assimilate that transaction, then notify the View to update the display -- immediately, if the user's working in that tool at the time.
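The "assimilate, then notify the View" step is a plain observer pattern, which might look something like this in Python.  The ObservableEngine class and its subscribe and assimilate methods are hypothetical names for illustration, not Groove interfaces.

```python
class ObservableEngine:
    """Toy engine that tells its listeners whenever a record changes."""

    def __init__(self):
        self.records = {}     # record id -> {field name: value}
        self._listeners = []  # callbacks taking (record_id, origin)

    def subscribe(self, callback):
        """Register a view callback to be called after each change."""
        self._listeners.append(callback)

    def assimilate(self, record_id, fields, origin):
        """Merge field values into a record, then notify every listener."""
        self.records.setdefault(record_id, {}).update(fields)
        for callback in self._listeners:
            callback(record_id, origin)


refreshed = []
engine_with_view = ObservableEngine()
engine_with_view.subscribe(lambda record_id, origin: refreshed.append((record_id, origin)))
engine_with_view.assimilate("r1", {"title": "Final"}, origin="remote member")
```

The view never needs to know whether a change was made locally or arrived from another member; it just redraws whatever record the engine says has changed.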

Now, not all data in Groove is always disseminated.  Files in a Files tool are one example; the "stub" of the file is always disseminated, but the file contents can be fetched on demand if appropriate.  The Forms Designer Sandbox is another example: if you open the forms designer to make changes in the tool (adding or modifying fields, forms, views), those changes are only stored in your local device until you hit the "Save to Groove" button.

The end result is that users can work with your forms, creating, reading and updating rich documents, and Groove takes care of all the synchronization stuff to ensure that everyone has a consistent view of the world.  (Should I talk about conflicts at this point?  Naah, never happens.  Or rather, with most Forms applications conflicts are rare enough that I can leave that for another day.  Remind me later.)

Published Friday, July 22, 2005 11:36 AM by hpyle
Comments

# Furthur!

Friday, September 09, 2005 3:39 PM by hughpyle
Here's a recap and subject-index of the story so far.

Getting started with Groove Forms; the component...
