Please comment on the following proposal…

Some kinds of data, such as exercise sample data or genetic test data, are very large. Storing that data as XML involves considerable overhead, so it is instead stored in a comma-separated format in the other-data section of a thing instance.

This document describes how the OtherItemData class and the CSV format are used to store such data.

Goals

The format should be:

  1. Simple to understand and parse
  2. Space efficient
  3. Extensible, so that out-of-band data can be carried in with the normal data.

Basic format

The format is merely a comma-separated list of items, with optional escape values. The list is inherently a one-dimensional series of string values, though each thing type may interpret that list differently.

Consider the following list:

15,333,999,10,00,399

If this is heart rate sample data, it might represent 6 individual samples. If it is GPS location data, it might represent 3 latitude/longitude pairs. The specifics of such interpretation can be found in the documentation for a specific thing type.
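
As an illustration of how one list can be read two ways, here is a minimal C# sketch. The class and method names (FlatListExample, AsSamples, AsPairs) are hypothetical and not part of any SDK, and the sketch assumes a list that contains only integers, with no escapes or special characters.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class FlatListExample
{
    // Reads every item as one integer sample.
    public static List<int> AsSamples(string csv)
    {
        var samples = new List<int>();
        foreach (string item in csv.Split(','))
        {
            samples.Add(int.Parse(item, CultureInfo.InvariantCulture));
        }
        return samples;
    }

    // Reads consecutive items as (first, second) pairs.
    public static List<KeyValuePair<int, int>> AsPairs(string csv)
    {
        string[] items = csv.Split(',');
        if (items.Length % 2 != 0)
        {
            throw new FormatException("Expected an even number of items.");
        }

        var pairs = new List<KeyValuePair<int, int>>();
        for (int i = 0; i < items.Length; i += 2)
        {
            pairs.Add(new KeyValuePair<int, int>(
                int.Parse(items[i], CultureInfo.InvariantCulture),
                int.Parse(items[i + 1], CultureInfo.InvariantCulture)));
        }
        return pairs;
    }
}
```

Called on the list above, AsSamples returns six values and AsPairs returns three pairs; which reading is correct depends entirely on the thing type.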

Escapes

It is useful to be able to insert out-of-band data (that is, data that is not part of the list of values) in the comma-separated information to indicate that the data that follows has a different set of assumptions. For example, an escape might be used to indicate a change in sampling frequency or to provide a value that is common in a batch of data.

Escapes use a simple “name=value” format. For example:

15,1333,interval=555,33,22,11

Denotes the following list:

15
1333
33
22
11

With an escape before 33 (that is, at index 2).

The name and value may include any character with the exception of “=”.
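
A reader that separates escapes from values might look like the following sketch. The Escape and EscapeParser types are hypothetical, invented for this illustration. The sketch treats any item containing “=” as an escape, records the index of the value it precedes, and does not handle the rules for special characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: one name=value escape and where it sits in the list.
public sealed class Escape
{
    public int Index { get; set; }      // index of the value the escape precedes
    public string Name { get; set; }
    public string Value { get; set; }
}

// Hypothetical helper, for illustration only.
public static class EscapeParser
{
    // Returns the list values; escapes found along the way are appended to 'escapes'.
    public static List<string> Parse(string csv, List<Escape> escapes)
    {
        var values = new List<string>();
        foreach (string item in csv.Split(','))
        {
            int eq = item.IndexOf('=');
            if (eq < 0)
            {
                values.Add(item);
            }
            else
            {
                escapes.Add(new Escape
                {
                    Index = values.Count,
                    Name = item.Substring(0, eq),
                    Value = item.Substring(eq + 1)
                });
            }
        }
        return values;
    }
}
```

For the example list, this yields the five values 15, 1333, 33, 22, 11 and one escape named “interval” with value “555” at index 2.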

Restrictions and special characters

Because the “,” and “=” characters are used in the format, they cannot be used directly in a CSV list, but are expressed as follows:

Literal “,” in a string value:

Literal commas are expressed by enclosing the entire list item in double quotes. For example:

15,133,"12,13",15

Encodes the following list:

15
133
12,13
15

Literal “=” in a string value:

A literal equals sign is expressed by doubling it in the string value. For example:

15,133,12==13,15

Encodes the following list:

15
133
12=13
15
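
The two rules can be sketched in C# as follows. The SpecialCharacters class and its method names are hypothetical. The sketch also makes assumptions the proposal does not spell out: double quotes appear only as the enclosure around a whole item, doubling of “=” applies inside quoted items as well, and an item that still contains a single undoubled “=” is an escape.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class SpecialCharacters
{
    // Encodes one literal value as a list item: doubles "=" and
    // quotes the whole item if it contains a comma.
    public static string EncodeItem(string value)
    {
        string item = value.Replace("=", "==");
        return item.Contains(",") ? "\"" + item + "\"" : item;
    }

    // Splits a list on commas that are outside double quotes and drops
    // the enclosing quotes. Doubled equals signs are left in place.
    public static List<string> SplitItems(string csv)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        foreach (char c in csv)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // assumed to be an enclosure, so not kept
            }
            else if (c == ',' && !inQuotes)
            {
                items.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        items.Add(current.ToString());
        return items;
    }

    // True when an undoubled "=" remains, marking a name=value escape.
    public static bool IsEscape(string item)
    {
        return item.Replace("==", "").Contains("=");
    }

    // Converts a value item back to its literal text.
    public static string DecodeValue(string item)
    {
        return item.Replace("==", "=");
    }
}
```

Splitting first and undoubling second keeps the two rules independent: SplitItems only has to know about commas and quotes, and the “=” handling works on one item at a time.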

OtherItemData class

The other item data section of the thing type (expressed as the OtherItemData class in the .NET library) is used to store the CSV data and to specify its type. Its properties should be set as follows:

Content type:

The content type is set to the following:

text/x-hvcs[1]

The “x-hvcs” tag marks this data as being expressed in the HealthVault comma-separated format.

Content encoding:

The content encoding is set to one of the following three values:

x-deflate-base64[2]
x-gzip-base64
x-plain

“x-plain” indicates that the data is stored directly as text.

“x-gzip-base64” indicates that the data is compressed using GZIP (RFC 1952, GZipStream class in .NET) and then encoded using base64 encoding (RFC 2045, Convert class in .NET).

“x-deflate-base64” indicates that the data is compressed using deflate (RFC 1951, DeflateStream class in .NET) and then encoded using base64 encoding (RFC 2045, Convert class in .NET).
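
The three encodings could be produced and consumed roughly as in the sketch below, which uses the GZipStream, DeflateStream, and Convert classes named above. The ContentEncodingSketch class is hypothetical. The proposal does not say how the text is turned into bytes before compression, so the use of UTF-8 here is an assumption.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ContentEncodingSketch
{
    // Converts comma-separated text into the stored form for the given encoding.
    public static string Encode(string csv, string contentEncoding)
    {
        if (contentEncoding == "x-plain")
        {
            return csv;
        }

        byte[] raw = Encoding.UTF8.GetBytes(csv);   // UTF-8 is an assumption
        using (var buffer = new MemoryStream())
        {
            // The compression stream must be closed before the buffer is read,
            // otherwise the final compressed block is not written.
            using (Stream compressor = CreateStream(buffer, contentEncoding, CompressionMode.Compress))
            {
                compressor.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Converts stored data back into comma-separated text.
    public static string Decode(string data, string contentEncoding)
    {
        if (contentEncoding == "x-plain")
        {
            return data;
        }

        using (var input = new MemoryStream(Convert.FromBase64String(data)))
        using (Stream decompressor = CreateStream(input, contentEncoding, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            decompressor.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    private static Stream CreateStream(Stream inner, string contentEncoding, CompressionMode mode)
    {
        switch (contentEncoding)
        {
            case "x-gzip-base64":
                return new GZipStream(inner, mode, true);
            case "x-deflate-base64":
                return new DeflateStream(inner, mode, true);
            default:
                throw new ArgumentException("Unknown content encoding: " + contentEncoding);
        }
    }
}
```

Decode(Encode(csv, encoding), encoding) should return the original text for each of the three encoding names.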

Data:

The data property stores the comma-separated data.

Compressed or plain text?

Choose between plain text and one of the compressed encodings based on the size of the final data. Small amounts of data (under 512 bytes) are generally smaller in plain format, while large amounts (over 2K bytes) are generally smaller when compressed and encoded.
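
Rather than relying on a fixed threshold, a writer could simply compare the two sizes and keep the smaller one. The sketch below does that for the gzip option only. The EncodingChooser class is hypothetical, and measuring the plain text as UTF-8 bytes is an assumption.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingChooser
{
    // Returns the content encoding to use and, via 'data', the text to store.
    public static string Choose(string csv, out string data)
    {
        string compressed = GzipBase64(csv);

        // Base64 output is ASCII, so its character count equals its byte count.
        if (compressed.Length < Encoding.UTF8.GetByteCount(csv))
        {
            data = compressed;
            return "x-gzip-base64";
        }

        data = csv;
        return "x-plain";
    }

    private static string GzipBase64(string csv)
    {
        byte[] raw = Encoding.UTF8.GetBytes(csv);
        using (var buffer = new MemoryStream())
        {
            // Close the GZipStream before reading the buffer so all data is flushed.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }
}
```

The cost of this approach is that every write compresses the data once, even when the plain form ends up being stored.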

.NET Library

There will very likely be a library in the .NET SDK that encapsulates the type, so a user would write this sort of code:

using System.Collections.Generic;
using System.Collections.ObjectModel;
using NUnit.Framework;

[TestFixture]
public class OtherDataHelperTests
{
    [Test]
    public void TestRoundTrip()
    {
        // Save side: build the list of values and store it as other data.
        OtherDataHelperMock helperSave = new OtherDataHelperMock();

        List<string> valuesSave = new List<string> { "A", "b", "iii" };

        helperSave.SetOtherData(valuesSave);
        // Compression doesn't happen on small amounts of data; uncomment to force it.
        //helperSave.ForceCompressAndEncode();

        // Load side: hand the stored OtherItemData to a fresh helper and parse it.
        OtherDataHelper helperLoad = new OtherDataHelper();

        helperLoad.OtherItemData = helperSave.OtherItemData;
        //helperLoad.DecodeAndDecompress();

        Collection<string> valuesLoad = helperLoad.ParseAsString();

        // The loaded values should match what was saved, in order.
        Assert.AreEqual(3, valuesLoad.Count);
        Assert.AreEqual("A", valuesLoad[0]);
        Assert.AreEqual("b", valuesLoad[1]);
        Assert.AreEqual("iii", valuesLoad[2]);
    }
}

[1] While the content type “text/csv” is commonly used to indicate the comma-separated-value type, that gives insufficient information as we need to indicate what kind of CSV is being stored. The “x token” approach is therefore used to define an application-specific name.

[2] According to the RFCs on MIME types, there is no standardized way of specifying that a string of characters is first compressed using gzip or deflate and then encoded as text (and, in fact, the RFC discourages the creation of new encodings). The content encoding type is therefore expressed using the “x token” syntax to indicate that this method is application specific, while using a name that should be informative to most developers.