Hi, my name is George Mileka. I’m a developer on the C++ Libraries team. I have been working on the Project Code Name Austin for many months with Jorge, Eric, and Alan. To learn more about what the Project Code Name Austin is, you can read this great post by Jorge Pereira.
For Project Austin, we have used ESE (Extensible Storage Engine) as the storage engine. In this blog post, I will explain why we chose ESE, how ESE works, and finally what abstractions we have created around it for our own use.
When we started thinking about how to store the pages, the strokes, and the photos in the notebook, we came to realize we need to support the following scenarios:
Taking these requirements into consideration, we realized that implementing our own layer would be a significant amount of work in terms of time and complexity.
At the same time, Jorge pointed out that we can use a database engine since it really does everything we are looking for.
So – it is going to be a database engine. But which database engine?
We needed a database engine that is accessible from Windows Store apps and has a relatively easy deployment story. For the initial release, we have decided to not support remote storage or sync’ing across devices to scope the work.
A few database engines came to mind:
Talking to the ESE team, it turns out that they are committed to having their engine work for Windows Store applications. Also, the fact that ESE is already in box (Windows maintains and services it), made it more attractive to us over SQL Lite (we had a pre-determined set of queries to run, so that lack of a query processor was not a huge minus). So, we decided to use ESE.
In this section I will explain the high-level concepts in ESE. The most complete source for documentation is, of course, MSDN. ESE interface is a flat C API. Whether you want to create tables, define indexes, insert data, update data, or run queries, you do that by filling a data structure and passing it to the appropriate API.
We create a table by passing an instance of JET_TABLECREATE4 to JetCreateTableColumnIndex4 JET_TABLECREATE4 describes the table and its columns and indexes. Columns are described by an array of JET_COLUMNCREATE instances and indexes are described by an array of JET_INDEXCREATE3 instances.
Writing data (inserting or updating) is done by building an array of type JET_SETCOLUMN where each element points to the value to be written as well as the column the value should be written to. Then, we pass the array to the appropriate ESE APIs (see JetPrepareUpdate/JetSetColumns/JetUpdate2).
In case of updates, ESE uses a cursor to decide where to apply the new set of values. So, we need to position the cursor at the right record (or records), and then invoke the writing API.
Note that we can build an array of only the fields we want to target.
In case of inserts, positioning the cursor is not necessary since it is going to be a new record.
Reading, updating, and deletion are common operations that we sometimes apply to a specific row or set of rows that share some criteria.
ESE APIs allows us to achieve this by first defining a range, and then iterating through it.
Keys
To define a range, we first need to understand the concept of a key. We can think of a key as a fixed pointer to a specific record in a table. The pointer is set by specifying values for one or more columns -where those columns are part of an index on that table.
For example, if we have:
Then, a key can be defined as
Partial Keys
As mentioned above, the pointer is set by specifying values for one or more columns - where those columns are part of an index on that table.
A key is considered partial if it does not specify values for all the columns of an index. If the key partial, matching happens against the specified columns only.
In the above example, we can define a partial key as follows:
One important aspect here is that for an index (c1, c2, c3), we can have partial keys by omitting values for columns on the right only.
In other words, the following are valid (partial) keys:
The following are not valid keys:
Ranges
Now that we can define keys, we just need to define two of keys to define a range. In some cases, the criteria might be the same for both keys except that we want one key to point to the first row of the set, while we want the other key to point to the last row of the same set (for example, “get me all students with age=15”).
ESE allows us to make this distinction when we are creating the key. We basically tell it whether we want the key to be a start key (JET_bitFullColumnStartLimit) or an end key (JET_bitFullColumnEndLimit).
Cursors
A cursor is a pointer to a row targeted by the next operation ESE row-specific operation. By setting the keys (above), the cursor is also set to the same row as the start key. The cursor can then be moved by calling JetMove.
ESE Range APIs
To define the ranges as described above, the user needs to do the following:
Iterating involves the moving of the cursor (JetMove) from one record to the next in the returned result set. The cursor knows the start from the start key, and knows the end from the end key. Note that both the cursor and the result set are not exposed to the user.
Typically, we start by defining the range as described above, and iterate on the result using JetMove. The following are examples of row-specific operations.
We can retrieve the values of the current row by calling JetRetrieveColumn for each column we want to read.
We update values in the current row by calling JetPrepareUpdate / JetSetColumns / JetUpdate2.
We delete the current record by calling JetDelete.
ESE supports access to the same database from multiple threads. The recommendation on MSDN is to create a session for each thread. This becomes handy when submitting transactions form multiple threads as the session id is what is used to associate the commit transaction to the matching begin transaction. Also, all calls in between are associated through the same session id.
In the Austin project, since all our begin/commit transaction pairs happen in the same function, it is sufficient to make those functions non-re-entrant and avoid race conditions. This means that Austin calls into ESE from different threads will be serialized. This simplifies our thread model significantly – however, should we ever need to avoid serializing ESE access, we’ll need to change this part of the implementation and create a session for each thread.
As mentioned earlier in this blog post, ESE interface is flat C API. Below I'm listing some of the reasons that made us create a higher level abstraction on top of ESE flat C API:
So, we ended up creating an engine class which wraps the functionality of the ESE APIs. We also created a set of classes that represent tables, columns, indexes, and cells. Along with satisfying the requirements outlined above, the engine is capable of interacting with those data structures to perform various operations (like creating tables, inserting/updating data, finding data, data deletion, etc).
Creating tables follows the same logical sequence ESE defines.
std::wstring name = L"Photo";std::shared_ptr<s::itable> table = s::createTable(name);
std::shared_ptr<s::icolumn_typed<b::uint32>> photoIdColumn;photoIdColumn = s::createColumn<b::uint32>(L"ID", s::column_type::type_int32);table->addColumn(photoIdColumn);
Note: There are definitely lots of improvements to be made. For example, we should not be passing column_type::type_int32 to the createColumn function since the template parameter already has this information.
std::shared_ptr<s::iindex_column> activeIndexColumn; activeIndexColumn = s::createIndexColumnDefinition(_activeColumn, true); std::shared_ptr<s::iindex> activeIndex; activeIndex = s::createIndex(L"ActivePhoto_Index", false, false); activeIndex->addIndexColumn(_activeIndexColumn); table->addIndex(activeIndex);
While table, column, and index objects hold definitions – the cell object is an instance of a column value in the database.
The only way to create a cell is through a column. A cell holds the same type as that of the column which created it.
std::shared_ptr<s::icell_typed<b::uint32>> photoIdCell; photoIdCell = photoIdColumn->createCell(0); // initialize with 0
Whenever there is an operation that requires passing values to the engine, we use a set of pre-created cells and populate them with the values. Then, this set is passed to the engine which knows how to traverse them, identify the parent columns, and translate the data structure to ESE C-structures before making the API calls.
void photo_table::insert(b::uint32 photoId, b::uint32 pageId, bool active) { _photoIdCell->setValue(photoId); _pageIdCell->setValue(pageId); _activeCell->setValue(active); std::shared_ptr<std::vector<std::shared_ptr<s::icell>>> row; row->push_back(_photoIdCell); row->push_back(_pageIdCell); row->push_back(_activeCell); table->insert(row); }
To build a query, we need to specify which fields we are interested in (SELECT c1, c2), and then specify which criteria (set of values) need to be satisfied in the returned rows.
// select columns we are interested in... std::shared_ptr<s::iresult> result = s::createResult(); result->addColumn(_photoIdColumn); result->addColumn(_pageIdColumn); result->addColumn(_activeColumn); // set staging row entries as search criteria and execute… _photoIdCell->setValue(photoId); _pageIdCell->setValue(pageId); std::shared_ptr<std::vector<std::shared_ptr<s::icell>>> row; row->push_back(_pageIdCell); row->push_back(_photoIdCell); // note : omitting field would have // resulted in a partial key. // execute table->read(row, result); // where values = those in row
The result is just a two dimensional array. One dimension is a vector of rows (iresult_row), and the second dimension is a vector of cells (icell). To read the content, we iterate through each row, and through each cell. For each cell, we identify which column it belongs to and cast it to extract the value.
// iterator through the result... std::shared_ptr<std::vector<std::shared_ptr<s::iresult_row>>> rows = result->getRows(); for (auto &row : *rows) { std::shared_ptr<std::vector<std::shared_ptr<s::icell>>> cells = row->getCells(); for (auto &cell : cells) { if( cell->getColumnId() == _photoIdColumn->getId() ) { std::shared_ptr<s::icell_typed<b::uint32>> typedCell = std::dynamic_pointer_cast<s::icell_typed<b::uint32>>(cell); b::uint32 readPhotoId == typedCell->getValue() } . .
You can see how we implemented this functionality by looking at the source code files. The source code is available for download on our CodePlex page.
The following folders have the files relevant for this post:
We have an improved version of the storage wrapper along with a minimal client application to demonstrate the functionality. We have uploaded the sources to prototypes/storage. It is much easier to understand how the storage works by looking at those files.