NoSQL databases are often employed in public, massively scaled Web site scenarios, where fast fetching of relatively simple data sets matters most.
Relational databases get the nod for transactional, atomic writes, indexing of non-key columns, query optimizers, and declarative, set-oriented queries.
NoSQL databases provide some or all of the following features:
This post describes the main features of NoSQL, provides some general guidance on when to use NoSQL, and how you can get started using NoSQL on Windows Azure. I’ll go in depth on how you can use MongoDB and sones GraphDB in your Azure application, and explain how you can get started with those technologies. I’ll also explain how two Azure offerings fit some NoSQL traits.
Most NoSQL databases feature key-value mechanisms. A key-value pair might consist of a key like “Phone Number” that is associated with a value like “(212) 555-1212.”
Key-Value Stores can be used as collections, dictionaries, associative arrays and caches. They work well for lists, such as product categories, individual product attributes, or shopping cart contents, and for individual values such as color schemes, a landing page URI, or a default account number.
Values can consist of long text content, not just numeric and short string data. As such, content like comments, reviews, status messages or even private emails can be stored in a Key-Value Store. And values can be described as fields, and each value can have completely different fields.
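As a rough illustration of the points above, here is a minimal sketch that uses a plain Python dictionary as a stand-in for a Key-Value Store. The key names and the get helper are made up for this example and do not belong to any particular NoSQL product.

```python
# Illustrative key-value store backed by a plain dict (not a real NoSQL client).
store = {}

# Individual values: a single setting or URI.
store["landing_page_uri"] = "/home"
store["color_scheme"] = "dark"

# A list value, such as shopping cart contents.
store["cart:alice"] = ["sku-1001", "sku-2040"]

# Values described as fields; each value may carry completely different fields.
store["review:1"] = {"author": "alice", "text": "Works as described.", "stars": 4}
store["review:2"] = {"author": "bob", "text": "Arrived late.", "photo_url": "/img/42.png"}


def get(key, default=None):
    """Fetch a value by its key, the basic lookup path of a key-value store."""
    return store.get(key, default)
```

Note that the two review values share some fields but not others, and nothing had to be declared before they were stored.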
Document Stores are NoSQL databases that treat “records” or “rows” as “documents.”
Documents themselves can be addressed by unique URLs, which makes document databases automatically REST-friendly.
HTTP and application orientation distinguish Document Stores from Key-Value Stores.
Wide Column Stores, also known as Column Family Stores, manage key-value pairs, but they organize their storage in a semi-schematized and hierarchical pattern.
Some of the Wide Column Store nomenclature is similar to RDBMS technology. For example, the keys in a Wide Column Store are referred to as columns and are stored in structures that are sometimes referred to as tables. Between the table and the column level lie various intermediate structures that vary depending on your vendor.
Although the schema within the intermediate structures can vary from row to row, tables and the intermediate structures themselves must be declared. Wide Column Stores, while they tolerate schema variation at the “leaf” column level, are not completely schema-free.
As an example, in a product catalog, we may have a collection of items, each of which has a size and a rating associated with it, and we may want to store these items together in a table.
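The product catalog example can be sketched in a few lines of Python. This is only a toy model of the idea, assuming a single kind of intermediate structure (called a column family here) with two hypothetical families, "details" and "feedback". Real Wide Column Stores differ in their terminology and APIs.

```python
from collections import defaultdict

# Declared up front: the intermediate structures (column families) of the table.
COLUMN_FAMILIES = {"details", "feedback"}

# The table itself: items[row_key][family][column] = value
items = defaultdict(lambda: {family: {} for family in COLUMN_FAMILIES})


def put(row_key, family, column, value):
    """Store one column value; the family must have been declared."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"undeclared column family: {family}")
    items[row_key][family][column] = value


put("item-1", "details", "size", "M")
put("item-1", "feedback", "rating", 4)
put("item-2", "details", "size", "XL")
put("item-2", "details", "color", "red")  # a column that exists only on this row
```

Columns vary freely from row to row, but writing to a family that was never declared fails, which mirrors the "not completely schema-free" point above.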
Graph databases recognize entities in a business or other domain, and explicitly track the relationships between them. In the graph database world, these entities are called nodes and the relationships between them are called edges; all of these terms come from mathematical graph theory.
New edges can be added (or old ones removed) at any time, allowing one-to-many and many-to-many relationships to be expressed easily and avoiding anything like an intermediate relationship table that you might use in a relational database to accommodate many-to-many joins.
Constructs like friends, followers, degrees of separation, lists, endorsements, status messages and responses to them are very naturally accommodated in graph databases. Semantic Web data also maps quite nicely on to the graph database structure.
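To make the node and edge vocabulary concrete, here is a small Python sketch of a friends graph stored as adjacency sets. The function names and the sample people are invented for illustration, and a real graph database would offer its own query language for this kind of traversal.

```python
from collections import defaultdict

# Each node maps to the set of nodes it shares a "friend" edge with.
friends = defaultdict(set)


def add_friendship(a, b):
    """Add an undirected edge between two nodes."""
    friends[a].add(b)
    friends[b].add(a)


def remove_friendship(a, b):
    """Remove the edge between two nodes, if present."""
    friends[a].discard(b)
    friends[b].discard(a)


def friends_of_friends(person):
    """Return nodes two edges away that are not already direct friends."""
    direct = set(friends[person])
    reachable = set()
    for friend in direct:
        reachable |= friends[friend]
    return reachable - direct - {person}


add_friendship("ann", "bob")
add_friendship("bob", "cy")
add_friendship("cy", "dee")
```

With this data, friends_of_friends("ann") finds cy through bob. Adding or removing an edge is a single call, with no intermediate relationship table to maintain.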
Shared Legacy: MapReduce, Hadoop, BigTable and HBase. NoSQL databases often require queries to be broken up and executed across multiple repositories on different servers. At some point, the resulting segmented result sets need to be collected and unified. An approach called map-reduce acknowledges and addresses this. Specifically, the process of distributing the query across multiple agents is the Map step, and the process of coalescing the results into a single result set is the Reduce step.
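The Map and Reduce steps described above can be sketched in plain Python. This is a single-process illustration under simple assumptions: the three "shards" are just lists, the data is made up, and the map step runs sequentially here where a real system would run it on separate servers.

```python
from functools import reduce

# Three hypothetical shards, each holding part of an orders data set.
shards = [
    [{"product": "widget", "qty": 2}, {"product": "gadget", "qty": 1}],
    [{"product": "widget", "qty": 5}],
    [{"product": "gadget", "qty": 3}, {"product": "gizmo", "qty": 7}],
]


def query_shard(orders):
    """Map step: run the same query against one shard, giving a partial result."""
    totals = {}
    for order in orders:
        totals[order["product"]] = totals.get(order["product"], 0) + order["qty"]
    return totals


def merge(left, right):
    """Reduce step: coalesce two partial results into one."""
    merged = dict(left)
    for product, qty in right.items():
        merged[product] = merged.get(product, 0) + qty
    return merged


partials = [query_shard(shard) for shard in shards]  # one partial result per shard
result = reduce(merge, partials, {})                 # single unified result set
```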
NoSQL Database Consistency: Many NoSQL databases use an “eventual consistency” model for database updates and schema changes. This means that changes made at one replica will be transmitted asynchronously to the others. That said, not all NoSQL databases use eventual consistency. Some are fully transactional. Others use an optimistic concurrency model.
NoSQL Indexing: Some NoSQL databases index on little other than the keys used for rows/entities/documents and/or partitions. Others go a bit beyond this.
Instead of storing data in tables as is done in a "classical" relational database, MongoDB stores data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON).
MongoDB has databases, collections, and indexes much like a traditional relational database. In some cases (databases and collections) these objects can be implicitly created; however, once created they exist in a system catalog (db.systems.collections, db.system.indexes).
In MongoDB you do not need to define fields, or what relational databases call columns, in advance. There is no schema for fields within documents: the fields and their value datatypes can vary. In practice, you typically would store documents of the same structure within a collection.
The collection itself does not need to be defined in advance; the database creates it on the first insert. When you insert a document, MongoDB assigns it an object ID.
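The behavior described above (no predeclared fields, collections created on first insert, an ID assigned on insert) can be modeled with a small in-memory class. To be clear, this is a stand-in written for this post and not the MongoDB driver API: the TinyDatabase class and its insert method are hypothetical, and the ID here is a random hex string, not a real MongoDB ObjectId.

```python
import uuid


class TinyDatabase:
    """In-memory stand-in for a document database; not the MongoDB driver API."""

    def __init__(self):
        self.collections = {}

    def insert(self, collection_name, document):
        # The collection is created implicitly on the first insert.
        collection = self.collections.setdefault(collection_name, [])
        stored = dict(document)
        # Assign an ID when the caller did not supply one (MongoDB keeps it in _id).
        stored.setdefault("_id", uuid.uuid4().hex)
        collection.append(stored)
        return stored["_id"]


db = TinyDatabase()
first_id = db.insert("products", {"name": "T-shirt", "size": "M"})
second_id = db.insert("products", {"name": "Mug", "capacity_ml": 350})  # different fields
```

Both documents land in the same collection even though their fields differ, and neither the collection nor the fields were declared beforehand.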
You can run a MongoDB replica set on Windows Azure. Replica set members run as Azure worker role instances. MongoDB data files are stored in an Azure Blob mounted as a cloud drive. You can use any MongoDB driver to connect to the MongoDB server instance.
Microsoft provides a tutorial for how to use MongoDB on Azure. In Node.js Web Application with Storage on MongoDB, you will learn how to:
Publish your MongoDB Node.js application to Windows Azure.
See SQL to Mongo Mapping Chart for more examples.
For more information on MongoDB, see MongoDB on Azure. You’ll find out more about setting it up, building an application on Azure, and deploying and running it.
According to its website, sones GraphDB is the first graph database available on Microsoft Windows Azure. Since sones GraphDB is written in C# and based upon Microsoft .NET, it can run as an Azure Service in its natural environment.
The sones GraphDB is an object-oriented graph data store for large amounts of highly connected, semi-structured data in a distributed environment. In contrast to classical relational databases, and also to purely object-oriented databases, this implies two very important consequences. First, its main focus is no longer the data, objects or vertices themselves, but their (type-safe) interconnections or edges. This means we are interested in the name of a user within a large-scale social network, but we are much more interested in knowing which films his friends' friends watched last summer and thought were amazing. In the near future we will provide a large framework of graph algorithms for these problems and usage scenarios.
For more information and to get started, see sones GraphDB Wiki and Documentation.
Andrew Brust wrote a paper for Microsoft entitled NoSQL and the Windows Azure platform -- Investigation of an Unlikely Combination, available from Microsoft Download.
In the paper, Andrew makes the case that Azure Table Storage is in fact a NoSQL database. Of the various categories of NoSQL database discussed in the last section, Azure Table Storage fits most snugly with Key-Value Stores.
Azure Storage key-value pairs are called Properties; they belong to Entities which, in turn, are organized into so-called Tables. Azure Table Storage features optimistic concurrency and, as with other NoSQL databases, is schema-free, so the properties of each entity in a table may differ.
The Windows Azure Table service is structured storage in the cloud. An application may create many tables within a storage account. A table contains a set of entities (rows). Each entity contains a set of properties. An entity can have at most 255 properties including the mandatory system properties - PartitionKey, RowKey, and Timestamp. "PartitionKey" and "RowKey" form the unique key for the entity.
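The entity model described above can be sketched as a small in-memory class. This is a hedged illustration only: the Table class and its insert method are hypothetical and are not the Windows Azure storage client API. The sketch simply enforces the two rules stated in the text, the (PartitionKey, RowKey) unique key and the 255-property ceiling.

```python
from datetime import datetime, timezone

MAX_PROPERTIES = 255  # includes PartitionKey, RowKey, and Timestamp


class Table:
    """In-memory model of a table of entities; not the Azure storage client API."""

    def __init__(self):
        self.entities = {}

    def insert(self, partition_key, row_key, **properties):
        # PartitionKey and RowKey together form the unique key for the entity.
        key = (partition_key, row_key)
        if key in self.entities:
            raise KeyError(f"entity already exists: {key}")
        entity = {
            "PartitionKey": partition_key,
            "RowKey": row_key,
            "Timestamp": datetime.now(timezone.utc),
            **properties,
        }
        if len(entity) > MAX_PROPERTIES:
            raise ValueError("an entity can have at most 255 properties")
        self.entities[key] = entity
        return entity


catalog = Table()
catalog.insert("shirts", "item-1", size="M", rating=4)
catalog.insert("shirts", "item-2", color="red")  # different properties, same table
```

Because three of the 255 properties are taken by the system properties, an entity in this model has room for at most 252 properties of its own.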
For a tutorial on using Windows Azure Storage, see Windows Azure and SQL Azure Tutorials - Tutorial 1: Using Windows Azure Web Role and Windows Azure Table Service on TechNet. In the tutorial, you will learn how to:
Also see Windows Azure Table Storage – Not Your Father’s Database in MSDN Magazine.
For a tutorial about how Windows Azure Storage works with PHP, see Tutorial - Using Table Storage.
Cihan Biyikoglu, in his blog post The “NoSQL” Gene in SQL Azure Federations, describes how the tenets of NoSQL apply in SQL Azure Federations.
Federations in SQL Azure are a way to achieve greater scalability and performance from the database tier of your application through horizontal partitioning. One or more tables within a database are split by row and portioned across multiple databases (Federation members). This type of horizontal partitioning is often referred to as ‘sharding’. The primary scenarios in which this is useful are where you need to achieve scale, performance, or to manage capacity.
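The idea of splitting rows across federation members by key range can be sketched as follows. This is a toy model, not SQL Azure syntax or behavior: the Federation class, its method names, and the integer keys are all assumptions made for illustration, and keys are assumed to be non-negative.

```python
import bisect


class Federation:
    """Toy range-based sharding; assumes non-negative integer keys."""

    def __init__(self):
        # boundaries[i] is the lowest key served by members[i]; kept sorted.
        self.boundaries = [0]
        self.members = [{}]  # each member maps key -> row

    def member_for(self, key):
        """Route a key to the member whose range contains it."""
        index = bisect.bisect_right(self.boundaries, key) - 1
        return self.members[index]

    def put(self, key, row):
        self.member_for(key)[key] = row

    def get(self, key):
        return self.member_for(key).get(key)

    def split_at(self, boundary):
        """Repartition: move rows with key >= boundary into a new member."""
        if boundary in self.boundaries:
            raise ValueError(f"a member already starts at {boundary}")
        index = bisect.bisect_right(self.boundaries, boundary) - 1
        old = self.members[index]
        new = {k: v for k, v in old.items() if k >= boundary}
        for k in new:
            del old[k]
        self.boundaries.insert(index + 1, boundary)
        self.members.insert(index + 1, new)


orders = Federation()
orders.put(10, "order for customer 10")
orders.put(150, "order for customer 150")
orders.split_at(100)  # customers 100 and above now live in a second member
```

Callers keep using put and get with the same keys before and after the split; only the routing changes, which is the property that lets capacity grow without changing the application's data access code.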
A SQL Azure database can deliver scale, performance, and additional capacity through federation, and can do so dynamically with no downtime; client applications can continue accessing data during repartitioning operations with no interruption in service.
Cihan submits that SQL Azure Federations embody many of the principles of NoSQL through support for the following NoSQL ideas:
Scale-out for Massive Parallelism. Federations provide the ability to take advantage of the full computational power of a cluster to parallelize processing. By federating your workload, atomic-unit focused work (a.k.a. OLTP work to many of the SQL-minded folks), such as “placing an order” or “shopping cart management”, gets parallelized to scale to massive concurrent user load… There is little coordination between nodes needed, thus the full power of the cluster is focused on processing the user workload.
Loosened Consistency or Eventual Consistency. With federations, each federation member and atomic unit provide the familiar local consistency guarantees of ‘databases’. However, you can have different schema between federation members. That is fine in federations. Federations also push to a looser model of consistency for query results across multiple federation members.
Lightweight Local Storage Besides Reliable Storage. One of the NoSQL traits is arguably the ability to move processing close to the data. You can continue to use stored procedures, triggers, tables, views, indexes and all the other objects you are used to, to take full advantage of the powerful programmability surface of SQL Azure. SQL Azure databases are not lightweight local stores, however. They are highly available, non-volatile, replicated and protected. And you can use tempdb, which is purely local.
Unstructured or Semi-Structured Data. SQL Azure also supports a hierarchy data type and indexing, as well as an XML data type for semi-structured data. Blob types are there for completely unstructured data.
For more information about Azure Federations, see Federations: Building Scalable, Elastic, and Multi-tenant Database Solutions with SQL Azure
Also see TechNet and George Huey’s MSDN Magazine article Scaling Out with SQL Azure Federation.
Start at the Windows Azure Development Center, where you can get started with .NET, Node.js, Java, PHP, and more.
Get Windows Azure
Windows Azure Training Kit includes a comprehensive set of technical content to help you learn how to use Windows Azure.
Bruce D. Kyle ISV Architect Evangelist | Microsoft Corporation
Thanks to Andrew Brust for his paper entitled NoSQL and the Windows Azure platform -- Investigation of an Unlikely Combination.