Today is an extremely exciting day as we release Microsoft Azure DocumentDB, a fully managed, JSON document database service.
DocumentDB was built from the ground up in response to the increasing demands of applications being developed here at Microsoft and by Microsoft Azure customers. We heard from customers that they need a database that can keep pace with their rapidly evolving applications – something fast, flexible and scalable. Increasingly, NoSQL databases are becoming the tool of choice for many developers, but running and managing these databases can be costly, especially at scale. We also heard that customers wanted more of the capabilities inherent to relational database systems – rich queries and transactional processing are still important. Most data stores offer extreme choices to developers – strong or eventual consistency, schema-free with limited query capabilities or schematized with rich query capabilities, transactions or scale, and so on. The fact is that numerous real-world scenarios exist between these extremes, and we want to address them.
Meeting the promise of schema-free
We wanted DocumentDB to support SQL queries over arbitrary documents without forcing the developer to create explicit schema or secondary indices or views. We wanted to give developers the freedom to rapidly iterate on application schema while preserving the ability to execute ad hoc queries. We also felt that queries should yield consistent results even when write rates are high.
We have designed the storage and indexing subsystem to serve consistent queries in the face of sustained high volumes of writes. This is accomplished using novel log-structured storage techniques for index maintenance and indexing algorithms which fully exploit SSDs. By default, all document properties are indexed and can be queried through the DocumentDB SQL query language.
More on DocumentDB SQL Query
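To make the idea of indexing every property concrete, here is a toy sketch in Python. It is not how DocumentDB is implemented; the `ToyIndex` class, the `flatten` helper and the `/path` notation are all assumptions made for illustration. It only shows why no schema or secondary index definition is needed when every path in every document is indexed on write.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple


def flatten(value: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield a (path, scalar) pair for every leaf of a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(value, list):
        for child in value:
            # All array elements share one path so a lookup matches any element.
            yield from flatten(child, f"{path}/[]")
    else:
        yield path, value


class ToyIndex:
    """Hypothetical in-memory store that indexes every property on insert."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}
        self.postings: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def insert(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        for term in flatten(doc):
            self.postings[term].add(doc_id)

    def where(self, path: str, value: Any) -> List[dict]:
        """Return documents whose property at `path` equals `value`."""
        ids = self.postings.get((path, value), set())
        return [self.docs[i] for i in sorted(ids)]
```

With this sketch, documents of different shapes can be inserted side by side, and a lookup such as `index.where("/address/city", "Oslo")` works without declaring anything up front.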
Tunable consistency and predictable performance
Eventually consistent systems can offer high availability and improved performance for applications. However, as a developer, it can be very challenging to build experiences in the face of eventually consistent data. There are no promises – data can be stale and out of order. While we are strong advocates of weaker consistency models (pun intended), we want to make sure that we provide a service that gives developers predictability, especially when it comes to data consistency. Why not give you the control to make smart and predictable tradeoffs when it comes to performance and consistency?
DocumentDB offers four distinct consistency levels for reads and queries - Strong, Bounded Staleness, Session, and Eventual. These well-defined consistency levels allow you to make sound tradeoffs between consistency, availability and latency. Bounded staleness guarantees both a total ordering of writes and a maximum staleness, which makes it useful for applications dealing with time and ordered operations. Session consistency provides read-your-own-writes guarantees and can be a good match for user-centric apps. These consistency levels are backed by predictable performance levels, ensuring you can achieve consistent results for your application.
More on DocumentDB Consistency Levels
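As a rough illustration of the read-your-own-writes guarantee behind session consistency, the Python sketch below models a session that remembers the version of its latest write and only reads from a replica that has caught up to it. The `Replica` and `Session` classes and the version counter are assumptions for this example, not DocumentDB's actual protocol or API.

```python
class Replica:
    """A copy of the data that has applied writes up to `version`."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


class Session:
    """Hypothetical client session enforcing read-your-own-writes."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.token = 0  # highest version written by this session

    def write(self, key, value):
        version = self.primary.version + 1
        self.primary.apply(version, key, value)
        self.token = version

    def read(self, key):
        # Use the first replica that has caught up to this session's writes.
        for replica in self.replicas:
            if replica.version >= self.token:
                return replica.data.get(key)
        # No replica is fresh enough, so fall back to the primary.
        return self.primary.data.get(key)
```

An eventually consistent read would simply pick any replica, which is why it may return stale data; the session token is what rules that out for the session's own writes.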
Seamless scale and delivered as a service
We hear frequently from customers that they don’t want to be consumed by managing, scaling and maintaining their database infrastructure. This is true for customers using relational databases as well as NoSQL databases. We feel that part of as-a-Service delivery means that developers should get fine-grained control over how much of the service they consume and that scaling should be as simple as turning a dial. If you need more, turn the dial to increase your usage. If you need less, turn the dial back down. In either case, no downtime, no fuss, no problem. Continue to scale to as much as your application needs in both database storage and request throughput.
DocumentDB is a fully managed, multi-tenant Azure service and can be configured to scale with your user base. Database accounts can be easily created through the Azure portal with capacity to serve an application’s needs today. As these needs change, you can easily add or remove capacity. DocumentDB will allocate and reserve capacity exclusively for your application – this includes high performance database storage as well as dedicated request throughput capacity. This means that you get predictable performance with the ability to elastically scale by purchasing more capacity units.
Open and approachable
The world doesn’t necessarily need more data formats, procedural languages or protocols. The learning curve for new systems can be steep. Not to mention working with new and unfamiliar tools can slow you down. As we developed DocumentDB we firmly believed that we should resist the urge to be inventive where it didn’t deliver real value to you - the developer. Our goal with DocumentDB is to eliminate any friction associated with getting data in and using the service.
We have validated DocumentDB with first-party applications at consumer scale. Today we are delighted to make DocumentDB available to you through the Azure portal. In the coming weeks we’ll post more on both how to use DocumentDB and the technical design of the various subsystems that make up the service.
To get started, visit the Azure DocumentDB service page.
- Azure DocumentDB Team
2 questions that come to mind, that need answering for me to even consider this..
1. Seeing as I will store ALL my documents in Azure for this to work, how much will it cost me (ALL costs, from bandwidth, storage, etc.)?
2. How do I get my documents out, which could be terabytes, if I need to? Or if I build an archiving solution that needs to pull documents out, how will this work?
I love this technology you've built BUT the biggest hurdles for me are what I mentioned above..
Drawbacks are the costs (the preview price is OK but given this will be 100 % more expensive when out of Beta, it could be a hurdle for people needing lots of storage for historical data like me) and the document query limit of 2.000/Sec - if I'm right and this also means that a single query returning 2.000 docs already puts me to the limit if I don't throw more money at you :)
However, it looks like it could be the solution I've been waiting for in Azure ...
Do you have document encryption on the road map?
My current system stores millions of documents quite efficiently using Couchbase Server (CS, hereafter). The downside in Azure is that I have to maintain the VMs that run CS myself: issuing patches, etc. It seems to me that this could be a reasonable replacement for CS and save me some work. That said, there are features I use quite heavily that Azure DocumentDB does not seem to have yet. The first one is the concept of a view. It is essentially an index of documents that is built on the fly as data comes in. I can quickly pull large lists of data using it. The second one is an in-memory layer (memcached) of the most popular documents/views. This allows me very quick access for both pulling and pushing data. The speed of these features is absolutely vital to the performance of my very data-intensive system.
Do you expect Azure DocumentDB to have these features in the future?
Most of the applications that were going with Table storage (NoSQL storage) were facing transactional problems, which were a bottleneck for the transition to the cloud.
Azure DocumentDB will remove all those obstacles! The future is cloud!
Thanks for the awesome alternative !
To second Corey's question: Lists/Views/Paging seems to be missing. There is an example in the code samples that shows how to somewhat do this in a stored proc, but it seems as if it can't handle millions of items, and requires you to pull back all data to sort. Ideally this could be controlled by a View or a special index of some kind.
Also, is there a story on partial document updates? If I have a large document but just want to change one property or maybe add something to a collection, can I just do that?
Lastly, the guidance on CUs and Collections is a little confusing. I can't really tell if you are recommending a sharding approach (as it is mentioned that Collections are how data is partitioned for scale). So, if I have 10 CUs, do I need to have 10 collections, or can I just have 1 collection that gets all of the resources of the 10 CUs?
Is Azure DocumentDB covered under Microsoft Trust / HIPAA compliant? (as it appears there is governance setup robustly within a multi-tenant environment)?
Is there a BA agreement on this product?
Thank you for your time and assistance on this question.
It's great to see the Azure team innovating in this space, and it would be great if their effort with DocumentDB could be released as a standalone db with a community (free) version and a self-hosting (use your own nodes at an affordable price point) version. This would allow different projects and businesses to pick the deployment (community, self-hosting, Azure cloud) that works for them.
@jose fajardo, to find out how much DocumentDB will cost, please take a look at this page azure.microsoft.com/.../documentdb.
In order to export documents in bulk from DocumentDB, you can use the ReadFeed method from any of the client SDKs. The response from the method can then be streamed to the local file system or e.g., Azure Blob storage for archival. If you’d like to see archiving capabilities in the service please post your suggestions to feedback.azure.com/.../263030-documentdb
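A minimal sketch of such an export loop in Python, under stated assumptions: `fetch_page` is a hypothetical callable standing in for whatever paged read-feed call your SDK exposes, and it is assumed to take a continuation token (or `None` for the first page) and return a list of documents plus the next token (or `None` when the feed is exhausted). The real SDK method names and signatures may differ.

```python
import json
from typing import Callable, Iterator, List, Optional, Tuple

# (documents on this page, continuation token for the next page or None)
Page = Tuple[List[dict], Optional[str]]


def iter_feed(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every document by following continuation tokens page by page."""
    continuation = None
    while True:
        docs, continuation = fetch_page(continuation)
        yield from docs
        if continuation is None:
            break


def export_feed(fetch_page: Callable[[Optional[str]], Page], out) -> int:
    """Write one JSON document per line to `out`; return the document count."""
    count = 0
    for doc in iter_feed(fetch_page):
        out.write(json.dumps(doc) + "\n")
        count += 1
    return count
```

Because documents are written as they arrive, only one page is held in memory at a time, which matters when the export runs to terabytes. `out` can be any object with a `write` method, such as a local file opened for writing.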
@Jeff, we do have encryption on the list of future feature work. Please help us prioritize the timing by voting for it on feedback.azure.com/.../263030-documentdb.
@Corey, these are great suggestions.
Regarding views, note that views that are based on filters and projections can be accomplished by simple SQL queries since DocumentDB supports automatic indexing. We do understand that there are scenarios where views based on aggregates are quite useful. Please post your feedback at feedback.azure.com/.../263030-documentdb.
Common documents will be read from memory in DocumentDB. You can also implement a caching layer in your application by using DocumentDB with Azure Cache. We will add more documentation and tooling on how to do this.
@Ryan LM, for paging, please take a look at the QueryWithPaging method in the MSDN samples in this file (code.msdn.microsoft.com/.../sourcecode).
Sorting (order by) and partial document updates are planned for future updates. Please vote for these features at feedback.azure.com/.../263030-documentdb.
In the preview offers, the maximum size of a collection is 10 GB. To fully utilize 10 CUs, you should create at least 10 collections. With 10 CUs:
• If you create 10 collections, they will be allocated 2,000 request units each = total of 20,000 request units.
• If you create 30 collections, they will be allocated about 667 request units each = the same total of 20,000 request units.
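The arithmetic behind those two cases can be sketched in a few lines of Python. The figure of 2,000 request units per CU and the even split across collections are inferred from the numbers above; the function name is made up for this example.

```python
# Inferred from the figures above: 10 CUs provide 20,000 request units in total.
REQUEST_UNITS_PER_CU = 2000


def request_units_per_collection(capacity_units: int, collections: int) -> int:
    """Split the account's total request units evenly across its collections."""
    if capacity_units < 1 or collections < 1:
        raise ValueError("capacity_units and collections must be at least 1")
    total = capacity_units * REQUEST_UNITS_PER_CU
    return round(total / collections)


print(request_units_per_collection(10, 10))  # 2000
print(request_units_per_collection(10, 30))  # 667
```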
Hope that helps.
@Emmaneul Buah, thanks for the feedback. Please vote for standalone installations of DocumentDB at feedback.azure.com/.../263030-documentdb. Be sure to distinguish whether you want a stand-alone deployment option vs. a local emulator.