<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Pablo Castro's blog : Azure</title><link>http://blogs.msdn.com/pablo/archive/tags/Azure/default.aspx</link><description>Tags: Azure</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>SQL Data Services goes full relational</title><link>http://blogs.msdn.com/pablo/archive/2009/03/12/sql-data-services-goes-full-relational.aspx</link><pubDate>Thu, 12 Mar 2009 19:42:50 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9471885</guid><dc:creator>pabloc</dc:creator><slash:comments>8</slash:comments><comments>http://blogs.msdn.com/pablo/comments/9471885.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pablo/commentrss.aspx?PostID=9471885</wfw:commentRss><description>&lt;p&gt;A few days ago we &lt;a href="http://blogs.msdn.com/ssds/archive/2009/03/10/9469228.aspx" target="_blank"&gt;announced&lt;/a&gt; the big news about SQL Data Services (SDS) switching to being a full relational database on the cloud.&lt;/p&gt;  &lt;p&gt;I’ve been a strong supporter of this path for a number of reasons. Relational databases are very well understood and there is a large base of expertise for them in the market. 
Also, a lot of the existing applications and libraries out there are ready to run against a relational database, so SDS is enabling them to be ported to the cloud with minimal (or perhaps sometimes no) effort.&lt;/p&gt;  &lt;p&gt;With SDS going relational, not only do you get to reuse all your knowledge and codebase in the cloud, but you also get all the benefits of a cloud-based infrastructure: high availability, piece-of-cake provisioning, pay-as-you-go growth, etc.&lt;/p&gt;  &lt;p&gt;One of the concerns I read about is the impact on scalability. My observation is that when you look at most of the storage systems in the cloud, they don’t have some magic formula for scalability; the trick is partitioning. Some systems are smarter than others in how they partition data and how dynamic the partitioning scheme is to adapt to varying system workloads. But in the end, you need to partition your data such that it’s spread across a bunch of nodes; if across your system you never (or rarely) depend on cross-partition operations, then you have a sustainable scalability path. That is independent of the actual organization of the data (e.g. relational, flexible entities, etc.). The ACE model on top of SDS had partitioning embedded in the model through scale units that surfaced as “containers”. In the new SDS world you can just partition your data across nodes, where each node has full relational capabilities. So it’s similar (partitioning), but each node gives you very rich ways of organizing and interacting with your data (full SQL!).&lt;/p&gt;  &lt;p&gt;The other concern I heard is around TDS, the SQL Server client-server protocol, and how it would play on the Internet. In many cases the actual application that connects to SDS will be running in Azure as “web” or “worker” roles, and things should go smoothly. For the scenarios where the client is connecting to SDS from across the web, there are two challenges: firewalls and latency. 
&lt;/p&gt;  &lt;p&gt;The server side of TDS by default listens on TCP port 1433, which a lot of firewalls will just block; furthermore, TDS is not HTTP, so a packet-inspecting intermediary could choose not to let the traffic through, regardless of the port number. This could certainly create some trouble that will need to be addressed at some point. &lt;/p&gt;  &lt;p&gt;From the latency perspective, the short story is that I think it’s fine. TDS follows a simple request/response model, so interactions between clients and servers are straightforward and not chatty at all (things are more complicated when MARS is enabled, but that’s another story). We have experience tuning TDS for large WANs with high latency and things work out well as long as you optimize for those scenarios (e.g. by batching queries together). &lt;/p&gt;  &lt;p&gt;As a final note, there is the question about the SOAP/REST interfaces. In my opinion, whenever you’re building the kind of rich application that needs full SQL, the data in the database can rarely stand alone for direct access by consumers. Most of the time there is code in front (in the form of a middle tier) that manages access control, shaping, and even application-level constraints that don’t belong in the database. If you need a REST head on top of an SDS database, you can add ADO.NET Data Services to the equation, which will let you add all that logic fronting your data.&lt;/p&gt;  &lt;p&gt;All in all, I’m really excited to see this happening. 
This gives Azure a whole spectrum of storage services, from blobs in Azure Blob Storage, to schema-less tables in Azure Table Storage, now to full relational with SQL Data Services.&lt;/p&gt;  &lt;p&gt;-pablo&lt;/p&gt;</description><category domain="http://blogs.msdn.com/pablo/archive/tags/Azure/default.aspx">Azure</category><category domain="http://blogs.msdn.com/pablo/archive/tags/SQL+Data+Services/default.aspx">SQL Data Services</category><category domain="http://blogs.msdn.com/pablo/archive/tags/Cloud/default.aspx">Cloud</category></item><item><title>ADO.NET Data Services in Windows Azure: pushing scalability to the next level</title><link>http://blogs.msdn.com/pablo/archive/2008/11/01/ado-net-data-services-in-windows-azure-pushing-scalability-to-the-next-level.aspx</link><pubDate>Sun, 02 Nov 2008 05:40:40 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9028809</guid><dc:creator>pabloc</dc:creator><slash:comments>15</slash:comments><comments>http://blogs.msdn.com/pablo/comments/9028809.aspx</comments><wfw:commentRss>http://blogs.msdn.com/pablo/commentrss.aspx?PostID=9028809</wfw:commentRss><description>&lt;p&gt;The announcement of Windows Azure is a big milestone for us in the Astoria team. We got a chance to add our little contribution to the platform by providing data service interfaces for a couple of the Azure services. 
&lt;/p&gt;  &lt;p&gt;Currently there are two services that use the ADO.NET Data Services runtime: the &lt;a href="http://www.microsoft.com/azure/windowsazure.mspx" target="_blank"&gt;Windows Azure&lt;/a&gt; Tables Service, which was announced this week as part of the whole Windows Azure story, and &lt;a href="http://www.microsoft.com/azure/sql.mspx" target="_blank"&gt;SQL Data Services&lt;/a&gt;, which has been around for a while but got &lt;a href="http://sqlserviceslabs.net/SDSAstoria.html" target="_blank"&gt;a new experimental Data Services interface&lt;/a&gt; this week to coincide with the PDC.&lt;/p&gt;  &lt;p&gt;These services -and others that will come in the future also based on Data Services- share a common aspect: they have extreme scalability requirements.&lt;/p&gt;  &lt;p&gt;In order to enable them to use our Data Services server runtime we had to extend the data service framework to make it scale in various new dimensions. In the rest of this post I'll summarize some of the walls we hit and the changes we made to the system to handle these scenarios.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Things that already scaled&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The Data Services runtime already incorporates many design principles that help with scalability. &lt;/p&gt;  &lt;p&gt;For example, the system does not keep any required state between requests (we do cache stuff, but we can throw it away at any time), so scale out of the front-end servers of the storage systems is relatively straightforward. 
This allows the existing runtime to handle an arbitrarily large number of requests by throwing more front-ends at the problem (as long as the back-end systems can take it, of course).&lt;/p&gt;  &lt;p&gt;Also, we don't make any assumptions about the size of the data and provide mechanisms to push down filters in requests to the data source, so in principle there is no limit to the amount of data that a data service may be fronting.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Hitting the scalability wall&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;While some things scaled, there are certain aspects in which we ran into a scalability wall that required a number of changes in the system.&lt;/p&gt;  &lt;p&gt;Using .NET types to represent the shape of the services is great in a single application, but not-so-great if you have millions of users with hundreds or thousands of tables each. We needed another way of describing the &amp;quot;shape of the data in the service&amp;quot;, that is, the metadata or schema of the service.&lt;/p&gt;  &lt;p&gt;Since you can't practically create a distinct type for every user/application/table in the system, the instances of objects that represent data flowing through the data services runtime cannot be of a specific type for each entity type. Instead, we needed the flow format to be independent of the declared types.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Metadata and service schema&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The data services runtime needs to know the &amp;quot;schema&amp;quot; of each service it exposes. 
That is, the list of entity sets, the entity types of the instances living in those entity sets and the relationships between the various entities.&lt;/p&gt;  &lt;p&gt;In a typical data service, the service exposes data for a given application or domain-specific service, so the schema of the service is known and static (within a given version at least) and all the front-end servers simply share the same schema.&lt;/p&gt;  &lt;p&gt;The way a service author specifies the schema of a service in the shipping version of the Data Services runtime is by using .NET classes or an Entity Framework model (which in turn generates .NET classes). That works great for application developers, because .NET classes are a simple and natural way of defining the shape of your objects. &lt;/p&gt;  &lt;p&gt;Now, if the requirement is to be able to handle millions of applications, each of which can have hundreds or thousands of tables, does that mean that we have to create a .NET type for each service and for each table, and the corresponding number of properties and such? And if so, since the front-end systems are stateless and potentially don't have any affinity to parts of the data, does that mean that any given system may end up having to load up millions of types in memory? To complicate things further, once you load an assembly (the only container in which .NET types can exist), you can't unload it unless you unload the AppDomain.&lt;/p&gt;  &lt;p&gt;.NET types are a great solution for the scenarios where the schema is known and more or less bounded, and will continue to be the primary way of creating services in that context. However, we needed something else to handle the high-end side of the spectrum.&lt;/p&gt;  &lt;p&gt;To address this need we introduced a new interface that data services can optionally implement. We already had the internals of the system organized more or less like this, but didn't expose it in the first release. 
The idea is that there is a main split in the runtime. The &amp;quot;upper half&amp;quot; deals with URL translation, LINQ expression tree generation, interceptors, policies and all the aspects that make a Data Service look like a Data Service. The &amp;quot;bottom half&amp;quot; is the &amp;quot;data service provider&amp;quot;, and is responsible for describing the shape of the service among other things. There are two built-in &amp;quot;data service providers&amp;quot;: the Entity Framework provider, which is what you use when you create a data service over an Entity Framework model, and the reflection-based provider, which is what you use when creating a service on top of an arbitrary object graph. With the new change you can now create new implementations of these data service provider thingies that can obtain and manage metadata any way they want. &lt;/p&gt;  &lt;p&gt;The way we interact with the provider is carefully designed to avoid requiring long-term state in the provider or the consumer of the provider in any way, while at the same time allowing the provider to do caching of metadata and control information if desired.&lt;/p&gt;  &lt;p&gt;First, we never hold on to information returned by the provider beyond the scope of a single request. So for all we know the provider could be reloading all the metadata on every request. In practice, providers will probably cache this metadata in some way or another.&lt;/p&gt;  &lt;p&gt;Second, we load metadata on demand and piecemeal. For example, during URI translation we do a small-scale version of the usual binding and semantic analysis that any compiler does, and for that we need metadata. 
In those cases we don't load all the metadata, but only the pieces we need to do type checks, symbol lookups, etc.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Making metadata dynamic&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Another aspect of metadata to consider is that the shape of one of these data services can be altered at any time. For example, the Azure Table service has the concept of tables, and you can add and remove tables whenever you want. &lt;/p&gt;  &lt;p&gt;The new scheme with custom data service providers makes this possible because we don't remember anything at all across requests. So all the provider needs to do when the underlying shape of the data changes is report a different schema on the next request, and the data services runtime will happily take it. &lt;/p&gt;  &lt;p&gt;With .NET types this would have meant creating and re-distributing new types (or creating them on demand on each node), and dealing with not being able to unload the old types from memory. Clearly not an option at this scale.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Flow format independence&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;With the addition of the &amp;quot;data service provider&amp;quot; interfaces we no longer have .NET types to use for the instances of each entity type that flows through the system (e.g. from the data source to the runtime via the IEnumerables returned in LINQ queries, and from there to the serialization stack).&lt;/p&gt;  &lt;p&gt;Another important change we made in the system is that we no longer assume anything about the shape of each CLR object returned by the query. We treat instances just as &amp;quot;object&amp;quot; all over the code base. When we need to access a member, we use methods in the data service provider interface to do that; imagine something like GetPropertyValue(object o, string name).&lt;/p&gt;  &lt;p&gt;That means it's now possible to use some form of generic record type across the system. 
Not only does this avoid the need for specific types, it also allows providers to piggyback control information on the instances themselves and to avoid copies from the original format into CLR objects just to flow them through our runtime, along with a few more benefits.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Impact on LINQ expressions&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;While having flow format independence is great, it did complicate things for query formulation.&lt;/p&gt;  &lt;p&gt;We typically translate URLs to expression trees, and since we have all the CLR types in the server that correspond to the entity types, all the expression trees are nice and clear.&lt;/p&gt;  &lt;p&gt;When we're operating against unknown types we can't generate &amp;quot;typed&amp;quot; expression trees anymore. In those cases we still produce expression trees, but the member-access operations (and certain operators) are represented using custom calls to a well-known set of static members. The providers that enable this feature need to know about this and do proper translation of these expression trees.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Extension to the data model&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;We made one more major change that, while not directly related to scalability, has a lot to do with the database/storage services in Windows Azure.&lt;/p&gt;  &lt;p&gt;In the current version of Data Services, types are &amp;quot;closed&amp;quot; in the sense that they have a structure that's final. You list a set of properties for each type and instances of that type cannot have properties added dynamically.&lt;/p&gt;  &lt;p&gt;It turns out that the data services we have online have a more flexible model, where each entity has a fixed portion but also a dynamic portion. Typically the fixed portion includes a key of some sort and a version property. 
The dynamic portion is a property bag where you can add any name/typed-value pair.&lt;/p&gt;  &lt;p&gt;We call these types that can be extended on a per-instance basis at runtime &amp;quot;open types&amp;quot;. We introduced support for open types in the Data Services runtime such that you can mark a given entity type as &amp;quot;open&amp;quot; in metadata, which causes the system to allow unknown properties to be set, as well as the use of unknown properties in queries (e.g. in filter predicates). &lt;/p&gt;  &lt;p&gt;There are a lot of details around open types that I won't go into here, maybe the topic for another post, but I wanted to point out the change because it was a significant addition.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;What do these changes mean for developers?&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;What does all this mean to current users of data services? Well...not much other than some background on how the system is evolving. Other than open types, services created with custom metadata/custom flow formats are indistinguishable from the ones created the &amp;quot;classic&amp;quot; way.&lt;/p&gt;  &lt;p&gt;Furthermore, we will preserve the existing model where creating a service based on some .NET objects or an Entity Framework schema is really straightforward, and we consider that our primary scenario for developers.&lt;/p&gt;  &lt;p&gt;At the same time, addressing the needs of the highest-end services out there is important, so many (if not all) of these changes will eventually make it into the shipping product so that other folks out there can use them if they choose to. 
Beware that these interfaces are not designed to be &amp;quot;nice&amp;quot;, but rather optimized for control and efficiency, so it may not be exactly a fun experience, but you'll get all the scalability you'll need out of them.&lt;/p&gt;  &lt;p&gt;-pablo&lt;/p&gt;</description><category domain="http://blogs.msdn.com/pablo/archive/tags/Data/default.aspx">Data</category><category domain="http://blogs.msdn.com/pablo/archive/tags/Astoria/default.aspx">Astoria</category><category domain="http://blogs.msdn.com/pablo/archive/tags/Services/default.aspx">Services</category><category domain="http://blogs.msdn.com/pablo/archive/tags/ADO.NET+Data+Services/default.aspx">ADO.NET Data Services</category><category domain="http://blogs.msdn.com/pablo/archive/tags/Azure/default.aspx">Azure</category></item></channel></rss>