Working With Large Models In Entity Framework – Part 1

Published 24 November 08 08:51 AM | dpblogs 

We have received quite a few requests for guidance on best practices for working with large entity models in an Entity Framework application. This post describes the typical issues you face when using a large entity model and offers guidance that should help mitigate some of them.

Issues with using one large Entity Model

The easiest way to create an entity model today is to point the Entity Data Model Wizard in Visual Studio at an existing database. The experience is very straightforward if the database is not too big. Of course, ‘big’ is a relative word. In general, you should start thinking about breaking up a model when it has reached 50-100 entities. The Entity Framework can handle larger models, but you could run into performance problems if the model is too interconnected (more details below). More importantly, though, very large models become unwieldy to interact with, and application complexity increases as the model grows beyond a certain size.

The typical problems you would see with a single large entity model:

I. Performance

One of the major problems you could run into with models generated from big database schemas is performance. There are two main areas where performance gets impacted because of the size of the model:

a. Metadata Load Times

The size of our XML schema files is roughly proportional to the number of tables in the database that you generated the model from. As the schema files grow, the time we take to parse them and create an in-memory model of this metadata also increases. This is a one-time cost incurred per ObjectContext instance. We also cache this metadata per app domain, keyed on the entity connection string, so if you use the same entity connection string in multiple ObjectContext instances in a single app domain, you pay the cost of metadata loading only once. Even so, this can be a significant cost if the model is large and the application is not a long-running one.
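The caching behavior described above can be pictured with a small Python sketch. This is only an illustration of a cache keyed by connection string; `parse_schema_files`, `get_metadata`, `Context` and the connection string values are hypothetical stand-ins, not Entity Framework APIs.

```python
# Sketch only: parsed metadata is cached once per process, keyed by connection string.
# All names here are hypothetical stand-ins, not Entity Framework APIs.

_metadata_cache = {}
parse_calls = 0  # counts how often the expensive parse actually runs


def parse_schema_files(connection_string):
    """Stand-in for the expensive step of parsing the schema files."""
    global parse_calls
    parse_calls += 1
    return {"connection_string": connection_string}


def get_metadata(connection_string):
    """Return cached metadata, parsing only on the first request for a given key."""
    if connection_string not in _metadata_cache:
        _metadata_cache[connection_string] = parse_schema_files(connection_string)
    return _metadata_cache[connection_string]


class Context:
    """Stand-in for a context object that needs metadata when it is created."""

    def __init__(self, connection_string):
        self.metadata = get_metadata(connection_string)


first = Context("name=NorthwindEntities")
second = Context("name=NorthwindEntities")       # same key: served from the cache
other = Context("name=AdventureWorksEntities")   # different key: parsed again
print(parse_calls)  # prints 2
```

In a short-lived process the first parse is never amortized, which is why the cost matters most for applications that are not long running.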

b. View Generation

View generation is a process that compiles the declarative mapping provided by the user into client-side Entity SQL views that are used to query and store entities in the database. The process runs the first time either a query or SaveChanges executes. The performance of the view generation step depends not only on the size of your model but also on how interconnected it is. Two entities are said to be connected if they are linked via an inheritance chain or an association. Similarly, two tables are connected if they are linked via a foreign key. As the number of connected entities and tables in your schemas increases, the view generation cost increases.
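To get a feel for how interconnected a model is, you can group entities that are reachable from one another through associations or inheritance. The Python sketch below does this with a plain graph traversal. The entity names and links are made up for illustration, and the group sizes are only a rough indicator of connectedness, not a measurement of what view generation will cost.

```python
from collections import defaultdict


def connected_groups(nodes, links):
    """Group nodes that are reachable from one another through any chain of links."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:  # depth-first walk over everything linked to `start`
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical model: each link is an association or an inheritance relationship.
entities = ["Product", "Category", "Supplier", "Employee", "Territory", "AuditLog"]
links = [("Product", "Category"), ("Product", "Supplier"), ("Employee", "Territory")]

groups = connected_groups(entities, links)
print(sorted(len(g) for g in groups))  # prints [1, 2, 3]
```

The same function works for tables if you pass foreign keys as the links.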

II. Cluttered Designer Surface

When you generate an EDM model from a big database schema, the designer surface is cluttered with a lot of entities, and it is hard to make sense of what your entity model looks like as a whole. If you don’t have a good overview of the entity model, how are you going to customize it? If you want to experience the problem I am talking about, try creating a default model for the AdventureWorks sample database and making sense of the entity model that is produced.

III. IntelliSense experience is not great

When you generate an EDM model from a database with, say, 1000 tables, you end up with 1000 different entity sets. Imagine what your IntelliSense experience would be like when you type “context.” in the VS code window.

IV. Cluttered CLR Namespaces

Since a model schema will have a single EDM namespace, the generated code will place the classes in a single namespace. Some users have complained that they don’t like the idea of having so many classes in a single namespace.

Possible Solutions

Unfortunately, there is no out-of-the-box solution that we can offer at this point to solve some of these problems. But there are quite a few things that mitigate specific issues listed above. Each makes sense in specific scenarios and should be chosen accordingly.

I. Compile time view generation

Because view generation is a significant part of the overall cost of executing a single query, the Entity Framework enables you to pre-generate these views and include them in the compiled project. The cost is especially significant in big, interconnected models, as described above, so you should definitely pre-generate views for large models. That said, the prescriptive guidance from the EF team is to pre-generate views for all EF applications. You can read more about the process of pre-generating views here.

II. Choosing the right set of tables

There will be cases where your application might not require all the tables in a database to be mapped to the Entity Model. You could run into two different scenarios when you are selecting the subset of tables.

a. Naturally Disconnected Subset

In this scenario, the tables you want to work with are totally disconnected from the other tables in the database, i.e. there are no outgoing foreign keys. This case is simple to implement from the designer. If this approach fits your needs, I strongly suggest using it, since it is both straightforward and works great with the designer.
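One way to check whether a candidate subset is naturally disconnected in this sense is to list the foreign keys that point from a table inside the subset to a table outside it. The Python sketch below assumes you already have the foreign keys as (referencing table, referenced table) pairs. The list is a hand-written partial set for Northwind used purely for illustration, and `outgoing_foreign_keys` is a hypothetical helper.

```python
def outgoing_foreign_keys(subset, foreign_keys):
    """Return the foreign keys that leave the subset (referenced table is outside it)."""
    subset = set(subset)
    return [
        (child, parent)
        for child, parent in foreign_keys
        if child in subset and parent not in subset
    ]


# Partial, hand-written list of (referencing table, referenced table) pairs.
northwind_foreign_keys = [
    ("Products", "Categories"),
    ("Products", "Suppliers"),
    ("Orders", "Customers"),
    ("Order Details", "Orders"),
    ("Order Details", "Products"),
]

leaky = outgoing_foreign_keys(["Products", "Suppliers"], northwind_foreign_keys)
closed = outgoing_foreign_keys(
    ["Products", "Suppliers", "Categories"], northwind_foreign_keys
)
print(leaky)   # prints [('Products', 'Categories')]
print(closed)  # prints []
```

An empty result means the subset has no outgoing foreign keys. A non-empty result tells you which columns would have to be handled as described in the next scenario.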

b. Choosing the subset by exposing foreign keys

In this scenario, the subset of tables you want to work with may have outgoing foreign keys to other tables in the database. When you do this, you have to take responsibility for setting the foreign key appropriately. There is no navigation property that lets you get the entity that the foreign key refers to; you can manually query for that entity in the other container if needed. For example, let’s say your program works with just the Products and Suppliers tables in Northwind. You can choose these tables and work with them, but the CategoryID column in the Products table, which is a foreign key, shows up as a scalar column instead of an association. One important thing to note is that the Entity Framework’s update pipeline won’t be able to resolve dependencies across different subsets, since you have removed the foreign key information from your storage schema (SSDL file). You have to manage these dependencies yourself and order the SaveChanges calls correctly when working with multiple subsets.

The schemas for this example can be found in the attached .zip file under the SubsettingUsingForeignKeys folder.
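The ordering problem is easy to reproduce outside the Entity Framework. The Python sketch below uses an in-memory SQLite database as a stand-in for two subsets that share a foreign key: saving the dependent row first fails, saving the principal row first succeeds, and the related row is then fetched with a manual query on the scalar key. The helpers `save_category` and `save_product` and the simplified table definitions are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.execute(
    "CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)"
)
conn.execute(
    "CREATE TABLE Products ("
    "ProductID INTEGER PRIMARY KEY, ProductName TEXT, "
    "CategoryID INTEGER REFERENCES Categories(CategoryID))"
)


def save_category(category_id, name):
    """Stand-in for saving changes in the subset that owns Categories."""
    conn.execute(
        "INSERT INTO Categories (CategoryID, CategoryName) VALUES (?, ?)",
        (category_id, name),
    )


def save_product(name, category_id):
    """Stand-in for saving changes in the subset that owns Products."""
    conn.execute(
        "INSERT INTO Products (ProductName, CategoryID) VALUES (?, ?)",
        (name, category_id),
    )


# Wrong order: the product refers to a category that has not been saved yet.
try:
    save_product("Chai", 1)
    wrong_order_failed = False
except sqlite3.IntegrityError:
    wrong_order_failed = True

# Right order: save the principal row first, then the dependent row.
save_category(1, "Beverages")
save_product("Chai", 1)

# No navigation property: look up the related row manually by the scalar key.
(category_id,) = conn.execute(
    "SELECT CategoryID FROM Products WHERE ProductName = ?", ("Chai",)
).fetchone()
(category_name,) = conn.execute(
    "SELECT CategoryName FROM Categories WHERE CategoryID = ?", (category_id,)
).fetchone()
print(wrong_order_failed, category_name)  # prints True Beverages
```

With separate models, nothing works out this order for you, so the code that calls SaveChanges on each context has to encode it.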

The solutions I have described in this post have one major advantage: they don’t require you to edit the XML directly. You can do it all in the designer. But the above two options might not be ideal for your situation. You might want to split your model into smaller models in which some types have to exist in multiple models simultaneously. You can still do this in the designer, but you end up with the same type defined in multiple models. The other option is to use an Entity Framework feature usually referred to as “Using”, which allows you to reuse types defined in one CSDL file in another CSDL file. In my next post, I will have a couple of examples of how to do model splitting with “Using” and type reuse.

Srikanth Mandadi
Development Lead, Entity Framework

Attachment(s): MultipleSchemaSets.zip

Comments

# juliel said on November 24, 2008 5:51 PM:

I'm really looking forward to seeing these and have downloaded them but they are the raw files. One of the points you have made is that using these patterns you can do all of the work in the designer. Any chance of sharing some EDMX files so we don't have to dizzy ourselves looking at the raw xml and moving back and forth from one file to another to mentally put it all together?

thanks

julie

# Erik Wynne Stepp said on November 25, 2008 12:06 AM:

You state above that "the prescriptive guidance from EF team is to pre-generate views for all EF applications."

If this is the case, then why do you not provide a better integration scenario in Visual Studio?

The steps that you suggest are not onerous, but they are also not obvious either.  

I would expect Visual Studio to implement the best practice by default, but allow me to easily change it.  In the next release of EF, could you please do the best solution by default?

# Srikanth said on November 25, 2008 5:20 AM:

Julie,

Out of the 3 folders in the zip, only one (SubsettingUsingForeignKeys) corresponds to the post today. The other two are for the second part of the post, where I will go over type reuse with "Using". Since the designer does not support "Using", the Edmx files would not be very useful for these.

I will try to share the Edmx file for the SubsettingUsingForeignKeys sample, but in the meanwhile you can put it together pretty easily from the CSDL, SSDL and MSL files by following the steps from Sanjay in this post: http://blogs.msdn.com/dsimmons/archive/2007/12/07/how-to-use-your-existing-csdl-msl-ssdl-files-in-the-entity-designer-ctp2.aspx.

Thanks

Srikanth

# Matthieu MEZIL said on November 25, 2008 5:53 AM:

I was recently asked to propose solutions for solving performance problems

# Dariusz Jankowski said on November 25, 2008 6:24 AM:

I work with a model with more than 70 tables, and it will grow.

I think it would be great to be able to work with the EDM model the way we work with a database model in SQL Server. In SQL Server we can generate different diagrams that describe particular aspects of the relationships. The EDM diagram could implement something similar.

It might also help to create boundaries inside the model, so that we could work with the whole model or with only a part of it, while that part would still keep its relations with the other parts (tables in those parts).

Example slices:

OrderSlice, which consists of the tables: Order, OrderDetails, OrderStatus, OrderType, OrderHistory

ProductSlice, which consists of the tables: Product, productCategory, ProductFamily, ProductImages, ProductJme, ProductDescription, etc.

Would it make sense to implement all this in a future EF?

# DM said on November 25, 2008 6:28 AM:

A bit more helpful than Elisa Flasko's comment "Well, big entities are big entities...!" when someone asked this question at TechEd Europe recently.

# Matthieu MEZIL said on November 25, 2008 6:35 AM:

Last week, a customer asked me how to solve a big EDM performance problem? In his case, his model was

# Glenn Gailey [MSFT] said on December 2, 2008 2:19 PM:

More general information about Entity Framework runtime performance can be found at http://msdn.microsoft.com/en-us/library/cc853327.aspx.

# Steve Strong's Blog said on December 3, 2008 3:44 PM:

Weekly digest of interesting stuff

# Norbert Siegel said on December 9, 2008 2:08 AM:

I worked with 250 tables in the entity model and cannot split it into two or more entity models. I always used pre-generated views, but the compile time is much too high. The runtime performance is good.

Is Microsoft planning a performance patch in the next month?

# BrianLei said on December 10, 2008 8:34 AM:

Entity Framework development lead Srikanth Mandadi calls this two-part article

# Dave C said on February 17, 2009 7:41 AM:

Hey, we're working with quite a large database and using edmgen2.exe to generate our edmx and .cs files. I found this link very helpful, as I didn't know that pre-generating the views would actually speed everything up. It has created an 80 MB .cs file which VS actually struggles to build. Once it's built, though, development is much faster than it used to be. Every time we made a change and started up the web site, we used to have to wait ages before linq would respond.

I'd recommend to anyone to do this view generation stuff before they work with linq to entities on a day to day basis.

I hope that in the next version a lot of the speed issues are addressed and this hidden stuff becomes available as options or properties. Also that linq to entities catches up with Linq to SQL.

# Pietro Brambati Blog said on April 8, 2009 10:43 AM:

I would say that a good page to start from is this document on MSDN: Performance Considerations for

# Matthieu MEZIL said on May 26, 2009 7:47 PM:

One of my customers wants to develop an ERP with EF. His database contains more than 600 tables, almost all

# Matthieu MEZIL said on May 27, 2009 9:28 AM:

One of my customers wants to code an ERP. To make it, he wants to use EF. His DB has more than 600 tables

# Andreas said on June 24, 2009 5:00 AM:

Does anyone know if there has been some improvement for big database structure? Does the VS 2010/.NET 4 handle it better?

We are in the development process of an application that will grow. For the moment and for the next year we are not expecting a very huge model, but it might become large later on.

What has changed with the new upcoming versions?

Thanks

# juan said on August 8, 2009 5:08 PM:

Your Entity Framework is rubbish..

# Luc said on October 8, 2009 11:12 PM:

52 entities, 55 associations (Foreign keys)

The Validate step worked slower and slower.

Now it crashes both in VS and at run time ...

EF is big and clumsy. I even wonder if it can be fixed. Many unnecessary features that should have been orthogonal to the framework, not built in.

I actually wanted to use Linq to SQL, that's a lean piece of software. But MS drops it and picks EF as the "winner".

I apologize for being this harsh but it's ridiculous to consider a 50-100 tables system as being big. What's a 500 tables system then?

I concur with Juan above ...
