Sharing the goodness…
Beth Massi is a Senior Program Manager on the Visual Studio team at Microsoft and a community champion for business application developers. Learn more about Beth.
More videos »
Unlike LINQ to SQL, in Visual Studio 2008 Service Pack 1 explicit loading is the default behavior in the Entity Framework. When a query is executed within the Entity Framework, the entities returned by the query are fully accessible, but any associated entities are not immediately loaded. This means that if we have a Master-Detail relationship in our model, say Customer related to many Orders, and we query for a Customer then only the Customer entity will be returned, not the related Orders.
Now LINQ to SQL will load any related entities by default when/if they are accessed, meaning that LINQ to SQL lazily loads the related data by default when it's accessed. The EF team wanted to make this much more explicit. They believe that a developer should always be very aware when data is being fetched and I tend to agree here. All the business applications I have ever written were very careful about how much data was being loaded and how often network/service calls were being made. This is very important to the scalability of your solutions. In any data application you need to always balance the amount of data being returned versus the number of queries you need to make to your database, and that can be very expensive depending how "far" away your database is from the caller.
Implementing lazy loading works well when you only need to make a few additional calls for the related data or the data is optional (like triggered by a user action). On the other hand, if you know you will always need the related data then it is probably more efficient to load the related entities in a single call. This is called eager loading -- there is more data returned to you but there is less chit-chat between the client and the database.
So although you have to be explicit about loading related entities with EF, this gives you the power to decide how to best fit these loading techniques in your applications. And luckily the code is straightforward. Let's take a look.
Creating a Simple Master-Detail Model
In order to illustrate these loading techniques let's create a simple one-to-many example using our favorite database, Northwind. Using Visual Studio 2008 Service Pack 1 I've added a new item to the project and selected Entity Data Model, chose the "Generate from Database" option, and selected Customers and Orders tables from Northwind. So here's my Entity Data Model (EDM):
Setting up Windows Form Data Binding to an EDM
Let's display this data on a Windows Form in two related DataGridViews. You can design the form using the data sources window against the Customer and Order entities in the model, you just need to select Data on the main menu and then Add New Data Source. When the Data Source Configuration wizard opens select Object data source and then select the Customers entity. (BTW, this is exactly how you would do it if using LINQ to SQL classes or your own business objects). The related Orders collection will automatically be pulled in so just select Customers and click Next.
Now you can use your data sources window to design your windows form. Drag the Customers entity as a DataGridView onto a Windows form and then drag the related Orders entity onto the form below the Customers to set up a Master-Detail form. (Sorry if this is a recap for you all but I always get questions on this when I assume people know this! ;-))
Lazy Loading Related Entities
Okay back to the point. First write a query that selects the Customer who's CustomerID is ALFKI. Then we can set the CustomersBindingSource.DataSource property to the query result and it will execute the query on the load of our form.
Public Class Form1
Private db As New NorthwindEntities
Private Sub Form1_Load() Handles MyBase.Load
Dim query = From c In db.Customers _
Where c.CustomerID = "ALFKI"
Me.CustomersBindingSource.DataSource = query.ToList()
Run the form and you'll see that we are only getting Customer ALFKI, and none of the related Orders. If you do this exact code above with LINQ to SQL classes then it would execute two queries on load and you would see the related Orders. This is because the CustomerBindingSource requests the collection of Customers so it executes the query that we wrote above. When the parent BindingSource's CurrentItem changes when the results are returned then it triggers the child BindingSource to display any related data. With LINQ to SQL this causes an additional query to execute to pick up the children. So as the position moves in the parent, the system is actually hitting the database to get the children in the case of LINQ to SQL.
To do this with EF we can handle the CurrentChanged event of the BindingSource and explicitly load the Orders for that Customers entity:
Private Sub CustomersBindingSource_CurrentChanged() Handles CustomersBindingSource.CurrentChanged
Dim c = TryCast(Me.CustomersBindingSource.Current, Customers)
If c IsNot Nothing Then
If Not c.Orders.IsLoaded Then
Now when we run this again you will see the related Orders and two queries will execute against the database.
Eager Loading Entities
But what if we don't want to make two separate queries against the database? In the scenario above we always want to display the related Orders with the Customers so it would be better to make a single call to the database to retrieve the Customer and the Orders. In that case we can use the Include method that is available on an entity and we add this to our query in our form's Load:
Dim query = From c In db.Customers.Include("Orders") _
Where c.CustomerID = "ALFKI"
Now when we run the form again we will see the same results, but this time there is only one query being made against the database. So although EF forces us to think harder about how our data should be loaded, this is a generally good thing for most business application development. It's up to us developers to determine the best technique for our situations.
There is a major flaw in all this that the behavior is completely non-obvious. If the customer's orders are not loaded you don't get any warning. Instead you just get back an empty collection.
The fact that Link To Entity does not throw an Exception when the child collection is not loaded makes it very hard to track down bugs. Most of my code should know nothing about how the data is loaded and just fail (or automatically load the data) if the data it’s need has not be prelaoded
Also having to pass a string to the Include() method is very poor, as it stops the compiler finding errors if the Orders property is every renamed. It would be better if Link To Entity allow “Include(x-> Orders)”, hence letting refactoring work etc.
If only Link-to-Sql worked with Oracle… Or if Link and all other Microsoft tecknolgy surported nhibernate as well as Link To Entity
Well.... that's what explicit loading is all about ;-). Why would you want an exception when accessing the collection? Just check for IsLoaded.
I agree that the Include syntax is crappy having to specify a string. I'll pass your feedback to the team, it's possible they are already considering this for v2.
Totally agree with Ian on the string syntax. Why not have an enumeration on the entities that can be passed?
-> Why would you want an exception when accessing the collection?
Because I told the system my code does not need the values in the collection (otherwise I would have loaded it). Then my code used the values in the collection, therefore my code has a bug, and the system should make it easy for me to find bugs in my code.
From thinks like that is seems to me that Entity Framework is NOT READY for commercial applications!
It is terrible to build a typed OR/M schema which needs a string !!!!! in order to work!
If so why bother using all these stuff?
I could just write sql ...
Thanks for the useful details on lazy loading, Beth. It's exactly what I was looking for!
Beth- your article nearly answers my problem, but now quite. I have a master detail scenario, and I have the master grid and detail grids loading fine. I'm doing exactly what you show in your article- however it seems like you're missing one piece- the actual data binding to the child control. I handled this by adding a line of code:
OrdersBindingSource.DataSource = c.Orders;
This is right after the line c.Orders.Load();
This works fine- however, when I delete a record for the child datagridview, and then call ObjectContext.SaveChanges(), ONLY the reference is marked as deleted (in the ObjectStateManager).
Why is the actual entity that I deleted not marked as deleted? Notably, when I delete a record from the master grid and call SaveChanges, there will be both a Reference AND an Entity marked as deleted.
The point is- if there is not an entity marked as deleted, an exception is thrown regarding cardinality issues because SQL thinks your trying to delete only the relationship, and not the child record.
Hopefully this makes sense- I'm really stumped on how to deal with this problem. I fear it might be a bug with the EF and datagridviews. Hopefully not.
I actually used the designer to hook up the child bindingsource in my example. Because there is an association between Customer and Orders the Ordersbindingsource is automatically bound to the CustomerBindingSource (it chains as you navigate the relations).
The other issue you're probably running into is because you probably have a non null foreign key to your parent table from the child (as any good database should). Unfortunately the lists that you are bound to will not properly notify the objectstatemanager of the deletion (removal from the list) in that case. So when you delete the child you have to call DeleteObject(child) on the objectcontext. You will also need to delete all the children and the parent if a parent is removed from the list.
It's not a limitation of Winforms or the DataGridView this will be the same issue in WPF. It's an issue with EF and binding to the lists it returns.
Thanks Beth- using DeleteObject directly works fine. This would seem to be a bit of a shortcoming in EF though, are you aware of this issue being handled in a future release?
Type safe code is something Microsoft simply doesn't get. They are doing the same mistakes all over the place, over and over again. Take any of their datacontrols and drop it on a form and bind to it, and the designer makes a string based binding for you!
Is it so difficult to understand that catching an error at compiletime is 10 times better than at runtime? It's simply not good enough to say, "I'll pass it on to the team". What the hell have this team been doing, forgetting something as obvious as this??? It's not like it's difficult or anything. It's like a big pink elephant running around singing 'You Don't Know What You're Doing'! How can you miss that?
Workable, yes, perfect no. But I'm easy, so it good for me. Thanks for posting the knowledge.
You are Great !!!
Thanks for sharing !!!!
I agree with Tony and Ian.
Entity Framework is very interesting but it seems too poor at the moment. On some blogs it's written Linq to sql is on an end way... EF is not ready for real enterprise application.... not a good situation. :(
Thanks Beth. This is the best EF master detail example I have seen so far. But I have a question. How would I include a WHERE clause or ORDER BY clause in the detail table query (sorry I may not be using EF terminology).
The example has a where in the master table query and I have done order by as well. But I can't see how to include either clause in the detail table query.
Hope you can help.