There's a lot going on today! Not only has VS 2008 SP1 Beta been released (with the EF fully integrated into setup and including a number of new features and bug fixes), but also Jarek Kowalski, who is a really smart colleague on the EF team, put up the first blog post in a series he has written which shows some exciting ways that you can layer some code on top of the EF and achieve lazy loading. The goals for his effort are pretty ambitious, and he has achieved them admirably. You might find this code very useful, and it's also a great sign of the flexibility of the EF. I'm so excited to see these kinds of things popping up both from folks inside Microsoft and those on the outside as well. This is how we translate the work of the EF team into real value for folks on many projects.
- Danny
I had so much fun talking with folks at DevConnections in Las Vegas last fall that I really wanted to attend a conference again this spring so I could share more about what we're doing with the Entity Framework and meet more great people who are interested in or even using the EF or LINQ to SQL. As it turns out, they needed someone to present a couple of sessions at the upcoming DevConnections in Orlando, so I signed up.
My two talks are:
VMD314: Entity Framework in the Real World
Daniel Simmons
Come see the Entity Framework in action! Check out an exciting new open source application built on the Entity Framework. During this session we’ll discuss and demo the use of the Entity Framework to build a LOB application in the healthcare vertical.
VDM215: Entity Framework: Application Patterns
Daniel Simmons
Microsoft is introducing the ADO.NET Entity Framework and the Entity Data Model to help developers code against first-class business objects when creating business applications. As with any new technology, most of the information available on the topic focuses on what the Entity Framework is, what its constituent components are, and related aspects. This session takes the audience beyond the “what” of the Entity Framework, instead delving into various application scenarios and approaches to application architecture to show how one can use the Entity Framework today. We shall discuss the role of the Entity Framework in Web, rich-client, and service-oriented applications.
But I also plan to attend as many of the EF and L2S sessions as I possibly can, so if you are going to be there, I'd love to meet you. I'm usually pretty easy to find because a) I often carry around a bright, lime-green backpack, and b) when it comes to these topics I have an awful hard time keeping my mouth shut. ;-)
- Danny
Probably the most frequent question I get these days is, "by the way, when will the EF finally ship?" I can imagine how frustrating it must be not having more concrete data about this stuff, and sadly all we're able to tell you is that it will ship some time this summer. Unfortunately I don't have any new news on that front, but what I can (finally) share with you is the ship vehicle.
There will be an SP1 of VS 2008/.NET Framework 3.5, and the Entity Framework and its designer will ship with it. In fact, there will be a beta for SP1 coming out sometime soon with all of our bits nicely integrated with VS and the .NET Framework--no more install Orcas, then install this beta thing, then a patch, then another beta thing or whatever that song and dance was.
- Danny
Today in the forum a question came up that illuminates some non-obvious but pretty important aspects of the Entity Framework, so I wanted to copy some of my response here and expand on it a little to give it more exposure.
The problem statement is essentially this: I have the entity key for something I want to delete, and I want to minimize round trips to the database, so I fake up an entity object, set the key, attach it to the context, call DeleteObject and then call SaveChanges. Sounds great, right? Unfortunately, in a number of cases SaveChanges will throw an exception at this point without even attempting to make a call to the database. To make things more confusing, the answer is to retrieve more info from the DB first (either through a query or by explicitly attaching). You might say, "So in order to delete something successfully I have to load more data? Huh? Why can't you just delete the record with the key I gave you and be done?"
The background is a bit interesting. What we are encountering here is the fact that the EF both allows you to describe some fairly high-level semantics about your model and abstracts away the underlying physical representation of that model in the database. The place where this bites us is when an entity has required relationships. For example, I might have a model with customers and salespeople where the association is modeled such that every customer must have a salesperson. With this kind of model, I might map the association to the database as a foreign key column in the customer table which contains the id of the salesperson -- or -- I might map it to a separate link table which has customer ids and salesperson ids with a constraint that the customer id is unique (that way the same salesperson can appear many times, indicating that the salesperson has multiple customers, but each customer can appear only once, specifying which salesperson goes with that customer).
The way the EF deals with the fact that either database schema is valid for this model (just with different mappings) is that it reasons about entities and relationships largely independently of one another. When it comes time to save changes, the EF looks over the ObjectStateManager to find the operations it must map to and carry out on the database, and then it goes through a validation phase to make sure that the operations will make sense and result in a coherent final database state. So if you have an entry in the state manager saying that an entity should be deleted, the EF will look at your model to determine if there are any required relationships. If there are (for instance, deleting a customer means you must also delete the relationship between that customer and their salesperson), then the EF will make sure that the state manager also has an entry indicating that the relationship should be deleted. If that entry doesn't exist, then the SaveChanges operation will fail. Without the relationship info and this validation step, the EF might delete the row in the customer table without deleting the corresponding row in the link table (if your database schema worked that way), which would cause even less clear exceptions to flow up from the database. With the relationship info, the EF can delete things from both tables automatically if necessary.
Given this relatively non-intuitive requirement, the EF goes to some lengths to make most scenarios handle this automatically. In particular, when you retrieve entities via a query, the query is automatically rewritten to bring along relationship info (a feature we internally call "relationship span"). That way the state manager will be aware of those relationships, and if you indicate that an entity should be deleted, then the relationships are marked deleted as well, and the EF has all the info it needs to delete things.
In the event that you attach things, though, you must supply all the needed info. This can be done either by querying for the entity you want to delete using its key, or it can be done by setting the key of required relationships on the EntityReference's EntityKey property. In our example this would be the key of the salesperson. If you set that property and then attach the customer, the attach operation will automatically create the relationship entry as well as the entry for the customer entity.
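To make the validation rule concrete, here is a toy model of it. Everything in it (ToyStateManager, EntryState, the "Type:id" string keys) is hypothetical and far simpler than the real ObjectStateManager. The point is only to show why a stub entity attached by key alone fails the check, while the same delete succeeds once the state manager also knows about the required relationship.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model (hypothetical, NOT the Entity Framework's implementation) of the
// check SaveChanges performs before touching the database.
enum EntryState { Unchanged, Deleted }

class ToyStateManager
{
    // Entity entries keyed by "Type:id", e.g. "Customer:42".
    public Dictionary<string, EntryState> Entities { get; } = new Dictionary<string, EntryState>();

    // Relationship entries keyed by (dependent key, principal key).
    public Dictionary<(string Dependent, string Principal), EntryState> Relationships { get; } =
        new Dictionary<(string Dependent, string Principal), EntryState>();

    // Entity types the model says must always have a related principal
    // (e.g. every Customer must have a SalesPerson).
    public HashSet<string> TypesWithRequiredRelationship { get; } = new HashSet<string>();

    public void DeleteObject(string key)
    {
        Entities[key] = EntryState.Deleted;
        // Any relationship the state manager knows about is marked deleted too.
        foreach (var rel in Relationships.Keys.Where(r => r.Dependent == key).ToList())
            Relationships[rel] = EntryState.Deleted;
    }

    public void SaveChanges()
    {
        foreach (var entry in Entities.Where(e => e.Value == EntryState.Deleted))
        {
            string type = entry.Key.Split(':')[0];
            bool relationshipDeleted = Relationships.Any(
                r => r.Key.Dependent == entry.Key && r.Value == EntryState.Deleted);
            if (TypesWithRequiredRelationship.Contains(type) && !relationshipDeleted)
                throw new InvalidOperationException(
                    "Deleted entity " + entry.Key + " has a required relationship with no deleted relationship entry.");
        }
        // A real implementation would now translate the entries into DELETE commands.
    }
}
```

A stub customer attached with only its key has no relationship entry, so SaveChanges throws. Registering the (Customer:42, SalesPerson:7) relationship before calling DeleteObject, which is the toy equivalent of setting the key on the customer's salesperson EntityReference before attaching, lets the same save succeed.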
For more information about relationship span, you might want to check out this previous blog post: http://blogs.msdn.com/dsimmons/archive/2007/12/21/filtered-association-loading-and-re-creating-an-entity-graph-across-a-web-service-boundary.aspx
- Danny
In my ongoing series of trying to repurpose / further broadcast important topics that come up in the forum, I thought folks might be interested in the discussion going on in this thread. A key part of this discussion is coming back to this question of ObjectContext lifetime that I discussed in this previous post.
Of course there's more than one way to do things, and Rick Strahl made a comment to that post pushing back on the idea of keeping a context around for an extended period of time because of concerns over things like building up an overall set of changes and then deciding that you want to "undo" part of those changes but not all of them before saving, etc. One approach he suggests is using a separate context per business object. The problems here, though, are around things like relationships which need to be tracked and modified somewhere. Yes, you can do this, and in some situations it may be better, but it certainly brings along its own complexity when it comes to managing all these contexts and context lifetimes.
For many scenarios I recommend using a single context (or one per thread if your app is multi-threaded). When it comes to these partial undo kinds of scenarios, I admit that there is some extra complexity. In fact, there was some discussion on the team just in the last couple of days about looking into a number of related scenarios as we do planning for v2, which may lead to some additional features that will help here (how about a "RejectChanges" method which could be used to undo changes to an entity and its relationships without throwing out the whole context, for instance?). Often, though, those kinds of concerns can also be addressed by reducing the granularity of your units of work to the point where you can either commit an entire set of changes in the context or throw it out.
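For the "one context per thread" option, a ThreadLocal<T> is one straightforward way to give each thread its own context. The ObjectContext is not thread-safe, which is why sharing a single instance across threads is a bad idea. This is a sketch under assumptions: FakeContext is a hypothetical stand-in for your generated context class, and ContextHolder is a made-up helper name.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a generated ObjectContext subclass.
class FakeContext : IDisposable
{
    // Records which thread created this instance, just to make the behavior visible.
    public int CreatedOnThread { get; } = Environment.CurrentManagedThreadId;
    public void Dispose() { /* a real context would release its connection here */ }
}

// Hypothetical helper: hands each thread its own lazily created context.
static class ContextHolder
{
    // trackAllValues: true keeps every thread's instance so they can all be disposed later.
    static readonly ThreadLocal<FakeContext> perThread =
        new ThreadLocal<FakeContext>(() => new FakeContext(), trackAllValues: true);

    // The first access on a given thread runs the factory; later accesses on that thread reuse it.
    public static FakeContext Current => perThread.Value;

    public static void DisposeAll()
    {
        foreach (var context in perThread.Values)
            context.Dispose();
    }
}
```

The trade-off is the one mentioned above for multiple contexts in general: an entity tracked by one thread's context can't simultaneously be tracked by another thread's context, so work that spans threads needs to move data between contexts explicitly.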
Taking this kind of approach allows you to sort of "go with the flow of the EF" which brings me back to the forum thread that started this post. In that thread, there's discussion about what belongs in what layer of the app--should you explicitly write a layer that hides all data access like we used to do? For most cases I would suggest "no". Part of the idea of the Entity Framework is that *it* provides the abstraction which hides the database. I think about it like this:
UI Layer / Top-level Application coordination
This is code which may reference lower layers but doesn't contain any EF framework code at all.
-----------------
Business Objects layer
These are your entities -- some of this code may be generated by the EF tools, but it is augmented with your own code supplied in partial classes, partial methods, event handlers, etc.
------------------
Data Layer
This is largely code which the EF supplies. It abstracts your business objects away from the details of the database. In the past you would have written this pretty much all by hand, and this would be the only code that really interacted directly with ADO.NET. In the EF paradigm, your code often interacts directly with EF framework code, which takes the place of much of what you would have written by hand in this layer. The ObjectContext largely represents this layer, and you can extend it with its partial class as well as various event handlers and such to customize the experience here, but you just don't have to write as much yourself.
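Extending the context through its partial class might look like the following sketch. SalesEntities and the OnContextCreated hook are hypothetical names chosen for illustration. The first declaration plays the role of designer-generated code, and the second is the hand-written half that lives in its own file so regenerating the model doesn't overwrite it.

```csharp
using System.Collections.Generic;

// --- "Generated" half (hypothetical stand-in for designer-generated code) ---
public partial class SalesEntities
{
    public List<string> Log { get; } = new List<string>();

    public SalesEntities()
    {
        OnContextCreated();
    }

    // Declared but not implemented here. If no other part of the class supplies
    // a body, the compiler removes the call entirely.
    partial void OnContextCreated();
}

// --- Hand-written half, kept in a separate file ---
public partial class SalesEntities
{
    // Custom data-layer setup runs every time a context is constructed.
    partial void OnContextCreated()
    {
        Log.Add("context created");
    }
}
```

Because both halves compile into one class, the custom code has full access to the generated members without any wrapper layer in between.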
Of course there's nothing that says you can't write your own data layer which encapsulates the EF code that I described as the data layer, but for many applications I tend to think that will make your life harder for relatively little benefit. One of the main reasons we would write a completely separate data layer in the past is that we wanted to isolate the business logic from changes in database schema and the like. The EF now does that for you with its mapping layer.
- Danny
It was somewhat delayed, but a CTP of SQL Compact Edition with support for the EF that works with beta 3 is now available. You can download it from:
http://www.microsoft.com/downloads/details.aspx?FamilyID=68539FAE-CF03-4C3B-AEDA-769CC205FE5F&displaylang=en
OK. First off I have to apologize for the fact that I've been pretty silent lately. I've got a whole backlog of questions to answer, fixes to EntityBag, comments to respond to, etc. Life seems to be composed of different seasons: there's a time to write 58 blog posts, and it seems there's a time to put my head down, take care of other things, and pretend I never had a blog... In any case, though, I just have to make a short comment on something that has me very, very excited.
If you haven't seen it yet, please, PLEASE go take a look at http://www.codeplex.com/efcontrib. Some great folks have started contributing projects which could make use of the Entity Framework much richer and more interesting in the long run. There are two projects up now: one which makes it much easier to create entity classes which are more persistence ignorant (YEAH!) and one that makes it easier to extend and customize code generation in EF projects within Visual Studio. The overall project is active with new posts going up regularly, and it's my hope that more and more people will contribute.
- Danny
Here's another topic which I believe is important to educate more folks on, but I just haven't had the time to carefully write something up. As I was answering a question on the forum today, I realized that the answer might be good to share a bit more broadly, so here's a copy of that response. The question which has come up several times is "When do I dispose of a context and recreate it vs when should I keep the existing context around and re-use it?"
If you are building a rich client application then you may well want to keep a context around for the life of the application, but you need to keep track of how many entities you have in that context. It's all about understanding the overall pattern of your app. There are some extremes and a whole spectrum of possibilities across the middle.
- If your app's purpose is to efficiently add 200,000 entities to the database and then quit, then I would recommend doing batches rather than trying to cram all of them into the context at once and do one big save. (This will generally yield better performance.)
- In the middle somewhere come things like a rich client email or order entry app where there's some amount of data which is relatively constant in size and relatively small (the contacts you work with regularly, the products you usually sell, or reference data like a zipcode->state map) plus some amount of data that is transient (email messages arriving, being read and deleted, or orders entered and then sent off for processing). In this kind of application I would use a single context and then keep careful track of which data is which kind. For the "re-usable" data I would just leave it in the context, and for the transient data I would make sure that, once I know it is no longer needed, I call Detach on it. The Detach method on the ObjectContext removes items from the context without destroying it. If you have a LARGE batch of items attached to a context, then the fastest thing generally is to destroy the context and recreate it (metadata caching will make this relatively fast), but if your context has some data you want to keep and some you want to detach, then you can call Detach on everything you want to remove and continue using the context.
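For the bulk-insert extreme, the batching idea can be sketched as a small chunking helper plus a loop that uses a fresh context per batch. InBatches is a hypothetical helper, and the batch size of 1000 is an arbitrary starting point to tune by measurement. The EF part is shown as a comment with hypothetical names (MyEntities, the "Orders" entity set) so the helper runs on its own.

```csharp
using System;
using System.Collections.Generic;

static class Batching
{
    // Splits a sequence into consecutive chunks of at most batchSize items.
    // (Because this is an iterator, the argument check runs on first enumeration.)
    public static IEnumerable<List<T>> InBatches<T>(this IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }
}

// Usage sketch with the EF (MyEntities and "Orders" are hypothetical names):
//
// foreach (var batch in newOrders.InBatches(1000))
// {
//     using (var context = new MyEntities())      // fresh, small context per batch
//     {
//         foreach (var order in batch)
//             context.AddObject("Orders", order);
//         context.SaveChanges();
//     }                                           // disposing drops all tracked entities
// }
```

Creating a new context per batch keeps the state manager small, which is the same reasoning as destroying and recreating the context when a LARGE batch of items has accumulated.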
As one of my high-school math teachers used to say, clear as mud?
- Danny
Over the course of quite a few posts during the last several weeks I’ve shared source code that adds up to an implementation for EntityBag<T>. Piecing together a project from all those snippets, though, would be a pretty painful task, so I put the whole thing together in a zip file and posted it up at the new MSDN Code Gallery site: http://code.msdn.microsoft.com/entitybag/
Since I finished the series, though, I’ve done a bit more testing on the project and run into a small bug. ContextSnapshot includes a copy of the connection string from the context which is the initial source of the EntityBag, and it uses that connection string to create its own internal context for tracking changes to the graph. In my initial testing, I always constructed the context by passing a connection string to the constructor, and everything worked fine, but an even more common pattern is to use the parameterless constructor, which gives the context a connection string of the form name="containerName" while the full connection string lives in the application’s config file. The problem is that the config file is not present on the client. So, we add the following code to the ContextSnapshot.ConnectionString setter after the line which constructs the connection string builder:
if (!string.IsNullOrEmpty(csBuilder.Name))
{
// the context was created with name="containerName", so look up the full connection
// string in the config file (requires a reference to System.Configuration.dll)
ConnectionStringSettings settings = ConfigurationManager.ConnectionStrings[csBuilder.Name];
if (settings == null)
{
throw new InvalidOperationException("No connection string named '" + csBuilder.Name + "' in the config file.");
}
csBuilder = new EntityConnectionStringBuilder(settings.ConnectionString);
}
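The same lookup can be expressed without any EF or System.Configuration dependency, which makes it easy to test in isolation. In this sketch DbConnectionStringBuilder (a real System.Data.Common class) parses the name keyword, while ConnectionStringResolver and the dictionary standing in for the config file's connection strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

static class ConnectionStringResolver
{
    // Given either a full connection string or one of the form name=containerName,
    // returns the full connection string. 'configEntries' is a hypothetical
    // stand-in for ConfigurationManager.ConnectionStrings.
    public static string Resolve(string connectionString, IDictionary<string, string> configEntries)
    {
        // DbConnectionStringBuilder parses keywords case-insensitively and strips quotes.
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
        if (builder.TryGetValue("name", out object nameValue)
            && nameValue is string name && name.Length > 0)
        {
            if (!configEntries.TryGetValue(name, out string full))
                throw new InvalidOperationException("No connection string named '" + name + "'.");
            return full;
        }
        // Not a named reference: it is already the full connection string.
        return connectionString;
    }
}
```

Resolving on the service side, where the config file exists, means the snapshot carries a self-contained connection string to the client.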
The project file on code gallery includes this fix. I’ve also had some questions from folks who were having trouble getting a WCF service working using EntityBag, so I included a sample service as well.
Creating an EntityBag-based Service
I’m no WCF expert, but I’ve been able to muddle my way through. It seems like it might be useful to point out some of the trickier steps involved in putting together the sample:
- Known types for the service methods are an issue—especially since EntityBag is generic and can contain a graph of many entity types. This is something which will be addressed automatically in the next release, but for now this is worked around with a static method (GetKnownEntityTypes) and an operation contract attribute which points to it as described in this previous post.
- If you create your entities in a separate DLL like I’ve done in the sample, then you need to copy the connection string from the model.dll’s app.config file to the web.config for the service.
- If you are going to use regular metadata files, you need to make sure they are in the right directory--otherwise, you can embed them as resources, but then you need to deal with a beta 3 bug by building with the metadata not embedded first and then rebuilding set to embed.
- One common and frustrating problem is that it’s easy for the size of the message to grow beyond the default maximum message size. This can be addressed by changing the maxReceivedMessageSize in the app.config for the client and in the web.config for the service.
- Finally, it’s important to make sure the service is set to re-use referenced types. This is turned on by default with VS 2008, but it’s worthwhile to make sure that it is on, and of course you need to reference system.data.entity.dll, perseus and your model on the client before adding the service reference.
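The known-types workaround in the first bullet relies on WCF's ServiceKnownTypeAttribute, whose (string methodName, Type declaringType) constructor names a static method that takes an ICustomAttributeProvider and returns IEnumerable<Type>. The sketch below shows one way such a method could discover entity types by reflection. It is not the GetKnownEntityTypes from the earlier post: IEntityMarker, Customer, SalesPerson and KnownTypesHelper are hypothetical, and the WCF attribute appears only in a comment so the code runs without System.ServiceModel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker interface standing in for IEntityWithKey, so the sketch
// runs without a reference to System.Data.Entity.
public interface IEntityMarker { }

public class Customer : IEntityMarker { }
public class SalesPerson : IEntityMarker { }
public class NotAnEntity { }

public static class KnownTypesHelper
{
    // Referenced from a service contract as, for example:
    //   [ServiceKnownType("GetKnownEntityTypes", typeof(KnownTypesHelper))]
    // Returns every concrete class in the entities' assembly that implements the marker.
    public static IEnumerable<Type> GetKnownEntityTypes(ICustomAttributeProvider provider)
    {
        return typeof(IEntityMarker).Assembly.GetTypes()
            .Where(t => t.IsClass && !t.IsAbstract && typeof(IEntityMarker).IsAssignableFrom(t))
            .ToList();
    }
}
```

Discovering the types by reflection means new entity types added to the model are picked up without editing the service contract.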
Future Directions
With all of this we have a decent proof of concept, but clearly there’s a lot of room for improvement. Here are a few ideas, both to illustrate current limitations and to suggest future possibilities:
- It needs to be tested against a wider array of models.
  - Entities with complex types will likely present problems because there are currently limitations in the CurrentValues and OriginalValues records on ObjectStateEntry when dealing with properties that are complex types.
  - Associations with referential integrity constraints likely need careful ordering, which the system doesn’t yet impose.
- There are various limitations due to bugs or not-yet-implemented features in the Entity Framework.
  - Entities used with EntityBag must implement IEntityWithKey. Classes generated by the system implement this interface, but IPOCO classes are not required to implement it. Unfortunately, an API change in beta 3 makes it impossible to do certain operations in a general-purpose component unless the entities in question implement the interface. A future release will remedy the problem, and it will be possible to remove the requirement.
  - Currently the system uses reflection to set the EntityKey property on IRelatedEnd when the end in question is an EntityReference. Future releases will remove that requirement. For the most part this isn’t an issue, but it could create partial trust issues; I haven’t yet had a chance to investigate fully.
- The message size could be further optimized.
  - Right now the system can’t distinguish between the serialization used when transmitting from the service to the client and that used when transmitting from the client back to the service. This causes unnecessarily large messages because when transmitting to the client you clearly want to include everything, but when transmitting back to the service you could, in most cases, skip all unchanged entities.
  - The component could also take advantage of work underway for the Entity Data Source Control to store and transmit only the minimum set of original values which the system needs rather than a whole copy of the original entity. Of course, in some cases you will need the whole entity anyway because business logic may prevent constructing an entity with only this minimum set of properties, but in many cases this would be a useful optimization.
- Additional steps could be taken to strengthen a web service’s contract.
  - Events or some other mechanism could be added to allow validation when an EntityBag is passed from the client back to the service for updates.
  - Even better, code could be generated to create a stronger contract and just use EntityBag as an internal storage/serialization mechanism.
I’m sure there are other things we could do as well, but that’s probably enough to set you thinking. As always, if you have questions or ideas, please don’t hesitate to share them.
- Danny
Here’s the last piece in the EntityBag saga. RelationshipEntry is a small, DataContract-serializable class which wraps an ObjectStateEntry that represents a relationship between a pair of entities. It contains the name of the relationship, the state of the entry, and the key (or added-entity index) of the entity at each end, organized by the role of that entity in the relationship. In addition to acting as a serializable container, this class provides methods which simplify applying the relationship to a context.
Fields and Properties
[DataContract]
internal class RelationshipEntry
{
//
// serializable properties
//
[DataMember] string RelationshipName { get; set; }
[DataMember] EntityState State { get; set; }
[DataMember] string Role1 { get; set; }
[DataMember] EntityKey Key1 { get; set; }
[DataMember] int AddedEntityIndex1 { get; set; }
[DataMember] string Role2 { get; set; }
[DataMember] EntityKey Key2 { get; set; }
[DataMember] int AddedEntityIndex2 { get; set; }
//
// non-serializable properties
//
IEntityWithRelationships Entity1 { get; set; }
IEntityWithRelationships Entity2 { get; set; }
Constructor
The interesting part of this constructor is that it requires the caller to pass in a Dictionary mapping from EntityKey to an index, containing entries for any entities which are in the added state. This dictionary is created during the construction of the ContextSnapshot, and the index is the entity’s position within the snapshot’s list of added entities. As described in previous posts, temp keys are unique based on object identity rather than on the values in the key, so their identity is lost in serialization, and we have to use this other mechanism to make sure the identity is preserved.
//
// constructor
//
internal RelationshipEntry(ObjectStateEntry stateEntry, ObjectContext context,
Dictionary<EntityKey,int> addedEntityKeyToIndex)
{
Debug.Assert(stateEntry.IsRelationship);
this.RelationshipName = stateEntry.EdmType().FullName;
this.State = stateEntry.State;
this.Role1 = stateEntry.UsableValues().GetName(0);
this.Key1 = (EntityKey)stateEntry.UsableValues().GetValue(0);
if (context.GetEntityState(this.Key1) == EntityState.Added)
{
this.AddedEntityIndex1 = addedEntityKeyToIndex[this.Key1];
this.Key1 = null;
}
this.Role2 = stateEntry.UsableValues().GetName(1);
this.Key2 = (EntityKey)stateEntry.UsableValues().GetValue(1);
if (context.GetEntityState(this.Key2) == EntityState.Added)
{
this.AddedEntityIndex2 = addedEntityKeyToIndex[this.Key2];
this.Key2 = null;
}
}
ResolveEntitiesAndKeys
This private method is called by each of the methods below once the relationship entry has been deserialized. It looks up the keys and entities for the relationship (taking care of the added-entity indexes, etc.).
void ResolveEntitiesAndKeys(ObjectContext context, List<IEntityWithKey> addedEntities)
{
if (this.Key1 == null)
{
this.Entity1 = (IEntityWithRelationships)addedEntities[this.AddedEntityIndex1];
this.Key1 = ((IEntityWithKey)this.Entity1).EntityKey;
}
else
{
this.Entity1 = (IEntityWithRelationships)context.GetEntityByKey(this.Key1);
}
if (this.Key2 == null)
{
this.Entity2 = (IEntityWithRelationships)addedEntities[this.AddedEntityIndex2];
this.Key2 = ((IEntityWithKey)this.Entity2).EntityKey;
}
else
{
this.Entity2 = (IEntityWithRelationships)context.GetEntityByKey(this.Key2);
}
}
Methods
The three methods for use by calling classes will attach, add or delete a relationship using the relationship manager on one of the related entities. The framework will automatically handle fixup for the other end.
//
// methods
//
internal void AddRelationship(ObjectContext context, List<IEntityWithKey> addedEntities)
{
Debug.Assert(this.State == EntityState.Added);
ResolveEntitiesAndKeys(context, addedEntities);
IRelatedEnd relatedEnd = this.Entity1.RelationshipManager.GetRelatedEnd(this.RelationshipName,
this.Role2);
if (!relatedEnd.Contains(this.Key2))
{
relatedEnd.Add(this.Entity2);
}
}
internal void AttachRelationship(ObjectContext context)
{
Debug.Assert((this.State == EntityState.Deleted) || (this.State == EntityState.Unchanged));
// Unchanged and deleted relationships cannot involve added entities, so no need for
// addedEntities list.
ResolveEntitiesAndKeys(context, null);
IRelatedEnd relatedEnd = this.Entity1.RelationshipManager.GetRelatedEnd(this.RelationshipName,
this.Role2);
if (!relatedEnd.Contains(this.Key2))
{
relatedEnd.Attach(this.Entity2);
}
}
internal void DeleteRelationship(ObjectContext context)
{
Debug.Assert(this.State == EntityState.Deleted);
// Deleted relationships cannot involve added entities, so no need for the addedEntities list.
ResolveEntitiesAndKeys(context, null);
IRelatedEnd relatedEnd = this.Entity1.RelationshipManager.GetRelatedEnd(this.RelationshipName,
this.Role2);
if (relatedEnd.Contains(this.Key2))
{
relatedEnd.Remove(this.Entity2);
}
}
}
Not too bad, is it? I guess this is a testament to the power of decomposing a problem. If each piece is small enough and cohesive enough, the overall story can become much less complicated.
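The added-entity index trick used by the constructor and ResolveEntitiesAndKeys can be illustrated in isolation. In this sketch TempKey is a hypothetical stand-in for a temporary EntityKey: it has no key values, so it compares equal only to itself, just as the post describes for temp keys. RelationshipDto and TempKeyDemo are also made-up names; the sketch only shows why an index into the added-entities list survives serialization when the key object does not.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a temporary EntityKey: no key values, so equality is
// plain reference equality and a "copy" never matches the original.
class TempKey { }

// Hypothetical serializable form of one relationship between two added entities.
class RelationshipDto
{
    public int AddedIndex1 { get; set; }
    public int AddedIndex2 { get; set; }
}

static class TempKeyDemo
{
    // Sender side: replace each temp key with its position in the added-entities list.
    public static RelationshipDto Encode(TempKey end1, TempKey end2, Dictionary<TempKey, int> keyToIndex) =>
        new RelationshipDto { AddedIndex1 = keyToIndex[end1], AddedIndex2 = keyToIndex[end2] };

    // Receiver side: the original TempKey objects are gone after deserialization,
    // but the indexes still identify the entities in the rebuilt added list.
    public static (T, T) Decode<T>(RelationshipDto dto, List<T> addedEntities) =>
        (addedEntities[dto.AddedIndex1], addedEntities[dto.AddedIndex2]);
}
```

Because the added entities are serialized as an ordered list, both sides agree on what each index means, which is the property RelationshipEntry depends on.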
- Danny
For this post, I’m going to set a new personal record for least prose/most code. We’re going to look at the core of ContextSnapshot, and the code includes extensive comments so we’ll mostly let it speak for itself.
Constructors
We have two different constructors because in some scenarios with EntityBag it’s most convenient to construct the snapshot by just passing an ObjectContext, while in others the connection string in the ObjectContext has been cleared, so it’s necessary to pass the connection string explicitly in addition to the context. Regardless of the parameters passed in, the core of the constructor creates the lists of entities and relationships by state. The code below makes extensive use of the methods I described previously in my Extension Methods Extravaganza posts and in my posts about creating original values objects.
//
// constructors
//
public ContextSnapshot(ObjectContext context) : this(context, context.Connection.ConnectionString)
{
}
public ContextSnapshot(ObjectContext context, string connectionString)
{
this.ConnectionString = connectionString;
// For unchanged entities we only need the current values.
this.unchangedEntities = new List<IEntityWithKey>(context.GetEntities(EntityState.Unchanged));
// For modified entities we need the current values plus the original values object along with EntityReferences.
// (Adding references to the original values object is critical here: it is how 1-1 and 1-many
// relationships are serialized for modified entities, since the original values object is what
// gets attached for modified entities. For deleted entities the relationships don't exist, and for
// added or unchanged entities the current values objects bring along the relationships and are
// what get added or attached.)
this.modifiedEntities = new List<IEntityWithKey>();
this.modifiedOriginalEntities = new List<IEntityWithKey>();
foreach (var entity in context.GetEntities(EntityState.Modified))
{
this.modifiedOriginalEntities.Add((IEntityWithKey)context
.CreateOriginalValuesObjectWithReferences(entity));
this.modifiedEntities.Add(entity);
}
// For deleted, just the original values.
this.deletedEntities = new List<IEntityWithKey>();
foreach (var entity in context.GetEntities(EntityState.Deleted))
{
this.deletedEntities.Add((IEntityWithKey)context.CreateOriginalValuesObject(entity));
}
// For added entities we need the current values, plus we need to make an index mapping from
// the key to where the entity l