Evolving an API to support new requirements, like POCO, while maintaining backward compatibility is challenging.

The following design discussion from members of the Object Services team illustrates some of the issues and hard calls involved.

Have a read, and tell us what you think.

In particular, are we missing something, or overstating the importance of something? Let us know...

Anyway over to Diego and Mirek...

POCO API Discussion: Snapshot State Management

What is in a snapshot?

In Entity Framework v1 there is a single way for the state manager to learn about changes in entity instances: the change tracking mechanism is set up so that the entity instances themselves notify the state manager of any property change.

This works seamlessly if you use default code-generated classes, and it is also part of the IPOCO contract for developers willing to create their own entity class hierarchies.

For version 2, we are currently working on a snapshot-based change tracking mechanism that removes the requirement from classes to send notifications, and enables us to provide full POCO support.

The basic idea with snapshots is that when you start tracking an entity, a copy of its scalar values is made so that at a later point (typically, but not always, at the moment of saving changes to the database) you can detect whether anything has changed and needs to be persisted.
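
As a rough illustration of the idea, here is a minimal sketch of taking and comparing a snapshot. The Category class and the SnapshotSketch helper are hypothetical names used only for this example; they are not part of the Entity Framework API, and the real implementation may work quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCO used only for illustration.
public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper: not part of the Entity Framework API.
public static class SnapshotSketch
{
    // Copy the current value of every public readable property into a dictionary.
    public static Dictionary<string, object> TakeSnapshot(object entity)
    {
        return entity.GetType().GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToDictionary(p => p.Name, p => p.GetValue(entity, null));
    }

    // Compare the current values against the snapshot and list the properties that differ.
    public static List<string> GetChangedProperties(object entity, Dictionary<string, object> snapshot)
    {
        var current = TakeSnapshot(entity);
        return snapshot.Keys
            .Where(name => !object.Equals(snapshot[name], current[name]))
            .OrderBy(name => name)
            .ToList();
    }
}
```

Taking the snapshot when tracking starts and calling GetChangedProperties later is the property-by-property comparison that the rest of this discussion is about. Note that nothing is detected until somebody asks.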

The challenge

When we created the v1 API, we made a few assumptions that were totally safe with notification-based change tracking but don’t completely hold in a snapshot change tracking world.

We now need to choose the right set of adjustments for the API for it to gracefully adapt to the new scenarios we want to support.

Mainline scenario: SaveChanges() Method

In notification-based change tracking, by the time SaveChanges is invoked, the required entity state information is ready to use.

With snapshot, though, a property-by-property comparison needs to be computed for each tracked entity just to know whether it is unchanged or modified.

Once the snapshot comparison has taken place, SaveChanges can proceed normally.

In fact, assuming the process is triggered implicitly on every call to SaveChanges, a typical unit of work in which the program queries and attaches entities, then modifies and adds new entities, and finally persists the changes to the database, works unmodified with POCO classes:

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

Category newCategory = new Category(); // create a new entity

newCategory.ID = 2;

newCategory.Name = "new category";

context.AddObject("Categories", newCategory); // add a new entity

context.SaveChanges(); // detects all changes before saving

Things get more complicated when a user wants to use lower level APIs that deal with state management.

State Manager Public API

The ObjectStateManager and ObjectStateEntry classes make up the APIs that you need to deal with if you want to either get input from, or customize the behavior of, Entity Framework’s state management in your own code.

Typically, you use these APIs if you want to:

  • Query for objects that are already loaded into memory
  • Manipulate the state of tracked entities
  • Validate state transitions or data just before persisting to the database
  • Etc.

As the name implies, ObjectStateManager is Entity Framework’s state manager object, which maintains state and original values for each tracked entity and performs identity management.

ObjectStateEntries represent entities and relationships tracked in the state manager. ObjectStateEntry functions as a wrapper around each tracked entity or relationship and allows you to get its current state (Unchanged, Modified, Added, Deleted and Detached) as well as the current and original values of its properties.

Needless to say, the primary client for these APIs is the Entity Framework itself.

ObjectStateEntry.State Property

The fundamental issue with snapshot is exemplified by this property.

With notification-based change tracking, the value for the new state is computed on the fly on each notification, and saved to an internal field. Getting the state later only involves reading it from that internal field.

With snapshot, the state manager no longer gets notifications, and so the actual state at any time depends on the state when the entity began being tracked, whatever state transitions happened, and the current contents of the object.

Example: Use ObjectStateEntry to check the current state of the object.

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State);

Question #1: What are the interesting scenarios for using the state management API in POCO scenarios?

Proposed solutions

In the above example there are two possible behaviors, and it is not obvious to us which one is better:

Alternative 1: Public ObjectStateManager.DetectChanges() Method

In the first alternative, computation of the snapshot comparison for the whole ObjectStateManager is deferred until a new ObjectStateManager method called DetectChanges is invoked. DetectChanges would iterate through all entities tracked by a state manager and would detect changes in each entity's scalar values, references and collections using a snapshot comparison in order to compute the actual state for each ObjectStateEntry.

In the example below, the first time ObjectStateEntry.State is accessed, it returns EntityState.Unchanged. In order to get the current state of the "category" entity, we would need to invoke DetectChanges first:

Category category = context.Categories.First();

category.Name = "some new name"; // modify existing entity

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State); // Displays "Unchanged"

context.ObjectStateManager.DetectChanges();

Console.WriteLine("State of the object: " + entry.State); // Displays "Modified"

DetectChanges would be implicitly invoked from within ObjectContext.SaveChanges.
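
To make the behavior concrete without depending on the proposed API, here is a minimal self-contained sketch of this alternative. SketchStateManager, SketchEntry and SketchState are hypothetical names invented for this example, and only a single scalar property is compared.

```csharp
using System;
using System.Collections.Generic;

public enum SketchState { Unchanged, Modified }

// Hypothetical POCO used only for illustration.
public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical entry: State is a plain stored field, as in notification-based tracking.
public class SketchEntry
{
    public Category Entity;
    public string OriginalName;
    public SketchState State = SketchState.Unchanged;
}

// Hypothetical state manager that only models the "explicit DetectChanges" behavior.
public class SketchStateManager
{
    private readonly List<SketchEntry> entries = new List<SketchEntry>();

    // Start tracking an entity and remember its original value.
    public SketchEntry Attach(Category entity)
    {
        var entry = new SketchEntry { Entity = entity, OriginalName = entity.Name };
        entries.Add(entry);
        return entry;
    }

    // Walks every tracked entry, so the cost grows with the number of tracked entities.
    public void DetectChanges()
    {
        foreach (var entry in entries)
        {
            if (entry.State == SketchState.Unchanged && entry.Entity.Name != entry.OriginalName)
            {
                entry.State = SketchState.Modified;
            }
        }
    }
}
```

In this sketch, reading entry.State after changing category.Name still yields Unchanged until DetectChanges runs, which is exactly the stale-read risk listed in the cons below.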

Pros:

· User knows when detection of changes is performed

· DetectChanges is a method that a user would expect to throw exceptions if some constraint is violated

· This alternative requires minimal changes in the current Entity Framework API implementation

Cons:

· Since DetectChanges iterates through all the ObjectStateEntries, it is a potentially expensive method

· User has to remember to explicitly call DetectChanges() before using several methods and properties; otherwise, their results will be inaccurate:

o ObjectStateEntry.State

o ObjectStateEntry.GetModifiedProperties()

o ObjectStateManager.GetObjectStateEntries()

· This alternative implies adding a new method to ObjectStateManager

Alternative 2: Private ObjectStateEntry.DetectChanges() Method

In the second alternative, there is no public DetectChanges method. Instead, the computation of the current state of an individual entity or relationship is deferred until the state of the entry is accessed.
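
A minimal sketch of this second behavior follows. LazyEntry and SketchState are hypothetical names invented for this example, and the ComparisonCount property exists only to make the cost of each read visible; it is not something we propose to expose.

```csharp
using System;

public enum SketchState { Unchanged, Modified }

// Hypothetical POCO used only for illustration.
public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Hypothetical entry whose state is computed every time it is read.
public class LazyEntry
{
    private readonly Category entity;
    private readonly string originalName;
    private bool knownModified;

    // Counts how many snapshot comparisons have been performed so far.
    public int ComparisonCount { get; private set; }

    public LazyEntry(Category entity)
    {
        this.entity = entity;
        this.originalName = entity.Name;
    }

    public SketchState State
    {
        get
        {
            // The snapshot comparison happens here, on access, not in a separate call.
            ComparisonCount++;
            if (!knownModified && entity.Name != originalName)
            {
                knownModified = true;
            }
            return knownModified ? SketchState.Modified : SketchState.Unchanged;
        }
    }
}
```

The caller never sees a stale state, but every read of State now pays for a comparison, which is the trade-off captured in the pros and cons below.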

Pros:

· User doesn't have to remember to explicitly call DetectChanges to get other APIs to work correctly

· Existing API works as expected in positive cases regardless of notification-based or snapshot tracking

· No new public API is added

Cons:

· The following methods now require additional processing to return accurate results in snapshot:

o ObjectStateEntry.State

o ObjectStateEntry.GetModifiedProperties()

o ObjectStateManager.GetObjectStateEntries()

· Existing API works differently in negative cases in notification-based and snapshot tracking:

o some of the existing methods would start throwing exceptions

or

o we would need to introduce a new entry state, Invalid (see below for details)

Question #2: What API pattern is better? Having an explicit method to compute the current state based on the snapshot comparisons, or having the state computed automatically when it is accessed?

How this affects other APIs

While the State property exemplifies the issue, there are other APIs that would have different behaviors with the two proposed solutions.

ObjectStateEntry.GetModifiedProperties() Method

GetModifiedProperties returns the names of the properties that have been modified in an entity. Similar to the State property, with notification-based change tracking, an internal structure is modified on the fly on each notification. Producing the list later on only involves iterating through that structure.

With snapshot, the state manager no longer gets notifications, and at any given point in time, the actual list of modified properties really depends on a comparison between the original value and the current value of each property.

Therefore, for alternative #1, this API will potentially return wrong results unless it is invoked immediately after DetectChanges. For alternative #2, the behavior would always be correct.

ObjectStateManager.GetObjectStateEntries()

This is a case in which an implementation detail that was a good idea for notification-based change tracking stops offering performance benefits in snapshot. Internally, ObjectStateManager stores ObjectStateEntries in separate dictionaries depending on their state. But in snapshot, any unchanged or modified entity can become deleted because of a referential integrity constraint, and any unchanged entity can become modified.

In alternative #1, DetectChanges would iterate through the whole contents of the ObjectStateManager, and thus it would update the internal dictionaries. Once this is done, it becomes safe to do in-memory queries using GetObjectStateEntries the same way it is done today.

In alternative #2, GetObjectStateEntries would need to look in unchanged, modified and deleted storage when asked for deleted entries, and in unchanged and modified storage when asked for modified entries.
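
The difference between the two lookups can be sketched with a toy store that keeps items in per-state buckets. BucketedStore, TrackedItem and SketchState are hypothetical names for this example, and the sketch only models the Unchanged and Modified states; the real internal storage is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum SketchState { Unchanged, Modified }

// Hypothetical tracked item holding an original and a current scalar value.
public class TrackedItem
{
    public string Original;
    public string Current;
    public bool IsDirty { get { return Original != Current; } }
}

// Hypothetical store that keeps items in one bucket per state.
public class BucketedStore
{
    private readonly Dictionary<SketchState, List<TrackedItem>> buckets =
        new Dictionary<SketchState, List<TrackedItem>>
        {
            { SketchState.Unchanged, new List<TrackedItem>() },
            { SketchState.Modified, new List<TrackedItem>() }
        };

    public void Track(TrackedItem item)
    {
        buckets[SketchState.Unchanged].Add(item);
    }

    // Alternative #1 style: one full pass moves dirty items into the Modified bucket.
    public void DetectChanges()
    {
        var dirty = buckets[SketchState.Unchanged].Where(i => i.IsDirty).ToList();
        foreach (var item in dirty)
        {
            buckets[SketchState.Unchanged].Remove(item);
            buckets[SketchState.Modified].Add(item);
        }
    }

    // Alternative #1 style: trusts the buckets, so it is only accurate after DetectChanges.
    public List<TrackedItem> GetModifiedFromBuckets()
    {
        return buckets[SketchState.Modified].ToList();
    }

    // Alternative #2 style: scans the Unchanged bucket too and compares on every query.
    public List<TrackedItem> GetModifiedByScanning()
    {
        return buckets[SketchState.Modified]
            .Concat(buckets[SketchState.Unchanged].Where(i => i.IsDirty))
            .ToList();
    }
}
```

With the scanning query, the per-state buckets no longer narrow the search, which is why the separate dictionaries stop paying for themselves in snapshot mode.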

Querying with MergeOption.PreserveChanges

In Entity Framework, MergeOption is a setting that changes the behavior of a query with regard to its effects on the state manager. Of all the possibilities, PreserveChanges requires an entity to be in the Unchanged state in order to overwrite it with values coming from the database. In order for PreserveChanges to work correctly, accurate information on the actual state of entities is needed.

Therefore, with alternative #1, querying with PreserveChanges will not behave correctly unless it is done immediately after invoking DetectChanges. For alternative #2, querying with PreserveChanges would behave correctly at any time.

Referential Integrity constraints

When there is a referential integrity constraint defined in the model, a dependent entity may become deleted if the principal entity or the relationship with the principal entity becomes deleted.

For alternative #1, DetectChanges would also trigger deletes to propagate downwards through all referential integrity constraints.

For alternative #2, finding out the current state of an entity that is dependent in a referential integrity constraint would require traversing the graph upwards to find out whether either the principal entity or a relationship has been deleted.
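
The two directions of traversal can be sketched as follows. TrackedNode and RicSketch are hypothetical names for this example, and the sketch only models deletion of a principal entity, not deletion of the relationship itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node in a principal/dependent graph; not an Entity Framework type.
public class TrackedNode
{
    public string Name;
    public TrackedNode Principal; // null for an entity with no principal
    public List<TrackedNode> Dependents = new List<TrackedNode>();
    public bool MarkedDeleted;

    public TrackedNode(string name, TrackedNode principal)
    {
        Name = name;
        Principal = principal;
        if (principal != null)
        {
            principal.Dependents.Add(this);
        }
    }
}

public static class RicSketch
{
    // Alternative #1 style: a single pass pushes deletes down to all dependents.
    public static void PropagateDeletes(TrackedNode node)
    {
        foreach (var dependent in node.Dependents)
        {
            if (node.MarkedDeleted)
            {
                dependent.MarkedDeleted = true;
            }
            PropagateDeletes(dependent);
        }
    }

    // Alternative #2 style: every state read walks up the chain of principals.
    public static bool IsDeleted(TrackedNode node)
    {
        for (var current = node; current != null; current = current.Principal)
        {
            if (current.MarkedDeleted)
            {
                return true;
            }
        }
        return false;
    }
}
```

In the first style the stored flag on a dependent is only trustworthy after the propagation pass; in the second the answer is always current, but each read costs a walk whose length depends on the depth of the constraint chain.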

Mixed Mode Change Tracking

At the same time as we are working on snapshot change tracking, we are also working on another feature, called dynamic proxies. This feature consists of Entity Framework automatically creating derived classes of POCOs that override virtual properties. Overriding properties enables us to inject automatic lazy loading in property getters, and notifications for notification-based change tracking in property setters. This introduces a subtle scenario: when using proxies, it is possible that:

a. Not all properties in the class are declared virtual. The remaining properties need to still be processed using snapshot.

b. Sometimes, non-proxy instances of POCOs and proxy instances of the same type have to coexist in the same ObjectStateManager. The former have to use snapshot tracking; the latter, a combination of snapshot and notifications.

All in all, it becomes clear that the actual state of an entity does not entirely depend on the value of the internal state field, nor on the snapshot comparison, but on a combination of both.

ObjectStateEntry.SetModified()

As with mixed mode change tracking, SetModified() requires a combination of the internal state and the snapshot comparison to produce valid results.

Handling of invalid states

When working in notification-based change tracking, Entity Framework throws exceptions as soon as it learns that entity key values have been modified. With the default code-generated classes, the exception prevents the change from being accepted.

For alternative #1, DetectChanges can throw exceptions if some model constraint (e.g. key immutability) is violated. At that point, however, it is too late to prevent the value from changing.

Example:

Category category = context.Categories.First();

category.ID = 123;  // Modify a key property. This would throw if category wasn't a POCO class.

context.ObjectStateManager.DetectChanges(); // Throws because key property change was detected.

For alternative #2, reading the state or the modified properties of an entity with modified keys could throw an exception:

Example:

Category category = context.Categories.First();

category.ID = 123;  // Modify a key property. This would throw if category wasn't a POCO class.

ObjectStateEntry entry = context.ObjectStateManager.GetObjectStateEntry(category);

Console.WriteLine("State of the object: " + entry.State);

// Throws because key property change was detected.

Getting an exception thrown here would be unexpected. An alternative design is to define a new EntityState that indicates that an entity is Invalid. This new state would account for the fact that POCO classes per se do not enforce immutable keys.

Since EntityState is a flags enum, Invalid could potentially be combined with other states.

SaveChanges would still need to throw an exception if any entity in the state manager is invalid.

It would be possible to query the state manager for entities in the Invalid state using the GetObjectStateEntries method.
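
Here is a minimal sketch of how such a combined state could look. SketchEntityState is a hypothetical stand-in for EntityState, the Invalid member and its value are assumptions made for this example only, and InvalidStateSketch is not part of any existing or proposed API.

```csharp
using System;

// Hypothetical stand-in for EntityState; Invalid and its value are assumptions.
[Flags]
public enum SketchEntityState
{
    Detached = 1,
    Unchanged = 2,
    Added = 4,
    Deleted = 8,
    Modified = 16,
    Invalid = 32
}

public static class InvalidStateSketch
{
    // Computes a state from a snapshot comparison instead of throwing on a key change.
    public static SketchEntityState Compute(int originalKey, int currentKey,
                                            string originalName, string currentName)
    {
        var state = originalName == currentName
            ? SketchEntityState.Unchanged
            : SketchEntityState.Modified;

        if (originalKey != currentKey)
        {
            // Combine the flag with the regular state rather than replacing it.
            state |= SketchEntityState.Invalid;
        }
        return state;
    }

    // True when the Invalid flag is set, whatever else it is combined with.
    public static bool IsInvalid(SketchEntityState state)
    {
        return (state & SketchEntityState.Invalid) != 0;
    }
}
```

With this shape, reading the state never throws, and SaveChanges (or any caller) can test for the Invalid flag and decide when to fail.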

Question #3: Is it better to have an Invalid state for entries or should the state manager just throw exceptions immediately every time it finds a change on a key?

Our questions:

1. What are the interesting scenarios for using the state management API in POCO scenarios?

2. What API pattern is better? Having an explicit method to compute the current state based on the snapshot comparisons, or having the state computed automatically when it is accessed?

3. Is it better to have an Invalid state for entries or should the state manager just throw exceptions immediately every time it finds a change on a key?

---

We really want to hear your thoughts on the above questions.

Alex James
Program Manager,
Entity Framework Team

This post is part of the transparent design exercise in the Entity Framework Team. To understand how it works and how your feedback will be used please look at this post.