EDMGen2 – Now with Reverse Engineering options

Published 04 April 09 07:46 PM | dpblogs 

Visual Studio 2008 and .Net 3.5 SP1 come with two tools that are capable of supporting a database-first application design scenario: the EdmGen command-line tool and the New ADO.Net Entity Model item template. These two tools produce Entity Framework metadata artifacts from a database instance, but the generated model and mapping have a one-to-one correspondence with the database. Any inheritance or partitioning relationships that may be naturally present in the schema or data of that instance are not represented in the generated model.

The new version of EdmGen2, recently released on Code Gallery, includes an option “RetrofitModel”. This extension is the result of collaboration with the database team at Microsoft Research, notably summer intern Ankit Malpani. When using this option, EdmGen2 uses data mining techniques to determine if there are TPH, TPT, or vertical partitioning relationships that are inherent in the instance. It then generates a suitable model and mapping in the form of csdl, msl, ssdl, and edmx files. The following figure illustrates one of the patterns that EdmGen2 looks for in an instance, and the resulting model:

clip_image002

To use the tool, launch EdmGen2 with the following arguments:

/RetrofitModel "connection string" "provider" "model name"

For instance, the following will connect to a local AdventureWorks database, and create files and an EDM instance named AVWorks:

EdmGen2 /RetrofitModel "Server=(local); Integrated Security=true; Initial Catalog=AdventureWorks;" "System.Data.SqlClient" "AVWorks"

There is an optional final parameter that can be added. The parameter must be a numeric value between 0 and 1. This parameter represents a threshold ratio of elements in a generated child entity to the number of elements in its parent in a hierarchy. If the ratio of any generated inheritance falls below this threshold, the child entity is effectively removed and merged back into the parent. The tool may identify potential inheritance relationships on an insignificant portion of an entity; this parameter allows the user to define what “insignificant” means. If no parameter is specified, a default value of 0.05 is used.

Internally, EdmGen2 uses a sequence of rules that mine the database for patterns in data and schema that provide evidence of inheritance or partitioning relationships. We first use the System.Data.Entity.Design API to generate a default model. We then identify any entities that represent many-to-many associations and turn them into associations. Finally, we run the following rules, in order, to iteratively refine the model:

  1. Look for TPH patterns by looking for a discriminator column. We look for a column with relatively few active values (currently, 6 or fewer), partition the entity on that value, and determine if there are patterns in null values from the partitioning. We do this pattern twice – the first time assumes that a good discriminator column can be found by following associations, and the second time assumes that the discriminator column is found inline.
  2. Look for TPH patterns by looking strictly for patterns in null values, without a discriminator column. We build a frequent-pattern tree out of a table, using the presence or absence of values in each column as the features to identify. This tree can be built from two passes of the data. We use the frequent-pattern tree to determine if there are mutual exclusion relationships between properties in an entity.
  3. Push down associations in a hierarchy. If entity A has an association with entity B, C is a child of B, and all of A’s association set references instances of C, push the association down to C.
  4. Look for TPT and vertical partitioning using associations. Look for any 1-1 associations that remain after the previous steps. If there is a 1-1 relationship between the instances of the entities as well, merge the entities. If not, create an inheritance link.
  5. Check for abstraction. If, for any entity E, the union of all of the instances of E’s children sum up to all of E’s instances, mark E as abstract.

We have defined rules that can identify TPC relationships and horizontal partitioning as well. These rules were excluded from this release to ensure that all generated models could be viewable in the Visual Studio design surface. In the future, if there is sufficient demand, we will add this functionality to the release. We are also interested to see if there are additional patterns that signify inheritance hierarchies in live production databases that are not covered by these rules, and if so, how we can generalize them for automatic identification.

 

- James Terwilliger, Post Doc researcher, MSR

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# EDMGen2 ??? Now with Reverse Engineering options | ASP.NET MVC said on April 5, 2009 2:45 AM:

PingBack from http://aspmvc.co.cc/2009/04/04/edmgen2-%e2%80%93-now-with-reverse-engineering-options/

# Community Blogs said on April 5, 2009 6:09 AM:

URL : http://blogs.msdn.com/adonet/archive/2009/04/04... One of my favorite patterns in the Entity Framework

# Pop Catalin said on April 6, 2009 3:47 PM:

Instead of automatic identification of TPH patterns, I think I'd prefer a functionality in the designer that creates a hierarchy from values of  a user selected column or if the column is a FK the related table values.  

We never use char columns as discriminants for entity hierarchies, we always use integer columns that are foreign keys to other tables that serve as enumerations. (the reasons are FK integrity checks, easier localization, reduced row size (which is very important that it doesn't grow very large as our tables are pretty big, and performance is always an issue)).

Also we have far more complex rules for hierarchies, involving multiple columns. Ex:

Product(Source_ID, Destination_ID, ... ) - Source_ID and Destination_ID reference ProductionLines.

- When Source_ID == Destination_ID the entity stored in the row is a TransformedProduct (which also has more types).

- When Source_ID != Destination_ID the entity stored is a LoadPackage.

(*this model is not supported by EF as a hierarchy)

Sometime the discriminants  are on more columns. For example we have recipe tables, and each line, can be a recipe header (which can also be a processing header a processing group header, a faze header, a chemical product group header etc) , a processing line (witch can also be of many types), a chemical product (of more types...) etc, and the discriminants are on different columns witch are of different types (int, bit) determined by null value of some columns or by column value.

More modeling capabilities are always welcome, but I sincerely doubt that automatic determination of hierarchy as it is, is a real world useful tool or that it brings more value than, let's say an easy way to create a hierarchy or subhierarchy using the designer, using a selected discriminant.

Sincerely,

     Pop Catalin

# Meta-Me said on April 28, 2009 11:37 PM:

Over the weekend in between run throughs that I did prepping for a webcast, I did some catch-up blog

# dsoltesz said on April 29, 2009 9:31 AM:

I would have to agree with pop that this tool needs to support integer (fk) hierachies.

# walter verhoeven said on October 22, 2009 12:38 AM:

In my case the code was generating inheritance  in reverse.

intresting problem

Leave a Comment

(required) 
(optional)
(required) 

  
Enter Code Here: Required

Search

This Blog

Syndication

Page view tracker