I'm still not sure what I think about code generators. This may sound strange, coming from someone who has spent much of the last few years working on and talking up software factories, of which code generation is a significant part - but it's true. On one hand, I love the idea of eliminating manual coding of routine tasks and recurring patterns, improving productivity and minimising bugs. On the other hand, every code generator I've ever worked with has had problems, whether it is in the cost of maintaining the tool and templates, or issues with the generated code.
I like to divide code generators into two categories. The first is the "black magic" type, where you never change, or even look at, the generated code. The good thing about this type is that you can re-run the code generator as often as you want without worrying about overwriting any of your changes. The bad thing is that if the generated code isn't exactly what you want, you're in trouble. There are a few ways you can tweak the code without actually changing anything written by the generator, such as using partial classes or inheritance, but your options are always going to be very limited.
The other category is the "one-time accelerator" type, which will spit out code which is hopefully pretty close to what you want, but which will need to be modified by hand to get it exactly right. The advantage of this approach is that you should always eventually be able to get what you want, but it means that you'll have to manually re-apply your changes every time you regenerate. It also means you need to fully understand the generated code, since you're ultimately responsible for maintaining it.
My main quarrel with code generators stems from the fact that we all want the "black magic" type, but in my experience they hardly ever deliver on their promise. The problem is that all too often the generated code just doesn't do what you want. This leads to a few possible outcomes:
One possible explanation as to why these problems are so common is that if you ever do find a problem that can be solved well by "black magic" code generators, you can probably codify the solution in a framework or component library, eliminating the need for any kind of code generation whatsoever. The challenges with "black magic" code generation are the reason why patterns & practices software factories generally don't even try that approach. We tried to mitigate the "one-time accelerator" limitations by only generating a small amount of code in one go, but this brings its own set of problems.
This topic is at the front of my mind as code generators have caused a bit of angst in our team lately. We're using a generator (home-grown, but much the same as other solutions you've probably seen) to generate data access layers, stored procedures and business entities. The code that it generates is generally very good (otherwise we wouldn't be using it), but as always it isn't perfect. The biggest problem is that, for any given table, the generator will give you a complete suite of CRUD operations whether you want them or not. For many people this may not be a big deal - but I downright refuse to have code in my solution that is unnecessary and untested. My fear is that if we leave this code in the solution, at some stage some developer will be tempted to call it - and since nobody ever asked for it or tested it, it may be completely unsuitable for the application. So my rule is, if the generator builds something you don't need for your current task (even if it may be needed later), it's not allowed in the solution.
The problem is, since we're using a lot of agile development techniques, we tend to update our database schemas quite a lot. This means that we need to regenerate our data access artifacts a lot as well. To make matters worse, we've also found the need for the occasional tweak to the generated code to make sure it meets our requirements. So the combination of frequent schema changes, my rules about stripping out unneeded code, and the need to hand-tweak the code means that the generation process is fast becoming more trouble than it's worth. I know we could make changes to our generator or use an existing one with more features to get around some of these problems (such as being able to specify and save which operations are generated for each table), but I fear that we'll never get quite where we want to be. But on the other hand I'm concerned that if we stop using a generator for our data access artifacts then we'll face a swag of different problems, such as inconsistent implementations and increased development time.
This is where you can come in and save the day. The first person to explain how to make code generation work well in this situation (preferably without causing any disruption to our team or schedule) gets a six pack of the Aussie beer of their choice. Unfortunately due to customs regulations you'll need to come by to collect - but believe me it will be worth it.
For me, 1. and 3. are definitely not an option. So that leaves option number 2.
* Not generating unneeded code shouldn't be a problem :-) - you have to store a list of what should (or shouldn't) be generated somewhere, and pass it to the code generator at generation time.
* For tweaks it depends (I don't know the extent or the nature of your tweaks). Perhaps you could implement tweaked methods in a partial class by hand?
Sounds like your complaints aren't necessarily about generated code, but about code you don't really have ownership of. Any external library you use will contain code that you're likely not going to use. If you view the generated code as an external library, unused code is just the cost of "reuse".
Short of providing configuration to govern which of the C, R, U, or D methods to generate, I don't see much of a "friendly" alternative. That "configuration" could involve either metadata on the database side (database-specific properties) or inferred information: for example, if there's a stored procedure named tablename_CreateRow, don't generate the create method for that table...
Maybe partial methods in C# 3 might be helpful...
If you're generating partial classes, maybe the generator can analyse the non-generated file and detect if the C,R,U, or D methods already exist (maybe using CodeDOM). Or, maybe test for attributes that configure which of the CRUD methods to generate...
It's too bad a #define declared in one partial class file doesn't carry over to the others (it only applies to the file it appears in); otherwise you could simply generate code like:
#if !NO_CREATE_METHOD
public void CreateMethod()
{
}
#endif
and add
#define NO_CREATE_METHOD in the non-generated CS file so the CreateMethod never gets compiled...
Or, what about generating multiple files per class? If you keep each CRUD method in its own file you can simply exclude that CS file from the build...
Being stuck in DB land for a few months now, I have an idea... Maybe you could extend your code generator so that it checks for the existence of a table within your db that explicitly outlines which objects in the database to generate code for, and which methods (CRUD) should be created. If this table doesn't exist, create all objects and CRUD methods. This way you can dynamically create your code based on the data in this "generator template" table.
This would give you some more control over the areas that you said are causing you frustration.
This doesn't even have to be a table. It could be an XML file similar to what netTiers uses, extended to also list the methods that should be created.
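As a rough illustration of that idea, here is a minimal sketch in Python (chosen only for brevity; the generator's own language isn't stated anywhere in this thread). Everything in it is hypothetical: the `GENERATION_CONFIG` dictionary stands in for the "generator template" table or XML file, the names `operations_for` and `generate_data_access` are made up, and treating an unlisted table as "generate everything" is an assumption rather than something the comments above specify.

```python
# Hypothetical sketch: none of these names come from a real generator.
ALL_OPERATIONS = ("Create", "Read", "Update", "Delete")

# Stand-in for the "generator template" table or XML file: table -> operations to emit.
# Assumption: a table that is not listed falls back to the full CRUD suite.
GENERATION_CONFIG = {
    "Customer": ["Create", "Read", "Update"],
    "AuditLog": ["Create"],
}

def operations_for(table, config=None):
    """Return the operations to generate for a table, defaulting to all of them."""
    if not config or table not in config:
        return list(ALL_OPERATIONS)
    requested = set(config[table])
    unknown = requested - set(ALL_OPERATIONS)
    if unknown:
        raise ValueError(f"Unknown operations for {table}: {sorted(unknown)}")
    # Keep the canonical CRUD order regardless of how the config lists them.
    return [op for op in ALL_OPERATIONS if op in requested]

def generate_data_access(table, config=None):
    """Emit a C#-style partial class containing only the requested method stubs."""
    lines = [f"public partial class {table}DataAccess", "{"]
    for op in operations_for(table, config):
        lines.append(f"    public void {op}{table}() {{ /* generated body */ }}")
    lines.append("}")
    return "\n".join(lines)

print(generate_data_access("AuditLog", GENERATION_CONFIG))
```

Because the configuration is saved alongside the schema, regenerating after a schema change would not bring back the operations that were deliberately left out.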
I just realized I am echoing Miha's comments... Sorry Miha, but I agree with you wholeheartedly.
Life (and software development) is full of trade-offs. I typically prefer the approach of creating a framework, and have done so for database access via stored procs. The developers just called the appropriate method (Add, Update, Select, Delete) and provided the name of the proc. The downside is that in the case of a Select, the method returned a Recordset (this was in the ADO days) instead of a pre-defined, strongly-typed object. The upside was that developers only created the procs they needed, and there were only four methods in the whole system that actually had to interact with the database.
If you are using C# or VB, then code generators can be a joy.
1. If you don't want your code generator to export CRUD operations for certain inputs, then specify that in a config file.
2. Use Partial classes to separate the auto-generated part from the manually created part. That way you can regenerate as often as you want without fragging your changes.
3. Use Partial methods (C# 3/VB 9) to add hooks everywhere without killing performance.
4. Go all the way. Use application-specific code generators to build lots of stuff, not just simple data classes.
My application relies on a bunch of lookup tables and I got tired of repeating the same code over and over again. So I built a simple code generator that exports a data class, some caching logic, a single-select search control, and a multi-select search control. Oh, and a bunch of adapters so we can also plug it into our reporting engine.
All this stuff wouldn't make any sense in any other application. But for this one, I can add new lookup tables faster than I can populate a drop-down box.
5. Don't bother trying to reuse code generators from project to project.
Say it takes you 1 unit of time to code a pattern and you need to do it N times. Alternatively, it takes M units of time to create a code generation template for said pattern.
Clearly there is a break point, M < N, where it is cheaper to code the template than to keep repeating the pattern. Watch for those.
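That break point can be written down as a tiny calculation. This is only a sketch of the arithmetic above: the function names are made up, and the `per_use_cost` parameter (the time to apply the template each time) is an added assumption that defaults to zero, which reduces to the plain M < N rule.

```python
def template_pays_off(repeats, template_cost, per_use_cost=0.0, hand_cost=1.0):
    """True when building the template is cheaper than hand-coding every repeat."""
    by_hand = repeats * hand_cost
    with_template = template_cost + repeats * per_use_cost
    return with_template < by_hand

def break_even_repeats(template_cost, per_use_cost=0.0, hand_cost=1.0):
    """Smallest number of repeats at which the template wins, or None if it never does."""
    saving_per_use = hand_cost - per_use_cost
    if saving_per_use <= 0:
        return None  # applying the template costs as much as coding by hand
    return int(template_cost // saving_per_use) + 1

# With M = 10 units to build the template and 1 unit per hand-coded copy,
# the template only wins once the pattern is needed more than 10 times.
print(template_pays_off(10, 10), template_pays_off(11, 10))
```

If applying the template is not free, the break point moves out accordingly, which is one reason to watch the real numbers rather than assume the template wins.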
Have you tried CodeFluent?
I think our R&D put a lot of effort into following a few principles:
- Generating components (not only code) so you should not care too much about generated code at least on the business model tier (it is NOT template-based on this layer)
- Do the proper separation in terms of .NET business classes architecture so you can add custom code or business rules the right way and regenerate your schema as much as you want (with an embedded SQL differential engine)
- Provide extensibility at multiple levels so you can preprocess the model or override generation producers to add a specific behaviour into any layer
- Provide template-based generation where it makes sense, to build user interfaces for example
I would be curious about your opinion once you have really tried it.
Feel free to send any questions directly to info@softfluent.com.
I am quite sure we have the right solution, and if not, I would like to understand your issue.
Regards,
Daniel COHEN-ZARDI
Thanks for the comments so far everyone. Peter, I get your point that every framework or library will contain unused code, and it's fair to ask why I'm worried about unused generated code. I guess I see a difference between unused code in System.Globalization.HijriCalendar and in MyApplication.DataAccess.DeleteAuditLog. The former is obviously not designed for my application (so some kind of analysis will be needed to see if it will help), but if you do choose it you can be confident that it's highly tested. The latter looks like it was designed just for my app (so there may be more temptation to call it without thinking carefully), but it may never be appropriate and it may not do what people assume.
I've never been a fan of working with code generators either, until I worked with the one developed internally at a company I worked for in Milwaukee (a fairly small company at the time).
They seemed to have fine-tuned an amazing code generator that really answers most of the issues that I see in common ones today, and they even built a GUI on top that you can use to 'configure' your object. This is the general path to create a 'business object':
You open the tool, open a project file (or start a new one) and point it at some tables. It derives the relationships. You can then tweak each table, with more features than I could fully explain here, including how the identity of a table is obtained (auto-increment from the db, generated by the db, generated by the client, etc.). You can tweak all of the properties, marking fields read-only, write-once, writeable, etc. You can add virtual properties which do a lazy call to a db, dynamic or cached, and you can add business rules. Some rules are based on field calculations, with the dependencies recorded for which fields are affected, so if you change a different field, it will know what to do. You could tell the system what is a 'lock' object, which means you can load the object in a locked or read-only manner. If it is locked, anyone else trying to load that object in a lockable way will be unable to do so. Timing is in place so that UI locks expire after 30 minutes, and an MSMQ backend lets automation applications notify the client and steal a lock after a minute (causing the UI app to be unable to save, but the MSMQ message has already been reported to the user, and in 99% of cases the user just closes the screen and mitigation is done). The objects have beginedit/apply/cancel, etc., which also cascade to the children.
It really just goes on and on. The system originally generated a backend DLL for a middle tier (DCOM), and you could tick a box to say whether you wanted SPs or raw SQL, all generated. And of course any save on the parent lock object would update all the child objects (only those that changed) in one big generated call instead of one by one (always a benefit of code generators). Load hints were available, similar to LINQ. When I look at LINQ I see so many little pieces that were in the old system.
Some of the really incredible stuff, though, was the BusinessObjectTester: a generic program which would load up any of these created objects, with a fully dynamic UI to display all of the properties and items. You could change things, save things, drill into the children, etc. One of my projects was actually extending it to take an XML config file to determine what to show, with ways to hide ids for other things, and to provide UIs that would select from the domain (oh yeah, anything that was a lookup had a domain object within the object that you could use to get all of the lookup table values: code1, code2, even an image to represent the id ;).
And when the program finished, you were pretty much dropped right into its code, and by placing 'blocks' to note where your code is, you could add logic anywhere and it would be preserved. Since there was so much power in the configurator, most changes would not impact any code you created. If they did, you'd just have to review and fix your side, but you would never 'lose' code, and it would put its own code in there, which you could just delete as needed.
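The 'blocks' mechanism described here resembles what is often called protected regions. The sketch below shows one way it could work, in Python for brevity. It is not the tool's actual implementation: the `// BEGIN CUSTOM name` / `// END CUSTOM name` marker syntax and both function names are invented for illustration, since the original marker format isn't given.

```python
import re

# Hypothetical marker syntax; the tool described above used its own (unspecified) format.
BLOCK = re.compile(
    r"(?P<begin>^[ \t]*// BEGIN CUSTOM (?P<name>\w+)[ \t]*\n)"
    r"(?P<body>.*?)"
    r"(?P<end>^[ \t]*// END CUSTOM (?P=name)[ \t]*$)",
    re.DOTALL | re.MULTILINE,
)

def extract_custom_blocks(source):
    """Collect the hand-written text found between each pair of named markers."""
    return {m.group("name"): m.group("body") for m in BLOCK.finditer(source)}

def carry_over_custom_blocks(old_source, new_source):
    """Copy hand-written blocks from the previous file into freshly generated output."""
    saved = extract_custom_blocks(old_source)

    def restore(match):
        # Blocks that are new in this generation keep whatever the generator emitted.
        body = saved.get(match.group("name"), match.group("body"))
        return match.group("begin") + body + match.group("end")

    return BLOCK.sub(restore, new_source)

old = (
    "class Customer {\n"
    "    // BEGIN CUSTOM validate\n"
    "    if (name == null) throw new ArgumentNullException();\n"
    "    // END CUSTOM validate\n"
    "}\n"
)
new = (
    "class Customer {\n"
    "    int NewColumn;\n"
    "    // BEGIN CUSTOM validate\n"
    "    // END CUSTOM validate\n"
    "}\n"
)
print(carry_over_custom_blocks(old, new))
```

Blocks whose names disappear from the new output are silently dropped in this sketch; a real tool would presumably report them so that no hand-written code is lost.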
So all of this, amazing... all built in VB6 back around 2000. The middle tier piece (2 DLLs for each object were deployed to a client/server) was eventually dumped because the middle tier infrastructure was stale and underpowered.
It's easy to sit here today and think, wow, how do you do security and such? DCOM is so old, vb6, wow ancient...
But as any developer knows, there is nothing in that system that couldn't be updated. These days I just sit back and hope someone will create something as great in C# :> It was perhaps similar to what the CSLA project offers today (from what I've heard). One of the negatives, at least at that place, is that it had some prerequisites: you needed a database with some databases for the codes system and the resource manager (you could always find out who is in your system, what they've locked, and other interesting details).
I remember many extremely large objects being built, using hundreds of tables and lookups and dealing with some very complex environments. Because of that, optimizations were made of the kind that rarely make it into a generic third-party code template generator.
So where do I sit today on this? The one thing I notice over and over is something that is rarely present: a UI to configure the framework. People like using XML, and perhaps the config could be stored there. But when you have a system with 100+ tables and lots of properties with different settings, having a UI to jump into and change a field or add things makes it go so much faster.
This may be too simplistic an idea, but what about saving a copy of your original generated code? Then, when you have to regenerate because of a DB change, diff the original generated code against your current codebase, generate the new code, and apply those changes to the newly generated code.
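What is described here is essentially a three-way merge, with the saved original output as the common base. Below is a minimal line-based sketch using Python's standard `difflib`. The function names are made up, the merge is deliberately simplistic, and in practice an existing merge tool would likely do a better job. It only shows that the hand edits and the generator's changes can be combined automatically when they touch different lines.

```python
import difflib

def _unchanged_lines(base, other):
    """Map each base line index to its index in `other` for lines difflib treats as unchanged."""
    matcher = difflib.SequenceMatcher(None, base, other, autojunk=False)
    mapping = {}
    for base_start, other_start, size in matcher.get_matching_blocks():
        for offset in range(size):
            mapping[base_start + offset] = other_start + offset
    return mapping

def reapply_hand_edits(base, edited, regenerated):
    """Three-way merge of line lists. Returns (merged_lines, conflicts)."""
    in_edited = _unchanged_lines(base, edited)
    in_regen = _unchanged_lines(base, regenerated)
    # Anchor lines: base lines that survive unchanged on both sides.
    anchors = [i for i in range(len(base)) if i in in_edited and i in in_regen]
    merged, conflicts = [], []
    b = e = r = 0  # cursors into base, edited, regenerated
    for anchor in anchors + [None]:
        if anchor is None:  # trailing chunk after the last anchor
            b_end, e_end, r_end = len(base), len(edited), len(regenerated)
        else:
            b_end, e_end, r_end = anchor, in_edited[anchor], in_regen[anchor]
        base_chunk = base[b:b_end]
        edit_chunk = edited[e:e_end]
        regen_chunk = regenerated[r:r_end]
        if edit_chunk == base_chunk:
            merged.extend(regen_chunk)   # only the generator changed this part
        elif regen_chunk == base_chunk or regen_chunk == edit_chunk:
            merged.extend(edit_chunk)    # only the hand edit changed it, or both agree
        else:
            conflicts.append((edit_chunk, regen_chunk))
            merged.extend(edit_chunk)    # keep the hand edit and flag it for review
        if anchor is not None:
            merged.append(base[anchor])
            b, e, r = anchor + 1, e_end + 1, r_end + 1
    return merged, conflicts
```

For example, if the hand edit removed an unwanted `Delete` method and the regenerated file gained a field for a new column, the merged result contains the new field and still omits `Delete`. Where both sides changed the same lines, the sketch keeps the hand edit and records a conflict for manual review.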
Hope this helps spark an idea that'll work for you.
Take care,
Jeff
Have you had a look at netTiers (www.nettiers.com)? This tool has given us minimal disruption after schema changes, and via partial classes it allows us to keep our additions to both entities and data access methods. Highly recommended. It has advanced features like a processing pipeline, extensible entity validation and Deepload / Deepsave for working with fuller object graphs.
Code generators can be useful. And where they are useful, I think they identify an opportunity for one or a combination of the following:
- Improve the framework. With good methods you can reduce the amount of generated code needed. You can do this yourself.
- Improved tools. Maybe the way that you want to work with the form isn't the way the designers of the form automation tools imagined. This is where the code generator lives. It would be nice if it were easier to hook a code generator into Visual Studio where you need it.
- Improved or alternate languages. Language extensions like LINQ reduce the amount of code needed for common tasks. The danger is that over time the language turns into a monster. You could look upon a data file that drives a code generator as a small special-purpose alternate language.
Don't generate code, generate the config files and have generic processes to manage record maintenance and other common requirements.
This approach is far more flexible than generating code and if you have very high use or critical routines you can always generate a more focussed or efficient piece of code or call a service.
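To make the contrast with generated classes concrete, here is a small sketch of a single generic process driven by per-table config, written in Python against an in-memory SQLite database purely so that it is self-contained. The `TABLE_CONFIG` layout, the `GenericStore` class and the `OperationNotAllowed` exception are all hypothetical names, not part of any framework mentioned in this thread.

```python
import sqlite3

# Hypothetical generated config: one entry per table instead of one class per table.
TABLE_CONFIG = {
    "customer": {
        "key": "id",
        "columns": ["id", "name", "city"],
        "operations": {"create", "read"},  # nothing else is reachable for this table
    },
}

class OperationNotAllowed(Exception):
    pass

class GenericStore:
    """One generic implementation of record maintenance, driven entirely by config."""

    def __init__(self, connection, config):
        self.connection = connection
        self.config = config  # table and column names are trusted config, never user input

    def _spec(self, table, operation):
        spec = self.config[table]
        if operation not in spec["operations"]:
            raise OperationNotAllowed(f"{operation} is not enabled for {table}")
        return spec

    def create(self, table, record):
        spec = self._spec(table, "create")
        columns = [c for c in spec["columns"] if c in record]
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        self.connection.execute(sql, [record[c] for c in columns])

    def read(self, table, key):
        spec = self._spec(table, "read")
        sql = f"SELECT {', '.join(spec['columns'])} FROM {table} WHERE {spec['key']} = ?"
        row = self.connection.execute(sql, (key,)).fetchone()
        return dict(zip(spec["columns"], row)) if row else None

    def delete(self, table, key):
        spec = self._spec(table, "delete")
        self.connection.execute(f"DELETE FROM {table} WHERE {spec['key']} = ?", (key,))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
store = GenericStore(conn, TABLE_CONFIG)
store.create("customer", {"id": 1, "name": "Ada", "city": "Sydney"})
print(store.read("customer", 1))
```

A schema change then means regenerating one config entry rather than a set of classes, and an operation nobody asked for stays unreachable until someone enables it in the config.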
Warm regards, Mike
Tom Hollander just posted a note Code Generators: Can't live with them, can't live without them . His
Most of the comments are about the specific case of generating the DB mapping code, but don't address the key question: When is it worth creating a code generator as opposed to either (a) writing the code by hand or (b) trying to bake the logic (in this case db mapping) into a framework?
Please see my blog entry that is trying to address this very question: http://blogs.msdn.com/wojtek/archive/2007/11/18/code-generators-when-can-you-live-with-them.aspx