I have been spending some time recently creating a custom code generator that outputs C++ and C# code from a custom XML format. This post covers some of what I have learned while working on it.
There is a really cheap way to get a debugging experience for your custom language: #line. With #line, you change the debug info that the compiler (ilasm/csc/vbc/cl) writes into the PDB so that it points back to the original source file instead of the file your code generator creates. (csc and cl spell the directive #line; vbc uses #ExternalSource and ilasm uses the .line directive.) This is quite handy if at least parts of your custom language map closely onto the generated code. Notes:
Example:
Generated file:
static void Main(string[] args)
{
#line 1 "HelloWorld.ExampleLanguage"
    Console.WriteLine("Hello World");
#line default
}
HelloWorld.ExampleLanguage (custom language source file):
Hello World
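To make the mechanics concrete, here is a minimal sketch of how a generator might emit these directives. The LineMappingGenerator class and its "one source line becomes one Console.WriteLine call" mapping are hypothetical illustrations, not the generator described in this post. The point is simply that every generated statement is preceded by a #line directive naming the custom-language file and line it came from, and #line default restores normal numbering afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical mini-generator: each line of the custom language becomes one
// Console.WriteLine call, preceded by a #line directive that points the
// debugger back to the original source line.
public static class LineMappingGenerator
{
    public static string Generate(string sourceFileName, IList<string> sourceLines)
    {
        var sb = new StringBuilder();
        sb.AppendLine("static void Main(string[] args)");
        sb.AppendLine("{");
        for (int i = 0; i < sourceLines.Count; i++)
        {
            // #line numbers are 1-based, so source line i maps to i + 1.
            sb.AppendLine($"#line {i + 1} \"{sourceFileName}\"");
            sb.AppendLine($"    Console.WriteLine({ToCSharpLiteral(sourceLines[i])});");
        }
        // Anything generated after this point uses the generated file's own line numbers.
        sb.AppendLine("#line default");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Minimal escaping for a C# regular string literal (backslash and double quote only).
    private static string ToCSharpLiteral(string text) =>
        "\"" + text.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
}
```

Calling LineMappingGenerator.Generate("HelloWorld.ExampleLanguage", new[] { "Hello World" }) produces the generated file shown above. A real generator would track line numbers per construct rather than per physical line, but the directive pattern stays the same.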
XML is a great way to build a custom language these days because, for free, you get a lexer, a parser, syntax validation, and a language service, and your compiler gets to work with deserialized classes instead of with text. Here is the procedure that I would recommend:
Example target for running xsd.exe:
<!-- Generate Example.cs using xsd.exe -->
<Target Name="GenerateXSDClasses" Inputs="Example.xsd" Outputs="$(IntermediateOutputPath)\Example.cs">
  <Exec Command="$(RunManagedToolPath) xsd.exe Example.xsd /classes /fields /namespace:ExampleCompiler /out:$(IntermediateOutputPath)"/>
</Target>
<ItemGroup>
<Compile Include="$(IntermediateOutputPath)\Example.cs"/>
</ItemGroup>
You would also need to wire the target into the build so that it runs before compilation, for example by prepending it to the CompileDependsOn property in a PropertyGroup.
Example of embedding the schema as a resource:
<ItemGroup>
<EmbeddedResource Include="Example.xsd" />
</ItemGroup>
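If GetManifestResourceStream returns null, the usual cause is a mismatch between the name you pass and the name the build gave the resource. By default that name is the project's root namespace plus the file name (so "ExampleCompiler.Example.xsd" assumes a RootNamespace of ExampleCompiler), with any subfolder names inserted in between. A small helper along these lines fails with the names that actually exist; EmbeddedSchema.Open is a hypothetical sketch, not part of any library.

```csharp
using System;
using System.IO;
using System.Reflection;

// Hypothetical helper: opens an embedded resource and, if the name is wrong,
// reports every manifest resource name the assembly actually contains.
public static class EmbeddedSchema
{
    public static Stream Open(Assembly assembly, string resourceName)
    {
        // GetManifestResourceStream returns null (rather than throwing) for an unknown name.
        Stream stream = assembly.GetManifestResourceStream(resourceName);
        if (stream == null)
        {
            string available = string.Join(", ", assembly.GetManifestResourceNames());
            throw new InvalidOperationException(
                $"Resource '{resourceName}' not found. Available resources: [{available}]");
        }
        return stream;
    }
}
```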
Example of using the schema:
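// Requires: using System; using System.IO; using System.Reflection; using System.Xml; using System.Xml.Schema;
private static XmlSchemaSet s_schemaSet;

public static void InitializeSchema()
{
    if (s_schemaSet != null)
    {
        throw new InvalidOperationException("The schema has already been initialized.");
    }

    // Load Example.xsd from the assembly's embedded resources and compile it once.
    Assembly thisAssembly = typeof(MyType).Assembly;
    using (Stream stream = thisAssembly.GetManifestResourceStream("ExampleCompiler.Example.xsd"))
    using (XmlReader schemaDocument = XmlReader.Create(stream))
    {
        s_schemaSet = new XmlSchemaSet();
        s_schemaSet.Add("http://schemas.microsoft.com/vstudio/Example/2008", schemaDocument);
        s_schemaSet.Compile();
    }
}

// Later, when reading an input file ('filename'), validate it against the compiled schema:
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document;
settings.Schemas = s_schemaSet;
settings.ValidationEventHandler += MyValidationEventHandler;
settings.ValidationType = ValidationType.Schema;
using (XmlReader reader = XmlReader.Create(filename, settings))
{
    // Read or deserialize the validated document here.
}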
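For a self-contained version of the same idea that also shows the "deserialized classes" half of the pitch, the sketch below validates a document against an inline schema while XmlSerializer reads it into a class. The namespace urn:example, the Program/Print schema, and the ProgramElement class (standing in for what xsd.exe /classes /fields would generate) are all hypothetical; a real compiler would load the embedded Example.xsd as shown above instead of a string constant.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical stand-in for a class generated by xsd.exe /classes /fields.
[XmlRoot("Program", Namespace = ExampleXml.Ns)]
public class ProgramElement
{
    [XmlElement("Print", Namespace = ExampleXml.Ns)]
    public List<string> Prints = new List<string>();
}

public static class ExampleXml
{
    public const string Ns = "urn:example"; // hypothetical target namespace

    // Hypothetical schema: a <Program> root containing one or more <Print> strings.
    private const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='" + Ns +
        "' elementFormDefault='qualified'>" +
        "<xs:element name='Program'><xs:complexType><xs:sequence>" +
        "<xs:element name='Print' type='xs:string' maxOccurs='unbounded'/>" +
        "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates 'xml' against the schema while deserializing it. Because a
    // handler is attached, validation problems are collected in 'errors'
    // instead of being thrown as exceptions.
    public static ProgramElement Load(string xml, List<string> errors)
    {
        var schemaSet = new XmlSchemaSet();
        using (XmlReader schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            schemaSet.Add(Ns, schemaReader);
        }
        schemaSet.Compile();

        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Document,
            Schemas = schemaSet,
            ValidationType = ValidationType.Schema,
        };
        settings.ValidationEventHandler += (sender, e) =>
            errors.Add($"({e.Exception.LineNumber},{e.Exception.LinePosition}): {e.Message}");

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            var serializer = new XmlSerializer(typeof(ProgramElement));
            return (ProgramElement)serializer.Deserialize(reader);
        }
    }
}
```

The compiler then works with ProgramElement.Prints directly, and the collected messages carry line and column positions that can be reported against the user's XML file.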
If you are writing a very high-quality compiler that you intend to productize, then people would wire it into their build process so that they input your custom language and get back a DLL or EXE. But this is not necessarily the right bar for a custom code generator. In my case, I am creating a truly custom compiler: very few people are going to author the input language, and it doesn't make sense to spend valuable QA resources testing the compiler directly (they would test the generated code instead). So rather than building my compiler's output directly, I check in the compiler output as a baseline; the build process runs the custom compiler and compares its output to the baseline. If they differ, it issues a build error.
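The comparison step itself can be quite small. The following is a minimal sketch, not the tool used for this post: BaselineCheck, its error code EXG0001, and the argument order are hypothetical. It reports the first differing line in the "file(line): error CODE: message" form, so when it runs through an MSBuild Exec task the mismatch should surface as a build error, and the non-zero exit code fails the task.

```csharp
using System;
using System.IO;

// Hypothetical build step: compare freshly generated output with the
// checked-in baseline and report a mismatch as a build error.
public static class BaselineCheck
{
    // Returns null when the files match; otherwise an error message in the
    // canonical "file(line): error CODE: text" form.
    public static string Compare(string baselinePath, string generatedPath)
    {
        string[] expected = File.ReadAllLines(baselinePath);
        string[] actual = File.ReadAllLines(generatedPath);
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i])
                return Format(baselinePath, i + 1, "generated output differs from baseline");
        }
        if (expected.Length != actual.Length)
            return Format(baselinePath, common + 1,
                $"baseline has {expected.Length} lines but generated output has {actual.Length}");
        return null;
    }

    private static string Format(string path, int line, string message) =>
        $"{path}({line}): error EXG0001: {message}";

    // Entry point for the build: args[0] = baseline file, args[1] = generated file.
    public static int Run(string[] args)
    {
        string error = Compare(args[0], args[1]);
        if (error == null) return 0;
        Console.Error.WriteLine(error);
        return 1;
    }
}
```

Because File.ReadAllLines splits on any line ending, a baseline checked out with CRLF endings still matches LF output; whether that leniency is desirable depends on whether line endings matter for your generated files.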
There are a number of valuable properties that I get out of this 'baseline' approach: