by David Matson, Test Engineer for Microsoft code name "Oslo"
The examples in this article work with the May 2009 CTP.
Using the M languages and types features together can be tricky for the uninitiated. I've collected some of the most common issues together with the best current solutions or workarounds.
You are creating your own Domain Specific Language (DSL) that you want to parse against a custom grammar. You want to take the parsed output of your grammar and load it into a Repository. You may also want to have this parsed output conform to a specific schema that you have defined in an external .m file.
For this scenario, assume you have a list of names and birthdays that you want to parse and load into a database.
John Smith 1/1
Jane Doe 5/15
module Samples
{
    language Birthdays
    {
        token Name = ("a".."z" | "A".."Z")+;
        token Month = "1".."9" | "10" | "11" | "12";
        token Day = "1".."9" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17"
            | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28"
            | "29" | "30" | "31";
        token FullName = Name " " Name;
        syntax BirthdayLine = n:FullName " " m:Month "/" d:Day "\r\n"
            => {Name => n, Month => m, Day => d};
        syntax Main = l:BirthdayLine* => Birthdays{valuesof(l)};
    }
}
module Samples
{
  Birthdays : {
    Name : Text;
    Month : Integer8;
    Day : Integer8;
  }*;
}
1. Save the above files as Sample.birthdays, Birthdays.mg, and BirthdaysSchema.m, respectively.
2. Compile the grammar.
m.exe Birthdays.mg
3. Execute the DSL text against the grammar to produce Sample.m.
mgx.exe Sample.birthdays -reference:Birthdays.mx -modulename:Samples
4. Try to compile the DSL output together with your schema and produce a SQL script.
m.exe BirthdaysSchema.m Sample.m -package:Script
5. You get the following errors:
BirthdaysSchema.m(3,3): error M0144: The initial value 'Birthdays { { Name = "John Smith", ... }, ... }' of the extent 'Birthdays' is not compatible with the extent's type 'Collection(anonymous entity { Name; Month; Day; })'.
Sample.m(5,22): error M0152: Literal '"1"' cannot be converted to the type 'Integer8'.
Sample.m(6,20): error M0152: Literal '"1"' cannot be converted to the type 'Integer8'.
Sample.m(10,22): error M0152: Literal '"5"' cannot be converted to the type 'Integer8'.
Sample.m(11,20): error M0152: Literal '"15"' cannot be converted to the type 'Integer8'.
MGrammar's parsed output currently only supports the text data type.
Change all schema types to text and leave them that way in the database.
Month : Text;
Day : Text;
Use two schemas, one designed for importing data from your DSL and another designed for querying your data in the form you expect. These tables could live in the same database or even two different databases (for example SamplesImport and SamplesQuery). Use your favorite database API (T-SQL scripts, SSIS, ADO.NET via C# or VB, etc.) to write logic to move/transform data from the import format to the query-friendly format.
Create a custom tool that reads your parsed DSL output and changes the data types as necessary. In this example, that would mean removing the quotes around the Month and Day values in Sample.m. The revised set of commands would be as follows. (For brevity, the remaining examples use the abbreviated aliases for command-line parameters.)
1. m.exe Birthdays.mg
2. mgx.exe Sample.birthdays -r:Birthdays.mx -m:Samples
3. FixMyDataTypes.exe Sample.m (your custom tool that updates Sample.m in place)
4. m.exe BirthdaysSchema.m Sample.m -p:Script
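FixMyDataTypes.exe is not part of the Oslo tools. You write it yourself. The sketch below assumes that mgx.exe writes each field as Label => "value" on its own line. Under that assumption, the core of such a tool could strip the quotes from numeric Month and Day values with a regular expression. The DataTypeFixer class and its methods are hypothetical. A text rewrite like this is also more fragile than modifying the parsed graph.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical core of FixMyDataTypes.exe: rewrites Month => "1" as Month => 1
// (and likewise for Day) so the values match the Integer8 fields in the schema.
public static class DataTypeFixer
{
    // Matches a Month or Day label followed by a quoted run of digits.
    private static readonly Regex QuotedNumber =
        new Regex(@"\b(Month|Day)(\s*=>\s*)""(\d+)""", RegexOptions.Compiled);

    // Returns the M text with the quotes removed from numeric Month/Day values.
    public static string Fix(string mText)
    {
        return QuotedNumber.Replace(mText, "$1$2$3");
    }

    // Updates an .m file in place, as step 3 of the command sequence does.
    public static void FixFile(string path)
    {
        File.WriteAllText(path, Fix(File.ReadAllText(path)));
    }
}
```

For example, DataTypeFixer.FixFile("Sample.m") would turn Month => "5" into Month => 5. It would leave Name => "Jane Doe" untouched.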
Create a custom tool that replaces mgx.exe and fixes the data types before writing Sample.m. The revised set of commands would be:
1. m.exe Birthdays.mg
2. MyMgxWithDataTypeConversion.exe Sample.birthdays Birthdays.mx Samples
3. m.exe BirthdaysSchema.m Sample.m -p:Script
Below is a skeletal example of how this program could be implemented (for brevity and clarity, no checks are made for errors such as null reference exceptions).
using System;
using System.Collections.Generic;
using System.Dataflow;
using System.IO;
using System.Text;

namespace MyMgxWithDataTypeConversion
{
    class Program
    {
        static int Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: MyMgxWithDataTypeConversion.exe FileToParse CompiledGrammarMx ModuleName");
                return 1;
            }

            string fileToParse = args[0];
            string compiledGrammarMx = args[1];
            string moduleName = args[2];

            // Load the compiled grammar and parse the DSL file into an M graph.
            DynamicParser parser = DynamicParser.LoadFromFile(compiledGrammarMx, null);
            parser.GraphBuilder = new NodeGraphBuilder();
            Node root = (Node)parser.Parse(fileToParse, null);
            GraphStore store = root.Store;

            // Replace the text values of the Month and Day fields with integers.
            foreach (Edge recordEdge in root.Edges)
            {
                Node record = recordEdge.Node;
                foreach (Edge field in record.Edges)
                {
                    string label = field.Label.Text;
                    if (label == "Month" || label == "Day")
                    {
                        store.SetConstantValue(field.Node, int.Parse((string)field.Node.AtomicValue));
                    }
                }
            }

            // Wrap the graph in a module declaration and write it next to the input file.
            string destinationFile = Path.ChangeExtension(fileToParse, "m");
            StringBuilder fullMFile = new StringBuilder();
            fullMFile.AppendFormat("module {0} ", moduleName);
            fullMFile.AppendLine("{");
            fullMFile.Append(root.WriteToString());
            fullMFile.AppendLine("}");
            File.WriteAllText(destinationFile, fullMFile.ToString());
            return 0;
        }
    }
}
The source code is also available in the download that accompanies this post.
The code above references objects in the System.Dataflow assembly. To compile, you must add a reference to this assembly (located at C:\Program Files\Microsoft Oslo\1.0\bin\System.Dataflow.dll or a similar location).
The basic cookbook for writing an mgx.exe replacement is as follows:
1. Load a parser for the grammar (using DynamicParser.LoadFromFile).
2. Run the parser against an input file (using parser.Parse).
3. Perform some custom actions on the output graph from the parser.
4. Serialize the modified output graph to an .m file.
In this example, the custom step is finding any fields named Month or Day and replacing the contents of each field with a parsed integer instead of the original string. For more information on working with the M graph object model, see the paper by Clemens Szyperski (http://msdn.microsoft.com/en-us/library/dd878360.aspx).
For this scenario, you use mgx.exe to create a .m file with your parsed DSL data. You try to match the DSL data with a schema in another .m file. When you run m.exe with these two files, you get strange error messages or incorrect results.
When you run mgx.exe, make sure you specify the same module name that your schema file uses. For example (when using BirthdaysSchema.m from Common Error #1 above):
mgx.exe Sample.birthdays -r:Birthdays.mx -m:Samples
If you leave off -m, mgx.exe uses the name of the DSL text file as the module name, which usually is not what you want. In this example, m.exe would still run without errors, but the resulting SQL script would contain two schemas: Samples.Birthdays (with your schema from BirthdaysSchema.m) and Sample.Birthdays (with your data from Sample.m). This is an easy mistake to make, so always specify the module name when calling mgx.exe, and double-check that it matches the module name in your .m schema files.
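One way to catch this mistake before running m.exe is to compare the module declarations in the two files. The ModuleNameCheck class below is a hypothetical helper and is not part of the Oslo tools. It assumes each file declares its module with "module Name" at the beginning of a line, as the examples in this article do.

```csharp
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper: reads the first "module <Name>" declaration in an .m file.
public static class ModuleNameCheck
{
    private static readonly Regex ModuleDecl =
        new Regex(@"^\s*module\s+([A-Za-z_][A-Za-z0-9_.]*)", RegexOptions.Multiline);

    // Returns the declared module name, or null if no declaration is found.
    public static string GetModuleName(string mText)
    {
        Match match = ModuleDecl.Match(mText);
        return match.Success ? match.Groups[1].Value : null;
    }

    // True when both files declare the same module (for example, Samples).
    public static bool SameModule(string schemaPath, string dataPath)
    {
        string schemaModule = GetModuleName(File.ReadAllText(schemaPath));
        string dataModule = GetModuleName(File.ReadAllText(dataPath));
        return schemaModule != null && schemaModule == dataModule;
    }
}
```

Suppose you ran mgx.exe without -m. ModuleNameCheck.SameModule("BirthdaysSchema.m", "Sample.m") would then return false, because the generated file declares module Sample while the schema declares module Samples.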
New MGrammar users often assign labels to everything. Sometimes, however, a label can do more harm than good. Record entities parsed through MGrammar are usually better left unlabeled, and definitely should not have duplicate labels. The following is a common error based on the Birthdays grammar above:
syntax BirthdayLine = n:FullName " " m:Month "/" d:Day "\r\n"
    => Birthday{Name => n, Month => m, Day => d};
In this case, the output M would be:
module Samples {
    Birthdays {
        Birthday {
            Name => "John Smith",
            Month => "1",
            Day => "1"
        },
        Birthday {
            Name => "Jane Doe",
            Month => "5",
            Day => "15"
        }
    }
}
Both records carry the same label, Birthday, and this duplicate label causes compilation errors. In this case, the specific errors are as follows:
BirthdaysSchema.m(3,3): error M0144: The initial value 'Birthdays { Birthday { Name = "John Smith", ..., ... } }' of the extent 'Birthdays' is not compatible with the extent's type 'Collection(anonymous entity{ Name; Month; Day; })'.
Sample.m(4,13): error M0187: Cannot initialize a node more than once using '='.
Sample.m(5,13): error M0187: Cannot initialize a node more than once using '='.
Sample.m(6,13): error M0187: Cannot initialize a node more than once using '='.
Do not label record entities (recommended). Or, if you do label record entities, make sure each record has a unique label, for example:
syntax BirthdayLine = n:FullName " " m:Month "/" d:Day "\r\n"
    => id(n){Name => n, Month => m, Day => d};
For this scenario, assume you have a list of managers and their direct reports that you want to parse and load into a database.
Steve manages Bob and Craig
Lisa manages Carol and Cindy
Ray manages Amy
module Samples
{
    language Managers
    {
        token Name = ("a".."z" | "A".."Z")+;
        syntax ManagerLine = m:Name " manages " n1:(ns1:Name => {Name => ns1})
            n2:(" and " ns2:Name => {Name => ns2})* "\r\n"
            => {Name{m}, DirectReports{n1, valuesof(n2)}};
        syntax Main = l:ManagerLine* => Managers{valuesof(l)};
    }
}
module Samples
{
    type Manager
    {
        Id : Integer32 => AutoNumber();
        Name : Text where value.Count <= 100;
        DirectReports : Report*;
    } where identity Id && value.DirectReports <= Reports;

    Managers : Manager*;

    type Report
    {
        Id : Integer32 => AutoNumber();
        Name : Text where value.Count <= 100;
    } where identity Id;

    Reports : Report*;
}
1. Save the above files as Sample.managers, Managers.mg, and ManagersSchema.m, respectively.
2. Compile the grammar.
m.exe Managers.mg
3. Execute the DSL text against the grammar to produce Sample.m.
mgx.exe Sample.managers -r:Managers.mx -m:Samples
4. Try to compile the DSL output together with your schema and produce a SQL script.
m.exe ManagersSchema.m Sample.m -p:Script
5. You get the following errors:
ManagersSchema.m(10,5): error M0144: The initial value 'Managers { { Name { "Steve" }, ... }, ... }' of the extent 'Managers' is not compatible with the extent's type 'Collection(anonymous entity { Id; Name; DirectReports; FieldNames; })'.
Sample.m(3,10): error M0195: To be an entity field, 'DirectReports' must have one and only one successor. (Did you forget an '='?).
Sample.m(16,10): error M0195: To be an entity field, 'DirectReports' must have one and only one successor. (Did you forget an '='?).
Sample.m(32,17): error M0206: To be an entity field, '"Amy"' must have a unique label.
When your DSL text file is parsed to M, the results are as follows:
module Samples {
    Managers {
        {
            Name {
                "Steve"
            },
            DirectReports {
                {
                    Name => "Bob"
                },
                {
                    Name => "Craig"
                }
            }
        },
        {
            Name {
                "Lisa"
            },
            DirectReports {
                {
                    Name => "Carol"
                },
                {
                    Name => "Cindy"
                }
            }
        },
        {
            Name => "Ray",
            DirectReports => {
                Name => "Amy"
            }
        }
    }
}
Notice that each DirectReports value is missing one level of nesting: there is no => operator after DirectReports (or, equivalently, no extra set of braces around each record). The correct version of this file would be as follows:
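module Samples {
    Managers {
        {
            Name => "Steve",
            DirectReports => {
                {
                    Name => "Bob"
                },
                {
                    Name => "Craig"
                }
            }
        },
        {
            Name => "Lisa",
            DirectReports => {
                {
                    Name => "Carol"
                },
                {
                    Name => "Cindy"
                }
            }
        },
        {
            Name => "Ray",
            DirectReports => {
                {
                    Name => "Amy"
                }
            }
        }
    }
}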
In order to produce this output, change the following line in your grammar:
syntax ManagerLine = m:Name " manages " n1:(ns1:Name => {Name => ns1}) n2:(" and " ns2:Name => {Name => ns2})* "\r\n" => {Name => m, DirectReports => {n1, valuesof(n2)}};
Note that the grammar must produce a total of three sets of curly braces (or => operators) between DirectReports and the first label/value pair inside it: one set of braces (or, in this case, the => operator) to denote the value of DirectReports, one set to denote that the value of DirectReports is a collection, and one set to denote an entity inside that collection. Also note that each record is unlabeled (see "Common Error #3: Duplicate Labeled Instances" above). Finally, note that where the output is a valid entity, the .m file contains the => operator in place of one set of braces.
In the schema above, the parent (manager) contains the child collection (direct reports). The child (direct report) does not directly reference the parent (manager) via a foreign key. Instead, a many-to-many table is automatically generated to store the links between parents (managers) and children (direct reports).
Live with the many-to-many table in the database.
Use two schemas, one designed for importing data from your DSL (with the many-to-many table) and another designed for querying your data in the form you expect (with a foreign key from direct report to manager). These tables could live in the same database or even two different databases (for example SamplesImport and SamplesQuery). Use your favorite database API (T-SQL scripts, SSIS, ADO.NET via C# or VB, etc.) to write logic to move/transform data from the import format to the query-friendly format.
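Create a custom tool that reads your parsed DSL output and changes the shape as necessary to conform to a schema you prefer, such as the following ManagersSchema2.m:
module Samples
{
    type Manager
    {
        Id : Integer32 => AutoNumber();
        Name : Text where value.Count <= 100;
    } where identity Id;

    Managers : Manager*;

    type Report
    {
        Id : Integer32 => AutoNumber();
        Name : Text where value.Count <= 100;
        Manager : Manager;
    } where identity Id && value.Manager in Managers;

    Reports : Report*;
}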
In this example, your custom tool would need to add labels to each Manager, move the DirectReports to a separate collection off of the root, and add links from each DirectReport back to the correct manager. For the input above, the correct revised M would be:
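module Samples {
    Managers {
        Steve {
            Name => "Steve"
        },
        Lisa {
            Name => "Lisa"
        },
        Ray {
            Name => "Ray"
        }
    }
    Reports {
        {
            Name => "Bob",
            Manager => Managers.Steve
        },
        {
            Name => "Craig",
            Manager => Managers.Steve
        },
        {
            Name => "Carol",
            Manager => Managers.Lisa
        },
        {
            Name => "Cindy",
            Manager => Managers.Lisa
        },
        {
            Name => "Amy",
            Manager => Managers.Ray
        }
    }
}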
Note that, when using this approach, your sequence of commands would change as follows.
1. m.exe Managers.mg
2. mgx.exe Sample.managers -r:Managers.mx -m:Samples
3. FixMyDataShape.exe Sample.m (your custom tool that updates Sample.m in place)
4. m.exe ManagersSchema2.m Sample.m -p:Script
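Much of what FixMyDataShape.exe must do does not depend on the M graph API. The sketch below assumes the tool has already collected each manager's name and the names of the people that manager manages. Under that assumption, it could emit the Reports extent shown above as text. The ReportsWriter class is hypothetical. The sketch also assumes every manager name is a valid M identifier that can serve as a record label, and that names contain no quote characters.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: builds the Reports extent, with one record per direct
// report and a Manager field that references the manager's labeled record.
public static class ReportsWriter
{
    public static string WriteReports(IEnumerable<KeyValuePair<string, List<string>>> directReports)
    {
        List<string> records = new List<string>();
        foreach (KeyValuePair<string, List<string>> item in directReports)
        {
            foreach (string report in item.Value)
            {
                // Name holds the report's name as text; Manager is a label
                // reference such as Managers.Steve rather than a text value.
                records.Add("        {\n" +
                            "            Name => \"" + report + "\",\n" +
                            "            Manager => Managers." + item.Key + "\n" +
                            "        }");
            }
        }

        StringBuilder sb = new StringBuilder();
        sb.Append("    Reports {\n");
        sb.Append(string.Join(",\n", records));
        sb.Append("\n    }\n");
        return sb.ToString();
    }
}
```

The method accepts any sequence of pairs. Passing a List of KeyValuePair values therefore keeps the records in a predictable order. A Dictionary also works, but its enumeration order is not guaranteed.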
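Alternatively, create a custom tool that replaces mgx.exe and changes the data shape before writing Sample.m. The revised set of commands would be:
1. m.exe Managers.mg
2. MyMgxWithDataShapeConversion.exe Sample.managers Managers.mx Samples
3. m.exe ManagersSchema2.m Sample.m -p:Script
Below is a skeletal example of how this program could be implemented (for brevity and clarity, no checks are made for errors such as null reference exceptions).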
using System;
using System.Collections.Generic;
using System.Dataflow;
using System.IO;
using System.Text;

namespace MyMgxWithDataShapeConversion
{
    class Program
    {
        static int Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: MyMgxWithDataShapeConversion.exe FileToParse CompiledGrammarMx ModuleName");
                return 1;
            }
            string fileToParse = args[0];
            string compiledGrammarMx = args[1];
            string moduleName = args[2];
            DynamicParser parser = DynamicParser.LoadFromFile(compiledGrammarMx, null);
            parser.GraphBuilder = new NodeGraphBuilder();
            Node root = (Node)parser.Parse(fileToParse, null);
            Dictionary<string, List<string>> directReports = new Dictionary<string, List<string>>();
            // Make the root modifiable (specifically, its edge collection).
            root = new Node(root.Store, root.Brand, root.Edges);
            foreach (Edge recordEdge in root.Edges)
            {
                string manager = null;
                List<string> manages = new List<string>();
                Edge? directReportsField = null;
                foreach (Edge field in recordEdge.Node.Edges)
                {
                    string label = field.Label.Text;
                    if (label == "Name")
                    {
                        // Get the name of the manager.
                        manager = (string)field.Node.AtomicValue;
                    }
                    else if (label == "DirectReports")
                    {
                        directReportsField = field;
                        foreach (Edge childRecord in field.Node.Edges)
                        {
                            foreach (Edge childField in childRecord.Node.Edges)
                            {
                                // Get the name of each direct report.
                                if (childField.Label.Text == "Name")
                                    manages.Add((string)childField.Node.AtomicValue);
                            }
                        }
                    }
                }
                // Cache the list of managers and the people they manage.
                if (manager != null && manages.Count > 0)
                    directReports.Add(manager, manages);
                // Remove the direct reports field from each manager; it will be moved to a separate extent.
                if (directReportsField.HasValue)
                    recordEdge.Node.Edges.Remove(directReportsField.Value);
                // Re-write each manager record by adding a brand on the record with the manager name.
                if (manager != null)
                    root.Store.SetBrand(recordEdge.Node, Identifier.Get(manager));
            }
            // Create a list of all records for the new Reports extent.
            List<Node> reports = new List<Node>();
            foreach (KeyValuePair<string, List<string>> item in directReports)
            {
                string manager = item.Key;
                foreach (string report in item.Value)
                {
                    // Create a list of fields for the new record.
                    List<Edge> fields = new List<Edge>();
                    // Set the Name field of the new record to the name of the direct report.
                    fields.Add(new Edge(Label.Get("Name"), root.Store.CreateConstant(Identifier.Empty, report)));
                    // The Manager field of the new record is a reference to the manager's record (the manager
                    // name is the local record label, the extent is Managers).
                    fields.Add(new Edge(Label.Get("Manager"), new Node(root.Store, new GraphReference(
                        new string[] { "Managers", manager }))));
                    reports.Add(new Node(root.Store, fields));
                }
            }
            // Turn the list of report records into a new extent node.
            Node extraExtent = root.Store.CreateCollection(Identifier.Get("Reports"), NodeKind.Collection, reports);
            // Save the graph to an m file. Use the name of the original text file as the name for the new m file.
            string destinationFile = Path.ChangeExtension(fileToParse, "m");
            StringBuilder fullMFile = new StringBuilder();
            fullMFile.AppendFormat("module {0} ", moduleName);
            fullMFile.AppendLine("{");
            fullMFile.Append(root.WriteToString());
            fullMFile.Append(extraExtent.WriteToString());
            fullMFile.AppendLine("}");
            File.WriteAllText(destinationFile, fullMFile.ToString());
            return 0;
        }
    }
}
In this example, the custom steps are adding labels to each manager, moving the direct reports to a separate extent, and linking each direct report back to its manager. For more information on working with the M graph object model, see the paper by Clemens Szyperski (http://msdn.microsoft.com/en-us/library/dd878360.aspx).
In many scenarios, the shape of the data in the input DSL does not match the shape you desire for the data stored in the Repository.
For example, suppose you have a flat list of transactions (date, amount) in your DSL:
1/1/2000 -100
2/1/2000 1000
2/1/2000 100
Suppose you want to populate the following schema:
module Samples
{
    type Summary
    {
        Date : DateTime;
        Amount : Decimal19;
    } where identity Date;

    Summaries : Summary*;
}
In your schema, you want to have a total of transactions for each day (grouping and summing).
Currently, MSchema does not provide any facility to do such reshaping between the input text and the output graph. (Also note that parsing directly to DateTime or Decimal19 rather than Text is not currently supported. See "Common Error #1: Getting Non-Text Output from MGrammar").
The workarounds for these scenarios are similar to those for "Common Error #4: Populating Parent/Child Data" above. Essentially, you can either forgo the transformation, or do the transformation yourself using custom tools at various layers of the process (during parsing, after parsing but before populating the Repository, or after populating the Repository).
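For the transaction example above, the transformation itself is a simple group-and-sum. The sketch below assumes the lines use the month/day/year amount layout shown earlier. It also assumes a custom tool applies this step either during parsing or before populating the Repository. The TransactionSummarizer class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sketch: totals transaction amounts per day, producing the rows
// for the Summaries extent (one row per Date, as "where identity Date" requires).
public static class TransactionSummarizer
{
    public static SortedDictionary<DateTime, decimal> Summarize(IEnumerable<string> lines)
    {
        var totals = new SortedDictionary<DateTime, decimal>();
        foreach (string line in lines.Where(l => l.Trim().Length > 0))
        {
            // Each line is "<month>/<day>/<year> <amount>", e.g. "2/1/2000 1000".
            string[] parts = line.Trim().Split(' ');
            DateTime date = DateTime.ParseExact(parts[0], "M/d/yyyy", CultureInfo.InvariantCulture);
            decimal amount = decimal.Parse(parts[1], NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);

            // Add the amount to the running total for that day (0 if the day is new).
            decimal current;
            totals.TryGetValue(date, out current);
            totals[date] = current + amount;
        }
        return totals;
    }
}
```

The three sample lines yield two rows: -100 for January 1, 2000 and 1100 for February 1, 2000. Keying the dictionary by date guarantees one total per day.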
For this scenario, assume you are parsing transaction lines (account, date, amount). Assume you have a static list of accounts in your schema.
Checking 1/1/2000 -100
Checking 2/1/2000 1000
Savings 2/1/2000 100
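module Samples
{
    language Transactions
    {
        token Name = ("a".."z" | "A".."Z")+;
        token Month = "1".."9" | "10" | "11" | "12";
        token Day = "1".."9" | "10" | "11" | "12" | "13" | "14" | "15" | "16" | "17"
            | "18" | "19" | "20" | "21" | "22" | "23" | "24" | "25" | "26" | "27" | "28"
            | "29" | "30" | "31";
        token Year = ("0".."9")+;
        token Date = Month "/" Day "/" Year;
        token Amount = "-"? ("0".."9")+;
        syntax TransactionLine = n:Name " " d:Date " " a:Amount "\r\n"
            => {Account => n, Date => d, Amount => a};
        syntax Main = l:TransactionLine* => Transactions{valuesof(l)};
    }
}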
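module Samples
{
    type Account : {
        Id : Integer32 => AutoNumber();
        Name : Text where value.Count <= 100;
    } where identity Id;

    Accounts : Account* {
        Checking {
            Name => "Checking"
        },
        Savings {
            Name => "Savings"
        }
    }

    type Transaction : {
        Id : Integer32 => AutoNumber();
        Account : Account;
        Date : DateTime;
        Amount : Decimal19;
    } where identity Id && value.Account in Accounts;

    Transactions : Transaction*;
}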
MGrammar does not currently provide any way to look up data in an existing extent and dynamically create a label reference to that record. (Also note that parsing directly to Decimal19 or DateTime rather than Text is not currently supported. See "Common Error #1: Getting Non-Text Output from MGrammar").
Forgo the lookup and change your schema to something like the following:
Account : Text where value.Count <= 100;
Using this approach, you would just store the text value used for lookup directly in the referencing table (instead of storing a foreign key).
Use two schemas: the schema from the first workaround (without the foreign key relationship) for importing data from your DSL, and another designed for querying your data in the form you expect (with a foreign key relationship). These tables could live in the same database or even two different databases (for example SamplesImport and SamplesQuery). Use your favorite database API (T-SQL scripts, SSIS, ADO.NET via C# or VB, etc.) to write logic to move/transform data from the import format to the query-friendly format.
Create a custom tool that reads your parsed DSL output and changes the shape as necessary to conform to a schema you prefer.
In this example, your custom tool would need to change the Account name into a label reference (from "Checking" to Accounts.Checking). For the input above, the correct revised M would be:
module Samples {
    Transactions {
        {
            Account => Accounts.Checking,
            Date => 2000-01-01,
            Amount => -100
        },
        {
            Account => Accounts.Checking,
            Date => 2000-02-01,
            Amount => 1000
        },
        {
            Account => Accounts.Savings,
            Date => 2000-02-01,
            Amount => 100
        }
    }
}
Note that, when using this approach, your sequence of commands would change as follows.
1. m.exe Transactions.mg
2. mgx.exe Sample.transactions -r:Transactions.mx -m:Samples
3. AddLabelReferences.exe Sample.m (your custom tool that updates Sample.m in place)
4. m.exe TransactionsSchema.m Sample.m -p:Script
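As with the earlier examples, AddLabelReferences.exe is hypothetical. This minimal text-based sketch makes three assumptions. First, mgx.exe writes each field as Label => "value" on its own line, as in the listings in this article. Second, every account name is a valid M identifier. Third, dates use the month/day/year form of the Date token. Under those assumptions, the sketch could rewrite the Account, Date, and Amount values as follows.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical core of AddLabelReferences.exe.
public static class LabelReferenceFixer
{
    // Account => "Checking"  becomes  Account => Accounts.Checking
    static readonly Regex AccountField = new Regex(@"\bAccount(\s*=>\s*)""([A-Za-z_]\w*)""");
    // Date => "2/1/2000"  becomes  Date => 2000-02-01 (an M date literal)
    static readonly Regex DateField = new Regex(@"\bDate(\s*=>\s*)""(\d{1,2}/\d{1,2}/\d{4})""");
    // Amount => "-100"  becomes  Amount => -100
    static readonly Regex AmountField = new Regex(@"\bAmount(\s*=>\s*)""(-?\d+)""");

    public static string Fix(string mText)
    {
        string result = AccountField.Replace(mText, "Account$1Accounts.$2");
        result = DateField.Replace(result, match =>
        {
            DateTime date = DateTime.ParseExact(match.Groups[2].Value, "M/d/yyyy", CultureInfo.InvariantCulture);
            return "Date" + match.Groups[1].Value + date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        });
        return AmountField.Replace(result, "Amount$1$2");
    }
}
```

For example, Account => "Savings", Date => "2/1/2000", and Amount => "100" become Account => Accounts.Savings, Date => 2000-02-01, and Amount => 100, matching the revised M above. This sketch formats dates itself, so unlike the graph-based program, it does not need the FixedGraphTextWriter workaround.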
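Create a custom tool that replaces mgx.exe and adds the label references before writing Sample.m. The revised set of commands would be:
1. m.exe Transactions.mg
2. MyMgxWithLabelReferences.exe Sample.transactions Transactions.mx Samples
3. m.exe TransactionsSchema.m Sample.m -p:Script
Below is a skeletal example of how this program could be implemented (for brevity and clarity, no checks are made for errors such as null reference exceptions).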
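using System;
using System.Collections.Generic;
using System.Dataflow;
using System.Globalization;
using System.IO;
using System.Reflection;
using System.Text;

namespace MyMgxWithLabelReferences
{
    class Program
    {
        static int Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: MyMgxWithLabelReferences.exe FileToParse CompiledGrammarMx ModuleName");
                return 1;
            }
            string fileToParse = args[0];
            string compiledGrammarMx = args[1];
            string moduleName = args[2];
            DynamicParser parser = DynamicParser.LoadFromFile(compiledGrammarMx, null);
            parser.GraphBuilder = new NodeGraphBuilder();
            Node root = (Node)parser.Parse(fileToParse, null);
            GraphStore store = root.Store;
            foreach (Edge recordEdge in root.Edges)
            {
                Node record = recordEdge.Node;
                Edge? date = null;
                Edge? account = null;
                foreach (Edge field in record.Edges)
                {
                    string label = field.Label.Text;
                    if (label == "Date")
                    {
                        // Build a new edge whose value is a parsed Date instead of text.
                        date = new Edge(field.Label, new Node(store, Identifier.Empty, Date.Parse(
                            (string)field.Node.AtomicValue)));
                    }
                    else if (label == "Amount")
                    {
                        store.SetConstantValue(field.Node, decimal.Parse((string)field.Node.AtomicValue));
                    }
                    else if (label == "Account")
                    {
                        // Build a new edge whose value is a reference such as Accounts.Checking.
                        string name = (string)field.Node.AtomicValue;
                        account = new Edge(field.Label, new Node(store, new GraphReference(
                            new string[] { "Accounts", name })));
                    }
                }
                // Set the record's Date and Account fields to the new values.
                if (date.HasValue)
                    store.SetEdge(record, date.Value.Label, date.Value.Node);
                if (account.HasValue)
                    store.SetEdge(record, account.Value.Label, account.Value.Node);
            }
            // Save the graph to an m file as in the previous examples, but serialize it through
            // FixedGraphTextWriter (below) so that Date values are written as yyyy-MM-dd.
            // (The serialization calls are not included in this listing.)
            return 0;
        }
    }

    class FixedGraphTextWriter : GraphTextWriter
    {
        TextWriter writer;

        public FixedGraphTextWriter(TextWriter writer)
            : base(writer)
        {
            this.writer = writer;
        }

        public override void Write(GraphStreamTokenKind tokenKind, Label label, Identifier brand, object value)
        {
            if (tokenKind == GraphStreamTokenKind.Constant && value is Date)
            {
                // Work around an issue with the serialization of dates: format the value as
                // yyyy-MM-dd and pass it to the base class's non-public Write overload.
                string correctlyFormattedValue = ((Date)value).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
                typeof(GraphTextWriter).GetMethod("Write", BindingFlags.NonPublic | BindingFlags.Instance, null,
                    new Type[] { typeof(GraphStreamTokenKind), typeof(Label), typeof(Identifier), typeof(string),
                        typeof(char) }, null)
                    .Invoke(this, new object[] { tokenKind, label, brand, (object)correctlyFormattedValue, '\0' });
            }
            else
            {
                base.Write(tokenKind, label, brand, value);
            }
        }
    }
}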
In this example, the custom steps are 1) finding any fields named Date or Amount and replacing the contents of each field with a parsed version of the original string, 2) changing the account name from text to a record reference, and 3) writing Date values in yyyy-MM-dd form with FixedGraphTextWriter. For more information on working with the M graph object model, see the paper by Clemens Szyperski (http://msdn.microsoft.com/en-us/library/dd878360.aspx).