Welcome to Part II of our look at the “Oslo” May CTP.  In Part I, we created a simple domain model for a fictional company’s employee information. In this installment, we will create a domain specific language (DSL) that will take a simple text input and generate model values for our previously created domain model. 

To recap, in Part I we created a module called EmployeeInfo that defined an entity called Employee, which has the following definition:

type Employee{
  FirstName: Text#50;
  LastName: Text#50;
  Number: Text#10;
  id: Integer32 => AutoNumber();
} where identity(id);

EmployeeInfo also defined an extent called Employees, which contains zero or more Employee instances.

Now we will use the M language's grammar support to create a very simple DSL called EmployeeGrammar, which describes an Employee.  Below is an example of input text written in this DSL:

My name is Edward Brown, my employee number is 364321
My name is Joe Smith, my employee number is 342343

The above input text is placed in a text file called Employee.txt.

Now let's create the grammar used to parse the input text for our language. A grammar is a set of rules that determine whether a sequence of characters conforms to a language.  The parser reads the input text according to the rules of the grammar and builds a syntax tree, which is a hierarchy of the data that was parsed.  The syntax tree can be used in a number of ways in your application, including storing it (e.g. in the Repository or as XML) or iterating through the tree in memory at runtime.  For this example we will use the M tool chain to generate the syntax tree as a set of model values and import it into the Oslo Repository.

We can build our grammar in the Oslo Intellipad tool.  First we need to specify the interleave for the language, which is the rule that says which characters should be treated as whitespace and skipped between tokens.  For our language, whitespace will be defined as spaces, tabs, carriage returns, linefeeds and commas.  The interleave statement looks like this:

interleave Whitespace = ' ' | '\t' | '\n' | '\r' | ',';

Next we need to define tokens, which are the rules that describe the basic building blocks of the language.  The parser will try to match these tokens against the input text.  The first token we define is for the start text of our language, which is “My name is”.  The token statement looks like this:

token TkStart = "My name is";

We then define a token for a name.  A name is made up of one or more letters, each of which can be lowercase or uppercase.  The definition for the name token looks like this:

token TkName = ("A".."Z" | "a".."z")+;

In the above token definition, the “+” indicates that the token is made up of one or more characters.  That takes care of the tokens for the name part of the DSL, but we still need to specify the tokens for the employee number.  The tokens for the employee number section look like this:

token TkNumStart = "my employee number is";  
token TkEmpNum = ("0".."9")+;
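
To make the role of the tokens and the interleave more concrete, here is a rough Python analogue of the four tokens above. This is only a sketch and not MGrammar: the names TOKEN_SPEC and tokenize are made up for illustration, and the ordered regular-expression alternatives are an assumption that merely approximates how the real lexer picks a token.

```python
import re

# Python stand-ins for the MGrammar tokens and the interleave rule.
# Order matters here: the two fixed phrases are tried before TkName,
# otherwise "My" and "my" would be consumed as names.
TOKEN_SPEC = [
    ("TkStart", r"My name is"),
    ("TkNumStart", r"my employee number is"),
    ("TkName", r"[A-Za-z]+"),
    ("TkEmpNum", r"[0-9]+"),
    ("Whitespace", r"[ \t\n\r,]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token name, matched text) pairs, skipping interleave."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "Whitespace":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize on the first line of Employee.txt yields five tokens: the start phrase, two names, the number phrase and the employee number. The spaces and the comma are dropped, just as the interleave rule drops them.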

Now that we have the tokens, we need to define the syntax that says how the tokens should be combined.  A syntax statement tells the parser how the tokens can be used to parse the input text.  Every M grammar must have a Main syntax statement, which is the starting point for the grammar rules.  Ours will look like this:

syntax Main =  EmployeeData*;

The above indicates that parsing starts with the EmployeeData rule, which means there needs to be a syntax statement for EmployeeData.  The “*” indicates that the input text may contain zero or more occurrences of EmployeeData, so every line of our input text will be processed.

Next we need to define the syntax for EmployeeData, which consists of the start token, a token for the first name, a token for the last name, a token for the start of the employee number and finally the token for the employee number itself.  The EmployeeData syntax statement looks like this:

syntax EmployeeData =  TkStart TkName TkName TkNumStart TkEmpNum;
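
As an illustration of what these two syntax rules ask of the parser, here is a small Python sketch that checks a list of already tokenized input against them. It is not how the M tools work internally: the names EMPLOYEE_DATA and parse_main are hypothetical, and the input is assumed to be a list of (token name, text) pairs.

```python
# The token sequence required by the EmployeeData rule.
EMPLOYEE_DATA = ["TkStart", "TkName", "TkName", "TkNumStart", "TkEmpNum"]


def parse_main(tokens):
    """Main = EmployeeData*: accept zero or more EmployeeData sequences."""
    records = []
    pos = 0
    while pos < len(tokens):
        chunk = tokens[pos:pos + len(EMPLOYEE_DATA)]
        kinds = [kind for kind, _ in chunk]
        if kinds != EMPLOYEE_DATA:
            raise SyntaxError(f"expected {EMPLOYEE_DATA} at token {pos}, got {kinds}")
        # With no projection, every matched token text is kept,
        # including the fixed phrases.
        records.append([text for _, text in chunk])
        pos += len(EMPLOYEE_DATA)
    return records
```

An empty token list is accepted and gives an empty result, which is the "zero or more" behaviour of the “*” operator. Each record keeps the text of all five tokens, fixed phrases included.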

The complete text for our language so far is shown in Intellipad below.  Again we use the module name EmployeeInfo.  Our language will be called EmployeeLanguage. 

image

Save the grammar in a file called EmployeeGrammar.mg.  Now we can test it.  To do this we put Intellipad in MGrammar mode and load the input text from the Employee.txt file we created above.  With the EmployeeGrammar.mg file loaded in Intellipad, press <CTRL> <SHIFT> T.  Intellipad should prompt you to select the input text file and then go into Tree Preview Mode.  The input text is shown on the left side, the grammar in the middle and the generated syntax tree on the right side.  At the bottom is a window that displays parse errors.  Below is a screenshot of Intellipad in Tree Preview Mode:

image

If you look at the generated syntax tree on the far right for our sample input text, you will notice that it is rather ambiguous.  It also contains extra text, such as the text of the static tokens. To fix this, we will use projections.  A projection specifies what the values generated from the input text should look like.  The projection operator “=>” is placed after a rule and is immediately followed by the pattern we want to use in place of the default tree structure.

Let's start by cleaning up the EmployeeData syntax.  You can reference the value of a token by prefixing the token with <name>:.  In this example we will bind the first TkName (the first name) to f, the second TkName (the last name) to l and the employee number to n.  We then use those references in the projection.  For instance, to indicate that the first name should be output in the syntax tree under the label FirstName with the value of f, we write FirstName => f. The full definition for the EmployeeData syntax should look like this:

syntax EmployeeData =  TkStart f:TkName l:TkName TkNumStart n:TkEmpNum =>{FirstName => f, LastName => l, Number =>n};

Now let’s add a projection to the Main syntax.  The values from the EmployeeData rule will be referenced in the projection by the identifier e.  We need to make sure the projection generates the name of our extent, Employees, from our previously defined model, and we will use the valuesof keyword to remove an extra set of brackets that the default projection generates.  The Main syntax should look like this:

syntax Main =  e:EmployeeData* => Employees{valuesof(e)};
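
For readers who think in terms of regular expressions, the effect of the two projections can be approximated in Python as follows. This is a loose sketch under stated assumptions, not a translation of the grammar: the project function is made up for illustration, the pattern requires at least one separator between tokens, and unlike a real parser it silently skips text that does not match.

```python
import re

# Stand-in for the interleave rule: spaces, tabs, newlines, returns, commas.
SEP = r"[ \t\n\r,]+"

# Stand-in for the EmployeeData rule; the named groups f, l and n play the
# role of the f:, l: and n: bindings in the grammar.
EMPLOYEE_DATA = re.compile(
    r"My name is" + SEP + r"(?P<f>[A-Za-z]+)" + SEP + r"(?P<l>[A-Za-z]+)"
    + SEP + r"my employee number is" + SEP + r"(?P<n>[0-9]+)"
)


def project(text):
    """Build labeled values for every EmployeeData match in text."""
    employees = [
        {"FirstName": m["f"], "LastName": m["l"], "Number": m["n"]}
        for m in EMPLOYEE_DATA.finditer(text)
    ]
    # The records sit directly under Employees with no extra nesting,
    # which is the effect valuesof(e) has in the Main projection.
    return {"Employees": employees}
```

Only the three bound values survive in each record. The fixed phrases are matched but never copied into the result, which is what the projection on EmployeeData achieves.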

The complete text of our language is shown below:

module EmployeeInfo{
language EmployeeLanguage{
   syntax Main =  e:EmployeeData* => Employees{valuesof(e)};
   syntax EmployeeData =  TkStart f:TkName l:TkName TkNumStart n:TkEmpNum =>{FirstName => f, LastName => l, Number =>n};
   token TkStart = "My name is";
   token TkName = ("A".."Z" | "a".."z")+;
   token TkNumStart = "my employee number is";
   token TkEmpNum = ("0".."9")+;
   interleave Whitespace = ' ' | '\t' | '\n' | '\r' | ',';
  }
}

The generated syntax tree using the added projections looks much cleaner and now maps to our schema definition of Employee.  Here is a screenshot of Intellipad with the new syntax tree displayed:

image

Now let’s use the M tool chain to compile our grammar and generate M values from our input text.  We compile the grammar using the m.exe command, passing it our language file EmployeeGrammar.mg.  The result will be an mx file called EmployeeGrammar.mx.  We then use the mgx.exe command to generate M values for our input text.  We pass the mgx command the name of our input text file, use the /r switch to reference the EmployeeGrammar.mx file we generated previously, and use the /m switch to tell it that the module generated with the values should be EmployeeInfo, consistent with the module we defined previously for our model schema.  The full command looks like this:

mgx Employee.txt /r:EmployeeGrammar.mx /m:EmployeeInfo

This generates the following M values in a file called Employee.m:

module EmployeeInfo {
    Employees {
         {
            FirstName => "Edward",
            LastName => "Brown",
            Number => "364321"
        },
         {
            FirstName => "Joe",
            LastName => "Smith",
            Number => "342343"
        }
    }
}

We can then compile the newly generated M values using the m.exe command, which will generate a file called Employee.mx.  To do so, we need to reference the Employee model schema that we compiled previously into the EmployeeModel.mx file.  The command looks like this:

m Employee.m /r:EmployeeModel.mx

And here is a screenshot of the commands executing:

image

That’s it for Part II.  As you can see, creating new domain specific languages is very easy using the Oslo M tools.  In Part III we will install our Employee domain model and values into the “Oslo” Repository and then customize the visual view of our model using “Quadrant”.