IgorShare Thoughts and Ideas

Oslo: Basic M grammar for a generic Language

Posted by Igor Moochnick on 03/23/2009

I’ve noticed that every time I need to create yet another language definition or rules grammar, I write the same (or a very similar) “prelude”. I suspect you do the same. To simplify the process, I’ve distilled this basis into a simple module. It contains definitions for whitespace, single- and multi-line comments, and a command list. You can reuse this module by importing it into your language definition.

Here is an example of a very simple language:

// This is a comment


/* This is a multi-
line comment */


The grammar is pretty trivial if you reuse the foundation grammar (note the import at the beginning):

module SomeLanguageModule
{
    import LanguageBaseModule { Common as C };

    language RulesDefinition
    {
        // Delimiters
        token commandDelimiter = ";";

        // A command is one or more letters, digits or underscores
        token Command = c:(C.AlphaNum | "_")+;

        // The Program
        syntax Main
            = C.CommandList(Command, commandDelimiter);

        // Ignore whitespace and comments
        interleave Whitespace = C.Whitespace | C.Comment | C.MultiLineComment;
    }
}

The middle section of the grammar (Command) is where you concentrate your development efforts. Make sure to replace the Command “token” with a Command “syntax” rule.
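To make the behaviour of this grammar more concrete, here is a rough approximation in Python. It is not M and has nothing to do with how the Oslo tooling works internally; it only mimics what the rules above accept. The regular expressions are my own hand-written equivalents of the Whitespace, Comment, MultiLineComment and Command tokens, and the function name parse_commands and the command names in the usage example are hypothetical.

```python
import re

# Hand-written approximations of the tokens (assumed equivalents, not generated from M).
WHITESPACE = r"[ \t\r\n]+"
COMMENT = r"//[^\r\n]+"                       # "//" followed by one or more non-newline characters
MULTILINE_COMMENT = r"/\*(?:[^*]|\*[^/])*\*/"  # "/*", then non-'*' or '*' plus non-'/', then "*/"
COMMAND = r"[A-Za-z0-9_]+"
DELIMITER = r";"

TOKEN = re.compile(
    f"(?P<skip>{WHITESPACE}|{COMMENT}|{MULTILINE_COMMENT})"
    f"|(?P<command>{COMMAND})"
    f"|(?P<delimiter>{DELIMITER})"
)


def parse_commands(text):
    """Return the list of commands, each of which must be terminated by ';'."""
    commands = []
    pending = None  # command seen but not yet terminated
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "skip":
            continue  # plays the role of the interleave rule
        if kind == "command":
            if pending is not None:
                raise SyntaxError("missing ';' between commands")
            pending = match.group()
        else:  # delimiter
            if pending is None:
                raise SyntaxError("';' without a preceding command")
            commands.append(pending)
            pending = None
    if pending is not None:
        raise SyntaxError("last command is missing its ';'")
    if not commands:
        raise SyntaxError("at least one command is required")
    return commands


# Hypothetical input: two commands mixed with both comment styles.
source = "// This is a comment\nfirst;\n/* This is a multi-\nline comment */\nsecond_2 ;"
print(parse_commands(source))
```

Comments and whitespace disappear from the result, which is what the interleave rule is for, and every command has to end with the delimiter, just as CommandList requires.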

The tree produced by this grammar for the sample shown at the top of this post looks like this:


Download the sample language and the grammar from my SkyDrive.

This is the LanguageBaseModule. It is as basic as it can get. You can find a more complex and complete version of a similar effort in the Oslo SDK (Samples\MGrammar\Languages).

module LanguageBaseModule
{
    export Common;

    language Common
    {
        token Letter = "a".."z" | "A".."Z";
        token Digit = "0".."9";
        token AlphaNum = Letter | Digit;

        token Number = Digit+;

        // New line
        token LF = "\u000A";
        token CR = "\u000D";

        token NewLine
            = LF
            | CR
            | CR LF;

        // Whitespace
        token Whitespace = WhitespaceCharacter+;

        token Space = "\u0020";
        token Tab = '\u0009';
        token WhitespaceCharacter
            = Tab   // Horizontal Tab
            | Space // Space
            | NewLine;

        // Comments
        // Content of a multi-line comment: any character except '*',
        // or a '*' that is not followed by '/'
        token CommentDelimitedContent
            = ^('*')
            | '*'  ^('/');
        token Comment = "//" ^("\r" | "\n")+;
        token MultiLineComment = "/*" CommentDelimitedContent* "*/";

        // Commands list
        syntax CommandList(cmd, commandDelimiter)
            = c:cmd commandDelimiter => [c]
            | c:cmd commandDelimiter l:CommandList(cmd, commandDelimiter) => [c, valuesof(l)];
    }
}
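The CommandList rule is recursive, and the valuesof(l) in its projection is what keeps the result flat. As I understand it, valuesof splices the items of the nested list into the parent instead of nesting the list itself. The Python sketch below imitates that on a pre-split token list. It is only an analogy, not M, and both function names are hypothetical.

```python
def command_list(tokens):
    """Imitates: cmd ';' => [c]  |  cmd ';' CommandList => [c, valuesof(l)]."""
    if len(tokens) < 2 or tokens[1] != ";":
        raise SyntaxError("expected a command followed by ';'")
    c, rest = tokens[0], tokens[2:]
    if not rest:
        return [c]            # first alternative: a single command
    tail = command_list(rest)
    return [c, *tail]         # splice the tail in, like valuesof(l)


def command_list_nested(tokens):
    """Same recursion, but keeping the tail as a nested list (like => [c, l])."""
    if len(tokens) < 2 or tokens[1] != ";":
        raise SyntaxError("expected a command followed by ';'")
    c, rest = tokens[0], tokens[2:]
    if not rest:
        return [c]
    return [c, command_list_nested(rest)]


tokens = ["a", ";", "b", ";", "c", ";"]
print(command_list(tokens))         # flat list of commands
print(command_list_nested(tokens))  # one extra nesting level per command
```

The flat form is usually what you want to consume downstream, because the depth of the tree no longer depends on the number of commands.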

