
EMFText: Concrete Syntax Specification Language (CS)
An EMFText syntax specification must be contained in a file with the extension '.cs' and consists of three main blocks:
1. A configuration block, which contains the name, the base model and the root meta class (start symbol). Optionally, other syntaxes and metamodels can be imported and generation options can be specified.
2. An optional TOKENS section, in which tokens for the lexical analyser can be specified.
3. A RULES section, which defines the syntax for each concrete meta class.
The following sections describe the three main blocks in more detail.
Specify the file extension of the files that will contain your syntax (required):
SYNTAXDEF fileExtension
Specify the EMF Genmodel that contains the meta classes for which the syntax will be used. The genmodel is found using its URI.
FOR <genModelURI> <locationOfYourGenmodel>
Giving the location of the genmodel is optional. If it is not given, EMFText searches for the genmodel in two places: first in the EMF registry, and then in the folder that contains your .cs file, where it looks for a file with the same name but the extension '.genmodel'.
The root element (start symbol) must be an element in the metamodel:
START nameOfAMetaClass
A CS specification can also have multiple root elements, which must be separated by commas.
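As a rough illustration, the mandatory declarations described so far could be combined as follows. This is a sketch only: the extension library, the URI http://www.example.org/library, the file name library.genmodel and the meta classes Library and Book are hypothetical and do not come from a real metamodel.

```text
SYNTAXDEF library
FOR <http://www.example.org/library> <library.genmodel>
START Library, Book
```

If the second pair of angle brackets were left out, the genmodel would be looked up as described above: first in the EMF registry, then next to the .cs file under the same name with the extension '.genmodel'.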
Models and syntaxes can be imported in a dedicated import section:
IMPORTS {
// imports go here
}
The list of imports must contain at least one entry. If no imports are needed, the whole section can be left out. An import entry consists of a prefix that can be used to refer to imported elements in productions, the model URI and, optionally, the name of a concrete syntax defined for that model. If a syntax is imported, all its rules are reused and need not be specified in the current CS specification.
prefix : <genModelURI> <locationOfTheGenmodel> WITH SYNTAX syntaxURI <locationOfTheSyntax>;
The two locations are optional.
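A filled-in import section might look like the sketch below. The prefixes expr and types, the URIs, the file names and the syntax names are hypothetical placeholders; the layout of each entry simply follows the form shown above.

```text
IMPORTS {
    // full form: genmodel location and syntax location both given
    expr : <http://www.example.org/expressions> <expressions.genmodel> WITH SYNTAX expressions <expressions.cs>;
    // same kind of entry with both optional locations left out
    types : <http://www.example.org/types> WITH SYNTAX types;
}
```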
Like imports, options are specified in a dedicated optional section:
OPTIONS {
// options go here in the following form:
optionName = "optionValue";
}
The list of valid options can be found in the Javadoc of the enumeration OptionTypes.
EMFText lets you specify an ordered set of token definitions. At runtime, input characters are converted to tokens, which form the smallest unit processed by the generated parser. Note: By default, EMFText implicitly uses a set of standard tokens, namely TEXT : ('A'..'Z'|'a'..'z'|'0'..'9'|'_'|'-')+, LINEBREAK : ('\r\n'|'\r'|'\n'), WHITESPACE : (' '|'\t'|'\f'). Default tokens can be switched off using the usePredefinedTokens option.
The TOKENS section has the following form:
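For instance, switching off the predefined tokens could be written as in the sketch below. Only the option name is taken from the text above; the value "false" is an assumption about how the option is set and should be checked against the Javadoc of OptionTypes.

```text
OPTIONS {
    // assumed value; disables the implicit TEXT, LINEBREAK and WHITESPACE tokens
    usePredefinedTokens = "false";
}
```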
TOKENS {
// token definitions go here in the form:
DEFINE MY_TOKEN $someRegularExpression$;
}
Every token name has to start with a capital letter. A regular expression must conform to the ANTLRv3 syntax for regular expressions without semantic annotations.
Token definitions might be ambiguous, i.e. the languages defined by two different tokens may not be disjoint. In such cases, EMFText always prefers the token defined first in the specification. By default, the predefined tokens (TEXT, WHITESPACE and LINEBREAK) have lower precedence than any defined token. However, they can be given a higher priority by redeclaring them before other tokens using the following directive:
PREDEFINED nameOfPredefinedToken ;
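The precedence rules can be sketched with two overlapping definitions. The token names INTEGER, DECIMAL and LOWERCASE_WORD are hypothetical, and placing the PREDEFINED directive inside the TOKENS block is an assumption based on the description above.

```text
TOKENS {
    // the input 42 is matched by both definitions; INTEGER is defined first and wins
    DEFINE INTEGER $('0'..'9')+$;
    DEFINE DECIMAL $('0'..'9')+('.'('0'..'9')+)?$;
    // from here on, TEXT takes precedence over the tokens defined below it
    PREDEFINED TEXT;
    DEFINE LOWERCASE_WORD $('a'..'z')+$;
}
```

Under these assumptions, the input 42 becomes an INTEGER token, while abc becomes a TEXT token rather than a LOWERCASE_WORD token.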
To define the default syntax highlighting for a language, a special section named TOKENSTYLES can be used. For each token or keyword, the color and style (BOLD, ITALIC, STRIKETHROUGH, UNDERLINE) can be defined as follows:
TOKENSTYLES {
// show MY_TOKEN in black
"MY_TOKEN" COLOR #000000;
// show keyword public in red and bold font face
"public" COLOR #FF0000, BOLD;
}
For each concrete meta class you can define a syntax rule. The rule specifies what the text that represents instances of the class looks like. Rules have two sides: a left-hand side and a right-hand side. The left-hand side denotes the name of the meta class, while the right-hand side defines the syntax elements.
The most basic form of a rule is:
MyMetaClass ::= "someKeyword" ;
This rule states that whenever the text "someKeyword" is found, an instance of MyMetaClass must be created. Besides text elements that are expected "as is", parts of the syntax can be optional or repeating. For example, the syntax rule:
MyMetaClassWithOptionalSyntax ::= ("#")? "someKeyword" ;
states that instances of MyMetaClassWithOptionalSyntax can be represented by both "#someKeyword" and "someKeyword". Similar behavior can be defined using a star instead of a question mark. The syntax enclosed in the parentheses can then be repeated.
If meta classes have attributes, we can also specify syntax for the values of these attributes. To do so, simply add brackets after the name of the attribute:
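The repeating variant mentioned above could look like the following sketch; the meta class name MyMetaClassWithRepeatingSyntax is made up for illustration.

```text
// accepts "someKeyword", "#someKeyword", "##someKeyword", and so on
MyMetaClassWithRepeatingSyntax ::= ("#")* "someKeyword" ;
```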
MyMetaClassWithAttributes ::= myAttribute[] ;
Optionally one can specify the name of a token inside the brackets. For example:
MyMetaClassWithAttributes ::= myAttribute[MY_TOKEN] ;
If this name is omitted (as in the first example), EMFText uses the token TEXT, which matches alphanumeric characters, underscores and hyphens. The matched text is automatically converted to the type of the attribute. If this conversion is not successful, an error is raised when a file containing such text is opened.
Another way to specify the token definition used to match the text of the attribute value is to do it inline. For example,
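To make the conversion step concrete, here is a small sketch. It assumes a hypothetical meta class Person with a string attribute name and an integer attribute age; neither is part of EMFText itself.

```text
// both attributes use the default TEXT token
Person ::= "person" name[] "age" age[] ;
```

Under these assumptions, the input "person alice age 42" would set age to the integer 42. The input "person alice age old" still matches the TEXT token, but "old" cannot be converted to an integer, so an error would be reported for that file.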
MyMetaClassWithAttributes ::= myAttribute['(',')'] ;
can be used to express that the text for the attribute value must be enclosed in round brackets. Between the brackets arbitrary characters (except the closing bracket) are allowed. Other characters can be used as prefix and suffix here as well.
By default, the suffix character (in the example above, the closing bracket) cannot be part of the text for the attribute value. To allow this, an escape character needs to be supplied:
MyMetaClassWithAttributes ::= myAttribute['(',')','\'] ;
Here the backslash can be used inside the brackets to escape the closing bracket. It must then also be used to escape itself. For example,
(some text (some more\))
yields the value
some text (some more)
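A second case follows from the rule that the escape character must also escape itself. With the same definition myAttribute['(',')','\'], a literal backslash in the value has to be written twice. The sample text is made up for illustration.

```text
input:  (C:\\temp (old\))
value:  C:\temp (old)
```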
Meta classes can have references, and consequently there is a way to specify syntax for these. EMF distinguishes between "containment" and "non-containment" references. In an EMF model, elements referenced through a containment reference are contained in their parent elements. EMFText thus expects the syntax of the referenced elements (children) to be contained in the parent syntax as well. Elements referenced through a non-containment reference are only pointed to; they are contained in some other (parent) element. Analogous to containment references, EMFText expects the actual syntax of those elements to be contained in the syntax of that other parent. At the referencing site, the referenced elements are represented by a symbolic name only.
A basic example of a rule for a meta class that has a containment reference looks like this:
MyContainerMetaClass ::= "CONTAINER" myContainmentReference ;
It allows instances of MyContainerMetaClass to be represented by the keyword "CONTAINER" followed by one instance of the type that "myContainmentReference" points to. If multiple children need to be contained, the following rule can be used:
MyContainerMetaClass ::= "CONTAINER" myContainmentReference* ;
A basic example of a rule for a meta class that has a non-containment reference looks like this:
MyPointerMetaClass ::= "POINTER" myNonContainmentReference[] ;
The rule is very similar to the one for containment references, but uses additional brackets after the name of the reference. The brackets define the token that the symbolic name must match. In the case above, the default token (TEXT) is used. So the syntax for an example instance of class MyPointerMetaClass can be:
POINTER a
Since "a" is just a symbolic name that must be resolved to an actual model element, EMFText generates a Java class that implements the interface IReferenceResolver. This class can be customized to specify how symbolic names are resolved to model elements. The default implementation of the resolver looks for all model elements that have the correct type (the type of "myNonContainmentReference") and that have a name or id attribute matching the symbolic name.
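Putting the pieces together, a complete specification might look like the sketch below. Everything specific in it is hypothetical: the library metamodel, its URI, and the meta classes Library, Author and Book with the features name, title, author, authors and books. Wrapping the rules in a RULES { } block is an assumption that mirrors the form of the other sections.

```text
SYNTAXDEF library
FOR <http://www.example.org/library>
START Library

RULES {
    // containment references: authors and books are written inside the library
    Library ::= "library" name[] authors* books* ;
    Author  ::= "author" name[] ;
    // title uses an inline quoted token, author is a non-containment reference
    Book    ::= "book" title['"','"'] "by" author[] ;
}
```

A text in this hypothetical language could then read:

```text
library city
author tolkien
book "The Hobbit" by tolkien
```

With the default resolver described above, the symbolic name tolkien would be resolved to the Author element whose name attribute is tolkien.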