With the TextTransformer project Yacc2TT.ttp Yacc grammars can be taken to a form which is suitable for LL parser generators. A tti file is produced which can be imported directly into the TextTransformer.
There are a lot of grammars written for the parser generator "Yacc" and it would be nice if one could use them for the TextTransformer. However, a conversion of these grammars isn't trivial because TextTransformer uses the LL parsing algorithm and Yacc uses the LR parsing algorithm. These algorithms require a different logical construction of the grammars.
While both algorithms are reading the input from left to right, what the first 'L' in "LL" or "LR" stands for, the match with a grammar rule is tested in a different direction. The matching rule is already found in the LL parser after the recognition of the next token while the LR algorithm mostly needs several tokens, which are then evaluated backwards.
For a given token the LL parser only can find the matching rule, if the rule system (the grammar) is formulated in a manner that there are no alternatives. All alternatives must start with different, nonoverlapping token sets.
The steps which are required in accordance with this principle at conversion of the Yacc grammars, shall now be demonstrated. It is explained at the same time how these transformations are carried out in the TextTransformer.
The following Yacc rule is used as example:
direct_declarator : IDENTIFIER | '(' declarator ')' | direct_declarator '[' constant_expression ']' | direct_declarator '[' ']' | direct_declarator '(' parameter_type_list ')' | direct_declarator '(' identifier_list ')' | direct_declarator '(' ')' ;
This definition is from a grammar C of Jeff Lee for the programming language C.
The grammar of Yacc2TT is derived from "Appendix B: Yacc input syntax" of the Yacc manual. Besides the grammar there is the element in Yacc2TT:
This is a map in which a rule tree is stored to every rule name. The construction of the trees is carried out in the rule production and in the productions called by it. At first the tree for the above example looks like:
T : IDENTIFIER T : '(' NT : declarator T ')' NT : direct_declarator T : '[' NT : constant_expression T : ']' NT : direct_declarator T : '[' T : ']' NT : direct_declarator T : '(' NT : parameter_type_list T : ')' NT : direct_declarator T : '(' NT : identifier_list T : ')' NT : direct_declarator T : '(' T : ')'
"T" and "NT" are the labels for terminals and non-terminals. The tree was produced so that alternatives are expressed by sibbling nodes while child nodes represent the following-relation. These trees are tranformed step by step. Finally the LL-grammar can be produced from them.
The first step to the avoidance of alternatives with the same terminal beginners is to left factor common beginnings. This results in following tree:
T : IDENTIFIER T : '(' NT : declarator T ')' NT : direct_declarator T : '[' NT : constant_expression T : ']' T : ']' T : '(' NT : parameter_type_list T : ')' NT : identifier_list T : ')' T : ')'
For the preparations for the important next step, the elimination of left recursions, alternatives which follow on a common predecessor are summarized to groups at first. Marking nodes are inserted in the tree with the labels "ALT_BEGIN" and "ALT_END". Later this bracketing also makes it easy to use the operators of the EBNF syntax of the TextTransformer, e.g. the repeat operator.
T : IDENTIFIER T : '(' NT : declarator T ')' NT : direct_declarator ALT_BEGIN T : '[' ALT_BEGIN NT : constant_expression T : ']' T : ']' ALT_END T : '(' ALT_BEGIN NT : parameter_type_list T : ')' NT : identifier_list T : ')' T : ')' ALT_END ALT_END
Left recursive rules like:
a = a C | B
are forbidden for LL-parsers. They would end up in infinite loops. However, the rule is logically equivalent with:
a = B C*
where '*' is the repeat operator. In Yacc2TT this remodelling is executed automatically and gives:
ALT_BEGIN T : IDENTIFIER T : '(' NT : declarator T ')' ALT_END OREP_BEGIN T : '[' ALT_BEGIN NT : constant_expression T : ']' T : ']' ALT_END T : '(' ALT_BEGIN NT : parameter_type_list T : ')' NT : identifier_list T : ')' T : ')' ALT_END OREP_END
At rules which contain empty alternatives Yacc2TT still carries out another remodelling. E.g.
a | b |
( a | b )?
Finally an output file is produced by the Yacc2TT project with the remodeled Yacc rules. It should be stored with the extension "tti" and then can be imported directly into the TextTransformer. The "direct_declarator" rule appears now as:
direct_declarator() (> ( IDENTIFIER "(" declarator ")" ) ( "[" ( constant_expression "]" | "]" ) | "(" ( parameter_type_list ")" | identifier_list ")" | ")" ) ) <)
There still are possibilities of improving the rules. Particularly one problem isn't treated by Yacc2TT: although the individual rules have a form suitable for LL parser generators now, there still may be conflicts between the rules. The grammar tests of the TextTransformer show where there are such conflicts. They can relatively easily be removed by hand within the IDE.
With the C grammar is demonstrated how it works.
|to the top|