|Delphi pretty printer|
The Delphi.ttp project, a parser for the Delphi programming language, goes back to the beginnings of the development of TextTransformer. The real aim was to create a translator to C++. Such a translator could make it easier for C++ developers to adapt Delphi components to their programs. In the end, TextTransformer itself would also benefit from such a translator, since many such components are used in it.
The greatest difficulty in developing a Delphi parser is that there is no official standard for this programming language. The specification of the language was primarily taken from the help of the Delphi development environment. The information there is scattered and fragmentary, and remains altogether incomplete and inconsistent. Moreover, Delphi has continually been developed further.
In the meantime, there is an excellent attempt to represent the Delphi specification: the "reference guide" to "Free Pascal".
There are still discrepancies in the details. Nevertheless, the parser project was revised again against this reference; in particular, the names were adapted as far as possible. However, the two grammars are not identical, not least because the rules in the TextTransformer parser were made LL(1) wherever possible, for efficiency reasons.
The parser presented here approximately represents the state of Delphi 5. It is able to parse the complete VCL that is part of CBuilder 6. It presupposes that the code to be parsed is syntactically correct. It cannot be used to check correctness, because it also accepts constructions which the Delphi compiler would reject.
Dr. Hans-Peter Diettrich supported me in the development of the parser for a while; my thanks to him!
The project options for the parser/scanner differ from the standard settings in three points:
Using the preprocessor makes it possible to test the parser directly on the VCL.
The second point simply reflects the language definition of Delphi.
The last point is necessary because in Delphi keywords like "index" or "end" may also be used as names for variables and parameters. The disadvantage of this option is that unexpected keywords can be recognized as identifiers, which makes it more difficult to find errors in the parser.
There are three styles for comments in Delphi:
In Delphi.ttp, these three forms are combined into one "COMMENT" production, which is set in the project options as an inclusion. Nesting of comments was not allowed, because there are cases in the VCL in which the closing brackets that nesting would require are missing. However, the occurrence of one kind of comment within another one is permitted. (The latter isn't the case for the alternatively usable regular expression "IGNORE", which was left in the project but is not used.)
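The comment rules just described can be illustrated outside of TextTransformer with a small sketch. The Python function below is not the "COMMENT" production itself; `skip_comment` is a hypothetical helper, and the three delimiter pairs (`{ }`, `(* *)` and `//` to the end of the line) are assumed to be the three Delphi comment styles meant above. Compiler directives such as `{$...}` are deliberately ignored here.

```python
def skip_comment(text: str, pos: int) -> int:
    """Return the index just past the comment starting at pos.

    Returns pos unchanged if no comment starts there. Comments do not
    nest: the first matching closing delimiter ends the comment, so an
    opening bracket of the same kind inside it needs no closing partner.
    A different kind of comment inside it is simply skipped as text.
    """
    if text.startswith("//", pos):
        end = text.find("\n", pos)
        # A line comment ends before the line break (or at end of input).
        return len(text) if end == -1 else end
    if text.startswith("(*", pos):
        end = text.find("*)", pos + 2)
        return len(text) if end == -1 else end + 2
    if text.startswith("{", pos):
        end = text.find("}", pos + 1)
        return len(text) if end == -1 else end + 1
    return pos


if __name__ == "__main__":
    source = "{ outer (* inner *) still outer } x := 1;"
    print(source[skip_comment(source, 0):])
```

Because the search stops at the first closing delimiter, a text such as `{ a { b }` is consumed as one complete comment, which is the behaviour needed for the VCL cases mentioned above.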
When the project is compiled, there are some warnings that certain tokens are both start and successor of deletable structures. They result from the treatment of the semicolon. In Delphi, semicolons separate both whole structures and sub-structures, but they are not always a necessary part of these structures. In this TextTransformer project, the difficult decision of when a semicolon is necessary and when it is not is fairly often avoided by treating its occurrence as optional, either as an isolated optional token or as an alternative in repetitions. Since the parser presupposes that the code is correct, necessary semicolons are always recognized by the optional expressions. The token after the semicolon must then be able to decide the further course of parsing. However, these nullable optional semicolons cause the warnings mentioned.
An example of a correct but complicated treatment of the semicolon is the production "stmt_list". This production specifies a list of instructions. In accordance with the method above, it could simply be formulated in the following way:
All tokens with which an instruction "stmt" can start and end then cause warnings.
To avoid these warnings, the look-ahead "is_end_of_stmt_list" can be used to test whether the semicolon is followed by a token that ends a list of instructions. A semicolon is not needed before one of the keywords "end", "except", "finalization", "finally", and "until".
|stmt_list ::=||";" stmt_list_not_empty? | stmt_list_not_empty|
|stmt_list_not_empty ::=||stmt ( IF(!is_end_of_stmt_list()) ";" stmt ELSE ";" END )*|
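How the look-ahead steers the two productions can be traced with a hand-written sketch. The Python code below is only an illustration under simplifying assumptions, not code generated by TextTransformer: the `Parser` class and `parse_stmt_list` are hypothetical names, the input is an already tokenized list of strings, and every statement is assumed to consist of exactly one token.

```python
# Keywords that end a list of instructions (taken from the text above).
END_OF_STMT_LIST = {"end", "except", "finalization", "finally", "until"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        """Return the lower-cased token at pos + offset, or None at end of input."""
        i = self.pos + offset
        return self.tokens[i].lower() if i < len(self.tokens) else None

    def is_end_of_stmt_list(self):
        """Look-ahead: does a list-ending token follow the current semicolon?"""
        following = self.peek(1)
        return following is None or following in END_OF_STMT_LIST

    def stmt(self):
        # Placeholder for the real "stmt" production: one token, one statement.
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def stmt_list_not_empty(self):
        # stmt ( IF(!is_end_of_stmt_list()) ";" stmt ELSE ";" END )*
        stmts = [self.stmt()]
        while self.peek() == ";":
            if not self.is_end_of_stmt_list():
                self.pos += 1              # separating ";"
                stmts.append(self.stmt())
            else:
                self.pos += 1              # trailing ";" before "end" etc.
        return stmts

    def stmt_list(self):
        # ";" stmt_list_not_empty? | stmt_list_not_empty
        if self.peek() == ";":
            self.pos += 1
            following = self.peek()
            if following is None or following in END_OF_STMT_LIST:
                return []
        return self.stmt_list_not_empty()


def parse_stmt_list(tokens):
    """Return the parsed statements and the token at which parsing stopped."""
    parser = Parser(tokens)
    return parser.stmt_list(), parser.peek()


if __name__ == "__main__":
    print(parse_stmt_list(["a", ";", "b", ";", "end"]))
```

In this sketch the decision between the two branches of the repetition is made by the token after the semicolon, so the semicolon itself never has to be treated as nullable.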
There are some attempts to represent the Delphi grammar which have not progressed far. But the following seems worth mentioning:
However, this grammar is written for a parser generator, "GenCot", apparently the authors' own, and in some cases it requires additional handwritten code, which has not been published.
There are more grammars at:
As mentioned above, the parser can parse the complete VCL belonging to CBuilder 6. In addition, it successfully passes a test suite which was derived from the Free Pascal tests.
The following changes were made to the original test suite:
Last update: 12/31/09
1.1.6 : Local options of the comment productions corrected, so that no characters are ignored any more.
1.1.0 : Successfully passes a test suite.
1.0.2 : A fault in the regular expression of CTRL_CHAR had hidden a conflict. The grammar for types was revised. Now the conflict is avoided and, in addition, a look-ahead has been removed.