Text2Html
---------

The TextTransformer project "Text2Html" automatically transforms plain text files into HTML files. No special formatting statements are needed, so you don't recognize that the text is the source for an HTML document. Only the whitespace in the text must be placed carefully: text sections have to be separated by blank lines, tables must be indented correctly, and (new) hidden instructions can be encoded by spaces.

The project is an example application for the TextTransformer and doesn't claim to treat all possible texts correctly. Incomplete, indented, or nested structures may not be analyzed correctly. However, the project can be used for common texts: all pages of this web site were created with it.

Anyone can carry out individual customizations and extensions with the TextTransformer program. To make this easier, the project is explained in detail below.

- First comes an easily understandable description of how the text documents are analyzed.
- A more specialized part, in which the technical details are explained, follows.

The project can be downloaded here: Text2Html.ttp

The text source for this HTML page is: Text2Html.txt


Plain text
----------

Plain text is presumably still the most widespread way to store texts, mostly in the ASCII or ANSI character set. Shown on the screen, such texts look as if they had been typed on a typewriter. This is because these files contain nothing more than the binary codes of the characters that one can type on a typewriter. Plain text files contain no instructions for the use of different fonts and sizes or for the drawing of tables, etc.

By now, almost every computer user has complex word processing software, and each such program stores the text data in its own format. However, the great advantages of plain text files are precisely their simplicity and their independence from the software used.
Texts are also much easier, and thus faster, to write if one doesn't have to pay attention to formatting.


HTML text
---------

HTML is the file format in which web pages are stored. HTML pages look nicer than plain texts, since headings are shown in bold characters, lists are indented, and tables are framed. HTML files are text files too. Besides the pure text data, however, they contain instructions for formatting the text.


Transformation of plain text files into HTML text
-------------------------------------------------

Transforming a text into an HTML document is relatively simple if the text was prepared with word processing software: the formatting instructions of the original text only have to be translated adequately. When transforming plain text files into HTML, however, one faces the problem that the original text contains no explicit formatting information. In principle, there are three solutions to this problem:

1. One adds special syntax elements to the original text which instruct a compiler how the HTML files should look. This procedure makes sense insofar as the complex possibilities that HTML offers can be reduced to a simplified syntax which suffices for the purpose at hand. Wikipedia is an example of this. However, one then has to learn this new syntax, one is bound to it, and it disfigures the original document. This isn't the aim of this project. (The hidden instructions are an exception; see below.)

2. One prepares an HTML document as a simple copy of the original text, i.e. without any formatting. After all, this is already a first approach. Text files can often be shown in the browser directly, without any manipulation.
Anyone who wants to do better and has seen the source code of an HTML page before might think that the original text merely has to be enclosed in the tag pairs <html><body> and </body></html>:

    <html><body>original text</body></html>

If you then save the new text with the extension ".html" instead of ".txt" and load it into a browser, you will notice, however, that all line breaks have been lost and that only the first of a run of blanks is shown. In addition, it is not guaranteed that special characters are represented correctly, and the characters by which HTML tags are delimited garble the display completely. Nevertheless, this possibility forms the basic scaffolding for the HTML converter under item 3.

3. One develops a scheme to derive the formatting of the new text from the construction and contents of the original text. This is the method used in the Text2Html project. Only when no formatting can be derived shall the text be represented as close to the original as possible, as outlined under item 2.

How can the formatting be derived from the text, however? The answer is already indicated under item 3: from construction and contents.


Text construction
-----------------

You can best recognize the construction of a text if you look at it from a distance at which you can no longer read it. The structure results precisely from what is not text: from its gaps. Gaps result from line breaks, blank lines, blank characters, and tabs.

At first, the Text2Html project takes the same perspective. A text shall be changed into an HTML document in such a way that the described structure not only remains unchanged but is strengthened. A chapter heading should be displayed somewhat larger than the other text and in boldface, and the regular pattern of a list or a table should be accentuated by additional markers or lines.


Text content
------------

A second criterion for the use of certain HTML elements arises from a closer analysis of the text content. For example,
the underlining of links on an HTML page and the possibility of reaching another page by selecting a link are implemented in HTML by putting: "