Parse

What is Parse?

In computer science, to parse means to analyze a string or other text according to the rules of a formal grammar. Parsing is done by a parser, a computer program that translates input into a format suitable for further processing. It first breaks the input down into its components and then outputs them in a structured form. When translating programming languages, the parser is an important part of the compiler, a program that turns source text into a form that machines can execute. A compiler goes beyond a standalone parser: in addition to checking syntax, it can detect further classes of errors, such as type errors, and it generates the target code. Within the compiler, the parser is the component that checks the syntax of the source code.

The parser analyzes a given text or source code with the help of a lexer (also called a lexical scanner or tokenizer). The lexer reads the entire input and breaks it down into tokens. Tokens are the smallest units the parser works with: character strings or input symbols to which the formal grammar assigns a type. For example, the string 123 is recognized as a token of type number.
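The tokenization step can be illustrated with a minimal sketch. The token types (NUMBER, IDENT, OP) and the function name tokenize are assumptions made for this example; a real language defines its own token set.

```python
import re

# Hypothetical token types for this illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (type, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # No token type matches here, so the input is lexically invalid.
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 123 + 4"))
```

Here the string 123 comes out as the pair ("NUMBER", "123"), which is the typed token the parser then works with.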

Next comes the core task of the parser: it checks the syntax of the input and builds a structure from the tokens, which can be represented as a parse tree. This structure is the basis for further processing of the data.
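To give an impression of such a structure, the following sketch uses the ast module from the Python standard library to parse a small arithmetic expression. Note that ast produces an abstract syntax tree, a condensed form of a parse tree, so this is only an approximation of the structure described above. The helper describe is a name invented for this example.

```python
import ast

# Parse an expression; mode="eval" yields a single expression node.
tree = ast.parse("1 + 2 * 3", mode="eval")
root = tree.body  # the outer addition

def describe(node, depth=0):
    """Return an indented outline of the tree, one node per line."""
    label = type(node).__name__
    if isinstance(node, ast.Constant):
        label += f"({node.value!r})"
    lines = ["  " * depth + label]
    for child in ast.iter_child_nodes(node):
        lines.extend(describe(child, depth + 1))
    return lines

outline = describe(root)
print("\n".join(outline))
```

The multiplication appears as a child of the addition, which reflects that 2 * 3 is grouped before 1 is added.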

Different types of parsers

There are two main types of parsers: top-down parsers and bottom-up parsers. The main difference between them is the direction in which they build the syntax tree, so they start and end at opposite ends of it.

Top-down parsers: Top-down parsers (e.g. LL parsers and recursive descent parsers) work by deriving the input from the start symbol down to the individual tokens. The analysis runs from the entire source text to the functions and expressions it contains, and finally to the tokens contained therein.
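A recursive descent parser is one common way to implement the top-down approach. The following is a minimal sketch for a hypothetical arithmetic grammar (expr, term, factor); the grammar, the function names, and the nested-tuple tree format are assumptions made for this example.

```python
def parse_expression(tokens):
    """Parse a list of token strings top-down into a nested-tuple tree.

    Assumed grammar:
        expr   -> term (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUMBER | "(" expr ")"
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        # Start at the top-level rule and descend toward the tokens.
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expression(["1", "+", "2", "*", "3"]))
```

Each grammar rule becomes one function, and the call chain expr, term, factor mirrors the path from the start symbol down to a single token.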

Bottom-up parsers: Bottom-up parsers (e.g. the various LR parsers) begin with the tokens, i.e. the leaves of the tree. By reducing groups of tokens, the parser works its way up to larger constructs such as expressions and functions until it reaches the start symbol. Reaching the start symbol signals that the input has been completely analyzed.
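The reduction process can be sketched with a naive shift-reduce loop. This is a simplification: real LR parsers decide when to shift or reduce using generated tables and lookahead, whereas this sketch simply tries the rules in a fixed order. The toy grammar and all names are assumptions made for this example.

```python
# Hypothetical toy grammar: E -> E "+" T | T ;  T -> NUMBER
# Rules are tried in order, so the longest right-hand side comes first.
RULES = [
    ("E", ("E", "+", "T")),
    ("T", ("NUMBER",)),
    ("E", ("T",)),
]

def shift_reduce(tokens):
    """Parse (type, value) tokens bottom-up; return the tree rooted at E."""
    stack = []  # entries are (symbol, subtree)
    for token_type, value in tokens:
        stack.append((token_type, value))  # shift the next token (a leaf)
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                if tuple(symbol for symbol, _ in stack[-n:]) == rhs:
                    # Reduce: replace the matched symbols by the rule's left side.
                    children = [subtree for _, subtree in stack[-n:]]
                    del stack[-n:]
                    stack.append((lhs, (lhs, children)))
                    reduced = True
                    break
    # Success means everything was reduced to the start symbol E.
    if len(stack) != 1 or stack[0][0] != "E":
        raise SyntaxError("input could not be reduced to the start symbol")
    return stack[0][1]

print(shift_reduce([("NUMBER", 1), ("+", "+"), ("NUMBER", 2)]))
```

The parse succeeds only when the stack holds nothing but the start symbol E at the end of the input.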

Parser generator

With a parser generator, it is possible to automatically create an efficient parser from a given formal grammar. There are also scanner generators, which generate a lexical scanner from a formal description of the tokens. These tools are used in compiler construction. Full-fledged compiler generators, by contrast, are still considered experimental.

Parsers require correctly structured input

A parser usually relies on its input complying with a certain syntax. For example, instructions must follow a standardized format in order to be correctly recognized by a parser.

XML, for example, is a widely used markup language for structuring information hierarchically. The format can be read directly by humans, and by machines using an XML parser. However, an XML parser only accepts a document that is well-formed, i.e. free of structural errors.

If an unexpected character comes first, the syntactic analysis of the entire document can fail. Most parsers report where they encounter incorrect syntax. This not only helps with troubleshooting, but also helps avoid many errors during development.
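This behavior can be observed with the XML parser from the Python standard library, xml.etree.ElementTree. The helper try_parse and the sample documents are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

def try_parse(document):
    """Return (root_tag, None) on success or (None, error_message) on failure."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError as error:
        # The parser reports what went wrong and where it happened.
        return None, str(error)
    return root.tag, None

samples = [
    "<note><to>Ann</to></note>",  # well-formed
    "x<note><to>Ann</to></note>",  # unexpected character at the start
    "<note><to>Ann</note>",  # closing tag does not match
]
for sample in samples:
    print(try_parse(sample))
```

Only the first document is accepted. For the other two, the parser raises ParseError, and its message includes the line and column at which the problem was detected.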