Recognition of Tokens in Compiler Design - BtechVibes
Recognition of Tokens in Compiler Design is a crucial step. It breaks down the source code into understandable parts, like words and symbols. This helps the compiler understand the code's structure and meaning.
Assume the following grammar fragment to generate a specific language
Recognition of Tokens
- Tokens obtained during lexical analysis are recognized by Finite Automata.
- Finite Automata (FA) is a simple idealized machine that can be used to recognize patterns within input taken from a character set or alphabet (denoted as C). The primary task of an FA is to accept or reject an input based on whether the defined pattern occurs within the input.
- There are two notations for representing Finite Automata. They are:
- Transition Table
- Transition Diagram
1. Transition Table
It is a tabular representation that lists all possible transitions for each state and input symbol combination.
EXAMPLEAssume the following grammar fragment to generate a specific language
where the terminals if, then, else, relop, id and num generates sets of strings given by following regular definitions.
- where letter and digits are defined as - (letter → [A-Z a-z] & digit → [0-9])
- For this language, the lexical analyzer will recognize the keywords if, then, and else, as well as lexemes that match the patterns for relop, id, and number.
- To simplify matters, we make the common assumption that keywords are also reserved words: that is they cannot be used as identifiers.
- The num represents the unsigned integer and real numbers of Pascal.
- In addition, we assume lexemes are separated by white space, consisting of nonnull sequences of blanks, tabs, and newlines.
- Our lexical analyzer will strip out white space. It will do so by comparing a string against the regular definition ws, below.
- If a match for ws is found, the lexical analyzer does not return a token to the parser.
- It is the following token that gets returned to the parser.
2. Transition Diagram
It is a directed labeled graph consisting of nodes and edges. Nodes represent states, while edges represent state transitions.
Components of Transition Diagram
- One state is labelled the Start State. It is the initial state of transition diagram where control resides when we begin to recognize a token.
- Position is a transition diagram are drawn as circles and are called states.
- The states are connected by Arrows called edges. Labels on edges are indicating the input characters
- Zero or more final states or Accepting states are represented by double circle in which the tokens has been found.
- Example:
- Where state "1" is initial state and state 3 is final state.
Here is the transition diagram of Finite Automata that recognizes the lexemes matching the token relop.
Here is the Finite Automata Transition Diagram for the Identifiers and Keywords.
Here is the Finite Automata Transition Diagram for recognizing white spaces.
Note:
These Finite Automata can be constructed using either the transition diagram or the transition table representation. Both transition diagrams and transition tables serve the same purpose of defining and representing the behavior of an FA. They provide different visual and structural representations, allowing designers to choose the format that best suits their preferences or requirements.