TableGen was originally written to help with writing LLVM backends. Today it is also used by other parts of LLVM; its two major users are the LLVM Target-Independent Code Generator and Clang (which uses it to describe frontend diagnostics and attributes). This wiki article focuses on what TableGen was originally designed for: creating LLVM backends.
When creating an LLVM backend, a lot of repetitive code must be written to describe the target architecture. For example, in an instruction set description, the description of an add instruction is almost identical to that of a sub instruction. TableGen removes the burden of writing repetitive code by generating target code from high-level descriptions, specified by the developer as
Using TableGen makes LLVM easier to port and maintain because it reduces the amount of C++ code that needs to be written and the region of code that needs to be changed when writing or refactoring a code generator.
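To make the repetition concrete, the sketch below factors what an add and a sub instruction share into one class, so that each instruction becomes a one-line definition. All names (ToyArith, Mnemonic, Opcode, NumOperands) and the opcode values are assumptions made up for illustration; they are not part of LLVM.

```tablegen
// Hypothetical toy description; none of these names come from LLVM.
class ToyArith<string mnemonic, int opcode> {
  string Mnemonic = mnemonic;  // differs per instruction
  int Opcode = opcode;         // differs per instruction (values assumed)
  int NumOperands = 3;         // shared: one destination, two sources
}

// Only the parts that differ are spelled out for each instruction.
def ADD : ToyArith<"add", 1>;
def SUB : ToyArith<"sub", 2>;
```

Adding a further arithmetic instruction would then take one more def line instead of a copy of the whole description.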
TableGen input files (i.e. .td files) are interpreted by the TableGen program llvm-tblgen. (The implementation of TableGen is located at \llvm-master\lib\TableGen with some utils at
llvm-tblgen takes the input file as its first (optional) argument. If no filename is specified, llvm-tblgen reads input from
Besides the input file, a backend must also be specified for TableGen to use. Examples of backends include "CodeEmitter" (for constructing an automated code emitter), "RegisterInfo" (for emitting a description of a target register file for a code generator), and "AsmWriter" (for creating an assembly printer for the current target). Many other backends exist; for a full list, type
TableGen processes description files in essentially two passes. The first pass is a template/macro preprocessor, and the second pass is a domain-specific backend. The output of the first pass is an expanded set of class and record definitions. There may be multiple second passes, i.e. several domain-specific backends can consume the same expanded records. These backends generate .inc files (pure C++ code) that can be included by C++ files. The following figure illustrates the basic operation of TableGen.
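As a rough illustration of the two passes, consider the small file below. The file name toy.td and the record names are assumptions. Running llvm-tblgen with -print-records (the default action when no backend is selected) should print the records as they look after the first pass, with template arguments already substituted. A backend option such as -gen-register-info would instead run a second pass over the expanded records, but the real backends expect a full target description rather than a toy file like this one.

```tablegen
// toy.td (hypothetical file name)
// Assumed invocation:  llvm-tblgen -print-records toy.td

// A parameterized class: 'n' is a template argument.
class ToyReg<string n> {
  string AsmName = n;
}

// After the first pass, each def is a plain record in which
// AsmName holds the concrete string passed as the argument.
def R0 : ToyReg<"r0">;
def R1 : ToyReg<"r1">;
```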
The input files (i.e. .td files) are written in the TableGen language, a declarative, record-oriented language based on C++ templates. TableGen files consist of two key parts: classes (marked with the class keyword) and definitions (marked with the def keyword). Both classes and definitions are considered records in TableGen nomenclature. Classes are abstract records that describe an entity (e.g. "Register", "Instruction" or "FPInst") of the target domain's code generator or LLVM backend. Definitions are concrete records instantiated from the classes.
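The relationship between classes and definitions can be sketched as follows. The names are hypothetical and chosen to echo the entities mentioned above; they are deliberately prefixed with Toy so that they are not mistaken for LLVM's own classes.

```tablegen
// Abstract record: describes what every toy instruction has.
class ToyInstruction<string mnemonic> {
  string Mnemonic = mnemonic;
}

// A class can derive from another class and add fields of its own.
class ToyFPInst<string mnemonic> : ToyInstruction<mnemonic> {
  int Precision = 64;  // assumed field, for illustration only
}

// Concrete records: each def instantiates a class.
def NOP  : ToyInstruction<"nop">;
def FADD : ToyFPInst<"fadd">;  // has both Mnemonic and Precision
```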
TableGen’s syntax includes some automation concepts that facilitate development and reduce the amount of code. For example, let allows a derived class or definition to override a value defined by a superclass. Multiclasses describe groups of records that can be instantiated all at once.
TableGen syntax supports both BCPL/C++-style (// ...) and nestable C-style (/* ... */) comments.
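The following sketch shows let overriding an inherited value, both in a derived class and in a single definition, and uses both comment styles along the way. The class and field names are assumptions made up for this example.

```tablegen
/* Hypothetical toy classes.
   /* C-style comments may be nested like this. */
   The outer comment ends only here. */
class ToyInsn<string mnemonic> {
  string Mnemonic = mnemonic;
  bit HasSideEffects = 0;     // default supplied by the superclass
}

// A derived class overrides the inherited default with 'let'.
class ToyStore<string mnemonic> : ToyInsn<mnemonic> {
  let HasSideEffects = 1;
}

def ST  : ToyStore<"st">;     // HasSideEffects is 1
def LD  : ToyInsn<"ld">;      // HasSideEffects stays 0
def OUT : ToyInsn<"out"> {    // a single definition can override too
  let HasSideEffects = 1;
}
```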
The following toy example (Lopes & Auler: 2014) defines add and sub instructions of an imaginary target architecture. The add instruction has two forms: 1) all operands are registers, and 2) the operands are a register and an immediate. The sub instruction has only the first form, in which all operands are registers.
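A minimal sketch of definitions matching this description might look as follows. It is not the original listing from Lopes & Auler: the names Insn and RegAndImmInsn are the ones used in the surrounding text, while the field name insnEncoding, the bit positions, and the opcode values are assumptions.

```tablegen
// Sketch only: field name, bit positions and opcode values are assumed.
class Insn<bits<4> MajOpc, bit MinOpc> {
  bits<32> insnEncoding;
  let insnEncoding{15-12} = MajOpc;  // major opcode
  let insnEncoding{11} = MinOpc;     // 0 = register form, 1 = immediate form
}

// One defm of this multiclass yields both forms at once.
multiclass RegAndImmInsn<bits<4> opcode> {
  def rr : Insn<opcode, 0>;  // all operands are registers
  def ri : Insn<opcode, 1>;  // a register and an immediate
}

def SUB : Insn<0, 0>;         // sub: register form only
defm ADD : RegAndImmInsn<1>;  // add: creates the records ADDrr and ADDri
```

The names of the records created by defm are formed by concatenating the defm name with the def names inside the multiclass.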
In this example, the class Insn represents a regular instruction, and the multiclass RegAndImmInsn (short for "register and immediate instruction") represents instructions with the two forms mentioned earlier. The next code block (Lopes & Auler: 2014) shows how llvm-tblgen expands these definitions into records. The command used is
This output shows how the multiclass definition
defm expands into two definitions
LLVM Backend Overview
This chapter briefly describes the operation of an LLVM backend to illustrate the context in which TableGen operates. The main function of an LLVM backend is to translate LLVM Intermediate Representation (IR) to target code: assembly, object (machine) code, or another language. LLVM backends are a lot more than just code generators. The following figure describes the super-passes of an LLVM backend. Besides the ones listed here, there are other passes, mostly related to code optimization, that are not illustrated in the figure. The difference between super-passes and regular passes is that super-passes are composed of smaller passes and are critical for compilation, whereas regular passes are nice to have. The figure also shows the instruction representation in use after each pass.
Each super-pass lowers the level of the representation and brings it closer to the target code. For instance, the instruction selection pass matches IR code against the target instruction set specified in the *.td files and creates a DAG of target instructions (in SSA form). The first instruction scheduling pass creates a list of MachineInstr objects from the DAG (subject to target architecture constraints). MachineInstr objects are still fairly high-level: unlike MCInst, they can, for example, contain global variables and use jump tables instead of labels. The register allocation pass maps the unbounded supply of virtual registers onto the finite set of registers provided by the architecture.
TableGen is involved with all of these passes. More precisely, when writing an LLVM backend, TableGen is used to describe the following elements of the target architecture:
- Register set
- Instruction set
- Selection and conversion of the LLVM IR from a Directed Acyclic Graph (DAG) representation of instructions to native target-specific instructions
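The three elements listed above can be sketched in a single fragment. This assumes a hypothetical target called "Toy" and relies on the Register, RegisterClass and Instruction classes that LLVM provides in llvm/Target/Target.td. It is far from a complete target description and is meant only to show where each element would appear.

```tablegen
// Hypothetical "Toy" target; assumes llvm-tblgen is run with
// -I pointing at LLVM's include directory.
include "llvm/Target/Target.td"

// Register set: two 32-bit general-purpose registers.
def R0 : Register<"r0">;
def R1 : Register<"r1">;
def GPR : RegisterClass<"Toy", [i32], 32, (add R0, R1)>;

// Instruction set: a register-register add.
def ADDrr : Instruction {
  let Namespace = "Toy";
  let OutOperandList = (outs GPR:$dst);
  let InOperandList = (ins GPR:$lhs, GPR:$rhs);
  let AsmString = "add $dst, $lhs, $rhs";
  // Selection: match an 'add' DAG node onto this instruction.
  let Pattern = [(set GPR:$dst, (add GPR:$lhs, GPR:$rhs))];
}
```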
Lopes, B. C., and Auler, R. Getting Started With LLVM Core Libraries. Birmingham, UK: Packt Pub, 2014. ISBN 13: 9781782166924. Print.