
Introduction

TableGen was originally written to help write LLVM backends. Today TableGen is also used by other parts of LLVM, the two major users being the LLVM Target-Independent Code Generator and Clang (for describing frontend diagnostics and attributes). This wiki article focuses on what TableGen was originally designed for: creating LLVM backends.

When creating an LLVM backend, a lot of repetitive code needs to be written to describe a target architecture. For example, when describing an instruction set, the description of an add instruction is almost identical to the description of a sub instruction. TableGen removes the burden of writing repetitious code by generating target code from high-level descriptions that the developer writes in .td files.

Using TableGen makes LLVM easier to port and maintain because it reduces both the amount of C++ code that needs to be written and the amount of code that needs to be changed when writing or refactoring a code generator.

TableGen Overview

TableGen input files (i.e. .td files) are processed by the TableGen program llvm-tblgen. (The implementation of TableGen is located in llvm-master/lib/TableGen, with the backends and other utilities in llvm-master/utils/TableGen.) llvm-tblgen takes the input file as its first, optional, argument. If no file name is given, llvm-tblgen reads its input from standard input.

Besides the input file, one normally also selects a backend with a command-line option. Examples of backends include "CodeEmitter" (option -gen-emitter, for constructing an automated code emitter), "RegisterInfo" (option -gen-register-info, for emitting a description of a target register file for a code generator), and "AsmWriter" (option -gen-asm-writer, for creating an assembly printer for the current target). Many other backends exist; for a full list, run llvm-tblgen --help.


TableGen processes description files essentially in two passes. The first pass is a template/macro preprocessor, and the second pass is a domain-specific backend. The output of the first pass is an expanded set of class and record definitions. Several different backends can consume this output as the second pass. These backends generate .inc files (pure C++ code) that can be included by C++ files. The following figure illustrates the basic operation of TableGen.

 

TableGen Syntax

The input files (i.e. .td files) are written in the TableGen language, a declarative, record-oriented language whose syntax resembles C++ templates. TableGen files consist of two key parts: classes (introduced with the class keyword) and definitions (introduced with the def keyword). Both classes and definitions are considered records in TableGen nomenclature. Classes are abstract records that describe an entity (e.g. "Register", "Instruction" or "FPInst") of the target domain, here a code generator or LLVM backend. Definitions are concrete records, typically instantiated from one or more classes.
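As a minimal sketch of the class/definition split, the following file declares one abstract record and two concrete ones. The names Register, AsmName, Encoding, R0 and R1 are made up for this illustration and are not taken from LLVM's own .td files.

```tablegen
// File: regs.td (hypothetical example)

// A class is an abstract record: it only describes what a register looks like.
class Register<string n, int num> {
    string AsmName = n;   // name used in assembly text
    int Encoding = num;   // hardware register number
}

// Definitions are concrete records built from the class.
def R0 : Register<"r0", 0>;
def R1 : Register<"r1", 1>;
```

Running llvm-tblgen regs.td -print-records should list Register under the classes and R0 and R1 under the definitions.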

TableGen's syntax includes constructs that automate repetitive descriptions and reduce the amount of code, for example foreach, let and multiclass. foreach repeats a group of definitions for each value in a list or range. let allows a derived class or definition to override a value defined by a superclass. Multiclasses describe groups of records that can be instantiated all at once with defm.

TableGen syntax supports both BCPL/C++-style (// ...) and nestable C-style (/* ... */) comments.
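The sketch below shows foreach and the two forms of let on a toy register description. All names in it (Reg, AsmName, Encoding, IsReserved, SP, LR, PC) are hypothetical and only serve the illustration.

```tablegen
// File: let_foreach.td (hypothetical example)

class Reg<string n, int num> {
    string AsmName = n;
    int Encoding = num;
    bit IsReserved = 0;   // default value, can be overridden with let
}

// foreach: defines R0, R1, R2 and R3 without repeating the def four times.
foreach i = [0, 1, 2, 3] in
    def R#i : Reg<"r"#i, i>;

// let inside a record body overrides a field inherited from the class.
def SP : Reg<"sp", 13> {
    let IsReserved = 1;
}

// A top-level let applies the override to every record inside its braces.
let IsReserved = 1 in {
    def LR : Reg<"lr", 14>;
    def PC : Reg<"pc", 15>;
}
```

In this sketch, R0 to R3 keep the class default for IsReserved, while SP, LR and PC override it.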
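A short sketch of both comment styles follows. The records Empty and Example are hypothetical and are only there so that the file contains something besides comments.

```tablegen
// A BCPL/C++-style comment runs to the end of the line.

/* A C-style comment can span several lines
   /* and, unlike in C or C++, it can contain another comment */
   without ending at the first closing marker. */

class Empty;            // a class without fields
def Example : Empty;    // a record derived from it
```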

The following toy example (Lopes & Auler: 2014) defines the add and sub instructions of an imaginary target architecture. The add instruction has two forms: (1) all operands are registers, and (2) the operands are a register and an immediate. The sub instruction has only the first form, in which all operands are registers.

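// File: insns.td
// Base class: a 5-bit encoding, matching the record dump shown further down.
class Insn<bits<4> MajOpc, bit MinOpc> {
    bits<5> insnEncoding;
    // {0-3} lists bit 0 first, so the most significant bit of MajOpc lands in bit 0.
    let insnEncoding{0-3} = MajOpc;
    let insnEncoding{4} = MinOpc;
}

// Multiclass: each defm creates a register-register and a register-immediate form.
multiclass RegAndImmInsn<bits<4> opcode> {
    def rr : Insn<opcode, 0>;
    def ri : Insn<opcode, 1>;
}

def SUB : Insn<0x00, 0>;          // single form
defm ADD : RegAndImmInsn<0x01>;   // expands to ADDrr and ADDri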

 

In this example the class Insn represents a regular instruction, and the multiclass RegAndImmInsn (short for "register and immediate instruction") represents instructions that come in the two forms mentioned earlier. The next code block (Lopes & Auler: 2014) shows how llvm-tblgen expands these definitions into records. The command used is llvm-tblgen insns.td -print-records.

------------- Classes -----------------
class Insn<bits<4> Insn:MajOpc = { ?, ?, ?, ? }, bit Insn:MinOpc = ?> {
    bits<5> insnEncoding = { Insn:MinOpc, Insn:MajOpc{0},
    Insn:MajOpc{1}, Insn:MajOpc{2}, Insn:MajOpc{3} };
    string NAME = ?;
}

------------- Defs -----------------
def ADDri { // Insn ri
    bits<5> insnEncoding = { 1, 1, 0, 0, 0 };
    string NAME = "ADD";
}

def ADDrr { // Insn rr
    bits<5> insnEncoding = { 0, 1, 0, 0, 0 };
    string NAME = "ADD";
}

def SUB { // Insn
    bits<5> insnEncoding = { 0, 0, 0, 0, 0 };
    string NAME = ?;
}

 

This output shows how the defm statement expands the multiclass into two definitions, ADDri and ADDrr.
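To make the expansion concrete, the sketch below writes the two ADD records by hand instead of using defm. It assumes the 5-bit encoding that appears in the record dump above and is meant as an illustration of what the multiclass saves, not as the exact internal expansion performed by llvm-tblgen.

```tablegen
// File: insns_manual.td (hypothetical hand-written equivalent)

class Insn<bits<4> MajOpc, bit MinOpc> {
    bits<5> insnEncoding;
    let insnEncoding{0-3} = MajOpc;
    let insnEncoding{4} = MinOpc;
}

// What "defm ADD : RegAndImmInsn<0x01>;" stands for: the defm name is
// prepended to each def name inside the multiclass (rr and ri).
def ADDrr : Insn<0x01, 0>;
def ADDri : Insn<0x01, 1>;
```

The records defined this way should carry the same insnEncoding values as the ones produced through the multiclass. With only two forms the saving is small, but it grows with every additional instruction that shares the same set of forms.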

LLVM Backend Overview

This chapter briefly describes the operation of an LLVM backend to illustrate the context in which TableGen operates. The main function of an LLVM backend is to translate LLVM Intermediate Representation (IR) into target code, which can be assembly, object (machine) code, or even another language. LLVM backends are a lot more than just code generators. The following figure describes the super-passes of an LLVM backend. Besides the ones listed here, there are also other passes, mostly related to code optimization, but those are not illustrated in the figure. The difference between super-passes and regular passes is that super-passes are composed of smaller passes and are required for compilation, whereas regular passes are optional improvements. The figure also shows the current instruction representation after each pass.

 

 

Each super-pass lowers the level of the representation and brings it closer to the target code. For instance, the instruction selection pass matches IR code against the target instruction set specified in the *.td files and creates a DAG of target instructions (in SSA form). The first instruction scheduling pass creates a list of MachineInstr objects based on the DAG (and on target architecture constraints). MachineInstr objects are still fairly high-level since, unlike MCInst objects, they can, for example, contain global variables and use jump tables instead of labels. The register allocation pass reduces the number of registers from an unlimited supply of virtual registers to the number provided by the architecture.

TableGen is involved with all of these passes. More accurately, when writing an LLVM backend, TableGen is used to describe the following elements of target architecture:

  • Register set
  • Instruction set
  • Selection and conversion of the LLVM IR from a Directed Acyclic Graph (DAG) representation of instructions to native target-specific instructions
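The three elements listed above can be sketched in a single self-contained file. Every class and record name here (ToyReg, ToyRegClass, ToyInst, GPR, set, add and so on) is hypothetical; a real backend would instead derive its records from the classes that LLVM provides in its Target.td include file, which this sketch deliberately does not use.

```tablegen
// File: toy_target.td (hypothetical, not based on LLVM's Target.td)

// Register set
class ToyReg<string n> { string AsmName = n; }
def R0 : ToyReg<"r0">;
def R1 : ToyReg<"r1">;

class ToyRegClass<list<ToyReg> regs> { list<ToyReg> Members = regs; }
def GPR : ToyRegClass<[R0, R1]>;

// DAG operators are ordinary records; these two act as markers only.
def set;
def add;

// Instruction set, with a selection pattern attached to each instruction
class ToyInst<string asm, dag pattern> {
    string AsmString = asm;
    dag Pattern = pattern;   // the IR-level DAG that this instruction matches
}

def ADDrr : ToyInst<"add $dst, $a, $b",
                    (set GPR:$dst, (add GPR:$a, GPR:$b))>;
```

The dag value on ADDrr reads as "set a GPR named dst to the sum of two GPRs named a and b". In a real backend, a pattern of this kind is what the instruction selector matches against the DAG form of the IR.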

References

http://llvm.org/docs/TableGen/index.html

http://llvm.org/docs/WritingAnLLVMBackend.html

http://llvm.org/docs/CodeGenerator.html

http://llvm.org/docs/TableGen/BackEnds.html

http://llvm.org/docs/TableGen/LangIntro.html

http://llvm.org/docs/TableGen/LangRef.html

http://www.aosabook.org/en/llvm.html

Lopes, B. C., and Auler, R. Getting Started With LLVM Core Libraries. Birmingham, UK: Packt Pub, 2014. ISBN 13: 9781782166924. Print.


 

 
