Child pages
  • LLVM SSA
Skip to end of metadata
Go to start of metadata

SSA basics

SSA stands for Static Single Assignment. It is a property of program representations (usually IR) designed for enabling various optimizations. Instead of 'normal' variables that can be assigned multiple times, SSA variables can only be assigned once. Such form allows the compiler to do more efficient register allocation because it is easy to approximate how many variables are live at any given point in the program.

Branches in SSA form need special handling because variables can have different values after the branches join depending on which branch was executed. For this purpose SSA form uses the so-called Phi(ϕ) function. The Phi functions are entered by the compiler roughly to the points right before branch dependent values of a variable are used. The precise points are defined for example by using Dominance Frontiers.

Dominance frontier of a CFG (Control Flow Graph) node is the set of nodes that dominate a predecessor of the node but are not necessarily strict dominators of the node. Efficient algorithm for determining the dominance frontiers of nodes was presented by Ron Cytron et al. in the paper http://dl.acm.org/citation.cfm?doid=115372.115320

For more detailed information on SSA refer to

The lecture notes https://wiki.aalto.fi/download/attachments/103747731/slides04.pdf?version=1&modificationDate=1423419233428&api=v2

Wikipedia http://en.wikipedia.org/wiki/Static_single_assignment_form

Constructing SSA in LLVM

The LLVM IR requires SSA form by definition. However compiling c with LLVM / Clang to LLVM IR without optimizations doesn't appear to generate code in SSA form. By default the variables are handled as stack variables by allocating them with 'alloca'. For example:

C
int x = 1;
LLVM IR
%x = alloca i32, align 4
store i32 1, i32* %x, align 4


The trick here is that while register values are required to be in SSA form, memory objects are not required / permitted to be in SSA form. This design allows for compiler frontends (such as clang) to generate LLVM IR without having to perform the potentially complex transformations from source language to strict SSA. Convention which LLVM frontend developers are encouraged to follow is to generate LLVM IR with all mutable variables handled as memory variables. This means each mutable variable becomes a stack allocation. The alloca / store / load is very inefficient way of handling simple local variables and the code generated from such IR is slow. The idea of using such representation is to keep the transformation of the variables to registers in a single highly optimized part of LLVM and allow for frontend developers to generate IR more simply.

In LLVM the transformation from stack variables to register values is performed in optimization passes. Running a mem2reg optimization pass on the IR will transform memory objects to register values whenever possible (or the heuristics say so). The optimization pass is implemented in PromoteMemoryToRegister.cpp which analyzes the BasicBlocks and the alloca instructions for PHINode placement. The PHINode placement is calculated with algorithm by Sreedhar and Gao that has been modified to not use the DJ (Dominator edge, Join edge) graphs. According to Sreedhar and Gao the algorithm is approximately five times faster on average than the Cytron et al. algorithm. The speed gain results from calculating dominance frontiers for only nodes that potentially need phi nodes and well designed data structures.

Curiosity: LLVM used to contain exactly the Cytron et al algorithm in http://llvm.org/docs/doxygen/html/DominanceFrontierImpl_8h_source.html but it's now deprecated.

A good introduction to SSA in LLVM can be found at http://llvm.org/docs/tutorial/OCamlLangImpl7.html

SSA in LLVM IR

Example

Example of generating the IR, with and without phi-nodes, for the code studied in the individual assignment 2. 

example1.c
int main(int argc, char *argv[]) {
  int a = 4, b = 5, c, d, e = 1;
  goto L1;
  
  L1: {
    c = a + b;
    d = c - a;
    if (d) goto L2;
    d = b * d;
    e = e + 1;
  }
  L2: {
    b = a + b;
    e = c - a;
    if (e) goto L1;
    a = b * d;
    b = a - d;
  }
}

IR representation of the example1.c, without phi-nodes presented in example1-no-opt.ll below. This can be created with command clang -S -emit-llvm ./example1.c -o example1-no-opt.ll. Notice, how all the variables (a, b, c, d, e) are kept in the stack memory instead of registers.

example1-no-opt.ll
; ModuleID = './example1.c'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
; Function Attrs: nounwind ssp uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  %a = alloca i32, align 4
  %b = alloca i32, align 4
  %c = alloca i32, align 4
  %d = alloca i32, align 4
  %e = alloca i32, align 4
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8
  store i32 4, i32* %a, align 4
  store i32 5, i32* %b, align 4
  store i32 1, i32* %e, align 4
  br label %4
; <label>:4                                       ; preds = %29, %0
  %5 = load i32* %a, align 4
  %6 = load i32* %b, align 4
  %7 = add nsw i32 %5, %6
  store i32 %7, i32* %c, align 4
  %8 = load i32* %c, align 4
  %9 = load i32* %a, align 4
  %10 = sub nsw i32 %8, %9
  store i32 %10, i32* %d, align 4
  %11 = load i32* %d, align 4
  %12 = icmp ne i32 %11, 0
  br i1 %12, label %13, label %14
; <label>:13                                      ; preds = %4
  br label %20
; <label>:14                                      ; preds = %4
  %15 = load i32* %b, align 4
  %16 = load i32* %d, align 4
  %17 = mul nsw i32 %15, %16
  store i32 %17, i32* %d, align 4
  %18 = load i32* %e, align 4
  %19 = add nsw i32 %18, 1
  store i32 %19, i32* %e, align 4
  br label %20
; <label>:20                                      ; preds = %14, %13
  %21 = load i32* %a, align 4
  %22 = load i32* %b, align 4
  %23 = add nsw i32 %21, %22
  store i32 %23, i32* %b, align 4
  %24 = load i32* %c, align 4
  %25 = load i32* %a, align 4
  %26 = sub nsw i32 %24, %25
  store i32 %26, i32* %e, align 4
  %27 = load i32* %e, align 4
  %28 = icmp ne i32 %27, 0
  br i1 %28, label %29, label %30
; <label>:29                                      ; preds = %20
  br label %4
; <label>:30                                      ; preds = %20
  %31 = load i32* %b, align 4
  %32 = load i32* %d, align 4
  %33 = mul nsw i32 %31, %32
  store i32 %33, i32* %a, align 4
  %34 = load i32* %a, align 4
  %35 = load i32* %d, align 4
  %36 = sub nsw i32 %34, %35
  store i32 %36, i32* %b, align 4
  %37 = load i32* %1
  ret i32 %37
}
attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)"}

IR representation of the example1.c, with phi-nodes presented in example1-opt.ll below. This can be created with command clang -S -emit-llvm ./example1.c -o /dev/stdout | opt -S -mem2reg -o example1-opt.ll. Notice that all the variables are stored in the registers, unlike in the example above, where stack memory was used. The result is otherwise exactly the same as the one seen in the example solution for the exercise 2, except that there is no optimizations done for the the code. For example, there still exists useless labels (label 5, label 13), and some dead code (line 18).

example1-opt.ll
; ModuleID = '<stdin>'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"
; Function Attrs: nounwind ssp uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
  br label %1
; <label>:1                                       ; preds = %13, %0
  %b.0 = phi i32 [ 5, %0 ], [ %10, %13 ]
  %e.0 = phi i32 [ 1, %0 ], [ %11, %13 ]
  %2 = add nsw i32 4, %b.0
  %3 = sub nsw i32 %2, 4
  %4 = icmp ne i32 %3, 0
  br i1 %4, label %5, label %6
; <label>:5                                       ; preds = %1
  br label %9
; <label>:6                                       ; preds = %1
  %7 = mul nsw i32 %b.0, %3
  %8 = add nsw i32 %e.0, 1
  br label %9
; <label>:9                                       ; preds = %6, %5
  %d.0 = phi i32 [ %3, %5 ], [ %7, %6 ]
  %10 = add nsw i32 4, %b.0
  %11 = sub nsw i32 %2, 4
  %12 = icmp ne i32 %11, 0
  br i1 %12, label %13, label %14
; <label>:13                                      ; preds = %9
  br label %1
; <label>:14                                      ; preds = %9
  %15 = mul nsw i32 %10, %d.0
  %16 = sub nsw i32 %15, %d.0
  ret i32 0
}
attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)"}

Classes

PromoteMemToReg ( http://llvm.org/docs/doxygen/html/PromoteMemToReg_8h.html )

The mem2reg optimisation pass is implemented in the PromoteMemToReg.cpp file. The mem2reg pass takes the major responsibility of converting IR to SSA representation. PromoteMemToReg exposes an interface to promote alloca instructions to SSA registers, by using the SSA construction algorithm.

PHINode ( http://llvm.org/docs/doxygen/html/classllvm_1_1PHINode.html )

"PHINode - The PHINode class is used to represent the magical mystical PHI node, that can not exist in nature, but can be synthesized in a computer scientist's overactive imagination." [http://llvm.org/docs/doxygen/html/Instructions_8h_source.html]. PHINode is called from PromoteMemToReg, when there exists a value that should be converted to a phi-node.

ScalarReplAggregates ( http://llvm.org/docs/doxygen/html/Transforms_2Scalar_8h_source.html )

ScalarReplAggregate is used to transform the aggregate type (structs, arrays) alloca instructions in the LLVM IR into the SSA form. In practice, ScalarReplAggregates tries to break the alloca instructions of the arrays or structs into a individual instructions and, if possible, turns those individual instructions to phi-nodes.

 

 

  • No labels