Introduction(s) to LLVM

11 Jul 2021

This post is my notes while going through LLVM tutorials [1], [2], and [3]. I do not own any of these materials.

LLVM for Grad Students [1]

Writing a Pass

The author provided a template LLVM pass. The meat of this pass is:

virtual bool runOnFunction(Function &F) {
  errs() << "I saw a function called " << F.getName() << "!\n";
  return false;
}

This is a function pass, where the above method is invoked with every function call in the program. The method returns false to indicate that it did not modify F.

Building this will produce a shared library (build/skeleton/libSkeletonPass.so in this case). We will then need to load this library and run some real code. To run the new pass, invoke clang and add the shared library. (This actually did not print for me. Moving on for now…)

$ clang -Xclang -load -Xclang build/skeleton/libSkeletonPass.* something.c
I saw a function called main!

Understanding LLVM IR

The below figure shows the important components of an LLVM program.

IR Instruction

This example instruction %5 = add i32 %4, 2 adds two 32 bit integers. % means register, and 2 is the number 2. See, human readable. Note that LLVM has infinitely many registers. In the compiler, this instruction is represented as a Instruction object. It has an opcode (addition), a type, and a list of operands which are pointers to other Value objects. In this case, the list of operands points to a Constant object 2 and another instruction representing the register &4. Hmm, why is the register an instruction? This is because LLVM IR uses static single assignment (SSA) form, meaning that registers and instructions are actually the same. See question 2.

To generate the LLVM IR, use

$ clang -emit-llvm -S -o - something.c

Example IR

If we change our previous runOnFunction IR to the following, we can (or at least should. I still can’t get this running) see more information about the functions:

virtual bool runOnFunction(Function &F) {
  errs() << "In a function called " << F.getName() << "!\n";

  errs() << "Function body:\n";
  //F.dump(); Out-of-Date: no more dump support in modern llvm unless you enable it at compile time.
  F.print(llvm::errs());

  for (auto &B : F) {
    errs() << "Basic block:\n";
    B.print(llvm::errs(), true);

    for (auto &I : B) {
      errs() << "Instruction: \n";
      I.print(llvm::errs(), true);
      errs() << "\n";
    }
  }

  return false;
}

Another (and perhaps more interesting) Example IR

The below example pass look for patterns in the program and change the code when we find them. Let us say we want to replace the first binary operator (ex. +, -) in every function with a multiplication. Very useful indeed.

The meat of the pass is here:

for (auto& B : F) {
  for (auto& I : B) {
    // Go through each instruction in the function. Check if the instruction is of the type BinaryOperator.
    if (auto* op = dyn_cast<BinaryOperator>(&I)) {
      // Insert at the point where the instruction `op` appears.
      IRBuilder<> builder(op);

      // Make a multiply with the same operands as `op`.
      Value* lhs = op->getOperand(0);
      Value* rhs = op->getOperand(1);
      Value* mul = builder.CreateMul(lhs, rhs);

      // Everywhere the old instruction was used as an operand, use our
      // new multiply instruction instead.
      for (auto& U : op->uses()) {
        User* user = U.getUser();  // A User is anything with operands.
        user->setOperand(U.getOperandNo(), mul);
      }

      // We modified the code.
      return true;
    }
  }
}

Some notes:

Now if we compile the below program:

#include <stdio.h>
int main(int argc, const char** argv) {
    int num;
    scanf("%i", &num);  // read input from stdin
    printf("%i\n", num + 2);
    return 0;
}

We see our pass taking an effect vs. an ordinary compiler:

$ cc example.c
$ ./a.out
10
12
$ clang -Xclang -load -Xclang build/skeleton/libSkeletonPass.so example.c
$ ./a.out
10
20

Linking with Runtime Library

In the previous example, all of our pass logic are written in the LLVM cpp file. We could also write our pass in C and link it with the application. For example, we could have a separate C file rtlib.c:

#include <stdio.h>
void logop(int i) {
  printf("computed: %i\n", i);
}

Then in our LLVM pass code:

// Get the function to call from our runtime library.
LLVMContext& Ctx = F.getContext();
FunctionCallee logFunc = F.getParent()->getOrInsertFunction(
  "logop", Type::getVoidTy(Ctx), Type::getInt32Ty(Ctx)   // add a declaration of the function logop. The definition is in rtlib.c
);

for (auto& B : F) {
  for (auto& I : B) {
    if (auto* op = dyn_cast<BinaryOperator>(&I)) {
      // Insert *after* `op`.
      IRBuilder<> builder(op);
      builder.SetInsertPoint(&B, ++builder.GetInsertPoint());

      // Insert a call to our function.
      Value* args[] = {op};
      builder.CreateCall(logFunc, args);

      return true;
    }
  }
}

To run the application, we need to compile the runtime library rtlib.c and link it:

$ cc -c rtlib.c
$ clang -Xclang -load -Xclang build/skeleton/libSkeletonPass.so -c example.c
$ cc example.o rtlib.o
$ ./a.out
12
computed: 14
14

LLVM Design Decisions [2]

My First Language Frontend with LLVM Tutorial [3]

Questions

  1. “A Value is any data that can be used in a computation”. How do I do computation with a function or an instruction?
  2. What exactly is static single assignment?

Sources

[1] https://www.cs.cornell.edu/~asampson/blog/llvm.html

[2] http://www.aosabook.org/en/llvm.html

[3] https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index.html

[4] https://llvm.org/docs/WritingAnLLVMPass.html