Single Line Replacement via LLVM

This document describes how to run the example code that replaces a single line of LLVM IR, turning an addition into a multiplication.

Running the Example

The code for the single-line replacement is in decompy/llvmtransform/ExampleSingleLineTransformation/Transform.cpp. To run the example, run ./RunMeInThisDirectory.sh from that directory; it generates example.bc, example.ll, example_transformed.ll, and Transform.out.

What’s In the Example

The two .ll files show the difference before and after Transform is run. Line 12 in example.ll is %5 = add nsw i32 %4, 3, and in example_transformed.ll it is %5 = mul i32 %4, 3. %5 is an unnamed temporary value that the compiler numbered automatically. add and mul are binary operators.

BinaryOperator is a child class of Instruction, which is a child class of User, which is a child class of Value. Value is the most important class in the LLVM source base, and more information can be found about it at http://llvm.org/docs/ProgrammersManual.html#the-value-class. For this document, only BinaryOperator and Instruction will be discussed.

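After the opcode, example.ll has nsw, which stands for No Signed Wrap: if the addition overflows as a signed operation, the result is a poison value. Then there is i32, the 32-bit integer type of the operands and the result, then %4, another compiler-numbered value, and 3, a constant. The transformed line has no nsw because BinaryOperator::Create builds a plain mul without wrap flags.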

Going Through Transform.cpp

Header

The only header is Transform.hpp, which contains a bunch of LLVM includes which are helpful. I lost track of what does what, and probably put too many includes in that file.

using namespace llvm

Using this namespace just saves me a lot of trouble because otherwise I’d have to type llvm a lot.

void print()

Just a nice function so I don’t have to type std::cout << someString << std::endl every time I want to see something.

int main()

The fun stuff

LLVMContext context

LLVMContext owns LLVM’s global state, such as the tables that keep types and constants unique, and every Module belongs to exactly one context. I use it later to read in LLVM bitcode.

SMDiagnostic error

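SMDiagnostic holds the details of a parse error (file name, line, and message). parseIRFile fills it in if reading the file fails, so it is also needed to read in LLVM bitcode.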

Instruction \*fromInstruction, \*toInstruction

These are placeholders for the instruction that will be replaced later and for its replacement. I’m sure there’s a better way than the one I’m using, but it works without breaking anything.

std::unique_ptr<Module> module = parseIRFile("example.bc", error, context);

Reads in the bitcode file so I can manipulate it.

for (Module::iterator function = module->begin(); function != module->end(); function++)

Iterator for functions within a module. It can be written more compactly as for (auto &function : *module) (module is a std::unique_ptr, so it has to be dereferenced), but I feel like the way I wrote it makes it more obvious what’s going on.
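To make the two loop forms concrete, here is a small standard-library sketch. It does not use LLVM: FakeModule and both function names are made up for illustration, and a std::vector<int> stands in for the Module. The container is held through a std::unique_ptr, as module is in Transform.cpp, so the range-based form iterates over the dereferenced pointer.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a Module: any container held through a std::unique_ptr.
using FakeModule = std::vector<int>;

// Explicit-iterator form, mirroring the loop style used in Transform.cpp.
int sumWithIterators(const std::unique_ptr<FakeModule> &module) {
    int total = 0;
    for (FakeModule::const_iterator item = module->begin(); item != module->end(); ++item) {
        total += *item;  // the iterator must be dereferenced to reach the element
    }
    return total;
}

// Range-based form: the unique_ptr itself is not iterable, so iterate over *module.
int sumWithRangeFor(const std::unique_ptr<FakeModule> &module) {
    int total = 0;
    for (const auto &item : *module) {
        total += item;  // item is already a reference to the element
    }
    return total;
}
```

Both functions visit the same elements in the same order. The practical difference is that the loop variable is an iterator in the first form (accessed with ->) and a reference in the second (accessed with a dot).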

for (Function::iterator basicBlock = function->begin(); basicBlock != function->end(); basicBlock++)

Same as the other for loop, but for the BasicBlocks within a function.

basicBlock->print(errs());

errs() is LLVM’s output stream for standard error, so this prints the basic block’s IR to the terminal.

for (BasicBlock::iterator instruction = basicBlock->begin(); instruction != basicBlock->end(); instruction++)

Same as the other two for loops, just for Instructions this time.

if (isa<BinaryOperator>(instruction))

This checks whether the instruction “is a” BinaryOperator. It is important for this example because addition is a binary operator, and I want to change all binary operations to multiplication. Note that the check matches every binary operator (sub, mul, and so on), not just add.

BinaryOperator *binOp = dyn_cast<BinaryOperator>(instruction);

Casts instruction (which is of class Instruction) to BinaryOperator, which is a child class of Instruction. dyn_cast is LLVM’s checked cast: it returns a null pointer if the object is not of the requested type. This line and the previous line can be merged into if (BinaryOperator *binOp = dyn_cast<BinaryOperator>(instruction)), but I thought the way I did it makes it a bit clearer.
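LLVM’s isa and dyn_cast do not rely on C++ RTTI; each class answers the type question itself through a static classof function. The sketch below imitates that idea with the standard library only. Every name in it (MiniInstruction, MiniBinaryOperator, miniIsa, miniDynCast, describe) is hypothetical and much simpler than the real LLVM templates, so treat it as an illustration of the check-then-cast pattern rather than LLVM’s implementation.

```cpp
#include <string>
#include <utility>

// Hypothetical miniature hierarchy; the real LLVM classes carry far more state.
struct MiniInstruction {
    enum Kind { BinaryOpKind, OtherKind };
    explicit MiniInstruction(Kind kind) : kind(kind) {}
    virtual ~MiniInstruction() = default;
    Kind getKind() const { return kind; }
private:
    Kind kind;
};

struct MiniBinaryOperator : MiniInstruction {
    explicit MiniBinaryOperator(std::string opcode)
        : MiniInstruction(BinaryOpKind), opcode(std::move(opcode)) {}
    // classof answers "is this base object really one of me?"
    static bool classof(const MiniInstruction *inst) {
        return inst->getKind() == BinaryOpKind;
    }
    std::string opcode;
};

// Simplified isa: true when the object is of type To.
template <typename To>
bool miniIsa(const MiniInstruction *inst) {
    return To::classof(inst);
}

// Simplified dyn_cast: the downcast pointer on success, nullptr otherwise.
template <typename To>
To *miniDynCast(MiniInstruction *inst) {
    return miniIsa<To>(inst) ? static_cast<To *>(inst) : nullptr;
}

// The merged check-and-cast idiom: binOp is only in scope when the cast succeeded.
std::string describe(MiniInstruction *inst) {
    if (MiniBinaryOperator *binOp = miniDynCast<MiniBinaryOperator>(inst)) {
        return "binary operator: " + binOp->opcode;
    }
    return "not a binary operator";
}
```

The describe function shows why the merged form is popular: the null check and the cast happen in one place, and the cast pointer cannot be used by accident outside the if body.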

instruction->print(errs());

Prints the instruction to the terminal.

fromInstruction = binOp;

Since I know that binOp is an instruction I want to replace, I save the pointer to it so I can replace it outside the loop. Attempting to replace it inside the loop causes segmentation faults, most likely because the replacement erases the instruction that the iterator still points to.

toInstruction = BinaryOperator::Create(Instruction::BinaryOps::Mul, instruction->getOperand(0), instruction->getOperand(1));

Creates and stores an instruction that will replace binOp later. It takes three arguments: a BinaryOps value, which is an enum, and two operands. A list of BinaryOps can be found at https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/Instruction.def. The two operands are taken from binOp (or instruction; they are the same object, one is just cast into the other) to create the new multiplication instruction. The new instruction is only stored at this point because it can’t replace the current instruction yet.

ReplaceInstWithInst(fromInstruction, toInstruction);

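This does the actual replacing of instructions. It inserts the new instruction where the old one was, redirects every use of the old instruction’s result to the new one so that the result (%5 here) stays in place, and then erases the old instruction and frees its memory.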

Conclusion

It’s pretty straightforward how to replace instructions once I read through dozens of documentation pages, StackOverflow pages, lecture notes and slides, and miscellaneous websites. Basically, iterate through instructions until you find one that you want to replace, save it to a place outside of the scope of the iterator, create and save the replacement instruction so that it’s also out of the scope of the iterator, then once the iterator is out of scope, replace the instruction.
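The pattern summarized above can be sketched without LLVM. In the example below a std::list of opcode names stands in for a basic block; FakeBlock and replaceAddsWithMuls are hypothetical names, and unlike Transform.cpp, which keeps a single pair of pointers, this version records every match in a vector so that more than one instruction could be handled.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a basic block: an ordered list of opcode names.
using FakeBlock = std::list<std::string>;

// Replaces every "add" with "mul", deferring the edits until the traversal is done.
// Returns the number of replacements made.
int replaceAddsWithMuls(FakeBlock &block) {
    // Step 1: walk the block and record (position, replacement) pairs without changing anything.
    std::vector<std::pair<FakeBlock::iterator, std::string>> pending;
    for (FakeBlock::iterator inst = block.begin(); inst != block.end(); ++inst) {
        if (*inst == "add") {
            pending.emplace_back(inst, "mul");
        }
    }

    // Step 2: the loop iterator is out of scope, so erasing nodes is now safe.
    for (auto &entry : pending) {
        FakeBlock::iterator next = block.erase(entry.first);  // remove the old instruction
        block.insert(next, entry.second);                     // put the new one in its place
    }
    return static_cast<int>(pending.size());
}
```

Erasing a std::list node only invalidates iterators to that node, so the other saved positions stay valid while the pending edits are applied. The traversal loop itself never touches a node that has been removed, which is the property the walkthrough relies on.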