Program Representations

A Java source program is first compiled to a class file containing bytecode, which is then translated into the compiler's intermediate representation (IR), called Quads. This page familiarizes you with both representations by way of examples.


Java Bytecode

Java bytecode is a rather high-level representation of a Java program. While some information, such as local variable names, is dropped, high-level information such as class layouts and object hierarchies is retained. Java bytecode is stack-oriented: operands are pushed onto the operand stack, and arithmetic operations are applied to the topmost values on the stack. The stack architecture was chosen because programs encoded for it are compact.
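
To make the stack discipline concrete, here is a minimal sketch of how a + 10 is evaluated on an operand stack. The class StackEvalSketch and the method addTen are hypothetical names invented for this illustration; the comments name the bytecode instructions that each step imitates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; not part of the JVM or of the course code.
class StackEvalSketch {
    // Evaluates a + 10 the way a stack machine would: push, push, add.
    static int addTen(int a) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);             // like iload_1: push the local variable a
        stack.push(10);            // like bipush 10: push the constant 10
        int right = stack.pop();   // iadd pops the top two values...
        int left = stack.pop();
        stack.push(left + right);  // ...and pushes their sum
        return stack.pop();        // like istore: pop the result off the stack
    }

    public static void main(String[] args) {
        System.out.println(addTen(5)); // prints 15
    }
}
```

Note that no instruction names its operands explicitly; they are implied by the stack position, which is one reason the encoding is compact.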

One can examine the bytecode of a class by invoking the bytecode disassembler using the command javap -c <classname>. You do not need to know the details of Java bytecode for this class. We include a brief discussion here so that you can understand the process by which Java source code is translated to our own internal compiler representation. If you are interested in finding out more, here are the overviews of the class file format, and the compilation from Java source to bytecode.


The Quad IR

The representation that you will be using for the first two assignments is also a rather high-level IR. Like Java bytecode, it retains source program information such as field accesses and virtual method invocations. This supports the implementation of high-level optimizations such as minimizing the cost of virtual function invocations.

Instead of a stack architecture, however, we will use as our model a machine with an unbounded number of pseudo registers. Pseudo registers hold local variables of a method, as well as temporary variables generated by the compiler to store intermediate results. All data must first be loaded into pseudo registers before they can be operated on. This architecture is more conducive to program optimization than the stack architecture.

All the stack operations in the class files are translated into a series of simple instructions, each accepting up to three input operands and writing to one result variable; hence the IR is called Quads. Instructions are organized in a control flow graph, whose nodes are basic blocks and whose edges represent possible flow of control. The compiler also inserts the run-time checks required by Java semantics. For example, references are checked for null before they are used. These checks appear directly in the Quad representation.
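
The translation from stack operations to quads can be sketched by tracking, for each operand-stack slot, the name of the value it currently holds. The code below is a simplified illustration for a handful of integer instructions, not joeq's actual translator: QuadSketch, Quad, and translate are hypothetical names, and naming each temporary after the stack depth it fills is an assumption made for this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; these classes are not joeq's actual API.
class QuadSketch {
    // One instruction: an operator, a destination, and its source operands.
    static final class Quad {
        final String op;
        final String dest;
        final List<String> sources;

        Quad(String op, String dest, String... sources) {
            this.op = op;
            this.dest = dest;
            this.sources = Arrays.asList(sources);
        }

        @Override
        public String toString() {
            return op + " " + dest + ", " + String.join(", ", sources);
        }
    }

    // Translates a tiny subset of stack bytecode into quads.
    static List<Quad> translate(String... bytecode) {
        Deque<String> stack = new ArrayDeque<>(); // symbolic operand stack
        List<Quad> quads = new ArrayList<>();
        for (String insn : bytecode) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iload":   // no quad needed: just remember the register
                    stack.push("R" + parts[1]);
                    break;
                case "bipush":  // a constant becomes an immediate operand
                    stack.push("IConst: " + parts[1]);
                    break;
                case "iadd":
                case "isub": {
                    String right = stack.pop();
                    String left = stack.pop();
                    // Assumption: the temporary is named after the slot it fills.
                    String temp = "T" + stack.size();
                    String op = parts[0].equals("iadd") ? "ADD_I" : "SUB_I";
                    quads.add(new Quad(op, temp, left, right));
                    stack.push(temp);
                    break;
                }
                case "istore":  // pop the top of stack into a local's register
                    quads.add(new Quad("MOVE_I", "R" + parts[1], stack.pop()));
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return quads;
    }

    public static void main(String[] args) {
        // Prints "ADD_I T0, R1, IConst: 10" and then "MOVE_I R3, T0".
        for (Quad q : translate("iload 1", "bipush 10", "iadd", "istore 3")) {
            System.out.println(q);
        }
    }
}
```

Loads and constant pushes produce no quads of their own in this sketch; they only determine which operands the following arithmetic quad names.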


Examples

Below we show a few simple examples to illustrate how the same program is represented as source, as bytecode, and as quads. You are encouraged to write new Java examples and use the same steps to find out how they are represented as bytecode and, more importantly, as quads. The sources to all the examples can be found in /usr/class/cs243/examples.

Example 1: ExprTest.java

This example illustrates how basic expressions are represented as quads.

Source Program

class ExprTest {
    int test (int a) {
        int b, c, d, e, f;
        c = a + 10;
        f = a + c;
        if (f > 2) {
            f = f - c;
        }
        return (f);
    }
}

Bytecode

We first use javac to compile the Java source to a class file, then run the disassembler over the class file.

elaine6:~/examples> javac ExprTest.java
elaine6:~/examples> javap -c ExprTest
Compiled from ExprTest.java
class ExprTest extends java.lang.Object {
    ExprTest();
    int test(int);
}
 
Method ExprTest()
   0 aload_0
   1 invokespecial #1 <Method java.lang.Object()>
   4 return
 
Method int test(int)
   0 iload_1
   1 bipush 10
   3 iadd
   4 istore_3
   5 iload_1
   6 iload_3
   7 iadd
   8 istore 6
  10 iload 6
  12 iconst_2
  13 if_icmple 22
  16 iload 6
  18 iload_3
  19 isub
  20 istore 6
  22 iload 6
  24 ireturn
elaine6:~/examples>

javap first prints out the names of the methods defined for each class, then the definition of the individual methods. By default, all classes extend java.lang.Object; an appropriate constructor is automatically generated by the compiler if one does not exist.

For each method, javap prints out its signature--for example, test accepts an integer and returns an integer. A frame is created for each invocation. Location 0 holds the this pointer; the parameter and local variables a,b,c,d,e,f are numbered 1 to 6, respectively. Instructions are labeled by their position in the array of bytecodes representing the procedure.

Instructions such as load are prefixed by the type they operate on: a, b, c, d, f, i, l, and s stand for reference, byte, char, double, float, int, long, and short, respectively. (The letters J and Z denote long and boolean in type descriptors such as (I)I, not in instruction names; boolean values are handled with the int and byte instructions.) An instruction's parameter is represented either as a suffix or as an extra operand. iload_1 and iload 6 load local variables 1 and 6 from the frame onto the stack, respectively. The difference is just an optimization in encoding; the former, which covers the more common low-numbered variables, is encoded in one byte and the latter is encoded in two.
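
The instruction labels in the listing follow directly from these encoded lengths. As a rough check, the sketch below recomputes the labels of test from per-instruction sizes. OffsetSketch and its methods are hypothetical names, and the length table covers only the instruction forms that occur in this listing.

```java
// Hypothetical sketch; covers only the instructions used in ExprTest.test.
class OffsetSketch {
    // Encoded length in bytes of one instruction.
    static int length(String insn) {
        String op = insn.split(" ")[0];
        switch (op) {
            case "bipush":     // opcode + one-byte constant
            case "iload":      // opcode + one-byte local variable index
            case "istore":
                return 2;
            case "if_icmple":  // opcode + two-byte branch offset
                return 3;
            default:           // iload_1, istore_3, iadd, ...: opcode only
                return 1;
        }
    }

    // Byte offset of each instruction from the start of the method.
    static int[] offsets(String... code) {
        int[] result = new int[code.length];
        int pc = 0;
        for (int i = 0; i < code.length; i++) {
            result[i] = pc;
            pc += length(code[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] labels = offsets(
            "iload_1", "bipush 10", "iadd", "istore_3", "iload_1", "iload_3",
            "iadd", "istore 6", "iload 6", "iconst_2", "if_icmple 22",
            "iload 6", "iload_3", "isub", "istore 6", "iload 6", "ireturn");
        // Prints [0, 1, 3, 4, 5, 6, 7, 8, 10, 12, 13, 16, 18, 19, 20, 22, 24]
        System.out.println(java.util.Arrays.toString(labels));
    }
}
```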

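iconst pushes an integer constant onto the stack. if_icmple 22 is a conditional branch based on an integer comparison between the top two values on the stack. Namely, if the value pushed first (here f) is less than or equal to the value on top of the stack (here 2), control goes to instruction 22. The compiler thus branches on the negation of the source condition f > 2 in order to skip the body of the if statement.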
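
The operand order of the comparison is easy to get backwards, so here is a small sketch of the branch decision. IfIcmpleSketch and its methods are hypothetical names; the logic follows the JVM rule that if_icmple pops value2 (the top of the stack), then value1, and branches when value1 <= value2.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the branch decision; not JVM source code.
class IfIcmpleSketch {
    // Pops value2 (the top of the stack) and then value1, and reports
    // whether if_icmple would branch, i.e. whether value1 <= value2.
    static boolean branchTaken(Deque<Integer> stack) {
        int value2 = stack.pop();
        int value1 = stack.pop();
        return value1 <= value2;
    }

    // Mirrors instructions 10 to 13 of test: push f, push 2, compare.
    static boolean skipsSubtraction(int f) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(f);  // iload 6
        stack.push(2);  // iconst_2
        return branchTaken(stack);
    }

    public static void main(String[] args) {
        System.out.println(skipsSubtraction(20)); // false: f > 2, so f = f - c runs
        System.out.println(skipsSubtraction(2));  // true: jump to instruction 22
    }
}
```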

Quad Representation

You can print out a textual representation of the quad IR by using the following commands:

elaine6:~/examples> javac PrintQuads.java
elaine6:~/examples> java PrintQuads ExprTest
Class: ExprTest
Method: <init>()V
Control flow graph for ExprTest.<init> ()V:
BB0 (ENTRY)     (in: <none>, out: BB2)
 
BB2     (in: BB0 (ENTRY), out: BB1 (EXIT))
2   NULL_CHECK              T-1 <g>,    R0 ExprTest
1   INVOKESPECIAL_V%                    java.lang.Object.<init> ()V,    (R0 ExprTest)
3   RETURN_V                
 
BB1 (EXIT)      (in: BB2, out: <none>)
 
Exception handlers: []
Register factory: Local: (I=1,F=1,L=1,D=1,A=1) Stack: (I=1,F=1,L=1,D=1,A=1)
Method: test(I)I
Control flow graph for ExprTest.test (I)I:
BB0 (ENTRY)     (in: <none>, out: BB2)
 
BB2     (in: BB0 (ENTRY), out: BB3, BB4)
1   ADD_I                   T0 int,     R1 int, IConst: 10
2   MOVE_I                  R3 int,     T0 int
3   ADD_I                   T0 int,     R1 int, R3 int
4   MOVE_I                  R6 int,     T0 int
5   IFCMP_I                 R6 int,     IConst: 2,      LE,     BB4
 
BB3     (in: BB2, out: BB4)
6   SUB_I                   T0 int,     R6 int, R3 int
7   MOVE_I                  R6 int,     T0 int
 
BB4     (in: BB2, BB3, out: BB1 (EXIT))
8   RETURN_I                R6 int
 
BB1 (EXIT)      (in: BB4, out: <none>)
 
Exception handlers: []
Register factory: Local: (I=7,F=7,L=7,D=7,A=7) Stack: (I=2,F=2,L=2,D=2,A=2)
elaine6:~/examples>

This command invokes a program that loads in classes, then invokes the compiler pass joeq.Compiler.Quad.PrintCFG on each method in the class given.

Here we see that BB0 and BB1 are the entry and exit blocks, respectively. There is a conditional flow of control from BB2 around BB3 arriving at BB4. For quads that produce a value, such as ADD_I and MOVE_I, the first operand is the destination variable.
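
The in and out lists printed for each basic block are two views of the same edges. The sketch below records the out edges of the control flow graph of test and derives the in lists from them. CfgSketch and its methods are hypothetical names for this illustration and are not joeq's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; joeq's own control flow graph classes are not used here.
class CfgSketch {
    // Successor ("out") lists copied from the listing for ExprTest.test.
    static Map<String, List<String>> testCfg() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("BB0", Arrays.asList("BB2"));         // ENTRY
        succ.put("BB2", Arrays.asList("BB3", "BB4"));  // conditional branch
        succ.put("BB3", Arrays.asList("BB4"));
        succ.put("BB4", Arrays.asList("BB1"));
        succ.put("BB1", Collections.<String>emptyList()); // EXIT
        return succ;
    }

    // Derives the predecessor ("in") lists by reversing every edge.
    static Map<String, List<String>> predecessors(Map<String, List<String>> succ) {
        Map<String, List<String>> pred = new LinkedHashMap<>();
        for (String block : succ.keySet()) {
            pred.put(block, new ArrayList<String>());
        }
        for (Map.Entry<String, List<String>> edge : succ.entrySet()) {
            for (String target : edge.getValue()) {
                pred.get(target).add(edge.getKey());
            }
        }
        return pred;
    }

    public static void main(String[] args) {
        // Prints {BB0=[], BB2=[BB0], BB3=[BB2], BB4=[BB2, BB3], BB1=[BB4]}
        System.out.println(predecessors(testCfg()));
    }
}
```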

The this pointer is allocated to R0. The parameters and local variables a,b,c,d,e,f are allocated to pseudo registers R1 to R6, respectively. Intermediate results are stored into temporary registers. For example, the result of R1 + 10 is stored into T0, before it is stored into R3.
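
One way to read the listing is to transcribe it quad by quad into Java, using one variable per pseudo register. The sketch below does this for test and compares the result with the source method. RegisterSketch and its method names are hypothetical; the variable names r1, r3, r6, and t0 stand for the registers R1, R3, R6, and T0.

```java
// Hypothetical transcription of the quad listing; not generated by the compiler.
class RegisterSketch {
    // The source-level method body, copied from ExprTest for comparison.
    static int source(int a) {
        int c = a + 10;
        int f = a + c;
        if (f > 2) {
            f = f - c;
        }
        return f;
    }

    // The same computation written one quad per statement.
    static int quads(int a) {
        int r1 = a;  // parameter a lives in R1
        int r3, r6, t0;
        // BB2
        t0 = r1 + 10;      // 1 ADD_I   T0, R1, IConst: 10
        r3 = t0;           // 2 MOVE_I  R3, T0
        t0 = r1 + r3;      // 3 ADD_I   T0, R1, R3
        r6 = t0;           // 4 MOVE_I  R6, T0
        if (!(r6 <= 2)) {  // 5 IFCMP_I R6, IConst: 2, LE, BB4 (skip BB3 when true)
            // BB3
            t0 = r6 - r3;  // 6 SUB_I   T0, R6, R3
            r6 = t0;       // 7 MOVE_I  R6, T0
        }
        // BB4
        return r6;         // 8 RETURN_I R6
    }

    public static void main(String[] args) {
        System.out.println(quads(5));   // prints 5
        System.out.println(quads(-10)); // prints -10
    }
}
```

The temporary T0 is reused for every intermediate result, while R3 and R6 keep the values of the source variables c and f.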

The IFCMP_I instruction is similar to the if_icmple instruction, except that the comparison operation is one of the parameters and the target is basic block BB4. The type of each operation is attached to the operation as a suffix.

The initialization routine includes an INVOKESPECIAL_V% operation. INVOKESPECIAL invokes an instance method which requires special handling, such as an instance initialization method, a private method, or a superclass method. The suffix _V indicates that the function invoked returns void, and the % symbol indicates that the invoked function may need to be loaded dynamically. java.lang.Object.<init> ()V says to invoke the initialization function in java.lang.Object, the superclass of ExprTest. The signature of the method, ()V, says that it takes no explicit argument and returns void. The call passes the this pointer in R0, which is an instance of the class ExprTest.
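
Signatures such as ()V and (I)I use the JVM's method descriptor notation: parameter types in parentheses followed by the return type. The sketch below decodes such a descriptor into a readable form. DescriptorSketch and its methods are hypothetical names, and the output format is an arbitrary choice for this illustration.

```java
// Hypothetical helper for reading JVM method descriptors such as "(I)I".
class DescriptorSketch {
    // Renders a descriptor as "returnType (paramType, ...)".
    static String describe(String descriptor) {
        int close = descriptor.indexOf(')');
        int[] pos = {1};  // skip the opening parenthesis
        StringBuilder params = new StringBuilder();
        while (pos[0] < close) {
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(readType(descriptor, pos));
        }
        pos[0] = close + 1;  // the return type follows the closing parenthesis
        return readType(descriptor, pos) + " (" + params + ")";
    }

    // Reads one type starting at pos[0] and advances pos[0] past it.
    static String readType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'V': return "void";
            case 'I': return "int";
            case 'J': return "long";
            case 'F': return "float";
            case 'D': return "double";
            case 'Z': return "boolean";
            case 'B': return "byte";
            case 'C': return "char";
            case 'S': return "short";
            case '[': return readType(d, pos) + "[]";  // array of the next type
            case 'L': {                                // class name up to ';'
                int semi = d.indexOf(';', pos[0]);
                String name = d.substring(pos[0], semi).replace('/', '.');
                pos[0] = semi + 1;
                return name;
            }
            default:
                throw new IllegalArgumentException("bad descriptor: " + d);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("()V"));  // prints void ()
        System.out.println(describe("(I)I")); // prints int (int)
    }
}
```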


Example 2: ArrayTest.java

Here is another example to illustrate how fields and arrays are handled.

Source Program

class ArrayTest {
    int A[];
    ArrayTest() {
        A = new int[10];
    }
    int access (int i) {
        return (A[i]);
    }
}

Quads

Control flow graph for ArrayTest.access (I)I:
BB0 (ENTRY)     (in: <none>, out: BB2)
 
BB2     (in: BB0 (ENTRY), out: BB1 (EXIT))
1   NULL_CHECK              T-1 <g>,    R0 ArrayTest
2   GETFIELD_A              T0 int[],   R0 ArrayTest,   .A,     T-1 
3   NULL_CHECK              T-1 <g>,    T0 int[]
4   BOUNDS_CHECK            T0 int[],   R1 int, T-1 
5   ALOAD_I                 T0 int,     T0 int[],       R1 int, T-1 
6   RETURN_I                T0 int
 
BB1 (EXIT)      (in: BB2, out: <none>)

The first NULL_CHECK verifies that the this pointer is not null. T-1, read "T minus one", is a fake location referenced by the subsequent operation (GETFIELD_A) that uses the checked pointer. This fake dependence between the definition and the use of T-1 prevents the instruction scheduler from reordering NULL_CHECK and GETFIELD_A. The GETFIELD_A operation loads the A field of the instance, which is a reference to an array, into the temporary variable T0. The null and bounds checks on the array are then performed, and the ALOAD_I instruction loads the indexed array element, of type int, into a register. NEWARRAY, which does not appear in this listing, is a special instruction that creates a new array of a given size; it would be used for the new int[10] in the constructor.
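
The checks that the quads make explicit are implicit in the Java expression A[i]. The sketch below spells them out by hand, one statement per quad. CheckSketch is a hypothetical class written for this illustration, with the this pointer passed as an ordinary parameter so that its null check can be shown.

```java
// Hypothetical sketch; it writes out by hand what the quads check explicitly.
class CheckSketch {
    int[] A = new int[10];

    // Equivalent of ArrayTest.access, with self playing the role of R0.
    static int access(CheckSketch self, int i) {
        if (self == null) {                // 1 NULL_CHECK on R0
            throw new NullPointerException();
        }
        int[] t0 = self.A;                 // 2 GETFIELD_A
        if (t0 == null) {                  // 3 NULL_CHECK on the array reference
            throw new NullPointerException();
        }
        if (i < 0 || i >= t0.length) {     // 4 BOUNDS_CHECK
            throw new ArrayIndexOutOfBoundsException(i);
        }
        return t0[i];                      // 5 ALOAD_I, then 6 RETURN_I
    }

    public static void main(String[] args) {
        System.out.println(access(new CheckSketch(), 3)); // prints 0
    }
}
```

Making the checks separate instructions lets an optimizer reason about them individually, for example by removing a check that is provably redundant.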
