A Java source program is first
converted to a class file
containing bytecode, which is then translated into the compiler's intermediate
representation (IR), called Quads. This page familiarizes
you with these program representations by way of examples.
The Java bytecode is a rather
high-level representation of a Java program. While some information, like local
variable names, is dropped, high-level information such as class layouts and
object hierarchies is retained. Java bytecode is stack-oriented: operands are
pushed on the operand stack and arithmetic operations are applied to the top
values on the stack. The stack architecture was chosen because it yields
compact programs.
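To make the stack discipline concrete, here is a minimal sketch that mimics how a stack machine would evaluate c = a + 10. The class name OperandStackDemo, the int array standing in for the local variables, and the slot numbers are illustrative assumptions; a real JVM is not implemented this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackDemo {
    // Evaluates c = a + 10 the way a stack machine would.
    static int addTen(int a) {
        int[] locals = new int[4];          // hypothetical frame: slot 1 holds a, slot 3 holds c
        locals[1] = a;
        Deque<Integer> stack = new ArrayDeque<>();

        stack.push(locals[1]);              // iload_1: push local variable 1
        stack.push(10);                     // bipush 10: push the constant 10
        int right = stack.pop();            // iadd: pop two values...
        int left = stack.pop();
        stack.push(left + right);           // ...and push their sum
        locals[3] = stack.pop();            // istore_3: pop the result into local variable 3
        return locals[3];
    }

    public static void main(String[] args) {
        System.out.println(addTen(5));      // prints 15
    }
}
```

Note that no instruction names its operands explicitly; they are implied by the stack, which is what keeps the encoding compact.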
One can examine the bytecode of a
class by invoking the bytecode disassembler using the command javap -c
<classname>. You
do not need to know the details of Java bytecode for this class. We include a
brief discussion here so that you can understand the process by which Java
source code is translated to our own internal compiler representation. If you
are interested in finding out more, see the overviews of the
class file format and of the compilation
from Java source to bytecode.
The representation that you will
be using for the first two assignments is also a rather high-level IR. Like
Java bytecode, it retains source program information such as field accesses and
virtual method invocations. This supports the implementation of high-level
optimizations such as minimizing the cost of virtual function invocations.
Instead of a stack architecture,
however, we will use as our model a machine with an unbounded number of pseudo
registers. Pseudo registers hold local variables of a method, as well as
temporary variables generated by the compiler to store intermediate results. All
data must be loaded into pseudo registers before it can be operated on.
This architecture is more conducive to program optimization than the stack
architecture.
All the stack operations in the
class files are translated into a series of simple instructions, each accepting
up to three input operands and writing to one result variable. Hence, the IR is
called Quads. Instructions are organized in a control flow graph,
where nodes are the basic blocks and edges represent the possible flow of control.
The compiler also inserts the run-time checks required by
Java semantics. For example, references are checked for null before they are used. These
checks are inserted into the Quad representation directly.
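One way to picture the translation from stack operations to register-based instructions is to execute the bytecode symbolically: the operand stack holds register names instead of values, and each arithmetic or store instruction emits one quad. The sketch below is a simplified illustration, not the compiler's actual algorithm. The class StackToQuads, its method names, and the choice to name each temporary after the stack slot it occupies are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class StackToQuads {
    private final Deque<String> stack = new ArrayDeque<>(); // symbolic operand stack
    private final List<String> quads = new ArrayList<>();   // emitted instructions

    void iload(int n)  { stack.push("R" + n); }             // local variable n lives in Rn
    void bipush(int c) { stack.push("IConst: " + c); }      // constants stay symbolic

    void iadd() {
        String right = stack.pop();
        String left = stack.pop();
        String temp = "T" + stack.size();   // assumption: temporary named after its stack slot
        quads.add("ADD_I " + temp + ", " + left + ", " + right);
        stack.push(temp);
    }

    void istore(int n) {
        quads.add("MOVE_I R" + n + ", " + stack.pop());
    }

    List<String> quads() { return quads; }

    public static void main(String[] args) {
        StackToQuads t = new StackToQuads();
        t.iload(1); t.bipush(10); t.iadd(); t.istore(3);    // c = a + 10
        t.iload(1); t.iload(3);   t.iadd(); t.istore(6);    // f = a + c
        t.quads().forEach(System.out::println);
    }
}
```

Each emitted line has a destination followed by its inputs, so the pushes and pops disappear and the data flow between instructions becomes explicit.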
Below we show a few simple
examples to illustrate how the same program is represented as source,
bytecode, and quads. You are encouraged to write new Java
examples and use the same steps to find out how they are represented as
bytecode and, more importantly, as quads. The sources to all the examples can be
found in /usr/class/cs243/examples.
This example illustrates how
basic expressions are represented as quads.
    class ExprTest {
        int test (int a) {
            int b, c, d, e, f;
            c = a + 10;
            f = a + c;
            if (f > 2) {
                f = f - c;
            }
            return (f);
        }
    }
We first use javac to compile the Java source to a class file, then run
the disassembler over the class file.
    elaine6:~/examples> javac ExprTest.java
    elaine6:~/examples> javap -c ExprTest
    Compiled from ExprTest.java
    class ExprTest extends java.lang.Object {
        ExprTest();
        int test(int);
    }

    Method ExprTest()
       0 aload_0
       1 invokespecial #1 <Method java.lang.Object()>
       4 return

    Method int test(int)
       0 iload_1
       1 bipush 10
       3 iadd
       4 istore_3
       5 iload_1
       6 iload_3
       7 iadd
       8 istore 6
      10 iload 6
      12 iconst_2
      13 if_icmple 22
      16 iload 6
      18 iload_3
      19 isub
      20 istore 6
      22 iload 6
      24 ireturn
    elaine6:~/examples>
javap first prints out the names of the methods
defined for each class, then the definition of the individual methods. By
default, all classes extend java.lang.Object; an appropriate constructor is automatically
generated by the compiler if one does not exist.
For each method, javap prints out its signature; for example, test accepts an integer and returns an
integer. A frame is created for each invocation. Location 0 holds the this pointer; the parameter and local
variables a, b, c, d, e, f are
numbered 1 to 6, respectively. Instructions are labeled by their byte offset in
the array of bytecodes representing the procedure.
Instructions such as load are prefixed by the type they operate on: a, b, c, d, f, i, l, and s represent reference, byte, character, double,
float, integer, long, and short, respectively. (J and Z denote long and boolean only in type descriptors; boolean values are handled by the integer instructions.) An instruction's parameter
is either encoded as a suffix or given as an extra operand. iload_1 and iload 6 load local variables 1 and 6 from the
frame onto the stack, respectively. The difference is just an optimization in
encoding; the former, which is more common, is encoded in one byte and the
latter is encoded in two.
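The encoding difference can be sketched as a small encoder. The opcode values are those given in the JVM specification (iload is 0x15; iload_0 through iload_3 are 0x1a through 0x1d). The class IloadEncoding and the method encodeIload are hypothetical names introduced for illustration, and the sketch does not cover the wide form needed for indices above 255.

```java
final class IloadEncoding {
    static final int ILOAD = 0x15;    // general form, followed by a one-byte index
    static final int ILOAD_0 = 0x1a;  // iload_0 .. iload_3 occupy 0x1a .. 0x1d

    // Returns the bytes that encode "load int local variable <index>".
    static byte[] encodeIload(int index) {
        if (index < 0 || index > 255) {
            throw new IllegalArgumentException("index not supported by this sketch: " + index);
        }
        if (index <= 3) {
            return new byte[] { (byte) (ILOAD_0 + index) };   // index folded into the opcode
        }
        return new byte[] { (byte) ILOAD, (byte) index };     // opcode plus explicit operand
    }

    public static void main(String[] args) {
        System.out.println(encodeIload(1).length);  // prints 1
        System.out.println(encodeIload(6).length);  // prints 2
    }
}
```

This is also why the offsets in the listing advance unevenly: iload_1 at offset 0 occupies one byte, while iload 6 at offset 10 occupies two.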
iconst_2 pushes the integer constant 2 on
the stack. if_icmple 22 is a
conditional branch based on an integer comparison between the two operands on top of the
stack. Namely, if the second operand from the top (here f) is less than or equal to the operand
on top of the stack (here 2), then go to instruction 22.
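Because the operand order of a comparison on a stack is easy to get backwards, the following sketch spells it out. It follows the JVM specification's naming, in which value1 is pushed first and value2 ends up on top, and the branch is taken when value1 <= value2. The class IfIcmpleDemo and the explicit fallThrough parameter are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IfIcmpleDemo {
    // Returns the offset of the next instruction after executing "if_icmple target".
    static int ifIcmple(Deque<Integer> stack, int target, int fallThrough) {
        int value2 = stack.pop();   // top of stack, pushed last
        int value1 = stack.pop();   // pushed first
        return value1 <= value2 ? target : fallThrough;
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(25);             // iload 6: f
        stack.push(2);              // iconst_2
        // f > 2, so the branch is not taken and execution continues at offset 16.
        System.out.println(ifIcmple(stack, 22, 16));   // prints 16
    }
}
```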
You can print out a textual representation
of the quad IR by using the following commands:
    elaine6:~/examples> javac PrintQuads
    elaine6:~/examples> java PrintQuads ExprTest
    Class: ExprTest
    Method: <init>()V
    Control flow graph for ExprTest.<init> ()V:
    BB0 (ENTRY) (in: <none>, out: BB2)

    BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
    2   NULL_CHECK         T-1 <g>, R0 ExprTest
    1   INVOKESPECIAL_V%   java.lang.Object.<init> ()V, (R0 ExprTest)
    3   RETURN_V

    BB1 (EXIT) (in: BB2, out: <none>)

    Exception handlers: []
    Register factory: Local: (I=1,F=1,L=1,D=1,A=1) Stack: (I=1,F=1,L=1,D=1,A=1)
    Method: test(I)I
    Control flow graph for ExprTest.test (I)I:
    BB0 (ENTRY) (in: <none>, out: BB2)

    BB2 (in: BB0 (ENTRY), out: BB3, BB4)
    1   ADD_I      T0 int, R1 int, IConst: 10
    2   MOVE_I     R3 int, T0 int
    3   ADD_I      T0 int, R1 int, R3 int
    4   MOVE_I     R6 int, T0 int
    5   IFCMP_I    R6 int, IConst: 2, LE, BB4

    BB3 (in: BB2, out: BB4)
    6   SUB_I      T0 int, R6 int, R3 int
    7   MOVE_I     R6 int, T0 int

    BB4 (in: BB2, BB3, out: BB1 (EXIT))
    8   RETURN_I   R6 int

    BB1 (EXIT) (in: BB4, out: <none>)

    Exception handlers: []
    Register factory: Local: (I=7,F=7,L=7,D=7,A=7) Stack: (I=2,F=2,L=2,D=2,A=2)
    elaine6:~/examples>
This command invokes a program that loads
in classes, then invokes the compiler pass joeq.Compiler.Quad.PrintCFG on each method in the class given.
Here we see that BB0 and BB1 are the entry and exit blocks, respectively.
There is a conditional flow of control from BB2 around BB3 arriving at BB4. The first operand of each quad is the
destination variable.
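The in and out lists printed for each basic block are two views of the same edges. As a rough sketch, the control flow graph of test can be written down as a successor map, from which the predecessor (in) lists are derived. The class CfgSketch and its methods are hypothetical and unrelated to the compiler's own classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CfgSketch {
    // Successor ("out") lists for ExprTest.test, copied from the printed graph.
    static Map<String, List<String>> testMethodCfg() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("BB0", List.of("BB2"));          // ENTRY
        succ.put("BB2", List.of("BB3", "BB4"));   // conditional branch: fall through or skip
        succ.put("BB3", List.of("BB4"));
        succ.put("BB4", List.of("BB1"));
        succ.put("BB1", List.of());               // EXIT
        return succ;
    }

    // Derives the predecessor ("in") lists by reversing every edge.
    static Map<String, List<String>> predecessors(Map<String, List<String>> succ) {
        Map<String, List<String>> pred = new LinkedHashMap<>();
        for (String block : succ.keySet()) {
            pred.put(block, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> edge : succ.entrySet()) {
            for (String target : edge.getValue()) {
                pred.get(target).add(edge.getKey());
            }
        }
        return pred;
    }

    public static void main(String[] args) {
        System.out.println(predecessors(testMethodCfg()).get("BB4"));   // prints [BB2, BB3]
    }
}
```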
The this pointer is allocated to R0. The parameters and local variables a,b,c,d,e,f are allocated to pseudo registers R1 to R6, respectively. Intermediate results are stored
into temporary registers. For example, the result of R1 + 10 is stored into T0, before it is stored into R3.
The IFCMP_I instruction is similar to the if_icmple instruction, except that the comparison
operation is one of the parameters and the target is basic block BB4. The operand type is attached
to the operation as a suffix. The initialization routine includes an INVOKESPECIAL_V% operation. INVOKESPECIAL invokes an instance method which requires
special handling, such as an instance initialization method, a private method,
or a superclass method. The suffix _V indicates that the function invoked returns void, and the % symbol indicates that the invoked
function may need to be loaded dynamically. java.lang.Object.<init> ()V says to invoke the initialization
function in java.lang.Object,
the superclass of ExprTest. The signature of the method says that it takes no explicit
argument and returns void. The call passes the this pointer in R0, which is an
instance of the class ExprTest.
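Signature strings such as ()V and (I)I are JVM method descriptors: the parameter types appear inside the parentheses and the return type follows. If you want to decode one, the standard class java.lang.invoke.MethodType can parse it. The sketch below is only a convenience for experimenting; the class DescriptorDemo and the method describe are hypothetical names.

```java
import java.lang.invoke.MethodType;

final class DescriptorDemo {
    // Summarizes a method descriptor as "<parameter count> parameter(s), returns <type>".
    static String describe(String descriptor) {
        // A null class loader tells MethodType to use the system class loader.
        MethodType type = MethodType.fromMethodDescriptorString(descriptor, null);
        return type.parameterCount() + " parameter(s), returns " + type.returnType().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe("()V"));    // prints 0 parameter(s), returns void
        System.out.println(describe("(I)I"));   // prints 1 parameter(s), returns int
    }
}
```

The this pointer does not appear in the descriptor, which is why the text above speaks of no explicit argument.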
Here is another example to illustrate how
fields and arrays are handled.
    class ArrayTest {
        int A[];
        ArrayTest() {
            A = new int[10];
        }
        int access (int i) {
            return (A[i]);
        }
    }
    Control flow graph for ArrayTest.access (I)I:
    BB0 (ENTRY) (in: <none>, out: BB2)

    BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
    1   NULL_CHECK     T-1 <g>, R0 ArrayTest
    2   GETFIELD_A     T0 int[], R0 ArrayTest, .A, T-1
    3   NULL_CHECK     T-1 <g>, T0 int[]
    4   BOUNDS_CHECK   T0 int[], R1 int, T-1
    5   ALOAD_I        T0 int, T0 int[], R1 int, T-1
    6   RETURN_I       T0 int

    BB1 (EXIT) (in: BB2, out: <none>)
The first NULL_CHECK verifies that the this pointer is not null. T-1, read T minus one, is a fake location referenced
by the subsequent operation (GETFIELD_A) that uses the checked pointer. This fake dependence between the
definition and the use of T-1 prevents the instruction scheduler from inverting the order of NULL_CHECK and GETFIELD_A. The GETFIELD_A operation stores the A field of the
instance, which is a reference to an array, into the temporary variable T0. The null and bounds checks on the array are
then performed. The ALOAD_I instruction loads into a register an indexed array location of type int. NEWARRAY, which does not appear in the listing above, is a special instruction that creates a
new array of a given size, as needed for new int[10] in the constructor.
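The effect of the checks in access can be spelled out in ordinary Java. The sketch below writes each check as an explicit test in the same order as the quads. It is only an illustration of what the quads mean, not code the compiler generates; the class CheckedAccess and the nested stand-in for ArrayTest are assumptions made for this example.

```java
final class CheckedAccess {
    // Stand-in for the ArrayTest class from the example above.
    static final class ArrayTest {
        int[] A = new int[10];
    }

    static int access(ArrayTest self, int i) {
        if (self == null) {                              // 1 NULL_CHECK on R0
            throw new NullPointerException("this is null");
        }
        int[] array = self.A;                            // 2 GETFIELD_A into T0
        if (array == null) {                             // 3 NULL_CHECK on T0
            throw new NullPointerException("A is null");
        }
        if (i < 0 || i >= array.length) {                // 4 BOUNDS_CHECK on T0, R1
            throw new ArrayIndexOutOfBoundsException(i);
        }
        return array[i];                                 // 5 ALOAD_I, 6 RETURN_I
    }

    public static void main(String[] args) {
        System.out.println(access(new ArrayTest(), 3));  // prints 0
    }
}
```

Making the checks explicit instructions means an optimizer can reason about them like any other quad, for instance when deciding whether a check is redundant.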