In this homework, you will interpret the results of and extend some interprocedural program analyses using the BDDBDDB framework.
You may work in groups of one or two for this homework.
This assignment consists of exercises and problems. Exercises are intended to help you learn BDDBDDB and you do not need to submit anything for the exercises. For the problems, you need to submit the results. Details of what should be submitted is explained in the problem description.
/usr/class/cs243/bddbddb/hw6.tar.gzFor this project we recommend using the CS myth machines.
Change into the hw6/ directory and type make. This should
compile AddressBook.java, which is the program that we will be
analyzing throughout this homework. Take a few minutes to familiarize
yourself with the contents of AddressBook.java. If you read
carefully, you will notice that it contains a few bugs, but that is what
we are going to try to detect by writing some simple analyses in
Datalog!
Execute: python run_bddbddb.py AddressBook pacs.dtl
pacs.dtl) on the Java program AddressBook, which you've just
compiled. There may be some warning messages printed to the
terminal. This is normal, so don't worry about it. As long as it
prints Done at the end, then it probably ran successfully.If the analysis ran successfully, there will now be a results/
sub-directory containing various .bdd, .map, .tuples and
other files. (Don't worry about what these files mean for now.)
Execute: python print_bddbddb_results.py AddressBook
results/hP.tuples
hP points-to
relations from the current run in a (reasonably) human-readable text
format. You'll soon find out what hP means, so don't worry about it
for now. (Actually, it doesn't print out all of the relations; it
only prints the ones relevant to AddressBook. There are many more
relations generated by Java Standard Library code that this tiny program
imports.)If the last step successfully outputs lines of text to your terminal, then you are all set and ready to begin this homework! If you've gotten stuck or can't get any output, please post your questions to the newsgroup.
BDDBDDB is an implementation of the Datalog logic programming language that allows programmers to specify program analyses in a compact declarative manner.
Let's briefly run through how a Java program proceeds through an analysis using BDDBDDB when you execute this script:
python run_bddbddb.py <main class name> <Datalog filename>
The Joeq front-end (a program called joeq.Main.GenRelations)
extracts components (i.e., variables, types) from the .class bytecode
file supplied on the command line and encodes them into a BDD (binary
decision diagram) representation, stored in the results/*.bdd files.
The solver executes the specified Datalog file on the BDD
representation and outputs tuples files in results/*.tuples. If you
view those files, they are simply tuples of numbers. The
results/*.map files map between these numbers and human-readable
source-level names.
Then when you execute this script:
python print_bddbddb_results.py <main class name> <tuples filename>
it will print out a set of tuples to the command-line.
We encourage you to take your own Java programs and experiment with
running analyses on them. The simplest way to get started is to compile
your programs in the same directory as our scripts. Let us know if you
have problems setting up the infrastructure. Note that BDDBDDB is
sensitive to Java memory limits and if you are getting OutOfMemory
errors with larger test programs you should edit run_bddbddb.py to increase
these limits (the -mx command line argument).
Let's use pacs.dtl (context-sensitive pointer analysis) as an example.
We will step through parts of the file in turn:
First, the header includes various implementation-specific declarations,
such as what directory to output the results of the analysis (e.g.,
results/) and what file to use to refer to the domains of interest
(e.g., results/fielddomains.pa). We've annotated that file with
plenty of comments, so hopefully those will be useful:
.basedir "results" .include "fielddomains.pa" # Here are the abbreviations for domains used in the relations: # - V: variables # - H: heap objects, named by invocation sites of object creation methods # - T: types # - F: fields # - I: invocation sites # - Z: numbers for parameters # - N: names # - M: methods # - C: classes
The next section declares all of the relations:
###### Relations # All input relations are extracted by Joeq front-end # variable 'v' points to heap object 'h' vP0 (v:V0, h:H0) input # ha.field may point to hb hP0 (ha:H0, field:F0, hb:H1) input # statement: dest = source (includes passing of arguments and return values) A (dest:V0, source:V1) input # statement: base.field = src S (base:V0, field:F0, src:V1) input # statement: dest = base.field L (base:V0, field:F0, dest:V1) input # actual parameters for invocation sites: # 'actualparam' is passed as parameter number 'num' at 'invoke' actual (invoke:I0, num:Z0, actualparam:V1) input # formal parameters for methods: # the formal parameter number 'num' of 'method' is represented as 'formalparam' formal (method:M0, num:Z0, formalparam:V0) input # Method 'method' contains an invocation site 'invoke' with virtual # method name 'name' mI (method:M0, invoke:I0, name:N0) input # Statically-bound method invocation edges: # invocation site 'invoke' calls method 'target' IE0 (invoke:I0, target:M0) input # variable 'var' has type 'type' vT (var:V0, type:T0) input # heap object 'heap' has type 'type' hT (heap:H0, type:T1) input # type T1 is assignable to type T0 (based on sub-classing relationship) aT (type:T0, type:T1) input # It is type-safe to assign heap object 'h' to variable 'v' vPfilter (v:V0, h:H0) # virtual method dispatch info: # method 'method' is the target of dispatching method name 'name' on type 'type' cha (type:T1, name:N0, method:M0) input # 'v' is a local variable in method 'm' mV(m : M0, v : V0) input # 'v' is the return value of method 'method' Mret (method:M0, v:V1) input # 'v' is return value of invocation site 'invoke' Iret (invoke:I0, v:V0) input IEnum (invoke:I0, target:M0, ccaller:VC1, ccallee:VC0) input #Mthr (method:M0, v:V1) input #Ithr (invoke:I0, v:V0) input #IE1 (invoke:I0, target:M0) # Context-sensitive output relations (cannot print as outputtuples # because there are astronomical numbers of contexts) IEcs (ccaller:VC0, invoke:I0, ccallee:VC1, target:M0) output cA (vcdest:VC0, dest:V0, vcsrc:VC1, source:V1) output cvP (vc:VC0, v:V0, h:H0) output # These relations are output as tuples that can be # printed by print_bddbddb_results.py # field 'ha.field' of heap object 'ha' may point to heap object 'hb' hP (ha:H0, field:F0, hb:H1) outputtuples # variable 'v' may point to heap object 'h' vP (v:V0, h:H0) outputtuples # invocation site 'invoke' may call method 'target' IE (invoke:I0, target:M0) outputtuples ### Insert your new relations below here:
Each relation contains one or more terms, each of which may take on
values from a particular domain. input (extensional database)
relations are constructed from the BDD representation that the Joeq
front-end builds from the source program. output and outputtuples
(intensional database) relations are inferred by running the Datalog
program. Any relations marked as outputtuples will have their tuples
dumped to results/*.tuples files, which can later be printed in
human-readable format.
Each relation is like a 'declaration' in an imperative programming
language. For example, the hP relation which you will be mainly
dealing with in this assignment is shown below:
# field 'ha.field' of heap object 'ha' may point to heap object 'hb' hP (ha:H0, field:F0, hb:H1) outputtuples
The name of the relation is listed, along with its terms: ha, field,
and hb. Each term can take on values drawn from a domain. Here, the
H stands for the domain of heap-allocated objects, and the F stands
for the domain of field names (ignore the numbers like H0 and H1 for
now). Because outputtuples is specified, when the analysis is run,
the tuples will be written to a file called results/hP.tuples. Recall
that to print out the hP relations, the command we used was:
python print_bddbddb_results.py AddressBook results/hP.tuples
Now you know where that tuples file comes from.
The meaning of each relation is documented briefly in the comments above each line. If necessary, please refer to Chapter 3 of John Whaley's Ph.D. thesis (the creator of BDDBDDB) for more detailed definitions.
The final section consists of the rules which tell Datalog how to infer new relations from the input relations:
###### Rules cvP(_,v,h) :- vP0(v,h). cA(_,v1,_,v2) :- A(v1,v2). IEcs(vc2,i,vc1,m) :- IE0(i,m), IEnum(i,m,vc2,vc1). vPfilter(v,h) :- vT(v,tv), aT(tv,th), hT(h,th). cA(vc1,v1,vc2,v2) :- formal(m,z,v1), IEcs(vc2,i,vc1,m), actual(i,z,v2). cA(vc2,v2,vc1,v1) :- Mret(m,v1), IEcs(vc2,i,vc1,m), Iret(i,v2). cvP(vc1,v1,h) :- cA(vc1,v1,vc2,v2), cvP(vc2,v2,h), vPfilter(v1,h). hP(h1,f,h2) :- S(v1,f,v2), cvP(vc1,v1,h1), cvP(vc1,v2,h2). cvP(vc1,v2,h2) :- L(v1,f,v2), cvP(vc1,v1,h1), hP(h1,f,h2), vPfilter(v2,h2). split vP(v, h) :- cvP(_, v, h). IE(i, m) :- IEcs (_,i,_,m). ### Insert your new rules below here:
The relation on the lefthand side of each rule is true if all
relations on the righthand side are true (the two sides are separated
by the :- symbol). Relations on the righthand side are separated by
commas, and each line must end with a period. The special _ symbol
represents don't care, so any term can be substituted in its place.
For example, this rule:
cvP(_,v,h) :- vP0(v,h).
means that in all possible contexts, cvP is true for a given v and
h if vP0(v,h) is true.
Let's now study the output from running: python print_bddbddb_results.py AddressBook results/hP.tuples
The hP relation specifies that a particular field in a particular
heap-allocated object may point to another heap-allocated object.
Let's look at one relation in particular, which you should find in your
output:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
The || string separates each line into components. There are three
components in this particular relation:
AddressBook.java:63 Concrete: MaleRecord @ 6Record/nameAddressBook.java:37 Concrete: String @ 1The first represents a heap-allocated object, in this case a
MaleRecord object allocated in line 63 of AddressBook.java (look at
the corresponding place in the source code to convince yourself of what
line performed the allocation).
The second represents the name field of the Record class.
The third represents a heap-allocated object, in this case a
String object allocated at line 37 ("Joe").
What this line says is: The field name of the MaleRecord object
allocated on line 63 of AddressBook.java can point to the String
"Joe" allocated on line 37. Go ahead and inspect the code manually to
make sure that this relation makes sense to you.
For some points-to relations the first component is the value global(null).
This is used for static fields.
Copy the output of running python print_bddbddb_results.py AddressBook
results/hP.tuples into a text file named exercise1.txt. There
should be 26 total lines, each representing one hP relation. For
every line or group of lines, explain what the points-to relation
intuitively means in terms of what someone inspecting the source code
can understand. For example, for the line:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
an appropriate answer would be:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:37 Concrete: String @ 1
- Line 63 ('Record joeRecord = new MaleRecord(joe, joeEmail);') sets
joeRecord.name to point to the String "Joe" allocated on line 37.
In this exercise, we want you to build up some intuition about what the results of a pointer analysis means.
There is a bug in AddressBook.java:main() where it forgets to add some
Record object into the address book. Which Record object is it, and more importantly,
by looking at the output of the pointer analysis that you created in
Exercise 1, what relations in particular could you consult in order to
corroborate your answer? In other words, if the program had added that
record into the address book, how would the hP relations look
different? (You can try to fix the bug yourself, re-compile, and re-run
the analysis to see for sure.)
(Note: Apologies if this problem seems like a contrived example, which it is, but the principle is that for real programs of considerable size, it's infeasible to answer questions like this one by inspecting the source code. However, the pointer analysis we use can scale up to hundreds of thousands of lines of code.)
The goal of a pointer analysis is to answer questions like: For an object X, what objects can the fields of X possibly point to? The goal of an alias analysis is to answer questions like: For objects X and Y with fields fX and fY, respectively, can X.fX and Y.fY possibly point to the same object Z?
By doing Exercises 1 and 2, you will likely notice that the alias analysis results can be extracted from the pointer analysis results. In this problem, you will write a simple alias analysis in Datalog atop of the existing pointer analysis.
First, copy pacs.dtl into a new file called alias.dtl
Next, add a new relation in alias.dtl called hALIAS with the following
declaration:
### Insert your new relations below here: # Can h1.field1 and h2.field2 point to the same object? hALIAS (h1:H0, field1:F0, h2:H1, field2:F1) outputtuples
Add a new rule (or multiple rules if you'd like) such that the hALIAS relation returns all of the pairs h1.field1 and h2.field2 which can point to the same object.
Then you can test your new analysis by running it:
python run_bddbddb.py AddressBook alias.dtl
and printing out the hALIAS relations to your terminal:
python print_bddbddb_results.py AddressBook results/hALIAS.tuples
When you are satisfied that your analysis has met the specs, copy your
alias.dtl file into the submit/ sub-directory. Also, copy the
results of printing out the hALIAS relations into a file called
submit/problem1.txt. At the top of that file, write a paragraph
describing what parts of this output you found to be useful and what
parts you found to be not useful (i.e., noisy output). Don't stress
about details on this part; we just want you to build up some intuition
about interpreting the results of your own analysis.
To sanity-check your results, your analysis should be able to report
that joeRecord.name and joeAltRecord.name are aliased.
Now you will refine the alias analysis you created in Problem 1 so that
it produces less output (actually, so that it produces more relevant
output for one specific debugging task). You want to be able to use
your refined analysis to detect the following bug in AddressBook.java:
66. Record maryRecord = new FemaleRecord(mary, maryEmail); 67. Record janeRecord = new FemaleRecord(jane, maryEmail);
janeRecord should have been constructed with janeEmail rather than
maryEmail (this might be due to a copy-and-paste error, for instance).
Due to this bug, maryRecord.emailAddr and janeRecord.emailAddr are
aliased, when in fact they should not be. Copy alias.dtl into
alias_refined.dtl and modify the rules in alias_refined.dtl such
that aliasing relationships are only reported when:
For instance, we don't care that maryRecord.emailAddr aliases
maryRecord.emailAddr (because they belong to the same heap object, and
in fact, they are the same field), nor that the static String field
joe refers to the same string object as joeRecord.name (because the
base objects have different types).
The expected hALIAS result from running alias_refined.dtl should be:
AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name || AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name AddressBook.java:69 Concrete: MaleRecord @ 34 || Record/name || AddressBook.java:63 Concrete: MaleRecord @ 6 || Record/name AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr || AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr AddressBook.java:67 Concrete: FemaleRecord @ 27 || Record/emailAddr || AddressBook.java:66 Concrete: FemaleRecord @ 20 || Record/emailAddr
which says that joeRecord.name and joeAltRecord.name are aliased and
that maryRecord.emailAddr and janeRecord.emailAddr are aliased.
There should be no other output.
Hints: != is the inequality operator in Datalog. hT is the relation
that maps a heap object to its type.
Remember to copy alias_refined.dtl into submit/ when you are done.
The pointer analysis we've been studying thus far is field sensitive, which means that it makes a distinction between different fields within an object. Field sensitivity is easy to achieve in Java because it is type-safe and pointer arithmetic is not allowed. In contrast, pointer analyses for C are usually field insensitive, due to the difficulties of distinguishing between field accesses in a sound manner (since pointer arithmetic can be used to access arbitrary fields in a type-unsafe way).
Your task is to modify the rules in pacs.dtl to simulate a field
insensitive analysis.
First copy pacs.dtl to a new file named pacs_field_insensitive.dtl.
Then add a printsize directive to the declaration of the hP relation:
hP (ha:H0, field:F0, hb:H1) outputtuples printsize
Now run pacs_field_insensitive.dtl, and when it finishes, it should
print out how many total tuples are in the hP relation:
Saving results: SIZE OF hP: 105.
There is a lot of other output, but look for the Saving results: line.
There should be 105 tuples in hP. (Actually, depending on your
machine type, there may be 110 tuples, so don't worry about the exact
number, as long as it's in this ballpark.) Note that this includes tuples from
code in the Java Standard Library, which are not printed out by
print_bddbddb_results.py.
Now modify the declaration of hP to remove the field term:
hP (ha:H0, hb:H1) outputtuples printsize
Finally, modify the rules involving hP to make it field insensitive.
That is, hP is true whenever any field of ha may point to hb.
(Hint: Remember that the _ term means don't care.)
Copy pacs_field_insensitive.dtl into submit/ when you are done.
Copy the output of running python print_bddbddb_results.py AddressBook
results/hP.tuples on the results of your field-insensitive analysis
into a text file named submit/problem3.txt. In that same file, record
the size of the hP relation. Did it increase or decrease from its
original value of 105? Why? How does your set of tuples differ from
the originals in exercise1.txt and why are they different?
As you've learned in class, a context-insensitive analysis can be much faster to compute than a context-sensitive analysis, but it is often less precise. In this problem, you will compare the results from context-sensitive and context-insensitive analyses.
pa.dtl contains a context-insensitive pointer analysis. Run it using:
python run_bddbddb.py AddressBook pa.dtl
and print out the resulting hP results using:
python print_bddbddb_results.py AddressBook results/hP.tuples
Copy the results into submit/problem4.txt and, in that same file,
write a comparison of these results with the context-sensitive pointer
analysis results in Exercise 1. In particular, point out
where the context-insensitive analysis leads to less precise results,
and how the precision is lost (by inspecting what occurs in
AddressBook.java). Consult Chapter 12 of the Dragon book for hints.
To submit your assignment, please follow the following steps (assuming your working directory is ~/hw5/submit):
1. Login to myth 2. cd ~/hw6/submit 3. /usr/class/cs243/bddbddb/submit_hw6.shOR, if you're working with a partner, type:
3. /usr/class/cs243/bddbddb/submit_hw5.sh partner_SUID (ex: /usr/class/cs243/bddbddb/submit_hw6.sh huangty)Only one submission is required per pair.
alias.dtl problem1.txt alias_refined.dtl pacs_field_insensitive.dtl problem3.txt problem4.txt
If you discover a bug after submitting (and before the due date), simply run the submission script again.
We'll take the latest version.