Improving Program Robustness
via Static and Dynamic Analysis


Objective

Our research project aims to help programmers write more reliable programs and to detect and diagnose errors in their code. The project includes fundamental research to improve our ability to analyze programs, such as pointer alias analysis, as well as static and dynamic techniques that address specific kinds of errors such as buffer overruns and memory leaks. Our emphasis is on developing techniques that can handle large, real-life programs without imposing an onerous burden on users. Here is a list of publications and an overview of the research results:

Released, Open-Source Software


Current Research Members


Integrated Static and Dynamic Analyses for User-Defined Error and Security Flaw Detectors

Program analysis has been increasingly used in software engineering tasks such as auditing programs for security vulnerabilities and finding errors in general. Such tools often require analyses much more sophisticated than those traditionally used in compiler optimizations. In particular, context-sensitive pointer alias information is a prerequisite for any sound and precise analysis that reasons about uses of heap objects in a program. Context-sensitive analysis is challenging because there are over 10^14 contexts in a typical large program, even after recursive cycles are collapsed. Moreover, pointers cannot be resolved in general without analyzing the entire program.

We have developed a new framework, based on the concept of deductive databases, for context-sensitive program analysis. In this framework, all program information is stored as relations; data access and analyses are written as Datalog queries. To handle the large number of contexts in a program, the database represents relations with binary decision diagrams (BDDs). The system we have developed, called bddbddb, automatically translates database queries into highly optimized BDD programs.
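
To make this concrete, here is a minimal sketch (ours, not the bddbddb implementation) of evaluating a Datalog-style points-to query to a fixed point; explicit sets of tuples stand in for the BDD-backed relations, and the relation names vP, vP0, and assign are illustrative:

    // Minimal sketch of Datalog-style evaluation (not the bddbddb
    // implementation, which represents relations as BDDs). Rules assumed
    // for illustration:
    //   vP(v, h) :- vP0(v, h).              base case: "v = new h()"
    //   vP(v, h) :- assign(v, u), vP(u, h). propagation: "v = u"
    #include <set>
    #include <string>
    #include <utility>

    using Rel = std::set<std::pair<std::string, std::string>>;

    Rel pointsTo(const Rel& vP0, const Rel& assign) {
        Rel vP = vP0;                      // seed with allocation sites
        for (;;) {                         // iterate until no new facts
            Rel delta;
            for (const auto& [v, u] : assign)
                for (const auto& [src, h] : vP)
                    if (u == src && !vP.count({v, h}))
                        delta.insert({v, h});
            if (delta.empty()) return vP;  // fixed point reached
            vP.insert(delta.begin(), delta.end());
        }
    }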

Our preliminary experience suggests that a large class of analyses involving heap objects can be described succinctly in Datalog and implemented efficiently with BDDs. To make it easy for programmers to develop application-specific analyses, we have also created a language called PQL that makes a subset of Datalog queries more intuitive to define. We have used the language to find many security holes in Web applications.


IPSSA: An Unsound Path- and Context-Sensitive Pointer Analysis for Bug Detection

We have developed a pointer alias analysis expressly for automatic error detection. State-of-the-art pointer alias analyses are either too slow or too imprecise for finding errors in real-life programs. We propose a hybrid pointer analysis that tracks actively manipulated pointers held in local variables and parameters accurately, with path and context sensitivity, and handles pointers stored in recursive data structures less precisely but efficiently. We make the unsound assumption that pointers passed into a procedure (in parameters, in global variables, and in locations reached by applying simple access paths to parameters and globals) are all distinct from each other and from any other locations. This assumption matches the semantics of many functions, reduces spurious aliases, and speeds up the analysis.
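
As a concrete (and hypothetical) illustration of the assumption, consider a routine that receives two pointers; IPSSA analyzes it as if the pointed-to locations are disjoint, which matches how most callers behave but is unsound if a caller passes overlapping buffers:

    #include <cstddef>
    #include <cstring>

    // Hypothetical example. IPSSA assumes the locations reached through
    // dst and src are distinct from each other and from everything else,
    // which matches most call sites; a caller that passes overlapping
    // buffers would violate the (deliberately unsound) assumption.
    void copy_name(char* dst, const char* src, std::size_t n) {
        std::strncpy(dst, src, n);   // analyzed as if dst, src are disjoint
        dst[n - 1] = '\0';
    }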

We designed a program representation, called IPSSA, for capturing intraprocedural and interprocedural definition-use relationships of directly and indirectly accessed memory locations. This representation makes it easy to create demand-driven path-sensitive and context-sensitive analyses.

We demonstrated how a program checker based on IPSSA can be used to find security violations. Our checker, when applied to 10 programs, found 6 new violations and 8 previously reported ones. The checker generated only one false warning, suggesting that our approach is effective in creating practical and easy-to-use bug detection tools.

Efficient Inclusion-Based Points-To Analysis

We have designed and implemented an efficient inclusion-based points-to analysis for strictly-typed object-oriented languages. Our implementation easily scales to millions of lines of Java code, and it supports language features such as inheritance, object fields, exceptional control flow, type casting, dynamic dispatch, and reflection.

Our algorithm is based on Heintze and Tardieu's Andersen-style points-to analysis designed originally for C programs. We have improved the precision of their algorithm by tracking the fields of individual objects separately and by analyzing the local variables in a method in a flow-sensitive manner. Our algorithm represents the semantics of each procedure concisely using a sparse summary graph representation based on access paths; it iterates over this sparse representation until it reaches a fixed point solution. By utilizing the access path and field information present in the summary graphs, along with minimizing redundant operations and memory management overheads, we are able to quickly and effectively analyze very large programs. Our experimental results demonstrate that this technique can be used to compute precise static call graphs for very large Java programs.
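
A small hypothetical fragment illustrates the precision gained by tracking the fields of individual objects separately (written in C++ for consistency with the other sketches on this page, though the analysis itself targets Java):

    // Hypothetical illustration. A field-based analysis merges the
    // "target" field of all Box objects and concludes p may point to x
    // or y; tracking fields of individual objects separately yields
    // the more precise answer p -> {x}.
    struct Obj {};
    struct Box { Obj* target; };

    int main() {
        Obj x, y;
        Box b1{&x};              // b1.target -> x
        Box b2{&y};              // b2.target -> y
        Obj* p = b1.target;      // field-sensitive: {x}; field-based: {x, y}
        (void)p; (void)b2;
        return 0;
    }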


Clouseau: Detecting memory leaks in C++ programs automatically by static analysis

Memory leaks and double deletes in large C++ programs are hard to find; detecting them requires not just local analysis but whole-program analysis. We have formalized an object ownership model that is commonly used in practice. Objects in this model are owned at all times. The ownership may be held by a parent object or by a variable in a method, and it can be transferred during the lifetime of the object.

We have developed an algorithm that can automatically identify objects that follow the model and locate potential memory leaks and deletions of dangling pointers in C++ programs, without requiring user intervention. The algorithm uses a fully flow-sensitive and context-sensitive analysis to derive the likely object invariants and to check that the objects are used consistently throughout the program. The algorithm has been implemented in a tool called Clouseau.

Previous attempts to model memory management with linear types were too restrictive to be useful on real-life programs. To create a practical model, we allow object ownership to be transferred optionally in assignments and parameter passing, and we use an automatic procedure to determine whether there are any inconsistencies in transfers of ownership. We make ownership inference feasible by introducing an object invariant that is often found in good object-oriented programming practice: members of a class are classified as either "owned" or "not owned"; that is, a member is either always owned or never owned by the parent object at all public method boundaries. This approach results in an efficient whole-program flow-sensitive and context-sensitive analysis, which takes less than 8 minutes on a PC to analyze a program with over 100,000 lines of code.
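
A hypothetical fragment illustrating the member invariant and an ownership transfer (our example, not one from the Clouseau experiments):

    #include <cstdio>

    // Hypothetical example. "buf" is an owned member: this Logger owns it
    // at every public method boundary and must delete it exactly once.
    // "out" is never owned: deleting it here would be a dangling delete.
    class Logger {
        char*      buf;   // always owned by the parent object
        std::FILE* out;   // never owned; the caller retains ownership
    public:
        explicit Logger(std::FILE* sink) : buf(new char[256]), out(sink) {}
        ~Logger() { delete[] buf; }   // owned member released exactly once

        // Ownership of msg transfers to the Logger at this call boundary;
        // after the call, the caller must no longer delete msg.
        void adopt(char* msg) { delete[] buf; buf = msg; }
    };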

Clouseau successfully identified dozens of errors in our SUIF2 research compiler, and several bugs each in the open-source Mozilla web browser and the OpenOffice productivity suite. In addition, it found numerous classes in most of these systems where system extenders could easily misuse the class implementations because of undocumented interface restrictions on the objects. Besides producing a useful tool, this research contributes to our understanding of deep program analysis by producing one of the few fully flow-sensitive and context-sensitive analysis tools that work on large real-life C++ programs.


Metacompilation: Detecting critical errors in system software by static analysis

System software often obeys rules such as "accesses to variable A must be guarded by lock B". We have developed an approach called metacompilation (MC), in which programmers add simple system-specific compiler extensions that check their code against such rules. We have also developed techniques to automatically extract the rules from the program itself.
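
For illustration, here is a hypothetical instance of such a rule and code that violates it, with std::mutex standing in for the kernel's locking primitives:

    #include <mutex>

    // Hypothetical fragment. The system-specific rule is "accesses to
    // shared_count must be guarded by count_lock"; a simple MC-style
    // checker tracks lock state along each path and flags the unguarded
    // access in bad().
    std::mutex count_lock;
    int shared_count;

    void ok() {
        std::lock_guard<std::mutex> guard(count_lock);
        ++shared_count;               // access guarded: rule satisfied
    }

    void bad() {
        ++shared_count;               // no lock held: rule violation
    }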

Metacompilation checkers find many serious errors in complex, real systems code. We have written roughly 50 checkers that in aggregate have found thousands of errors in widely-used systems such as Linux and OpenBSD. Many errors were the worst type of systems bugs: those that crash the system, but only after it has been running continuously for days. We have validated the results by reporting the errors to the main system developers, spurring hundreds of fixes (not all errors have been fixed --- there were so many that the backlog could easily take over a year to process).

Many checkers are less than 100 lines of code; some of the most effective are less than 20. MC checkers are easy to write: most of ours were written by programmers who had only a passing familiarity with the systems they checked, and extension writers need little background in compilers. For example, an undergraduate with no real knowledge of Linux or OpenBSD and no compiler experience implemented an MC checker that found roughly two hundred security holes in these systems within a few months (likely a world record for security holes found per unit of time, if not close to the record for sheer numbers).

(More papers by Dawson Engler and his group can be found here).


Using Redundancy To Find Errors

Programmers generally attempt to perform useful work: if they perform an action, it is because they believe it serves some purpose. Redundant operations violate this belief. In the past, however, redundant operations have typically been regarded as minor cosmetic problems rather than serious errors. Our work demonstrates that many redundancies are in fact as serious as traditional hard errors (such as race conditions or null pointer dereferences). We tested this idea experimentally by writing five redundancy checkers and applying them to a number of large open-source projects, finding many errors. We then showed that even when redundancies are harmless, they strongly correlate with the presence of traditional hard errors. Finally, we showed how flagging redundant operations gives a way to detect mistakes and omissions in specifications. For example, a locking specification that binds shared variables to their protecting locks can use redundancies to detect missing bindings by flagging critical sections that include no shared state.
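
A hypothetical example of a redundancy that is really a hard error: the dead stores below are what a redundancy checker flags, and the underlying cause is a genuine logic bug:

    #include <string>
    #include <vector>

    // Hypothetical example. Every assignment to total except the last is a
    // dead store, which a redundancy checker flags; the redundancy is the
    // symptom of a real bug, since the programmer meant "total += ...".
    int sum_lengths(const std::vector<std::string>& xs) {
        int total = 0;
        for (const auto& s : xs)
            total = static_cast<int>(s.size());  // intended: total += ...
        return total;
    }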

RacerX: static detection of race conditions and deadlocks

RacerX is a static tool that uses flow-sensitive, interprocedural analysis to detect both race conditions and deadlocks. It is explicitly designed to find errors in large, complex multithreaded systems. It aggressively infers checking information such as which locks protect which operations, which code contexts are multithreaded, and which shared accesses are dangerous. It tracks a set of code features that it uses to rank errors from most to least severe, and it uses novel techniques to counter the impact of analysis mistakes. The tool is fast, requiring between 2 and 14 minutes to analyze a 1.8 million line system. We have applied it to Linux, FreeBSD, and a large commercial code base, finding serious errors in all of them.
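
A minimal sketch of the lockset-style inference RacerX builds on (our simplification, not RacerX's implementation): the candidate lockset of a shared variable is the intersection of the locks held at each of its accesses, and an empty lockset flags a potential race:

    #include <cstdio>
    #include <map>
    #include <set>
    #include <string>

    // Simplified lockset sketch (not RacerX's implementation). For each
    // shared variable, intersect the locks held at every access; if the
    // intersection becomes empty, no single lock protects the variable.
    using LockSet = std::set<std::string>;

    std::map<std::string, LockSet> candidate;  // var -> locks always held

    void recordAccess(const std::string& var, const LockSet& held) {
        auto it = candidate.find(var);
        if (it == candidate.end()) {
            candidate.emplace(var, held);      // first access seeds the set
            return;
        }
        LockSet inter;
        for (const auto& lock : it->second)
            if (held.count(lock))
                inter.insert(lock);
        it->second = inter;
        if (it->second.empty())
            std::fprintf(stderr, "potential race on %s\n", var.c_str());
    }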

DIDUCE: Tracking down software errors using dynamic anomaly detection.

DIDUCE is a practical and effective tool that aids programmers in detecting complex program errors and identifying their root causes. By instrumenting a program and observing its behavior as it runs, DIDUCE dynamically formulates hypotheses of invariants obeyed by the program. DIDUCE hypothesizes the strictest invariants at the beginning, and gradually relaxes the hypothesis as violations are detected to allow for new behavior. The violations reported help users catch software bugs as soon as they occur. They also give programmers new visibility into the behavior of their programs, such as identifying rare corner cases in the program logic or locating hidden errors that corrupt the program's results. We implemented the DIDUCE system for Java programs and applied it to four programs of significant size and complexity. DIDUCE succeeded in identifying the root causes of programming errors in each of the programs quickly and automatically. In particular, DIDUCE was effective in isolating a timing-dependent bug in a released JSSE (Java Secure Socket Extension) library, a bug that would have taken an experienced programmer days to find. Our experience suggests that detecting and checking program invariants dynamically is a simple and effective methodology for debugging many different kinds of program errors across a wide variety of application domains.
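
A minimal sketch of the relax-on-violation idea (our illustration in C++; DIDUCE itself instruments Java bytecode and uses a different invariant representation): each instrumented expression starts with the strictest hypothesis, a single observed value, and is widened whenever a violation is reported:

    #include <cstdio>

    // Minimal sketch of DIDUCE-style anomaly detection (our illustration;
    // DIDUCE's actual hypotheses are bit masks over observed values). The
    // invariant starts as a single value and relaxes on each violation.
    struct RangeInvariant {
        long lo = 0, hi = 0;
        bool seeded = false;

        void observe(const char* site, long v) {
            if (!seeded) { lo = hi = v; seeded = true; return; }
            if (v < lo || v > hi) {          // violation: report, then relax
                std::printf("anomaly at %s: %ld outside [%ld, %ld]\n",
                            site, v, lo, hi);
                if (v < lo) lo = v;
                if (v > hi) hi = v;
            }
        }
    };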

CRED: A Practical Dynamic Overflow Detector.

Despite previous efforts in auditing software manually and automatically, buffer overruns are still being discovered in programs in use. A dynamic bounds checker detects buffer overruns in erroneous software as they occur and thereby prevents attacks from corrupting the integrity of the system. Dynamic buffer overrun detectors have not been adopted widely because they either (1) cannot guard against all buffer overrun attacks, (2) break existing code, or (3) incur too high an overhead. We have developed a practical detector called CRED (C Range Error Detector) that avoids each of these deficiencies. CRED finds all buffer overrun attacks, as it directly checks the bounds of memory accesses. Unlike the original referent-object bounds-checking technique, CRED does not break existing code, because it uses a novel solution to support program manipulation of out-of-bounds addresses. Finally, by restricting the bounds checks to strings in a program, CRED's overhead is greatly reduced without sacrificing protection in the experiments we performed.
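
A hypothetical fragment showing the kind of idiom that breaks referent-object bounds checking but that CRED supports: a pointer that temporarily goes out of bounds without ever being dereferenced there:

    #include <cstddef>

    // Hypothetical example. p temporarily points before the start of buf.
    // A referent-object checker that flags the out-of-bounds *computation*
    // rejects this common idiom; CRED tracks the out-of-bounds value and
    // reports an error only if such a pointer is actually dereferenced.
    void shift_left(char* buf, std::size_t n) {
        char* p = buf - 1;            // out of bounds, never dereferenced
        for (std::size_t i = 1; i < n; ++i)
            p[i] = buf[i];            // p[i] is buf[i-1]: back in bounds
    }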

CRED is implemented as an extension of the GNU C compiler version 3.3.1. The simplicity of our design makes possible a robust implementation that has been tested on over 20 open-source programs, comprising over 1.2 million lines of C code. CRED proved effective in detecting buffer overrun attacks on programs with known vulnerabilities, and it is the only tool found to guard against a testbed of 20 different buffer overflow attacks. Finding overruns only on strings imposes an overhead of less than 26% for 14 of the programs, and an overhead of up to 130% for the remaining six, while the previous state-of-the-art bounds checker by Jones and Kelly breaks 60% of the programs and is 12 times slower. Incorporating well-known techniques for optimizing bounds checking into CRED could lead to further performance improvements.


Automatic Extraction of Object-Oriented Component Interfaces

Today's large programs are often made out of components, whose methods must be invoked according to some predefined but often undocumented sequencing order. Component interfaces in our system are modeled as a product of finite state machines, where each state-modifying method is represented as a state in the FSM, and transitions of the FSMs represent allowable pairs of consecutive methods.
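
A hypothetical sketch of such a model for a socket-like component: each state-modifying method is a state, and an edge records that the second method may immediately follow the first. A dynamic checker then flags any transition not in the model:

    #include <set>
    #include <string>
    #include <utility>

    // Hypothetical extracted model for a socket-like class. An edge
    // (m1, m2) records that method m2 may legally be called right after
    // m1 on the same object.
    using Transition = std::pair<std::string, std::string>;

    const std::set<Transition> socketModel = {
        {"<init>", "bind"},   {"bind", "connect"},
        {"connect", "send"},  {"send", "send"},
        {"send", "close"},    {"connect", "close"},
    };

    bool allowed(const std::string& prev, const std::string& next) {
        return socketModel.count({prev, next}) > 0;
    }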

Automatically extracted models are useful as documentation or constraints to be enforced by a static or dynamic checker; developers can check the models to ensure that they match the original design intents and testers can use them to evaluate the completeness of a test suite. Techniques developed include static analyses to deduce illegal call sequences in a program, dynamic instrumentation techniques to extract models from the execution runs, and finally a dynamic model checker that ensures that the code conforms to the model.

We have tested our model extraction tools on over 1.2 million lines of code. The model automatically extracted for the socket implementation in java.net captures the essence of the class effectively. The models extracted for the standard Java libraries are useful as documentation and can be used by static tools to look for errors in programs. The scalability of the dynamic analysis tool is demonstrated by running it on the million-line J2EE platform. The extracted models convey interesting information about the program being tested and the test suite itself. Finally, when applied to the joeq virtual machine, the tools found discrepancies between the intended and implemented API. This information is helpful in the evolution of programs.


Speculative Threads: Architectural Support to Improve Software Reliability

A particularly attractive approach to improving software reliability is to have the program monitor its own execution and recover from errors as they are detected. Unfortunately, this approach is expensive, not just because it slows down the program's execution but also because it places an additional burden on programmers. We believe that future computer architectures should provide features that make it easy and efficient to monitor a program's execution and recover from errors. Our research shows that the mechanisms originally invented to support speculative thread-level parallelism can be used for this purpose: we can speed up the monitoring code by executing it in parallel with the original execution, and use the rollback mechanism to allow for error recovery.

Limited hardware support has been provided in the past for performance monitoring. Here, we are interested in monitoring execution for the sake of software reliability, so the support must be more flexible. The program-specific monitoring functions are executed in parallel with the continuation of the normal execution. Because the hardware buffers all the side effects of the speculative execution of the normal thread, the monitoring functions can simply assume they are operating on the state at the point where they were invoked. Because the side effects of the monitoring functions seldom affect the normal threads, it is usually possible to run the two in parallel. On the rare occasions when a monitoring thread wishes to raise an exception, the hardware can simply discard the speculative state. For error recovery, we advocate the use of fine-grain transactions: we execute the code within a transaction in a speculative thread, and the side effects of the transaction are committed if the thread completes normally and discarded otherwise.
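
A hypothetical sketch of the fine-grain transaction idea; the transaction and monitoring primitives below are assumed names for the proposed hardware support, not a real API:

    #include <cstddef>

    // Assumed primitives for the proposed hardware support (hypothetical):
    void begin_transaction();            // start buffering side effects
    void commit_transaction();           // make buffered effects visible
    void abort_transaction();            // discard buffered effects
    bool parse(char* msg, std::size_t len);   // hypothetical parser
    bool monitor_detected_overrun();          // hypothetical monitor query

    bool handle_request(char* msg, std::size_t len) {
        begin_transaction();             // run the risky region speculatively
        bool ok = parse(msg, len);       // may overrun on malicious input
        if (!ok || monitor_detected_overrun()) {
            abort_transaction();         // roll back instead of crashing
            return false;
        }
        commit_transaction();
        return true;
    }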

We have measured the effectiveness of our proposal by experimenting with a number of case studies including the implementation of DIDUCE and Purify, and code to recover from buffer overruns. Our preliminary results suggest that instrumented code indeed has more parallelism. For example, our proposed architectural feature manages to execute 80% more instructions with no performance degradation in one of our test cases. Also, we showed that the speculative thread support allows computations to recover quickly and thus fends off denial-of-service attacks that try to crash the processes of an application.

This work suggests a new direction for architectural research. So much energy has been devoted to squeezing the last drop of performance out of existing applications that we are seeing diminishing returns. Instead of tuning the micro-architecture to speed up existing applications, we propose looking ahead to promote emerging programming paradigms that enhance reliability and productivity but may be too expensive on today's architectures.


This research is supported in part by the National Science Foundation under grants No. 0326227 and No. 0086160, NSF student fellowships and Stanford Graduate Fellowships. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.