For each variable involved in a reduction, the compiler creates a private copy of the variable on each processor. The executable code for the loop containing the reduction manipulates this private copy in three phases. First, before the loop executes, the private copy is initialized with the identity element for the reduction operation (e.g., 0 for addition). Second, the reduction operation is applied to the private copy within the parallel loop. Finally, after the loop completes, the program performs a global accumulation in which every local copy that holds a non-identity value is combined into the original variable. Synchronization locks guard accesses to the original variable to guarantee that these updates are atomic.
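The three phases can be sketched by hand for a sum reduction. This is a minimal illustration, not the code the compiler emits: threads stand in for processors, and the names `parallel_sum`, `worker`, `private`, and `num_workers` are hypothetical, chosen only for this example.

```python
import threading

def parallel_sum(values, num_workers=4):
    """Illustrative sum reduction using per-thread private copies."""
    total = 0                  # the original (shared) reduction variable
    lock = threading.Lock()    # guards updates to `total`

    def worker(chunk):
        nonlocal total
        # Phase 1: initialize the private copy with the identity for addition.
        private = 0
        # Phase 2: apply the reduction operation to the private copy only.
        for v in chunk:
            private += v
        # Phase 3: accumulate into the original variable under the lock,
        # skipping copies that still hold the identity element.
        if private != 0:
            with lock:
                total += private

    # Assumed partitioning: iterations are dealt out cyclically to workers.
    chunks = [values[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Because each worker touches the shared variable at most once, the lock is acquired at most `num_workers` times rather than once per loop iteration, which is the practical benefit of privatizing the reduction variable.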