JUnitBenchmarks: Tutorial

Quick start

There are two key elements of JUnitBenchmarks to master: annotations and global system properties. Luckily, most of them are optional and only needed when you want to achieve a specific goal.

Turning your JUnit4 tests into benchmarks

The assumption is your benchmark is already a JUnit4 test. For example, let's say your test class looks like this:

    public class MyTest {
      @Test
      public void twentyMillis() throws Exception {
        Thread.sleep(20);
      }
    }

To turn it into a benchmark, you must add a JUnit4 rule that tells JUnit runners to attach benchmarking code to your tests. Adding such a rule is easy: just declare a field, like so (import statements omitted):

    public class MyTest {
      @Rule
      public TestRule benchmarkRun = new BenchmarkRule();

      @Test
      public void twentyMillis() throws Exception {
        Thread.sleep(20);
      }
    }

Alternatively, your test class can extend AbstractBenchmark, which declares this field for you:

    public class MyTest extends AbstractBenchmark {
      @Test
      public void twentyMillis() throws Exception {
        Thread.sleep(20);
      }
    }

When you run this test using any JUnit4 runner, it is already a benchmark and the results will be printed to the console. Note the test's execution time in Eclipse, for example:

[Figure: Eclipse JUnit view with JUnitBenchmarks tests]

It is much larger than the sleep interval set inside the test method. This is because benchmarks are repeated multiple times to obtain a better estimate of the actual average execution time of the tested method. The message printed to the console contains the details for this particular example:

MyTest.twentyMillis: [measured 10 out of 15 rounds]
 round: 0.02 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 0.32, time.warmup: 0.12, time.bench: 0.20

The test was therefore repeated 15 times: the 5 initial warmup rounds were discarded (to give the JVM a chance to optimize the code), and the execution times of the 10 remaining rounds were measured and averaged into the value called round, which is exactly 0.02 seconds, or 20 milliseconds (0.20 seconds of benchmark time divided by 10 measured rounds). Additional information includes the number of times the garbage collector was called and the time spent in garbage collection.

Tuning your benchmarks with annotations

You can tune the basic benchmarks shown above using additional annotations on test methods. For example, you can adjust the number of warmup and benchmark rounds with the BenchmarkOptions annotation as shown below.

    public class MyTest extends AbstractBenchmark {
      @BenchmarkOptions(benchmarkRounds = 20, warmupRounds = 0)
      @Test
      public void twentyMillis() throws Exception {
        Thread.sleep(20);
      }
    }

We set the number of benchmark passes to 20 and the number of warmup rounds to zero, effectively causing the benchmark to measure all executions of the test method. The next section gradually introduces more of these tuning annotations, showing their use in context.

Full benchmarking

Designing your test case, running rough benchmarks

Let's say our task is to compare the performance of three standard Java list implementations: ArrayList, Vector and LinkedList. First of all, we need to design a test case that mimics the code of our eventual application. Let's assume the application adds some elements to the list and then removes elements from the list in random order. The following test does just this:

    public class Lists1
    {
        private static Object singleton = new Object();
        private static int COUNT = 50000;
        private static int [] rnd;

        /** Prepare random numbers for tests. */
        @BeforeClass
        public static void prepare()
        {
            rnd = new int [COUNT];

            final Random random = new Random();
            for (int i = 0; i < COUNT; i++)
            {
                rnd[i] = Math.abs(random.nextInt());
            }
        }

        @Test
        public void arrayList() throws Exception
        {
            runTest(new ArrayList<Object>());
        }

        @Test
        public void linkedList() throws Exception
        {
            runTest(new LinkedList<Object>());
        }

        @Test
        public void vector() throws Exception
        {
            runTest(new Vector<Object>());
        }

        private void runTest(List<Object> list)
        {
            assert list.isEmpty();

            // First, add a number of objects to the list.
            for (int i = 0; i < COUNT; i++)
                list.add(singleton);

            // Randomly delete objects from the list.
            for (int i = 0; i < rnd.length; i++)
                list.remove(rnd[i] % list.size());
        }
    }

Note the following key aspects of the above code:

- the random indices are prepared once, in a @BeforeClass method, so the cost of data preparation is not included in the measured times;
- each test method exercises a different list implementation, but all of them share the same runTest logic, so the measurements differ only in the implementation under test.

To turn this test into a micro-benchmark, we will add the familiar BenchmarkRule.

    @Rule
    public TestRule benchmarkRun = new BenchmarkRule();
    

When this test is re-run in Eclipse, the following results appear on the console:

Lists1.arrayList: [measured 10 out of 15 rounds]
 round: 0.60 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 15, GC.time: 0.02, time.total: 9.02, time.warmup: 3.01, time.bench: 6.01
Lists1.linkedList: [measured 10 out of 15 rounds]
 round: 1.14 [+- 0.07], round.gc: 0.00 [+- 0.00], GC.calls: 23, GC.time: 0.07, time.total: 17.09, time.warmup: 5.67, time.bench: 11.43
Lists1.vector: [measured 10 out of 15 rounds]
 round: 0.60 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 15, GC.time: 0.00, time.total: 9.04, time.warmup: 3.02, time.bench: 6.02

The difference between the random-access lists (Vector and ArrayList) and LinkedList is clearly visible in the round time. Another difference is the much longer GC time (and the higher number of GC calls), presumably resulting from the extra work the garbage collector must perform on LinkedList's internal node structures (although this is only a wild guess).

There is no observable difference between the thread-safe Vector and the unsynchronized ArrayList. In fact, the locks inside Vector are uncontended, so the JVM most likely removed them entirely. Now, just by upgrading the JVM to a newer version (from 1.5.0_18 to 1.6.0_18), the results change a lot; compare:

Lists2.arrayList: [measured 10 out of 15 rounds]
 round: 0.25 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.00, time.total: 3.77, time.warmup: 1.29, time.bench: 2.48
Lists2.linkedList: [measured 10 out of 15 rounds]
 round: 1.27 [+- 0.02], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.00, time.total: 19.16, time.warmup: 6.42, time.bench: 12.74
Lists2.vector: [measured 10 out of 15 rounds]
 round: 0.24 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 3.73, time.warmup: 1.28, time.bench: 2.45

The random-access lists are well over twice as fast as before, while LinkedList is even slower than it was, although now with hardly any (global) GC activity. Just to stress this again: this is the same code on the same machine, only a different virtual machine in action. Why this is the case we will leave as an exercise for the reader.

Changing benchmark options

If you wish to alter the default number of rounds or other aspects of the benchmarking environment, use the BenchmarkOptions annotation. It can be applied to methods or classes. Benchmarked methods inherit options in the following order (illustrated by the sketch below):

- options declared directly on the test method (highest precedence),
- options declared on the test class,
- the built-in defaults.
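
For illustration, here is a minimal sketch of this precedence (class and method names are hypothetical; imports omitted, as before):

    @BenchmarkOptions(benchmarkRounds = 10, warmupRounds = 2)
    public class OverrideExample extends AbstractBenchmark {
      @Test
      public void usesClassDefaults() throws Exception {
        // Runs with the class-level settings: 2 warmup + 10 benchmark rounds.
        Thread.sleep(10);
      }

      @BenchmarkOptions(benchmarkRounds = 30, warmupRounds = 5)
      @Test
      public void overridesClassDefaults() throws Exception {
        // The method-level annotation takes precedence over the class-level one.
        Thread.sleep(10);
      }
    }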

JUnitBenchmarks hints the JVM to perform a full GC before every method run to ensure similar test conditions for every invocation. In practice, methods will not be executed against a clean heap anyway, so it may be sensible to disable this forced GC and simply take an average over multiple test runs. We thus add the following declaration to the class (it is inherited by all of the class's methods):

@BenchmarkOptions(callgc = false, benchmarkRounds = 20, warmupRounds = 3)
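
Applied in context, the class header might now look like the following sketch (the output below refers to this class as Lists2; the @BeforeClass setup and test methods are unchanged from Lists1):

    @BenchmarkOptions(callgc = false, benchmarkRounds = 20, warmupRounds = 3)
    public class Lists2
    {
        @Rule
        public TestRule benchmarkRun = new BenchmarkRule();

        // ... the same prepare(), arrayList(), linkedList() and vector()
        // methods as in Lists1 ...
    }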

We will re-run the benchmarks now, using an even newer JVM (1.7.0, ea-b83). We get:

Lists2.arrayList: [measured 20 out of 23 rounds]
 round: 0.12 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 0, GC.time: 0.00, time.total: 2.76, time.warmup: 0.39, time.bench: 2.36
Lists2.linkedList: [measured 20 out of 23 rounds]
 round: 0.82 [+- 0.01], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.00, time.total: 18.78, time.warmup: 2.48, time.bench: 16.30
Lists2.vector: [measured 20 out of 23 rounds]
 round: 0.12 [+- 0.00], round.gc: 0.00 [+- 0.00], GC.calls: 1, GC.time: 0.00, time.total: 2.80, time.warmup: 0.40, time.bench: 2.40

Impressive, huh?

Persistent benchmark history

While you work on your code and experiment, it is useful to persist the results of each benchmark so that you can make comparisons later on. Or, for that matter, you may wish to set up an automated build that runs benchmarks on different virtual machines, stores the results in the same database and then draws graphical comparisons of the results. For this, JUnitBenchmarks provides an option to save your benchmark results to H2, a file-based relational database.

To enable persistent storage, define the following global system properties:

- jub.consumers — a comma-separated list of result consumers to activate (for example CONSOLE and H2);
- jub.db.file — the file name (prefix) of the H2 database in which the results should be stored.

For example, to output the results to the console and the H2 database, use the following properties:

-Djub.consumers=CONSOLE,H2 -Djub.db.file=.benchmarks

In Eclipse, you can type these properties into the VM arguments area, as in the figure below.

[Figure: Eclipse launch configuration for storing JUnitBenchmarks results in the H2 database]

You will also need to download the H2 JAR separately and add it to your classpath.
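
If you prefer to run the benchmarks outside an IDE, the same properties can be passed as -D arguments to the java command that launches the tests. A minimal, hedged sketch (the launcher class is illustrative, not part of JUnitBenchmarks):

    // Run the benchmark class programmatically via JUnit's console facade.
    // Launch with the persistence properties on the command line, e.g.:
    //   java -Djub.consumers=CONSOLE,H2 -Djub.db.file=.benchmarks BenchmarkLauncher
    import org.junit.runner.JUnitCore;
    import org.junit.runner.Result;

    public class BenchmarkLauncher {
      public static void main(String[] args) {
        // Lists1 stands in for whichever benchmark class you want to run.
        Result result = JUnitCore.runClasses(Lists1.class);
        System.out.println("Failures: " + result.getFailureCount());
      }
    }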

Drawing charts for comparisons

Figures will tell you much, but a picture is worth a thousand digits... or something. Still in Eclipse, let's add the following annotations to the test class:

@AxisRange(min = 0, max = 1)
@BenchmarkMethodChart(filePrefix = "benchmark-lists")

These annotations cause JUnitBenchmarks' H2-based consumer to draw a chart comparing the results of all methods inside the test class. Once the JUnit run is finished, two new files will be created in the project's default folder: benchmark-lists.html and benchmark-lists.json. The HTML file is rendered using Google Charts and requires an internet connection. Once opened, the chart looks like this:

[Figure: Method benchmark results chart generated by JUnitBenchmarks]

A different type of graphical visualization shows the history of benchmark runs for a given class (and all of its test methods). Let's compare different JVMs using this chart. We will run the same JUnit test from Eclipse using a different JVM each time, adding a custom run key so that we know which run corresponds to which JVM. The custom key is a global system property (jub.customkey) stored in the H2 database together with the run's data; we will set it to the name of each JVM. We execute the Eclipse JUnit launch four times, each time changing the JRE used for the test and modifying the jub.customkey property.

Then, we add the following annotation to the test class:

@BenchmarkHistoryChart(labelWith = LabelType.CUSTOM_KEY, maxRuns = 20)
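
Putting it all together, the annotated class header might look like this sketch (the class body stays the same; each Eclipse launch additionally carries a different -Djub.customkey value in its VM arguments, for example -Djub.customkey=jdk-1.7.0, where the key values are just illustrative labels):

    @AxisRange(min = 0, max = 1)
    @BenchmarkMethodChart(filePrefix = "benchmark-lists")
    @BenchmarkHistoryChart(labelWith = LabelType.CUSTOM_KEY, maxRuns = 20)
    @BenchmarkOptions(callgc = false, benchmarkRounds = 20, warmupRounds = 3)
    public class Lists2
    {
        @Rule
        public TestRule benchmarkRun = new BenchmarkRule();

        // ... test methods as before ...
    }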

We run the test again and open the resulting file.

[Figure: History chart generated by JUnitBenchmarks]

Interestingly, IBM's virtual machine shows a visible difference between Vector and ArrayList.

The target directory for chart generation can be changed by passing a global property (see the JavaDoc). Do experiment with the H2 database directly for more advanced charting or analysis; H2 is a regular SQL database and provides a convenient access GUI via the browser. You can start it from the command line using: java -jar lib/h2*.
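
As a starting point for such analysis, the results database can also be queried with plain JDBC. A minimal sketch (not part of JUnitBenchmarks; it only lists the tables in the database, whose layout is not documented here, using the file name from the jub.db.file property above and H2's default credentials):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class InspectBenchmarkDb {
      public static void main(String[] args) throws Exception {
        // Open the file-based H2 database created by the H2 consumer.
        try (Connection c = DriverManager.getConnection("jdbc:h2:./.benchmarks", "sa", "");
             ResultSet tables = c.getMetaData()
                                 .getTables(null, null, "%", new String[] { "TABLE" })) {
          while (tables.next()) {
            // Print the name of every table stored in the results database.
            System.out.println(tables.getString("TABLE_NAME"));
          }
        }
      }
    }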