Stress testing asynchronous REST service with Gatling

In this blog post, I am not going to dive deep into the theory or methodology of application performance testing. Instead, I focus on explaining some of the more advanced features of the Gatling performance testing tool using a relatively common scenario: testing an asynchronous service. The asynchronous service we are going to test in this post, a report generation service, has the following API specification:

  1. A report is requested for a given userId via a POST to the /report endpoint; the response returns a reportId used for tracking the request.
  2. The report status is tracked via a GET on /report/{reportId}. A 202 Accepted status code means that report generation is still in progress, and a 5xx status code means that an error occurred during generation. When the report is done, the service returns 200 OK and the generated report.
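
The status codes above map onto a small set of client-side outcomes. Here is a minimal sketch in plain Scala (no Gatling involved). The names ReportStatus, ReportState, InProgress, Done, Failed, Unexpected and classify are hypothetical and not part of the service. The API description does not say how other status codes should be treated, so the sketch simply labels them as unexpected:

```scala
object ReportStatus {
  // Hypothetical client-side view of the report lifecycle
  sealed trait ReportState
  case object InProgress extends ReportState
  case object Done extends ReportState
  final case class Failed(status: Int) extends ReportState
  final case class Unexpected(status: Int) extends ReportState

  // Maps the HTTP status of GET /report/{reportId} to a state
  def classify(status: Int): ReportState = status match {
    case 202                       => InProgress    // generation still running
    case 200                       => Done          // report is in the response body
    case s if s >= 500 && s <= 599 => Failed(s)     // error during generation
    case s                         => Unexpected(s) // assumption: not covered by the spec
  }
}
```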

Gatling allows you to create almost any testing scenario. Testing a synchronous service is pretty simple: call the service with the request payload, get the response and assert on the content. Measuring performance in this case is relatively simple as well: response time of a single service call, error rates, etc. However, testing an asynchronous service is a bit more difficult, as it cannot be done via a single call. The testing scenario needs to follow this structure:

  1. request the service with given parameters and store the id of this particular request
  2. poll the request status to track the progress
  3. download the result when the request is completed

The performance of the service cannot be measured directly, as processing is no longer bounded by a single call. What we would like to measure is the processing time from the moment the report is requested until processing is complete.

This test scenario translates nicely into the Gatling DSL:

    val generateReport = scenario("Generate report")
      .exec(requestReport)
      .exec(pollReportGenerationState)
      .exec(getGeneratedReport)

Each test step contains validations that assure the correctness of the test scenario for a given virtual user. The scenario relies on Gatling sessions, which allow us to carry user-specific data over between test steps. In this particular case, we need the user-specific reportId in order to track the request progress and finally collect the report. The reportId is parsed from the JSON response to the report generation request:

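    // withApiKey, reportRequestBody, userId and reportParameter are defined elsewhere in the test
    private val requestReport: HttpRequestBuilder = http("Request report generation")
      .post("/report")
      .queryParamMap(withApiKey)
      .body(StringBody(reportRequestBody(userId, reportParameter)))
      .asJSON
      .check(status is 202)
      // store the returned reportId in the virtual user's session for the following steps
      .check(jsonPath("$..reportId").ofType[String].exists.saveAs("reportId"))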

Probably the most interesting part is polling the report generation state, where we need to keep a virtual user in polling mode until the report is ready. Again, we are going to use the Session, this time in combination with the asLongAs looping operator:

    // Loop condition: true while the last polled status is still 202 Accepted.
    // Before the first poll there is no status in the session, so we assume "in progress".
    private def notGenerated(sess: Session): Validation[Boolean] = {
      val reportInProgress = 202
      val generationStatus = sess("generationStatus").asOption[Int].getOrElse(reportInProgress)
      logger.debug(s"Report generation status: $generationStatus")
      generationStatus == reportInProgress
    }

    private val pollReportGenerationState = asLongAs(notGenerated)(
      pause(100.millis)
        .exec(
          http("Poll report generation state")
            .get("/report/${reportId}")
            .queryParamMap(withApiKey)
            .asJSON
            // save the status code so that notGenerated can inspect it on the next iteration
            .check(status saveAs ("generationStatus"))
        )
    )
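
The control flow of this polling step can be sketched in plain Scala, independent of Gatling. This is only an illustration under stated assumptions: PollingSketch, pollUntilDone and fakeService are hypothetical names, the fake service stands in for the real HTTP call, the pause between polls is left out, and the maxPolls guard is an addition of this sketch that the scenario above does not have:

```scala
object PollingSketch {
  val ReportInProgress = 202

  /** Polls fetchStatus until it returns something other than 202 or maxPolls is reached.
    * Returns the last status seen and the number of polls performed. */
  def pollUntilDone(fetchStatus: () => Int, maxPolls: Int): (Int, Int) = {
    var status = ReportInProgress // same default as notGenerated: in progress until proven otherwise
    var polls = 0
    while (status == ReportInProgress && polls < maxPolls) {
      status = fetchStatus()
      polls += 1
    }
    (status, polls)
  }

  /** Hypothetical stand-in for the service: answers 202 pendingPolls times, then 200. */
  def fakeService(pendingPolls: Int): () => Int = {
    var calls = 0
    () => {
      calls += 1
      if (calls <= pendingPolls) 202 else 200
    }
  }
}
```

For example, PollingSketch.pollUntilDone(PollingSketch.fakeService(3), 10) returns (200, 4): three polls see 202 and the fourth sees 200.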

When working with an asynchronous service, another operator worth considering is tryMax. Its disadvantage is that failed attempts are counted as failed requests, which in our case would drastically distort the results. Silencing the request would remove this test step from the statistics completely.

Collecting the report once it is generated is pretty straightforward. To collect statistics for the whole report generation scenario, i.e. the time it takes to get the report, we need to wrap that part of the scenario in the group combinator, which produces statistics for that group in the Gatling report.

    val generateReport = scenario("Generate report").group("Generation report completion") {
      exec(requestReport)
        .exec(pollReportGenerationState)
        .exec(getGeneratedReport)
    }

In this blog post, we used some of the more advanced features of the Gatling performance testing tool to test an asynchronous service, with virtual users sharing important data between test steps via the Session. The complete code snippet can be found in my GitHub gist. If you test asynchronous services in a different way, or have any suggestion on how to support custom metrics in the Gatling report, let me know in the comment section below this blog post.

JVM JIT compilation as a way of performance optimisation

The previous article, Structure of JVM – Java memory model, briefly mentions bytecode execution modes, and the article JVM internal threads provides additional insight into the internal architecture of JVM execution. In this article, we focus on Just-In-Time (JIT) compilation and on some of its basic optimisation techniques. We also discuss the performance impact of one optimisation technique, namely method inlining. In the remainder of this article we focus solely on the HotSpot JVM; the principles, however, are valid in general.
The HotSpot JVM is a mixed-mode VM, which means that it starts off interpreting the bytecode but can compile code into highly optimised native machine code for faster execution. This optimised code runs extremely fast, with performance comparable to C/C++ code. JIT compilation happens on a per-method basis at runtime, after the method has been run a number of times and is considered a hot method. The compilation into machine code happens on a separate JVM thread and does not interrupt the execution of the program. While the compiler thread is compiling a hot method, the JVM keeps using the interpreted version of the method until the compiled version is ready. Because it can observe the runtime characteristics of the code, the HotSpot JVM can make sophisticated decisions about how to optimise it.
The Java HotSpot VM is capable of running in two separate modes (C1 and C2), and each mode has a different situation in which it is usually preferred:
  • C1 (-client) – used for applications where quick startup and solid optimisation are needed; GUI applications are typical candidates.
  • C2 (-server) – used for long-running server applications.
Those two compiler modes use different techniques for JIT compilation, so it is possible to get very different machine code for the same method. A modern Java application can take advantage of both compilation modes: starting from Java SE 7, a feature called tiered compilation is available. An application starts with C1 compilation, which enables fast startup, and once the application is warmed up the C2 compiler takes over. Since Java SE 8, tiered compilation is the default. Server optimisation is more aggressive and is based on assumptions which may not always hold. These optimisations are always protected by a guard condition that checks whether the assumption is correct. If an assumption is not valid, the JVM reverts the optimisation and drops back to interpreted mode.
In server mode, the HotSpot VM runs a method in interpreted mode 10 000 times before compiling it (this can be adjusted, e.g. via -XX:CompileThreshold=5000). Changing this threshold should be considered carefully, as the HotSpot VM works best when it can accumulate enough statistics to make an intelligent decision about what to compile. If you want to inspect what is being compiled, use -XX:+PrintCompilation.
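To check from inside a running application which JIT compiler is active and which -XX options the JVM was actually started with, the standard java.lang.management API can be queried. The following is a minimal sketch; the object name JitInfo and its methods are hypothetical, and the compiler name and timing values depend on the JVM in use:

```scala
import java.lang.management.ManagementFactory

object JitInfo {
  /** Name of the JIT compiler plus total compilation time, when the JVM reports it. */
  def describe(): String = {
    // getCompilationMXBean returns null when the JVM has no compilation system
    val jit = ManagementFactory.getCompilationMXBean
    if (jit == null) "no JIT compiler available"
    else if (jit.isCompilationTimeMonitoringSupported)
      s"${jit.getName}: ${jit.getTotalCompilationTime} ms spent in JIT compilation so far"
    else jit.getName
  }

  /** The -XX options passed to this JVM on startup, e.g. -XX:CompileThreshold=5000. */
  def xxFlags(): List[String] = {
    val it = ManagementFactory.getRuntimeMXBean.getInputArguments.iterator()
    val flags = List.newBuilder[String]
    while (it.hasNext) {
      val arg = it.next()
      if (arg.startsWith("-XX:")) flags += arg
    }
    flags.result()
  }
}
```

Printing JitInfo.describe() and JitInfo.xxFlags() at the start of an experiment is a cheap way to confirm that the options discussed above were really passed to the JVM under test.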
Among the most common JIT compilation techniques used by the HotSpot VM is method inlining, which is the practice of substituting the body of a method into the places where the method is called. This technique saves the cost of calling the method. In HotSpot there is a limit on the size of a method that can be substituted. Another commonly used technique is monomorphic dispatch, which relies on the observation that a given path through the code deals with one reference type most of the time, while other paths deal with other types. Thanks to this observation, the exact method definition is known without checking, the overhead of virtual method lookup can be eliminated, and the JIT compiler can emit faster machine code. There are many other optimisation techniques, such as loop optimisation, dead code elimination and intrinsics.
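The idea behind monomorphic dispatch can be pictured at source level. The sketch below is hand-written and purely illustrative: the real optimisation is applied by the JIT to machine code, and DispatchSketch, Shape, Square, Circle, total and totalGuarded are hypothetical names. The second method mimics a type guard with a fast path for the commonly seen type and a fallback to ordinary virtual dispatch:

```scala
object DispatchSketch {
  trait Shape { def area: Double }
  final class Square(side: Double) extends Shape { def area: Double = side * side }
  final class Circle(radius: Double) extends Shape { def area: Double = math.Pi * radius * radius }

  // Plain virtual call: the target of area depends on the runtime class of each element.
  def total(shapes: Seq[Shape]): Double = shapes.map(_.area).sum

  // Conceptual picture of a guarded fast path, assuming Square is the type seen most often.
  def totalGuarded(shapes: Seq[Shape]): Double = shapes.map {
    case s: Square => s.area     // guard holds: the exact method is known
    case other     => other.area // guard fails: fall back to virtual dispatch
  }.sum
}
```

Both methods return the same result; only the way the call target is resolved differs.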
The performance gain by inlining optimisation can be demonstrated on simple Scala code:
class IncWhile {

  def main(): Int = {
    var i: Int = 0
    var limit = 0

    while (limit < 1000000000) {
      i = inc(i)
      limit = limit + 1
    }
    i
  }

  def inc(i: Int): Int = i + 1
}

Here the method inc is eligible for inlining, as the method body is smaller than 35 bytes of JVM bytecode (the actual size of the inc method is 9 bytes). The inlining optimisation can be verified by looking at the JIT-optimised machine code.
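
Conceptually, inlining replaces the call to inc with its body. Here is a hand-written sketch of the loop before and after; the names InliningSketch, withCall and inlinedByHand are hypothetical, and the real transformation happens in the compiled code, not in the Scala source:

```scala
object InliningSketch {
  def inc(i: Int): Int = i + 1

  // Call-site form: every iteration invokes inc.
  def withCall(n: Int): Int = {
    var i = 0
    var k = 0
    while (k < n) {
      i = inc(i)
      k += 1
    }
    i
  }

  // Shape of the loop after inlining: the body of inc replaces the call.
  def inlinedByHand(n: Int): Int = {
    var i = 0
    var k = 0
    while (k < n) {
      i = i + 1
      k += 1
    }
    i
  }
}
```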

[Figure: IncWhile-inlined, JIT-compiled machine code with inc inlined]

The difference is obvious when compared with the machine code produced when inlining is disabled via -XX:CompileCommand=dontinline,com/jaksky/jvm/tests/jit/IncWhile.inc

[Figure: IncWhile-dontinline, JIT-compiled machine code with inlining of inc disabled]

The difference in runtime characteristics is also significant, as the benchmark results show. With inlining disabled:

[info] Result "com.jaksky.jvm.tests.jit.IncWhile.main":
[info] 2112778741.540 ±(99.9%) 9778298.985 ns/op [Average]
[info] (min, avg, max) = (2040573480.000, 2112778741.540, 2192003946.000), stdev = 28831537.237
[info] CI (99.9%): [2103000442.555, 2122557040.525] (assumes normal distribution)
[info] # Run complete. Total time: 00:08:03
[info] Benchmark Mode Cnt Score Error Units
[info] IncWhile.main avgt 100 2112778741.540 ± 9778298.985 ns/op

When inlining is enabled, the JIT compiler is also able to apply further optimisations, such as loop optimisations, which might cause our whole loop to be eliminated because its result is easily predictable. We would then measure a time of around 3 ns, which is far too short for a 1 GHz processor to perform a billion operations. To disable most loop optimisations, use the -XX:LoopOptsCount=0 JVM option. With inlining enabled and loop optimisations disabled:

[info] Result "com.jaksky.jvm.tests.jit.IncWhile.main":
[info] 332699064.778 ±(99.9%) 3485503.823 ns/op [Average]
[info] (min, avg, max) = (316312877.000, 332699064.778, 358738827.000), stdev = 10277087.396
[info] CI (99.9%): [329213560.955, 336184568.600] (assumes normal distribution)
[info] # Run complete. Total time: 00:04:55
[info] Benchmark Mode Cnt Score Error Units
[info] IncWhile.main avgt 100 332699064.778 ± 3485503.823 ns/op
So the performance gain from inlining a method body can be quite significant: roughly 2 seconds versus 300 milliseconds.
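To put the two scores in perspective, they can be divided by the one billion loop iterations. A small arithmetic sketch using the average scores reported above (the object and value names are hypothetical):

```scala
object BenchmarkArithmetic {
  val iterations = 1000000000L           // loop bound in IncWhile.main
  val noInlineNsPerOp = 2112778741.540   // average score with inlining disabled
  val inlineNsPerOp = 332699064.778      // average score with inlining enabled

  // Average cost of a single loop iteration in nanoseconds
  def perIteration(nsPerOp: Double): Double = nsPerOp / iterations

  // How many times faster the inlined version is
  val speedup: Double = noInlineNsPerOp / inlineNsPerOp
}
```

This works out to roughly 2.11 ns per iteration without inlining versus about 0.33 ns with it, a speedup of approximately 6.35 times.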
In this post, we discussed the mechanics of Java JIT compilation and some of the optimisation techniques it uses. We focused in particular on one of the simplest optimisation techniques, method inlining. We demonstrated the performance gain brought by eliminating a method call, represented by the invokevirtual bytecode instruction. Scala also offers a special annotation, @inline, which should help us with the performance aspects of the code under development. All the code for running the experiments is available online on my GitHub account.