Stress testing an asynchronous REST service with Gatling

In this blog post, I am not going to dive deep into application performance testing theory or methodology, but rather focus on the more advanced features of the Gatling performance testing tool, applied to the fairly common task of testing an asynchronous service. The asynchronous service we are going to test in this post, a report generation service, has the following API specification:

  1. A report is requested for a given userId via a POST to the /report endpoint, and the response returns a reportId for tracking the request
  2. Report status is tracked via a GET on /report/{reportId}: a 202 Accepted status code means that report generation is still in progress, while a 5xx status code means an error occurred during generation. When the report is done, the service returns 200 OK together with the generated report, as the hypothetical exchange below illustrates
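
For illustration, a hypothetical exchange (the reportId and payloads are made up for this example) could look like this:

 POST /report        → 202 Accepted, body {"reportId": "42"}
 GET  /report/42     → 202 Accepted  (generation still in progress)
 GET  /report/42     → 200 OK, body contains the generated report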

The Gatling testing tool allows you to model almost any testing scenario. Testing a synchronous service is pretty simple: call the service with the request payload, get the response and assert its content. Measuring performance is similarly simple in this case: the response time of a single call, error rates, etc. Testing an asynchronous service, however, is a bit more difficult, as it cannot be done with a single call. The test scenario needs to follow this structure:

  1. request the service with the given parameters and store the id of that particular request
  2. poll the request status to track progress
  3. download the result once the request is completed

The performance of the service cannot be measured directly, as processing is no longer bounded by a single call. What we would like to measure instead is the processing time from the moment the report is requested until processing is complete.

This test scenario translates nicely into the Gatling DSL:

 val generateReport = scenario("Generate report")
   .exec(requestReport)
   .exec(pollReportGenerationState)
   .exec(getGeneratedReport)
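
For completeness: before the scenario can generate any load, it has to be wired to an injection profile. A minimal sketch, where the base URL and user numbers are my assumptions since the original snippets do not show them:

 val httpConf = http.baseURL("http://localhost:8080") // assumed base URL

 setUp(
   generateReport.inject(rampUsers(50) over (30.seconds))
 ).protocols(httpConf)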

Each test step contains validations that ensure the correctness of the test scenario for a given virtual user. The scenario relies on Gatling sessions, which let us carry user-specific data between test steps. In this particular case we need the user-specific reportId in order to track the request's progress and finally collect the report. The reportId is parsed from the JSON report generation response:

 private val requestReport: HttpRequestBuilder = http("Request report generation")
   .post("/report")
   .queryParamMap(withApiKey)
   .body(StringBody(reportRequestBody(userId, reportParameter)))
   .asJSON
   .check(status is 202)
   .check(jsonPath("$..reportId").ofType[String].exists.saveAs("reportId"))

Probably the most interesting part is polling the report generation state, where we need to keep a virtual user in a polling loop until the report is ready. Again we use the Session, this time in combination with the asLongAs looping operator:

 private def notGenerated(sess: Session): Validation[Boolean] = {
   val reportInProgress = 202
   val generationStatus = sess("generationStatus").asOption[Int].getOrElse(reportInProgress)
   logger.debug(s"Report generation status: $generationStatus")
   generationStatus == reportInProgress
 }

 private val pollReportGenerationState = asLongAs(notGenerated)(
   pause(100.millis)
     .exec(
       http("Poll report generation state")
         .get("/report/${reportId}")
         .queryParamMap(withApiKey)
         .asJSON
         .check(status saveAs ("generationStatus"))
     )
 )

When working with an asynchronous service, another operator worth considering is “tryMax”. Its disadvantage is that failed attempts are counted towards failed requests, which in our case would drastically distort the results, while silencing the request would remove this test step from the report completely. Saving the raw status code, as done above, keeps the polling requests out of the failure statistics while still exposing the state to the loop condition; note that the loop also terminates on a 5xx response, since the saved status is then no longer 202.
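
The final step, getGeneratedReport, has not been shown so far; a minimal sketch, assuming the finished report is downloaded from the same endpoint and the only check we need is the 200 OK status, could look like this:

 private val getGeneratedReport: HttpRequestBuilder = http("Get generated report")
   .get("/report/${reportId}")
   .queryParamMap(withApiKey)
   .asJSON
   .check(status is 200)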

Collecting the report once it is generated is pretty straightforward, as the sketch above shows. In order to collect statistics for the whole report generation scenario – the time until the report is obtained – we need to wrap that part of the scenario in the group combinator, which produces aggregated statistics for the group:

 val generateReport = scenario("Generate report").group("Generation report completion") {
   exec(requestReport)
     .exec(pollReportGenerationState)
     .exec(getGeneratedReport)
 }

In this blog post we used some of the more advanced features of the Gatling performance testing tool to test an asynchronous service, with virtual users sharing important data between test steps via the Session. The complete code snippet can be found in my GitHub gist. If you test asynchronous services differently, or have suggestions for how to support custom metrics in Gatling reports, let me know in the comment section below this blog post.

Testing the BPMS component

In the first part of this series we focused on designing a system that uses BPMS technology to orchestrate workflow in the organisation; in the second we shared useful points from the developer's perspective. In this third part we will focus on the quality aspect of the solution.
I remember a discussion with one of our QA engineers regarding BPMS testing that I want to share. I was asking QA about the requirements on the system, curious which methodology was being used for this component. The answer I got, one I will probably never forget, was: BPMS is a minor part of the system, hence we are not supposed to test it at all. The motivation behind this article is simply the fact that this approach wasn't correct, and to provide some insight into what is going on. There is no ambition to provide a complete methodology or best practices for testing a BPMS component; that is the role of a skilled QA.
BPMS is a solution for orchestrating your in-house business services; simply put, it drives the workflow. BPMS isn't usually the decision maker. Decision-making rules typically need to be flexible and are expected to change frequently, so they should reflect business changes as quickly as possible. Because of that, it is not good practice to hard-code them into processes as a “spaghetti” structure of if-else branches nested several levels deep, which is error-prone and hard to maintain (a simplified sketch of this contrast follows the list below). Those are the reasons for having a separate component responsible for decision making: the BRE (business rule engine). The QA task can then be divided into the following main objectives for functional testing. For given input data, verify:
  • is all the data necessary for making a decision present at the specified point? This can be difficult because of the large number of incoming paths to a decision point: regardless of the execution path taken, you are verifying that all the needed data has been gathered in the system.
  • based on the decision results, are the steps actioned in the correct order? This is verification of the required business process.
  • are the fault recovery procedures working correctly? This means switching the system to fault recovery mode and verifying that all data was stored correctly and completely.
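
To make the spaghetti-code argument concrete, here is a deliberately simplified Scala sketch (the domain types and rules are invented for illustration): the same discount decision, first hard-coded into the process, then looked up in an externalized rule table the way a BRE would evaluate it.

 case class Order(amount: BigDecimal, vip: Boolean, country: String)

 // Hard-coded in the process: every rule change means redeploying the process.
 def discountHardCoded(o: Order): Int =
   if (o.vip) { if (o.amount > 1000) 15 else 10 }
   else if (o.country == "DE") { if (o.amount > 500) 5 else 2 }
   else 0

 // Externalized rules: the table can change without touching the process.
 // A Seq of (predicate, result) pairs stands in for a real rule engine here.
 val discountRules: Seq[(Order => Boolean, Int)] = Seq(
   ((o: Order) => o.vip && o.amount > 1000) -> 15,
   ((o: Order) => o.vip) -> 10,
   ((o: Order) => o.country == "DE" && o.amount > 500) -> 5,
   ((o: Order) => o.country == "DE") -> 2
 )

 def discountFromRules(o: Order): Int =
   discountRules.collectFirst { case (rule, pct) if rule(o) => pct }.getOrElse(0)

The first variant lives inside the process and changes with it; the second keeps the rules as data that can be maintained separately.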
There can surely be more aspects, but these are considered the main ones. The main problem is that these aspects cannot be tested in isolation. By isolation I mean that you cannot apply a standard methodology (black box, white box, etc.) and point it at an arbitrary place in the system. BPMS is a system component that has “memory”: you cannot simply divide the process into arbitrary parts and test them separately. Some systems have “points of synchronization” (points where, regardless of the execution path, the system has a defined data set), but this depends on the design and hence isn't guaranteed.
Let's have a look at the possibilities. The product itself offers a feature called BUnit, an alternative to JUnit in the Java world, which facilitates process unit testing. All invoke activities within the process are mocked and the XML replies are recorded. XML manipulation expressions and data gathering within the flow (aspect 1) can be tested this way by a suitable choice of recorded data, although the tests still take place under artificial conditions. Aspect 3 – fault recovery – can be tested relatively easily with this approach, provided no awkward decisions were made during the design phase. The test analyst is the key role in this process, not to mention the documentation of the system itself. Unit testing of the BRE is a completely separate chapter, not discussed here.
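BUnit itself is product-specific, so I will not guess at its API; as a generic illustration of the recorded-reply idea in Scala (all names invented, and assuming the scala-xml module is on the classpath), the step under test calls a service interface and the test substitutes a stub that replays a canned XML reply:

 // Generic illustration of recorded-reply mocking, not the BUnit API.
 trait PartnerService {
   def invoke(request: String): String // XML request in, XML reply out
 }

 // Stand-in for a mocked invoke activity: replays a recorded XML reply.
 class RecordedReply(cannedXml: String) extends PartnerService {
   def invoke(request: String): String = cannedXml
 }

 // A process fragment gathering data from the reply (aspect 1).
 def creditLimit(service: PartnerService, customerId: String): BigDecimal = {
   val reply = service.invoke(s"<creditCheck><id>$customerId</id></creditCheck>")
   BigDecimal((scala.xml.XML.loadString(reply) \\ "limit").text)
 }

 // The "recorded" data is chosen to exercise the extraction logic.
 val stub = new RecordedReply("<creditCheckReply><limit>5000</limit></creditCheckReply>")
 assert(creditLimit(stub, "42") == BigDecimal(5000))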
Having verified the basic functionality of the building blocks – processes and subprocesses – we can continue with integration testing. Systems of this kind usually have a high degree of integration, so it is really handy to have all back-end systems under your control. Reason no. 1: the system is data-driven, so its behaviour depends on the data in those systems. Reason no. 2: BPMS has “memory” (it is stateful), so if you want to test from a certain point in the process, you have to bring the system to that point, repeatedly and in a well-defined way. The approach used in web application testing – modifying data in the database to bring the order, application, etc. into a certain state – is not sufficient here. Having simulators of the real back-end systems has proved to be really good practice: it isolates your system, and the time needed to localize errors drops significantly. This way you can conduct integration testing of bigger functional blocks, up to end-to-end testing. There is no doubt that a high level of automation is a must.
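
As a sketch of the simulator idea (endpoint and reply invented for illustration), a few lines with the JDK's built-in HTTP server are enough to stand in for a back-end system that always answers in a well-defined, repeatable way:

 import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
 import java.net.InetSocketAddress

 // Minimal back-end simulator: one well-defined reply for every call,
 // so process tests always start from a known state.
 object CreditBureauSimulator extends App {
   val server = HttpServer.create(new InetSocketAddress(9090), 0)
   server.createContext("/credit-check", new HttpHandler {
     def handle(exchange: HttpExchange): Unit = {
       val reply = "<creditCheckReply><limit>5000</limit></creditCheckReply>"
       exchange.getResponseHeaders.add("Content-Type", "application/xml")
       exchange.sendResponseHeaders(200, reply.getBytes.length)
       exchange.getResponseBody.write(reply.getBytes)
       exchange.close()
     }
   })
   server.start()
 }

Pointing the BPMS partner endpoints at simulators like this keeps every test run starting from the same, known back-end state.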