How to optimise a code to be JIT friendly

In the previous blog post we measured the effect of simplest JVM JIT optimisation technique of method inlining. The code example was a bit unnatural as it was super simple Scala code just for demonstration purposes of method inlining. In this post I would like to share a general approach I am using when I want to check how JIT treats my code or if there is some possibility to improve the code performance in regards to JIT. Even the method inlining requires the code to meet certain criteria as bytecode length of inlined methods etc. For this purpose I am regularly using great OpenJDK project called JITWatch which comes with bunch of handy tools in regard to JIT. I am pretty sure that there is probably more tools and I will be more than happy if you can share your approaches when dealing with JIT in the comment section bellow the article.
Java HotSpot is able to produce a very detailed log of what the JIT compiler is exactly doing and why. Unfortunately the resulting log is very complex and difficult to read. Reading this log would require an understanding of the techniques and theory that underline JIT compilation. Free tool like JITWatch process those logs and abstract this complexity away form the user.
<method id='835' holder='832' name='inc' return='722' arguments='722' flags='1' bytes='4' compile_id='25' compiler='C1' level='1' iicount='113184'/>
<dependency type='unique_concrete_method' ctxk='832' x='835'/>
<call method='835' count='100003' prof_factor='1' inline='1'/>
<inline_success reason='inline (hot)'/>
<parse method='835' uses='100003' stamp='0.114'>
<uncommon_trap bci='12' reason='null_check' action='maybe_recompile'/>
<parse_done nodes='150' live='144' memory='42480' stamp='0.114'/>
In order to produce log suitable for JIT Watch investigation the tested application needs to be run with following JVM flags:


those settings will produce log file hotspot_pidXXXXX.log. For purpose of this article I re-used code from previous blog located on my GitHub account with JVM flags enabled in build.sbt.
In order to look into generated machine code in JITWatch we need to install HotSpot Disassembler (HSDIS) to install it to $JAVA_HOME/jre/lib/server/. For Mac OS X that can be used from here and try renaming it to hsdis-amd64-dylib. In order to include machine code into generated JIT log we need to add JVM flag -XX:+PrintAssembly.
[info] 0x0000000103e5473d: test %r13,%r13
[info] 0x0000000103e54740: jne 0x0000000103e5472a
[info] 0x0000000103e54742: mov $0xfffffff6,%esi
[info] 0x0000000103e54747: mov %r14d,%ebp
[info] 0x0000000103e5474a: nop
[info] 0x0000000103e5474b: callq 0x0000000103d431a0 ; OopMap{off=112}
[info] ;*invokevirtual inc
[info] ; - com.jaksky.jvm.tests.jit.IncWhile::testJit@12 (line 19)
[info] ; {runtime_call}
[info] 0x0000000103e54750: callq 0x0000000102e85c18 ;*invokevirtual inc
[info] ; - com.jaksky.jvm.tests.jit.IncWhile::testJit@12 (line 19)
[info] ; {runtime_call}
[info] 0x0000000103e54755: xor %r13d,%r13d
We run the JITWatch via ./
to configure source files and target generated class files

And finally open prepared JIT log and hit Start.

The most interesting from our perspective is TriView where we can see source code, JVM bytecode and native code. For this particular example we disabled method inlining via JVM Flag “-XX:CompileCommand=dontinline, com/jaksky/jvm/tests/jit/

To just compare with the case when the method body of is inlined – native code size is grater 216 compared to 168 with the same bytecode size.
Compile Chain provides also a great view on what is happening with the code
Inlining report provides a great overview what is happening with the code
As it can be seen the effect of tiered compilation as described in JIT optimisation starts with client C1 JIT optimisation and then switches to server C2 optimisation. The same or even better view can be found on Compiler Thread activity which provides a timeline view. To refresh memory check overview of JVM threads. Note: standard java code is subject to JIT optimizations too that’s why so many compilation activity here.
JITWatch is really awesome tool and provides many others views which doesn’t make sense to screenshot all e.g. cache code allocation, nmethodes etc. For detail information I really suggest reading JITWatch wiki pages.  Now the question is how to write JIT friendly code? Here pure jewel of JITWatch comes in: Suggestion Tool. That is why I like JITWatch so much. For demonstration I selected somewhat more complex problem – N Queens problem.
Suggestion tool clearly describes why certain compilations failed and what was the exact reason. It is coincidence that in this example we hit again just inlining as there is definitely more going on in JIT but this window provides clear view into how we can possibly help JIT.
Another great tool which is also a part of JITWatch is JarScan Tool. This utility will scan a list of jars and count bytecode size of every method and constructor. The purpose of this utility is to highlight the methods that are bigger than HotSpot threshold for inlining hot methods (default 35 bytes) so it provides hints where to focus benchmarking to see whether decomposing code into smaller methods brings some performance gain. Hotness of the method is determined by set of heuristics including call frequency etc. But what can eliminate the method from inlining is its size. For sure just the method size it too big breaching some limit for inlining doesn’t automatically mean that method is performance bottleneck. JarScan tool is static analysis tool which has no knowledge of runtime statistics hence real method hotness.
jakub@MBook ~/Development/GIT_REPO (master) $ ./ --mode=maxMethodSize --limit=35 ./chess-challenge/target/scala-2.12/classes/
To wrap up, JITWatch is a great tool which provides insight into HotSpot JIT compilations happening during program execution and it can help you to understand how decision made at the source code level can affect the performance of the program.

JVM JIT compilation as a way of performance optimisation

Previous article structure of JVM – java memory model briefly mentions bytecode executions modes and article JVM internal threads provides additional insight into internal architecture of JVM execution. In this article we focus on Just In Time compilation and on some of its basic optimisation techniques. We also discuss performance impact of one optimisation technique namely method inlining. In the reminder of this article we focus solely on HotSpot JVM however principles are valid in general.
HotSpot JVM is a mixed-mode VM which means that it starts off interpreting the bytecode, but it can compile code into very highly optimised native machine code for faster execution. This optimised code runs extremely fast and performance can be compared with C/C++ code.  JIT compilation happens on method basis during runtime after the method has been run a number of times and considered as a hot method. The compilation into machine code happens on a separate JVM thread and will not interrupt the execution of the program. While the compiler thread is compiling a hot method JVM keeps on using the interpreted version of the method until the compiled version is ready.  Thanks to code runtime characteristics HotSpot JVM can make sophisticated decision about how to optimise the code.
Java HotSpot VM is capable of running in two separate modes (C1 and C2) and each mode has a different situation in which it is usually preferred:
  • C1 (-client) – used for application where quick startup and solid optimization are needed, typically GUI application are good candidates.
  • C2 (-server) – for long running server application
Those two compiler modes use different techniques for JIT compilation so it is possible to get for the same method very different machine code. Modern java application can take advantage of both compilation modes and starting from Java SE 7 feature called tiered compilation is available.  Application starts with C2 compilation which enables fast startup and once the application is warmed up compiler C2 takes over. Since Java SE 8 tiered compilation is a default. Server optimisation are more aggressive based on assumptions which may not always hold. These optimizations are always protected with guard condition to check whether the assumption is correct. If an assumption is not valid JVM reverts the optimisation and drops back to interpreted mode. In server mode HotSpot VM runs a method in interpreted mode 10 000 times before compiling it (can be adjusted via -XX:CompileThreshold=5000). Changing this threshold should be considered thoroughly as HotSpot VM works best when it can accumulate enough statistics in order to make intelligent decision what to compile. If you wanna inspect what is compiled use -XX:PrintCompilation.
Among most common JIT compilation techniques used by HotSpot VM is method inlining, which is practice of substituting the body of a method into the places where the method is called. This technique saves the cost of calling the method. In the HotSpot there is a limit on method size which can be substituted. Next technique commonly used is monomorphic dispatch which relies on a fact that there are paths through method code which belongs to one reference type most of the time and other paths that belong to other type. So the exact method definitions are known without checking thanks to this observation and the overhead of virtual method lookup can be eliminated. JIT compiler can emit optimised machine code which is faster. There is many other optimisation techniques as loop optimisation, dead code elimination, intrinsics and others.
The performance gain by inlining optimisation can be demonstrated on simple Scala code:
class IncWhile {

  def main(): Int = {
    var i: Int = 0
    var limit = 0

    while (limit < 1000000000) {
      i = inc(i)
      limit = limit + 1

  def inc(i: Int): Int = i + 1

Where method inc is eligible for inlining as the method body is smaller than 35 bytes of JVM bytecode (actual size of inc method is 9 bytes). Inlining optimisation can be verified by looking into JIT optimised machine code.


Difference is obvious when compared to machine code when inlining is disabled use  –XX:CompileCommand=dontinline,com/jaksky/jvm/tests/jit/

IncWhile-dontinlineDifference in runtime characteristics is also significant as the benchmark results show. With disabled inlining:

[info] Result "com.jaksky.jvm.tests.jit.IncWhile.main":
[info] 2112778741.540 ±(99.9%) 9778298.985 ns/op [Average]
[info] (min, avg, max) = (2040573480.000, 2112778741.540, 2192003946.000), stdev = 28831537.237
[info] CI (99.9%): [2103000442.555, 2122557040.525] (assumes normal distribution)
[info] # Run complete. Total time: 00:08:03
[info] Benchmark Mode Cnt Score Error Units
[info] IncWhile.main avgt 100 2112778741.540 ± 9778298.985 ns/op

When inlining enabled JVM JIT also capable to use next optimizations like loop optimizations which might case that our whole loop is eliminated as it is easily predictable. We would get time around 3 ns which is for 1GHz processor unreal to perform billion of operations. To disable most of loop optimizations use -XX:LoopOptsCount=0 JVM option.

[info] Result "com.jaksky.jvm.tests.jit.IncWhile.main":
[info] 332699064.778 ±(99.9%) 3485503.823 ns/op [Average]
[info] (min, avg, max) = (316312877.000, 332699064.778, 358738827.000), stdev = 10277087.396
[info] CI (99.9%): [329213560.955, 336184568.600] (assumes normal distribution)
[info] # Run complete. Total time: 00:04:55
[info] Benchmark Mode Cnt Score Error Units
[info] IncWhile.main avgt 100 332699064.778 ± 3485503.823 ns/op
so the performance gain by inlining a method body can be quite significant 2 seconds vs 300 miliseconds.
In this post we discussed mechanics of Java JIT compilation and some optimisation techniques used. We particularly focused on the one of the simplest optimisation technique called method inlining. We demonstrated performance gain brought by eliminating a method call represented by invokevirtual bytecode instruction. Scala also offers a special annotation @inline which should help us with performance aspects of the code under the development. All the code for running the experiments is available online on my GitHub account.