4 devs by devs: Kubernetes interview questions made easy

Introduction to kubernetes

Kubernetes has become the de facto standard runtime environment for modern cloud-native applications. As a developer, you should understand the basic concepts behind it and be able to operate your application reasonably in a DevOps mode. This post aims to explain the basic concepts and provide a solid grounding you can build on, preparing you for day-to-day work and potential interview questions. It is organized as a series of closed topics; understanding each area should allow you to answer a range of questions. Kubernetes is a complex system, so we simplify things. If you want to prepare for the Certified Kubernetes Application Developer (CKAD) exam, this post can be a stepping stone.

What is Kubernetes

Kubernetes is an open-source container orchestration tool for managing multi-container applications (e.g. Docker containers). It is in some sense similar to Docker Swarm and loosely related to docker-compose. Kubernetes was initially developed by Google and released as open source in 2014.

Simplified kubernetes architecture diagram

After reading this section, you should be able to answer the following questions:

  • What are the components of the kubernetes cluster?
  • What are the responsibilities of those components?

We will not go into great detail and only provide high-level context, as this post is aimed at developers rather than Kubernetes administrators.

Simplified kubernetes architecture

Master node

The master node is responsible for the management of the cluster and runs the control plane components. Kube-apiserver exposes the APIs used for management. Controller managers come in two distinct flavours: cloud-related and Kube-related. Cloud controllers embed cloud-specific control logic, while the Kube-related ones take care of Kubernetes cluster internals, e.g. node statuses, job runs, access token creation etc. All cluster data is stored in the etcd key-value store. Kube-scheduler watches for newly created pods and selects nodes to run them on, taking into consideration the various limitations and specifications of a given pod.

Worker Nodes

Worker nodes run the workloads and host the kubelet and kube-proxy. Kube-proxy is a network proxy running on each node in the cluster and is responsible for network communication according to the configuration. The kubelet also runs on each node, and its responsibility is to make sure that pod containers are running and healthy.

Kubernetes basic concepts

After reading this section, you should be able to answer the following questions:

  • What is kubernetes manifest?
  • What is a kubernetes pod?
  • What is a control manager?
  • How does the kubernetes control loop operate?

Kubernetes control loop – the heart of the kubernetes

Kubernetes is based on the modern declarative DevOps principle: you describe the desired state of things rather than the set of steps that leads to that state. At the heart of Kubernetes there is a control loop that constantly evaluates the current state against the desired state and requests the corrections that should lead towards it. The desired state is described by Kubernetes manifests, a bunch of YAML files that describe the state of various kubernetes objects called kinds, e.g. Deployment. Controllers continuously execute the control loop and drive the changes, usually through the apiserver. A controller is typically responsible for one or more kubernetes types/kinds. The basic kind, which acts as the unit of scalability in kubernetes, is called a pod. A pod consists of one or more containers with specified parameters.
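As a minimal illustration of such a manifest, a single-container pod could be described like this (all names and the image are placeholders, not taken from a real deployment):

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # placeholder name
  labels:
    app: demo
spec:
  containers:
    - name: app             # a pod can list one or more containers
      image: nginx:1.21     # example image, replace with your own
      ports:
        - containerPort: 80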

Kubernetes object kinds

After reading this section, you should be able to answer the following questions:

  • What is the difference between service and ingress?
  • How are kubernetes objects organized?
  • What is a daemonset good for?
  • What is the difference between deployment and stateful set?
  • How to store sensitive information in the cluster?
  • What are lower-level kubernetes objects? 

Basic Kubernetes object kinds

A Kubernetes deployment consists of various abstractions over the infrastructure, described in kubernetes as object kinds. They can be divided according to different criteria:

  • High-level kinds vs lower level kinds
  • Stateful vs stateless
  • Ephemeral vs permanent kinds 
basic kubernetes kinds

The image above describes the basic kinds used in a kubernetes deployment. As mentioned earlier, the basic unit of scalability and deployment is a pod. A pod represents a set of containers with the resources granted to them. Because a pod is a transient object that can be freely moved around the cluster, e.g. by re-scheduling, kubernetes provides a service object used as a permanent locator/address within a cluster (it implements the service locator pattern). A service can be backed, for example, by a load balancer. The sole purpose of an ingress is to make pods running in the cluster reachable from the outside world. Ingress is typically implemented by a cloud global HTTPS load balancer and/or an Nginx proxy server or similar. Traffic is further routed, based on the host header, to the appropriate service within the cluster. As the pod is transient, so are its containers. To provide persistent storage to a container, kubernetes offers the persistent volume claim (PVC). A ConfigMap object provides a set of “static” files, e.g. for container configuration purposes, while secrets offer a way to access sensitive information from containers.
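A minimal sketch of a service exposing such pods inside the cluster might look like this (selector, names and ports are illustrative assumptions):

apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: demo           # routes traffic to pods carrying this label
  ports:
    - port: 80          # port exposed by the service
      targetPort: 8080  # port the container actually listens on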

Those lower-level objects can be further abstracted by higher-level kubernetes kinds like deployments or stateful sets, which remove the need to manage replica sets directly. A deployment is primarily aimed at stateless applications, while a stateful set provides ways to maintain pod identity across pod restarts or relocation, e.g. moving storage around or keeping communication to a given pod.
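For illustration, a minimal deployment keeping three replicas of a pod running could be declared as follows (a sketch only; names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 3                  # desired number of pod replicas
  selector:
    matchLabels:
      app: demo
  template:                    # pod template managed by the deployment
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx:1.21    # example image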

Different runtime requirements

To address different application runtime needs, kubernetes introduces a few more concepts: kubernetes jobs and daemon sets. A daemon set is a special kind of deployment whose pod runs on every VM from a node pool. This kind of deployment is typically used for VM metrics collection etc. Kubernetes jobs provide a way to run a one-off task in the kubernetes cluster; for example, Spinnaker custom stages are managed this way. A minimal job manifest is sketched below.
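This sketch assumes a simple one-off shell task; everything here is a placeholder:

apiVersion: batch/v1
kind: Job
metadata:
  name: demo-job
spec:
  backoffLimit: 3              # retries before the job is marked failed
  template:
    spec:
      restartPolicy: Never     # job pods must not restart containers in place
      containers:
        - name: task
          image: busybox:1.36  # example image
          command: ["sh", "-c", "echo hello from a kubernetes job"]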

Object versioning

Nearly every object in kubernetes is versioned, so it is possible to roll back to any previous revision by activating it. Unfortunately, the only exceptions to this rule among the basic objects mentioned above are ConfigMaps and Secrets (if I am not mistaken). Restoring a deployment revision without a corresponding version of its configuration is one of the big shortcomings and, in my opinion, an unlucky design decision. There are techniques for dealing with it.

Interaction with kubernetes cluster, managing kubernetes deployment

After reading this section, you should be able to answer the following questions:

  • What is a key tool for interaction with the cluster?
  • What are the basic operational models/deployments?
  • How do you manage resources?
  • What is the difference between liveness and readiness probe?

Kube control

When interacting with the Kubernetes cluster directly, the key tool is kube control (kubectl). Kubectl is a command-line utility that talks to the cluster api server. It allows you to add or remove deployments, get the current manifest of any object, and so on. It also provides a set of administrative commands.
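A few everyday commands illustrate the typical workflow (resource names are placeholders):

kubectl apply -f deployment.yaml        # create or update objects from a manifest
kubectl get pods -n my-namespace        # list pods in a namespace
kubectl describe deployment my-app      # inspect the state and events of an object
kubectl logs my-pod -c my-container     # read container logs
kubectl rollout undo deployment/my-app  # roll back to the previous revision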

Using kubectl, you can drive a deployment as a rolling upgrade (the default scenario), perform more advanced procedures via replication controllers/replica sets, or simply scale pods up or down (as shown in the example).

kubectl scale --replicas=3 deployment/mysql

Resource management

Every pod specification contains a section of the manifest dedicated to resources. The resource requests section specifies the resources necessary for starting the pod containers, which the scheduler uses to plan the deployment. The resource limits section defines resources that cannot be exceeded; if they are, the pod gets killed (memory) or throttled (CPU). This is the fundamental mechanism for configuring workloads from burstable to guaranteed and one of the things that determine the pod's quality of service class.

    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

Kubernetes probes

Every pod is evaluated in the kubernetes control loop to correct potential differences from the desired state. To make sure that the pod is healthy, kubernetes uses the liveness and readiness probe mechanism. The liveness probe checks whether your application is up and running; if not, kubernetes kills and restarts the pod. The readiness probe checks whether the pod can accept and process new requests. As a software developer, you should provide endpoints that correctly implement the logic behind these probes. Only with correctly implemented endpoints can you achieve zero-downtime deployments.
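The probes are configured per container inside the pod spec; a minimal HTTP-based sketch (paths, ports and timings are illustrative assumptions) looks like this:

    livenessProbe:             # restart the container if this starts failing
      httpGet:
        path: /healthz         # assumed health endpoint exposed by the app
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:            # stop routing traffic to the pod while failing
      httpGet:
        path: /ready           # assumed readiness endpoint
        port: 8080
      periodSeconds: 5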

Kubernetes advanced topics

After reading this section, you should be able to answer the following questions:

  • How do you govern application uptime? ~ pod disruption budget
  • How do you adapt an application to various load scenarios? ~ autoscaling
  • How do you isolate applications from each other?
  • How to achieve an advanced deployment strategy?

Advanced kubernetes kinds

A lower-level concept called a replica set is responsible for scaling out a pod to handle more traffic. This concept is key for basic and more advanced deployment strategies like a rolling upgrade (basic) or blue-green deployment. An extension of this capability is automatic scaling via the horizontal pod autoscaler, where the cluster itself changes the number of replicas for a given pod. Conversely, to guarantee some availability of an application running in pods, you can define a pod disruption budget that describes the rules of how pods may be disrupted. Both are sketched below.
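A horizontal pod autoscaler and a pod disruption budget for the deployment above might be declared like this (a sketch only; targets and thresholds are examples, and API versions vary with the cluster version):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: demo-pdb
spec:
  minAvailable: 1                  # keep at least one pod up during voluntary disruptions
  selector:
    matchLabels:
      app: demo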

Kubernetes deployment isolations

Every object deployed to kubernetes lives in a namespace, which acts as a logical separation of deployments within a single cluster. If a namespace is not specified, the default namespace is used. System namespaces such as kube-system or kube-public are created as well; those are used for deployments supporting kubernetes cluster operations.

To isolate deployments on the network level, you need to specify a network policy. A network policy allows you to restrict how a pod is allowed to communicate with various network entities.
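As a sketch, the following policy only allows ingress traffic to the demo pods from pods labelled role=frontend (labels are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: demo-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: demo                 # pods this policy applies to
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend    # only these pods may connect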

Conclusion

In this post, we briefly touched on all the core Kubernetes concepts that developers should be aware of to build secure, stable and resilient solutions. The post is dense with information, and we often oversimplified concepts to make them digestible for newcomers. Please let me know if I missed some vital concept, oversimplified something that deserves a more detailed explanation, or was not clear enough. You can leave a comment here or reach me on Twitter.


Continuous Integration/Continuous Delivery (Deployment) tools overview

For any software development team, it is critical to deliver value as quickly, safely and reliably as possible. It has been shown that speed of delivery is directly correlated with organisational performance (see, e.g., the State of DevOps report). So the delivery process influences company valuation and is critical for scaling the engineering effort while maintaining the desired quality of the product. How to achieve this is one of the cornerstones of DevOps and SRE. To get an idea of how modern software delivery works in a successful company, see how the delivery pipeline works at AWS.
This blog post is by no means a replacement for deep-dive specialised or best-practice literature, e.g. Continuous Delivery or Continuous Integration, but rather an evaluation of the current tooling landscape, which can help in achieving project goals. Needless to say, a tool alone won't make the magic happen without a correct delivery pipeline design. But in this post, we will focus solely on the tooling.

CI/CD key terms clarification

Continuous integration

Continuous integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project.

Continuous Delivery

Continuous Delivery (CD) is the ability to get changes of all types—including new features, configuration changes, bug fixes and experiments— into production, or into the hands of users, safely and quickly in a sustainable way.

Continuous deployment

Continuous deployment (CD) is a strategy for software releases wherein any code commit that passes the automated testing phase is automatically released into the production environment. It is paramount to software delivery processes. 

The delivery process is critical in any software company. From my perspective and experience, the current state is far from being “solved”, and the number of tools appearing every year confirms that, as does the amount of money spent by VCs. The majority of tools are imperative, while the next big trend seems to be “declarative” CI/CD tooling. I am curious what the future will bring.

CI/CD tools available

A wide variety of tools is available (the list is by no means exhaustive):

Selection criteria

Our selection and evaluation criteria are based on our current and future needs:

  • cost-effective (auto-scaling workers, etc.)
  • cost of maintenance
  • speed of development/ability to contribute
  • manual approval stage
  • ability to pass certification (audit-ability, permission and roles, etc.)
  • multi-cloud support
  • support VMs + kubernetes deployments + potentially serverless
  • ability to integrate Infrastructure as Code to delivery pipeline
  • do not scratch all our development infra (keep in mind cost/benefit ratio)
  • majority of our workloads are running in GCP
  • deals with mono-repo
  • support for long term support (LTS) branches

CI/CD tools shortlist

The following tools made it onto the shortlist for evaluation and a deeper dive. See the dedicated post for each of them:

Summary

Our ideal solution would be tooling provided by our primary cloud provider which meets our current and near-future needs and is fully managed. We partially matched that with a combination of Cloud Build and Spinnaker for GCP, based on a tutorial provided by GCP.
Generally, my impression from studying and evaluating the tools listed is that those claiming “full CI/CD” support are neither great at CI nor at CD and sit somewhere in the middle: they provide a platform and let you code the rest. Another pain point is tackling the monorepo and providing the means to stay efficient. The platforms seem somewhat pricey, and the amount of infra work they save does not obviously justify the cost even when all necessary features are provided. I am curious what Harness will deliver in this space.
I am not promoting the combination we ended up with, but moving away from Concourse CI was a clear win: its missing resource management for stages was a total killer, and the insufficient authorisation and role management plus the absence of manual steps made it clear not to continue that journey. For a fresh new project, GitLab would be a no-brainer to start with. It provides everything needed for development, but when the project grows significantly it can become pricey, and even GitLab motivates you to move partially to your own infrastructure. Needless to say, that setup requires a fair amount of work, especially proxying and creating network waypoints.
If you have some experience with the tools evaluated or disagree with these points, please use the comment section to share your view, and don't forget to like and follow me on Twitter!


Continuous Delivery with Spinnaker – tool evaluation

Spinnaker is one of the popular continuous delivery platforms, originally developed at Netflix. I am evaluating version 1.23.5. Spinnaker is a multi-cloud continuous delivery platform supporting VM- and Kubernetes-based deployments (serverless support is under development). It is an extensible platform and an HA setup is possible. This post is part of a bigger series with a unified structure.

Spinnaker overview:

Spinnaker Architecture
Spinnaker basic concepts (Spinnaker started for VM deployments, Kubernetes concepts mapped to it in provider)
Pipeline stages
– Support for manual Judgement stage, though no detailed permission model for actions (non-OSS plugins exist, e.g. Armory)
– Nesting pipeline supported (either fire and forget or wait for completion)
Custom stages development (Rest call, Kubernetes job or Jenkins job, …)
– Development of new stage

Authentication & Authorisation (Spinnaker security concepts):
Spinnaker Authentication
Spinnaker Authorisation with Role Based Access
– Spinnaker can be accessed through GCP Identity Aware Proxy (or other service on different cloud providers)
– Authentication G-Suite identity provider or GitHub teams. Other options exist as well, see overview here.
– Authorisation with Google Groups (only support flat structure, role = name of the group), GitHub teams , raw mapping or others
Pipelines are versioned automatically
Pipeline triggers
– Concept of providers which integrates pipelines with target platform or cloud providers, e.g. Kubernetes provider v2
– Support for complex deployment strategies
– Management CLI – Halyard (spinnaker configuration) and Spinn for pipeline management
– Deployment to Kubernetes in the form of native manifests, Helm packages transformed in Helm Bake Stage to native manifests (using native Helm support for templating)
– Terraform stage as a custom stage e.g. oss implementation
– Wide variety of notification options
– Monitoring support via Prometheus
Backup configuration to storage

Spinnaker pricing:

– There is no price for Spinnaker itself, only for the resources consumed when deployed
– Requires VMs and Redis or CloudSQL (Postgres)
– Load balancer
Spinnaker for GCP if you are running on GCP, where you pay only for the resources needed.

Resources:

https://spinnaker.io/
https://www.slideshare.net/Pivotal/modern-devops-with-spinnaker-olga-kundzich
https://spinnaker.io/concepts/ebook/

Summary:

A tool focused on CD with manual approval stages and a security model which makes it SOC2 compliant. Good audit-ability is in place (it is possible to integrate with the GCP audit log). For scripted stages and the manual approval stage it is only possible to specify a group, and this is done at the application/pipeline level. The tool eliminates Helm from the kubernetes cluster as it works with native Kubernetes manifests. It promotes immutable infrastructure, as those artefacts are stored for possible rollbacks. Authorisation/authentication seems complex but flexible enough to integrate with a wide variety of systems. Pretty active user group offering help. Pricing is based on the resources used.

Continuous integration/delivery with GitLab – evaluation

GitLab is one of the popular DevOps platforms out there currently. I am evaluating GitLab 13.7 pre-release features. This post is part of a bigger series with a unified structure. The evaluation is done in the context of our existing infrastructure: GitHub + Prometheus + Grafana.

GitLab main components: 

Authentication and Authorisation with GitLab:

GitLab continuous integration and continuous delivery capabilities:

  • You need a runner (self-hosted or shared GitLab Runners), where you can hit limits/costs

Pipeline definition

Pipeline additional features

GitLab Pricing model:

  • Has the concept of minutes in the plan + buying extra ($10 per 1000 min)
  • Pay for storage: $60/10GB – see details
  • Strong push towards Premium at $19/user/month
  • GitLab pricing

Conclusion

I haven’t studied the GitLab offering super profoundly, but for building a new project I would consider starting with it, as it provides complete SDLC support (compared to Spinnaker, which is CD only, GitLab is CI + CD). It acts as SDLC management on top of the cloud provider, providing an easy way to comply with the majority of measures from certifications, e.g. SOC 2, but those are Gold plan features ($99/user/month). This might be pricey, but if you also use it for ticket management, documentation (instead of, e.g., Jira), roadmap tooling, release notes management and the Terraform stage, it seems like a no-brainer!

Challenges you might hit when growing:

  • Pipeline deployment ordering, as pipelines run in parallel
  • Shared runners are small machines; stepping up to registered (self-hosted) runners adds admin/infra work
  • The security model is similar to Spinnaker's; additionally it doesn't allow custom groups, but I guess you can create custom apps (users)
  • Pricing seems scary: in the end runners will probably run on your infra and be registered to the platform; on the other hand, if you manage to stay on shared runners, you need to buy a lot of build minutes
  • Storage cost seems high
  • The Docker registry has a 30-day expiry (which can probably be extended) => you will be uploading to your own GCR

I haven’t studied in deep deployment capabilities:

  • Integration with Helm – probably rendering via helm template and then deploying
  • Support for deployment strategies – requires appropriate kubernetes object manifests, as everywhere
  • Registered kubernetes clusters seem to have an agent running in them
  • Has more or less all the concepts from Spinnaker
  • Has starting support for Terraform (in alpha)

GitLab Potential pain points:

  • Having the whole pipeline in git (including deployment strategy configurations and approvals) might pose challenges when there is no pure trunk-based development – it requires backporting and is harder to oversee.

GitLab is built on top of plenty of open-source projects, so I can imagine that integration between your infrastructure and GitLab might be extensive.

The only reasonable scenario is that you fully migrate to GitLab and reduce extra tooling like Asana, GitHub, Confluence, …; for new projects that might be a no-brainer. Such a migration can be pretty heavy, but you might get some compliance checks for free by having everything in a single workspace.

Resources for more detail GitLab overview

Continuous integration with GCP Cloud Build – evaluation

Cloud Build is one of the services available on the Google Cloud Platform. The evaluation happened in January 2021 and I believe the service is still improving. This post is part of a bigger series with a unified structure.

GCP Cloud Build main features:

  • Even though Cloud Build labels itself as a CI/CD tool, it lacks the CD features (e.g. deployment strategies, manual approval stages etc.) – although nobody prevents you from developing those yourself
  • Runs in GCP and has some support for local execution as well
  • Builds are defined by wiring Docker containers together and executed on a single VM; you can upscale the VM to high-CPU machines with up to 32 CPUs

Cloud Build Continuous Integration features:

Google Cloud Build Pricing:

Conclusion:

A purely CI system with the capability to build (hence Cloud Build). There are no time-based triggers, only event-based (commit, tag, …) or manual ones; a time-based trigger could probably be emulated via a Cloud Function. It has the ability to run locally, which is nice. It scales up to 32-CPU machines. Pricing is based on build time (clock time). It doesn't offer approval stages, the security model is based on IAM, and it seems you cannot grant permissions on a particular configuration/build. It doesn't have the concept of a pipeline, just a set of task steps (stages). The definition lives in Git, so LTS branches should be buildable. To have a full end-to-end deployment, you need a CD system; this one manages just the build artefact.

Continuous integration and delivery with Jenkins – evaluation

The Jenkins evaluation happened in January 2021 and I believe that Jenkins is still improving. This post is part of a bigger series with a unified structure.

Jenkins features overview:


Pipeline definition completely lives in GIT together with code ~> Jenkinsfile
– Support for Jenkinsfile via a Groovy DSL
You can chain the pipelines
– Single pipeline triggered on various branches ~> Multi-branch pipelines (tutorial)
Parallel pipeline stages
– Access to build meta-data (e.g.  build-number, commit hash, …)
Jenkins as a code plugin
– Managing secrets via secrets plugin
Audit trail plugin
Try notifier
– Better UI with Blue Ocean
– Tooling – Jenkins Job Builder (Job builder tutorial)
Pull-request Jenkins pipeline
– Deployment topology – master x slave/agent
Jenkins Helm deployment – seems has autoscaling agents – based on Configuration as a code plugin
– Manual approvals – seem not so straightforward, via the input option
Jenkins on Google Kubernetes Engine

Security model:


– By default there are no roles – everyone has a single view -> plugins needed
GitHub OAuth and here
Role-based authorisation plugin (role strategy plugin) – that probably doesn't work together with GitHub OAuth, but can work with matrix access

Resources:


Jenkins for beginners

The security model has various options and I am not sure how they all fit together, e.g. GitHub OAuth + roles and securities; but there are multiple ways, e.g. the access control matrix.

It has the concept of pipelines and jobs. Pipelines are the next generation, where the definition lives completely in the code base ~> LTS should be ok. It seems to have some basic manual approval stages; the question is how that goes together with authorisation. It has the concept of multi-branch jobs/pipelines = a single definition for a whole bunch of branches, where the definition is dynamically taken from the source.

The CD capabilities are somewhat simplistic – no advanced release strategies like rollback, monitoring etc. Those would probably need to be scripted.

Best practices: Scaling infrastructure as code in the team with terraform

Terraform introduction

HashiCorp Terraform is a popular tool for managing your cloud infrastructure as code (IaC) in a cloud-agnostic way (the same tool for various cloud platforms). Instead of unifying all capabilities of the different cloud platforms, the core concepts are exposed to the end user via the Terraform provider concept. Terraform offers providers for all major cloud vendors as well as other cloud services and technologies, e.g. Kubernetes.

This blog post doesn't aim to be an introduction to Terraform concepts (the official documentation is quite ok) but rather shares experience with using Terraform in a distributed team, tools that come in handy and all the things that make life easier. Even though HashiCorp offers Terraform Enterprise, this option is used quite rarely, at least on small/er projects, so we won't be discussing it here. I openly admit that I have zero experience with this service, so I cannot objectively compare. I will focus solely on the open-sourced part of the project and Terraform version 0.12.x and higher.

Terraform state

Terraform maintains the state of the infrastructure it manages in a state file. The format of the Terraform state file is version dependent, without strict rules on version compatibility (at least I wasn't able to find any that were reliably followed and guaranteed). Managing state files poses two main challenges:
1) managing/sharing state across the team
2) controlling/aligning the Terraform version used across the team

Different teams solve the first aspect differently. Some commit the state file alongside the configuration to the version control system, which is far from ideal as there might be multiple copies of it across the team and it requires coordination. On top of that, the state file contains sensitive information that is impossible to mask and as such doesn't belong in source control. A much better approach is using a Terraform remote backend, which allows a truly concurrent approach; the exact capabilities depend on the concrete implementation used. The backend can be changed from local to remote easily, as the state is migrated automatically. The only limitation is that merging and splitting state files is allowed only for locally managed state.
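A minimal remote backend configuration, sketched here under the assumption of a GCS bucket (the bucket name and prefix are placeholders), could look like this:

terraform {
  backend "gcs" {
    bucket = "my-company-terraform-state"  # placeholder bucket name
    prefix = "envs/production"             # separate state per environment/configuration
  }
}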

Terraform version lock

Terraform version management is centred around providing frictionless version upgrades for the different Terraform configurations, aligned across the team, while assuring that the state file won't get upgraded accidentally. To make sure that your state file won't get upgraded accidentally, put a version restriction into every configuration you manage, e.g.

terraform {
  required_version = "0.12.20"
}

TFENV introduction

To align the team on a uniform Terraform version for every single configuration managed, use a tool for Terraform version management, e.g. tfenv. Put the desired version into a .terraform-version file located in the folder together with the configuration. Tfenv automatically switches to the appropriate version as needed; when a new version is encountered, you need to run tfenv install to download it. If you want to check the versions available:

$ tfenv list
* 0.12.20 (set by /Users/jakub/test/terraform/.terraform-version)
  0.12.19
  0.12.18

As the number of resources or the organisation grows, so does the state file, which leads to increased time for configuration synchronisation and competition for the lock on the remote state file. To increase throughput and allow a team DevOps mode (clear ownership of the solution from end to end), you might want to divide the infrastructure and the associated state files into smaller chunks with clear boundaries. To keep your configuration DRY, hierarchical configuration tools like Terragrunt come to the rescue and reduce repetition.

Terraform code validation and formatting

A growing number of users poses challenges as well as benefits, just as with application code written in, e.g., Java or any other programming language; that is the real motivation behind Infrastructure as Code (IaC). How do you set up standards and best practices on the project? Terraform offers a bunch of embedded tools. To make sure that the code is properly formatted according to the standards, use the fmt utility, which is pluggable into your CI/CD pipeline or pre-commit hooks.

terraform fmt --recursive --check --diff

For your re-usable Terraform modules it is good to make sure they are valid, though this doesn't catch all the bugs as it doesn't check against the cloud APIs, so it doesn't replace integration tests.

terraform validate

Terraform operation

To get an idea of what will change, a diff of your current infrastructure against the proposed changes can easily be obtained via the generated plan:

terraform plan

Static security scanning – infrastructure security

Enforcing security and standards is a lot easier with IaC, as you can use tools like tflint or checkov, which allow writing custom policies. We conclude the tool section with the awesome-terraform list, a great source if you are looking for something specific.
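Both tools are simple to wire into the same pipeline; typical invocations (assuming tflint and checkov are installed) look like this:

tflint          # lint the Terraform configuration in the current directory
checkov -d .    # scan the directory against built-in and custom policies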

Conclusion

In this blog post, we just scratched the surface of Terraform tooling and completely skipped design and testing, which are topics for separate posts. What are your favourite tools? What did you find really handy? Leave a comment, share your tips, or ask me on Twitter.


Java performance tuning: make code JIT friendly

JIT watch introduction

In the previous blog post, we measured the effect of a basic JIT optimisation technique – method inlining (one of several JIT optimisation techniques). The code example was a bit unnatural, as it was super simple Scala code just for demonstrating method inlining. In this post, I would like to share a general approach I use when I want to check how JIT treats my code or whether there is some possibility to improve the code performance with respect to JIT. Even method inlining requires the code to meet certain criteria, such as the bytecode length of the inlined methods. For this purpose, I regularly use a great OpenJDK project called JITWatch, which comes with a bunch of handy JIT-related tools. I am pretty sure there are more tools out there, and I will be more than happy if you share your approaches to dealing with JIT in the comment section below the article.

Java HotSpot JITWatch configurations

Java HotSpot is able to produce a very detailed log of what the JIT compiler is doing and why. Unfortunately, the resulting log is very complex and difficult to read; reading it requires an understanding of the techniques and theory that underlie JIT compilation. A free tool like JITWatch processes those logs and abstracts this complexity away from the user.

In order to produce a log suitable for JITWatch investigation, the tested application needs to be run with the following JVM flags:

-XX:+UnlockDiagnosticVMOptions
-XX:+LogCompilation
-XX:+TraceClassLoading

Those settings produce a log file hotspot_pidXXXXX.log. For the purpose of this article, I re-used code from the previous blog post located on my GitHub account, with the JVM flags enabled in build.sbt.
In order to look into the generated machine code in JITWatch, we need to install the HotSpot Disassembler (HSDIS) into $JAVA_HOME/jre/lib/server/. For Mac OS X a build can be obtained from here; rename it to hsdis-amd64.dylib. In order to include machine code in the generated JIT log, we need to add the JVM flag -XX:+PrintAssembly. A sample of the resulting output:
[info] 0x0000000103e5473d: test %r13,%r13
[info] 0x0000000103e54740: jne 0x0000000103e5472a
[info] 0x0000000103e54742: mov $0xfffffff6,%esi
[info] 0x0000000103e54747: mov %r14d,%ebp
[info] 0x0000000103e5474a: nop
[info] 0x0000000103e5474b: callq 0x0000000103d431a0 ; OopMap{off=112}
[info] ;*invokevirtual inc
[info] ; - com.jaksky.jvm.tests.jit.IncWhile::testJit@12 (line 19)
[info] ; {runtime_call}
[info] 0x0000000103e54750: callq 0x0000000102e85c18 ;*invokevirtual inc
[info] ; - com.jaksky.jvm.tests.jit.IncWhile::testJit@12 (line 19)
[info] ; {runtime_call}
[info] 0x0000000103e54755: xor %r13d,%r13d

JITWatch

We run JITWatch via ./launchUI.sh
JITWATCH_config
and configure the source files and the target generated class files.
JITWatch_configuration

Finally, open the prepared JIT log and hit Start.

The most interesting view from our perspective is the TriView, where we can see the source code, JVM bytecode and native code side by side. For this particular example, we disabled method inlining via the JVM flag “-XX:CompileCommand=dontinline,com/jaksky/jvm/tests/jit/IncWhile.inc”.

JITWatch_notinlined
To compare with the case when the method body of IncWhile.inc is inlined – the native code size is greater (216 bytes compared to 168) with the same bytecode size.
JITWatch-inlined
The Compile Chain also provides a great view of what is happening with the code
JITWatch_compileChain
The Inlining report provides a great overview of what is happening with the code
JITWatch-inlining
Here we can see the effect of tiered compilation as described in JIT compilation: it starts with client (C1) JIT compilation and then switches to server (C2) compilation. The same or even better view can be found in the Compiler Thread activity, which provides a timeline view. To refresh your memory, check the overview of JVM threads. Note: standard java library code is subject to JIT optimisations too, which is why there are so many compilation activities here.
JITWatch_compilerThreads
JITWatch is a really awesome tool and provides many other views which don't make sense to screenshot here, e.g. code cache allocation, nmethods etc. For detailed information, I really suggest reading the JITWatch wiki pages. Now the question is: how do you write JIT-friendly code? Here the pure jewel of JITWatch comes in: the Suggestion Tool. That is why I like JITWatch so much. For demonstration, I selected a somewhat more complex problem – the N Queens problem.
JITWatch_suggestion
The Suggestion Tool clearly describes why certain compilations failed and what the exact reason was. It is a coincidence that in this example we again hit just inlining – there is definitely more going on in JIT – but this window provides a clear view of how we can possibly help JIT.
Another great tool which is also part of JITWatch is the JarScan Tool. This utility scans a list of jars and counts the bytecode size of every method and constructor. Its purpose is to highlight the methods that are bigger than the HotSpot threshold for inlining hot methods (35 bytes by default), so it provides hints on where to focus benchmarking to see whether decomposing code into smaller methods brings some performance gain. The hotness of a method is determined by a set of heuristics including call frequency etc., but what can eliminate a method from inlining is its size. Of course, the mere fact that a method is too big and breaches some inlining limit doesn't automatically mean that it is a performance bottleneck. The JarScan tool is a static analysis tool which has no knowledge of runtime statistics, hence no knowledge of real method hotness.
jakub@MBook ~/Development/GIT_REPO (master) $ ./jarScan.sh --mode=maxMethodSize --limit=35 ./chess-challenge/target/scala-2.12/classes/
"cz.jaksky.chesschallenge","ChessChallange$","delayedEndpoint$cz$jaksky$chesschallenge$ChessChallange$1","",1281
"cz.jaksky.chesschallenge.solver","ChessBoardSolver$","placeFigures$1","scala.collection.immutable.List,scala.collection.immutable.Set",110
"cz.jaksky.chesschallenge.solver","ChessBoardSolver$","visualizeSolution","scala.collection.immutable.Set,int,int",102
"cz.jaksky.chesschallenge.domain","Knight","check","cz.jaksky.chesschallenge.domain.Position,cz.jaksky.chesschallenge.domain.Position",81
"cz.jaksky.chesschallenge.domain","Queen","equals","java.lang.Object",73
"cz.jaksky.chesschallenge.domain","Rook","equals","java.lang.Object",73
"cz.jaksky.chesschallenge.domain","Bishop","equals","java.lang.Object",73
"cz.jaksky.chesschallenge.domain","Knight","equals","java.lang.Object",73
"cz.jaksky.chesschallenge.domain","King","equals","java.lang.Object",73
"cz.jaksky.chesschallenge.domain","Position","Position","int,int",73
"cz.jaksky.chesschallenge.domain","Position","equals","java.lang.Object",72

Conclusion

To wrap up, JITWatch is a great tool which provides insight into the HotSpot JIT optimisations happening during program execution, and it can help you understand how a decision made at the source code level can affect the performance of the program. If you want to share your experience and tips, please use the comment section below or find me on Twitter.

List of useful collaboration tools for distributed teams

During the past several years, working habits and working styles have been rapidly changing, and this trend will surely continue – just visit Google Trends and search for “digital nomad” or “remote work”. Some professions undergo this change more easily than others, but it is clear that companies that understand the trend benefit from it.

Working in a different style requires a brand new set of tools and approaches which provide you with working conditions similar to when people are co-located in the same office. Video conferencing and phone or Skype calls are just the beginning and don't cover all aspects.

In the following paragraphs, I am going to summarise the tools I found useful while working remotely as a software developer in a fully distributed team. These tools are either free or offer some free functionality, and I still consider them very useful in various situations. The spectrum ranges from project management and planning tools to communication tools.

Remote Communication tools

For communication – chat and calls – Slack has become a widely adopted standard tool. It allows you to freely organise your teams and let them create the channels they need. It supports a wide range of plugins, e.g. chatbots, and it is well integrated with other tools. It provides applications for all desktop and mobile platforms.

slack

When solving an issue or presenting something to an audience, screen sharing becomes a very handy tool. I found Join.me pretty handy; the free plan with a limited audience size was just big enough. It works well on Mac OS and Windows; I haven't tried the Linux platform yet.

joinme

When it comes to pure conference calls or chat, Discord recently took my breath away with its awesome sound quality. Again, it offers desktop and mobile clients, plus you can use the browser version if you do not wish to install anything on your PC.

discord

Planning and designing tools for remote collaboration

Now I move on to planning and design tools used during the software development process; it doesn't matter whether you use Scrum or Kanban, they have their place in both.

Remote planning tools

A shared task board with post-it notes. The one I found useful and free is Scrumblr. The only disadvantage is that it is public. It allows you to design the number of sections, change the colours of the notes, add markers to them etc.

scrumblr

Since we have touched on agile development methodology: there is no planning and estimation without planning poker. I found BitPoints useful – a simple, free online tool meeting all our needs, where you invite all participants to the game. It allows various settings like the type of deck etc.

bitpoints

Virtual Whiteboard

When the design phase comes, a shared online diagramming tool we found really useful is Sketchboard. It offers a wide range of diagram types and shapes, including traditional UML diagrams. The free version offers a few private diagrams; otherwise you go public with your designs. It allows comments and team discussion.

sketchboard

Sometimes we just missed a traditional whiteboard session for brainstorming, so the web whiteboard tool AWW met our needs. Simple yet powerful.

aww

Conclusion

This concludes the set of tools I found useful during the past year while working remotely in a distributed team. I hope that you found at least one of them useful or new. Do you have other tools you found useful, or better alternatives to those mentioned above? Please share them in the comment section!

Quick tutorial: Apache Kafka as a foundation of modern data stream platform

Working on the next project with the again awesome Apache Kafka, and again fighting a fundamental misunderstanding of the philosophy of this technology, which usually comes from previous experience with traditional messaging systems. This blog post aims to make the mindset switch as easy as possible, to explain where this technology fits in, what pitfalls to be aware of and how to avoid them. On the other hand, this article doesn't try to cover everything or go into much detail.

How Kafka compares to traditional messaging systems

Apache Kafka is a system optimised for writes – essentially to keep up with whatever speed or volume producers send at. The technology can be configured to meet any required parameters; that is one of the motivations behind naming it after the famous writer Franz Kafka. If you want to understand the philosophy of this technology, you have to take a look with fresh eyes and forget what you know from JMS, RabbitMQ, ZeroMQ, AMQP and others. Even though the usage patterns are similar, the internal workings are completely different – the opposite, in fact. The following table provides a quick comparison:

JMS, RabbitMQ, …                     | Apache Kafka
Push model                           | Pull model
Persistent message with TTL          | Retention policy
Guaranteed delivery                  | Guaranteed “consumability”
Hard to scale                        | Scalable
Fault tolerance – active/passive     | Fault tolerance – ISR (In Sync Replicas)

Kafka queue

Core ideas in Apache Kafka come from the RDBMS world. I wouldn't describe Kafka as a messaging system but rather as a distributed database commit log which, in order to scale, can be partitioned. Once the information is written to the commit log, everybody interested can read it at their own pace and responsibility. It is the consumer's responsibility to read it, not the system's responsibility to deliver the information to the consumer. This is the fundamental twist. Information stays in the commit log for a limited time given by the retention policy applied, and during this period it can be consumed multiple times by consumers. As the system has a reduced set of responsibilities, it is much easier to scale. It is also really fast, as sequential reads from disk are comparable to random access memory reads thanks to effective file system caching.

kafkaoffsets

Kafka scalability

A topic partition is the basic unit of scalability when scaling out Kafka. A message in Kafka is a simple key-value pair represented as byte arrays. When a message producer sends a message to a Kafka topic, a client partitioner decides to which topic partition the message is persisted, based on the message key. It is a best practice that messages belonging to the same logical group are sent to the same partition, as that guarantees clear ordering. On the client side, the exact position of the client is maintained per topic partition for the assigned consumer group. So point-to-point communication is achieved by clients using exactly the same consumer group id when reading from the topic partition, while publish-subscribe is achieved by using a distinct consumer group id for each client. The offset is maintained per consumer group id and topic partition and can be reset if needed.

kafkacommunication

Topic partitions can be replicated zero or n times and distributed across the Kafka cluster. Each topic partition has one leader and zero or n followers, depending on the replication factor. The leader maintains the so-called In Sync Replicas (ISR), defined as replicas whose delay behind the partition leader is lower than replica.lag.time.max.ms. Apache Zookeeper is used for keeping metadata and offsets.

kafkacluster

Kafka defines fault tolerance in the following terms:
  • acknowledged – the broker acknowledges the message write to the producer
  • committed – the message is written to all ISR and consumers can read it
While the producer sends messages to Kafka, it can require different levels of consistency (see the sketch after this list):
  • 0 – the producer doesn't wait for confirmation
  • 1 – wait for acknowledgement from the leader
  • ALL – wait for acknowledgement from all ISR ~ message commit
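In the Java producer API these levels map to the acks setting; a minimal sketch follows (broker address, topic and key are placeholders, not taken from a real system):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                       // "0", "1" or "all" ~ wait for all ISR

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // messages with the same key always land in the same partition
            producer.send(new ProducerRecord<>("demo-topic", "order-42", "payload"));
        }
    }
}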

Apache Kafka configuration options

Apache Kafka is quite flexible in configuration and as such can meet many different requirements in terms of throughput, consistency and scalability. Replication of topic partitions brings read scalability on the consumer side but also poses some risk, as it adds a level of complexity. If you are unaware of the corner cases, it might lead to nasty surprises, especially for newcomers. So let's take a closer look at the following scenario.

Losing messages scenario

We have a topic partition with a replication factor of 2. The producer requires the highest consistency level, ack = all. Replica 1 is currently the leader. Message 10 is committed and hence available to clients. Message 11 is neither acknowledged nor committed due to the failure of replica 3. Replica 3 will be eliminated from the ISR or put offline, which causes message 11 to become acknowledged and committed.

kafka_uc1

The next time we lose Replica 2, it is eliminated from the ISR and the same situation repeats for messages 12 and 13.
kafka_uc2.png
The situation can get a lot worse if the cluster loses the current partition leader – Replica 1 is down now.
kafka_uc3
What happens if Replica 2 or Replica 3 comes back online before Replica 1? One of those becomes the new partition leader and we lose messages 12 and 13 for sure!
kafka_uc4

Is that a problem? Well, the correct answer is: it depends. There are scenarios where this behaviour is perfectly fine, for example collecting logs from all machines by sending them through Kafka. On the other hand, if we implement event sourcing and we just lost some events, we cannot recreate the application state correctly – yes, we have a problem! Unfortunately, unless that has changed in the latest releases, this is the default configuration when you install a fresh Kafka cluster. It is a setup which favours availability and throughput over other factors. But Kafka also allows you to set it up so that it meets your consistency requirements, sacrificing some availability in order to achieve that (CAP theorem). To avoid the described scenario, you should use the following configuration: the producer should require acknowledgement level ALL; do not allow Kafka to perform a new leader election for dirty replicas – set unclean.leader.election.enable = false; use a replication factor of 3 (default.replication.factor = 3) and require the minimal number of in-sync replicas to be higher than 1 (min.insync.replicas = 2).
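Put together, a broker configuration biased towards consistency would contain the following settings (a sketch of server.properties values, assuming a cluster of at least three brokers):

# server.properties – favour consistency over availability
unclean.leader.election.enable=false   # never elect an out-of-sync replica as leader
default.replication.factor=3           # replicate every new topic three times
min.insync.replicas=2                  # a write is committed only with at least 2 in-sync replicas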

Message delivery guarantees

We already briefly touched on the topic of message delivery to the consumer. Kafka doesn't guarantee that a message was delivered to all consumers; it is the responsibility of the consumers to read messages. So there are no semantics of a persistent message as known from traditional messaging brokers. All messages sent to Kafka are persisted, meaning available for consumption by clients according to the retention policy. The retention policy essentially specifies how long a message will be available in Kafka. Currently, there are two basic concepts – a limit on the space used for keeping messages and the time for which a message should be available at minimum. The one which gets violated first wins.

Data cleanup

When Kafka needs to clean up data (triggered by the retention policy), there are two options. The simplest one just deletes messages. The other option is to compact messages: compaction is a process that keeps, for each message key, just one message, usually the latest one. That is actually the second semantics of the key used in a message.

Kafka “missing” features

What features can you not find in Apache Kafka compared to traditional messaging technologies? Probably the most significant is the absence of any selector in combination with listening (wake me on receive). It can of course be emulated via a correlation id, but the efficiency is on a completely different level: you have to read all messages, deserialise them and filter. A traditional selector uses a custom field in the message header and doesn't even need to deserialise the message payload, which is a very different cost. Monitoring Kafka in a production environment essentially boils down to one elementary question: are the consumers fast enough? Hence, monitor consumer offsets with respect to the retention policy.

Kafka was created at LinkedIn to solve a specific problem of modern data-driven applications: to fill the gap in traditional ETL processes, which usually work with flat files and DB dumps. It is essentially an enterprise service bus for data, where software components need to exchange data heavily. It unifies and decouples data exchange among components. Typical uses are in “BigData” pipelines together with Hadoop and Spark in a lambda or kappa architecture. It lays down the foundations of modern data stream processing.

Conclusion

This post just scratches the basic concepts of Apache Kafka. If you are interested in the details, I really suggest reading the following sources, which I found quite useful on my way when learning Kafka:

How to search fully qualified java class name for jar file

I am pretty sure that every java developer has been in the situation of searching for the java archive file that contains a class with a given fully qualified name.

Java fully qualified class name

A fully qualified name is enough info to get this kind of issue resolved. You can either take advantage of sites like http://www.findjar.com/ or IDE features – search for class. Those approaches work well when the missing class comes from open source or at least publicly available jar libraries. If the jar is already in your project but just missing from the classpath, then the second approach is applicable. But there is a vast number of cases when you are searching for a library from a vendor-specific product which consists of a huge number of jar files. One way to find the class is to import all those libs into the IDE and then look up the required class, but this approach is a bit awkward. A more straightforward approach is to search through the product's filesystem directly. One handy bash script follows – in this case searching for com.oracle.pitchfork.interfaces:

for i in $(find . -name "*.jar")
do
  # list the archive contents and search for the class
  if "$JAVA_HOME"/bin/jar -tvf "$i" | grep -qi com.oracle.pitchfork.interfaces; then
    echo "$i"
  fi
done

Run this bash from the product’s root folder – all jars containing required class will be listed.

Java class version

From time to time it might happen that you need to know which version the class files were compiled for – or, to be more specific, what target was specified when running the javac compiler, as the target specifies the VM version the classes are generated for. In maven this can be specified as follows:

<plugin>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <target>1.6</target>
  </configuration>
</plugin>

It is not rocket science, right? To find out the version the code was generated for, we use javap (the java class file disassembler). The following line does the trick:

javap -verbose -classpath versiontest-1.0.jar cz.test.string.StringPlaying

Compiled from "StringPlaying.java"
public class cz.test.string.StringPlaying extends java.lang.Object
SourceFile: "StringPlaying.java"
minor version: 0
major version: 50
Constant pool:
const #1 = Method       #12.#28;        //  java/lang/Object."<init>":()V
const #2 = String       #29;            //  beekeeper
const #3 = Method       #30.#31;        //  java/lang/String.substring:(II)Ljava/lang/String;

The major version matches the java version based on the following table (e.g. major version 50 corresponds to Java 6).


Table taken from Oracle blog