Software Deployment – Java applications as an RPM Linux package

Java application archives such as jar, war and ear files are the elementary distribution blocks in the Java world. In the beginning, managing all of these libraries and components was a bit cumbersome and error prone, as project dependencies depend on other libraries and all those transitive dependencies create the so-called dependency hell. To ease developers of this burden, Apache Maven (and Maven-like tools) was developed. Every artefact has so-called coordinates which uniquely identify it, and all dependencies are resolved via those coordinates in a recursive fashion.
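For illustration, coordinates are usually written as groupId:artifactId:version; a hypothetical dependency tree of such an artefact might look like this (all names are made up):

$ mvn dependency:tree
[INFO] com.example:processor:jar:1.0.2
[INFO] +- org.apache.commons:commons-lang3:jar:3.1:compile
[INFO] \- log4j:log4j:jar:1.2.17:compile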

Maven eases the management at the development stage of the artefact but doesn't help that much when we want to deploy the application component. Often that's not a big deal if your runtime environment is a clustered J2EE application server, e.g. a WebLogic cluster. You hand the ear or war over to your ops team and they deploy it to all nodes of the cluster at once via the cluster management console. They need to maintain an archive of deployed components in case of a roll back, etc. This is the simplest case (an isolated component; it doesn't solve dependencies such as libraries provided in the cluster) where management is relatively clean but relies heavily on the process. When we consider a different runtime environment, such as running the application as a standalone Java process (as opposed to a J2EE cluster), things get a bit more complicated even in the simplest case. Your Java application is typically distributed as a jar file and you need to distribute it to every single Linux server where an instance of this process runs. Apart from that, a standard jar file doesn't contain its dependencies. One possible solution would be to create a shaded (fat) jar file which has all dependencies embedded. I suppose that you have a repository where all builds are archived. Does it make sense to store those big archives where the major part is 3rd party libraries? This is probably not the right way to go.

Another aspect of the roll out process is the ability to automate it. For J2EE clusters like WebLogic there is often a scripting tool provided (WLST ~ WebLogic Scripting Tool). The land of the pure jar is again a lot worse. You can take some advantage of Maven, but that doesn't solve all the problems. The majority of production environments in the Java world run on the Linux operating system, so why not try to take advantage of the standard Linux distribution management tools like yum, apt etc. for distributing RPM packages? This system provides atomicity, dependency management between packages, an easy way to roll back (it keeps track of versions), it minimises the number of manual steps – the potential for human error is reduced – and it involves native auditing. It is pretty easy to get information about the installation history.
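As a rough illustration of the day-to-day operations this buys you (the package name processor is hypothetical and the repository is assumed to be configured already):

# install, upgrade or roll back the application package
yum install processor
yum upgrade processor
yum downgrade processor
# audit what is installed and when it was installed
rpm -q processor
rpm -qa --last | head
yum history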

To pack your Java jar application you need a tool called rpmbuild, which creates a Linux package from a SPEC file. The SPEC file is something like a pom in the Maven world, plus it contains instructions on how to install, uninstall, upgrade etc. Packages containing the required plus handy tools are rpmdevtools and rpmlint. On a Linux OS it is simple to install them; on Windows you need Cygwin with the same tool set. In order to build your rpm workspace, run the following command. It is highly recommended not to run it under the root account unless there is a special need for it.

rpmdev-setuptree

This command creates the rpmbuild folder – that is the place where all RPM packaging will happen. It contains the sub-folders BUILD, RPMS, SOURCES, SPECS and SRPMS. For us the important ones are RPMS, which will contain the final RPM, and SPECS, which is where we need to put our SPEC file describing the installation and the content of our application.
This file is the core of RPM packaging. It contains all the information about the version, dependencies, installation, un-installation, upgrade etc. We can create a skeleton SPEC file by running the following command:

rpmdev-newspec

The majority of directives in this file are clear from their names, e.g. Name, Version, Summary, BuildArch etc. BuildRoot requires special attention. It is a sort of proxy which mimics the root of the system under construction, e.g. if I want to install my [application_name] (replace this placeholder with the actual name) to the /usr/local/[application_name] location, I have to create this structure under BuildRoot during the installation. Then there are the sections which correspond to the various phases of installation: %prep, %build and %install – the last one being the most important for us, as we do not build from sources but just pack an already built jar file into the RPM package. Another very important section is %files, which lists all files that will be in the final RPM package and hence installed on the target machine. Apart from that, there can be additional hooks into the installation and un-installation process such as %post, %preun, %postun etc., which allow you to customise the process as you need. A sample SPEC file follows:

%define _tmppath /home/virtual/rpmbuild/tmp
Name: [application_name]
Version: 1.0.2
Release: 1%{?dist}
Summary: Processor component which feeds data into the DB
Group: Applications/System
License: GPL
URL: https://jaksky.wordpress.com/
BuildRoot: %{_topdir}/%{name}-%{version}-%{release}-root
BuildArch: noarch
Requires: jdk >= 7
%description
Component which processes incoming messages and stores them in the DB.

%prep

%build

%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT/usr/local
cp -r %{_tmppath}/[application_name] $RPM_BUILD_ROOT/usr/local
mkdir -p $RPM_BUILD_ROOT/usr/local/[application_name]/logs
mkdir -p $RPM_BUILD_ROOT/etc/init.d
cp -r %{_tmppath}/[application_name]/bin/[application_name] $RPM_BUILD_ROOT/etc/init.d
mkdir -p $RPM_BUILD_ROOT/var/run/[application_name]

%files
%defattr(644,[application_name],[application_name])
%dir %attr(755,[application_name],[application_name]) /usr/local/[application_name]
%dir %attr(755,[application_name],[application_name]) /usr/local/[application_name]/lib
/usr/local/[application_name]/lib/*
%attr(755,[application_name],[application_name]) /usr/local/[application_name]/logs
%dir %attr(755,[application_name],[application_name]) /usr/local/[application_name]/conf
%config /usr/local/[application_name]/conf/[application_name]-config.xml
%config /usr/local/[application_name]/conf/log4j.properties
%dir %attr(755,[application_name],[application_name]) /usr/local/[application_name]/deploy
/usr/local/[application_name]/deploy/*
%doc /usr/local/[application_name]/README.txt
%dir %attr(755,[application_name],[application_name]) /usr/local/[application_name]/bin
%attr(755,[application_name],[application_name]) /usr/local/[application_name]/bin/*
%attr(755,root,root) /etc/init.d/[application_name]
%dir %attr(755,[application_name],[application_name]) /var/run/[application_name]

%changelog
* Wed Nov 13 2013 Jakub Stransky <Jakub.Stransky@jaksky.com> 1.0.2-1
- Bug Fixing wrong messages format
* Wed Nov 13 2013 Jakub Stransky <Jakub.Stransky@jaksky.com> 1.0.1-1
- Bug Fixing wrong messages format
* Mon Nov 11 2013 Jakub Stransky <Jakub.Stransky@jaksky.com> 1.0.0-1
- First release of [application_name]

Several things to highlight in the SPEC file example: _tmppath points to the location where the installed application is prepared – that is essentially what is going to be packed into the RPM. %defattr sets the default attributes for files if no specific ones are given. %config denotes configuration files, which means that on the first installation the packaged defaults are provided, but on upgrade those files should not simply be overwritten, as they have probably been customised for the particular instance (marking them %config(noreplace) keeps the customised files in place and writes the new defaults as .rpmnew).
Now we are ready to create the RPM package; just the last step is pending:

rpmbuild -v -bb --clean SPECS/nameOfTheSpecFile.spec

The created package can be found in the RPMS subfolder. We can test the package locally with:

rpm -i nameOfTheRpmPackage.rpm
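Once installed, the package content and the files marked as %config can be inspected as well (the name is the one from the Name directive):

rpm -ql [application_name]     # list all installed files
rpm -qc [application_name]     # list the configuration files
rpm -V [application_name]      # verify installed files against the package metadata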

To complete the smoke test, let's remove the package with:

rpm -e nameOfTheApplication

Creating a SPEC file should be a pretty straightforward process, and once you have created the SPEC file for your application, building the RPM package is a one-minute job. If you want to automate it, there is a Maven plugin which generates a SPEC file for you. It is essentially a wrapper around the rpmbuild utility, which means the plugin works fine on Linux with the tool set installed, but on a Windows machine you need Cygwin installed and a wrapper bat file that mimics the rpmbuild utility for the plugin. A detailed manual can be found for example here.
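Assuming the plugin in question is the Codehaus rpm-maven-plugin configured in the pom, the packaging can then be chained into the regular build, roughly like this:

mvn clean package rpm:rpm
# the resulting package typically ends up under target/rpm/[application_name]/RPMS/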

A couple of things to highlight when creating a SPEC file: prepare the package for all scenarios – install, remove, upgrade and configuration management – right from the beginning, and test it properly. It can save you a lot of trouble and manual work in case of large installations. Creating a new version of the Java application is then only about replacing the jar file and re-packaging the RPM bundle.

In this quick walk-through I tried to show that creating an RPM package as the unit of software deployment for a Java application is not that difficult and can neaten the roll out process. I have just scratched the surface of RPM packaging and was far from showing all the capabilities of this approach. I will conclude this post with several links which I found really useful.

Great tutorial on RPM packaging in general
Good rpmbuild manual pages
Maven rpm plugin
Maximum RPM book

Hadoop High Availability strategies

Scalability, availability, resilience – these are common examples of computer system requirements which shape the overall application architecture very strongly and have a direct impact on “indicators” such as customer satisfaction, revenue, cost, etc. The weakest part of the system has the major impact on those parameters. The topic of this post, availability, is defined as the percentage of time that a system is capable of serving its intended function.

In the BigData era Apache Hadoop is a common component of nearly every solution. As system requirements shift from purely batch-oriented systems to near-real-time systems, this only adds pressure on system availability. Clearly, if the system runs in batch mode every midnight, then two hours of downtime is not such a big deal, as opposed to near-real-time systems where a result delayed by 10 minutes is pointless.

In this post I will try to summarise the Hadoop high availability strategies, as complete and ready-to-use solutions, that I encountered during my research on this topic.

In Hadoop 1.x it is a well-known fact that the Name Node is a single point of failure, and as such all high availability strategies try to cope with that – to strengthen the weakest part of the system. Just to clarify a widely spread myth: the Secondary Name Node isn't a backup or recovery node by nature. It has different tasks than the Name Node, BUT with some changes the Secondary Name Node can be started in the role of the Name Node. However, this neither works automatically, nor was it the original role of the SNN.

High availability strategies can be categorised by the state of the standby: hot/warm standby or cold standby. This has a direct correlation to the failover (start-up) time. To give a rough idea (according to the documentation): for a cluster with 1500 nodes and petabyte capacity, the start-up time is close to one hour. Start-up consists of two major phases: restoring the metadata, and then every node in the HDFS cluster needs to report its block locations.
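During that second phase the name node sits in safe mode; whether it has already left it can be checked from the command line, roughly like this:

hadoop dfsadmin -safemode get
# prints "Safe mode is ON" until enough block reports have been received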

The typical solution for Hadoop 1.x makes use of NFS and a logical group of name nodes. Some resources claim that in case of NFS unavailability the name node process aborts, which would effectively stop the cluster. I couldn't verify that fact in other sources of information, but I feel it is important to mention it. Writing name node metadata to NFS needs to be exclusive to a single machine in order to keep the metadata consistent. To prevent collisions and possible data corruption, a fencing method needs to be defined. The fencing method assures that if the name node isn't responsive, it is really down. In order to have real confidence, a sequence of fencing strategies can be defined, and they are executed in order. Strategies range from a simple ssh call to a power supply controlled over the network. This concept is sometimes called “shoot the other node in the head”. The failover is usually manual but can be automated as well. This strategy works as a cold standby, and Hadoop vendors typically provide this solution in their high availability kits.

Because of the relatively long start-up time of a backup name node, some companies (e.g. Facebook) developed their own solutions which provide a hot or warm standby. Facebook's solution to this problem is called the avatar node. The idea behind it is relatively simple: every node is wrapped in a so-called avatar (no change to the original code base needed!). The primary avatar name node writes to a shared NFS filer. The standby avatar node consists of a secondary name node and a backup name node. This node continuously reads the HDFS transaction logs and keeps feeding those transactions to the encapsulated name node, which is kept in safe mode to prevent it from performing any active duties. This way all name node metadata are kept hot. The avatar in standby mode performs the duties of the secondary name node. Data nodes are wrapped in avatar data nodes which send block reports to both the primary and the standby avatar node. Failover time is about a minute. More information can be found here.

Another attempt to create a Hadoop 1.x hot standby, coming from the China Mobile Research Institute, is based on running synchronisation agents and a sync master. This solution raises further questions and it seems to me that it isn't as mature and clear as the previous one. Details can be found here.

The ultimate solution to high availability is brought by Hadoop 2.x, which removes the single point of failure through a changed architecture. Resource management is now handled by YARN (Yet Another Resource Negotiator), also called MapReduce 2. For HDFS high availability there are two options for sharing the name node state – an NFS filer or the Quorum Journal Manager (QJM) – with ZooKeeper used for coordination and automatic failover. These architectural changes provide the option of running two redundant NameNodes in the same cluster in an active/passive configuration with a hot standby.
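With HDFS HA configured, the state of the two NameNodes can be checked and a failover triggered from the command line; a minimal sketch (nn1 and nn2 stand for the configured NameNode IDs):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# manually initiate a failover from nn1 to nn2
hdfs haadmin -failover nn1 nn2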

This post just scratches the surface of Hadoop high availability and doesn't go deep into the details of the individual daemons, but I hope it is a good starting point. If any of the readers are aware of some other possibility, I am looking forward to seeing it in the comment section.

Java application as a Linux service

Using standard J2EE containers for application deployment is not always a suitable option. From time to time you need to run a Java application (jar file) as a standalone, more lightweight Linux process. Using the standard java -cp … MainClass is feasible, but sooner or later you will find that something important is missing, especially if you are supposed to run multiple components this way. It becomes really messy and hard to manage pretty soon. On a Linux system there is a solution which is a lot better – run the component as a Linux service.
Let's make it simple and easy to understand. A Linux service is essentially a “process” which is driven by an init script and has a defined API – a set of standard commands for managing the underlying Linux process. Those service commands look as follows (processor represents the actual name as defined in the init script, see later):

service processor start
service processor status
service processor stop
service processor restart

That’s a lot simpler, easy to manage and monitor, right? You don’t need to know where particular jar file is located etc. Examples of init scripts can be usually located /etc/init.d/samples or just simply read scripts in /etc/init.d which contains various init scripts for different kinds of linux services already present on the system.
For java applications there is a bunch of projects which acts as a service wrappers. That enables you to quickly and easily turn jar file to regular linux service as a program daemon. There are wrappers even for windows OS. For some reasons I was directed to use just linux server standard tools so the reminder of this post will be about making the linux service program daemon in a common way via shell scripts.
First of all there is a necessity to create startup and shutdown script with a need to properly manage pid (process id) file accordingly. A good practice is to have a dedicated user to run a particular linux services and have them installed under /usr/local/xxx .
startup script follows:

#!/bin/bash
#
# Script parameters: [Installation_Folder]
#
# JAVA_HOME Must point at your Java Development Kit installation.
#           Required to run with the "debug" argument.
#
# JRE_HOME  Must point at your Java Runtime installation.
#           Defaults to JAVA_HOME if empty. If JRE_HOME and JAVA_HOME
#           are both set, JRE_HOME is used.
#
# JAVA_OPTS (Optional) Java runtime options used when any command
#           is executed.

# Check the way the script has been called and set current directory as PROCESSOR_HOME
if [ "X$1" = "X" ]
then
  cd .. >/dev/null
  pwd >/dev/null
  PROCESSOR_HOME=$PWD
  SERVICE_INVOKE="no"
else 
  PROCESSOR_HOME=$1
  SERVICE_INVOKE="yes"
fi
echo PROCESSOR_HOME set to $PROCESSOR_HOME
# Load config
source $PROCESSOR_HOME/bin/config.sh
# Check if the invocation is according to configuration [asService | asProcess]
if [ ! "$SERVICE_INVOKE" == "$RUN_AS_SERVICE" ]
then
  echo "ERROR - Invocation is not according to configuration - run as a Lunux Service= $RUN_AS_SERVICE"
  exit 6
fi
# check installation
if [ ! -d "$PROCESSOR_HOME/bin" \
-o ! -f "$PROCESSOR_HOME/bin/config.sh" \
-o ! -d "$PROCESSOR_HOME/conf" \
-o ! -d "$PROCESSOR_HOME/deploy" \
-o ! -d "$PROCESSOR_HOME/lib" \
-o ! -f "$PROCESSOR_HOME/conf/log4j.properties" \
-o ! -f "$PROCESSOR_HOME/deploy/test1-1.0-SNAPSHOT.jar" ]; 
then
echo 
echo "ERROR - Installation is not correct!"
echo "Expected installation layout:"
echo "$PROCESSOR_HOME/bin"
echo "$PROCESSOR_HOME/bin/config.sh"
echo "$PROCESSOR_HOME/conf"
echo "$PROCESSOR_HOME/conf/log4j.properties"
echo "$PROCESSOR_HOME/deploy"
echo "$PROCESSOR_HOME/deploy/test1-1.0-SNAPSHOT.jar"
echo "$PROCESSOR_HOME/lib"
exit 1
fi
# clean up
CLASSPATH=
JAVA_OPTS=
JAVA_PATH=
JAVA_EXEC=

# set JAVA
REQUIRED_JVM_VERSION=1.7
if [ -z "$JAVA_HOME" ]; 
then
  if [ -z "$JRE_HOME" ];
    then
      echo ERROR - either JAVA_HOME or JRE_HOME is not set!!!
      exit 1
    else
    echo Java JRE used $JRE_HOME
    JAVA_PATH=$JRE_HOME
  fi
else
  echo Java used $JAVA_HOME
  JAVA_PATH=$JAVA_HOME 
fi

# set JAVA_EXEC
JAVA_EXEC=$JAVA_PATH/bin/java
#check Java bin
if [ ! -x "$JAVA_EXEC" ];
then
  echo Java binaries not found $JAVA_EXEC
  exit 1
fi
# checkJavaVersion
JVM_VERSION=$("$JAVA_EXEC" -version 2>&1 | awk -F '"' '/version/ {print $2}')
#echo version "$JVM_VERSION"
if [[ "$JVM_VERSION" < "$REQUIRED_JVM_VERSION" ]]; 
then
  echo "ERROR - $JAVA_EXEC does not point to the proper Java version $REQUIRED_JVM_VERSION"
  exit 1
fi
# setBDHISTP_MAIN
BDHISTP_MAIN=cz.jaksky.PROCESSOR.PROCESSOR
# setClasspath
CLASSPATH=$PROCESSOR_HOME/deploy/*:$PROCESSOR_HOME/lib/*
# echo Classpath set to: $CLASSPATH

# setJAVA_OPTS
JAVA_OPTS=-Dbdconf=$PROCESSOR_HOME/conf
JAVA_OPTS="$JAVA_OPTS -Dlog4j.configuration=file:$PROCESSOR_HOME/conf/log4j.properties"
#echo JAVA_OPTS set to: $JAVA_OPTS
# This is nasty, as the location of the actual config file is hard-coded in the application code
cd $PROCESSOR_HOME
runProgram() {
echo $JAVA_EXEC $JAVA_OPTS -classpath "$CLASSPATH" $BDHISTP_MAIN
# quote the classpath so the shell does not expand the wildcards - java expands them itself
$JAVA_EXEC $JAVA_OPTS -classpath "$CLASSPATH" $BDHISTP_MAIN & PROCESS_PID=$!
echo $PROCESS_PID > $PIDDIR/$PID_FILENAME
echo "new application instance started as process $PROCESS_PID"
}

if [ ! -f "$PIDDIR/$PID_FILENAME" ]
then 
  echo "I will try to start new process ..."
  runProgram
else
  PID=$(cat $PIDDIR/$PID_FILENAME)
  if ps -p $PID >/dev/null
    then
      echo "WARNING $APP_NAME already running as process $PID"
    else
      echo "process $PID is not running - will try to start a new instance of the application"
      echo " "
      runProgram 
  fi
fi
exit 0
 

shutdown script follows:

#!/bin/bash
# Script usage:
# This script can be invoked either directly from the bin folder, or from a different location by passing the installation folder as an argument
#
# Check the way the script has been called and set current directory as PROCESSOR_HOME
if [ "X$1" = "X" ]
then
  cd .. >/dev/null
  pwd >/dev/null
  PROCESSOR_HOME=$PWD
  SERVICE_INVOKE="no"
else 
  PROCESSOR_HOME=$1
  SERVICE_INVOKE="yes"
fi
echo PROCESSOR_HOME set to $PROCESSOR_HOME
# Load config
source $PROCESSOR_HOME/bin/config.sh
if [ -z "$PIDDIR" ]
then
  echo "ERROR - Installation configuration file config.sh not found at $PROCESSOR_HOME/bin"
  exit 1
fi
# Check if the invocation is according to configuration [asService | asProcess]
if [ ! "$SERVICE_INVOKE" == "$RUN_AS_SERVICE" ]
then
  echo "ERROR - Invocation is not according to configuration - run as a Lunux Service= $RUN_AS_SERVICE"
  exit 6
fi
if [ -f "$PIDDIR/$PID_FILENAME" ]
then
  PID=$(cat $PIDDIR/$PID_FILENAME)
  kill $PID
  RC=$?
  rm $PIDDIR/$PID_FILENAME
  echo "Application $APP_NAME - process $PID shut down successfull"
  exit $RC
else
  echo "pid file not exist $PIDDIR/$PID_FILENAME, nothing to shut down"
  exit 0
fi

Those scripts rely on the existence of an installation configuration shell script – config.sh, located in the bin folder of the installation, as follows:

#!/bin/sh 
RUN_AS_SERVICE="yes"
APP_NAME="Processor"
APP_LONG_NAME="Processor instance" 
PIDDIR="/var/run/processor"
PID_FILENAME="processor.pid"

The startup script creates the pid file located in /var/run/processor – the user under which the installation is running needs to have the appropriate privileges.
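The dedicated user and the pid directory can be prepared roughly like this (the user name and paths follow the example configuration above):

useradd -r processor
mkdir -p /var/run/processor
chown processor:processor /var/run/processor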
Finally, the init script, which needs to be placed into the /etc/init.d folder:

#!/bin/bash
### BEGIN INIT INFO
# Provides: processor
# Required-Start: 
# Required-Stop: 
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: processor daemon
# Description: processor daemon
# This provides an example of how to
# write an init script.
### END INIT INFO
# Config to edit if needed
INSTALL_HOME=/usr/local/Processor
JAVA_HOME=/usr/java/default
SERVICE_USER="processor"
# No modification allowed from here
# Using the lsb functions to perform the operations.
. /lib/lsb/init-functions
#
# If the daemon is not there, then exit.
test -x $INSTALL_HOME/bin/startUp.sh || exit 5
test -x $INSTALL_HOME/bin/shutDown.sh || exit 5
test -x $INSTALL_HOME/bin/config.sh || exit 5
# Load config
source $INSTALL_HOME/bin/config.sh
export JAVA_HOME
PIDFILE=$PIDDIR/$PID_FILENAME
# Process name ( For display )
NAME=$APP_NAME
CURRENT_USER=`id -nu`
start(){
  echo "Starting $NAME under $SERVICE_USER user..."
  if [ "$CURRENT_USER" == "$SERVICE_USER" ]
  then
    $INSTALL_HOME/bin/startUp.sh $INSTALL_HOME >/dev/null
    RC=$?
  else
    su --preserve-environment --command="$INSTALL_HOME/bin/startUp.sh $INSTALL_HOME >/dev/null" $SERVICE_USER
    RC=$?
  fi
}
stop(){
  echo "Stoping $NAME running under $SERVICE_USER user ..."
  if [ "$CURRENT_USER" == "$SERVICE_USER" ]
  then
    $INSTALL_HOME/bin/shutDown.sh $INSTALL_HOME >/dev/null
    RC=$?
  else
    su --preserve-environment --command="$INSTALL_HOME/bin/shutDown.sh $INSTALL_HOME >/dev/null" $SERVICE_USER
    RC=$?
  fi
}
case $1 in
start)
  start
  exit $RC
;;
stop)
  stop
  exit $RC
;;
restart)
  stop
  start
  exit $RC
;;
status)
  if [ ! -f "$PIDDIR/$PID_FILENAME" ]
  then 
    echo "$NAME is NOT RUNNING"
    exit 1
  else
    PID=$(cat $PIDDIR/$PID_FILENAME)
    if ps -p $PID >/dev/null
    then
      echo "$NAME is RUNNING $PID"
      exit 0
    else
      echo "$NAME is NOT RUNNING"
      exit 1
    fi
  fi
;;
*)
# For invalid arguments, print the usage message.
echo "Usage: $0 {start|stop|restart|status}"
exit 2
;;
esac

In the init script you need to adjust JAVA_HOME to the appropriate Java installation folder (if not the default) and SERVICE_USER to the user which is supposed to run this service. The service can then be started under the root account, under SERVICE_USER without specifying a password, or under any other user with knowledge of the credentials.
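On RHEL-like systems the init script can then be registered and the service managed in the usual way (the service name matches the init script name):

chkconfig --add processor
chkconfig processor on
service processor start
service processor status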

If you have production-like experience with the Java service wrappers mentioned at the beginning of the article, don't hesitate to share it! That way this post serves its purpose.

How to search for jar file

I am pretty sure that every Java developer has been in the situation of searching for the Java archive file that contains a given fully qualified class name.
The fully qualified name is enough information to get this kind of issue resolved. You can either take advantage of sites like http://www.findjar.com/ or of IDE features – search for class. Those approaches work well when the missing class comes from open source or at least publicly available jar libraries, or when the jar is already in your project and is just a missing item on the classpath. But then there is a vast number of cases when you are searching for a library from a vendor-specific product which consists of a huge number of jar files. One way to find the class is to import all those libs into the IDE and then look up the required class. This approach is a bit awkward. A more straightforward approach is to search through the product's filesystem directly. A handy bash script follows – in this case searching for com.oracle.pitchfork.interfaces:

for i in $(find . -name "*.jar")
do
  result=$("$JAVA_HOME/bin/jar" -tvf "$i")
  echo "$result" | grep -i com.oracle.pitchfork.interfaces >/dev/null
  if [ $? -eq 0 ]; then
    echo "$i"
  fi
done

Run this script from the product's root folder – all jars containing the required class will be listed.

Weblogic classloading

Getting a java.lang.NoSuchMethodError is usually the beginning of a great exploration of your platform – in this case WebLogic. The Javadoc says:

Thrown if an application tries to call a specified method of a class (either static or instance), and that class no longer has a definition of that method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of a class has incompatibly changed.

What’s the hack going on here! Libraries used are embedded into the final archive I did verified that! If you don’t know simply suspect classloaders, publicly known enemies of java developers 🙂 As rule no.1 which says: “Verify your assumptions”. The fact that the class is in archive doesn’t necessary mean that it gets loaded, so to verify that simply pass -verbose or -verbose:class argument to weblogic’s JVM in startUp.sh/bin and you will get the origin of loaded classes.

The class was loaded from WL_HOME/modules – how is that possible? To understand that, a general understanding of classloading is essential, plus an understanding of your J2EE implementation, e.g. WebLogic, JBoss, … This post is not going to pretend expert-level knowledge of this topic, so I will rather stay with the general principles and refer to the detailed documentation.

Java has several class loaders (bootstrap, extension, …); the important fact is that they work in a hierarchy (parent-child relationship) with a delegation scheme which says when to load a class and from where. The elementary Java delegation principle says: delegate the finding of classes and resources to your parent before searching your own classpath. Only if the parent cannot find it is the child allowed to load it. So far so good. To complicate the matter a bit more, the Java servlet specification recommends looking at the child classloader before delegating to the parent (whether this recommendation was adopted you need to check in the documentation of the J2EE implementation you are using – as you can see, you know nothing based on those rules alone 🙂 ). So now to my case with the WebLogic J2EE implementation.

In WebLogic the system classloader is the parent of all the application's classloaders; details can be found here. So how does the class get loaded from WL_HOME/modules? The framework library must be on the system classpath. But on the system classpath there is just weblogic.jar, not my framework library?
WebLogic 10, for better modularity, moved components under WL_HOME/modules, and weblogic.jar now refers to these components in the modules directory from its manifest classpath. That means another version of the library sits on the system classloader – the parent of all the application classloaders – and therefore the libraries included in the application archives will be ignored, based on the delegation scheme. (That was probably the idea behind the child-first recommendation in the J2EE classloading delegation scheme.) However, WebLogic does offer another way to solve this case: so-called classloader filters/interceptors, defined in a WebLogic-specific deployment descriptor at either the ear level or the war level.
In weblogic-application.xml (ear level):

<prefer-application-packages>
    <package-name>org.apache.log4j.*</package-name>
    <package-name>antlr.*</package-name>
</prefer-application-packages>

In weblogic.xml (war level):

<container-descriptor>
    <prefer-web-inf-classes>true</prefer-web-inf-classes>
</container-descriptor>

Java class version

From time to time it might happen that you need to know which version the class files were compiled for – or, to be more specific, which target was specified when running the javac compiler, as the target specifies the VM version the classes are generated for. This can be specified in Maven as follows:

               <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <configuration>
                         <target>1.6</target>
                    </configuration>
               </plugin>
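Outside of Maven the same thing is just the compiler flags (a plain javac sketch):

javac -source 1.6 -target 1.6 StringPlaying.java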

It is not rocket science, right? To find out which version the code was generated for, we use javap (the Java class file disassembler). The following line does the trick:

javap -verbose -classpath versiontest-1.0.jar cz.test.string.StringPlaying

Compiled from "StringPlaying.java"
public class cz.test.string.StringPlaying extends java.lang.Object
  SourceFile: "StringPlaying.java"
  minor version: 0
  major version: 50
  Constant pool:
const #1 = Method       #12.#28;        //  java/lang/Object."<init>":()V
const #2 = String       #29;            //  beekeeper
const #3 = Method       #30.#31;        //  java/lang/String.substring:(II)Ljava/lang/String;

The major version maps to the Java version as follows (adapted from the Oracle blog): 45 → JDK 1.0/1.1, 46 → 1.2, 47 → 1.3, 48 → 1.4, 49 → Java 5, 50 → Java 6, 51 → Java 7, 52 → Java 8. So the class above, with major version 50, was compiled with target Java 6.

Build Number

One of the most important things during the SDLC (apart, for sure, from the other stuff) is to keep control over the artifacts deployed to all environments at any given time. Lack of control leads to chaos and generates a lot of extra work for the team, degrading throughput, morale and motivation. No need to even mention that arguments among team members regarding deployed features or fixes definitely do not contribute well to the team spirit.
One of the common approaches to mitigating this risk is generating a build number for every single build, fully automatically. Let's take a look at how to accomplish this in a common project set-up – a Maven project built on a build server, e.g. TeamCity. A sample web application follows.
A common place to store this kind of info is the MANIFEST.MF file. All kinds of archives have this file located at /META-INF/MANIFEST.MF. Various technologies like OSGi use this location for their metadata. Taking advantage of the maven-war-plugin, the content of MANIFEST.MF can be easily customised as follows (${xx} are Maven variables):
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: Apache Maven
Built-By: ${user.name}
Build-Jdk: ${java.version}
Specification-Title: ${project.name}
Specification-Version: ${project.version}
Specification-Vendor: ${project.organization.name}
Implementation-Title: ${project.name}
Implementation-Version: ${project.version}
Implementation-Vendor-Id: ${project.groupId}
Implementation-Vendor: ${project.organization.name}

Setting this up in the Maven project pom file is pretty easy (the two boolean switches and the manifest entry below correspond to the manifest shown above):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-war-plugin</artifactId>
    <configuration>
        <archive>
            <manifest>
                <addDefaultImplementationEntries>true</addDefaultImplementationEntries>
                <addDefaultSpecificationEntries>true</addDefaultSpecificationEntries>
            </manifest>
            <manifestEntries>
                <Build-Number>${build.number}</Build-Number>
            </manifestEntries>
        </archive>
    </configuration>
</plugin>

Where the build.number variable gets supplied by the build server; for example, TeamCity exposes it as the predefined %build.number% parameter.
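A hedged sketch of what the corresponding Maven build step might pass (the exact set-up of the TeamCity runner is up to you):

mvn clean package -Dbuild.number=%build.number%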

The build number is also visible on the build queue status page.
To expose this build-specific information in the application, a simple JSP page can be created, backed by a controller. The controller accessing this information using Spring MVC (simplified example) can look like:
@Controller
 public class ProjectInfoController {

     @RequestMapping("/info")
     public ModelAndView getProjectInfo(HttpServletRequest request, HttpServletResponse response) throws IOException {

         ModelAndView modelAndView = new ModelAndView("projectInfoView");

         ServletContext servletContext = request.getSession().getServletContext();

         Properties properties = new Properties();
         InputStream is = servletContext.getResourceAsStream("/META-INF/MANIFEST.MF");

         properties.load(is);

         modelAndView.addObject("buildBy",properties.getProperty("Built-By"));
         modelAndView.addObject("buildJdk",properties.getProperty("Build-Jdk"));
         modelAndView.addObject("specificationVersion",properties.getProperty("Specification-Version"));
         modelAndView.addObject("specificationTitle",properties.getProperty("Specification-Title"));
         modelAndView.addObject("implementationVendor",properties.getProperty("Implementation-Vendor-Id"));
         modelAndView.addObject("buildNumber",properties.getProperty("Build-Number"));

         return modelAndView;
     }
 }

Accessing MANIFEST.MF in a JAR file needs a different approach. Inspiration taken from the Spring source code:

Package pkg = someClass.getPackage();
String version = pkg.getImplementationVersion();

A JSP page or other presentation layer shouldn't be a problem for anyone.

Prague Java Developer Day 2012 Highlights

The Java philosophy from the very first beginning is “compile once and run everywhere”; this seems to be strengthened even more for the next version of Java. The key message for Java 8 is “write code once and run everywhere”, which implies blurring the edge between Java SE and Java ME. The move of Java towards smartphones, tablets etc. is clear. The approach and the impact on language constructs are described briefly below, as presented at the conference. As presented, nothing is cast in stone at the moment, but the main objective is clear.
Huge effort is being spent on a new Java modularisation system which should reduce the amount of memory consumed by the JVM, reduce the size of the final archives etc. The solution should be backward compatible, with some questions around the current organisation of the JDK and its potential reorganisation. The solution relies on creating new logical units composed of existing packages, classes etc. Details can be found on the project pages – Project Jigsaw.
JavaFX, as a rich client platform, went through a huge rewrite with version 2.0 and now supports full interoperability with the Java Swing library. The JavaFX Scene Builder was released for the major platforms.
Java 7 made the next step towards better parallelisation with the fork-join framework, which helps you take advantage of multiple processors. Java 8 should move matters even further by embedding functional-style programming with lambda expressions – Project Lambda.
The last main feature presented for Java 8 was type annotations such as @Nullable, @NotNull etc. This feature is highly desired by the community, as it allows better static code analysis. More info can be found here.
The aforementioned list is neither an exhaustive list of features nor a final list of enhancements in Java 8, but rather a plan.

BPMS in production environment

I couldn’t find a better topic than “BPMS (Activos 6.1) in production” to conclude the whole series of a designing system with BPMS, sharing hints for developers and testing whole solution.
Although  ActiveVOS is certainly a cool product there is, as usual, a space for future improvements. The production environment is something special and as something special, it should be treated. If the production environment is down there is simply no business. Empowering the business is the main objective of BPMS, isn’t it? So technology should be ready to cope with that kind of situations. To cut a long story short. Every feature which supports maintainability, reliability, security and sustainability in day-to-day life is highly appreciated.
During the development life-cycle, it can be hard (especially in the early stage of the development) to foresee how the system will be maintained, what the standard procedures look like, etc. The goal is to mitigate the probability of a process, human or a technical error as low as possible taking into consideration an ease of problem detection as well.
The following pieces of functionality were found as highly desirable. Some of them are possible to avoid or at least lower the impact during the design time. For the rest of them, some developers’ effort need to be taken into consideration.
  • Different modes of the Console – there are no distinct modes for the development and production environments. A production mode comes in handy when you need to grant operations access for their day-to-day routine but don't want to let them modify all server settings. For example, you may want to restrict the permissions to deploy new processes and to start and stop services.
  • Reliable failover – maybe this question is more on the side of infrastructure. As BPMS fully lives in a DB, a typical solution consists of replicating the production DB to a backup DB instance. In case of a failure, this instance is started. But if some kind of inconsistency gets into the DB during the crash of the main instance, it is immediately replicated to the backup instance. Does it then still make sense to start the backup instance?
  • Lack of data archiving procedures – the solution itself doesn't offer any procedure for archiving completed processes. Because of legal restrictions specific to the business domain you are working in, you cannot simply delete completed processes. As your DB grows in size, the response time of BPMS grows as well, and you can easily get into trouble with the time-out policy. Data growth of 200 GB per month is feasible. You cannot simply work this problem out by using advanced features of the underlying DB like partitioning, because you want processes which logically belong together to end up in one archive, and you will struggle to find a partitioning criterion which could be used in practice and fulfils that requirement.
  • Process upgrade – one of the killer features, migration of already running processes to an upgraded version, works only in case of small changes to the process. Moreover, what if your process consumes an external WS which lives completely on its own? What if someone enhances that service and modifies its interface? Yes, versioning of the interfaces comes to attention. Having the process upgrade feature without versioned interfaces is almost nonsense, or at least needs special attention while releasing. Even with versioned interfaces it is not applicable in all situations, e.g. sending a new data field whose presence in the system is not guaranteed. In large companies this feature is a must; otherwise it is hard to manage and coordinate all the releases of all the connected applications.
  • Consider the product roadmap – actually, this item belongs to the project planning phase where we decide what technology to use. In some environments like banking, insurance etc. there can be legal requirements to have all products in the production environment supported by the vendor. If the vendor's release strategy is a new major version every half a year and the support scope is the current major version plus two major versions back, this can pose a problem for the product maintenance team during the product lifecycle. Migration of all non-terminated processes may not be a trivial thing, and as such it represents an extra cost.

Testing BPMS component

In the first part of this series we focused on designing a system using BPMS technology for orchestrating the workflow in an organisation; then we shared useful points from the developer's perspective. In this third part we will focus on the quality aspect of the solution.
I remember a discussion with one of our QA guys regarding BPMS testing that I want to share. I was asking QA for the requirements on the system and was curious what methodology was being used for this component. The answer I got, and will probably never forget, was: BPMS is a minor part of the system, hence we are not supposed to test it at all. The motivation behind this article is simply the fact that this approach wasn't correct, and to provide some insight into what's going on. There is no ambition to provide a complete methodology or best practices regarding the testing of a BPMS component – that is the role of a skilled QA.
BPMS is a solution for orchestrating your business services inside the house; simply put, it drives the workflow. BPMS isn't usually a decision maker. Decision-making rules are typically required to be flexible and to expect frequent changes; they should reflect business changes as quickly as possible. Because of that, it is not good practice to hard-code them into processes in the form of a “spaghetti code structure” (if-else structures nested several levels deep), which is error-prone and hard to maintain. Those are the reasons for having a separate component responsible for decision making – a BRE (business rule engine). So the QA task for functional testing boils down to verifying, for given input data:
  • Are all the data necessary for making a decision present at the specified point? This can be difficult because of the large number of incoming paths to the decision point. Regardless of the execution path, you are verifying that all the needed data were gathered in the system.
  • Based on the decision results, are the steps actioned in the correct order? This is verification of the required business process.
  • Are the fault recovery procedures working correctly? Switch the system into fault-recovery mode and verify that the system stored all data correctly and completely.
For sure there can be more aspects, but those are considered the main ones. The main problem of the testing is that those aspects cannot be tested in isolation. By isolation I mean that you cannot use the standard methodologies (black box, white box, … whatever it is) and point somewhere in the system. BPMS is a system component that has “memory”. That means you cannot simply and arbitrarily divide the process into parts which you are going to test separately. Some systems can have something like “points of synchronisation” (regardless of the execution path, the system has a defined data set at that point), but this depends on the design and hence isn't mandatory.
Let's have a look at the possibilities. The product itself offers a feature called BUnit, which is the alternative to JUnit in the Java world – a feature facilitating process unit testing. All invoke activities within the process are mocked – the XML reply is recorded. XML manipulation expressions and the gathering of data within the flow (aspect 1) can be tested this way by a correct choice of recorded data, although the tests still take place in artificial conditions. Aspect 3 – testing fault recovery – can be tested relatively easily with this approach if no awkward decision was made during the design phase. The test analyst is the key role in this process; no need to talk about documentation of the system itself. Unit testing of the BRE is a completely separate chapter, not discussed here.
Having verified the basic functionality of the building blocks – processes and subprocesses – we can continue with integration testing. Usually this kind of system has a high degree of integration, so it is really handy to have all back-end systems under your control. Reason no. 1: it is a data-driven system – its behaviour depends on the data in those systems. Reason no. 2: BPMS has “memory” (it is stateful). If you want to test from a certain point in the process, you have to bring the system to this point, and you need to do it repeatedly and in a well-defined way. The approach used in web application testing – modifying data in the DB to bring the order, application etc. into a certain state – is not sufficient here. Having simulators of the real back-end systems proved to be a really good practice: this way you simply isolate your system and the time to localise an error is significantly lower. This way you can conduct integration testing of bigger functional blocks up to end-to-end testing. There is no doubt that a higher level of automation is a must.