Sunday, October 20, 2013

Java Performance Pitfalls - DatatypeFactory

The DatatypeFactory class, available in Java SE since 1.5, is used to create javax.xml.datatype objects for mapping between XML and Java objects. It allows for easy creation of dates and timestamps to use in a JAXB context.

One typical use case of the factory is when you need to create an XML-serializable timestamp with time zone information. In such cases we usually first create an instance of the factory and then use this instance to create the timestamp. It looks as innocent as this:

DatatypeFactory.newInstance().newXMLGregorianCalendar(new GregorianCalendar());

As such code fragments might appear quite often in code that needs lots of timestamps, you might have already defined this in a static utility method somewhere.

Although the code looks pretty harmless, it has a performance issue: creating datatype factory instances is a pretty expensive operation. It involves going through your class loaders to determine which implementation class should be instantiated. Not only is that costly in terms of CPU, it also involves some class-loader internal synchronization (in our case in the WebLogic class loader), which greatly affects concurrency. Unfortunately, the result of this resolution is not cached internally, so the whole process is repeated each time you call the newInstance() method.

The main issue IMHO is that the Java API documentation does not clearly state how expensive the operation is (although the javadocs describe the implementation resolution mechanism, which might make you suspicious). As a result, developers usually end up creating new instances all the time in critical paths. In normal low-concurrency scenarios this is not an issue, but it becomes noticeable once you have a lot of threads trying to create datatype factory instances at the same time, consuming a lot of CPU and/or blocking on synchronization in the class loader.
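To get a rough feeling for the difference, a crude timing sketch along the following lines can help. It is not a rigorous benchmark (no warm-up, single-threaded, so it shows none of the class-loader contention described above), and the class name, method names and call count are made up for illustration. Actual numbers will depend heavily on your JVM and class-loader setup.

```java
import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;

// Hypothetical, non-rigorous timing sketch; all names are illustrative only.
final class FactoryCostSketch {

    // Resolves a new factory for every timestamp, like the one-liner shown earlier.
    static long nanosWithNewFactoryPerCall(int calls) throws DatatypeConfigurationException {
        GregorianCalendar now = new GregorianCalendar();
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            DatatypeFactory.newInstance().newXMLGregorianCalendar(now);
        }
        return System.nanoTime() - start;
    }

    // Resolves the factory once, outside the timed loop, and reuses it.
    static long nanosWithSharedFactory(int calls) throws DatatypeConfigurationException {
        GregorianCalendar now = new GregorianCalendar();
        DatatypeFactory factory = DatatypeFactory.newInstance();
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            factory.newXMLGregorianCalendar(now);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws DatatypeConfigurationException {
        int calls = 10_000; // arbitrary value chosen for the sketch
        System.out.println("new factory per call: "
                + nanosWithNewFactoryPerCall(calls) / 1_000_000 + " ms");
        System.out.println("one factory reused:   "
                + nanosWithSharedFactory(calls) / 1_000_000 + " ms");
    }
}
```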

The solution to the problem is pretty simple: reuse previously created datatype factory instances. The javadocs do not state whether the implementation is thread-safe (in our case it was), so to be on the safe side you might need to store an instance of the factory in a thread-local variable in your utility class and reuse it when needed. This will make quite an impact in high-concurrency scenarios.
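A minimal sketch of such a utility class could look like the following. The class name XmlTimestamps and the method now() are made up for illustration, and ThreadLocal.withInitial assumes Java 8 or later (on older versions you would subclass ThreadLocal and override initialValue() instead).

```java
import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical utility class; the name and its methods are illustrative only.
final class XmlTimestamps {

    // One factory per thread, created lazily on first use and reused afterwards,
    // so the expensive implementation lookup happens once per thread.
    private static final ThreadLocal<DatatypeFactory> FACTORY =
            ThreadLocal.withInitial(() -> {
                try {
                    return DatatypeFactory.newInstance();
                } catch (DatatypeConfigurationException e) {
                    throw new IllegalStateException(
                            "No DatatypeFactory implementation available", e);
                }
            });

    private XmlTimestamps() {
        // static helpers only
    }

    // Returns the current time as an XML-serializable timestamp with time zone.
    static XMLGregorianCalendar now() {
        return FACTORY.get().newXMLGregorianCalendar(new GregorianCalendar());
    }
}
```

Callers then simply use XmlTimestamps.now() instead of the original one-liner. If you have verified that your particular implementation is thread-safe, a single static final factory instance would be an even simpler alternative, but that is an assumption you would have to check for your own runtime.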