Wednesday, June 23, 2010

Hibernate Caches

Background
Hibernate comes with three different caching mechanisms: the first level cache, the second level cache and the query cache. Truly understanding how the Hibernate caches work and interact with each other is important when you need to increase performance. Just enabling caching for your entity with an annotation (or in a classic .hbm.xml mapping file) is easy, but understanding what happens behind the scenes, and how, is not. You might even end up with a slower system if you do not know what you are doing.

SessionFactory and Session
The purpose of the Hibernate SessionFactory (the counterpart of EntityManagerFactory in JEE) is to create Sessions, initialize JDBC connections and pool them (using a pluggable provider like C3P0). A SessionFactory is immutable and built from a Configuration holding mapping information, cache information and a lot of other information, usually provided by means of a hibernate.cfg.xml file or through a Spring bean configuration.
A Session is a unit of work at its lowest level, representing a transaction in database lingo. When a Session is created and operations are done on Hibernate entities, e.g. setting an attribute of an entity, Hibernate does not go off and update the underlying table immediately. Instead Hibernate keeps track of the state of an entity, whether it is dirty or not, and flushes the pending updates to the database at the end of the unit of work, typically when the transaction commits. This is what Hibernate calls the first level cache.

The 1st level cache
Definition: The first level cache is where Hibernate keeps track of the possibly dirty state of the entities loaded and touched in the ongoing Session. The ongoing Session represents a unit of work; its cache is always used and cannot be turned off. The purpose of the first level cache is to prevent too many SQL queries or updates being made to the database, and instead batch them together at the end of the Session. When you think about the 1st level cache, think Session.
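
To make the idea concrete, here is a minimal sketch of a session-scoped identity map with dirty checking. It is a toy model written for this post, not Hibernate's implementation: the ToySession and Router classes, the snapshot map and the fake SQL strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a Session's first level cache; NOT Hibernate's implementation. */
final class ToySession {
    /** Hypothetical entity used only for this illustration. */
    static final class Router {
        final long id;
        String name;
        Router(long id, String name) { this.id = id; this.name = name; }
    }

    // Entities loaded in this unit of work, keyed by id (an identity map).
    private final Map<Long, Router> loaded = new LinkedHashMap<>();
    // State of each entity at load time (or last flush), used for dirty checking.
    private final Map<Long, String> snapshots = new LinkedHashMap<>();
    // Stand-in for the database: records the statements that would be sent.
    final List<String> issuedSql = new ArrayList<>();

    /** Returns the same instance for the same id; only the first call issues a SELECT. */
    Router load(long id) {
        return loaded.computeIfAbsent(id, key -> {
            issuedSql.add("SELECT * FROM router WHERE id = " + key);
            Router router = new Router(key, "router-" + key);
            snapshots.put(key, router.name);
            return router;
        });
    }

    /** Compares every loaded entity with its snapshot and writes only the dirty ones. */
    void flush() {
        for (Router router : loaded.values()) {
            if (!router.name.equals(snapshots.get(router.id))) {
                issuedSql.add("UPDATE router SET name = '" + router.name
                        + "' WHERE id = " + router.id);
                snapshots.put(router.id, router.name);
            }
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        Router router = session.load(1L);
        session.load(1L);          // served from the session, no second SELECT
        router.name = "edge";      // no SQL is issued here
        router.name = "core";      // still none
        session.flush();           // one UPDATE carrying the final state
        session.issuedSql.forEach(System.out::println);
    }
}
```

Note how the two attribute changes collapse into a single UPDATE at flush time, which is the batching effect described above.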




The 2nd level cache
The 2nd level cache is a process scoped cache that is associated with one SessionFactory. It will survive Sessions and can be reused in new Sessions created by the same SessionFactory (of which there usually is one per application). By default the 2nd level cache is not enabled.
The Hibernate cache does not store instances of an entity. Instead Hibernate uses something called dehydrated state. A dehydrated state can be thought of as a disassembled entity: an array of strings, integers etc., with the id of the entity acting as the pointer to the dehydrated entity. Conceptually you can think of it as a Map which contains the id as key and an array as value. Or something like below for a cache region:

{ id -> { attribute1, attribute2, attribute3 } }
{ 1 -> { "a name", 20, null } }
{ 2 -> { "another name", 30, 4 } }

If the entity holds a collection of other entities then the other entity also needs to be cached. In this case it could look something like:

{ id -> { attribute1, attribute2, attribute3, Set{item1..n} } }
{ 1 -> { "a name", 20, null, {1,2,5} } }
{ 2 -> { "another name", 30, 4, {4,8} } }
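
The Map analogy can be sketched in a few lines of Java. This is only an illustration of the dehydrated-state idea under my own assumptions: the DehydratedRegion and Person classes and their fields are made up for the example, and Hibernate's real cache entries carry more information than a bare array.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy cache region that stores dehydrated state (arrays), not entity instances. */
final class DehydratedRegion {
    /** Hypothetical entity with three attributes, mirroring the listing above. */
    static final class Person {
        final long id;
        final String name;
        final int age;
        final Long parentId; // may be null
        Person(long id, String name, int age, Long parentId) {
            this.id = id;
            this.name = name;
            this.age = age;
            this.parentId = parentId;
        }
    }

    // id -> { attribute1, attribute2, attribute3 }
    private final Map<Long, Object[]> region = new HashMap<>();

    /** Dehydrate: break the entity down into an array of plain values. */
    void put(Person person) {
        region.put(person.id, new Object[] { person.name, person.age, person.parentId });
    }

    /** Hydrate: build a fresh instance from the cached array, or return null on a miss. */
    Person get(long id) {
        Object[] state = region.get(id);
        if (state == null) {
            return null;
        }
        return new Person(id, (String) state[0], (Integer) state[1], (Long) state[2]);
    }

    public static void main(String[] args) {
        DehydratedRegion region = new DehydratedRegion();
        Person original = new Person(1L, "a name", 20, null);
        region.put(original);
        Person copy = region.get(1L);
        // Same values, but a different object than the one that was cached.
        System.out.println(copy.name + " / same instance: " + (copy == original));
    }
}
```

Because every hit is rebuilt from the array, two Sessions never share the same entity instance even though they share the cached state.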

The actual implementation of the 2nd level cache is not done by Hibernate (there is a simple Hashtable cache available, but it is not intended for production use). Hibernate instead has a plugin concept for caching providers, which is used by e.g. EHCache.

Enabling the 2nd level cache and EHCache
To get the 2nd level cache working you need to do two things:
1 Cache Strategy. Enable a cache strategy for your Hibernate entity, either in the class with an annotation or in the Hibernate mapping XML file if you are stuck with pre-Java 5. This can be done for an entity by adding this little snippet to your hbm.xml file (a better place to store the cache strategy setting is the hibernate.cfg.xml file):





or using an annotation for your entity (if you are on Java 5 or greater):

@Entity
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
public class Router { ... }

And as mentioned above, if you want to cache the collections of an entity you need to specify caching at the collection level as well:





...


Hibernate has something called a cache region, which by default is the fully qualified name of your Java class. If, like me, you are a fan of convention over configuration, you will use the default region for an entity. A cache region is also needed for the collection, named with the fully qualified name of the Java class plus the name of the collection (i.e. org.grouter.domain.entities.Router.nodes).

2 Cache provider. Set up the physical caching for a cache provider. If you are using EHCache, which is the most common choice I dare say, then you will need to specify some settings for the cache regions of your entities in a file called ehcache.xml. EHCache will look for this file in the classpath, and if it is not found it will fall back to ehcache-failsafe.xml, which resides in the ehcache.jar library. A typical sample for an EHCache configuration could look like (see mind map below for explanations):


and


The name maps to the name of the cache region of your entity. The attribute maxElementsInMemory should be set high enough that elements do not have to be swapped in and out of the cache. A good choice for a read-only cache would be as many entities as there are in the database table the entity represents. The attribute eternal, if set to true, means that any time-outs specified will be ignored and entities put into the cache by Hibernate will live forever.
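
To get a feel for what these two settings mean, here is a small sketch of a bounded region with a least-recently-used eviction rule and an optional time-out. It only illustrates the semantics of maxElementsInMemory and eternal as described above; it is not how EHCache is implemented, and the ToyRegion class, its timeToLiveMillis parameter and the explicit clock argument are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy bounded cache region; illustrates the settings only, NOT EHCache itself. */
final class ToyRegion<K, V> {
    private static final class Element<T> {
        final T value;
        final long createdAtMillis;
        Element(T value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final boolean eternal;
    private final long timeToLiveMillis;
    private final Map<K, Element<V>> elements;

    ToyRegion(final int maxElementsInMemory, boolean eternal, long timeToLiveMillis) {
        this.eternal = eternal;
        this.timeToLiveMillis = timeToLiveMillis;
        // An access-ordered LinkedHashMap drops the least recently used element
        // as soon as the region grows beyond maxElementsInMemory.
        this.elements = new LinkedHashMap<K, Element<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Element<V>> eldest) {
                return size() > maxElementsInMemory;
            }
        };
    }

    void put(K key, V value, long nowMillis) {
        elements.put(key, new Element<>(value, nowMillis));
    }

    /** Returns null on a miss or when a non-eternal element has timed out. */
    V get(K key, long nowMillis) {
        Element<V> element = elements.get(key);
        if (element == null) {
            return null;
        }
        if (!eternal && nowMillis - element.createdAtMillis >= timeToLiveMillis) {
            elements.remove(key);
            return null;
        }
        return element.value;
    }

    public static void main(String[] args) {
        ToyRegion<Long, String> region = new ToyRegion<>(2, true, 0L);
        region.put(1L, "router-1", 0L);
        region.put(2L, "router-2", 0L);
        region.put(3L, "router-3", 0L); // the region is full, so id 1 is evicted
        System.out.println("id 1 still cached: " + (region.get(1L, 0L) != null));
    }
}
```

With a region sized to the whole table, as suggested above for read-only data, the eviction branch is never taken.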

Below is a mind map for the second level cache and how it relates to the SessionFactory and the 1st level cache.

The Query cache
The Query cache of Hibernate is not on by default. It uses two cache regions called org.hibernate.cache.StandardQueryCache and org.hibernate.cache.UpdateTimestampsCache. The first one stores query results, using the query along with its parameters as the key into the cache, and the second one keeps track of when the queried tables were last updated so that stale query results can be detected. If an entity that is part of a cached query is updated, the query cache treats the query and its cached result as stale and discards them. Of course, to utilize the Query cache the returned and used entities must be configured with a cache strategy as discussed previously. A simple load( id ) will not use the query cache, but a query like the following will:

Query query = session.createQuery("from Router as r where r.created = :creationDate");

query.setParameter("creationDate", new Date());
query.setCacheable(true);
List l = query.list(); // will return one instance with id 4321

Hibernate will cache the ids of the matching entities, using the query and its parameters as the key:
{ query, {parameters} } ---> { ids of cached entities }
{ "from Router as r where r.created = :creationDate", [ new Date() ] } ----> [ 4321 ]
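
The interplay between the two regions can be sketched like this. It is a simplified model under my own assumptions (the ToyQueryCache class, a logical clock instead of real timestamps, and one table per query), not Hibernate's code; the first map plays the role of StandardQueryCache and the second the role of UpdateTimestampsCache.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the query cache and its update timestamps; NOT Hibernate's code. */
final class ToyQueryCache {
    private static final class CachedResult {
        final List<Long> ids;
        final String table;
        final long cachedAt;
        CachedResult(List<Long> ids, String table, long cachedAt) {
            this.ids = ids;
            this.table = table;
            this.cachedAt = cachedAt;
        }
    }

    // { query, {parameters} } -> ids of the matching entities
    private final Map<List<Object>, CachedResult> queryResults = new HashMap<>();
    // table name -> logical time of its last update
    private final Map<String, Long> updateTimestamps = new HashMap<>();
    private long clock = 0;

    private static List<Object> key(String hql, Object... parameters) {
        return Arrays.<Object>asList(hql, Arrays.asList(parameters));
    }

    void put(String table, List<Long> ids, String hql, Object... parameters) {
        queryResults.put(key(hql, parameters), new CachedResult(ids, table, ++clock));
    }

    /** Returns the cached ids, or null on a miss or when the result is stale. */
    List<Long> get(String hql, Object... parameters) {
        List<Object> key = key(hql, parameters);
        CachedResult cached = queryResults.get(key);
        if (cached == null) {
            return null;
        }
        Long updatedAt = updateTimestamps.get(cached.table);
        if (updatedAt != null && updatedAt > cached.cachedAt) {
            queryResults.remove(key); // the table changed after the result was cached
            return null;
        }
        return cached.ids;
    }

    /** Called whenever a row in the given table is inserted, updated or deleted. */
    void tableUpdated(String table) {
        updateTimestamps.put(table, ++clock);
    }

    public static void main(String[] args) {
        ToyQueryCache cache = new ToyQueryCache();
        String hql = "from Router as r where r.created = :creationDate";
        Date created = new Date(0L);
        cache.put("router", Arrays.asList(4321L), hql, created);
        System.out.println("before update: " + cache.get(hql, created));
        cache.tableUpdated("router");
        System.out.println("after update: " + cache.get(hql, created));
    }
}
```

Only ids are stored, so each hit still has to resolve the entities themselves, which is why the entities need a cache strategy of their own.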

Pragmatic approach to the 2nd level cache
How do you know if you are hitting the cache or not? One way is to use Hibernate's SessionFactory to get statistics for cache hits. In your SessionFactory configuration you can enable the cache statistics by:

true
true
true
true
true
true
true
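
The listing above only shows the values. As a sketch of what such a configuration can contain, the snippet below collects the standard Hibernate properties that relate to caching and statistics in a java.util.Properties object. Which properties the original configuration actually set is an assumption on my part, and the CacheSettings class is made up for the example.

```java
import java.util.Properties;

/** Collects cache and statistics settings; the selection of properties is an assumption. */
final class CacheSettings {
    static Properties cacheAndStatistics() {
        Properties properties = new Properties();
        // Turn on the 2nd level cache and the query cache.
        properties.setProperty("hibernate.cache.use_second_level_cache", "true");
        properties.setProperty("hibernate.cache.use_query_cache", "true");
        // Collect the hit, miss and put counters exposed by SessionFactory.getStatistics().
        properties.setProperty("hibernate.generate_statistics", "true");
        // Store cache entries in a more human readable form, handy when browsing regions.
        properties.setProperty("hibernate.cache.use_structured_entries", "true");
        return properties;
    }

    public static void main(String[] args) {
        // In a real application these would be handed to the Hibernate Configuration
        // (or to the Spring session factory bean) before the SessionFactory is built.
        cacheAndStatistics().list(System.out);
    }
}
```

Statistics collection has a cost of its own, so it is usually something to switch on while investigating and off again afterwards.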

Then you might want to write a unit test to verify that you are indeed hitting the cache. Below is some sample code where the unit test extends Spring's excellent AbstractTransactionalDataSourceSpringContextTests:

import org.hibernate.stat.SecondLevelCacheStatistics;
// assumption: the StopWatch with getTime() and reset() is the one from commons-lang
import org.apache.commons.lang.time.StopWatch;

public class MessageDAOTest extends AbstractDAOTests // which extends AbstractTransactionalDataSourceSpringContextTests
{
    public void testCache()
    {
        long numberOfMessages = jdbcTemplate.queryForInt("SELECT count(*) FROM message");
        System.out.println("Number of rows :" + numberOfMessages);
        final String cacheRegion = Message.class.getCanonicalName();
        SecondLevelCacheStatistics settingsStatistics = sessionFactory.getStatistics()
                .getSecondLevelCacheStatistics(cacheRegion);
        StopWatch stopWatch = new StopWatch();
        for (int i = 0; i < 10; i++)
        {
            stopWatch.start();
            messageDAO.findAllMessages();
            stopWatch.stop();
            System.out.println("Query time : " + stopWatch.getTime());
            // the first pass puts every row in the cache, so no lookup should ever miss
            assertEquals(0, settingsStatistics.getMissCount());
            // every later pass reads all rows from the cache
            assertEquals(numberOfMessages * i, settingsStatistics.getHitCount());
            stopWatch.reset();
            System.out.println(settingsStatistics);
            // spring creates a transaction when the test starts - so we first end it and then start a new one in the loop
            endTransaction();
            startNewTransaction();
        }
    }
}

The output could look something like:

30 Jan 08 23:37:14  INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (1):
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Number of rows :6
Query time : 562
SecondLevelCacheStatistics[hitCount=0,missCount=0,putCount=6,elementCountInMemory=6,elementCountOnDisk=0,sizeInMemory=8814]
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:290 - Rolled back transaction
after test execution
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (2):
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Query time : 8
SecondLevelCacheStatistics[hitCount=6,missCount=0,putCount=6,elementCountInMemory=6,elementCountOnDisk=0,sizeInMemory=8814]
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:290 - Rolled back transaction
after test execution
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (3):
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Query time : 11

Another way to spy on what Hibernate is doing is to wrap the JDBC driver in a proxy driver. An excellent one I use is p6spy, which will show you exactly what is issued over a JDBC connection to the actual backend database. For other tips have a look at the mind map below.