Batch execution with Hibernate 4

In this article I will give you a brief introduction to Hibernate batch processing.

Why batch?

Because it is better than keeping a single transaction open while inserting 1 000 000 (one million) entries and committing only at the end, and also better than opening a separate transaction for each one of those inserts.

Batching gives you the right tools to manage this: you define a limit, and after every that many records your data is written to the database instead of piling up in memory.

A naive approach

public static void naiveApproach() {
    // build the SessionFactory from hibernate.cfg.xml
    final Configuration configuration = new Configuration().configure();
    final StandardServiceRegistryBuilder builder = new StandardServiceRegistryBuilder()
            .applySettings(configuration.getProperties());
    final SessionFactory sessionFactory = configuration.buildSessionFactory(builder.build());

    final Session session = sessionFactory.openSession();
    final Transaction tx = session.beginTransaction();
    // insert one million books in a single transaction
    for (int i = 0; i < 1_000_000; i++) {
        final Book book = new Book("9781617291999", "Java 8 in Action", new Date());
        session.save(book);
    }
    tx.commit(); // the one and only commit, at the very end
    session.close();
}

The code above does what I described in the beginning: it opens one transaction and commits only after inserting 1 000 000 records. Depending on your hardware you may reach the end of the loop and have the commit executed at the end. If you are short on memory your application can throw an OutOfMemoryError, because Hibernate keeps all the new Book instances in the session-level cache, which lives in memory.

For testing purposes I increased the insert count to 10 000 000 (ten million) to see how far it gets before the application crashes. At almost three million new books I hit the 2 GB memory limit of the application.

This means that if the session-level cache consumes all the memory available to your application, the transaction never commits, your database entries vanish and you can start over again.

Setting a batch size, keeping the session cache small

To keep the session-level cache small you can introduce a batch size in your hibernate.cfg.xml. This tells Hibernate to group every n rows into one JDBC batch insert. The batch size is set with the property hibernate.jdbc.batch_size.
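As a quick sketch (reusing the programmatic Configuration from the naive example above), the property can be declared in hibernate.cfg.xml or set directly on the Configuration before the SessionFactory is built:

// equivalent to <property name="hibernate.jdbc.batch_size">50</property> in hibernate.cfg.xml
final Configuration configuration = new Configuration().configure();
configuration.setProperty("hibernate.jdbc.batch_size", "50");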

The interesting thing is that the Hibernate documentation not only introduces this property but also requires modifying the naive code above (I will only copy the relevant for loop):

for (int i = 0; i < 1_000_000; i++) {
    final Book book = new Book("9781617291999", "Java 8 in Action", new Date());
    session.save(book);
    if (i % 50 == 0) { // 50 is the batch_size
        // push the pending inserts to the database and release the entities from the session cache
        session.flush();
        session.clear();
    }
}

If you look carefully at the code above you can see that the solution is to flush() and clear() the session in blocks of batch_size to keep the session-level cache small.

So why set a batch size?

Good question. I searched the Hibernate documentation for some information about this but found nothing. If we think about it, though, batching lets the database execute insert and update statements efficiently by grouping a bunch of them together and sending them in a single round trip.
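To make "grouping" concrete, here is roughly what a batched insert looks like at the plain JDBC level (a hand-written sketch with an assumed book table, not Hibernate's actual internals):

// 'connection' is an open java.sql.Connection; the table and columns are illustrative
final PreparedStatement ps = connection.prepareStatement(
        "insert into book (isbn, title, published) values (?, ?, ?)");
for (int i = 0; i < 50; i++) {
    ps.setString(1, "9781617291999");
    ps.setString(2, "Java 8 in Action");
    ps.setDate(3, new java.sql.Date(System.currentTimeMillis()));
    ps.addBatch();     // queue the statement instead of executing it immediately
}
ps.executeBatch();     // send all 50 inserts to the database in one round trip
ps.close();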

Batch size has no guarantee

Setting the batch limit alone does not guarantee much; the session-level cache only stays small because the data is flushed to the database and the session is cleared with every batch.

There are also some transparent limitations which you may not notice at first.

One example: if you use GenerationType.IDENTITY for identifier generation, Hibernate transparently disables insert batching.
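The Book entity itself is not shown in this article; assuming a mapping roughly like the sketch below, switching from IDENTITY to a sequence-based generator keeps insert batching enabled:

// hypothetical mapping of the Book entity used in the examples;
// with strategy = GenerationType.IDENTITY Hibernate would silently skip JDBC insert batching
@Entity
public class Book {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String isbn;
    private String title;
    private Date published;

    protected Book() { } // required by Hibernate

    public Book(final String isbn, final String title, final Date published) {
        this.isbn = isbn;
        this.title = title;
        this.published = published;
    }
}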

The second example is how Hibernate decides which statements to batch together: if the current statement is the same kind as the previous one, they are grouped together as long as the batch_size is not reached. In the example above the statements are all alike, so they end up in the same batch. However, if we added Authors along with the Books, Hibernate would see alternating statements and would start a new batch group for each of them. To solve this problem you can use the hibernate.order_inserts and hibernate.order_updates properties. These make Hibernate sort the statements before executing them, so it sees that 50 Book inserts can be batched together, and 50 Author inserts likewise.
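Again as a sketch on the programmatic Configuration (the same properties can of course go into hibernate.cfg.xml):

// sort inserts and updates by entity type before executing them,
// so alternating Book/Author saves still fill up complete batches
configuration.setProperty("hibernate.order_inserts", "true");
configuration.setProperty("hibernate.order_updates", "true");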

Manually saving the data

We have solved the huge memory consumption problem, but what about exceptions during insertion? Rolling back one million inserts because the last one failed is a no-go in most cases.

The solution is to manually commit the transaction in step with the batch size:

for (int i = 0; i < 1_000_000; i++) {
    final Book book = new Book("9781617291999", "Java 8 in Action", new Date());
    session.save(book);
    if (i % 50 == 0) { // 50 is the batch_size
        session.flush();
        session.clear();
        session.getTransaction().commit(); // make the batch written so far durable
        session.beginTransaction();        // the old transaction is finished, start a new one
    }
}

The code above commits the transaction each time we reach the batch_size and then begins a new transaction on the session, because the previous one has already been completed by the commit.
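How you react to a failure is up to you; as one possible sketch, you could wrap the loop so that an exception only loses the current batch, while everything committed so far stays in the database:

Transaction tx = session.beginTransaction();
try {
    for (int i = 0; i < 1_000_000; i++) {
        session.save(new Book("9781617291999", "Java 8 in Action", new Date()));
        if (i % 50 == 0) { // 50 is the batch_size
            session.flush();
            session.clear();
            tx.commit();                   // the batches written so far are now durable
            tx = session.beginTransaction();
        }
    }
    tx.commit();                           // commit the remaining partial batch
} catch (RuntimeException e) {
    tx.rollback();                         // only the current batch is lost
    throw e;
} finally {
    session.close();
}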

Conclusion

Batching only helps to store data in the database efficiently. If you want some fail-over mechanism, you need to implement a manual commit strategy on top of it.
