<< back

JPA is slow, lets make it stupid fast

JPA is not performant. Mainly because of things like batching, and this incessant idea that querying relationships lazily is somehow fast, its literally just deferring the work, which makes sense if you potentially don't want to use the data, but most people want that data anyways, or serialize it.

I experienced this recently with a massive data set, this database was 1 core table of 4 million rows, followed by multiple child One-To-Many table of close to 10 million rows, and a single One-to-One with ~4 million rows as well. To use the service built on top of this massive data, you had to search with JPA search specifications, which turns out also to be insanely slow. The benchmark starts us at 15 or so minutes, which is obviously unacceptable.

Time to go fast

By enabling

logging.level.org.hibernate.SQL=DEBUG
and
logging.level.org.hibernate.type.descriptor.sql.BasicBinder=TRACE
you're able to see the ridiculous number queries JPA is executing, they're un-countable, there's so many!! How can we fix this?

If you're thinking of using @Batch(100), it could work, only issue is JPA will batch each relationship, so each of our tables will still require n/100 calls where n is the number of rows with relationships. So theoretically a 100x speed bump right? Well, it's actually more like a 33x performance bump, since you still have the overhead of making the database connections, and need to get a lock every time you select. Doing this we moved to around 3-5 minutes depending on load per large call. This however, was still unacceptable.

What if instead we just forced JPA to use OUTER JOIN? Isn't that exactly what the ANSI SQL wizards would tell you to do? Obviously, this is the solution, one single connection, 1 lock, and let the B-Tree optimizations inside of the database handle all of this ugliness for us. To do this you need to use something called @EntityGraph on your JPA repository, what this does is forces JPA to use a OUTER JOIN to join relationships. This even works on nested entities, say I have Entity A with a child B which also has a One-to-Many on C, all you need to do is add the relationship to the entity graph with the @NamedEntityGraph annotation, then provide it with a few nice args like name, and attributeNodes which will point to your sub entity. Then finally on your JPA repository call you stitch the entity graph back together.

                
@Repository
public interface ARepository extends JpaRepository, JpaSpecificationExecutor {
    @Override
    @EntityGraph(attributePaths = { "bs", "B.cs" }) // Where B's are children of A's, and C's are children of B's.
    Page findAll(@Nullable Specification specification, Pageable pageable);
}
                
              

After applying these, the query time went down from 15 minutes to ... *drum-roll* ... 3 seconds. So please, for the love of all that is computationally efficient, force your JPA to use OUTER JOIN, or even roll your own SQL, ORM's really are one of the biggest trade-offs, and they really can bite you. This is why tend to stay in the DBI universe when I write Perl. It's funny shocking Ruby on Rails and Java developers with my ~15ms queries :D