If some of your TestNG tests fail intermittently when run with Apache Maven®, if you want a quick way to check your tests and code for thread safety, or, better yet, if you want to cut the run time of a large test suite, this post may help. It explores how to execute tests in parallel using Maven.
In this blog post, I will tell the story that prompted this exploration and touch on how to use parallel test execution to test your code.
First things first: let’s start with the story. Once upon a time in a land far, far away, there was a Software Engineer (not me, in case you are wondering!) who created a pull request after testing it in his local environment, where the test suite was all green. The next step was to run the test suite for that pull request in the CI/CD environment, and that also passed with flying colors the first time through. So the engineer went back to his other work, until one of his colleagues pointed out that the test suite was failing in the CI/CD environment on the main development branch. Not for a moment did the engineer think his change was the cause, because the tests had passed both locally and in the CI/CD environment. However, when he checked, the failing tests were related to his change, and they had started failing randomly after his merge. And that’s how he started his laborious journey of troubleshooting the randomly failing tests.
Chapter 1 – Searching in the Dark
By now you will have guessed that the protagonist of the story (or the antagonist, whichever you prefer) is none other than me. My first goal was to recreate the failures locally. So I ran the test suite locally multiple times using IntelliJ (which was a bad idea from the start) and found that every test passed every time. I went through the code change, but it was a fairly large change, so spotting a problem by inspection was not easy. I also added extra logging to the failing tests, and the only thing the logs revealed was that some date-based calculations were producing incorrect results. The painful part was that each attempt meant waiting nearly an hour for results, which is how long the whole suite takes to run in the CI/CD environment, because I had no way to reproduce the failure locally. So I was stuck at this point without any ideas.
Chapter 2 – A Glimpse of Light
With no leads left, our team lead suggested that I talk to the internal team that handles CI/CD-specific queries. They suggested running the tests with Maven to see whether anything turned up. This was the point where I learned what we can do when running tests with Maven, and I started to explore a bit more.
Chapter 3 – Into the Light We Go
To run tests with Maven, we need the Maven Surefire plugin. Surefire works with both the JUnit and TestNG test frameworks, and by default it automatically includes all test classes whose names start with Test or end with Test, Tests, or TestCase. We declare the plugin, along with its configuration, in the POM file and we are ready to go.
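A Surefire declaration along these lines would match the settings discussed in this post. Treat it as a sketch rather than the exact configuration from the project: the version number and the include pattern are assumptions, while the parallel and threadCount values are the ones mentioned in the text.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <!-- Assumption: any reasonably recent Surefire release should do -->
  <version>3.2.5</version>
  <configuration>
    <includes>
      <!-- Assumption: restrict the run to classes ending in Test -->
      <include>**/*Test.java</include>
    </includes>
    <!-- Run test classes in parallel, using up to 10 threads -->
    <parallel>classes</parallel>
    <threadCount>10</threadCount>
  </configuration>
</plugin>
```

This block goes inside the build/plugins section of the POM, after which mvn test picks it up.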
When I ran the tests with Maven, voilà! The tests were failing randomly. Now I needed to understand why. The Surefire configuration has an <includes> option where we can specify the files to be tested, and an <excludes> option to leave files out of the run. So, as the first step, I included only the failing test class by specifying <include>**/NodeMonthsStepTest.java</include>. The tests passed. Then I started including other tests until the test in question started failing. And that is how I finally found the culprit.
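The narrowing-down process could look something like the fragment below. Only NodeMonthsStepTest comes from the story; the other class names are hypothetical placeholders for whichever tests you suspect or want to leave out.

```xml
<configuration>
  <includes>
    <!-- Step 1: run only the failing class and confirm it passes on its own -->
    <include>**/NodeMonthsStepTest.java</include>
    <!-- Step 2: add suspects back one at a time (hypothetical class name) -->
    <include>**/SuspectedNeighbourTest.java</include>
  </includes>
  <excludes>
    <!-- Optionally keep unrelated tests out of the run (hypothetical class name) -->
    <exclude>**/UnrelatedSlowTest.java</exclude>
  </excludes>
  <parallel>classes</parallel>
  <threadCount>10</threadCount>
</configuration>
```

For a quick one-off run, Surefire also accepts a test name on the command line, as in mvn test -Dtest=NodeMonthsStepTest. As far as I know, that parameter overrides the includes and excludes in the POM.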
Chapter 4 – The Culprit
We have been using Joda-Time, a date and time library for Java. As a result of my change, I had to move a test class between modules, and that test class was calling org.joda.time.DateTimeUtils.setCurrentMillisFixed, which sets a static member inside DateTimeUtils. Because that member is static, it is shared by every test running in the same JVM, and the failing test class relies on it (through the current time) for its own calculations. The tests were executing in parallel, so whenever the newly moved test class fixed the clock while the other tests were running, their date calculations came out wrong; whether that happened depended on timing, hence the random failures. The fix was to remove the usage of setCurrentMillisFixed altogether and use Java’s own date-time classes to get the desired effect in the newly migrated test class.
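One way to get a fixed "now" without touching global state is to hand the code under test a java.time.Clock. The sketch below is not the actual fix from the project, and StepCalculator is a hypothetical stand-in for the date logic involved. It only illustrates the idea: each test builds its own fixed clock, so nothing leaks to tests running on other threads.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical class standing in for date logic that depends on "now".
final class StepCalculator {
    private final Clock clock;

    // The clock is injected, so no static, JVM-wide state is involved.
    StepCalculator(Clock clock) {
        this.clock = clock;
    }

    // Whole months between today (according to the injected clock) and the target date.
    long monthsUntil(LocalDate target) {
        return ChronoUnit.MONTHS.between(LocalDate.now(clock), target);
    }
}

final class FixedClockDemo {
    public static void main(String[] args) {
        // A test can pin "now" to 2020-01-15 for its own calculator instance only.
        Clock fixed = Clock.fixed(Instant.parse("2020-01-15T00:00:00Z"), ZoneOffset.UTC);
        StepCalculator calculator = new StepCalculator(fixed);
        System.out.println(calculator.monthsUntil(LocalDate.of(2020, 4, 15))); // prints 3
    }
}
```

Production code would pass Clock.systemUTC() or Clock.systemDefaultZone() instead, and two tests running in parallel can each use a different fixed clock without interfering.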
Chapter 5 – Exploration
Now, you might be wondering why the tests were not failing in IntelliJ in the first place. That was because IntelliJ was running the tests sequentially, so the migrated test never executed in parallel with the failing test. Now, let’s explore how we can execute tests in parallel using the Maven Surefire plugin, and the configuration options involved.
Two settings in the Surefire configuration relate directly to parallel execution: <parallel>classes</parallel> and <threadCount>10</threadCount>. For <parallel> we can use classes or methods as the value: classes runs test classes in parallel, while methods takes the more granular approach of running individual test methods in parallel, including methods of the same test class. Secondly, <threadCount>10</threadCount> sets the total number of threads we want Surefire to create.
This is a great way to assess the thread safety of your tests and code, and also to reduce the time taken for a full test suite run. But you have to make sure your tests are thread safe for the parallel mechanism you choose. For example, if you choose methods as the parallel execution mechanism to improve the timing, you will want to think twice about sharing fields among methods in the same class.
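To make the shared-field concern concrete, here is a small plain-Java sketch with no TestNG involved. It mimics several test methods touching the same field of one test class instance at the same time, which, as far as I know, is how TestNG behaves with methods parallelism since it reuses one instance per class. SharedCounterDemo and its methods are hypothetical names made up for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a test class whose methods share fields.
final class SharedCounterDemo {
    private int plainCounter;                                         // shared and unsynchronized
    private final AtomicInteger atomicCounter = new AtomicInteger();  // shared but thread safe

    void unsafeIncrement() { plainCounter++; }                 // read-modify-write, not atomic
    void safeIncrement() { atomicCounter.incrementAndGet(); }  // atomic update

    int plain() { return plainCounter; }
    int atomic() { return atomicCounter.get(); }

    // Runs the task `times` times across `threads` threads and waits for completion.
    static void runInParallel(int threads, int times, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < times; i++) {
            pool.execute(task);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SharedCounterDemo demo = new SharedCounterDemo();
        runInParallel(10, 10_000, demo::unsafeIncrement);
        runInParallel(10, 10_000, demo::safeIncrement);
        // The plain counter may come out below 10000 because updates can be lost.
        System.out.println("plain counter:  " + demo.plain());
        System.out.println("atomic counter: " + demo.atomic());
    }
}
```

The atomic counter always ends at 10000, while the plain one can vary from run to run, which is exactly the kind of intermittent result that shows up as a flaky test. Keeping state in local variables inside each test method avoids the problem entirely.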
Chapter 6 – There and Back Again
What we explored is only a tiny part of Surefire, and there’s a whole lot more to the plugin if you are willing to explore. If you go through the reference, you will see that Surefire is not limited to TestNG but works with JUnit tests as well, and it offers a set of JUnit-specific configurations in addition to the TestNG ones for you to play with. I hope you now have a basic idea of how to execute TestNG tests in parallel, both to test the thread safety of your tests and code and to improve overall execution times for large test suites.