Handling High Performance Operations with Parallelism in Java

Author: Mustafa YENİCE, Project Manager – Defense Application Software

 

Fork / Join Framework

Fork / Join Framework was designed to recursively split a parallelizable task into smaller tasks and then combine the results of each subtask to produce the overall result. [1]

This framework is being used as a standard library with java 1.7. However the progress didn’t stop. It was improved in “JEP 155: Concurrency Updates” in Java 1.8. The improved feature is written as follow.

Added functionality and improved performance for ForkJoinPools that allow them to be used more effectively in the increasingly wider range of applications that users desire. New features include support for completion-based designs that are often most appropriate for IO-bound usages, among others. [2]

Algorithms

The algorithms used in the Fork / Join Framework are similar to the divide and conquer algorithms. With these algorithms it is possible to achieve simple and effective performance. This algorithm is often expressed by the following pseudo code:

 if(task is small enough)
        solve task sequentially
 else {
        split task into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from sub results
 }

As you can see from the above pseudo code, the fork operation starts a new parallel fork / join subtask. The join process is held until the completion of the forked subtask. [3]

Important Classes

To use the Fork / Join Framework, you need to know the following 4 classes. [4]

  • ForkJoinPool: This class is used to run the whole of the fork / join tasks in the whole program.
  • RecursiveTask<V>: This allows you to run the subclass in a ForkJoinPool and return a result.
  • RecursiveAction: This allows you to run the subclass in a ForkJoinPool.
  • ForkJoinTask<V>: It is the superclass of the RecursiveTask<V> and RecursiveAction. Fork and join are defined in this class.

Work stealing

Work stealing algorithm is used to redistribute and balance the tasks in the pool. If we try to explain why we need this algorithm, tasks are split approximately equally across all threads in the ForkJoinPool. Each of these threads include tasks in linked queue and when each task completes the process continues by taking a new job from the head of the queue. As the process progresses in this way, some threads complete the tasks before others do. In this case, some of our threads may be empty. Instead of being idle, thread takes a task from another tail of the queue and starts to work. This process is called a steal operation in this algorithm. This process continues until all the tasks are completed. [1]

 

Conclusions

The Fork / Join Framework speeds up the processing of ready massive tasks, but for a small amount of data, choosing this framework is almost never a winning decision. It is also more accurate to use the Fork / Join Framework in recursive algorithms.

References
1.) Java 8 in Action: Lambdas, streams, and functional-style programming (Raoul-Gabriel Urma, Mario Fusco, and Alan Mycroft)
2.) http://openjdk.java.net/jeps/155
3.) http://dl.acm.org/citation.cfm?id=337465
4.) http://homes.cs.washington.edu/~djg/teachingMaterials/spac/grossmanSPAC_forkJoinFramework.html

Leave a Reply

Your email address will not be published.