Translate

Friday, August 17, 2012

Performance comparison of Executor framework vs Fork/Join framework’s RecursiveTask feature in java or JDK7

I was testing new RecursiveTask feature of ForkJoin framework introduced in JDK7 or Java 1.7.

There are two types of ForkJoinTask specializations:
  1. Instances of RecursiveAction represent executions that do not yield a return value.
  2. In contrast, instances of RecursiveTask yield return values.
I have already posted an example of RecursiveAction in previous post. To know more about theoretic details and RecursiveAction sample code, you can visit the link: How to use Fork-Join Framework features in JDK7?


In this post I will do comparison of Executor Framework Vs ForkJoin framework's RecursiveTask feature. Following example fills primitive long array of size 100000000 with values from 0 to 10000000.

I use executor framework to calculate the sum of whole array  then I perform same logic using ForkJoin framework's RecursiveTask feature. To setup the example, I will first show the code which uses executor framework and its result then I will show ForkJoin framework's RecursiveTask feature implementing same logic and its result. Array sum logic in both samples was executed 10 times to get the average output and performance difference.

Update 1: Anonymous commenter advised to split the executor pool in even size (4 = number of available processors in my case) to maximize the through put. Advise well taken and updated blog as well. There is defiantly improvement in performance.

Executor framework code sample:

import static java.util.Arrays.asList;

import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorSum {
    Random random = new Random();

    // Fill array with its own index value
    public void fillArray(long[] array) {
        for (int i = 0; i < array.length; i++) {
            array[i] = i;
        }
    }

    public static void main(String[] args) throws InterruptedException,
            ExecutionException {
        ExecutorSum sum = new ExecutorSum();
        long[] array = new long[10_00_00_000];
        sum.fillArray(array);

        for (int i = 0; i < 10; i++) {
            int processors = Runtime.getRuntime().availableProcessors();
            ExecutorService executor = Executors.newFixedThreadPool(processors);
            List<Future<Long>> results;
            long start = System.currentTimeMillis();
          
            // array size/No. of processors
            int splitCount = array.length / processors;
          
            //Split pool size into even size for maximum through put
            results = executor.invokeAll(asList(new ArraySum(array, 0,
                    splitCount), new ArraySum(array, splitCount + 1,
                    splitCount * 2), new ArraySum(array, (splitCount * 2) + 1,
                    splitCount * 3), new ArraySum(array, (splitCount * 3) + 1,
                    array.length)

            ));

            executor.shutdown();

            // Calculating final result sum
            long count = 0;
            for (Future<Long> result : results) {
                count = count + result.get();
            }

            System.out.println("result: " + count);
            System.out.println("Sequential processing time: "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }
}

class ArraySum implements Callable<Long> {
    private final long from;
    private final long to;
    private long[] array;

    ArraySum(long[] array, long from, long to) {
        this.from = from;
        this.to = to;
        this.array = array;
    }

    @Override
    public Long call() throws Exception {
        long count = 0L;
        // Calculating sum of the given array range
        for (int i = (int) from; i < to; i++) {
            count = count + array[i];
        }

        return count;
    }
}

Output:

result: 4999999800000000
Sequential processing time: 113 ms
result: 4999999800000000
Sequential processing time: 108 ms
result: 4999999800000000
Sequential processing time: 113 ms
result: 4999999800000000
Sequential processing time: 124 ms
result: 4999999800000000
Sequential processing time: 122 ms
result: 4999999800000000
Sequential processing time: 107 ms
result: 4999999800000000
Sequential processing time: 123 ms
result: 4999999800000000
Sequential processing time: 134 ms
result: 4999999800000000
Sequential processing time: 122 ms
result: 4999999800000000
Sequential processing time: 109 ms

Fork/Join framework code sample:

import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSumTask {
   
    public void fillArray(long[] array) {
        for (int i=0; i<array.length; i++) {
            array[i] = i;
        }
    }

    public static void main(String[] args) {
        ForkJoinSumTask sum = new ForkJoinSumTask();
        long[] array = new long[10_00_00_000];
        sum.fillArray(array);
       
        System.out.println("Number of processors available: " + Runtime.getRuntime().availableProcessors());
       
        ForkJoinPool fjpool = new ForkJoinPool(32); //Default parallelism level = Runtime.getRuntime().availableProcessors()
       
        for (int i=0; i<10; i ++) {
            RecursiveSumTask task = new RecursiveSumTask(array, 0, array.length);
            long start = System.currentTimeMillis();
            System.out.println("Result: " + fjpool.invoke(task));
            System.out.println("Parallel processing time: "    + (System.currentTimeMillis() - start)+ " ms");
        }
       
        System.out.println("Number of steals: " + fjpool.getStealCount() + "\n");
    }
}

class RecursiveSumTask extends RecursiveTask<Long> {
    private static final long serialVersionUID = 1L;
    private final long from;
    private final long to;
    private long[] array;
    final int splitSize = 10_00_0000; //Some threshold size to spit the task

    RecursiveSumTask(long[] array, long from, long to) {
        this.from = from;
        this.to = to;
        this.array = array;
    }

    @Override
    protected Long compute() {
        long count = 0L;
        List<RecursiveTask<Long>> forks = new LinkedList<>();
       
        if ( to - from > splitSize) {
            // task is huge so divide in half
            long mid = (from + to) >>> 1;
           
            //Divided the given task into task1 and task2
            RecursiveSumTask task1 = new RecursiveSumTask(array, from, mid);
            forks.add(task1);
            task1.fork();
           
            RecursiveSumTask task2 = new RecursiveSumTask(array, mid, to);
            forks.add(task2);
            task2.fork();
        } else {
            //Calculating sum of the given array range
            for (int i = (int) from; i < to; i++) {
                count = count + array[i];
            }
        }
       
        //Waiting for the result
        for (RecursiveTask<Long> task : forks) {
            count = count + task.join();
        }
       
        return count;
    }
}

Output:

Number of processors available: 4
Result: 4999999950000000
Parallel processing time: 135 ms
Result: 4999999950000000
Parallel processing time: 123 ms
Result: 4999999950000000
Parallel processing time: 105 ms
Result: 4999999950000000
Parallel processing time: 115 ms
Result: 4999999950000000
Parallel processing time: 111 ms
Result: 4999999950000000
Parallel processing time: 114 ms
Result: 4999999950000000
Parallel processing time: 114 ms
Result: 4999999950000000
Parallel processing time: 109 ms
Result: 4999999950000000
Parallel processing time: 126 ms
Result: 4999999950000000
Parallel processing time: 128 ms
Number of steals: 301

Conclusion:

Both code samples calculate array sum 10 times to warmup HotSpot JVM and get average result. As you can see from both outputs, there is no visible improvement in the performance using RecursiveTask. May be my sum logic is small and not so complex. If the logic is more complex then performance difference will be lot more.

References:
  1. How to use Fork-Join Framework features in JDK7?  
  2. Fork and Join: Java Can Excel at Painless Parallel Programming 
  3. JDK 7 Adoption Guide

3 comments:

  1. For the ExecutorSum benchmark, it is unclear to me why you create and submit three ArraySums of very uneven size. I would have expected four ArraySums (given that the threadpool has 4 threads) and that these ArraySums have equal size.

    ReplyDelete
    Replies
    1. You are right. I was trying to get various combinations to get best possible output from both examples. I tried what you suggested as well but it did not have much impact on the performance so I let the example be that way.

      You can try same example & make those changes and see if you can extract better performance. If you can get better results then please share I will update my post with your results.

      Delete
  2. Anonymous commenter advised to split the executor pool of ExecutorSum example into even size (4 = number of available processors in my case) to maximize the through put. Advise well taken and updated blog post as well. There is defiantly improvement in the performance.

    Does Anyone have idea about how to speed up Fork/Join framework example performance?

    ReplyDelete