Thursday, May 8, 2014

MapReduce Implementations - Hazelcast Vs Infinispan

I was testing the MapReduce implementation of Hazelcast with the recent release of Hazelcast 3.2, and then decided to compare its performance with the MapReduce implementation of Infinispan 6.0.2.
Infinispan outperforming Hazelcast MapReduce implementation
Infinispan outperformed Hazelcast in the sample MapReduce implementation across the different scenarios tested on a single instance, as shown in the figure. Infinispan continued to outperform Hazelcast as the cluster grew, up to 6 nodes.

Is Infinispan really faster than Hazelcast? Probably it is, as shown by scala-map-benchmarks. It may also have something to do with the scenarios, as discussed in the Hazelcast group. However, the difference here is huge, unlike in the previous benchmarks. In my opinion, it has to do with the still immature MapReduce implementation of Hazelcast, as Hazelcast has proven to be quite effective for my other distributed execution tasks.

If your use case is centred around MapReduce, I would suggest Infinispan over Hazelcast, as the Hazelcast implementation is quite buggy as of 3.2. I have encountered 3 issues so far: a known issue, #2105, that was reproduced during MapReduce executions, and two other (probably MapReduce-specific) issues that I reported, #2354 (Update: this issue has been fixed for the 3.2.2 and 3.3 versions of Hazelcast. Thanks Noctarius for attending to this) and #2359. Hazelcast MapReduce might turn out to be more scalable and better performing once these issues are addressed.

It should be noted that the API of the original Hazelcast MapReduce implementation (code-named CastMapR) was heavily inspired by that of the stable and mature MapReduce implementation of Infinispan. The Hazelcast word-count MapReduce example hence follows the same design as the one from Infinispan.
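To make the shared word-count design a little more concrete, here is a minimal sketch in plain Java. It is only an illustration of the general mapper / collector / reducer shape: the `Collector`, `Mapper` and `Reducer` interfaces and the `run` method below are hypothetical names defined in the sketch itself, not the actual Hazelcast or Infinispan types, and everything runs in a single JVM rather than across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Hypothetical interface: receives the intermediate (key, value) pairs emitted by a mapper.
    interface Collector<K, V> {
        void emit(K key, V value);
    }

    // Hypothetical interface: turns one input entry into zero or more intermediate pairs.
    interface Mapper<KIn, VIn, KOut, VOut> {
        void map(KIn key, VIn value, Collector<KOut, VOut> collector);
    }

    // Hypothetical interface: folds all intermediate values of one key into a single result.
    interface Reducer<K, V> {
        V reduce(K key, Iterable<V> values);
    }

    // Emits (word, 1) for every word found in the document text.
    static class WordCountMapper implements Mapper<String, String, String, Integer> {
        @Override
        public void map(String docName, String text, Collector<String, Integer> collector) {
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    collector.emit(word, 1);
                }
            }
        }
    }

    // Sums the 1s emitted for a given word.
    static class WordCountReducer implements Reducer<String, Integer> {
        @Override
        public Integer reduce(String word, Iterable<Integer> counts) {
            int sum = 0;
            for (int count : counts) {
                sum += count;
            }
            return sum;
        }
    }

    // Local, single-threaded stand-in for the grid: map every entry,
    // group the emitted values by key, then reduce each group.
    static Map<String, Integer> run(Map<String, String> documents) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        Mapper<String, String, String, Integer> mapper = new WordCountMapper();
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            mapper.map(entry.getKey(), entry.getValue(),
                    (word, one) -> grouped.computeIfAbsent(word, w -> new ArrayList<>()).add(one));
        }

        Reducer<String, Integer> reducer = new WordCountReducer();
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reducer.reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> documents = new TreeMap<>();
        documents.put("doc1", "Hazelcast and Infinispan");
        documents.put("doc2", "Infinispan MapReduce and Hazelcast MapReduce");
        System.out.println(run(documents));
    }
}
```

In a real data grid, the input map would be a distributed map or cache, and the map and reduce phases would run on the nodes that own the data. The sketch leaves all of that out and keeps only the structure of the word-count job.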

I am using Hazelcast 3.2 and Infinispan 6.0.2 for my master's thesis at INESC-ID Lisboa. Stay tuned for more updates from awesome Lisbon. ^_^


Note:
These results are part of the paper given below, which was published in December 2014. Please cite the paper if you use these results in your research work.
Kathiravelu, P. & L. Veiga (2014). An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures. In IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014), London, UK. pp. 79 – 88. IEEE Computer Society.

4 comments:

  1. For Infinispan 7.0 we have implemented additional map/reduce performance improvements (parallel map and reduce execution etc). Stay tuned and see http://blog.infinispan.org/2014/02/mapreduce-parallel-execution.html for more details.

  2. Thanks for the link. That sounds more promising. However, as of now, Infinispan 7.0.x is tagged as unstable at http://infinispan.org/download/
    So I decided to go with 6.0.2.Final which is the latest and stable version, for now.

  3. Kathiravelu, now that Infinispan 7.1.0.Final is out, could you please re-run your test?

  4. Thanks Tristan for the message.

    These results are part of the paper we published last year:
    Kathiravelu, P. & L. Veiga (2014). An Adaptive Distributed Simulator for Cloud and MapReduce Algorithms and Architectures. In IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC 2014), London, UK. pp. 79 – 88. IEEE Computer Society.

    I am aware of the release of Infinispan 7.1.0.Final, as I am using it for my current research work (which I hope to publish this year as well). I will update the blog with the results, when available.

    I may rerun the tests with the latest version of Infinispan and Hazelcast 3.4, when I have time.

