November 2016

Apache Spark RDD

2017-07-22T17:22:25+00:00 | November 8th, 2016 | Spark

In this tutorial we'll learn about RDDs (Resilient Distributed Datasets), the core concept of Spark. An RDD is an immutable (read-only) collection of objects distributed across the cluster. An RDD can be created from data in storage or from another RDD by applying a transformation to it. Why RDD: In the older MapReduce paradigm, the [...]

October 2016

Apache Spark Introduction

2017-07-22T17:22:36+00:00 | October 13th, 2016 | Spark

In this first Spark tutorial, we introduce Apache Spark and its core concept, the RDD. What is Apache Spark? Apache Spark is an open-source, general-purpose cluster computing engine. Spark was born out of the need to prove out the concept of Mesos, in the AMPLab at [...]