In this post, we will cover an overview of Spark and MapReduce, the key differences between them, and some example use cases of each.
We will also discuss how Spark became so much more efficient at data processing than MapReduce.
Overview
MapReduce
MapReduce is a programming model and processing engine for processing and generating large data sets with a parallel, distributed algorithm on a cluster of computers.
MapReduce is composed of several components, including:
- JobTracker — The master node that manages all jobs and resources in a cluster
- TaskTrackers — Agents deployed to each machine in the cluster to run the map and reduce tasks
- JobHistoryServer — A component that tracks completed jobs; it is typically deployed as a separate daemon or together with the JobTracker
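Before diving into a full Hadoop job, the map-and-reduce idea itself can be illustrated on a plain local collection. The short Scala sketch below is only a conceptual illustration of the two phases (it is not Hadoop code), and the sample lines are made up for the example.

// Conceptual illustration of the map/reduce model on a local Scala collection.
// This is NOT Hadoop code; it only mimics the map and reduce phases.
val lines = Seq("to be or not to be", "to be is to do")

// "Map" phase: emit a (word, 1) pair for every word
val mapped = lines.flatMap(_.split(" ")).map(word => (word, 1))

// "Shuffle + reduce" phase: group pairs by word and sum the counts
val counts = mapped.groupBy(_._1).map { case (word, pairs) => (word, pairs.map(_._2).sum) }

println(counts)  // e.g. Map(be -> 3, to -> 4, or -> 1, not -> 1, is -> 1, do -> 1)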
Spark
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics, with APIs available in Scala, Python, R, and Java.
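As a quick taste of those higher-level tools, the snippet below shows how Spark SQL might be used from the Scala shell of a recent Spark version (where a SparkSession named spark is predefined); the file people.json and its columns are assumed purely for illustration.

// Hypothetical Spark SQL example: people.json and its columns are assumed to exist
val people = spark.read.json("people.json")
people.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()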
Now that you have a basic overview of Spark and MapReduce, let's look at the main differences between the two.
Difference Between Spark & MapReduce
Spark keeps intermediate data in memory, whereas MapReduce writes it to disk between stages. Hadoop relies on replication to achieve fault tolerance, whereas Spark uses a different data storage model, the resilient distributed dataset (RDD), which guarantees fault tolerance in a clever way (by recomputing lost partitions from their lineage) and thereby minimizes network I/O.
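A small sketch of that in-memory model: in Spark, a dataset can be explicitly cached so that repeated passes over it read from memory instead of going back to disk, and any lost partitions are recomputed from the RDD's lineage rather than restored from replicas. The file path below is just a placeholder.

// Sketch: caching an RDD so that repeated actions reuse the in-memory copy
val logs = sc.textFile("hdfs:///data/logs.txt")      // placeholder path
val errors = logs.filter(line => line.contains("ERROR")).cache()

errors.count()                                       // first action reads from disk and caches the result
errors.filter(_.contains("timeout")).count()         // later actions reuse the cached partitions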
Spark’s Major Use Cases Over MapReduce
- Iterative Algorithms in Machine Learning (see the sketch after this list)
- Interactive Data Mining and Data Processing
- Data warehousing: Spark SQL is an Apache Hive-compatible query engine that can run queries up to 100x faster than Hive
- Stream processing: log processing and fraud detection in live streams for alerts, aggregates, and analysis
- Sensor data processing: where data is fetched and joined from multiple sources, in-memory datasets are really helpful because they are easy and fast to process
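To see why iterative workloads benefit, the hedged sketch below caches a dataset once and then makes several passes over it, which is the access pattern of algorithms such as gradient descent; the file name, update rule, and numbers are invented for illustration only.

// Toy iterative job: the data is cached once, and each iteration
// reuses the in-memory copy instead of re-reading it from HDFS.
val points = sc.textFile("points.txt")               // placeholder input: one number per line
  .map(_.toDouble)
  .cache()

var estimate = 0.0
for (i <- 1 to 10) {
  // each pass is a full aggregation over the cached data
  val correction = points.map(p => p - estimate).mean()
  estimate += correction * 0.5                       // toy update step
}
println(s"final estimate: $estimate")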
Below are some use cases and scenarios that illustrate the benefits and advantages of Spark over MapReduce.
The scenario below is solved with both MapReduce and Spark, which makes it clear why one might opt for Spark instead of writing long MapReduce programs.
Scenario 1: A simple word count example in MapReduce and Spark
First, the MapReduce version, followed by the equivalent Spark code.
MapReduce
Step 1: Create an input directory in HDFS for the text file that is to be processed
hadoop fs -mkdir -p /user/$USER/input
Step 2: Copy the text file from the local file system to HDFS
hadoop fs -copyFromLocal sample.txt input
Step 3: Create the WordCount program (WordCount.java)
package wc;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Path inputPath = new Path(args[0]);
        Path outputPath = new Path(args[1]);

        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        job.setJobName("WordCount");
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Mapper: emits a (word, 1) pair for every token in each input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
Step 4: Compile the program, build the jar, and run the job
javac -classpath `hadoop classpath` -d . WordCount.java
jar cf wordcount.jar wc/*.class
hadoop jar wordcount.jar wc.WordCount input output
Step 5: Check the output (the tail of the first reducer partition)
hadoop fs -tail output/part-r-00000 | tail > sample-tail.out
Spark
Step 1: Open up the spark-shell (Scala or Python)
Step 2: In the Scala shell, run:
val rdd1 = sc.textFile("sample.txt")
val rdd2 = rdd1.flatMap(line => line.split(" "))
val rdd3 = rdd2.map(word => (word, 1))
val rdd4 = rdd3.reduceByKey((v1, v2) => v1 + v2)
rdd4.collect()
rdd4.saveAsTextFile("/user/input/wordcount")
VERDICT
Roughly 60 lines of MapReduce code for a simple word count program are reduced to fewer than 10 lines of Spark code. It shows the efficiency of Spark and the ease of writing code with it.
You will get to know all of this and deep-dive into each concept related to Hadoop Development & Apache Spark once you enroll in our Hadoop Developer Using Apache Spark training.
Another question that might come to your mind: what exactly do you get when you enroll?
We are glad to tell you that:
Things you will get!!
- Live Instructor-led Online Interactive Sessions
- FREE unlimited retakes for the next 1 year
- FREE on-job support for the next 1 year
- Training Material (Presentation + Step-by-Step Hands-on Guide)
- Recording of Live Interactive Sessions for Lifetime Access
- 100% Money Back Guarantee (if you attend the sessions, practice, and don't get results, we'll give a full refund; check our Refund Policy)
Reference & Related
Next Task for you:
Did you get a chance to download our FREE guide on Big Data Hadoop Development? If not, get it now by clicking on the link below.