Spark and Scala by Akkem Sreenivasulu
=====================================
–| It is a Big Data processing framework.
–| Spark applications can be implemented using:
Java
Scala
Python
R
–| It can process data on any filesystem.
–| It is an in-memory (RAM) processing framework.
–| It is from "Apache".
Before Spark we already had Big Data processing frameworks
such as MapReduce, Pig, etc.
MapReduce vs Spark:
===================
1. MapReduce can be implemented using Java, Python, etc.
Spark can be implemented using Java, Scala, Python and R.
2. In Hadoop, along with MapReduce we have:
Pig — a scripting framework — simple and easy, with very little code
Hive — SQL-like — HQL (Hive Query Language)
Flume — a streaming framework — performs only streaming.

In Spark
———
Spark Core — RDD programming — Java/Python/Scala
Spark SQL — DataFrames, Tables, Datasets
— DSL (Domain Specific Language) or native SQL queries
DSL: df.select("*")
SQL: select * from df;
(see the sketch after this list)
Spark Streaming — performs streaming + live analytics
Spark MLlib — a machine learning library
Spark GraphX — graph data processing.
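A minimal Scala sketch of the DSL vs SQL styles above (a local
SparkSession and a hypothetical people.json input file are assumed):

import org.apache.spark.sql.SparkSession

object DslVsSql {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the master
    // normally comes from spark-submit instead.
    val spark = SparkSession.builder()
      .appName("DslVsSql")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input file; any DataFrame source works the same way.
    val df = spark.read.json("people.json")

    // DSL style: methods called directly on the DataFrame.
    df.select("*").show()

    // SQL style: register a temporary view, then query it with plain SQL.
    df.createOrReplaceTempView("df")
    spark.sql("select * from df").show()

    spark.stop()
  }
}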

3. MapReduce is tightly coupled with the HDFS filesystem.
Spark can process any filesystem.

4. MapReduce uses both disk and memory for processing.
Spark by default uses in-memory processing.
Spark is up to 100 times faster than MapReduce for in-memory processing
and up to 10 times faster for disk-based processing.
Note:
MapReduce was the fastest processing framework before Spark.

5. In Hadoop we have
MapReduce, Pig, Hive, Flume, Sqoop, etc., but we cannot combine
them all in a single application to meet a requirement.

But in Spark we can combine Spark Core, Spark SQL, Spark Streaming,
Spark MLlib and Spark GraphX in a single application, as sketched below.
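A minimal sketch, assuming a hypothetical orders.csv file with
"orderId,amount" lines, of mixing Spark Core (RDDs) and Spark SQL
in one application:

import org.apache.spark.sql.SparkSession

object CombinedApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CombinedApp")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Spark Core: low-level RDD parsing of the raw text file.
    val lines = spark.sparkContext.textFile("orders.csv") // hypothetical path
    val orders = lines.map(_.split(","))
      .map(fields => (fields(0), fields(1).toDouble))

    // Spark SQL: convert the RDD to a DataFrame and query it with SQL.
    val df = orders.toDF("orderId", "amount")
    df.createOrReplaceTempView("orders")
    spark.sql("select count(*) as cnt, sum(amount) as total from orders").show()

    spark.stop()
  }
}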

6. MapReduce can run only in a "YARN" environment.
YARN is the runtime environment in Hadoop.
YARN has:
ResourceManager — master — takes requests
NodeManager — slave — processes requests

Spark can run in multiple environments (sketched below):
Spark Standalone Cluster (Apache)
YARN Cluster (Apache)
Mesos Cluster (Apache)
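A minimal sketch of how the cluster manager is chosen: the same
application code runs in any of these environments, only the master
URL changes. Host names below are hypothetical, and in practice the
master is usually passed via spark-submit rather than hard-coded:

import org.apache.spark.sql.SparkSession

object MasterUrls {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MasterUrls")
      // .master("spark://master-host:7077") // Spark Standalone cluster
      // .master("yarn")                     // YARN cluster
      // .master("mesos://mesos-host:5050")  // Mesos cluster
      .master("local[*]")                    // local mode for illustration
      .getOrCreate()

    println(spark.sparkContext.master) // prints the chosen master URL
    spark.stop()
  }
}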
What we are going to discuss as part of this course:
—————————————————
Spark Developer:
===============
1. Spark Foundation
2. Scala
3. Spark Core
4. Spark SQL
5. Spark Streaming
6. Spark Integrations (see the sketch below)
different filesystems and formats like HDFS, CSV, JSON, XML, etc.
RDBMS — MySQL
NoSQL — Cassandra and HBase
Kafka
Hive
Duration: min 35 days, max 40 days — 35 hrs to 40 hrs
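As a taste of the integrations above, a minimal Scala sketch reading
a CSV file and a MySQL table (paths, table name and credentials are
hypothetical; the MySQL JDBC driver must be on the classpath):

import org.apache.spark.sql.SparkSession

object IntegrationsTaste {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IntegrationsTaste")
      .master("local[*]")
      .getOrCreate()

    // Filesystem/format integration: a CSV file on HDFS.
    val csvDf = spark.read
      .option("header", "true")
      .csv("hdfs:///data/customers.csv")
    csvDf.show()

    // RDBMS integration: a MySQL table over JDBC.
    val jdbcDf = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .option("user", "spark")
      .option("password", "secret")
      .load()
    jdbcDf.show()

    spark.stop()
  }
}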

Spark Machine Learning Developer:
=================================
Spark MLlib
Spark GraphX
Duration: 35hrs to 40hrs

Jobs:
Spark Developers
Spark Machine Learning Developer

Thanks,
A.Sreenivasulu
Python Academy
(A Subsidiary of CFamily IT Solutions Pvt Ltd)
www.pythonacademy.co
eMail:
[email protected]
[email protected]
Contact: 9133161144
