To get started, here are two Apache Spark tutorials:
http://www.cloudera.com/documentation/enterprise/5-6-x/PDF/cloudera-spark.pdf
https://www.tutorialspoint.com/apache_spark/apache_spark_tutorial.pdf
Spark is an open-source engine for processing big data.
Here is how to install Spark on Ubuntu:
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
It is considered to be the successor to MapReduce for general purpose data processing on Apache Hadoop clusters. In MapReduce the highest-level unit of computation is a job. In Spark, the highest-level unit of computation is an application.
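To make the job-versus-application distinction concrete, here is a minimal Scala-shell sketch (the data is made up for illustration): each action, such as count(), launches a separate job, but all of the jobs run inside the same application, that is, the same SparkContext.

scala> val nums = sc.parallelize(1 to 100)         // an RDD inside the running application
scala> nums.count()                                // action 1: launches the first Spark job
scala> nums.filter(_ % 2 == 0).count()             // action 2: launches a second job, same application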
Spark exposes APIs for Java, Python, and Scala, and consists of Spark Core and several related projects:
Spark SQL: a module for working with structured data that lets you seamlessly mix SQL queries with Spark programs (see the sketch after this list).
Spark Streaming: an API for building scalable, fault-tolerant streaming applications.
MLlib: an API that implements common machine learning algorithms.
GraphX: an API for graphs and graph-parallel computation.
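To give a feel for how Spark SQL mixes SQL with ordinary program code, here is a minimal Scala-shell sketch. It assumes Spark 1.3 or later, and the Person case class and its rows are hypothetical, made up for this example:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext.implicits._
scala> case class Person(name: String, age: Int)   // hypothetical schema for the example
scala> val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 17))).toDF()
scala> people.registerTempTable("people")          // expose the DataFrame to SQL
scala> sqlContext.sql("SELECT name FROM people WHERE age >= 21").collect().foreach(println)

The result of the SQL query is itself a DataFrame, so you can keep transforming it with regular Spark operations.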
Here is an example of how an Apache Spark application works.
The simplest way to run a Spark application is by using the Scala or Python shells.
1. To start one of the shell applications, run one of the following commands:
-Scala
$SPARK_HOME/bin/spark-shell
-Python
$SPARK_HOME/bin/pyspark
2. To run the classic Hadoop word count application, copy an input file (here, a local file or directory named input) to HDFS:
$ hdfs dfs -put input
3. Within a shell, run the word count application using the following code examples, substituting namenode_host, path/to/input, and path/to/output with your own values:
Scala:
scala> val myfile = sc.textFile("hdfs://namenode_host:8020/path/to/input")
scala> val counts = myfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.saveAsTextFile("hdfs://namenode_host:8020/path/to/output")
Python:
>>> myfile = sc.textFile("hdfs://namenode_host:8020/path/to/input")
>>> counts = myfile.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda v1, v2: v1 + v2)
>>> counts.saveAsTextFile("hdfs://namenode_host:8020/path/to/output")
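In both versions, the transformations are evaluated lazily; only the saveAsTextFile action triggers the job, which writes one part-NNNNN file per partition into the output directory. Assuming the same placeholder paths as above, you can inspect the result with:

$ hdfs dfs -cat hdfs://namenode_host:8020/path/to/output/part-00000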
The code above runs interactively on core Spark. We can also package the same logic as a standalone Spark application and launch it with spark-submit, as sketched below.
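Here is a minimal sketch of the same word count as a standalone Scala application; the object name, jar name, and paths below are assumptions for illustration, not a fixed convention:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone version of the word count run above in the shell.
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Word Count")
    val sc = new SparkContext(conf)
    val myfile = sc.textFile(args(0))              // input path, e.g. hdfs://namenode_host:8020/path/to/input
    val counts = myfile.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
    counts.saveAsTextFile(args(1))                 // output path
    sc.stop()
  }
}

After building a jar (for example with sbt package), submit it to the cluster:

$SPARK_HOME/bin/spark-submit --class WordCount wordcount.jar hdfs://namenode_host:8020/path/to/input hdfs://namenode_host:8020/path/to/output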
Here are some Spark books:
Mastering Apache Spark by Mike Frampton
https://www.amazon.com/Mastering-Apache-Spark-Mike-Frampton-ebook/dp/B0119R8J00
Spark Cookbook by Rishi Yadav
https://www.amazon.com/Rishi-Yadav/e/B012UW5VZE/ref=pd_sim_351_bl_3?_encoding=UTF8&refRID=TNS2TCMGB0KY4NNM9MJ9
On this blog, I write about solutions to problems I have encountered in programming and data analytics. I hope they help you in your work. Thank you.