val text=sc.textFile("README.md")
text is an RDD of strings, one per line. README.md is simply the readme file that ships in the Spark home directory; any text file path works here.
Change the RDD to a DataFrame (in spark-shell the required implicits are already in scope):
val text1=text.toDF
Convert a list to an RDD:
val textfile1=List(("Ale",1259,278,6782))
textfile1 is a list containing one tuple.
Change the list to an RDD with parallelize:
val textfile2=sc.parallelize(textfile1)
Convert a list to a DataFrame:
val aa=List(("a","k","u","t"))
val au=aa.toDF
au.show()
A list of tuples converts directly with toDF. When the elements are themselves Lists, toDF cannot handle them; pattern match each inner List into a tuple first:
val aa=List(List("a","k","u","t"))
val ak=aa.map{case List(a, b, c, d) => (a, b, c, d)}
val au=ak.toDF
au.show()
Change an RDD to a list:
val text1=sc.textFile("README.md")
text1 is an RDD.
Change the RDD to an array (note that collect() returns an Array, not a List):
val text2=text1.collect()
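collect() hands back a plain Scala Array, so the final Array-to-List step needs no Spark at all. A minimal sketch (the array contents here are made up for illustration, standing in for what collect() might return):

```scala
// Hypothetical collected result: what collect() on the README RDD might return.
val collected: Array[String] = Array("# Apache Spark", "", "Spark is a fast engine.")

// Array -> List is plain Scala; no Spark needed at this point.
val asList: List[String] = collected.toList

println(asList.length)  // 3
```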
Monday, April 10, 2017
Spark Scala, change the data type of an RDD
code:
val aa=sc.parallelize(List(("Ale",12,2783,89277)))
val ak=aa.map{case (w,k,r,u)=>(w,k,r,u.toLong)}
ak.toDF.show()
This changes the last integer's type from Int to Long.
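The toLong inside the map is ordinary Scala tuple handling; the same pattern works on a plain tuple with no Spark involved. A pure-Scala sketch:

```scala
// Widen the last Int of a tuple to Long, exactly as inside the RDD's map.
val row = ("Ale", 12, 2783, 89277)
val widened = row match { case (w, k, r, u) => (w, k, r, u.toLong) }

println(widened)  // (Ale,12,2783,89277)
```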
Saturday, April 8, 2017
Spark RDD operations:
Array.sortBy(_._2)  (sort an array of pairs by the second element)
sortByKey()  (sort a key-value pair RDD by key)
https://www.tutorialspoint.com/apache_spark/apache_spark_tutorial.pdf
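sortBy(_._2) is plain Scala collection code, so it can be tried without a cluster; sortByKey() is the RDD method that sorts a key-value RDD by its first element. A quick pure-Scala illustration of the first form:

```scala
// Pairs of (word, count); sortBy(_._2) orders by the second element.
val counts = Array(("spark", 3), ("scala", 1), ("rdd", 2))

val ascending  = counts.sortBy(_._2)   // lowest count first
val descending = counts.sortBy(-_._2)  // highest count first

println(ascending.map(_._1).mkString(", "))  // scala, rdd, spark
```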
Wednesday, April 5, 2017
How to delete a directory by force
sudo rm -rf directory-to-delete
Install Spark properly
http://stackoverflow.com/questions/38618460/how-to-properly-build-spark-2-0-from-source-to-include-pyspark
- Install sbt
- Build:
http://spark.apache.org/downloads.html
cd spark
git checkout v2.0.0
sbt package
Tuesday, April 4, 2017
Scala Spark get date type:
http://nuvostaq.github.io/BigDataSpark/2016/01/24/ExtractingData.html
e.g.:
val dateType = DateTypeConverter.GetType(startDt)
Spark SQL, get day of week:
http://stackoverflow.com/questions/25006607/how-to-get-day-of-week-in-sparksql
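The linked answer covers the Spark SQL side; outside Spark, plain Scala can get the day of week from java.time. A minimal sketch (not the Spark SQL API itself):

```scala
import java.time.LocalDate

// Parse an ISO date string and read its day of week.
val date = LocalDate.parse("2017-04-05")
val dayOfWeek = date.getDayOfWeek

println(dayOfWeek)  // WEDNESDAY
```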
How to run Spark Scala on the command line
https://asimjalis.tumblr.com/post/112174265249/how-to-run-scala-script-on-spark
e.g.:
spark-shell -i myscript.scala
Alternatively, package the code as a jar and run it with spark-submit.