Learn x in y minutes, an interesting website:
https://learnxinyminutes.com/
I write about solutions to problems I have run into in programming and data analytics. They may help you in your work. Thank you.
Friday, April 28, 2017
Scala command line methods, are you sure [y/n]
How do you write a Scala command line tool that asks "are you sure [y/n]"? Here are some examples:
https://tpolecat.github.io/tuco/docs/todo-list.html
https://learnxinyminutes.com/docs/scala/
http://www.scala-lang.org/docu/files/ScalaTutorial.pdf
http://stackoverflow.com/questions/2315912/scala-best-way-to-parse-command-line-parameters-cli
http://stackoverflow.com/questions/25788986/how-to-use-scala-to-write-command-line-tools
A Chinese one:
https://twitter.github.io/scala_school/zh_cn/sbt.html
Here is some source code:
https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape.scala
https://github.com/foundweekends/conscript
https://github.com/scalastyle/scalastyle-sbt-plugin/blob/master/src/main/scala/org/scalastyle/sbt/Plugin.scala
A Linux command line example:
https://www.pluralsight.com/guides/other/beginner-linux-navigation-manual
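None of the links above is quoted here, so this is only a minimal sketch of an "are you sure [y/n]" prompt in plain Scala, using scala.io.StdIn. The names Confirm, parseAnswer, confirm and askOnConsole are made up for this illustration. The reading function is passed in as a parameter so the logic can be tried without a terminal.

```scala
import scala.io.StdIn

object Confirm {
  // Interprets one line of input: Some(true) for yes, Some(false) for no,
  // None when the answer is not recognised.
  def parseAnswer(input: String): Option[Boolean] =
    input.trim.toLowerCase match {
      case "y" | "yes" => Some(true)
      case "n" | "no"  => Some(false)
      case _           => None
    }

  // Prints the prompt and reads one line from standard input.
  def askOnConsole(prompt: String): String = {
    print(prompt)
    Console.out.flush()
    StdIn.readLine()
  }

  // Keeps asking until the answer is recognised.
  // End of input (null) is treated as "no" so the loop cannot run forever.
  @annotation.tailrec
  def confirm(question: String, ask: String => String = askOnConsole): Boolean = {
    val line = ask(s"$question [y/n] ")
    if (line == null) false
    else parseAnswer(line) match {
      case Some(answer) => answer
      case None         => confirm(question, ask)
    }
  }
}
```

In a command line tool you would call it like this: if (Confirm.confirm("Are you sure")) go ahead, otherwise stop.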
Thursday, April 27, 2017
Monday, April 24, 2017
math and computer science
I have a PhD in statistics, a mathematical subject, though I was not especially outstanding at it. The subject is so hard that by the time you finish studying it, you hardly have the energy left to take it much further.
But gradually I started learning computer science, or more specifically programming, and I found it easier than math. Or is that an illusion? The beginning of math is sort of easy too, and it gradually becomes more and more difficult. What about computer science, i.e. programming? It feels easier now, but is that because I am still at the beginning, even though I have had five years' experience in the CS/IT industry?
I feel CS is more practical: with it we can "build" things, unlike math, which is more abstract and full of tricks.
Wednesday, April 19, 2017
Tuesday, April 18, 2017
If you want to use assembly sbt, how to do it and the layout and the error fix
To build an sbt package on Linux, first make the directories:
myProject/src/main/scala
Under "scala" you put the sbt file and the Scala files. Then you make a "project" directory under "scala" and put a build.sbt in that "project" directory with this content:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
resolvers += Resolver.url("artifactory", url("http://scalasbt.artifactoryonline.com/scalasbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)
resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/"
resolvers += "Spray Repository" at "http://repo.spray.cc/"
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
resolvers += "jitpack" at "https://jitpack.io"
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
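For comparison, the layout most sbt projects use keeps build.sbt at the project root and the plugin line in project/plugins.sbt. This is a minimal sketch of that convention, not the setup used in this post. The project name, the Scala version and the sbt-assembly version are assumptions for illustration, so check the plugin's README for the version that matches your sbt.

```scala
// File: myProject/project/plugins.sbt
// The plugin version here is an assumption for illustration.
// Note the single % between the organization and the plugin name.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// File: myProject/build.sbt
name := "myProject"        // hypothetical project name
version := "0.1.0"         // hypothetical version
scalaVersion := "2.11.8"   // assumed; use the Scala version your code needs

// Scala sources go under myProject/src/main/scala/
```

With this layout you run sbt from the myProject directory: "sbt package" builds a jar of your own classes, and "sbt assembly" builds one jar that also contains the dependencies.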
With my layout, I got this error after running "sbt package" in the "scala" directory:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
module not found: com.eed3si9n#sbt-assembly;0.11.2
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Some people said this happens because the scalaVersion is not set correctly.
But I found that running "sudo yum update" resolved the problem.
So when you start on a Linux machine, the first thing to do is update the system packages. On Red Hat that is "sudo yum update". On Debian-based Linux, try "sudo apt-get update" followed by "sudo apt-get upgrade".
Monday, April 17, 2017
Wednesday, April 12, 2017
Amazon EMR is an Amazon data science analysis box
Amazon EMR (Elastic MapReduce) is Amazon's managed cluster service for data analysis.
You can ssh to its Linux shell using PuTTY.
Spark is already installed. Type "spark-shell" to start the Spark shell.
It does not come with sbt. Its operating system is Amazon Linux, which is Red Hat based, so you use yum to install sbt.
First type:
sudo yum update
The sbt documentation describes the steps:
http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
Then type:
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
Then type:
sudo yum install sbt
This installs sbt on your EMR node. You do not need to install Spark separately, because EMR already has it.
You can then use "sbt package" to package your Scala code into jar files.
Monday, April 10, 2017
Spark data type convert, change
val text = sc.textFile("README.md")
text is an RDD. README.md is the README file in the Spark installation directory.
Change an RDD to a DataFrame (in spark-shell the implicits that toDF needs are already imported):
val text1 = text.toDF
Convert a list to an RDD:
val textfile1 = List(("Ale", 1259, 278, 6782))
textfile1 is a list.
val textfile2 = sc.parallelize(textfile1)
Convert a list to a DataFrame:
val aa = List(("a", "k", "u", "t"))
val au = aa.toDF
au.show()
If each row is a list instead of a tuple, map it to a tuple first:
val aa = List(List("a", "k", "u", "t"))
val ak = aa.map { case List(a, b, c, d) => (a, b, c, d) }
val au = ak.toDF
au.show()
Change an RDD to an array:
val text1 = sc.textFile("README.md")
text1 is an RDD.
val text2 = text1.collect()
Change an RDD to a list:
val text3 = text1.collect().toList
Spark scala, change the data type of the RDD
Code:
val aa = sc.parallelize(List(("Ale", 12, 2783, 89277)))
val ak = aa.map { case (w, k, r, u) => (w, k, r, u.toLong) }
ak.toDF.show()
This changes the type of the last integer to Long.
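The same pattern match works on a plain Scala list, which makes it easy to try without a Spark cluster. This is a small sketch with made-up names (ChangeFieldType, rows, converted). The type annotations are there only to show that the last field really becomes a Long.

```scala
object ChangeFieldType {
  // One row with four fields; the last three are Int.
  val rows: List[(String, Int, Int, Int)] = List(("Ale", 12, 2783, 89277))

  // Rebuild each tuple, converting only the last field to Long.
  val converted: List[(String, Int, Int, Long)] =
    rows.map { case (w, k, r, u) => (w, k, r, u.toLong) }
}
```

On an RDD the map call looks the same, because RDD.map takes the same kind of function.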
Saturday, April 8, 2017
Spark RDD operations
Two ways to sort:
Array.sortBy(_._2) sorts an array of tuples by the second element.
sortByKey() sorts an RDD of key-value pairs by the key.
https://www.tutorialspoint.com/apache_spark/apache_spark_tutorial.pdf
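Here is a small sketch of sortBy on a plain Scala array, so it runs without Spark. The data and the names (SortExamples, counts, byCount, byKey) are made up. sortByKey() itself exists only on pair RDDs, so sorting by the first element with sortBy(_._1) stands in for it here.

```scala
object SortExamples {
  // (word, count) pairs, in no particular order.
  val counts: Array[(String, Int)] = Array(("spark", 3), ("scala", 7), ("sbt", 1))

  // Sort by the second element of each tuple, ascending.
  val byCount: Array[(String, Int)] = counts.sortBy(_._2)

  // Sort by the second element, descending.
  val byCountDesc: Array[(String, Int)] = counts.sortBy(-_._2)

  // Sort by the first element, which is what sortByKey() does on a pair RDD.
  val byKey: Array[(String, Int)] = counts.sortBy(_._1)
}
```

sortBy returns a new array and leaves the original unchanged.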
Wednesday, April 5, 2017
How to delete a directory by force
Use rm with the -r (recursive) and -f (force) options. This deletes the directory and everything in it without asking, so check the path first:
sudo rm -rf directory-to-delete
install spark properly
http://stackoverflow.com/questions/38618460/how-to-properly-build-spark-2-0-from-source-to-include-pyspark
- Install sbt
- Build:
http://spark.apache.org/downloads.html
cd spark
git checkout v2.0.0
sbt package
Tuesday, April 4, 2017
Scala Spark get date type:
http://nuvostaq.github.io/BigDataSpark/2016/01/24/ExtractingData.html
For example:
val dateType = DateTypeConverter.GetType(startDt)
Spark SQL, get day of week:
http://stackoverflow.com/questions/25006607/how-to-get-day-of-week-in-sparksql
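The link above covers the Spark SQL way. Outside Spark SQL, plain Scala can get the day of week from java.time, which is a different technique from the one in the link. This is a minimal sketch, assuming the date comes as an ISO string like "2017-04-04". The names DayOfWeekExample and dayName are made up.

```scala
import java.time.LocalDate
import java.time.format.TextStyle
import java.util.Locale

object DayOfWeekExample {
  // Parses an ISO date (yyyy-MM-dd) and returns the English day name.
  def dayName(isoDate: String): String =
    LocalDate.parse(isoDate)
      .getDayOfWeek
      .getDisplayName(TextStyle.FULL, Locale.ENGLISH)
}
```

For example, DayOfWeekExample.dayName("2017-04-04") gives "Tuesday", which matches the date of this post.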
How to run spark scala on command line
https://asimjalis.tumblr.com/post/112174265249/how-to-run-scala-script-on-spark
For example:
spark-shell -i myscript.scala
Alternatively, package the code as a jar and run it with spark-submit.