A book named On Intelligence by Jeff Hawkins.
This is a website the author built for the book:
http://www.onintelligence.org/
I write about solutions to some problems I ran into in programming and data analytics. They may help you in your work. Thank you.
Thursday, December 19, 2019
Saturday, November 23, 2019
A good article on building a data analysis portfolio
https://blog.udacity.com/2016/02/how-to-build-a-data-analysis-portfolio-that-will-get-you-hired.html
An example of a data analysis portfolio:
http://matatat.org/blog/markdown-latex-react
It uses Jupyter and Pelican.
Another good example of a data analysis portfolio:
http://davidventuri.com/portfolio#scroll
One of the portfolio projects, hosted on GitHub:
https://davidventuri.github.io/eda-project/
Tuesday, November 19, 2019
Friday, November 15, 2019
An interesting website where one can watch Times Square and Hollywood Blvd live
It also has live cameras in some other countries and areas.
https://www.earthcam.com/
Times Square:
https://www.earthcam.com/usa/newyork/timessquare/?cam=tsnorth_hd
Hollywood Blvd:
https://www.earthcam.com/usa/california/losangeles/hollywoodblvd/?cam=hollywoodblvd
Monday, October 28, 2019
Skype did not work on Ubuntu 14.04; here is how I fixed it
I use Ubuntu 14.04, and I tried to use Skype there, but it was not working.
There was already a Skype installed, but I could not sign in with my Microsoft account.
I uninstalled that Skype.
Then I downloaded the Skype .rpm package from here:
https://www.skype.com/en/get-skype/
Then I converted the .rpm into a .deb package with alien and installed it (replace packagename with the actual file name):
sudo apt-get install alien dpkg-dev debhelper build-essential
sudo alien packagename.rpm
sudo dpkg -i packagename.deb
It finally works.
Monday, October 21, 2019
A list of English-language forums and social sites
Twitter, Reddit, 4chan, Quora, GitHub, Tumblr, Snapchat, Telegram, LINE, Facebook
Tuesday, September 10, 2019
Sunday, September 8, 2019
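Reverse a string in Python
a = "imdelda"
a1 = list(a)
# swap characters pairwise from both ends toward the middle
for i in range(len(a1) // 2):  # // keeps the range an int in Python 3
    tmp = a1[i]
    a1[i] = a1[len(a1) - i - 1]
    a1[len(a1) - i - 1] = tmp
a2 = "".join(a1)
print(a2)  # adledmi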
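For comparison, Python already has shorter idiomatic ways to reverse a string. A minimal sketch (the helper names reverse_slice and reverse_builtin are made up for illustration):

```python
def reverse_slice(s: str) -> str:
    # A slice with step -1 walks the string from the last character to the first
    return s[::-1]


def reverse_builtin(s: str) -> str:
    # reversed() yields the characters back to front; join() glues them together
    return "".join(reversed(s))


print(reverse_slice("imdelda"))    # adledmi
print(reverse_builtin("imdelda"))  # adledmi
```

The slice version is the one you will usually see in other people's code; the manual swap loop is still a useful exercise for languages without slicing.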
Monday, September 2, 2019
How to improve coding efficiency
How do you improve the efficiency of your scripts? It can take years to get good at this.
There are some hours-long programming videos on YouTube, for example:
https://www.youtube.com/watch?v=PJlAnR3asGQ&t=18011s
They can help people learn scripting from the beginning.
There are also some books on coding efficiency:
https://www.amazon.com/Effective-Python-Specific-Software-Development/dp/0134034287
https://www.amazon.com/Effective-Specific-Improve-Programs-Designs/dp/0321334876
But to really improve your coding efficiency, you need to keep studying code on GitHub and similar sites. GitHub only shows a portion of the world's code, though: many companies use Bitbucket to store their code internally, and that code is not public.
I have seen some very efficient scripts written by other people. I will post some here.
Deep learning: what it is
Deep learning is a branch of machine learning used in AI. It is based on neural networks with many layers.
Here is a tutorial on r-bloggers.com:
https://www.r-bloggers.com/step-by-step-tutorial-deep-learning-with-tensorflow-in-r/
Here is a video series:
https://livevideo.manning.com/module/52_1_1/deep-learning-with-r-in-motion/getting-started/welcome-to-the-video-series?utm_source=rstudio&utm_medium=partner_website&utm_campaign=livevideo_deeplearningwithrinmotion&utm_content=unit1_rstudio
And an r-bloggers.com post on getting started:
https://www.r-bloggers.com/getting-started-with-deep-learning-in-r/
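To make "neural networks with layers" a bit more concrete, here is a minimal sketch of a forward pass through a tiny two-layer network in plain Python. The weights are made-up numbers, not a trained model, and the function names are hypothetical; real deep learning work would use a library such as TensorFlow or Keras, which also handles training.

```python
import math


def relu(x: float) -> float:
    # Rectified linear unit: passes positive values through, zeroes out negatives
    return max(0.0, x)


def sigmoid(x: float) -> float:
    # Squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    # One fully connected layer: each output neuron takes a weighted sum of
    # all inputs plus a bias, then applies the activation function
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical weights for a 2-input -> 3-hidden -> 1-output network
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]
b2 = [0.05]

x = [1.0, 2.0]
hidden = dense(x, W1, b1, relu)          # layer 1: three hidden neurons
output = dense(hidden, W2, b2, sigmoid)  # layer 2: one output in (0, 1)
print(hidden, output)
```

"Deep" networks stack many such layers; training adjusts W1, b1, W2, b2 so the outputs match labeled examples.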
r-bloggers.com
r-bloggers.com is a comprehensive website for statistics and R programming. If you want to learn something about statistics or R, search Google for the subject plus "r-bloggers.com"; most of the time you will find what you are looking for.
Thursday, August 29, 2019
Friday, August 16, 2019
Read data into R and check for missing values
Code to read the data into R:
data1 <- read.csv("data1.csv", stringsAsFactors = FALSE)
View(data1)
A line of code to check for missing values in the data:
length(which(!complete.cases(data1)))
This counts the rows that have at least one missing value, so it gives 0 if there are no missing values in the data.
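If you work in Python instead of R, here is a rough equivalent of the complete.cases() check using only the standard library csv module. It is a sketch that assumes a missing value shows up as an empty field or the string "NA" (read.csv's default NA marker); the function name count_incomplete_rows is hypothetical.

```python
import csv

# Assumption: these strings mark a missing value in the file
MISSING = {"", "NA"}


def count_incomplete_rows(path: str) -> int:
    """Count data rows with at least one missing field, roughly like
    length(which(!complete.cases(data1))) in R."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # first line holds the column names
        if header is None:
            return 0  # empty file: nothing to check
        count = 0
        for row in reader:
            if not row:
                continue  # skip blank lines, as read.csv does by default
            # a row is incomplete if it is too short or any field is a missing marker
            if len(row) < len(header) or any(field.strip() in MISSING for field in row):
                count += 1
        return count
```

Usage would be count_incomplete_rows("data1.csv"), where 0 means no missing values. With pandas installed, data1.isna().any(axis=1).sum() is the more common one-liner.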
Sunday, August 11, 2019
Twitter tweet sentiment analysis
Sentiment analysis of Twitter tweets using a naive Bayes classifier:
https://towardsdatascience.com/creating-the-twitter-sentiment-analysis-program-in-python-with-naive-bayes-classification-672e5589a7ed
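As a hedged, self-contained sketch of the idea (not the linked article's code), here is a multinomial naive Bayes classifier on bag-of-words features with add-one (Laplace) smoothing. The tiny training set and the class name NaiveBayesSentiment are made up for illustration, and real tweets would need better tokenization than lower().split().

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # for the class priors P(c)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def _log_score(self, words, c):
        # log P(c) + sum of log P(word | c); adding 1 to every count avoids log(0)
        total = sum(self.word_counts[c].values())
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        for w in words:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        words = text.lower().split()
        return max(self.classes, key=lambda c: self._log_score(words, c))


texts = ["I love this phone", "what a great day", "this is awful", "I hate the traffic"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(texts, labels)
print(model.predict("I love this great day"))  # pos
print(model.predict("I hate this traffic"))    # neg
```

Working in log space keeps the products of many small probabilities from underflowing to zero.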
Saturday, August 10, 2019
Friday, August 9, 2019
Predictive modeling and its accuracy
https://en.wikipedia.org/wiki/Predictive_modelling
Possible fundamental limitations of predictive models based on data fitting
1) History cannot always accurately predict the future. Using relations derived from historical data to predict the future implicitly assumes there are certain lasting conditions or constants in a complex system. This almost always leads to some imprecision when the system involves people.
2) The issue of unknown unknowns. In all data collection, the collector first defines the set of variables for which data is collected. However, no matter how extensive the collector considers his/her selection of the variables, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome.
3) Adversarial defeat of an algorithm. After an algorithm becomes an accepted standard of measurement, it can be taken advantage of by people who understand the algorithm and have the incentive to fool or manipulate the outcome. This is what happened to the credit ratings of CDOs (collateralized debt obligations) before the 2008 financial crisis. The CDO dealers actively fulfilled the rating agencies' input to reach an AAA or super-AAA on the CDO they were issuing, by cleverly manipulating variables that were "unknown" to the rating agencies' "sophisticated" models.
Wednesday, August 7, 2019
Building a classifier using the naive Bayes algorithm
https://www.machinelearningplus.com/predictive-modeling/how-naive-bayes-algorithm-works-with-example-and-full-code/
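For numeric features, a common variant is Gaussian naive Bayes: each feature is modeled as normally distributed within each class, and the "naive" assumption (features independent given the class) lets the log-likelihoods simply add up. This is a from-scratch sketch with made-up toy data and hypothetical function names, not the linked article's code.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and each feature's mean and variance."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # a tiny epsilon keeps the variance positive for constant features
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9 for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class maximizing log P(class) + sum of log N(x_j | mean_j, var_j)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.8], [4.8, 6.1]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]))  # a
print(predict_gaussian_nb(model, [5.1, 6.0]))  # b
```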
Tuesday, August 6, 2019
Overleaf is a good website for LaTeX
Overleaf is a good website for editing LaTeX online:
overleaf.com
Tuesday, July 9, 2019
Underfitting and overfitting, n and p
Overfitting refers to a model that models the training data too well, including its noise, so it does not generalize to new data.
Underfitting refers to a model that can neither model the training data nor generalize to new data.
Suppose we have p parameters and n samples.
Overfitting results from trying to estimate too many parameters from too small a sample, for example when p > n.
Removing a feature reduces p, which typically decreases the degree of overfitting.
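A small illustration of the p versus n point: with n data points at distinct x values, a polynomial with p = n coefficients can pass through every point exactly, so its training error is zero even though the data are noisy. The data below are made up (the line y = 2x plus hand-picked noise), and Lagrange interpolation stands in for least squares, since with p = n the least-squares polynomial is exactly the interpolating one. The function names are hypothetical.

```python
def lagrange_fit(xs, ys):
    """Polynomial of degree len(xs) - 1 through every point: p = n parameters."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def line_fit(xs, ys):
    """Ordinary least-squares straight line: p = 2 parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


# True relationship y = 2x, plus made-up noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.2, 0.4, -0.5]
ys = [2 * x + e for x, e in zip(xs, noise)]

overfit = lagrange_fit(xs, ys)  # p = n = 6
simple = line_fit(xs, ys)       # p = 2

train = list(zip(xs, ys))
test = [(6.0, 12.0), (7.0, 14.0)]  # new points on the true line
print(mse(overfit, train), mse(simple, train))  # the p = n fit has zero training error
print(mse(overfit, test), mse(simple, test))    # but it does far worse on new data
```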
ECS/EKS container services, Docker, Airflow, Snowflake database
ECS/EKS container services (Amazon Elastic Container Service and Amazon Elastic Kubernetes Service)
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
Docker is a software platform for building applications based on containers — small and lightweight execution environments that make shared use of the operating system kernel but otherwise run in isolation from one another. While containers as a concept have been around for some time, Docker, an open source project launched in 2013, helped popularize the technology, and has helped drive the trend towards containerization and microservices in software development that has come to be known as cloud-native development.
Docker is a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.
Container services Amazon offers:
https://aws.amazon.com/containers/services/
I have used EMR (Amazon Elastic MapReduce) before:
https://aws.amazon.com/emr/
A tutorial for Docker:
https://www.youtube.com/watch?v=K6WER0oI-qs
Airflow: a platform to programmatically author, schedule, and monitor workflows.
A short summary:
https://blog.insightdatascience.com/airflow-101-start-automating-your-batch-workflows-with-ease-8e7d35387f94
https://airflow.apache.org/project.html
How to install:
https://airflow.apache.org/installation.html
Video tutorial:
https://www.youtube.com/watch?v=AHMm1wfGuHE
Snowflake database: a cloud-based data warehouse
https://docs.snowflake.net/manuals/user-guide/getting-started-tutorial.html
Monday, July 8, 2019
7 tips to learn programming faster
https://www.codingdojo.com/blog/7-tips-learn-programming-faster
#3 will land you a job
1. Learn by doing
2. Grasp the fundamentals for long-term benefit
3. Code by hand, with pen and paper
4. Ask for help
5. Seek out more online resources
6. Don't just read the sample code, tinker with it
7. Take breaks when debugging
How to run a Python script in Atom
Keyboard shortcuts to run a Python script in Atom (with the script package installed):
Mac: Shift + Command + I
Mac: Command + I
Linux/Windows: Shift + Ctrl + B
A PhD thesis and what its author has done since graduation
Here is a PhD thesis:
https://lib.dr.iastate.edu/etd/13537/
The title of the thesis is:
A balanced approach to the multi-class imbalance problem
After graduation, the author did not join a company; instead, he opened his own consulting firm:
Omni Analytics Group
https://omnianalytics.io/
One good SQL tutorial and some good machine learning channels
One good SQL tutorial:
https://www.youtube.com/watch?v=nWeW3sCmD2k
Some good machine learning channels on YouTube:
https://www.youtube.com/user/joshstarmer/videos
https://www.youtube.com/user/mathtutordvd/videos
https://www.youtube.com/channel/UCq8JbYayUHvKvjimPV0TCqQ/videos
https://www.youtube.com/user/edurekaIN/videos
https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ/videos
Monday, June 24, 2019
Understanding self and the __init__ method in a Python class
https://micropyramid.com/blog/understand-self-and-__init__-method-in-python-class/
self:
self represents the instance of the class. By using self we can access the attributes and methods of the class in Python. (Strictly speaking, self is not a keyword; it is the conventional name for the first parameter of an instance method.)
__init__:
__init__ is a special (reserved) method in Python classes. In object-oriented terms it is often called the constructor. It is called when an object is created from the class, and it lets the class initialize the object's attributes.
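A minimal sketch of both ideas together, using a hypothetical Account class made up for illustration:

```python
class Account:
    """Hypothetical example class illustrating self and __init__."""

    def __init__(self, owner, balance=0):
        # Python creates the new object, then calls __init__ with it as self,
        # so these lines set attributes on that particular object
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # self gives the method access to this object's own attributes
        self.balance += amount
        return self.balance


a = Account("Ann")        # __init__ runs with self bound to the new object
b = Account("Bob", 100)
a.deposit(50)             # equivalent to Account.deposit(a, 50)
print(a.balance, b.balance)  # 50 100
```

Each object keeps its own attributes: depositing into a does not change b.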
Friday, June 14, 2019
Monday, June 10, 2019
L1 and L2 regularization
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
https://www.linkedin.com/pulse/l1-l2-regularization-why-neededwhat-doeshow-helps-ravi-shankar
When p >> n and we use OLS, we will have overfitting. To reduce overfitting, we use regularization: L1 (lasso) or L2 (ridge). L1 can force some parameters to be exactly zero. L2 shrinks the parameters toward zero but not exactly to zero, so it keeps all the parameters in the model.
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
1. Key Difference
- Ridge: It includes all (or none) of the features in the model. Thus, the major advantage of ridge regression is coefficient shrinkage and reducing model complexity.
- Lasso: Along with shrinking coefficients, lasso performs feature selection as well. (Remember the ‘selection‘ in the lasso full-form?) As we observed earlier, some of the coefficients become exactly zero, which is equivalent to the particular feature being excluded from the model.
Traditionally, techniques like stepwise regression were used to perform feature selection and make parsimonious models. But with advancements in Machine Learning, ridge and lasso regression provide very good alternatives as they give much better output, require fewer tuning parameters and can be automated to a large extent.
2. Typical Use Cases
- Ridge: It is majorly used to prevent overfitting. Since it includes all the features, it is not very useful in case of exorbitantly high #features, say in millions, as it will pose computational challenges.
- Lasso: Since it provides sparse solutions, it is generally the model of choice (or some variant of this concept) for modelling cases where the #features are in millions or more. In such a case, getting a sparse solution is of great computational advantage as the features with zero coefficients can simply be ignored.
It's not hard to see why the stepwise selection techniques become practically very cumbersome to implement in high dimensionality cases. Thus, lasso provides a significant advantage.
3. Presence of Highly Correlated Features
- Ridge: It generally works well even in presence of highly correlated features as it will include all of them in the model but the coefficients will be distributed among them depending on the correlation.
- Lasso: It arbitrarily selects any one feature among the highly correlated ones and reduces the coefficients of the rest to zero. Also, the chosen variable changes randomly with change in model parameters. This generally doesn't work as well as ridge regression.
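A worked illustration of shrinkage versus selection, under a simplifying assumption: if the design matrix X has orthonormal columns (X^T X = I), and the objectives are (1/2)||y - Xb||^2 + lam * ||b||_1 for lasso and (1/2)||y - Xb||^2 + (lam/2) * ||b||_2^2 for ridge, then both solutions have closed forms in terms of the OLS coefficients. Ridge divides every coefficient by (1 + lam); lasso soft-thresholds each one, setting it exactly to zero when its magnitude is at most lam. The coefficient values and function names below are made up; real data rarely has orthonormal features, so in practice you would use a solver such as scikit-learn's Ridge and Lasso.

```python
import math


def ridge_orthonormal(beta_ols, lam):
    # Ridge: every coefficient shrinks by the same factor; none becomes exactly 0
    return [b / (1 + lam) for b in beta_ols]


def lasso_orthonormal(beta_ols, lam):
    # Lasso: soft-thresholding moves each coefficient toward 0 by lam,
    # and sets it exactly to 0 when |b| <= lam (this is the feature selection)
    return [math.copysign(abs(b) - lam, b) if abs(b) > lam else 0.0 for b in beta_ols]


beta_ols = [3.0, -1.5, 0.4, -0.2]  # hypothetical OLS coefficients
lam = 0.5
print(ridge_orthonormal(beta_ols, lam))  # roughly [2.0, -1.0, 0.27, -0.13]
print(lasso_orthonormal(beta_ols, lam))  # [2.5, -1.0, 0.0, 0.0]
```

The two small coefficients survive (shrunken) under ridge but are dropped entirely by lasso, which matches the "Key Difference" point above.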
Wednesday, June 5, 2019
How to Create Stunning Flowcharts in Microsoft Word
https://www.youtube.com/watch?v=iiS7aAFI2Cs
https://www.youtube.com/watch?v=hjhJ3-jSBM8
Thursday, May 30, 2019
Classifier for imbalanced data
Classification with imbalanced data can be addressed with re-sampling methods, such as SMOTE (Synthetic Minority Over-sampling Technique).
Here is an article with examples and sample scripts:
https://towardsdatascience.com/methods-for-dealing-with-imbalanced-data-5b761be45a18
This thesis presents a method for the multi-class version of this problem:
https://lib.dr.iastate.edu/cgi/viewcontent.cgi?referer=https://www.bing.com/&httpsredir=1&article=4544&context=etd
A balanced approach to the multi-class imbalance problem
R packages: climm, climer
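To show what SMOTE does under the hood, here is a minimal from-scratch sketch on numeric features: pick a minority sample, pick one of its k nearest minority neighbors, and create a synthetic point at a random position on the line segment between them. The function name, the k=3 default, and the toy points are assumptions for illustration; for real work a tested implementation such as SMOTE in the imbalanced-learn package is the safer choice.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    random minority point and one of its k nearest minority neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbors of base among the other minority samples
        neighbors = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=10)
print(len(new_points))  # 10
```

Because each new point lies between two real minority points, the minority class grows without simply duplicating rows, which is what plain random oversampling would do.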
Tuesday, May 28, 2019
Monday, May 27, 2019
Data science blogs
- R for Data Science
https://r4ds.had.co.nz/introduction.html
- A Complete Tutorial to learn Data Science in R from Scratch
https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
- Yhat Blog
http://blog.yhat.com/
- A Complete Tutorial to Learn Data Science with Python from Scratch
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
- R-bloggers
https://www.r-bloggers.com/
- Python Bloggers
http://www.pybloggers.com/
- Data Science Central
https://www.datasciencecentral.com/
- A data scientist's blog
https://machinelearningmastery.com/blog/
- Apache Spark Machine Learning Tutorial
https://mapr.com/blog/apache-spark-machine-learning-tutorial/
- Data Science 101
https://101.datascience.community/page/2/
- Win Vector Blog
http://www.win-vector.com/blog/
- Big data and data science review
bigdatadatascience.docx
- infoq
https://www.infoq.com/ai-ml-data-eng
- datatau
https://www.datatau.com/
- Lambda the Ultimate
http://lambda-the-ultimate.org/
- Simply Statistics
https://simplystatistics.org/
- Statistical Modeling, Causal Inference, and Social Science
https://statmodeling.stat.columbia.edu/
- FlowingData
https://flowingdata.com/
- Data 36
https://data36.com/
- Kaggle Blog
http://blog.kaggle.com/
- Linear Digressions
https://lineardigressions.com/
- Towards Data Science
https://towardsdatascience.com/
- Seeing Theory
https://seeing-theory.brown.edu/
- Mode Blog
https://mode.com/blog/
Consumer analytics blogs
- Top 50 blogs on Consumer Analytics (some of the blogs no longer exist)
https://www.ngdata.com/best-customer-analytics-blogs/
- How to Use Customer Behavior Data to Drive Revenue (Like Amazon, Netflix & Google)
https://www.pointillist.com/blog/customer-behavior-data/
- Using R for Customer Analytics
https://ds4ci.files.wordpress.com/2013/09/ciwr_2introandpracticals.pdf
- Customer Analytics: Using Deep Learning With Keras To Predict Customer Churn
https://www.business-science.io/business/2017/11/28/customer_churn_analysis_keras.html
- Marketing Analytics and Data Science
https://www.r-bloggers.com/marketing-analytics-and-data-science/
- Using R to predict if a customer will buy
https://www.masterdataanalysis.com/r/using-r-predict-customer-will-buy/
- Customer Segmentation using Python
http://blog.yhat.com/posts/customer-segmentation-using-python.html
- Using R for customer segmentation
https://ds4ci.files.wordpress.com/2013/09/user08_jimp_custseg_revnov08.pdf
- Using R to analyze your customer data warehouse
https://www.bedrockdata.com/blog/using-r-to-analyze-your-customer-data-warehouse
Thursday, May 2, 2019
Two blogs that seem pretty good for data science
Win-Vector Blog:
http://www.win-vector.com/blog/
Data Science Dojo blog
https://blog.datasciencedojo.com/
Tuesday, April 30, 2019
Consumer analytics articles
https://www.ngdata.com/best-customer-analytics-blogs/
https://ds4ci.files.wordpress.com/2013/09/ciwr_2introandpracticals.pdf
https://www.business-science.io/business/2017/11/28/customer_churn_analysis_keras.html
https://www.r-bloggers.com/marketing-analytics-and-data-science/
https://www.masterdataanalysis.com/r/using-r-predict-customer-will-buy/
http://blog.yhat.com/posts/customer-segmentation-using-python.html
https://ds4ci.files.wordpress.com/2013/09/user08_jimp_custseg_revnov08.pdf
https://www.bedrockdata.com/blog/using-r-to-analyze-your-customer-data-warehouse
https://legacy.gitbook.com/book/josepcurtodiaz/customer-analytics-with-r/details
Wednesday, April 24, 2019
Two articles about classifiers
Metrics to evaluate machine learning algorithms:
https://machinelearningmastery.com/metrics-evaluate-machine-learning-algorithms-python/
How to handle imbalanced data in classification
https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
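The two articles connect: on imbalanced data, plain accuracy can look good while the model is useless. A minimal sketch computing accuracy, precision, recall and F1 from scratch (the function name and the toy labels are made up for illustration):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Imbalanced toy example: 1 positive out of 10, and a model that always predicts 0
y_true = [0] * 9 + [1]
always_zero = [0] * 10
print(classification_metrics(y_true, always_zero))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The "always 0" model scores 90% accuracy but finds none of the positives, which is why recall, precision and F1 matter for imbalanced classification.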
Thursday, April 4, 2019
One trick for big data analytics
I once worked on big data projects where I analyzed 5,000,000,000 rows of data each day using Hadoop/Hive. Running scripts over the data took a long time. When a script had an error, the job would break and I needed to start over, which cost time. So projects sometimes took a relatively long time to finish.
So when you face this problem, start with small samples of the data. The programs run faster, you find errors sooner, and you get the jobs done sooner. It saves time.
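One way to draw such a sample from a file or stream too large to load at once is reservoir sampling (Algorithm R), which reads the data a single time and keeps a uniform random sample of fixed size. This is a generic Python sketch, not what I used on the Hive project; Hive has its own sampling options (for example TABLESAMPLE). The function name is hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown
    length, reading it only once (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            # keep the new row with probability k / (i + 1)
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


print(reservoir_sample(range(1000), 5, seed=1))
```

For a large text file, passing the open file object (for example, with open("big_file.csv") as f: reservoir_sample(f, 10000), where the file name is a placeholder) samples lines without reading the whole file into memory.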
Thursday, March 21, 2019
Feature engineering for machine learning
https://perso.limsi.fr/annlor/enseignement/ensiie/Feature_Engineering_for_Machine_Learning.pdf
Feature engineering is an important topic in predictive modeling.