
Thursday, December 19, 2019

recommend a book

A book named On Intelligence, by Jeff Hawkins.

The author built this website for the book:


http://www.onintelligence.org/

Monday, October 28, 2019

Skype did not work on Ubuntu 14.04; here is how I fixed it

I use Ubuntu 14.04 and tried to use Skype there, but it did not work.
Skype was already installed, but I could not sign in with my Microsoft account.
So I uninstalled that Skype.
Then I downloaded the Skype rpm package from here:
https://www.skype.com/en/get-skype/

Since Ubuntu uses deb packages, I converted the rpm to a deb with alien and installed it:

sudo apt-get install alien dpkg-dev debhelper build-essential

sudo alien packagename.rpm

sudo dpkg -i packagename.deb

It finally works.

Monday, October 21, 2019

Sunday, September 8, 2019

reverse a string in python

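a = "imdelda"
a1 = list(a)  # strings are immutable, so swap characters in a list
n = len(a1)
# swap characters from both ends, moving toward the middle
for i in range(n // 2):  # // keeps the bound an integer in Python 3
    a1[i], a1[n - i - 1] = a1[n - i - 1], a1[i]
a2 = "".join(a1)
print(a2)  # adledmi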
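For comparison, Python's slice syntax with a step of -1 reverses a string in one step, without the manual swap loop. A minimal sketch; the function name reverse_string is just for illustration:

```python
def reverse_string(s: str) -> str:
    # s[::-1] walks the string from the last character to the first
    return s[::-1]

print(reverse_string("imdelda"))  # adledmi
```

"".join(reversed(s)) gives the same result; slicing is the more common idiom.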

Monday, September 2, 2019

how to improve the coding efficiency

How do you improve the efficiency of your scripts? Getting good at this may take years.

There are hours-long programming videos on YouTube, for example:

https://www.youtube.com/watch?v=PJlAnR3asGQ&t=18011s

They can help people learn to write scripts from the beginning.

There are also some books on coding efficiency:

https://www.amazon.com/Effective-Python-Specific-Software-Development/dp/0134034287

https://www.amazon.com/Effective-Specific-Improve-Programs-Designs/dp/0321334876



But to keep improving your coding efficiency, you need to study code on GitHub and similar sites constantly. However, GitHub shows only a portion of the world's scripts. Many companies store their scripts internally on Bitbucket, and those scripts are not public.

I have seen some very efficient scripts written by other people. I will post some here.


deep learning, what it is

Deep learning is a machine learning technique used in AI. It is based on neural networks with many layers.

Here is a tutorial on it from r-bloggers.com:

https://www.r-bloggers.com/step-by-step-tutorial-deep-learning-with-tensorflow-in-r/

Here is a video series on it:

https://livevideo.manning.com/module/52_1_1/deep-learning-with-r-in-motion/getting-started/welcome-to-the-video-series?utm_source=rstudio&utm_medium=partner_website&utm_campaign=livevideo_deeplearningwithrinmotion&utm_content=unit1_rstudio

And an r-bloggers.com post on it:

https://www.r-bloggers.com/getting-started-with-deep-learning-in-r/


r-bloggers.com

r-bloggers.com is a comprehensive website for statistics and R programming. If you want to learn about a statistics or R topic, search Google for the topic plus "r-bloggers.com"; most of the time you will find what you want to learn.

Friday, August 16, 2019

read data into R and check for missing values

Code to read the data into R:

data1 <- read.csv("data1.csv", stringsAsFactors = FALSE)
View(data1)

A line of code to check for missing values in the data:
length(which(!complete.cases(data1)))

It returns the number of rows that have at least one missing value, so it gives 0 if there are no missing values in the data.
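The same check can be sketched in Python with only the standard library. This is an analogue of length(which(!complete.cases(data1))), not a port of R's behavior: it assumes a CSV with a header row and treats an empty field or the text NA (R's default missing marker) as missing. The function name and the sample data are made up; io.StringIO stands in for open("data1.csv").

```python
import csv
import io

MISSING = {"", "NA"}  # assumption: empty strings and "NA" mark missing values

def count_incomplete_rows(f) -> int:
    """Count data rows with at least one missing field."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    return sum(1 for row in reader if any(field.strip() in MISSING for field in row))

sample = io.StringIO("id,age,city\n1,34,LA\n2,,SF\n3,NA,NY\n4,29,SD\n")
print(count_incomplete_rows(sample))  # 2
```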


Friday, August 9, 2019

interesting vlog for Python


https://www.youtube.com/user/joejamesusa/videos

He is awesome.

pandas and its difference from numpy and scipy


https://www.youtube.com/watch?v=e60ItwlZTKM



scipy

https://www.youtube.com/watch?v=MtdLd2lrvag


numpy

https://www.youtube.com/watch?v=8Mpc9ukltVA


predictive modeling and its accuracy

https://en.wikipedia.org/wiki/Predictive_modelling



Possible fundamental limitations of predictive models based on data fitting

1) History cannot always accurately predict the future. Using relations derived from historical data to predict the future implicitly assumes there are certain lasting conditions or constants in a complex system. This almost always leads to some imprecision when the system involves people.
2) The issue of unknown unknowns. In all data collection, the collector first defines the set of variables for which data is collected. However, no matter how extensive the collector considers his/her selection of the variables, there is always the possibility of new variables that have not been considered or even defined, yet are critical to the outcome.
3) Adversarial defeat of an algorithm. After an algorithm becomes an accepted standard of measurement, it can be taken advantage of by people who understand the algorithm and have the incentive to fool or manipulate the outcome. This is what happened to the CDO rating described above. The CDO dealers actively fulfilled the rating agencies' input to reach an AAA or super-AAA on the CDO they were issuing, by cleverly manipulating variables that were "unknown" to the rating agencies' "sophisticated" models.

Tuesday, July 9, 2019

underfitting and overfitting, n and p

Overfitting refers to a model that models the training data too well: it fits the noise along with the signal, so it performs poorly on new data.

Underfitting refers to a model that can neither model the training data nor generalize to new data.

Suppose we have p parameters and n samples.
Overfitting results from trying to estimate too many parameters from too small a sample, for example when p > n.

Removing features (reducing p) can reduce the degree of overfitting.
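A small NumPy sketch of why p > n leads to overfitting: with 5 samples and 10 parameters, ordinary least squares can fit the training data exactly even when the target is pure random noise, yet that fit tells us nothing about new data. The sizes and the random data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 5, 10                      # fewer samples than parameters (p > n)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)            # pure noise: there is no real signal to learn

# with p > n there are infinitely many exact fits; lstsq returns the minimum-norm one
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
train_mse = np.mean((X @ beta - y) ** 2)

# fresh data from the same (signal-free) process
X_new = rng.normal(size=(100, p))
y_new = rng.normal(size=100)
test_mse = np.mean((X_new @ beta - y_new) ** 2)

print(f"training MSE: {train_mse:.2e}")  # essentially 0
print(f"test MSE:     {test_mse:.2f}")
```

The near-zero training error comes from having enough parameters to pass through every point, not from learning anything.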

ECS/EKS container services, Docker, Airflow, Snowflake database

ECS/EKS container services

A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

Docker is a software platform for building applications based on containers — small and lightweight execution environments that make shared use of the operating system kernel but otherwise run in isolation from one another. While containers as a concept have been around for some time, Docker, an open source project launched in 2013, helped popularize the technology, and has helped drive the trend towards containerization and microservices in software development that has come to be known as cloud-native development.

Docker is a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.

Container services Amazon offers:

https://aws.amazon.com/containers/services/

I have used EMR before:

https://aws.amazon.com/emr/


A tutorial for Docker:

https://www.youtube.com/watch?v=K6WER0oI-qs



airflow: Airflow is a platform to programmatically author, schedule and monitor workflows.

A short summary:

https://blog.insightdatascience.com/airflow-101-start-automating-your-batch-workflows-with-ease-8e7d35387f94


https://airflow.apache.org/project.html

How to install:

https://airflow.apache.org/installation.html

Video tutorial:

https://www.youtube.com/watch?v=AHMm1wfGuHE






Snowflake database: a cloud-based data warehouse

https://docs.snowflake.net/manuals/user-guide/getting-started-tutorial.html


Monday, July 8, 2019

7 tips to learn programming faster


https://www.codingdojo.com/blog/7-tips-learn-programming-faster

Tip #3 will help you land a job.

1. Learn by doing.
2. Grasp the fundamentals for long-term benefit.
3. Code by hand, with pen and paper.
4. Ask for help.
5. Seek out more online resources.
6. Don't just read the sample code; tinker with it.
7. Take breaks when debugging.

How to run a Python script in Atom

How to run a Python script in Atom:

Mac: Shift + Command + I
Mac: Command + I

Linux/Windows: Shift + Ctrl + B

A PhD thesis and what its author has done since graduation

Here is a PhD thesis:

https://lib.dr.iastate.edu/etd/13537/

The title of the thesis is:

A balanced approach to the multi-class imbalance problem


After graduation, instead of working for a company, the author opened his own consulting firm:

Omni Analytics Group

https://omnianalytics.io/


a good SQL tutorial and some good machine learning channels


a good SQL tutorial

https://www.youtube.com/watch?v=nWeW3sCmD2k


some good machine learning channels

https://www.youtube.com/user/joshstarmer/videos


https://www.youtube.com/user/mathtutordvd/videos

https://www.youtube.com/channel/UCq8JbYayUHvKvjimPV0TCqQ/videos

https://www.youtube.com/user/edurekaIN/videos

https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ/videos


Monday, June 24, 2019

Understanding self and __init__ method in python Class.



https://micropyramid.com/blog/understand-self-and-__init__-method-in-python-class/


self :
self represents the instance of the class. self is not a keyword in Python; it is the conventional name for the first parameter of an instance method. Through self we can access the attributes and methods of the instance.

__init__ :
"__init__" is a reserved (special) method in Python classes. It is often called the constructor in object-oriented terms. It is called automatically when an object is created from the class, and it lets the class initialize the object's attributes.
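A minimal example showing both ideas (the Dog class is hypothetical): __init__ sets the attributes when the object is created, and self is how methods reach them. Calling d.describe() is equivalent to Dog.describe(d), which is why self appears as the first parameter.

```python
class Dog:
    def __init__(self, name, age):
        # runs automatically when Dog(...) is called;
        # self is the new instance being initialized
        self.name = name
        self.age = age

    def describe(self):
        # self gives the method access to this instance's attributes
        return f"{self.name} is {self.age} years old"

d = Dog("Rex", 3)        # Python calls Dog.__init__(d, "Rex", 3) behind the scenes
print(d.describe())      # Rex is 3 years old
print(Dog.describe(d))   # same call, with self passed explicitly
```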

Monday, June 10, 2019

l1 and l2 regularization

l1 and l2 regularization
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c


https://www.linkedin.com/pulse/l1-l2-regularization-why-neededwhat-doeshow-helps-ravi-shankar

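When p >> n, ordinary least squares (OLS) overfits; when p > n its solution is not even unique. To reduce overfitting we use regularization, L1 (lasso) or L2 (ridge). L1 can force some coefficients to be exactly zero, which removes those features from the model. L2 shrinks the coefficients toward zero but generally does not make any of them exactly zero, so it keeps all the features in the model.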
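To see the difference numerically, here is a NumPy sketch: lasso fitted by coordinate descent with soft-thresholding, and ridge by its closed-form solution, on simulated data where only 3 of 8 features matter. The data, alpha=0.3, and the helper names are assumptions for illustration; in practice you would use a library such as scikit-learn's Lasso and Ridge.

```python
import numpy as np

def soft_threshold(z, t):
    # shrink z toward 0 by t; values within [-t, t] become exactly 0
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_sweeps=200):
    """L1 (lasso): minimize ||y - Xb||^2 / (2n) + alpha * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # residual ignoring feature j
            rho = X[:, j] @ r_j / n
            b[j] = soft_threshold(rho, alpha) / (X[:, j] @ X[:, j] / n)
    return b

def ridge(X, y, alpha):
    """L2 (ridge): minimize ||y - Xb||^2 / (2n) + (alpha / 2) * ||b||^2, closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.normal(size=(n, p))
true_b = np.array([3.0, -2.0, 1.5, 0, 0, 0, 0, 0])  # only 3 features matter
y = X @ true_b + rng.normal(scale=0.5, size=n)

b_lasso = lasso_cd(X, y, alpha=0.3)
b_ridge = ridge(X, y, alpha=0.3)
print("lasso:", np.round(b_lasso, 2))  # irrelevant features typically exactly 0
print("ridge:", np.round(b_ridge, 2))  # all shrunk, none exactly 0
```

The exact zeros come from the soft-threshold step: whenever a feature's correlation with the residual is smaller than alpha, lasso sets its coefficient to 0. Ridge's penalty is smooth, so it only scales coefficients down.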


https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/


1. Key Difference

  • Ridge: It includes all (or none) of the features in the model. Thus, the major advantage of ridge regression is coefficient shrinkage and reducing model complexity.
  • Lasso: Along with shrinking coefficients, lasso performs feature selection as well. (Remember the ‘selection‘ in the lasso full-form?) As we observed earlier, some of the coefficients become exactly zero, which is equivalent to the particular feature being excluded from the model.
Traditionally, techniques like stepwise regression were used to perform feature selection and make parsimonious models. But with advancements in Machine Learning, ridge and lasso regression provide very good alternatives as they give much better output, require fewer tuning parameters and can be automated to a large extent.

2. Typical Use Cases

  • Ridge: It is majorly used to prevent overfitting. Since it includes all the features, it is not very useful in case of exorbitantly high #features, say in millions, as it will pose computational challenges.
  • Lasso: Since it provides sparse solutions, it is generally the model of choice (or some variant of this concept) for modelling cases where the #features are in millions or more. In such a case, getting a sparse solution is of great computational advantage as the features with zero coefficients can simply be ignored.
It's not hard to see why the stepwise selection techniques become practically very cumbersome to implement in high dimensionality cases. Thus, lasso provides a significant advantage.

3. Presence of Highly Correlated Features

  • Ridge: It generally works well even in presence of highly correlated features as it will include all of them in the model but the coefficients will be distributed among them depending on the correlation.
  • Lasso: It arbitrarily selects any one feature among the highly correlated ones and reduces the coefficients of the rest to zero. Also, the chosen variable changes randomly with change in model parameters. This generally doesn't work that well as compared to ridge regression.



Thursday, May 30, 2019

Classifier for imbalanced data

Classification on imbalanced data can be handled with re-sampling methods such as SMOTE (Synthetic Minority Over-sampling Technique).
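The core idea of SMOTE is to place a synthetic minority example at a random point on the line segment between a minority example and one of its k nearest minority neighbours. A simplified NumPy sketch; the function name, k=3, and the toy data are assumptions, and libraries such as imbalanced-learn provide a full implementation.

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise Euclidean distances between minority points
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbours
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))           # random minority point
        b = neighbours[a, rng.integers(k)]     # one of its nearest neighbours
        gap = rng.random()                     # position along the segment a -> b
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.8]])
new_points = smote(X_minority, n_new=10)
print(new_points.shape)  # (10, 2)
```

The synthetic points are then added to the training set so the classifier sees a more balanced class distribution.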

Here is an example with some sample scripts:

https://towardsdatascience.com/methods-for-dealing-with-imbalanced-data-5b761be45a18

This thesis presents a method for this problem:

https://lib.dr.iastate.edu/cgi/viewcontent.cgi?referer=https://www.bing.com/&httpsredir=1&article=4544&context=etd

A balanced approach to the multi-class imbalance problem

R package , climm, climer

Monday, May 27, 2019

Data science blogs

  1. R for Data Science
    https://r4ds.had.co.nz/introduction.html
  2. A Complete Tutorial to learn Data Science in R from Scratch
    https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
  3. Yhat Blog
    http://blog.yhat.com/
  4. A Complete Tutorial to Learn Data Science with Python from Scratch
    https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
  5. R-bloggers
    https://www.r-bloggers.com/
  6. Python Bloggers
    http://www.pybloggers.com/
  7. Data Science Central
    https://www.datasciencecentral.com/
  8. A data scientist's blog
    https://machinelearningmastery.com/blog/
  9. Apache Spark Machine Learning Tutorial
    https://mapr.com/blog/apache-spark-machine-learning-tutorial/
  10. Data Science 101
    https://101.datascience.community/page/2/
  11. Win Vector Blog
    http://www.win-vector.com/blog/
  12. Big data and data science review
    bigdatadatascience.docx
  13. infoq
    https://www.infoq.com/ai-ml-data-eng
  14. datatau
    https://www.datatau.com/
  15. Lambda the Ultimate
    http://lambda-the-ultimate.org/
  16. Simply Statistics
    https://simplystatistics.org/
  17. Statistical Modeling, Causal Inference and Social Science
    https://statmodeling.stat.columbia.edu/
  18. Flowing data
    https://flowingdata.com/
  19. Data 36
    https://data36.com/
  20. Kaggle Blog
    http://blog.kaggle.com/
  21. Linear Digressions
    https://lineardigressions.com/
  22. Towards Data Science
    https://towardsdatascience.com/
  23. Seeing Theory
    https://seeing-theory.brown.edu/
  24. Mode Blog
    https://mode.com/blog/

Consumer analytics blogs


  1. Top 50 blogs on Consumer Analytics
    (some of the blogs no longer exist)
    https://www.ngdata.com/best-customer-analytics-blogs/
  2. How to Use Customer Behavior Data to Drive Revenue (Like Amazon, Netflix & Google)
    https://www.pointillist.com/blog/customer-behavior-data/
  3. Using R for Customer Analytics
    https://ds4ci.files.wordpress.com/2013/09/ciwr_2introandpracticals.pdf
  4. Customer Analytics: Using Deep Learning With Keras To Predict Customer Churn
    https://www.business-science.io/business/2017/11/28/customer_churn_analysis_keras.html
  5. Marketing Analytics and Data Science
    https://www.r-bloggers.com/marketing-analytics-and-data-science/
  6. Using R to predict if a customer will buy
    https://www.masterdataanalysis.com/r/using-r-predict-customer-will-buy/
  7. Customer Segmentation using python
    http://blog.yhat.com/posts/customer-segmentation-using-python.html
  8. Using R for customer segmentation
    https://ds4ci.files.wordpress.com/2013/09/user08_jimp_custseg_revnov08.pdf
  9. Using r to analyze your customer data warehouse
    https://www.bedrockdata.com/blog/using-r-to-analyze-your-customer-data-warehouse

Thursday, April 4, 2019

One trick for big data analytics

I once worked on big data projects where I analyzed 5,000,000,000 rows of data each day using Hadoop/Hive. Running the scripts on the full data took a long time. When a script had an error, the job would fail and I had to start over, which cost even more time. So some projects took a relatively long time to finish.

So when you face this problem, develop and debug your scripts on small samples of the data first. The programs run faster, errors show up sooner, and you get the job done sooner. Once the scripts work, run them on the full data.
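A minimal Python sketch of the idea, assuming the rows can be streamed (for example, line by line from a file): keep each row independently with a fixed probability, debug on that small sample, then rerun on the full data. The function name and the 0.1% fraction are made up for illustration; in Hive itself the TABLESAMPLE clause can do the sampling inside the query instead.

```python
import random

def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction` (Bernoulli sampling).
    Works on any iterable, so a huge file can be streamed without loading it."""
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < fraction:
            yield row

# toy stand-in for a big table: one million row ids
rows = range(1_000_000)
sample = list(sample_rows(rows, fraction=0.001, seed=1))
print(len(sample))  # roughly 1,000
```

A fixed seed makes the sample reproducible, so a bug found on the sample can be rerun on exactly the same rows.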

Thursday, March 21, 2019

Feature engineering for machine learning



https://perso.limsi.fr/annlor/enseignement/ensiie/Feature_Engineering_for_Machine_Learning.pdf

Feature engineering is an important topic in predictive modeling.

looking for a man

I am a middle-aged woman. I live in Southern California. I was born in 1980. I do not have any kids. No complicated dating. I am looking for ...