Classification: how to do it
http://inseaddataanalytics.github.io/INSEADAnalytics/CourseSessions/Sessions67/ClassificationAnalysisReading.html
I write about solutions to problems I have run into in programming and data analytics. They may help you in your work. Thank you.
Monday, June 10, 2019
L1 and L2 regularization
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
https://www.linkedin.com/pulse/l1-l2-regularization-why-neededwhat-doeshow-helps-ravi-shankar
When p >> n, OLS overfits. To reduce overfitting we use regularization: L1 (lasso) and L2 (ridge). L1 forces some coefficients to be exactly zero; L2 shrinks all coefficients toward zero but keeps every parameter in the model.
https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
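The difference between the two penalties is easy to see in code. This is a minimal sketch (not from the linked tutorials) using scikit-learn's Lasso and Ridge on synthetic data where only the first three of twenty features matter:

```python
# Compare L1 (Lasso) and L2 (Ridge) coefficients on synthetic data
# where only a few features matter. Assumes scikit-learn is installed.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
# the true model uses only the first 3 features
y = X[:, 0] * 3.0 + X[:, 1] * -2.0 + X[:, 2] * 1.5 + rng.randn(100) * 0.1

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 drives most of the irrelevant coefficients to exactly zero (sparse)
print("lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))
# L2 only shrinks coefficients; none of the 20 becomes exactly zero
print("ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

With these (illustrative) alpha values, the lasso zeroes out most of the seventeen irrelevant features while ridge keeps all twenty nonzero, which is exactly the behavior described above.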
1. Key Difference
- Ridge: It includes all (or none) of the features in the model. Thus, the major advantage of ridge regression is coefficient shrinkage and reducing model complexity.
- Lasso: Along with shrinking coefficients, lasso performs feature selection as well. (Remember the ‘selection‘ in the lasso full-form?) As we observed earlier, some of the coefficients become exactly zero, which is equivalent to the particular feature being excluded from the model.
Traditionally, techniques like stepwise regression were used to perform feature selection and build parsimonious models. But with advancements in machine learning, ridge and lasso regression provide very good alternatives: they give much better results, require fewer tuning parameters, and can be automated to a large extent.
2. Typical Use Cases
- Ridge: It is mainly used to prevent overfitting. Since it keeps all the features, it is not very useful when the number of features is extremely high, say in the millions, as it poses computational challenges.
- Lasso: Since it provides sparse solutions, it (or some variant of it) is generally the model of choice when the number of features is in the millions or more. In such cases, a sparse solution is a great computational advantage, as the features with zero coefficients can simply be ignored.
It is not hard to see why stepwise selection techniques become practically very cumbersome in high-dimensional cases. Thus, lasso provides a significant advantage.
3. Presence of Highly Correlated Features
- Ridge: It generally works well even in the presence of highly correlated features, as it includes all of them in the model and distributes the coefficients among them depending on the correlation.
- Lasso: It arbitrarily selects one feature among the highly correlated ones and reduces the coefficients of the rest to zero. The chosen variable can also change with small changes in the model parameters. This generally does not work as well as ridge regression.
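The correlated-feature behavior can also be demonstrated directly. This sketch (mine, not from the sources; the alpha values are illustrative) duplicates a feature with tiny noise and fits both models:

```python
# How the two penalties treat two highly correlated copies of the same
# feature. Assumes scikit-learn is installed.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(1)
x = rng.randn(200)
X = np.column_stack([x, x + rng.randn(200) * 0.01])  # near-duplicate columns
y = 2.0 * x + rng.randn(200) * 0.1

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge splits the weight roughly evenly across the correlated columns
print("ridge coefficients:", ridge.coef_)
# Lasso tends to put nearly all the weight on one column and zero the other
print("lasso coefficients:", lasso.coef_)
```

Ridge returns two nearly equal coefficients summing to about the true weight, while lasso concentrates the weight on one of the two columns.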
Wednesday, June 5, 2019
How to Create Stunning Flowcharts in Microsoft Word
https://www.youtube.com/watch?v=iiS7aAFI2Cs
https://www.youtube.com/watch?v=hjhJ3-jSBM8
Thursday, May 30, 2019
Classifier for imbalanced data
Classification on imbalanced data can be improved using re-sampling methods such as SMOTE.
Here is an example and some sample scripts
https://towardsdatascience.com/methods-for-dealing-with-imbalanced-data-5b761be45a18
This thesis presents a method for this problem:
https://lib.dr.iastate.edu/cgi/viewcontent.cgi?referer=https://www.bing.com/&httpsredir=1&article=4544&context=etd
A balanced approach to the multi-class imbalance problem
R packages: climm, climer
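The core SMOTE idea, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched in a few lines. This is my own minimal illustration using scikit-learn's NearestNeighbors; for real work, the imbalanced-learn package linked above provides a full implementation (imblearn.over_sampling.SMOTE), and the function name below is mine:

```python
# A minimal SMOTE-style oversampler: new minority samples are created by
# interpolating between each sample and a random one of its k nearest
# minority-class neighbors. Assumes scikit-learn is installed.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class samples."""
    rng = np.random.RandomState(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)  # idx[:, 0] is each point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.randint(len(X_minority))
        j = idx[i, rng.randint(1, k + 1)]   # a random true neighbor
        gap = rng.rand()                    # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

# toy minority class: 20 points in 2-D, oversampled to 50 extra points
X_min = np.random.RandomState(2).randn(20, 2)
X_new = smote_like(X_min, n_new=50)
print(X_new.shape)
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the minority class's region of the feature space rather than being blind copies.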
Monday, May 27, 2019
Data science blogs
- R for Data Science: https://r4ds.had.co.nz/introduction.html
- A Complete Tutorial to Learn Data Science in R from Scratch: https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/
- Yhat Blog: http://blog.yhat.com/
- A Complete Tutorial to Learn Data Science with Python from Scratch: https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
- R-bloggers: https://www.r-bloggers.com/
- Python Bloggers: http://www.pybloggers.com/
- Data Science Central: https://www.datasciencecentral.com/
- Machine Learning Mastery (a data scientist's blog): https://machinelearningmastery.com/blog/
- Apache Spark Machine Learning Tutorial: https://mapr.com/blog/apache-spark-machine-learning-tutorial/
- Data Science 101: https://101.datascience.community/page/2/
- Win Vector Blog: http://www.win-vector.com/blog/
- Big data and data science review: bigdatadatascience.docx
- InfoQ (AI, ML & Data Engineering): https://www.infoq.com/ai-ml-data-eng
- DataTau: https://www.datatau.com/
- Lambda the Ultimate: http://lambda-the-ultimate.org/
- Simply Statistics: https://simplystatistics.org/
- Statistical Modeling, Causal Inference, and Social Science: https://statmodeling.stat.columbia.edu/
- Flowing Data: https://flowingdata.com/
- Data36: https://data36.com/
- Kaggle Blog: http://blog.kaggle.com/
- Linear Digressions: https://lineardigressions.com/
- Towards Data Science: https://towardsdatascience.com/
- Seeing Theory: https://seeing-theory.brown.edu/
- Mode Blog: https://mode.com/blog/