I write about solutions to problems I have run into in programming and data analytics. I hope they help with your work. Thank you.
Wednesday, October 24, 2018
Thursday, September 27, 2018
Sunday, September 23, 2018
Monday, September 17, 2018
Monday, September 10, 2018
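Write data out to a file: Python script
To write the data out to a file, here is a sample Python script:
a = ["first line", "second line"]  # example data: a list of strings
filename = "filename11" + ".txt"
# "with" closes the file automatically, even if an error occurs
with open("C:/path/to/file/" + filename, "w") as f:
    for item in a:
        f.write(item + "\n")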
Friday, September 7, 2018
Thursday, September 6, 2018
Monday, September 3, 2018
Saturday, September 1, 2018
Wednesday, August 29, 2018
Wednesday, August 22, 2018
Thursday, August 16, 2018
Tuesday, August 14, 2018
An easy-to-understand article about Python decorators
Here is an easy-to-understand article about Python decorators:
https://www.programiz.com/python-programming/decorator
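As a quick illustration of the idea in that article: a decorator is a function that takes a function and returns a replacement for it, and writing @log_calls above a definition is shorthand for add = log_calls(add). This is a minimal sketch; log_calls and add are made-up names for illustration.

```python
import functools


def log_calls(func):
    """Decorator that records the positional arguments of every call before running func."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = []  # list of argument tuples, one per call
    return wrapper


@log_calls  # same as: add = log_calls(add)
def add(x, y):
    return x + y


print(add(2, 3))
print(add.calls)
```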
Friday, August 10, 2018
Sample code to log in to SFTP automatically and upload a file
Sample code:
#!/bin/sh
HOST='HOST'
USER='USER'
PASSWD='PASS'
# sshpass supplies the password to sftp; the lines up to EOF are sftp commands
sshpass -p "$PASSWD" sftp "$USER@$HOST" << EOF
put file1.txt
EOF
echo "end ftp"
exit
Monday, July 30, 2018
Sunday, July 29, 2018
Differences between the t test and the z test
Test   | Distribution                                             | Sample size  | Population variance
t test | data normally distributed                                | can be small | unknown
z test | normality not required (because of the central limit theorem) | large  | known
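To make the "variance known or unknown" column concrete, here is a minimal one-sample sketch: the z statistic divides by the known population standard deviation sigma, while the t statistic plugs in the sample standard deviation (n - 1 denominator). The sample values, mu0 = 5.0 and sigma = 0.2 are made-up numbers for illustration, and only the z p-value is computed because the t distribution is not in the Python standard library.

```python
import math
import statistics


def z_statistic(sample, mu0, sigma):
    """z statistic: the population standard deviation sigma is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """t statistic: sigma is unknown, so it is estimated by the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # uses the n - 1 denominator
    return (statistics.mean(sample) - mu0) / (s / math.sqrt(n))


def z_two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))


sample = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
print(t_statistic(sample, 5.0))
z = z_statistic(sample, 5.0, 0.2)
print(z, z_two_sided_p(z))
```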
Saturday, July 28, 2018
Odds and odds ratios in statistics
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/
The odds of success are defined as the ratio of the probability of success to the probability of failure: odds = p / (1 - p).
Confidence intervals for the odds ratio:
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/PASS/Confidence_Intervals_for_the_Odds_Ratio_in_Logistic_Regression_with_One_Binary_X.pdf
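A minimal sketch of the arithmetic, assuming a 2x2 table of counts for one binary X (the counts below are made up). The odds ratio is (a*d)/(b*c), and the interval shown is the usual Wald interval built on the log scale with standard error sqrt(1/a + 1/b + 1/c + 1/d); the linked PDF may use other methods, so treat this as one common choice rather than its procedure.

```python
import math
from statistics import NormalDist


def odds(p):
    """Odds of success: P(success) / P(failure)."""
    return p / (1 - p)


def odds_ratio_ci(a, b, c, d, level=0.95):
    """
    Odds ratio and Wald confidence interval for a 2x2 table:
                 success  failure
      X = 1         a        b
      X = 0         c        d
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.96 for 95%
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, (lo, hi)


print(odds(0.8))
print(odds_ratio_ci(20, 10, 10, 20))  # hypothetical counts
```

Because the interval is symmetric on the log scale, it is not symmetric around the odds ratio itself.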
Assumptions of linear models
https://www.theanalysisfactor.com/assumptions-of-linear-models/
- The residuals are independent
- The residuals are normally distributed
- The residuals have a mean of 0 at all values of X
- The residuals have constant variance
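A minimal pure-Python sketch of looking at residuals, with made-up data. With an intercept, ordinary least squares residuals always average exactly zero overall, so the "mean of 0 at all values of X" assumption really has to be checked across ranges of X; comparing residual spread for small versus large X is a crude check of constant variance. This assumes x is sorted.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    b0 = my - b1 * mx
    return b0, b1


def residuals(x, y):
    """Observed minus fitted values."""
    b0, b1 = fit_line(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5, 6, 7, 8]                      # sorted, made-up predictor
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]  # made-up response
res = residuals(x, y)
print(statistics.mean(res))  # essentially 0 by construction
half = len(res) // 2
# crude constant-variance check: residual spread for the small-x half vs the large-x half
print(statistics.pvariance(res[:half]), statistics.pvariance(res[half:]))
```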
Differences between apply, lapply, sapply and tapply in R
https://www.guru99.com/r-apply-sapply-tapply.html
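apply: works on a matrix, applying a function over its rows or columns
apply(mat, 1, var)
MARGIN = 1: apply over rows
MARGIN = 2: apply over columns
lapply: applies a function to each element of a vector (or list) and returns a list; there is no MARGIN argument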
movies <- c("SPYDERMAN","BATMAN","VERTIGO","CHINATOWN")
movies_lower <-lapply(movies, tolower)
[[1]]
[1] "spyderman"
[[2]]
[1] "batman"
...
sapply does the same job as lapply, but simplifies the result to a vector when possible.
tapply computes a measure (min, max, median, etc.) or applies a function to each group defined by a factor variable.
data(iris)
tapply(iris$Sepal.Width, iris$Species, median)
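For comparison only, here is a rough Python analogue of the three patterns above (this is not R code). It shows what each function iterates over: rows or columns for apply, elements for lapply, and groups for tapply. The tapply helper is a hypothetical function written for this sketch, and the grouped numbers are made up rather than taken from iris.

```python
import statistics
from collections import defaultdict

mat = [[1, 2, 3],
       [4, 6, 8]]

# apply(mat, 1, var): one result per row (R's var uses the n - 1 denominator, like statistics.variance)
row_var = [statistics.variance(row) for row in mat]
# apply(mat, 2, var): one result per column
col_var = [statistics.variance(col) for col in zip(*mat)]

# lapply(movies, tolower): one result per element, kept as a list
movies = ["SPYDERMAN", "BATMAN", "VERTIGO", "CHINATOWN"]
movies_lower = list(map(str.lower, movies))


def tapply(values, groups, func):
    """One result per group label, like tapply(values, groups, func)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: func(vals) for group, vals in buckets.items()}


print(row_var, col_var, movies_lower)
print(tapply([3.5, 3.0, 2.8, 3.2], ["setosa", "setosa", "versicolor", "versicolor"], statistics.median))
```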
Friday, July 27, 2018
Monday, July 23, 2018
Saturday, July 21, 2018
Tuesday, July 17, 2018
Thursday, July 12, 2018
Wednesday, July 11, 2018
A good YouTube channel for math and machine learning
Here is a good YouTube channel for math and machine learning:
https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw
It also has a video series on neural networks for recognizing handwritten digits:
https://www.youtube.com/watch?v=aircAruvnKk
Tuesday, July 10, 2018
Randomly generate user agents and IP addresses in Python
1. Randomly generate a user agent
Installation:
pip install fake_useragent
Usage:
from fake_useragent import UserAgent
ua = UserAgent()
ua.random
ua.random returns a random user-agent string.
2. Randomly generate an IP address
import random
'.'.join('%s' % random.randint(0, 255) for i in range(4))
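A slightly extended sketch of the IP one-liner: the same dotted-quad generator wrapped in a function, plus a variant that redraws until the address is globally routable, using the standard ipaddress module to skip private, loopback and reserved ranges. Passing a seeded random.Random makes runs repeatable; the function names are made up for this example.

```python
import ipaddress
import random


def random_ip(rng=random):
    """Random dotted-quad IPv4 string, each octet drawn uniformly from 0..255."""
    return ".".join(str(rng.randint(0, 255)) for _ in range(4))


def random_public_ip(rng=random):
    """Like random_ip, but redraw until the address is globally routable."""
    while True:
        ip = random_ip(rng)
        if ipaddress.ip_address(ip).is_global:
            return ip


rng = random.Random(42)  # seeded so runs are repeatable
print(random_ip(rng), random_public_ip(rng))
```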
Sunday, July 8, 2018
Thursday, July 5, 2018
How to send email on Linux
I use Ubuntu. To find out which system you are running, use the command "uname -a".
I wanted to send email from Ubuntu, and I tried the command line first.
First, I installed postfix:
sudo apt-get install postfix
Then I tried this command:
echo "test message" | mailx -s "test subject" XXXX@xxx.com
And I got the following message:
The program 'mailx' is currently not installed. You can install it by typing:
sudo apt-get install mailutils
So I installed mailutils, which provides mailx.
After that, the test message arrived.
Finally, I put the command in a Linux shell script, and it worked.
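If you would rather send the same message from Python, here is a minimal sketch using the standard email and smtplib modules. It assumes postfix (or another mail server) is listening on localhost port 25, and the sender address me@localhost is a placeholder; the actual send is left commented out so the message can be built and inspected anywhere.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Plain-text email with the same content as: echo body | mailx -s subject recipient."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_local_mta(msg, host="localhost", port=25):
    """Hand the message to the local mail server (assumed: postfix on localhost:25)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


msg = build_message("me@localhost", "XXXX@xxx.com", "test subject", "test message")
# send_via_local_mta(msg)  # uncomment on a machine where postfix is running
```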
Monday, July 2, 2018
Tuesday, June 26, 2018
stdin, stdout, stderr: how to redirect output to a file on Linux
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html
There are three file descriptors: stdin, stdout and stderr (std = standard).
Basically, you can:
- redirect stdout to a file
- redirect stderr to a file
- redirect stdout to stderr
- redirect stderr to stdout
- redirect stderr and stdout to a file
- redirect stderr and stdout to stdout
- redirect stderr and stdout to stderr
I redirected stderr to a file:
scrapy crawl XXX 2> nohup.txt
or
scrapy crawl XXX 2>> nohup.txt
">" overwrites the file, while ">>" appends to it.
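The same redirections can be done when launching a command from Python. A minimal sketch with subprocess.run: passing an open file as stderr mirrors "2>" or "2>>" depending on the file mode, and stderr=subprocess.STDOUT mirrors "2>&1". The function names are made up for this example.

```python
import subprocess
import sys


def run_with_stderr_to_file(cmd, path, append=True):
    """Like `cmd 2>> path` (append) or `cmd 2> path` (overwrite); stdout is captured and returned."""
    mode = "a" if append else "w"
    with open(path, mode) as err_file:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=err_file, text=True)
    return result.stdout


def run_with_both_to_file(cmd, path):
    """Like `cmd > path 2>&1`: stderr is merged into stdout and both go to the file."""
    with open(path, "w") as out_file:
        subprocess.run(cmd, stdout=out_file, stderr=subprocess.STDOUT)


# Example command that writes one line to each stream (uses the current Python interpreter).
demo_cmd = [sys.executable, "-c", "import sys; print('out'); sys.stderr.write('err\\n')"]
```

For the Scrapy case above, the command list would be something like ["scrapy", "crawl", "XXX"] with "nohup.txt" as the path.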
Tuesday, June 12, 2018
Monday, June 11, 2018
Scrapy spider: one URL, multiple requests (sample code)
import scrapy
from scrapy.http import FormRequest
from myproject.items import XXXItem  # hypothetical path: adjust to your project; XXXItem needs a 'bh' field
class PabhSpider(scrapy.Spider):
    name = 'pabh'
    allowed_domains = ['xxx']
    # department codes to request from the same URL
    departs = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10',
               '11', '12', '13', '14', '21', '31', '40', '51', '61']
    def start_requests(self):
        url = 'http://xxx'
        for depart in self.departs:
            formdata = {
                "depart": depart,
                "years": '2014'
            }
            # with method='GET', FormRequest puts formdata in the query string,
            # so each department gets its own URL and is not dropped as a duplicate
            yield FormRequest(url=url, formdata=formdata, method='GET', callback=self.parse)
    def parse(self, response):
        item = XXXItem()
        item['bh'] = response.xpath('/html/body/form/p/font/select[3]/option/@value').extract()
        yield item
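To see why one URL can serve many requests, here is a standard-library sketch of roughly what a GET form request does with formdata: the fields are URL-encoded into the query string, so each department code yields a distinct URL. The base URL http://example.com/search is a placeholder, and build_get_urls is a made-up helper, not part of Scrapy.

```python
from urllib.parse import urlencode


def build_get_urls(base_url, departs, year):
    """One GET URL per department code; the form fields become the query string."""
    return [base_url + "?" + urlencode({"depart": d, "years": year}) for d in departs]


urls = build_get_urls("http://example.com/search", ["01", "02"], "2014")
print(urls)
```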
Wednesday, June 6, 2018
How to get rid of garbage characters when opening a .txt file that contains East Asian characters in Excel
Sometimes when we open a .txt file that contains East Asian characters in Excel, we see garbage characters. Here is how to get rid of them.
To open a .txt file in Excel, first open an empty workbook, then click File => Open => go to the file you want to open => click Open.
Some people say that choosing the Windows (ANSI) file origin gets rid of the garbage characters. But when I tried Windows (ANSI) on my file, the garbage characters remained.
So I tried other options, and choosing Unicode (UTF-8) got rid of them, presumably because my file was saved as UTF-8.
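Another approach, as I understand Excel's behavior: Windows programs often recognize a UTF-8 file more reliably when it starts with a byte-order mark (BOM). Here is a minimal sketch that re-saves a UTF-8 text file with a BOM using Python's "utf-8-sig" codec. Reading with "utf-8-sig" as well avoids doubling the BOM if the file already has one, and newline="" keeps the original line endings. The function name is made up for this example.

```python
def add_utf8_bom(src_path, dst_path):
    """Re-save a UTF-8 text file with a byte-order mark so the encoding is easier to detect."""
    with open(src_path, encoding="utf-8-sig", newline="") as src:
        text = src.read()  # "utf-8-sig" also strips an existing BOM, if any
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)    # writes the BOM (EF BB BF) followed by the text
```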
Friday, May 18, 2018
Wednesday, May 16, 2018
Some Python code for plotting subplots
import matplotlib.pyplot as plt
import numpy as np
# Simple data to display in various forms
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)
plt.close('all')
# Just a figure and one subplot
f, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Simple plot')
# Two subplots, the axes array is 1-d
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(x, y)
axarr[0].set_title('Sharing X axis')
axarr[1].scatter(x, y)
# Two subplots, unpack the axes array immediately
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(x, y)
ax1.set_title('Sharing Y axis')
ax2.scatter(x, y)
# Three subplots sharing both x/y axes
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, sharey=True)
ax1.plot(x, y)
ax1.set_title('Sharing both axes')
ax2.scatter(x, y)
ax3.scatter(x, 2 * y ** 2 - 1, color='r')
# Fine-tune figure; make subplots close to each other and hide x ticks for
# all but bottom plot.
f.subplots_adjust(hspace=0)
plt.setp([a.get_xticklabels() for a in f.axes[:-1]], visible=False)
# row and column sharing
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex='col', sharey='row')
ax1.plot(x, y)
ax1.set_title('Sharing x per column, y per row')
ax2.scatter(x, y)
ax3.scatter(x, 2 * y ** 2 - 1, color='r')
ax4.plot(x, 2 * y ** 2 - 1, color='r')
# Four axes, returned as a 2-d array
f, axarr = plt.subplots(2, 2)
axarr[0, 0].plot(x, y)      # index the 2-d array as axarr[row, col]
axarr[1, 1].scatter(x, y)
plt.show()
Thursday, May 10, 2018
Scrapy cron job on Linux not working: how I made it work
I tried to automate a Scrapy job using cron on Linux, but it did not work. I searched and found the cause: cron runs jobs with a minimal PATH, so the scrapy command was not found.
First, use "which scrapy" to find where scrapy is installed. On my machine it is:
/home/ubuntu/anaconda2/bin/scrapy
Then, in the shell script, write:
cd /path/to/spider
nohup /home/ubuntu/anaconda2/bin/scrapy crawl quotes >> log.txt &
This resolved the problem. Alternatively, add scrapy's directory to PATH in the script and call scrapy by name:
cd /path/to/spider
PATH=$PATH:/home/ubuntu/anaconda2/bin
export PATH
nohup scrapy crawl quotes >> log.txt &
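To see the PATH problem directly, here is a minimal sketch that looks up a command the way a shell would, first with only the minimal PATH cron usually provides (typically /usr/bin:/bin; check your system) and then with an extra directory added. The constant and function names are made up for this example.

```python
import os
import shutil

CRON_DEFAULT_PATH = "/usr/bin:/bin"  # assumption: cron's usual minimal PATH


def find_command(name, extra_dirs=()):
    """Resolve `name` like `which` would, using cron's minimal PATH plus any extra directories."""
    search_path = os.pathsep.join([CRON_DEFAULT_PATH, *extra_dirs])
    return shutil.which(name, path=search_path)


print(find_command("scrapy"))  # None unless scrapy lives in /usr/bin or /bin
print(find_command("scrapy", ["/home/ubuntu/anaconda2/bin"]))
```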
Tuesday, May 8, 2018
Comparing Unix dates and times
To get the current time, use:
now=`date +"%T"`
We get a time like "20:55:01".
now=`date +"%H%M%S"`
We get a time like "205501".
To compare times, we cannot use the "%H:%M:%S" format, because numeric comparisons such as -lt only accept integers; we have to use "%H%M%S". Otherwise we get the error "Illegal number: 20:59:22".
To get the current date and time, use:
date1=`date +"%m/%d/%Y %H:%M:%S"`
We get a date like "05/08/2018 20:55:01".
To get a timestamp, use:
date2=`date +"%s"`
We get a Unix timestamp (the number of seconds since 1970-01-01 00:00:00 UTC).
We can compare dates by their Unix timestamps. It seems we cannot compare two dates formatted like "05/08/2018 20:55:01" numerically; otherwise we get the error "Illegal number: 05/08/2018 20:59:22".
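The same idea in Python, as a minimal sketch: turn "%H:%M:%S" times into HHMMSS integers, and parse "%m/%d/%Y %H:%M:%S" strings into timestamps before comparing. Comparing the date strings directly is wrong across years because the month comes first. The function names and example dates are made up for this illustration; the parsed strings are interpreted in local time.

```python
from datetime import datetime


def hhmmss_to_int(t):
    """'20:55:01' -> 205501, the same number `date +"%H%M%S"` prints."""
    return int(t.replace(":", ""))


def to_timestamp(s, fmt="%m/%d/%Y %H:%M:%S"):
    """Parse a `date +"%m/%d/%Y %H:%M:%S"` style string into seconds since the epoch."""
    return datetime.strptime(s, fmt).timestamp()


a = "12/31/2017 23:59:59"
b = "05/08/2018 20:55:01"
print(a < b)                              # False: plain string comparison looks at the month first
print(to_timestamp(a) < to_timestamp(b))  # True: timestamps compare chronologically
print(hhmmss_to_int("09:00:00") < hhmmss_to_int("20:59:22"))
```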
Thursday, May 3, 2018
Linux cron, the Linux job scheduler
https://www.youtube.com/watch?v=4Icg3MYZZqI
https://awc.com.my/uploadnew/5ffbd639c5e6eccea359cb1453a02bed_Setting%20Up%20Cron%20Job%20Using%20crontab.pdf
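To edit your cron table, run "crontab -e". Each line holds five time fields (minute, hour, day of month, month, day of week) followed by the command, and the job keeps running on that schedule until you remove the line.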
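To make the five-field schedule concrete, here is a simplified matcher that checks whether a schedule would fire at a given minute. It is a sketch modeled on Vixie-style cron rules as I understand them (0 and 7 both mean Sunday; when both day fields are restricted, a match on either one is enough). It does not handle month or day names, "@daily"-style shortcuts, or the "n/step" form, and the function names are made up.

```python
from datetime import datetime

# (lowest, highest) value for minute, hour, day of month, month, day of week
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand(field, lo, hi):
    """Values allowed by one field: '*', 'n', 'a-b', '*/step', 'a-b/step' and comma lists."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            start, end = (int(v) for v in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(schedule, when):
    """True if a five-field schedule such as '30 2 * * 1' fires at datetime `when`."""
    fields = schedule.split()
    minutes, hours, days, months, weekdays = (
        expand(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES))
    weekdays = {d % 7 for d in weekdays}     # cron accepts both 0 and 7 for Sunday
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    day_ok = when.day in days
    weekday_ok = cron_weekday in weekdays
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        date_ok = day_ok or weekday_ok       # both day fields restricted: either may match
    else:
        date_ok = day_ok and weekday_ok
    return (when.minute in minutes and when.hour in hours
            and when.month in months and date_ok)


print(cron_matches("30 2 * * 1", datetime(2018, 5, 7, 2, 30)))  # True: 02:30 on a Monday
```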
Wednesday, May 2, 2018
Implementing a decision tree from scratch in Python
Here is a tutorial on implementing a decision tree from scratch in Python:
https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
The author has a good blog:
https://machinelearningmastery.com/blog/
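The core step in a CART-style decision tree is scoring candidate splits. Here is a minimal sketch of one common scoring rule, the weighted Gini index, plus a brute-force search for the best single split. It is my own illustration, not the tutorial's code: rows are lists whose last value is the class label, and a row goes left when its feature value is below the threshold.

```python
def gini_index(groups, classes):
    """Weighted Gini impurity of a split; each group is a list of rows ending in a class label."""
    n_total = sum(len(group) for group in groups)
    score = 0.0
    for group in groups:
        size = len(group)
        if size == 0:
            continue  # an empty side contributes nothing
        labels = [row[-1] for row in group]
        impurity = 1.0 - sum((labels.count(c) / size) ** 2 for c in classes)
        score += impurity * size / n_total
    return score


def best_split(rows):
    """Try every (feature, threshold) pair and return (score, feature_index, threshold) with the lowest Gini."""
    classes = sorted({row[-1] for row in rows})
    best = None
    for col in range(len(rows[0]) - 1):
        for row in rows:
            left = [r for r in rows if r[col] < row[col]]
            right = [r for r in rows if r[col] >= row[col]]
            score = gini_index([left, right], classes)
            if best is None or score < best[0]:
                best = (score, col, row[col])
    return best


rows = [[1.0, 0], [2.0, 0], [3.0, 1], [4.0, 1]]  # made-up data: one feature, binary label
print(best_split(rows))
```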
Monday, April 23, 2018
Code for drawing a plot in Python
I scraped some web pages and got a dictionary mapping dates to the number of mentions. Here is the code:
import matplotlib.pyplot as plt
def sortdict(d):
    # yield the values of d in the order of their sorted keys
    for key in sorted(d):
        yield d[key]
counter1 = {'2018-02-01': 22, '2018-01-31': 19,
    '2018-01-30': 10, '2018-01-29': 5, '2018-01-27': 4,
    '2018-01-28': 3, '2018-01-25': 3, '2018-01-23': 3, '2018-01-26': 3,
    '2018-01-24': 2, '2018-01-01': 2, '2018-01-15': 2, '2018-01-12': 2,
    '2018-01-09': 1, '2017-12-18': 1, '2017-12-26': 1, '2018-01-11': 1,
    '2018-01-13': 1, '2017-11-28': 1, '2017-12-21': 1, '2017-12-22': 1,
    '2017-02-09': 1, '2018-01-04': 1, '2017-01-17': 1, '2017-03-02': 1,
    '2018-01-08': 1, '2017-12-09': 1, '2017-12-24': 1, '2017-02-20': 1,
    '2018-01-14': 1, '2018-01-21': 1, '2017-12-28': 1, '2017-12-11': 1}
x_labels = []  # create an empty list to store the labels
# for key in counter1.keys():
#     x_labels.append(key)
fig, ax = plt.subplots()
lists = sorted(counter1.items())  # sorted by key (date), returns a list of tuples
print(lists)
x, y = zip(*lists)  # unpack a list of pairs into two tuples
print(x)
print(y)
plt.scatter(x, y)
plt.xticks(range(len(x)), x, rotation=90, fontsize=5)
plt.show()
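One thing to watch: when the x values are strings, recent matplotlib versions treat them as categories and space them evenly, so a gap of months looks the same as a gap of one day. A minimal sketch of preparing the data with real date objects (which matplotlib can place on a time axis) and, optionally, filling days with no mentions with 0. Treating a missing day as zero mentions is an assumption, and both function names are made up.

```python
from datetime import date, timedelta


def to_time_series(counts):
    """Turn {'YYYY-MM-DD': count} into (dates, values) sorted by date, using date objects."""
    pairs = sorted((date.fromisoformat(k), v) for k, v in counts.items())
    return [d for d, _ in pairs], [v for _, v in pairs]


def fill_missing_days(dates, values):
    """Insert a 0 for every calendar day between the first and last date that has no count."""
    lookup = dict(zip(dates, values))
    out_dates, out_values = [], []
    day = dates[0]
    while day <= dates[-1]:
        out_dates.append(day)
        out_values.append(lookup.get(day, 0))
        day += timedelta(days=1)
    return out_dates, out_values


dates, values = to_time_series({'2018-02-01': 22, '2018-01-31': 19, '2017-12-18': 1})
print(dates, values)
```

The resulting lists can be passed straight to plt.scatter(dates, values) in place of the string tuples.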