
Monday, December 4, 2017

Python encoding and decoding

1. Unicode and UTF-8

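Unicode is a character set: it assigns a number, called a code point, to every character. UTF-8 is one way of encoding Unicode code points as bytes; others include UTF-16 and UTF-32. We can find a character's code point by looking it up in the Unicode standard, and its UTF-8 bytes can be calculated from that code point.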
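To make "calculated from the code point" concrete, here is a minimal Python 3 sketch of the UTF-8 bit layout. The function name utf8_bytes is hypothetical; it handles one code point at a time, skips validation (for example of surrogate code points), and is checked against Python's built-in encoder.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand (illustrative, no validation)."""
    if code_point < 0x80:                      # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 4 bytes: 11110xxx then three 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


ch = chr(0x54A9)                 # the character used in the JSON example later in this post
print(hex(ord(ch)))              # 0x54a9
print(utf8_bytes(ord(ch)))       # b'\xe5\x92\xa9'
assert utf8_bytes(ord(ch)) == ch.encode('utf-8')
```

Code points below 0x800 need at most two bytes, which is why most Chinese characters (in the 0x4E00 range) take three bytes in UTF-8.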

2. str and unicode in Python 2

In Python 2, str and unicode are two different types.

A str stores bytes after encoding. When it is displayed with repr(), each non-ASCII byte is shown as a two-digit hexadecimal escape starting with \x. In UTF-8, a common Chinese character takes three bytes.

A str can be converted to unicode with the decode() method, for example:

a.decode('utf-8')

A unicode can be converted to str with the encode() method, for example:

b.encode('utf-8')
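For comparison, a small Python 3 sketch of the same round trip. In Python 3 the roles are renamed: bytes plays the part of Python 2's str, and str plays the part of unicode. The sample bytes are the UTF-8 encoding of U+54A9.

```python
a = b'\xe5\x92\xa9'            # encoded bytes (what Python 2 calls str)
b = a.decode('utf-8')          # decoded text (what Python 2 calls unicode)

print(len(a), len(b))          # 3 1  (three bytes, one character)
print(ascii(b))                # '\u54a9'
assert b.encode('utf-8') == a  # encoding the text gives the original bytes back
```

ascii() is used instead of repr() because Python 3's repr() prints printable non-ASCII characters as-is, which would hide the \u escape.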



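ENCODING SETTINGS

1. Encoding declaration at the head of a script

#-*- coding: utf-8 -*-

or

# coding=utf-8

If we do not declare one, Python 2 assumes the source file is ASCII, and any non-ASCII character in it (for example Chinese in a string literal or a comment) causes a SyntaxError.

2. sys.stdin.encoding and sys.stdout.encoding: the encodings of the terminal's input and output

3. sys.getdefaultencoding(): the encoding Python 2 uses for implicit str/unicode conversions (ascii by default)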
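A quick way to inspect these settings, sketched for Python 3. Only the default encoding is fixed ('utf-8' on Python 3); the stream encodings and the locale's preferred encoding depend on the terminal, the locale (LANG/LC_ALL) and PYTHONIOENCODING, so no output is shown for them.

```python
import locale
import sys

# Encoding for implicit text/bytes conversions: always 'utf-8' on Python 3 ('ascii' on Python 2).
print('default :', sys.getdefaultencoding())

# Encodings of the standard streams; getattr guards against streams that were replaced or closed.
print('stdin   :', getattr(sys.stdin, 'encoding', None))
print('stdout  :', getattr(sys.stdout, 'encoding', None))

# The locale's preferred encoding, which Python 3 uses as the default for open() in text mode.
print('locale  :', locale.getpreferredencoding(False))
```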

CHARACTER SET CONCATENATION

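When a str and a unicode are concatenated in Python 2, the result is unicode: Python silently decodes the str with the default encoding (ascii), which raises UnicodeDecodeError if the str contains non-ASCII bytes. To avoid this, first decode the str to unicode explicitly with decode(), then concatenate.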
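Python 3 removes the implicit conversion altogether: mixing str and bytes raises TypeError, so the explicit decode is always required. A minimal sketch:

```python
raw = b'\xe5\x92\xa9'      # UTF-8 bytes, e.g. read from a file or socket
text = 'hello '

try:
    text + raw             # Python 3 refuses to mix str and bytes
except TypeError as exc:
    print('TypeError:', exc)

joined = text + raw.decode('utf-8')   # decode first, then concatenate
print(ascii(joined))                  # 'hello \u54a9'
```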

READ IN FILE AND JSON

In Python 2, reading a file gives a str, whose non-ASCII bytes are displayed as hexadecimal escapes starting with \x:

import json
f = open('t.txt')
a = f.read()

a:

'{"hello":"\xe5\x92\xa9"}\n'

json.loads() decodes it (as UTF-8 by default) and returns unicode strings:

json.loads(a)

{u'hello':u'\u54a9'}
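The Python 3 version of this, as a sketch. It first writes a sample t.txt into a temporary directory (hypothetical content matching the bytes above), then shows two ways to parse it: reading bytes, which json.loads accepts on Python 3.6+, or letting open() decode the text.

```python
import json
import os
import tempfile

# Create a sample file holding the same bytes as t.txt above.
path = os.path.join(tempfile.mkdtemp(), 't.txt')
with open(path, 'wb') as f:
    f.write(b'{"hello":"\xe5\x92\xa9"}\n')

# Option 1: read raw bytes; json.loads accepts UTF-8 encoded bytes on Python 3.6+.
with open(path, 'rb') as f:
    data = json.loads(f.read())

# Option 2: open in text mode with an explicit encoding and parse the decoded text.
with open(path, encoding='utf-8') as f:
    data2 = json.load(f)

print(ascii(data))   # {'hello': '\u54a9'}
assert data == data2
```

Passing encoding='utf-8' explicitly avoids depending on the locale's default, which is exactly the kind of setting that differs between machines.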

OUTPUT

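A str can be written to a file directly. A unicode must first be encoded to str with encode(); otherwise Python 2 encodes it implicitly with ascii and raises UnicodeEncodeError for non-ASCII characters.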
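In Python 3 the same rule shows up as a choice of file mode. A minimal sketch writing to a temporary file:

```python
import os
import tempfile

text = 'hello \u54a9'
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Either encode explicitly and write bytes to a binary file ...
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# ... or open in text mode with an explicit encoding and let Python encode on write.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

with open(path, 'rb') as f:
    print(f.read())   # b'hello \xe5\x92\xa9'
```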

COMPUTE md5

hashlib computes md5 over bytes, so a unicode must be encoded to str first:

import hashlib
hashlib.md5(a).hexdigest()
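A Python 3 sketch of the same point: hashlib rejects str outright, and because the digest is computed over bytes, the same text gives different digests under different encodings.

```python
import hashlib

text = 'hello \u54a9'

try:
    hashlib.md5(text)            # Python 3: a str must be encoded before hashing
except TypeError as exc:
    print('TypeError:', exc)

digest_utf8 = hashlib.md5(text.encode('utf-8')).hexdigest()
digest_utf16 = hashlib.md5(text.encode('utf-16-le')).hexdigest()
print(digest_utf8)
print(digest_utf16)   # differs from the line above: same text, different bytes
```

So when two systems must agree on a hash of text, they also have to agree on the encoding (UTF-8 is the usual choice).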

OUTPUT TO STDOUT

When printing unicode to stdout, Python 2 encodes it with sys.stdout.encoding, which depends on the system's locale settings:

import sys
sys.stdout.encoding

'UTF-8'

Under a zh_CN.GB2312 locale, the encoding is not UTF-8, so UTF-8 text cannot be printed normally.
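To see why the locale matters, the Python 3 sketch below sends the same character through two text streams with different encodings, using io.BytesIO as a stand-in for a terminal. The helper write_through is hypothetical. On a real terminal, the PYTHONIOENCODING environment variable or, on Python 3.7+, sys.stdout.reconfigure(encoding='utf-8') changes the output encoding.

```python
import io

text = '\u4e2d'  # the character 中, which exists in both UTF-8 and GB2312


def write_through(encoding: str, s: str) -> bytes:
    """Return the bytes a text stream with this encoding would send to the terminal."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding)
    stream.write(s)
    stream.flush()               # push the encoded bytes into buf before reading them
    return buf.getvalue()


utf8_bytes = write_through('utf-8', text)
gb_bytes = write_through('gb2312', text)
print(utf8_bytes, gb_bytes)      # b'\xe4\xb8\xad' b'\xd6\xd0'

# A GB2312 terminal that receives UTF-8 bytes displays the wrong characters:
print(ascii(utf8_bytes.decode('gb2312', errors='replace')))
```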

COMMAND-LINE ARGUMENTS

In Python 2, the arguments obtained from sys.argv or argparse are str, so non-ASCII bytes are shown as \x hexadecimal escapes. We can get the terminal's encoding from sys.stdin.encoding and use it to convert them to unicode:

#!/usr/bin/env python
# coding=utf-8
import sys

print repr(sys.argv[1])
print sys.stdin.encoding  # None when stdin is not a terminal (e.g. piped input)
print repr(sys.argv[1].decode(sys.stdin.encoding))

python hello.py "哇嘿嘿"

'\xe5\x93\x87\xe5\x98\xbf\xe5\x98\xbf'
UTF-8
u'\u54c7\u563f\u563f'
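On Python 3 no manual decode is needed: sys.argv and argparse already hand you str, decoded with the locale/filesystem encoding. The sketch passes an explicit list to parse_args to simulate `python hello.py "哇嘿嘿"` without a real command line.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('text')

# Simulates: python hello.py "哇嘿嘿"
args = parser.parse_args(['\u54c7\u563f\u563f'])

print(type(args.text).__name__, ascii(args.text))   # str '\u54c7\u563f\u563f'
print(args.text.encode('utf-8'))                    # the UTF-8 bytes Python 2 showed above
```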


CONVERTING STRINGS THAT CONTAIN \u ESCAPES TO UNICODE

b='\u54a9'

b

'\\u54a9'


To convert b to the Chinese character it represents:

1. unicode-escape

unicode(b,'unicode-escape')

u'\u54a9'

or

b.decode('unicode-escape')

u'\u54a9'

2. eval concatenation (only for trusted input, since eval runs arbitrary code)

eval('u"' + b.replace('"', r'\"') + '"')

u'\u54a9'
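The same two approaches in Python 3, sketched below. The unicode_escape codec now works on bytes, so the string is encoded to ASCII first (this assumes b contains only ASCII, as it does here). For the second approach, ast.literal_eval is swapped in for eval: it parses the same string literal but refuses anything that is not a plain Python literal.

```python
import ast

b = '\\u54a9'                       # six characters: a backslash followed by u54a9
print(len(b))                       # 6

# 1. unicode_escape codec (assumes b is plain ASCII)
via_codec = b.encode('ascii').decode('unicode_escape')

# 2. Build a string literal and parse it with ast.literal_eval instead of eval
via_literal = ast.literal_eval('"' + b.replace('"', '\\"') + '"')

print(ascii(via_codec), ascii(via_literal))   # '\u54a9' '\u54a9'
```

Note that unicode_escape decodes any non-escape bytes as Latin-1, so it garbles strings that already mix real Chinese characters with \u escapes.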










Python package structures and how to build a package automatically

A python package usually has the following structure:

funniest/
    funniest/
        __init__.py
        command_line.py
        tests/
            __init__.py
            test_joke.py
            test_command_line.py
    MANIFEST.in
    README.rst
    setup.py
    .gitignore

http://python-packaging.readthedocs.io/en/latest/everything.html


To generate a package skeleton like this automatically, you can use cookiecutter:

https://www.pydanny.com/cookie-project-templates-made-easy.html
https://github.com/audreyr/cookiecutter-pypackage.git
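As a rough illustration of what such a generator automates, here is a sketch that creates the empty layout shown above with pathlib. make_skeleton and LAYOUT are hypothetical names; unlike cookiecutter, this fills in no templates and leaves every file empty.

```python
import tempfile
from pathlib import Path

# Files from the layout above, relative to the project root; {name} is the package name.
LAYOUT = [
    '{name}/__init__.py',
    '{name}/command_line.py',
    '{name}/tests/__init__.py',
    '{name}/tests/test_joke.py',
    '{name}/tests/test_command_line.py',
    'MANIFEST.in',
    'README.rst',
    'setup.py',
    '.gitignore',
]


def make_skeleton(root: Path, name: str) -> Path:
    """Create an empty package skeleton under root/name and return its path."""
    project = root / name
    for template in LAYOUT:
        path = project / template.format(name=name)
        path.parent.mkdir(parents=True, exist_ok=True)   # create intermediate directories
        path.touch()                                     # create an empty file
    return project


if __name__ == '__main__':
    project = make_skeleton(Path(tempfile.mkdtemp()), 'funniest')
    for p in sorted(project.rglob('*')):
        print(p.relative_to(project))
```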




Thursday, October 12, 2017

Some Sourcetree and Stash tutorials

Atlassian Stash tutorial:



https://www.youtube.com/watch?v=QWS2yXehCNk


Sourcetree tutorial:


https://www.youtube.com/watch?v=PkVMgh1q33Q



More on Stash:

https://www.youtube.com/watch?v=y9rTRk-5uSQ



If web.py reads in garbled characters, try this method

I once used the Python web.py framework to set up a web service, but when it read in non-English words, it showed garbled characters.

Here is a way to fix that problem.

Add these two lines at the top of the script:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

Add these lines to the imports at the beginning of the script (this only works in Python 2; Python 3 has no sys.setdefaultencoding):

import sys
reload(sys)
sys.setdefaultencoding('utf-8')


Add the header in the get function like below. 




It will solve the problem. 
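For comparison, a sketch of the same idea with the standard library's WSGI interface instead of web.py (swapped in so it runs without extra packages). Rather than changing the process-wide default encoding, it decodes the query string explicitly as UTF-8 and declares charset=utf-8 in the Content-Type header. The app and the name parameter are hypothetical.

```python
from urllib.parse import parse_qs


def app(environ, start_response):
    """Minimal WSGI app that handles UTF-8 explicitly (stdlib sketch, not web.py)."""
    # Percent-escapes such as %E5%92%A9 are decoded as UTF-8.
    params = parse_qs(environ.get('QUERY_STRING', ''), encoding='utf-8')
    name = params.get('name', ['world'])[0]
    body = 'hello {}'.format(name).encode('utf-8')
    # Declaring the charset tells the browser how to decode the bytes we send.
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

To try it locally: from wsgiref.simple_server import make_server, then make_server('', 8000, app).serve_forever(), and open http://localhost:8000/?name=%E5%92%A9.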

Python's zip, map and lambda


Python's zip, map and lambda:

https://bradmontgomery.net/blog/pythons-zip-map-and-lambda/
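A quick illustration of the three together (not taken from the linked post). In Python 3, zip and map return lazy iterators, hence the list() calls.

```python
names = ['a', 'b', 'c']
scores = [90, 85, 77]

pairs = list(zip(names, scores))                          # [('a', 90), ('b', 85), ('c', 77)]
doubled = list(map(lambda s: s * 2, scores))              # [180, 170, 154]
sums = list(map(lambda x, y: x + y, scores, [1, 2, 3]))   # [91, 87, 80]
best = max(pairs, key=lambda p: p[1])                     # ('a', 90)
print(pairs, doubled, sums, best)
```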


Thursday, September 21, 2017

Tokenize in python

Tokenize in python.

https://docs.python.org/3/library/tokenize.html


https://www.techopedia.com/definition/13698/tokenization


Tokenization is the act of breaking up a sequence of strings into pieces such as words, keywords, phrases, symbols and other elements called tokens. Tokens can be individual words, phrases or even whole sentences. In the process of tokenization, some characters like punctuation marks are discarded. The tokens become the input for another process like parsing and text mining.
Tokenization is used in computer science, where it plays a large part in the process of lexical analysis.

Lexical analysis:

https://en.wikipedia.org/wiki/Lexical_analysis
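A small sketch of both senses of tokenization. The first part runs Python's own tokenize module (the lexical analysis step for Python source) over one line; the second is a naive word tokenizer with a regular expression that discards punctuation, as in the definition above. words() is a hypothetical helper.

```python
import io
import re
import token
import tokenize

source = 'total = price * 2  # double it\n'

# Lexical analysis of Python source: each token has a type and the matched text.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(token.tok_name[tok.type], repr(tok.string))


def words(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r'\w+', text)


print(words('Hello, world! Tokens are pieces.'))   # ['Hello', 'world', 'Tokens', 'are', 'pieces']
```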



Serialize and deserialize in programming

Serialize and deserialize in programming:


https://en.wikipedia.org/wiki/Serialization


In computer science, in the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.
This process of serializing an object is also called marshalling an object. The opposite operation, extracting a data structure from a series of bytes, is deserialization (which is also called unmarshalling).



In Python, the pickle module handles serialization and deserialization.

https://docs.python.org/3/library/pickle.html

Pickled data is often saved in files with the .pkl extension:

https://fileinfo.com/extension/pkl
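A minimal pickle round trip, in memory and through a .pkl file in a temporary directory. The record is made-up sample data. The pickle documentation warns that loading pickles from untrusted sources can execute arbitrary code.

```python
import os
import pickle
import tempfile

record = {'name': 'example', 'scores': [1, 2, 3], 'nested': {'ok': True}}

data = pickle.dumps(record)        # serialize to bytes
restored = pickle.loads(data)      # deserialize into a new, equal object
assert restored == record and restored is not record

path = os.path.join(tempfile.mkdtemp(), 'record.pkl')
with open(path, 'wb') as f:        # pickle files must be opened in binary mode
    pickle.dump(record, f)
with open(path, 'rb') as f:
    from_file = pickle.load(f)
print(from_file == record)         # True
```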

In PHP, there is a serialize() function.




How to run PHP code on Windows:

http://editrocket.com/articles/php_windows.html

The output of PHP's serialize() on an array of three strings:
a:3:{i:0;s:5:"Lorem";i:1;s:5:"Ipsum";i:2;s:5:"Dolor";}
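The format in that output is simple enough to reproduce. Below is a sketch of a hypothetical php_serialize helper that emits PHP's serialize() format for a Python list of str and int only: a:COUNT:{...} wraps the array, each index is written as i:N;, each string as s:BYTE_LENGTH:"...";. It is not a complete implementation (no booleans, floats, nested arrays or objects).

```python
def php_serialize(items):
    """Serialize a list of str/int in PHP serialize() format (list-only sketch)."""
    parts = []
    for index, value in enumerate(items):
        if isinstance(value, bool):                 # PHP uses b:0;/b:1;, not handled here
            raise TypeError('bool is not supported in this sketch')
        parts.append('i:{};'.format(index))        # array key
        if isinstance(value, int):
            parts.append('i:{};'.format(value))
        elif isinstance(value, str):
            length = len(value.encode('utf-8'))    # PHP records the length in bytes
            parts.append('s:{}:"{}";'.format(length, value))
        else:
            raise TypeError('unsupported type: {!r}'.format(type(value)))
    return 'a:{}:{{{}}}'.format(len(items), ''.join(parts))


print(php_serialize(['Lorem', 'Ipsum', 'Dolor']))
# a:3:{i:0;s:5:"Lorem";i:1;s:5:"Ipsum";i:2;s:5:"Dolor";}
```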

Serializing and deserializing a binary tree in C++ (GeeksforGeeks):

http://www.geeksforgeeks.org/serialize-deserialize-binary-tree/
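A Python sketch of one common approach to this problem: a preorder walk that writes a marker ('#' here) for every missing child, so the tree shape can be rebuilt from the flat string. It is similar in spirit to the linked article but not a translation of its code; Node, serialize and deserialize are hypothetical names.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


def serialize(root):
    """Preorder walk; '#' marks an empty child so the shape is preserved."""
    out = []

    def walk(node):
        if node is None:
            out.append('#')
            return
        out.append(str(node.val))
        walk(node.left)
        walk(node.right)

    walk(root)
    return ','.join(out)


def deserialize(data):
    """Rebuild the tree by consuming tokens in the same preorder."""
    tokens = iter(data.split(','))

    def build():
        tok = next(tokens)
        if tok == '#':
            return None
        node = Node(int(tok))
        node.left = build()
        node.right = build()
        return node

    return build()


tree = Node(1, Node(2), Node(3, Node(4)))
s = serialize(tree)
print(s)                                  # 1,2,#,#,3,4,#,#,#
assert serialize(deserialize(s)) == s     # round trip preserves the structure
```

Recursion keeps the sketch short; a very deep (degenerate) tree would need an explicit stack instead to stay under Python's recursion limit.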






Wednesday, September 20, 2017

How to improve programming skills

1. Read and write as much code as possible.

www.leetcode.com

http://poj.org/problem?id=2000


www.lintcode.com

http://www.jiuzhang.com/solutions/


http://codeforces.com/




www.github.com


https://projecteuler.net/


2. Do projects

3. Study design patterns and object-oriented programming

https://www.youtube.com/watch?v=mym5m-GKG0Q

https://www.youtube.com/watch?v=j9arNRRoPe8

https://www.youtube.com/watch?v=0jjNjXcYmAU

https://www.youtube.com/watch?v=v9ejT8FO-7I

https://www.cp.eng.chula.ac.th/~wiwat/SDD/DP.pdf





4. Learn as many languages as you can: Perl, Python, Java, JavaScript, Scala, C, sh.



Places to learn Python/pandas

1. Stack Overflow
2. Books: Python for Data Analysis, Learning the Pandas Library
3. planetpython.org
4. Data Skeptic podcast: dataskeptic.com


encode, decode, python

  • Encoded. This is what's stored on disk. To Python, it's a bunch of 0's and 1's that you might treat like ASCII, but it could be anything -- binary data, a JPEG image, whatever. In Python 2.x, this is called a "string" variable. In Python 3.x, it's more accurately called a "bytes" variable.
  • Decoded. This is a string of actual characters. It could be encoded as single-byte ASCII, or as multi-byte sequences (for example UTF-8 or UTF-32) for Chinese characters. But until it's time to convert it to an encoded variable, it's just a Unicode string of characters.




https://stackoverflow.com/questions/5228925/python-string-comparison-problems-with-special-unicode-characters
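A Python 3 sketch of the encoded/decoded distinction described in the two bullets: one decoded character turns into a different number of bytes depending on the encoding, and comparisons only work once both sides are decoded.

```python
text = '\u4e2d'   # the character 中: one decoded character

for enc in ('utf-8', 'utf-16-le', 'utf-32-le', 'gb2312'):
    data = text.encode(enc)
    print(enc, len(data), data)       # byte count depends on the encoding
    assert data.decode(enc) == text   # decoding with the same encoding restores the text

# In Python 3, bytes never compare equal to str; decode before comparing.
raw = text.encode('utf-8')
print(raw == text)                    # False
print(raw.decode('utf-8') == text)    # True
```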

Friday, September 15, 2017

How to use the Douban v2 API

A sample request:


https://api.douban.com/v2/movie/subject/1764796


It returns JSON.
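A sketch of making that request from Python 3 with the standard library. It assumes the endpoint still responds with UTF-8 JSON as the post describes; get_movie_subject and fetch_bytes are hypothetical helpers, and the fetch parameter lets you test the parsing without network access.

```python
import json
from typing import Callable
from urllib.request import urlopen

BASE_URL = 'https://api.douban.com/v2/movie/subject/{}'


def fetch_bytes(url: str) -> bytes:
    """Download a URL and return the raw response body."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def get_movie_subject(subject_id: str, fetch: Callable[[str], bytes] = fetch_bytes) -> dict:
    """Fetch one movie subject and parse the JSON body."""
    raw = fetch(BASE_URL.format(subject_id))
    return json.loads(raw.decode('utf-8'))
```

Usage: get_movie_subject('1764796') performs the same request as the sample URL above. The available fields are listed on the movie_v2 wiki page.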

Other usage/example:

https://developers.douban.com/wiki/?title=movie_v2






Apai-io php example

Apai-io php example


https://github.com/Exeu/apai-io

<?php
namespace Acme\Demo;

require 'vendor/autoload.php'; // Composer autoloader; assumes apai-io was installed with Composer

use ApaiIO\Configuration\GenericConfiguration;
use ApaiIO\Operations\Search;
use ApaiIO\ApaiIO;

$conf = new GenericConfiguration();
$client = new \GuzzleHttp\Client();
$request = new \ApaiIO\Request\GuzzleRequest($client);

$conf
    ->setCountry('com')
    ->setAccessKey(AWS_API_KEY)
    ->setSecretKey(AWS_API_SECRET_KEY)
    ->setAssociateTag(AWS_ASSOCIATE_TAG)
    ->setRequest($request);
$apaiIO = new ApaiIO($conf);

$search = new Search();
$search->setCategory('DVD');
$search->setActor('Bruce Willis');
$search->setKeywords('Die Hard');

$formattedResponse = $apaiIO->runOperation($search);

var_dump($formattedResponse);

Monday, July 31, 2017

Amazon python, get number of reviews

import re

import requests

# Review-count patterns per Amazon site; scraping, so they break if the page text changes.
nreviews_re = {'com': re.compile(r'\d[\d,]*(?= customer review)'),
               'co.uk': re.compile(r'\d[\d,]*(?= customer review)'),
               'de': re.compile(r'\d[\d.]*(?= Kundenrezens\w\w)')}
no_reviews_re = {'com': re.compile(r'no customer reviews'),
                 'co.uk': re.compile(r'no customer reviews'),
                 'de': re.compile(r'Noch keine Kundenrezensionen')}

def get_number_of_reviews(asin, country='com'):
    url = 'http://www.amazon.{country}/product-reviews/{asin}'.format(country=country, asin=asin)
    html = requests.get(url).text
    match = nreviews_re[country].search(html)
    if match:
        # drop thousands separators (',' or '.') before converting
        return int(re.sub(r'\D', '', match.group(0)))
    if no_reviews_re[country].search(html):
        return 0
    return None  # to distinguish from 0, and handle more cases if necessary
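Because the network call and the parsing are mixed together above, the regular expressions are easiest to check offline on saved HTML. Here is a sketch of the .com parsing step as a separate hypothetical helper, parse_review_count. The `*` (instead of `+`) after the first digit is what lets single-digit counts such as "7 customer reviews" match.

```python
import re

REVIEW_COUNT_RE = re.compile(r'\d[\d,]*(?= customer review)')


def parse_review_count(html):
    """Return the review count found in html, 0 for 'no customer reviews', else None."""
    match = REVIEW_COUNT_RE.search(html)
    if match:
        return int(re.sub(r'\D', '', match.group(0)))   # '1,234' -> 1234
    if 'no customer reviews' in html:
        return 0
    return None


print(parse_review_count('<span>1,234 customer reviews</span>'))  # 1234
print(parse_review_count('<span>7 customer reviews</span>'))      # 7
print(parse_review_count('There are no customer reviews yet.'))    # 0
```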


Saturday, July 29, 2017

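How to build and use a Python package

distutils is a Python standard-library package that helps you build, package, and install Python modules. (It was deprecated in Python 3.10 and removed in 3.12; setuptools provides the same setup() function.)

For example, suppose a directory contains three files: foo.py, bar.py, and setup.py.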

And setup.py has the following content:

from distutils.core import setup
setup(
    name='fooBar',
    version='1.0',
    author='Will',
    author_email='wilber@sh.com',
    url='http://www.cnblogs.com/wilber2013/',
    py_modules=['foo', 'bar'],
)


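Running python setup.py sdist in that directory creates a source distribution in the dist/ subdirectory, here named fooBar-1.0.zip (the default archive format depends on the platform; on Unix it is fooBar-1.0.tar.gz).

After unpacking the archive and running python setup.py install inside it, the two modules foo and bar can be imported.

When running python setup.py install, I needed "sudo chown -R $USER /usr/local/lib/python2.7" first; otherwise it failed with "permission denied". Alternatively, python setup.py install --user installs into your home directory without changing permissions.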


List of python APIs

Here is a list of Python APIs:

http://www.pythonforbeginners.com/api/list-of-python-apis


I mainly focus on two of them:

Amazon Python API.

https://github.com/boto/boto

In fact, it is the boto package, the Python SDK for Amazon Web Services.

Here is the boto documentation, which explains how to use it:

https://media.readthedocs.org/pdf/boto/latest/boto.pdf

Here is a video on boto:

https://www.youtube.com/watch?v=arhgSrqy1mM

The LinkedIn API package is:

https://github.com/ozgur/python-linkedin

It is a third-party package written by an individual developer, and it can be used for fairly detailed applications.










Wednesday, July 26, 2017

How to create a python package.

How to create a python package.

http://pythoncentral.io/how-to-create-a-python-package/
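The core idea, sketched in Python 3: a package is a directory containing an __init__.py, and it becomes importable once its parent directory is on sys.path. The sketch builds one in a temporary directory; the package name zoo_demo and its contents are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / 'zoo_demo'                     # the package directory
pkg.mkdir()

# A module inside the package ...
(pkg / 'mammals.py').write_text(
    'class Mammals:\n'
    '    def members(self):\n'
    "        return ['Tiger', 'Elephant']\n"
)
# ... and __init__.py, which marks the directory as a package and re-exports the class.
(pkg / '__init__.py').write_text('from .mammals import Mammals\n')

sys.path.insert(0, str(root))               # make the parent directory importable
importlib.invalidate_caches()               # files were created at runtime
zoo_demo = importlib.import_module('zoo_demo')
print(zoo_demo.Mammals().members())         # ['Tiger', 'Elephant']
```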


How to package python code

How to package python code

https://python-packaging.readthedocs.io/en/latest/


python console application example

python console application example:

https://stackoverflow.com/questions/9340391/python-interactive-shell-type-application


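try:
    import readline  # enables line editing and history for input(); not available on Windows
except ImportError:
    pass
import shlex

print('Enter a command to do something, e.g. `create name price`.')
print('To get help, enter `help`.')

while True:
    try:
        line = input('> ')
    except EOFError:  # Ctrl-D ends the session
        break
    words = shlex.split(line)
    if not words:  # ignore empty lines instead of failing to unpack
        continue
    cmd, *args = words

    if cmd == 'exit':
        break

    elif cmd == 'help':
        print('...')

    elif cmd == 'create':
        if len(args) != 2:
            print('usage: create name price')
            continue
        name, cost = args
        cost = int(cost)
        # ...
        print('Created "{}", cost ${}'.format(name, cost))

    # ...

    else:
        print('Unknown command: {}'.format(cmd))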
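The standard library also has a module built for this kind of shell, cmd. Here is a sketch of the same create/exit commands with it (swapped in as an alternative, not what the linked answer uses). ShopShell is a hypothetical name; each do_NAME method becomes a command, and its docstring becomes the help text shown by the built-in help command.

```python
import cmd


class ShopShell(cmd.Cmd):
    intro = 'Enter a command to do something, e.g. `create name price`. Type `help` for help.'
    prompt = '> '

    def do_create(self, arg):
        """create NAME PRICE: create an item"""
        parts = arg.split()
        if len(parts) != 2 or not parts[1].isdigit():
            print('usage: create NAME PRICE')
            return
        name, cost = parts[0], int(parts[1])
        print('Created "{}", cost ${}'.format(name, cost))

    def do_exit(self, arg):
        """exit: leave the shell"""
        return True  # a true return value stops cmdloop()

    def do_EOF(self, arg):
        """Ctrl-D also leaves the shell"""
        return True
```

Start it with ShopShell().cmdloop(). Unknown commands are reported by cmd.Cmd's default() method.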




Python imdb, python rotten tomatoes

python imdb:

https://stackoverflow.com/questions/36513597/how-do-you-request-on-imdb-api-in-python


import json

import requests

movies = {}
baseurl = "http://www.omdbapi.com/"
with open("movies.txt", "r") as fin:
    for line in fin:
        movieTitle = line.rstrip("\n")  # get rid of newline characters
        # only submitting the title parameter; requests URL-encodes it
        # (OMDb may also require an 'apikey' parameter)
        response = requests.get(baseurl, params={"t": movieTitle})
        if response.status_code == 200:
            movies[movieTitle] = json.loads(response.text)
        else:
            raise ValueError("Bad request!")
print(movies['scream'])

import json, requests
url = "http://www.omdbapi.com/?t=scream"
response = requests.get(url)
python_dictionary_values = json.loads(response.text)

python rotten tomatoes:

https://www.blog.pythonlibrary.org/2013/11/06/python-101-how-to-grab-data-from-rottentomatoes/

https://pypi.python.org/pypi/rtsimple







