Python: From Beginner to Practice

What Is Python?

 Python is a programming language that lets you work quickly and integrate systems more effectively.

Python and Mouse

(Image credit: personal work by Mac Smith, used with permission)

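Why Python?

Glue Language

 As a glue language, Python makes it easy to connect modules written in other languages (especially C/C++).

Scripting Language

 A successor to the ABC language.

 Shortens the traditional edit-compile-link-run cycle.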

Environment Setup

Installing Python

Linux Base Environment

$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y

Building Python from Source

# Download the desired Python version from the python.org FTP server
$ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz

# Build (tar extracts into the current directory, so cd into it from there)
$ tar -zxvf Python-3.6.8.tgz
$ cd Python-3.6.8
$ ./configure --prefix=/usr/local/python36
$ make
$ make install

$ ls /usr/local/python36/ -al
total 24
drwxr-xr-x 6 root root 4096 Jan 30 11:10 .
drwxr-xr-x 1 root root 4096 Jan 30 11:09 ..
drwxr-xr-x 2 root root 4096 Jan 30 11:10 bin
drwxr-xr-x 3 root root 4096 Jan 30 11:10 include
drwxr-xr-x 4 root root 4096 Jan 30 11:10 lib
drwxr-xr-x 3 root root 4096 Jan 30 11:10 share

Replacing the Old Python

# Replace the original Python 2.6
$ which python
/usr/bin/python
$ /usr/local/python36/bin/python3.6 -V
Python 3.6.8
$ mv /usr/bin/python /usr/bin/python_old
$ ln -s /usr/local/python36/bin/python3.6 /usr/bin/python
$ python -V
Python 3.6.8

Pointing yum Back to the Old Python

# yum depends on the old interpreter, so point it back at Python 2.6
$ vim /usr/bin/yum
# Change the first line to python2.6
#!/usr/bin/python2.6

$ yum --version | sed '2,$d'
3.2.29

To install Python on macOS, download the pkg installer from the official Python website for a one-click install or upgrade.

Pip

Installation

Online
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip
Offline
# Download setuptools-40.7.1.zip from https://pypi.org/project/setuptools/#files
$ unzip setuptools-40.7.1.zip
$ cd setuptools-40.7.1
$ python setup.py install

# Download pip-19.0.1.tar.gz from https://pypi.org/project/pip/#files
$ tar zxvf pip-19.0.1.tar.gz
$ cd pip-19.0.1
$ python setup.py install

$ python -m pip -V
pip 18.1 from /usr/local/python36/lib/python3.6/site-packages/pip (python 3.6)

# Environment variables
$ vim ~/.bashrc
export PATH=$PATH:/usr/local/python36/bin
$ source ~/.bashrc
$ pip -V
pip 19.0.1 from /usr/local/python36/lib/python3.6/site-packages/pip-19.0.1-py3.6.egg/pip (python 3.6)

VirtualEnv

 Here we use Apache Superset as the example; for more details, see my other post, "Apache Superset Secondary Development".

Extract and Install

$ pip install virtualenv

# Python 3.3+ also ships a built-in alternative, the venv module (python3 -m venv venv)
$ virtualenv venv
$ source venv/bin/activate
# To let the isolated environment see the system-wide site-packages directory, add `--system-site-packages`
# virtualenv -p /usr/local/bin/python --system-site-packages venv
# To make the environment easier to copy, so that its files are copied in rather than symlinked, add `--always-copy`
# virtualenv -p /usr/local/bin/python --always-copy venv

## [Offline] Installing virtualenv
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
15.1.0

Deploying to Production

Copying
# Use rsync instead of scp so that symlinks are copied as well
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4

//...
sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
total size is 359739823 speedup is 2.51

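# On both the local and the target machine, verify the file count in the Superset directory
$ find | wc -l
10113

# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4

# The Python that the virtualenv was created from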
$ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep / # necessary!
$ python -V
Python 2.7.12
Shared Libraries
# The symlinks were rsynced over, but the target machine lacks the Python shared libraries they point to
$ file /root/superset/lib/python2.7/lib-dynload

/root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`

# The global Python must match the one used when the virtualenv was created in the networked environment
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /

$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail

VirtualEnvWrapper

# VirtualEnvWrapper is an extension to virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/workspaces
$ vim ~/.bashrc
# Add:
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh

$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]# deactivate

$ workon superset
(superset) [root@superset01 virtualenv]# lsvirtualenv -b
superset

Basic Syntax

Basic Data Types

int

sys.maxsize (Python 3 ints themselves are unbounded)
>>> import sys
>>> sys.maxsize
9223372036854775807

# This value depends on whether the platform is 32-bit or 64-bit
>>> pow(2, 63) - 1
9223372036854775807
>>> 1 << 64 - 1
9223372036854775808
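
As a hedged aside: in Python 3, `sys.maxsize` is the largest value a container index can hold, not a cap on `int` itself. The minimal sketch below (the variable names are illustrative) shows that integers simply keep growing past it.

```python
import sys

# sys.maxsize bounds container sizes and indexes, not int values.
beyond = sys.maxsize + 1          # no overflow: ints are arbitrary precision
huge = 2 ** 200                   # far outside any 64-bit range

print(beyond > sys.maxsize)       # True
print(huge.bit_length())          # 201
```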

float

inf (infinity)
>>> float('inf')
inf
>>> float('Inf')
inf
>>> float('inf') > 0
True
>>> float('inf') < 0
False
>>> float('inf') > 9999999999
True
>>> float('inf') > 9999999999999999999999
True
>>> float('-inf') < -9999999999999999999999
True
# inf, Inf, and INF all denote infinity; the spelling is case-insensitive
# inf is positive infinity, and -inf is negative infinity
>>> float('Inf') == float('inf') == -float('-inf') == -float('-Inf')
True
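
A short sketch, using only the standard `math` module, of how infinity behaves in arithmetic; the variable names are illustrative. It also shows where `nan` comes from, which the comparisons above do not cover.

```python
import math

pos_inf = float("inf")

print(math.isinf(pos_inf))            # True
print(pos_inf == math.inf)            # True
print(pos_inf + 1 == pos_inf)         # True

not_a_number = pos_inf - pos_inf      # inf - inf is undefined, so the result is nan
print(math.isnan(not_a_number))       # True
print(not_a_number == not_a_number)   # False: nan never equals anything, even itself
```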

string

split
>>> 'a b c'.split(' ')
['a', 'b', 'c']

>>> 'a b c'.split(' ', 1)
['a', 'b c']

>>> 'a b c'.split(' ', 2)
['a', 'b', 'c']
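
One detail the calls above do not show is how `split` treats repeated separators. A minimal sketch, with an illustrative input string:

```python
text = "a  b   c"                  # irregular runs of spaces

# An explicit separator yields empty strings between adjacent separators.
print(text.split(" "))             # ['a', '', 'b', '', '', 'c']

# With no argument, any run of whitespace counts as one separator.
print(text.split())                # ['a', 'b', 'c']

# rsplit applies maxsplit from the right-hand side.
print("a b c".rsplit(" ", 1))      # ['a b', 'c']
```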
Type conversion
>>> int(1)
1

>>> float(1.0)
1.0

>>> b"yuzhouwan.com".decode("utf-8")
'yuzhouwan.com'
Placeholders
>>> "speed: %skm/h" % 16.8
'speed: 16.8km/h'

>>> "(%s, %s)" % ("percent", 99.97)
'(percent, 99.97)'
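
For comparison, the same output can be produced with `str.format` and, on Python 3.6+, f-strings. This is a sketch of the alternatives rather than a recommendation from the original post; the variable names are illustrative.

```python
speed = 16.8
label, value = "percent", 99.97

print("speed: {}km/h".format(speed))   # speed: 16.8km/h
print(f"({label}, {value})")           # (percent, 99.97)

# A format spec controls precision; 99.97 rounds to one decimal place here.
print(f"{value:.1f}%")                 # 100.0%
```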
Iterating over characters
>>> for i, c in enumerate('yuzhouwan.com'):
...     print(i, c)
...
0 y
1 u
2 z
3 h
4 o
5 u
6 w
7 a
8 n
9 .
10 c
11 o
12 m

Printing

Without a trailing newline

>>> print("[]", end="")
[]>>>

Centering

print("asdf2014".center(50, '-'))
print("yuzhouwan.com".center(50, '-'))

---------------------asdf2014---------------------
------------------yuzhouwan.com-------------------

OS

Operating-system basics

# Path separator for the current OS (Windows: '\\'; Linux/Unix: '/')
os.sep
# Name of the platform family in use (Windows: 'nt'; Linux/Unix: 'posix')
os.name
# Line terminator used by the current platform (Windows: '\r\n'; Linux and macOS: '\n'; classic Mac OS used '\r')
os.linesep
# Run a shell command
os.system(shell)

# Get the current working directory
os.getcwd()
# Get / set an environment variable
os.getenv(key) / os.putenv(key, value)
# Get the PID of the current process
os.getpid()

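Getting file and path information

# List the names of all files and directories under a directory (os.scandir, added in 3.5, is a faster alternative)
os.listdir(path)
# Split a path into (directory, file name)
os.path.split(path)
# Check whether a path is a file or a directory
os.path.isfile(path) / os.path.isdir(path)
# Check whether a path is a symlink
os.path.islink(path)
# Check whether a file or directory exists
os.path.exists(path)
# Get the file size in bytes (for a directory the value is platform-dependent)
os.path.getsize(path)
# Get the absolute path
os.path.abspath(path)
# Normalize the form of a path string
os.path.normpath(path)
# Split off the file extension
os.path.splitext(path)
# Join a directory with a file or directory name
os.path.join(path, file)
# Return the file name
os.path.basename(path)
# Return the directory part of the path
os.path.dirname(path)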
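
A few of the calls listed above, applied to a made-up path so the return values are visible. The path components are illustrative only, and nothing here touches the file system.

```python
import os.path

path = os.path.join("home", "superset", "config.tar.gz")

print(os.path.basename(path))       # config.tar.gz
print(os.path.splitext(path)[1])    # .gz (only the last extension is split off)

# dirname returns everything before the final component.
print(os.path.dirname(path) == os.path.join("home", "superset"))   # True

# normpath collapses "." and ".." segments.
print(os.path.normpath("a/./b/../c") == os.path.join("a", "c"))    # True
```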

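Manipulating files and paths

# The constant string for the current directory ('.')
os.curdir
# Change the working directory to path
os.chdir(path)
# Delete a file
os.remove(path)
# Delete an empty directory
os.rmdir(path)
# Remove directories recursively: for 'foo/bar/baz' this removes 'foo/bar/baz', then 'foo/bar', then 'foo', stopping at the first directory that is not empty
os.removedirs(path)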

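Reading a file

import os

def open_file(f=""):
    # Bail out early when the path does not exist
    if not os.path.exists(f):
        print("File does not exist, path is %s!" % f)
        return
    # Read-only access is enough here, so use mode "r"
    with open(f, "r", encoding="utf8") as of:
        return of.readlines()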
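
The same idea can be sketched with `pathlib`. This is an alternative under stated assumptions, not the post's own helper: `read_lines` is a hypothetical name, and the temporary file exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Return the file's lines without newlines, or None if it is not a regular file."""
    p = Path(path)
    if not p.is_file():
        return None
    return p.read_text(encoding="utf8").splitlines()


# Usage with a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("hello\nworld\n", encoding="utf8")
    print(read_lines(sample))                # ['hello', 'world']
    print(read_lines(Path(tmp) / "nope"))    # None
```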

Running shell commands

>>> import os
>>> exit_code = os.system("source ~/.bashrc")
>>> exit_code
0
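
`os.system` only returns the exit status. When the command's output is needed as well, the standard `subprocess` module is the usual tool. A minimal sketch, assuming the child process is the current Python interpreter so that it runs on any platform:

```python
import subprocess
import sys

# Run a child process without going through a shell and capture its output.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,   # decode the output as text
)

print(result.returncode)        # 0
print(result.stdout.strip())    # hello from child
```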

JSON

Loading and extracting

>>> import json
>>> user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')
>>> user['name']
'benedict'

>>> user['infos']['blog']
'yuzhouwan.com'

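Converting to and from YAML

import json
import sys

import yaml

# Use one of the two conversions below; each one reads all of stdin
# json2yaml
sys.stdout.write(yaml.dump(json.load(sys.stdin)))
# yaml2json (safe_load avoids constructing arbitrary Python objects)
sys.stdout.write(json.dumps(yaml.safe_load(sys.stdin)))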

Loading a JSON file

$ vim test.json

{
  "a": [
    1,
    true
  ],
  "b": [
    0,
    false
  ]
}

import json

with open("./test.json", "r", encoding="utf8") as f:
    content = json.load(f)
    print(content)
    if content['a'][1]:
        print(content['b'][0])

{'a': [1, True], 'b': [0, False]}
0

Collections

map

Setting and getting values
>>> kv_map = {}
>>> kv_map["k"] = "v"
>>> kv_map
{'k': 'v'}

>>> kv_map["k"]
'v'
Sorting
# dicts keep insertion order (guaranteed since Python 3.7)
>>> costs = {"b": 2, "a": 1, "c": 3}
>>> costs
{'b': 2, 'a': 1, 'c': 3}

# Sort by key
>>> sorted(costs)
['a', 'b', 'c']
>>> sorted(costs.keys())
['a', 'b', 'c']
>>> dict(sorted(costs.items()))
{'a': 1, 'b': 2, 'c': 3}

# Sort by value
>>> sorted(costs.values())
[1, 2, 3]
>>> [ (k, costs[k]) for k in sorted(costs, key=costs.get, reverse=False) ]
[('a', 1), ('b', 2), ('c', 3)]
>>> sorted(costs.items(), key=lambda item: item[1], reverse=True)
[('c', 3), ('b', 2), ('a', 1)]
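
When several keys share a value, a second sort key keeps the order deterministic. A small sketch using `operator.itemgetter`; the extra `"d"` entry is added here purely to create a tie.

```python
from operator import itemgetter

costs = {"b": 2, "a": 1, "c": 3, "d": 1}

# Sort by value first, then by key to break ties.
by_value_then_key = sorted(costs.items(), key=itemgetter(1, 0))
print(by_value_then_key)    # [('a', 1), ('d', 1), ('b', 2), ('c', 3)]

# The cheapest entry, found without sorting everything.
cheapest = min(costs.items(), key=itemgetter(1))
print(cheapest)             # ('a', 1)
```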
Iterating
>>> costs_sorted = sorted(costs.items())
>>> for k, v in costs_sorted:
...     print(k, v)
...
a 1
b 2
c 3
Summing
>>> sum({"b": 2, "a": 1, "c": 3}.values())
6

list

Flat list
# range(start, stop, step)
# A negative step iterates in reverse
# Note that [start, stop) is half-open: start is included, stop is not
>>> [ _ for _ in range(3, 0, -1)]
[3, 2, 1]
Nested list
>>> [['' for _ in range(2)] for _ in range(2)]
[['', ''], ['', '']]
Joining a nested list
>>> '.'.join(str(x) for inner_arr in ['yuzhouwan', 'com'] for x in inner_arr)
'y.u.z.h.o.u.w.a.n.c.o.m'

set

>>> s = set()
>>> s.add(1)
>>> s.add(2)
>>> s.add(2)
>>> s.add(3)
>>> print(s)
{1, 2, 3}
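
Beyond `add`, sets support the usual algebra through operators. A brief sketch with two illustrative sets; `sorted` is used only to make the printed order predictable.

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

print(sorted(evens & small))   # intersection: [0, 2]
print(sorted(evens | small))   # union: [0, 1, 2, 3, 4, 6]
print(sorted(evens - small))   # difference: [4, 6]
print(sorted(evens ^ small))   # symmetric difference: [1, 3, 4, 6]
```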

iter

>>> blog = "yuzhouwan"
>>> iter_blog = iter(blog)
>>> print(next(iter_blog))
y
>>> print(next(iter_blog))
u
>>> print(next(iter_blog))
z
>>> print(next(iter_blog))
h
>>> print(next(iter_blog))
o
>>> print(next(iter_blog))
u
>>> print(next(iter_blog))
w
>>> print(next(iter_blog))
a
>>> print(next(iter_blog))
n
>>> print(next(iter_blog))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> print(next(iter_blog, None))
None

Control Flow

if-else

>>> -1 if True else 0
-1
>>> -1 if False else 0
0

try-except

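import sys

import requests

try:
    resp = requests.get("https://yuzhouwan.com/", timeout=10)
    # requests only raises HTTPError when asked to, so check the status explicitly
    resp.raise_for_status()
    print(resp.text)
except requests.RequestException as error:
    # RequestException also covers connection errors and timeouts
    print("Cannot visit this url!", error)
    sys.exit(1)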

retry

import time

import requests

retry_max = 3
for i in range(retry_max):
    try:
        # Deliberately call json() on an HTML page so that parsing fails
        requests.get("https://yuzhouwan.com/").json()
    except Exception as e:
        print("Exception:", e)
        print("Retry (%s / %s)..." % (i + 1, retry_max))
        time.sleep(1)
    else:
        break

Exception: Expecting value: line 1 column 1 (char 0)
Retry (1 / 3)...
Exception: Expecting value: line 1 column 1 (char 0)
Retry (2 / 3)...
Exception: Expecting value: line 1 column 1 (char 0)
Retry (3 / 3)...
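
The loop above can be factored into a reusable helper. The sketch below is one possible shape, not the post's code: `retry`, `flaky`, and the doubling delay are all assumptions made for illustration, and the delays are kept tiny so the example finishes quickly.

```python
import time


def retry(func, attempts=3, base_delay=0.01):
    """Call func() until it succeeds, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts:
                raise                      # out of attempts: surface the error
            print("Retry (%s / %s): %s" % (attempt, attempts, exc))
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: a hypothetical flaky call that fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("not yet")
    return "ok"


print(retry(flaky))   # ok
```

Re-raising on the last attempt keeps the original traceback, which the plain loop above discards once the retries run out.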

Arithmetic

Floor division (integer quotient)

>>> 1 // 1
1
>>> 2 // 1
2
>>> 3 // 1
3

>>> 1 // 2
0
>>> 2 // 2
1
>>> 3 // 2
1
>>> 4 // 2
2
>>> 5 // 2
2
>>> 6 // 2
3

Controlling precision

>>> round(111111 / 1024, 2)
108.51
>>> round(111111 / 1024, 0)
109.0
>>> int(round(111111 / 1024, 0))
109
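
Two caveats are worth a quick sketch. Python 3's `round` rounds ties to the nearest even number, and binary floats cannot represent every decimal exactly. When exact decimal rounding matters, the standard `decimal` module is one option; the value 2.675 below is just an illustrative input.

```python
from decimal import Decimal, ROUND_HALF_UP

# Ties go to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))    # 0 2 2

# 2.675 is stored as slightly less than 2.675, so it rounds down.
print(round(2.675, 2))                       # 2.67

# Decimal keeps the exact decimal value and lets you choose the rounding mode.
exact = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(exact)                                 # 2.68
```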

Logical Operators

& vs and

>>> True & False
False

>>> True and False
False

>>> 10 > 1 & 10 < 1
True

>>> 10 > 1 and 10 < 1
False
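
The surprising `True` above comes from operator precedence: `&` binds more tightly than the comparison operators. The sketch below spells out the grouping and also shows the short-circuit difference; `noisy` is a hypothetical helper used only to make the evaluation visible.

```python
# `&` is applied first, so the expression chains as 10 > (1 & 10) < 1.
print(1 & 10)                    # 0  (0b0001 & 0b1010)
print(10 > (1 & 10) < 1)         # True, i.e. 10 > 0 and 0 < 1
print((10 > 1) & (10 < 1))       # False once each comparison is parenthesized


def noisy():
    print("evaluated")
    return True


# `and` short-circuits; `&` always evaluates both operands.
print(False and noisy())         # False, and noisy() is never called
print(False & noisy())           # prints "evaluated", then False
```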

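Bitwise Operators

Operation   Operator   Rule
AND         &          The result bit is 1 only when both A and B are 1; otherwise 0
OR          |          The result bit is 1 when A or B (or both) is 1; otherwise 0
XOR         ^          The result bit is 1 only when A and B differ; otherwise 0
NOT         ~          Inverts every bit: 0 becomes 1 and 1 becomes 0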
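
The rules in the table can be checked directly in Python. A minimal sketch with two illustrative operands:

```python
a, b = 0b1100, 0b1010   # 12 and 10

print(bin(a & b))   # 0b1000
print(bin(a | b))   # 0b1110
print(bin(a ^ b))   # 0b110
print(~a)           # -13, because ~x == -x - 1 for Python ints
```

Python integers have no fixed width, which is why `~a` is a negative number rather than a value with the high bits set.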

Slicing

Taking part of a list

>>> [1, 2, 3][:1]
[1]
>>> [1, 2, 3][:2]
[1, 2]
>>> [1, 2, 3][:3]
[1, 2, 3]
>>> [1, 2, 3][1::]
[2, 3]

Taking the whole list

>>> [1, 2, 3][:]
[1, 2, 3]

Reversing

# Reverse a list
>>> [1, 2, 3][::-1]
[3, 2, 1]
# Reverse a string
>>> 'nawuohzuy'[::-1]
'yuzhouwan'

Assigning to a list slice

>>> l = list(range(10))
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> l[0:3] = [0, -1, -2]
>>> l
[0, -1, -2, 3, 4, 5, 6, 7, 8, 9]
>>> l[2::3] = [0, 0, 0]
>>> l
[0, -1, 0, 3, 4, 0, 6, 7, 0, 9]
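
Two further behaviors of slice assignment, sketched on a fresh list: a plain slice may change the list's length, while an extended slice (one with a step) requires the replacement to have exactly the same number of items.

```python
l = list(range(5))           # [0, 1, 2, 3, 4]

l[1:3] = ["a", "b", "c"]     # a plain slice may grow or shrink the list
print(l)                     # [0, 'a', 'b', 'c', 3, 4]

del l[1:4]                   # deleting a slice removes those items
print(l)                     # [0, 3, 4]

try:
    l[::2] = [9]             # two slots (indexes 0 and 2) but only one value
except ValueError as e:
    print("ValueError:", e)
```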

Python Standard Library

binascii

>>> import binascii

# String to hex
>>> binascii.b2a_hex("宇宙湾".encode("utf8"))
b'e5ae87e5ae99e6b9be'

# Hex back to bytes
>>> binascii.a2b_hex('e5ae87e5ae99e6b9be')
b'\xe5\xae\x87\xe5\xae\x99\xe6\xb9\xbe'

# Decode and print
>>> print(binascii.a2b_hex('e5ae87e5ae99e6b9be').decode('utf8'))
宇宙湾

datetime

import datetime

start = datetime.datetime.now()
end = datetime.datetime.now()
print((end - start).microseconds)
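
A caveat, sketched with an illustrative `timedelta`: the `microseconds` attribute holds only the sub-second component, so intervals longer than a second need `total_seconds()`. For timing code, `time.perf_counter()` is a common alternative to subtracting two `datetime.now()` values.

```python
import datetime
import time

delta = datetime.timedelta(seconds=2, microseconds=500)
print(delta.microseconds)       # 500 (only the sub-second component)
print(delta.total_seconds())    # 2.0005

start = time.perf_counter()     # monotonic clock intended for measuring intervals
sum(range(100000))
elapsed = time.perf_counter() - start
print("elapsed: %.6f s" % elapsed)
```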

gettext

Creating the PO file

# Generate the template
$ python D:\apps\Python\Python35\Tools\i18n\pygettext.py
$ cat messages.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2017-12-28 11:24+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=cp936\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"

# Change the charset to UTF-8 and fill in the other basic information
$ vim messages.pot
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2017 yuzhouwan.com
# Benedict Jin <benedictjin2016@gmail.com>, 2017.
#
msgid ""
msgstr ""
"Project-Id-Version: Yuzhouwan v1.0.2\n"
"POT-Creation-Date: 2017-12-28 11:24+0800\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: Benedict Jin <benedictjin2016@gmail.com>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"

# Open it with PoEdit and save it as a po file (messages.pot -> messages.po)
# Move it into the locale directory
$ mv messages.po locale/cn/LC_MESSAGES

# Add two translations
$ vim messages.po
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2017 yuzhouwan.com
# Benedict Jin <benedictjin2016@gmail.com>, 2017.
#
msgid ""
msgstr ""
"Project-Id-Version: Yuzhouwan v1.0.2\n"
"POT-Creation-Date: 2017-12-28 11:39+0800\n"
"PO-Revision-Date: 2017-12-28 11:43+0800\n"
"Language-Team: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: pygettext.py 1.5\n"
"X-Generator: Poedit 2.0.1\n"
"Last-Translator: \n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
"Language: zh\n"

msgid "Hello, world!"
msgstr "世界,你好!"

msgid "yuzhouwan.com"
msgstr "宇宙湾"

Using the translations in a program

import gettext
import os


def getLocStrings():
    current_dir = os.path.dirname(os.path.realpath(__file__))
    locale_dir = os.path.join(current_dir, "locale")
    print("Locale directory:", locale_dir)
    return gettext.translation('messages', locale_dir, ["zh_CN", "en-US"]).gettext


_ = getLocStrings()

print(_("Hello, world!"))
print(_("yuzhouwan.com"))

Locale directory: E:\Core Code\leetcode\i18n\locale
世界,你好!
宇宙湾

re

import re

# Match dates; prints ['2023-1-1', '2023-11-1', '2023-12-03']
print(re.compile(r'(\d{4}-\d{1,2}-\d{1,2})').findall("[2023-1-1, 2023-11-1, 2023-12-03]"))

# Match part of a string with a lookbehind plus a lookahead assertion; prints yuzhouwan.com
print(re.compile(r'(?<=blog: https://)(.*)(?=/)').findall("blog: https://yuzhouwan.com/")[0])

# Match part of a string and delete the matched part; prints https://yuzhouwan.com/
print(re.sub(r"posts/.*", '', "https://yuzhouwan.com/posts/43687/"))
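
Named groups make a pattern like the date matcher easier to consume. A minimal sketch; the group names and the sample sentence are illustrative.

```python
import re

date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})")

match = date_pattern.search("released on 2023-12-03")
print(match.group("year"), match.group("month"), match.group("day"))   # 2023 12 03
print(match.groupdict())   # {'year': '2023', 'month': '12', 'day': '03'}
```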

time

import time

# Current local time as a formatted string
time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())

# Second-resolution timestamp for a given local time
int(time.mktime(time.strptime("2016-3-1 0:0:0", "%Y-%m-%d %H:%M:%S")))

# Current timestamp (seconds since the epoch)
time.time()

# Sleep for 1 second
time.sleep(1)
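
`time.mktime` interprets its argument in the machine's local time zone, so the same string yields different timestamps on different servers. A hedged sketch of a time-zone-explicit round trip using `datetime`; the epoch value below corresponds to 2016-03-01 00:00:00 UTC.

```python
import time
from datetime import datetime, timezone

now_ts = time.time()                 # seconds since the epoch, as a float
print(int(now_ts))

# Epoch seconds -> aware datetime in UTC -> epoch seconds
dt = datetime.fromtimestamp(1456790400, tz=timezone.utc)
print(dt.isoformat())                # 2016-03-01T00:00:00+00:00
print(int(dt.timestamp()))           # 1456790400
```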

Third-Party Python Libraries

File Handling

jproperties

from jproperties import Properties

properties = """
user=yuzhouwan
github=asdf2014
"""
with open('./test.properties', 'w') as f:
    f.write(properties)

configs = Properties()
with open('./test.properties', 'rb') as last_result:
    configs.load(last_result)
user = str(configs.get('user').data)
github = str(configs.get('github').data)
print("user:", user)
print("github:", github)

pangu

$ vim blog.md

博客名是yuzhouwan,地址是`https://yuzhouwan.com`。

# pangu automatically inserts spaces between CJK and Latin text
$ cat blog.md | pangu

博客名是 yuzhouwan,地址是 `https://yuzhouwan.com`。

Server Side

Flask

 A complete Flask server example:

from flask import Flask, request

app = Flask(__name__)


@app.route('/', methods=['GET', 'POST'])
def welcome():
    return "Welcome!"


@app.route('/blog', methods=['GET', 'POST'])
def blog():
    return "<a href='https://yuzhouwan.com/'>https://yuzhouwan.com/</a>"


# http://127.0.0.1:65533/get?name=asdf2014
@app.route('/get', methods=['GET'])
def get():
    return request.args.get("name")


# curl -X POST -H 'Content-Type: application/json' -d '{"name":"yuzhouwan"}' http://127.0.0.1:65533/post
@app.route('/post', methods=['POST'])
def post():
    return request.json.get("name")


if __name__ == "__main__":
    app.run(host="127.0.0.1", port=65533, debug=True)

Core Data-Analysis Libraries

Pandas

SciPy

NumPy

import numpy as np
arr = [2, 4, 6, 8, 10]
print(np.mean(arr)) # mean
print(np.median(arr)) # median
print(np.std(arr)) # standard deviation

6.0
6.0
2.8284271247461903

Tips: Full code is here.

Statistics

Scrapy

StatsModels

NLP

NLTK

Gensim

Machine Learning

Scikit-learn

Artificial Intelligence

TensorFlow

Theano

Keras

Visualization

PyEcharts

Matplotlib

Basic plotting
import matplotlib.pyplot as plt

byte_arr_for_point = [1, 2, 3, 4, 5]
byte_arr_for_multi_point = [1.1, 1.2, 1.3, 1.4, 1.5]
plt.plot([i for i in range(0, 5)], byte_arr_for_point, ls='-', label='line1')
plt.plot([i for i in range(0, 5)], byte_arr_for_multi_point, ls='--', label='line2')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

Python Matplotlib Example

(Screenshot of Matplotlib™ output)
Trigonometric functions
import numpy as np
import matplotlib.pyplot as plt

plt.figure(1)
plt.figure(2)
plt.figure(3)

x = np.linspace(0, 6, 100)
for i in range(3):
    plt.figure(1)
    plt.plot(x, np.sin(i * x))
    plt.figure(2)
    plt.plot(x, np.cos(i * x))
    plt.figure(3)
    plt.plot(x, np.tan(i * x))

plt.show()
plt.close()

Python Matplotlib Sin
Python Matplotlib Cos
Python Matplotlib Tan

(Screenshot of Matplotlib™ output)
Integrals
# Import np and Polygon explicitly rather than relying on the star import to provide them
import numpy as np
from matplotlib.patches import Polygon
from matplotlib.pyplot import *


def f(_):
    return pow(np.e, (-1 * _))


a = 0
b = 1
x = np.linspace(0, 2)
y = f(x)

fig_size = 6
fig, ax = subplots(figsize=(fig_size, fig_size * 0.618))
plot(x, y, 'b', linewidth=1)
ylim()

x_ = np.linspace(a, b)
y_ = f(x_)
shadow = [(a, 0)] + list(zip(x_, y_)) + [(b, 0)]
poly = Polygon(shadow, facecolor='0.8', edgecolor='0.4')
ax.add_patch(poly)

text(1.4 * (a + b), 1.2,
     r"$Cost(X, Y) = \int_{x_0}^{x_1} \int_{y_0}^{y_1} e^{-\lambda|x-y|}{\rm d}x{\rm d}y$",
     horizontalalignment='center', fontsize=16)
figtext(0.95, 0.03, '$x$')
figtext(0.075, 0.82, '$f(x)$')

ax.set_xticks((a, b))
ax.set_xticklabels(['$x=%d$' % a, '$y=%d$' % b])
ax.set_yticks([f(a), f(b)])
title('')
show()

Python Matplotlib Cost

(Screenshot of Matplotlib™ output)

Seaborn

Bokeh

Plotly

Maps

GeoplotLib

MapBox

Image Processing

PIL

Web Scraping

lxml

from lxml import etree
import requests


def get_ide_id(job_id, tag_name):
    # view-source:http://historyserver-yuzhouwan:19888/jobhistory/conf/job_1010101010101_0101010
    url = "http://historyserver-yuzhouwan:19888/jobhistory/conf/" + job_id
    page = requests.get(url)
    html = page.text
    selector = etree.HTML(html)
    tds = selector.xpath("//*[@id='conf']//tbody//tr//td//text()")
    exist = False
    for td in tds:
        if tag_name in td:
            exist = True
            continue
        if exist:
            return td.strip()


print(get_ide_id("job_1010101010101_0101010", "hive.ide.job.id"))
If the timeout parameter of requests.get is not specified, the call blocks indefinitely by default.

requests

import requests

url = 'http://localhost:8082/druid/v2/sql'
auth = ('xxx', 'yyy')
headers = {'Content-Type': 'application/json'}
body = {
    "query": "SELECT 1",
    "resultFormat": "array",
    "header": 'true',
    "context": {
        "sqlOuterLimit": 1
    }
}
# Prints 1
print(int(requests.post(url, headers=headers, auth=auth, json=body).json()[1][0]))

Scientific Analysis Tools

IPython Notebook

Installation

Windows
# Before installing, make sure pip is recent enough and that %PYTHON_HOME%/Scripts is on the PATH
$ python -m pip install --upgrade pip

# Download the Enthought Canopy suite (https://www.enthought.com/canopy-subscriptions/)
# After installing it, configure the environment variable
$ PATH=D:\apps\Enthought\Canopy\App;%PATH%
# Install
$ pip install "ipython[all]"
# Start
$ mkdir ipython
$ cd ipython
$ ipython notebook
$ ipython notebook --pylab # pylab mode
$ ipython notebook --pylab inline # embed Matplotlib figures in the web page
MacOS
$ pip3 install notebook
$ jupyter notebook

Configuration

# Create the default configuration file
$ jupyter notebook --generate-config
Writing default config to: C:\Users\BenedictJin\.jupyter\jupyter_notebook_config.py

# Change the default working directory (a raw string keeps the backslashes literal)
$ vim ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.notebook_dir = r'F:\Github\_draft\ipython'

# Restart and verify
$ ipython notebook

Format conversion

$ jupyter nbconvert --to markdown --execute Basic.ipynb
# Alternatively, convert with notedown (https://github.com/aaren/notedown)
$ pip install notedown

Practical Tips

Embedding in Markdown

 Once IPython has produced the .ipynb file, an <iframe> tag in the Markdown source is enough to embed the notebook:

<iframe src="https://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>

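 This way, the figures drawn by matplotlib are displayed as well, rather than just a Python script.

Tips: If your blog is served entirely over HTTPS, make sure the resources loaded inside the iframe are also HTTPS; otherwise Chrome blocks the mixed content.

Help documentation

 A single question mark (?) shows the documentation of a function, class, or variable, while a double question mark (??) shows the corresponding source code.

$ a = 1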
$ a?
Type: int
String form: 1
Docstring:
int(x=0) -> int or long
int(x, base=10) -> int or long

Convert a number or string to an integer, or return 0 if no arguments
are given. If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.

If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base. The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4

$ a??
Type: int
String form: 1

# Tip: the "shift + tab" shortcut is a quicker way to show a method's detailed description
Configuring IPython Notebook for Python 3
# Install Python 3
$ which python
/d/apps/Python/Python35/python

# Install the IPython kernel
$ python -m pip install ipykernel
$ python -m ipykernel install --user

# Install notebook
$ which pip
/d/apps/Python/Python35/Scripts/pip
$ pip install notebook
Switching the IPython color theme
$ pip install jupyterthemes
$ jt -l
Available Themes:
chesterish
grade3
gruvboxd
gruvboxl
monokai
oceans16
onedork
solarizedd
solarizedl
$ jt -t onedork
# If IPython is still running, restart it for the change to take effect

Python Engineering Tools

VirtualEnv

Hands-On Tips

Setting a Proxy

Via environment variables

$ export http_proxy="http://127.0.0.1:1080"
$ export https_proxy="https://127.0.0.1:1080"
$ export socks5_proxy="socks5://127.0.0.1:1080"
# pip install --upgrade pip

Programmatically

import socket

import socks

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9876)
socket.socket = socks.socksocket

Remote Debug

 The goal is to debug Python code locally with breakpoints, have each change uploaded to the remote server over SFTP on Ctrl+S, and, once everything is deployed, have Flask automatically reload the latest code and restart the remote Python process, so that the effect of the change is visible locally right away. (The walkthrough below is based on Airbnb's Superset project.)

PyCharm

Windows development machine
## local
# should shutdown local firewall firstly
$ cd .\JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg
$ easy_install pycharm-debug.egg
# If you run Python 3, use pycharm-debug-py3k.egg instead

# Run/Debug Configuration - SuperSet Remote Debug - 192.168.3.10(local ip) - 12345(port > 10000), will generate..
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)

# Path mappings
E:/Core Code/superset=/root/superset

# SFTP
# copy a project to a local directory.
# configure: tools - deployment, to upload this local copy to remote server
# config remote host

192.168.1.10 SFTP 192.168.1.10 22 /root/superset-0.15.4 root/****** UTF-8 # redacted
# Tools - Deployment - Options - Upload changed files automatically to the default server (On explicit save action (Ctrl+S))

# make deployment automatic: tools - deployment - "automatic upload"
# add remote interpreter: file - settings - python interpreters - "+" - "Remote.."

# Start Debug
Starting debug server at port 12345
Use the following code to connect to the debugger:
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
Waiting for process connection...
Connected to pydev debugger (build 162.1967.10)
Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:9097 --limit-request-line 0 --limit-request-field_size 0 superset:app
Remote Linux runtime environment
## remote
$ cd /root/superset
$ source bin/activate
$ cd /root/superset/lib
# Copy \JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg into the lib directory
$ easy_install pycharm-debug.egg

# trouble shooting
>>> import pydevd

# restart
$ vim /root/superset/bin/superset

import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)

# After local debug, then start superset
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &


# Flask - Werkzeug debugger
2017-02-07 15:47:03,905:WARNING:werkzeug: * Debugger is active!
2017-02-07 15:47:03,905:INFO:werkzeug: * Debugger pin code: 330-765-812

$ pip install django-debug-toolbar

$ vim lib/python2.7/site-packages/pycharm-debug.egg/tests_pydevd_python/my_django_proj_17/my_django_proj_17/settings.py

INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'debug_toolbar', # add
    'my_app',
)

# enable django
Setting - Language & Frameworks - Django - "Enable Django Support"

E:\Core Code\superset-0.15.4\bin\superset runserver -a '0.0.0.0' -p 9097

############################# PyDevd is so stiff! Let's Try Remote Python. #############################


# Configure SFTP (same as above)
# Configure Remote Python

File - Settings - Project: superset-0.15.4 - Project Interpreter - show all(+) -

name: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
SSH Credentials
Host: 192.168.1.10 Port: 22
User name: root
Auth type: Password # redacted
Python interpreter path: /root/superset-0.15.4/bin/python
PyCharm helpers path: /root/superset-0.15.4/.pycharm_helpers

# If the interpreter is not recognized, python may be missing execute permission
$ cd /root/superset-0.15.4/bin && chmod 777 *
PyCharm configuration
# Configure the Python run configuration

Run - Run/Debug Configurations(+) - Python -

Name: superset
Script: E:\Core Code\superset-0.15.4\bin\superset
Script parameters: runserver -d -p 9097
Environment Variables: VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python) # the remote Python configured above
Working directory: E:\Core Code\superset-0.15.4\bin
Path mapping: E:/Core Code/superset-0.15.4=/root/superset-0.15.4


# Before remote debugging with the remote Python, enter the virtualenv
# The activate file may not be found here; if so, add it directly

File - Settings - Tools - Terminal - Shell path

/bin/bash --rcfile ~/.pycharmrc


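$ vim '/e/Core Code/superset-0.15.4/.pycharmrc' # add .pycharmrc to the local project

VIRTUAL_ENV="/root/superset-0.15.4" # the virtualenv directory on the remote server (the contents of bin/activate can be copied in directly)
export VIRTUAL_ENV


# The remote server now has two additional processes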
$ ps -ef | grep superset | grep -v grep

root 8638 10912 0 15:24 pts/1 00:00:00 bash -c cd /root/superset-0.15.4/bin; env "IDE_PROJECT_ROOTS"="/root/superset-0.15.4" "IPYTHONENABLE"="True" "PYTHONPATH"="/root/superset-0.15.4:/root/superset-0.15.4/.pycharm_helpers/pydev" "PYTHONUNBUFFERED"="1" "PYCHARM_HOSTED"="1" "VIRTUALENVWRAPPER_PYTHON"="E:\Core Code\superset-0.15.4\bin\python" "LIBRARY_ROOTS"="C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/544046706;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/964856790;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1532312494;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2
016.3/system/python_stubs/250609560;D:/apps/JetBrains/PyCharm 2016.3.2/helpers/python-skeletons" "PYTHONDONTWRITEBYTECODE"="1" "JETBRAINS_REMOTE_RUN"="1" "PYTHONIOENCODING"="UTF-8" /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8660 8638 11 15:24 pts/1 00:00:17 /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8715 8660 28 15:24 pts/1 00:00:38 /root/superset-0.15.4/bin/python /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
Done
# Open from the local Windows machine
http://192.168.1.10:9097/login/

Visual Studio Code

 Not good for me! You can still try it if you are interested.

踩过的坑

Gunicorn 预开启了多个 Work 子进程,无法 Remote Debug
描述

 在本地 windows 开发机上,远程连接 linux 上运行在 virtualenv 里的 superset,发现可以 debug,但是 superset 里的 gunicorn 用的是 prefork 模型,开启了好多个 work 子进程

解决

a) 正常的 remote debug 来处理 —not ok

Connected to pydev debugger (build 162.1967.10)
[2017-02-06 18:13:22 +0000] [13609] [INFO] Starting gunicorn 19.6.0
[2017-02-06 18:13:22 +0000] [13609] [INFO] Listening at: http://0.0.0.0:9097 (13609)
[2017-02-06 18:13:22 +0000] [13609] [INFO] Using worker: sync
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13624)
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13623)

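b) Debug with a "Django server" configuration instead of "Python Remote Debug" (not ok)

 The configured Remote Python is clearly /root/superset/bin/python, yet the error message shows that /usr/local/bin/python is being used

c) ipdb (not good)

 Bring the gunicorn process to the foreground and debug from the command line with ipdb

d) Add the -w option to control the number of workers (not ok)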

@manager.option(
    '-w', '--workers', default=config.get("SUPERSET_WORKERS", 2),  # default: 2
    help="Number of gunicorn web server workers to fire up")

$ superset runserver -a 0.0.0.0 -p 9097 -w 0

e) Turn off gunicorn (ok)

 gunicorn only needs to be enabled during load testing
 superset runserver -d -p 9097
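
The result of options d) and e) can be captured in a small helper that chooses the command line depending on whether a debugger will be attached. This is only a sketch: `runserver_command` is a hypothetical name, and it relies solely on the flags that already appear above (`-d`, `-p`, `-a`, `-w`).

```python
def runserver_command(port, debug=False, address="0.0.0.0", workers=2):
    """Build the argument list for `superset runserver` (hypothetical helper).

    Only the flags shown in the text are used: -d, -p, -a and -w.
    """
    if debug:
        # Debug mode, as in option e): gunicorn is not started,
        # so there are no pre-forked workers to lose breakpoints in.
        return ["superset", "runserver", "-d", "-p", str(port)]
    # Normal mode, as in option d): gunicorn with an explicit worker count.
    return ["superset", "runserver", "-a", address,
            "-p", str(port), "-w", str(workers)]


print(" ".join(runserver_command(9097, debug=True)))
```

The list form can be passed directly to `subprocess.run` without going through a shell.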

Trying to add breakpoint to file that does not exist
Description
pydev debugger: warning: trying to add breakpoint to file that does not exist: /root/superset/d:/apps/python27/lib/site-packages/gunicorn/arbiter.py
Solution

a) Add a path mapping for the site-packages directory of Python (not good)

E:/Core Code/superset=/root/superset;D:/apps/Python27=/root/superset/lib/python2.7

b) Use the Python interpreter of the superset project instead of the local machine's Python (ok)

 The Python synced to the local machine is not a python.exe (no)
 Use the remote Python (ok)
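
To make the path mapping from option a) more concrete, here is a minimal sketch of how a `local=remote;local=remote` string could be applied to a local file path. It is not PyCharm's actual implementation; `parse_mappings` and `to_remote` are hypothetical names. A path with no matching prefix comes back as `None`, which appears to be the situation behind the warning above, where a local `d:/apps/python27/...` path ended up appended to the remote project root.

```python
def parse_mappings(spec):
    """Parse 'local=remote;local=remote' into (local, remote) pairs."""
    pairs = []
    for item in spec.split(';'):
        if item.strip():
            local, remote = item.split('=', 1)
            pairs.append((local.strip().rstrip('/'), remote.strip().rstrip('/')))
    # Try the longest local prefix first so the most specific mapping wins.
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def to_remote(local_path, mappings):
    """Translate a local path to its remote counterpart, or None if unmapped."""
    normalized = local_path.replace('\\', '/')
    for local, remote in mappings:
        # Windows paths are case-insensitive, so compare in lowercase.
        if normalized.lower() == local.lower():
            return remote
        if normalized.lower().startswith(local.lower() + '/'):
            return remote + normalized[len(local):]
    return None


mappings = parse_mappings(
    "E:/Core Code/superset=/root/superset;"
    "D:/apps/Python27=/root/superset/lib/python2.7")
print(to_remote("d:/apps/python27/lib/site-packages/gunicorn/arbiter.py", mappings))
```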

Couldn’t obtain remote socket
Description
Error running superset
Can't run remote python interpreter: Couldn't obtain remote socket from output ('0.0.0.0', 52703), stderr /usr/local/bin/python: No module named virtualenvwrapper virtualenvwrapper.sh: There was a problem running the initialization hooks.
If Python could not import the module virtualenvwrapper.hook_loader, check that virtualenvwrapper has been installed for VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is set properly.
Solution
# Check whether PATH contains the virtualenvwrapper entries
$ echo $PATH

# If it does not, check ~/.bashrc and comment out the lines below
# Source global definitions
# export WORKON_HOME=~/virtualenv
# source /usr/local/bin/virtualenvwrapper.sh
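
The error message above suggests checking whether `virtualenvwrapper` is importable by the interpreter named in `VIRTUALENVWRAPPER_PYTHON`. A minimal diagnostic sketch, meant to be run with that same interpreter, could look like this; `has_module` and `path_entries` are hypothetical helper names.

```python
import importlib.util
import os


def has_module(name):
    """Return True if `name` can be found by the running interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


def path_entries():
    """Return the non-empty directories listed in PATH."""
    return [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]


print("virtualenvwrapper importable:", has_module("virtualenvwrapper"))
for entry in path_entries():
    print(entry)
```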

Vagrant

 Vagrant is a tool that automates the installation and configuration of virtual machines

Download

# Vagrant
https://www.vagrantup.com/downloads.html

# VirtualBox
https://www.virtualbox.org/wiki/Downloads
http://download.virtualbox.org/virtualbox/5.1.12/ # better
https://hashicorp-files.hashicorp.com/lucid32.box # not good
https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box # best

# Related box images
https://atlas.hashicorp.com/boxes/search
http://chef.github.io/bento/

# After installation, cmd / PyCharm / Git Bash and so on need to be restarted; rebooting the computer is the safest choice

Usage

$ vagrant box add superset /f/软件库/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box

==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'superset' (v0) for provider:
box: Unpacking necessary files from: file:///F:/%C8%ED%BC%FE%BF%E2/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
box:
==> box: Successfully added box 'superset' (v0) for 'virtualbox'!

$ vagrant box list
superset (virtualbox, 0)

$ vagrant init

A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.

$ vim /e/vagrant/superset-0.15.4/Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrant.configure("2") do |config|
# config.vm.box = "superset"
# config.vm.box_check_update = false
# config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
# config.vm.synced_folder "./", "/root/superset-0.15.4"
#
# config.vm.network "public_network"
# config.vm.provider "virtualbox" do |vb|
# vb.gui = true
# vb.memory = "1024"
# end
# config.vm.provision "shell", inline: <<-SHELL
# apt-get update
# SHELL
# end

$ vagrant up --provider virtualbox

Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'superset'...
==> default: Matching MAC address for NAT networking...
==> default: Setting the name of the VM: superset-0154_default_1486969836220_44233
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
default: Adapter 2: hostonly
==> default: Forwarding ports...
default: 22 (guest) => 2122 (host) (adapter 1)
default: 80 (guest) => 6080 (host) (adapter 1)
default: 6079 (guest) => 6079 (host) (adapter 1)
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key

Pitfalls

Provider ‘virtualbox’ not found
Description
$ vagrant up
==> Provider 'virtualbox' not found. We'll automatically install it now...
The installation process will start below. Human interaction may be required at some points. If you're uncomfortable with automatically installing this provider, you can safely Ctrl-C this process and install it manually.
==> Downloading VirtualBox 5.0.10...
This may not be the latest version of VirtualBox, but it is a version that is known to work well. Over time, we'll update the version that is installed.
Solution

 vagrant up --provider=virtualbox

Timed out while waiting for the machine to boot
Description
A subdirectory or file -p already exists.
Error occurred while processing: -p.
A subdirectory or file charms already exists.
Error occurred while processing: charms.
Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.

If you look above, you should be able to see the error(s) that Vagrant had when attempting to connect to the machine. These errors are usually good hints as to what may be wrong.

If you're using a custom box, make sure that networking is properly working and you're able to connect to the machine. It is a common problem that networking isn't setup properly in these boxes. Verify that authentication configurations are also setup properly, as well.

If the box appears to be booting properly, you may want to increase the timeout ("config.vm.boot_timeout") value.
Solution

 Upgrade VirtualBox to 5.1.12
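
When investigating a boot timeout like this, it can help to check independently whether the forwarded SSH port (127.0.0.1:2222 in the `vagrant up` output earlier) ever becomes reachable. The following is a minimal sketch of such a check, not something Vagrant itself provides; `wait_for_port` is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print(wait_for_port("127.0.0.1", 2222, timeout=5.0))
```

A `False` result only says the port did not accept connections in time; it does not distinguish a VM that is still booting from broken networking.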

default: stdin: is not a tty
Description

 default: stdin: is not a tty

Solution
config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"

Unittest

-t changes the top-level package path

The discover sub-command has the following options:

-v, --verbose                        Verbose output
-s, --start-directory directory      Directory to start discovery (. default)
-p, --pattern pattern                Pattern to match test files (test*.py default)
-t, --top-level-directory directory  Top level directory of project (defaults to start directory)

Name:                   druid_tests
Script:                 E:\Core Code\superset-0.15.4\code\tests\druid_tests.py
Environment variables:  VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter:     Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
Interpreter options:    -m tests.druid_tests
Working directory:      E:\Core Code\superset-0.15.4\code\
Path mappings:          E:/Core Code/superset-0.15.4=/root/superset-0.15.4

$ export SUPERSET_CONFIG=tests.superset_test_config
$ python -m tests.druid_tests discover . "druid_tests.py"

# After the tests finish, SUPERSET_CONFIG needs to be unset
$ unset SUPERSET_CONFIG

Argument parsing

  • Write an example
$ vim blog.py
import argparse
import os

# Define the command-line options
parser = argparse.ArgumentParser()
parser.add_argument('--ip', '-i', type=str, help='ip', default="localhost", required=False)
parser.add_argument('--port', '-p', type=int, help='port', default=80, required=False)
parser.add_argument('--debug', '-d', action='store_true', help='enable debug mode', default=False, required=False)
args = parser.parse_args()

ip = str(args.ip)
port = int(args.port)
debug = args.debug
# Leave the port out of the URL when it has the default value
if port == 80:
    blog = "https://%s" % ip
else:
    blog = "https://%s:%s" % (ip, port)
if debug:
    print(blog)
# Open the URL with the macOS `open` command
os.system("open '%s'" % blog)
  • View the help text
$ python3 blog.py -h
usage: blog.py [-h] [--ip IP] [--port PORT] [--debug]

optional arguments:
  -h, --help            show this help message and exit
  --ip IP, -i IP        ip
  --port PORT, -p PORT  port
  --debug, -d           enable debug mode
  • Run the script
$ python3 blog.py -i yuzhouwan.com -d
https://yuzhouwan.com
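
For experimenting with the parser without going through the shell, `parse_args` also accepts an explicit list of arguments. The sketch below repackages the script's logic into two functions so that it can be exercised this way; `build_parser` and `build_url` are hypothetical names, and the browser call is left out on purpose.

```python
import argparse


def build_parser():
    """Create the same options as blog.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ip', '-i', type=str, default='localhost', help='ip')
    parser.add_argument('--port', '-p', type=int, default=80, help='port')
    parser.add_argument('--debug', '-d', action='store_true', help='enable debug mode')
    return parser


def build_url(args):
    """Compose the URL, leaving out the port when it has the default value."""
    if args.port == 80:
        return "https://%s" % args.ip
    return "https://%s:%s" % (args.ip, args.port)


# Passing a list bypasses sys.argv, which is convenient in tests.
args = build_parser().parse_args(['-i', 'yuzhouwan.com', '-d'])
print(build_url(args))  # https://yuzhouwan.com
```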

Pitfalls

UnicodeDecodeError: ‘gbk’ codec can’t decode byte 0x87 in position illegal multibyte sequence

Solution

# Declare the source encoding at the top of the program, and pass the encoding argument when opening files

# -*- coding:utf8 -*-
open(fname, "r", encoding="utf8")
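
The effect of the `encoding` argument can be reproduced with a short round trip. This is a minimal sketch with a hypothetical helper `roundtrip`: it writes text as UTF-8 and reads it back, first with the matching encoding and then with `gbk`, which is typically the default on a Chinese-locale Windows system and triggers the same class of error.

```python
import os
import tempfile


def roundtrip(text, write_encoding="utf8", read_encoding="utf8"):
    """Write text to a temporary file and read it back with the given encodings."""
    fd, fname = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(fname, "w", encoding=write_encoding) as handle:
            handle.write(text)
        with open(fname, "r", encoding=read_encoding) as handle:
            return handle.read()
    finally:
        os.remove(fname)


print(roundtrip("中"))  # matching encodings: the text survives

try:
    roundtrip("中", read_encoding="gbk")
except UnicodeDecodeError as error:
    # The three UTF-8 bytes do not form a complete GBK sequence.
    print("decode failed:", error.reason)
```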

connection broken by SSLError

Solution

$ python -m pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip

ModuleNotFoundError: No module named ‘yaml’

Solution

$ pip install pyyaml

ERROR: Gateway Timeout

Solution

import os
import requests

# Bypass the proxy for the local address
os.environ['NO_PROXY'] = '127.0.0.1'
r = requests.get('http://127.0.0.1:8080')
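
To confirm that the setting is picked up, the standard library offers `urllib.request.proxy_bypass_environment`, which reads the same proxy environment variables. The sketch below does not need `requests` or a running server; `bypasses_proxy` is a hypothetical helper name, and setting the lowercase spelling as well is a precaution rather than something the original snippet does.

```python
import os
import urllib.request


def bypasses_proxy(host):
    """Return True if the proxy environment variables exempt `host`."""
    return bool(urllib.request.proxy_bypass_environment(host))


# Same value as above; the lowercase spelling is set too, in case an
# existing lowercase variable would otherwise take precedence.
os.environ['NO_PROXY'] = '127.0.0.1'
os.environ['no_proxy'] = '127.0.0.1'

print(bypasses_proxy('127.0.0.1:8080'))
print(bypasses_proxy('example.com'))
```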

ModuleNotFoundError: No module named ‘IPython.core.inputtransformer2’

Description

$ jupyter notebook
[I 2023-09-02 10:55:27.117 LabApp] JupyterLab extension loaded from /Users/jiayi.jin/opt/anaconda3/lib/python3.9/site-packages/jupyterlab
[I 2023-09-02 10:55:27.117 LabApp] JupyterLab application directory is /Users/jiayi.jin/opt/anaconda3/share/jupyter/lab
[I 10:55:27.123 NotebookApp] Serving notebooks from local directory: /Users/jiayi.jin/code/yuzhouwan
[I 10:55:27.123 NotebookApp] Jupyter Notebook 6.4.12 is running at:
[I 10:55:27.123 NotebookApp] http://localhost:8888/?token=661ee7213e7c3da16618b4c0069bbb27c2f6eddbe6545f2b
[I 10:55:27.123 NotebookApp] or http://127.0.0.1:8888/?token=661ee7213e7c3da16618b4c0069bbb27c2f6eddbe6545f2b
[I 10:55:27.123 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 10:55:27.128 NotebookApp]

To access the notebook, open this file in a browser:
file:///Users/jiayi.jin/Library/Jupyter/runtime/nbserver-32075-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=661ee7213e7c3da16618b4c0069bbb27c2f6eddbe6545f2b
or http://127.0.0.1:8888/?token=661ee7213e7c3da16618b4c0069bbb27c2f6eddbe6545f2b
[W 10:55:40.678 NotebookApp] Notebook yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/basic/Collection.ipynb is not trusted
[I 10:55:41.195 NotebookApp] Kernel started: eb857db3-08fd-4fc7-b021-b0bbadefeb62, name: python3
Traceback (most recent call last):
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/site-packages/ipykernel_launcher.py", line 15, in <module>
    from ipykernel import kernelapp as app
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/site-packages/ipykernel/kernelapp.py", line 51, in <module>
    from .ipkernel import IPythonKernel
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/site-packages/ipykernel/ipkernel.py", line 19, in <module>
    from .debugger import Debugger, _is_debugpy_available
  File "/Users/jiayi.jin/opt/anaconda3/lib/python3.9/site-packages/ipykernel/debugger.py", line 8, in <module>
    from IPython.core.inputtransformer2 import leading_empty_lines
ModuleNotFoundError: No module named 'IPython.core.inputtransformer2'

Solution

$ pip install --upgrade ipython jupyter
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 5.3.3 requires pyqt5<5.16, which is not installed.
spyder 5.3.3 requires pyqtwebengine<5.16, which is not installed.
nb-mermaid 0.1.0 requires IPython<4.0,>3.0, but you have ipython 8.15.0 which is incompatible.
spyder 5.3.3 requires ipython<8.0.0,>=7.31.1, but you have ipython 8.15.0 which is incompatible.
spyder-kernels 2.3.3 requires ipython<8,>=7.31.1; python_version >= "3", but you have ipython 8.15.0 which is incompatible.
Successfully installed asttokens-2.4.0 exceptiongroup-1.1.3 executing-1.2.0 ipython-8.15.0 prompt-toolkit-3.0.39 pure-eval-0.2.2 stack-data-0.6.2
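# This error shows that `ipython` was upgraded to version `8.15.0`, which is incompatible with other installed packages (such as `spyder` and `nb-mermaid`)
# Here the two rarely used packages are simply uninstalled (a virtualenv would also avoid the conflict)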
$ pip uninstall spyder
$ pip uninstall nb-mermaid
$ pip install --upgrade ipython jupyter
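
The conflict messages above are range checks on version numbers. The following deliberately simplified sketch shows the idea with plain tuples; it handles only dotted numeric versions and is not a full PEP 440 implementation (the `packaging` library is the usual tool for that). `parse_version` and `satisfies` are hypothetical helper names.

```python
import operator

# Comparison functions for the operators that appear in the pip output.
_OPS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "!=": operator.ne, "<": operator.lt, ">": operator.gt,
}


def parse_version(text):
    """Turn '8.15.0' into (8, 15, 0); numeric components only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a spec such as '<8.0.0,>=7.31.1'."""
    current = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Two-character operators are tested before one-character ones.
        for symbol in ("<=", ">=", "==", "!=", "<", ">"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not _OPS[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


# The spyder requirement from the output above.
print(satisfies("8.15.0", "<8.0.0,>=7.31.1"))
```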

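Resources

Join our technical groups to learn and exchange ideas together

Group name Group number
Artificial Intelligence (Advanced)
Artificial Intelligence (Intermediate)
Big Data
Algorithms
Databases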