Python

What is Python?

 Python is a programming language that lets you work quickly and integrate systems more effectively. (Python Official Site)

Why Python?

Glue language

 As a glue language, Python makes it easy to link together modules written in other languages (especially C/C++).

Scripting language

 A successor to the ABC language
 Shortens the traditional edit-compile-link-run cycle

Environment Setup

Installing Python

Base Linux environment

$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y

Building Python from source

# Download the desired Python version from the python.org FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz
# Unpack and build
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install
$ ls /usr/local/python27/ -al
drwxr-xr-x. 6 root root 4096 12月 15 14:22 .
drwxr-xr-x. 13 root root 4096 12月 15 14:20 ..
drwxr-xr-x. 2 root root 4096 12月 15 14:22 bin
drwxr-xr-x. 3 root root 4096 12月 15 14:21 include
drwxr-xr-x. 4 root root 4096 12月 15 14:22 lib
drwxr-xr-x. 3 root root 4096 12月 15 14:22 share

Replacing the old Python

# Replace the original Python 2.6
$ which python
/usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
Python 2.7.12
# Point yum back at the old Python 2.6 interpreter
$ vim /usr/bin/yum
# Change the first line to python2.6
#!/usr/bin/python2.6
$ yum --version | sed '2,$d'
3.2.29

Pip

Installing from a tarball

$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
# upgrade setuptools and pip
$ pip install --upgrade setuptools pip
## [Offline environment] Install pip
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install
# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip==9.0.1
Finished processing dependencies for pip==9.0.1
$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

VirtualEnv

 We use Superset as the example here; for more on this topic, see my other blog post on Apache Superset.

Installing from a tarball

$ pip install virtualenv
# Python 3.3+ ships a similar tool in the standard library as the venv module (python3 -m venv); the older pyvenv script is deprecated
$ virtualenv venv
$ . ./venv/bin/activate
# To let the isolated environment access the system-wide site-packages directory, add the `--system-site-packages` option
# virtualenv -p /usr/local/bin/python --system-site-packages venv
# To make the environment easier to copy, add the `--always-copy` option so that dependent files are copied in rather than symlinked
# virtualenv -p /usr/local/bin/python --always-copy venv
## [Offline environment] Install virtualenv
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install
$ virtualenv --version
15.1.0
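
After activating an environment it can be useful to confirm from inside Python that the interpreter really is the isolated one. The following is a minimal sketch, not part of virtualenv itself; the helper name `in_virtualenv` is hypothetical, and it relies on the `sys.real_prefix` / `sys.base_prefix` attributes that virtualenv and venv set.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to an isolated environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtualenv())
```

Running it before and after `. ./venv/bin/activate` should show the prefix switching to the `venv` directory.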

Deploying to Production

Copying
# Use rsync instead of scp so that symbolic links are copied as well
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4
//...
sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
total size is 359739823 speedup is 2.51
# In the superset directory on both the local machine and the target machine, compare the file counts
$ find | wc -l
10113
# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4
# The Python that the virtualenv depends on
$ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root
$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12
$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep / # necessary!
$ python -V
Python 2.7.12
Shared libraries
# The symlinks were rsynced over, but the directory they point to (Python's shared libraries) does not exist on the target machine
$ file /root/superset/lib/python2.7/lib-dynload
/root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`
# The global Python must match the one that was used to create the virtualenv in the online environment
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
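
The broken `lib-dynload` link above was found by hand with `file`. A copied virtualenv can also be scanned for every dangling link in one pass. This is a rough sketch under the assumption of a POSIX system; the function name `find_broken_symlinks` and the demo directory are illustrative only.

```python
import os
import tempfile


def find_broken_symlinks(root):
    """Return the paths under root that are symlinks whose target is missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


# Demonstration in a throwaway directory (POSIX symlinks assumed)
demo_root = tempfile.mkdtemp()
os.symlink(os.path.join(demo_root, "missing-target"),
           os.path.join(demo_root, "lib-dynload"))
print(find_broken_symlinks(demo_root))
```

On the target machine one would pass the virtualenv directory (for example `/root/superset`) instead of the demo directory.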

VirtualEnvWrapper

# VirtualEnvWrapper is an extension of virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/workspaces
$ vim ~/.bashrc
# Add the following
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh
$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]# deactivate
$ workon superset
(superset) [root@superset01 virtualenv]# lsvirtualenv -b
superset

Basic Syntax

JSON

>>> import json
>>> user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')
>>> user['name']
'benedict'
>>> user['infos']['blog']
'yuzhouwan.com'
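
Parsing is only half of the round trip. As a small sketch using the same sample document, `json.dumps` turns the dict back into text; `sort_keys=True` is used here only to make the output order deterministic.

```python
import json

user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')

# Modify the parsed dict, then serialize it back to a JSON string
user["infos"]["age"] = 1
text = json.dumps(user, sort_keys=True)
print(text)  # {"infos": {"age": 1, "blog": "yuzhouwan.com"}, "name": "benedict"}

# Parsing the serialized text gives back an equal object
assert json.loads(text) == user
```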

OS

Operating system information

# The path separator of the current operating system (Windows: '\\'; Linux/Unix: '/')
os.sep
# A string naming the current platform (Windows: 'nt'; Linux/Unix: 'posix')
os.name
# The line terminator of the current platform (Windows: '\r\n'; Linux/macOS: '\n'; classic Mac OS: '\r')
os.linesep
# Run a shell command
os.system(command)
# Get the current working directory
os.getcwd()
# Get / set an environment variable
os.getenv(key) / os.putenv(key, value)
# Get the PID of the current process
os.getpid()
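
A short sketch of the environment-variable calls in use. The variable names below are made up for the demonstration. Note that `os.putenv` does not update the `os.environ` mapping, so assigning through `os.environ` is usually the more convenient way to set a value.

```python
import os

# Make sure the "unset" variable really is unset for this demonstration
os.environ.pop("DEMO_UNSET_VARIABLE", None)  # hypothetical variable name

# os.getenv accepts a default that is returned when the variable is unset
missing = os.getenv("DEMO_UNSET_VARIABLE", "fallback")

# Assigning through os.environ updates both the process environment and the mapping
os.environ["DEMO_VARIABLE"] = "42"  # hypothetical variable name
present = os.getenv("DEMO_VARIABLE")

print(os.name, repr(os.sep), os.getpid() > 0)
print(missing, present)  # fallback 42
```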

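Querying file and path information

# List the names of all files and directories in a directory (Python 3.5 added os.scandir as a faster alternative; os.listdir is still available)
os.listdir(path)
# Split path into a (directory, file name) pair
os.path.split(path)
# Check whether path is a file / a directory
os.path.isfile(path) / os.path.isdir(path)
# Check whether path is a symbolic link
os.path.islink(path)
# Check whether the file or directory exists
os.path.exists(path)
# Get the file size in bytes (for a directory the value is platform-dependent and is not the size of its contents)
os.path.getsize(path)
# Get the absolute path
os.path.abspath(path)
# Normalize the path string
os.path.normpath(path)
# Split the file name from its extension
os.path.splitext(path)
# Join a directory with a file or directory name
os.path.join(path, file)
# Return the file name
os.path.basename(path)
# Return the directory part of the path
os.path.dirname(path)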
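
To make the pure string functions concrete, here is a small sketch on an illustrative path. It imports `posixpath` (the POSIX implementation behind `os.path` on Linux) so that the results are the same regardless of the platform it runs on.

```python
import posixpath  # the POSIX flavour of os.path, so results match on every platform

path = "/root/software/../software/Python-2.7.12.tgz"  # illustrative path

normalized = posixpath.normpath(path)
directory, filename = posixpath.split(normalized)
stem, extension = posixpath.splitext(filename)
joined = posixpath.join(directory, "pip-9.0.1.tar.gz")

print(normalized)       # /root/software/Python-2.7.12.tgz
print(directory)        # /root/software
print(filename)         # Python-2.7.12.tgz
print(stem, extension)  # Python-2.7.12 .tgz
print(joined)           # /root/software/pip-9.0.1.tar.gz
```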

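Manipulating files and paths

# The constant string that refers to the current directory ('.')
os.curdir
# Change the working directory to path
os.chdir(path)
# Delete a file
os.remove(path)
# Delete an empty directory
os.rmdir(path)
# Remove directories recursively: removing 'foo/bar/baz' removes 'foo/bar/baz', then 'foo/bar', then 'foo', stopping at the first directory that is not empty
os.removedirs(path)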
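
The behaviour of `os.removedirs` is easiest to see in a throwaway directory. A minimal sketch follows; the directory and file names are arbitrary.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Build base/foo/bar/baz plus one file that keeps base/foo non-empty
leaf = os.path.join(base, "foo", "bar", "baz")
os.makedirs(leaf)
keep = os.path.join(base, "foo", "keep.txt")
with open(keep, "w") as handle:
    handle.write("data")

# Removes baz, then bar; foo survives because it still contains keep.txt
os.removedirs(leaf)

bar_exists = os.path.exists(os.path.join(base, "foo", "bar"))
keep_exists = os.path.exists(keep)
print(bar_exists, keep_exists)  # False True
```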

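Reading a file

import os

def open_file(f=""):
    # Return the lines of the file, or None if the path does not exist
    if not os.path.exists(f):
        print("File does not exist, path is %s!" % f)
        return
    # Read-only mode is enough here; "r+" would also require write permission
    with open(f, "r", encoding="utf8") as of:
        return of.readlines()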

String

split

>>> 'a b c'.split(' ')
['a', 'b', 'c']
>>> 'a b c'.split(' ', 1)
['a', 'b c']
>>> 'a b c'.split(' ', 2)
['a', 'b', 'c']

Type conversion

>>> int(1)
1
>>> float(1.0)
1.0
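
Conversions are more often needed between strings and numbers. The following sketch shows a few common cases with arbitrary sample values.

```python
# Converting text to numbers and back
count = int("42")          # decimal string -> int
flags = int("ff", 16)      # hexadecimal string -> int, explicit base
ratio = float("99.97")     # string -> float
label = str(16.8)          # float -> string

print(count, flags, ratio, label)  # 42 255 99.97 16.8

# Invalid input raises ValueError rather than returning a default
try:
    int("16.8")
except ValueError as error:
    print("cannot convert:", error)
```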

Placeholders

>>> "speed: %skm/h" % 16.8
'speed: 16.8km/h'
>>> "(%s, %s)" % ("percent", 99.97)
'(percent, 99.97)'
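
For comparison, the same strings can be produced with `str.format`, which also accepts a format spec for precision. This is a sketch alongside the `%` examples above, not a replacement for them.

```python
speed = 16.8
label, value = "percent", 99.97

# %-formatting, as used above
old_style = "(%s, %s)" % (label, value)

# str.format with positional fields
new_style = "({}, {})".format(label, value)

# A format spec controls precision: one digit after the decimal point
rounded = "speed: {:.1f}km/h".format(speed)

print(old_style)  # (percent, 99.97)
print(new_style)  # (percent, 99.97)
print(rounded)    # speed: 16.8km/h
```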

Collections

map (dict)

Assignment and lookup
>>> kv_map = {}
>>> kv_map["k"] = "v"
>>> kv_map
{'k': 'v'}
>>> kv_map["k"]
'v'
Sorting
>>> costs = {"b": 2, "a": 1, "c": 3}
>>> costs
{'b': 2, 'c': 3, 'a': 1}
>>> costs_sorted = [ (k, costs[k]) for k in sorted(costs, key=costs.get, reverse=False) ]
>>> costs_sorted
[('a', 1), ('b', 2), ('c', 3)]
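
An equivalent way to get the same list, sketched here as an alternative, is to sort the `(key, value)` pairs from `items()` directly. The `reverse` flag flips the order.

```python
costs = {"b": 2, "a": 1, "c": 3}

# Sort the (key, value) pairs by value, ascending
by_value = sorted(costs.items(), key=lambda item: item[1])

# Same idea in descending order
by_value_desc = sorted(costs.items(), key=lambda item: item[1], reverse=True)

print(by_value)       # [('a', 1), ('b', 2), ('c', 3)]
print(by_value_desc)  # [('c', 3), ('b', 2), ('a', 1)]
```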
Iteration
>>> for k, v in costs_sorted:
...     print(k, v)
...
a 1
b 2
c 3

list

# range(start, stop, step)
>>> [ _ for _ in range(3, 0, -1)]
[3, 2, 1]

Logical operators

& vs. and

>>> True & False
False
>>> True and False
False
>>> 10 > 1 & 10 < 1
True
>>> 10 > 1 and 10 < 1
False
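
The surprising `True` above comes from operator precedence: `&` binds more tightly than the comparison operators. The sketch below spells out how the expression is grouped.

```python
# `&` is the bitwise operator and binds more tightly than comparisons,
# so `10 > 1 & 10 < 1` is parsed as the chained comparison `10 > (1 & 10) < 1`.
bitwise_part = 1 & 10               # 0b0001 & 0b1010 == 0
chained = 10 > bitwise_part < 1     # (10 > 0) and (0 < 1)

# `and` is the boolean operator with lower precedence than comparisons.
boolean = (10 > 1) and (10 < 1)

# Parentheses make `&` operate on the two comparison results.
parenthesized = (10 > 1) & (10 < 1)

print(bitwise_part, chained, boolean, parenthesized)  # 0 True False False
```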

Python Standard Library

argparse

ftplib

json

urllib

Third-party Python Libraries

Core data-analysis libraries

Pandas

SciPy

NumPy

import numpy as np
arr = [2, 4, 6, 8, 10]
print(np.mean(arr)) # mean
print(np.median(arr)) # median
print(np.std(arr)) # standard deviation
6.0
6.0
2.82842712475
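
As a cross-check that needs no third-party package, the same three numbers can be computed with the standard library's `statistics` module (Python 3.4+ assumed). `pstdev` is the population standard deviation, which matches the default behaviour of `np.std`.

```python
import math
import statistics

arr = [2, 4, 6, 8, 10]

mean = statistics.mean(arr)
median = statistics.median(arr)
# Population standard deviation, i.e. the square root of the mean squared deviation
std = statistics.pstdev(arr)

print(mean, median, std)
```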

Tips: Full code is here.

Statistics

Scrapy

StatsModels

NLP

NLTK

Gensim

Machine learning

Scikit-learn

Artificial intelligence

TensorFlow

Theano

Keras

Visualization

Matplotlib

import numpy as np
import matplotlib.pyplot as plt

# Create three figures up front
plt.figure(1)
plt.figure(2)
plt.figure(3)
x = np.linspace(0, 6, 100)
for i in range(3):
    # Select each figure in turn and add the i-th curve to it
    plt.figure(1)
    plt.plot(x, np.sin(i * x))
    plt.figure(2)
    plt.plot(x, np.cos(i * x))
    plt.figure(3)
    plt.plot(x, np.tan(i * x))
plt.show()
plt.close()

Seaborn

Bokeh

Plotly

Maps

GeoplotLib

MapBox

Image processing

PIL

Python Scientific Analysis Tools

IPython Notebook

Installation

# Download the Enthought Canopy suite (https://www.enthought.com/canopy-subscriptions/)
# After installing, configure the environment variable
$ PATH=D:\apps\Enthought\Canopy\App;%PATH%
# Install
$ pip install "ipython[all]"
# Start
$ mkdir ipython
$ cd ipython
$ ipython notebook
$ ipython notebook --pylab # pylab mode
$ ipython notebook --pylab inline # embed Matplotlib figures in the web page

Format conversion

$ ipython nbconvert --to markdown --execute Basic.ipynb
# Alternatively, convert with notedown (https://github.com/aaren/notedown)
$ pip install notedown

Practical tips

Embedding in Markdown

 After creating an .ipynb file with IPython, you can embed it in a Markdown document with an <iframe> tag

<iframe src="http://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>

 This way, the figures drawn by matplotlib are displayed as well, rather than just a Python script. The actual result looks like this:

Help documentation

 A single question mark (?) shows the documentation for a function, class, or variable, while a double question mark (??) shows the corresponding source code

$ a = 1
$ a?
Type: int
String form: 1
Docstring:
int(x=0) -> int or long
int(x, base=10) -> int or long
Convert a number or string to an integer, or return 0 if no arguments
are given. If x is floating point, the conversion truncates towards zero.
If x is outside the integer range, the function returns a long instead.
If x is not a number or if base is given, then x must be a string or
Unicode object representing an integer literal in the given base. The
literal can be preceded by '+' or '-' and be surrounded by whitespace.
The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to
interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
$ a??
Type: int
String form: 1
# In addition, "Shift + Tab" is recommended for quickly showing a method's detailed description

Configuring IPython Notebook for Python 3
# Install Python 3
$ which python
/d/apps/Python/Python35/python
# Install the IPython kernel
$ python -m pip install ipykernel
$ python -m ipykernel install --user
# Install notebook
$ which pip
/d/apps/Python/Python35/Scripts/pip
$ pip install notebook

Python Engineering Tools

Tox

VirtualEnv

Practical Techniques

Setting a proxy

$ export http_proxy="http://127.0.0.1:1080"
$ export https_proxy="https://127.0.0.1:1080"
$ export socks5_proxy="socks5://127.0.0.1:1080"
# pip install --upgrade pip
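
To check from Python which proxies would be picked up from the environment, the standard library offers `urllib.request.getproxies_environment()` (Python 3 assumed). The sketch sets the same values as the export lines so that it runs on its own.

```python
import os
import urllib.request

# Same values as the export lines above; set here so the sketch is self-contained
os.environ["http_proxy"] = "http://127.0.0.1:1080"
os.environ["https_proxy"] = "https://127.0.0.1:1080"

# getproxies_environment() scans the environment for variables named <scheme>_proxy
proxies = urllib.request.getproxies_environment()
print(proxies.get("http"), proxies.get("https"))
```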

Remote Debug

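 The goal is to debug the Python code locally with breakpoints: after each edit, Ctrl+S uploads the file to the remote server over SFTP; once all changes are deployed, Flask automatically reloads the latest code and restarts the remote Python process, so the effect of the change on the live service is visible directly from the local machine. (The walkthrough here is based on Airbnb's Superset project.)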

PyCharm

Windows development machine
## local
# should shutdown local firewall firstly
$ cd .\JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg
$ easy_install pycharm-debug.egg
# If the program runs on Python 3, use pycharm-debug-py3k.egg instead
# Run/Debug Configuration - SuperSet Remote Debug - 192.168.3.10(local ip) - 12345(port > 10000), will generate..
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
# Path mappings
E:/Core Code/superset=/root/superset
# SFTP
# copy a project to a local directory.
# configure: tools - deployment, to upload this local copy to remote server
# config remote host
192.168.1.10 SFTP 192.168.1.10 22 /root/superset-0.15.4 root/****** UTF-8 # redacted
# Tools - Deployment - Options - Upload changed files automatically to the default server (On explicit save action (Ctrl+S))
# make deployment automatic: tools - deployment - "automatic upload"
# add remote interpreter: file - settings - python interpreters - "+" - "Remote.."
# Start Debug
Starting debug server at port 12345
Use the following code to connect to the debugger:
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
Waiting for process connection...
Connected to pydev debugger (build 162.1967.10)
Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:9097 --limit-request-line 0 --limit-request-field_size 0 superset:app
Remote Linux runtime environment
## remote
$ cd /root/superset
$ source bin/activate
$ cd /root/superset/lib
# Copy \JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg into the lib directory
$ easy_install pycharm-debug.egg
# trouble shooting
>>> import pydevd
# restart
$ vim /root/superset/bin/superset
import pydevd
pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
# After local debug, then start superset
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &
# Flask - Werkzeug debugger
2017-02-07 15:47:03,905:WARNING:werkzeug: * Debugger is active!
2017-02-07 15:47:03,905:INFO:werkzeug: * Debugger pin code: 330-765-812
$ pip install django-debug-toolbar
$ vim lib/python2.7/site-packages/pycharm-debug.egg/tests_pydevd_python/my_django_proj_17/my_django_proj_17/settings.py
INSTALLED_APPS = (
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'debug_toolbar', # add
    'my_app',
)
# enable django
Setting - Language & Frameworks - Django - "Enable Django Support"
E:\Core Code\superset-0.15.4\bin\superset runserver -a '0.0.0.0' -p 9097
############################# PyDevd is so stiff! Let's Try Remote Python. #############################
# Configure SFTP (same as above)
# Configure Remote Python
File - Settings - Project: superset-0.15.4 - Project Interpreter - show all(+) -
name: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
SSH Credentials
Host: 192.168.1.10 Port: 22
User name: root
Auth type: Password # redacted
Python interpreter path: /root/superset-0.15.4/bin/python
PyCharm helpers path: /root/superset-0.15.4/.pycharm_helpers
# If the interpreter is not recognized, the python binary may lack execute permission
$ cd /root/superset-0.15.4/bin && chmod 777 *
PyCharm configuration
# Configure the Python run configuration
Run - Run/Debug Configurations(+) - Python -
Name: superset
Script: E:\Core Code\superset-0.15.4\bin\superset
Script parameters: runserver -d -p 9097
Environment Variables: VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python) # the remote Python configured above
Working directory: E:\Core Code\superset-0.15.4\bin
Path mapping: E:/Core Code/superset-0.15.4=/root/superset-0.15.4
# Before remote debugging with the remote Python, enter the virtualenv
# The activate file may not be found here; in that case, add the following directly
File - Settings - Tools - Terminal - Shell path
/bin/bash --rcfile ~/.pycharmrc
$ vim '/e/Core Code/superset-0.15.4/.pycharmrc' # add .pycharmrc to the local project
VIRTUAL_ENV="/root/superset-0.15.4" # the virtualenv directory on the remote server (the contents of bin/activate can be copied here directly)
export VIRTUAL_ENV
# Two additional processes appear on the remote server
$ ps -ef | grep superset | grep -v grep
root 8638 10912 0 15:24 pts/1 00:00:00 bash -c cd /root/superset-0.15.4/bin; env "IDE_PROJECT_ROOTS"="/root/superset-0.15.4" "IPYTHONENABLE"="True" "PYTHONPATH"="/root/superset-0.15.4:/root/superset-0.15.4/.pycharm_helpers/pydev" "PYTHONUNBUFFERED"="1" "PYCHARM_HOSTED"="1" "VIRTUALENVWRAPPER_PYTHON"="E:\Core Code\superset-0.15.4\bin\python" "LIBRARY_ROOTS"="C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/544046706;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/964856790;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1532312494;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2
016.3/system/python_stubs/250609560;D:/apps/JetBrains/PyCharm 2016.3.2/helpers/python-skeletons" "PYTHONDONTWRITEBYTECODE"="1" "JETBRAINS_REMOTE_RUN"="1" "PYTHONIOENCODING"="UTF-8" /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8660 8638 11 15:24 pts/1 00:00:17 /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
root 8715 8660 28 15:24 pts/1 00:00:38 /root/superset-0.15.4/bin/python /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
Done
# Access from the local Windows machine
http://192.168.1.10:9097/login/

Visual Studio Code

 Not good for me! You can still try it if you are interested.

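Pitfalls

Gunicorn pre-forks multiple worker processes, so remote debugging does not work
Description

 From the local Windows development machine, I connected remotely to Superset running inside a virtualenv on Linux. Debugging was possible, but the gunicorn inside Superset uses a pre-fork model and starts many worker processes

Solution

a) Handle it with ordinary remote debugging (not OK)

Connected to pydev debugger (build 162.1967.10)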
[2017-02-06 18:13:22 +0000] [13609] [INFO] Starting gunicorn 19.6.0
[2017-02-06 18:13:22 +0000] [13609] [INFO] Listening at: http://0.0.0.0:9097 (13609)
[2017-02-06 18:13:22 +0000] [13609] [INFO] Using worker: sync
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13624)
[2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13623)

b) Debug with "Django server" instead of "Python Remote Debug" (not OK)

 The configured Remote Python is clearly /root/superset/bin/python, yet the error message shows that /usr/local/bin/python is being used

c) ipdb (not good)

 Bring the gunicorn process to the foreground and debug with ipdb on the command line

d) Add the -w option to control the number of workers (not OK)

@manager.option(
    '-w', '--workers', default=config.get("SUPERSET_WORKERS", 2), # default: 2
    help="Number of gunicorn web server workers to fire up")
$ superset runserver -a 0.0.0.0 -p 9097 -w 0

e) Disable gunicorn (OK)

 gunicorn only needs to be enabled during load testing
 superset runserver -d -p 9097

Trying to add breakpoint to file that does not exist
Description
pydev debugger: warning: trying to add breakpoint to file that does not exist: /root/superset/d:/apps/python27/lib/site-packages/gunicorn/arbiter.py
Solution

a) Add a path mapping for Python's site-packages (not good)

E:/Core Code/superset=/root/superset;D:/apps/Python27=/root/superset/lib/python2.7

b) Use the Python of the superset project rather than the local machine's Python (OK)

 The Python synced to the local machine is not a python.exe (no)
 Use the remote Python (OK)

Couldn’t obtain remote socket
Description
Error running superset
Can't run remote python interpreter: Couldn't obtain remote socket from output ('0.0.0.0', 52703), stderr /usr/local/bin/python: No module named virtualenvwrapper virtualenvwrapper.sh: There was a problem running the initialization hooks.
If Python could not import the module virtualenvwrapper.hook_loader, check that virtualenvwrapper has been installed for VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is set properly.
Solution
# Check whether PATH includes the virtualenvwrapper location
$ echo $PATH
# If it does not, check ~/.bashrc and comment out the following lines
# Source global definitions
# export WORKON_HOME=~/virtualenv
# source /usr/local/bin/virtualenvwrapper.sh
References

Vagrant

 Vagrant is a tool that automates the installation and configuration of virtual machines

Download

# Vagrant
https://www.vagrantup.com/downloads.html
# VirtualBox
https://www.virtualbox.org/wiki/Downloads
http://download.virtualbox.org/virtualbox/5.1.12/ # better
https://hashicorp-files.hashicorp.com/lucid32.box # not good
https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box # best
# Related box images
https://atlas.hashicorp.com/boxes/search
http://chef.github.io/bento/
# After installation, cmd / PyCharm / Git Bash and so on need to be restarted; rebooting the computer is best

Usage

$ vagrant box add superset /f/软件库/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
==> box: Box file was not detected as metadata. Adding it directly...
==> box: Adding box 'superset' (v0) for provider:
box: Unpacking necessary files from: file:///F:/%C8%ED%BC%FE%BF%E2/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
box:
==> box: Successfully added box 'superset' (v0) for 'virtualbox'!
$ vagrant box list
superset (virtualbox, 0)
$ vagrant init
A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.
$ vim /e/vagrant/superset-0.15.4/Vagrantfile
# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrant.configure("2") do |config|
# config.vm.box = "superset"
# config.vm.box_check_update = false
# config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
# config.vm.synced_folder "./", "/root/superset-0.15.4"
#
# config.vm.network "public_network"
# config.vm.provider "virtualbox" do |vb|
# vb.gui = true
# vb.memory = "1024"
# end
# config.vm.provision "shell", inline: <<-SHELL
# apt-get update
# SHELL
# end
$ vagrant up --provider virtualbox
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'superset'...
==> default: Matching MAC address for NAT networking...
==> default: Setting the name of the VM: superset-0154_default_1486969836220_44233
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
default: Adapter 2: hostonly
==> default: Forwarding ports...
default: 22 (guest) => 2122 (host) (adapter 1)
default: 80 (guest) => 6080 (host) (adapter 1)
default: 6079 (guest) => 6079 (host) (adapter 1)
default: 22 (guest) => 2222 (host) (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key

Pitfalls

Provider 'virtualbox' not found
Description
$ vagrant up
==> Provider 'virtualbox' not found. We'll automatically install it now...
The installation process will start below. Human interaction may be required at some points. If you're uncomfortable with automatically installing this provider, you can safely Ctrl-C this process and install it manually.
==> Downloading VirtualBox 5.0.10...
This may not be the latest version of VirtualBox, but it is a version that is known to work well. Over time, we'll update the version that is installed.
Solution

 vagrant up --provider=virtualbox

Timed out while waiting for the machine to boot
Description
A subdirectory or file -p already exists.
Error occurred while processing: -p.
A subdirectory or file charms already exists.
Error occurred while processing: charms.
Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.
If you look above, you should be able to see the error(s) that Vagrant had when attempting to connect to the machine. These errors are usually good hints as to what may be wrong.
If you're using a custom box, make sure that networking is properly working and you're able to connect to the machine. It is a common problem that networking isn't setup properly in these boxes. Verify that authentication configurations are also setup properly, as well.
If the box appears to be booting properly, you may want to increase the timeout ("config.vm.boot_timeout") value.'
Solution

 Upgrade VirtualBox to 5.1.12

default: stdin: is not a tty
Description

 default: stdin: is not a tty

Solution
config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
References

Unittest

-t changes the top-level package directory

The discover sub-command has the following options:
-v, --verbose Verbose output
-s, --start-directory directory Directory to start discovery (. default)
-p, --pattern pattern Pattern to match test files (test*.py default)
-t, --top-level-directory directory Top level directory of project (defaults to start directory)
Name druid_tests
Script E:\Core Code\superset-0.15.4\code\tests\druid_tests.py
Environment variables VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
Python interpreter Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
Interpreter options -m tests.druid_tests
Working directory E:\Core Code\superset-0.15.4\code\
Path mappings E:/Core Code/superset-0.15.4=/root/superset-0.15.4
$ export SUPERSET_CONFIG=tests.superset_test_config
$ python -m tests.druid_tests discover . "druid_tests.py"
# After the tests finish, unset SUPERSET_CONFIG
$ unset SUPERSET_CONFIG
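
The discover options listed above map directly onto `unittest.TestLoader.discover`. As a self-contained sketch (Python 3 assumed), the script below writes one throwaway test module into a temporary directory and runs discovery on it; the module name `test_discover_demo.py` and its test are made up for the illustration and are unrelated to the Superset tests.

```python
import io
import os
import tempfile
import textwrap
import unittest

# Write one throwaway test module into a temporary directory (hypothetical file name)
start_dir = tempfile.mkdtemp()
with open(os.path.join(start_dir, "test_discover_demo.py"), "w") as handle:
    handle.write(textwrap.dedent("""
        import unittest

        class SplitTest(unittest.TestCase):
            def test_split(self):
                self.assertEqual('a b c'.split(' ', 1), ['a', 'b c'])
    """))

# Equivalent of: python -m unittest discover -s <start_dir> -p "test*.py" -t <start_dir>
suite = unittest.TestLoader().discover(start_dir, pattern="test*.py", top_level_dir=start_dir)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)

print(result.testsRun, result.wasSuccessful())
```

Pointing `top_level_dir` at a parent directory instead is what lets the test modules be imported as part of a package.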

Pitfalls

UnicodeDecodeError: ‘gbk’ codec can’t decode byte 0x87 in position illegal multibyte sequence

Solution

# Declare the source encoding at the top of the program, and pass the encoding argument when opening files
# -*- coding:utf8 -*-
open(fname, "r", encoding="utf8")
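
A minimal round-trip sketch of the fix: writing and reading with an explicit encoding so that the platform default (gbk on some Windows setups) is never consulted. `io.open` is used so the same code also runs on Python 2.7; the file name and sample text are arbitrary.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u"\u8e29\u8fc7\u7684\u5751"  # arbitrary non-ASCII sample text

path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Always name the encoding instead of relying on the platform default
with io.open(path, "w", encoding="utf8") as handle:
    handle.write(text)
with io.open(path, "r", encoding="utf8") as handle:
    round_trip = handle.read()

print(round_trip == text)  # True
```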

Resources

Blog

PEP

PyCharm

IPython

Book

For more resources, you are welcome to join us and learn together

QQ group: (Artificial Intelligence 1020982 (advanced) & 1217710 (intermediate) | BigData 1670647)