What is Python?
Python is a programming language that lets you work quickly and integrate systems more effectively.
(Image credit: original work by Mac Smith, used with permission)
Why Python?
Glue language: Python can easily join together modules written in other languages (especially C/C++).
Scripting language: a successor to the ABC language. It shortens the traditional edit-compile-link-run cycle.
Environment setup
Python installation
Linux base environment
    $ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y
Building Python from source
    $ wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz
    $ tar -zxvf Python-3.6.8.tgz
    $ cd /usr/local/Python-3.6.8
    $ ./configure --prefix=/usr/local/python36
    $ make
    $ make install
    $ ls /usr/local/python36/ -al
    total 24
    drwxr-xr-x 6 root root 4096 Jan 30 11:10 .
    drwxr-xr-x 1 root root 4096 Jan 30 11:09 ..
    drwxr-xr-x 2 root root 4096 Jan 30 11:10 bin
    drwxr-xr-x 3 root root 4096 Jan 30 11:10 include
    drwxr-xr-x 4 root root 4096 Jan 30 11:10 lib
    drwxr-xr-x 3 root root 4096 Jan 30 11:10 share
Replacing the old Python
    $ which python
    /usr/bin/python
    $ /usr/local/python36/bin/python3.6 -V
    Python 3.6.8
    $ mv /usr/bin/python /usr/bin/python_old
    $ ln -s /usr/local/python36/bin/python3.6 /usr/bin/python
    $ python -V
    Python 3.6.8
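Pointing yum back at the old Python
yum is written for the old Python, so the interpreter line at the top of /usr/bin/yum has to reference the renamed binary (/usr/bin/python_old).
    $ vim /usr/bin/yum
    $ yum --version | sed '2,$d'
    3.2.29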
To install Python on macOS, download the pkg installer directly from the official Python website and use it for a one-click install or upgrade.
Installing pip
Online
    $ pip --version
    pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)
    $ pip install --upgrade setuptools pip
Offline
    $ unzip setuptools-40.7.1.zip
    $ cd setuptools-40.7.1
    $ python setup.py install
    $ tar zxvf pip-19.0.1.tar.gz
    $ cd pip-19.0.1
    $ python setup.py install
    $ python -m pip -V
    pip 18.1 from /usr/local/python36/lib/python3.6/site-packages/pip (python 3.6)
    $ vim ~/.bashrc
    export PATH=$PATH:/usr/local/python36/bin
    $ source ~/.bashrc
    $ pip -V
    pip 19.0.1 from /usr/local/python36/lib/python3.6/site-packages/pip-19.0.1-py3.6.egg/pip (python 3.6)
VirtualEnv
Apache Superset is used as the example here. For more on that topic, see my other blog post, "Apache Superset 二次开发" (secondary development of Apache Superset).
Installation
    $ pip install virtualenv
    $ virtualenv venv
    $ source venv/bin/activate
    $ tar zxvf virtualenv-15.1.0.tar.gz
    $ cd virtualenv-15.1.0
    $ python setup.py install
    $ virtualenv --version
    15.1.0
Deployment
Copying files
    $ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4
    //...
    sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
    total size is 359739823 speedup is 2.51
    $ find | wc -l
    10113
    $ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4
    $ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
    $ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root
    $ cd /root/software
    $ tar zxvf Python-2.7.12.tgz
    $ cd Python-2.7.12
    $ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
    $ make && make install
    $ /sbin/ldconfig -v | grep /
    $ python -V
    Python 2.7.12
Shared libraries
    $ file /root/superset/lib/python2.7/lib-dynload
    /root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`
    $ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
    $ make && make install
    $ /sbin/ldconfig -v | grep /
    $ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
VirtualEnvWrapper
    $ pip install virtualenvwrapper
    $ mkdir ~/workspaces
    $ vim ~/.bashrc
    export WORKON_HOME=~/virtualenv
    source /usr/local/bin/virtualenvwrapper.sh
    $ mkvirtualenv --python=/usr/bin/python superset
    Running virtualenv with interpreter /usr/bin/python
    New python executable in /root/virtualenv/superset/bin/python
    Installing setuptools, pip, wheel...done.
    virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
    virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
    virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
    virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
    virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
    (superset) [root@superset01 virtualenv] $ workon superset
    (superset) [root@superset01 virtualenv] superset
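Basic syntax
Basic data types
int
Maximum value of the int type. Note that in Python 3 an int has arbitrary precision; sys.maxsize is the largest value a platform index can hold (2**63 - 1 on a 64-bit build), not a limit on int itself.
    >>> import sys
    >>> sys.maxsize
    9223372036854775807
    >>> pow(2, 63) - 1
    9223372036854775807
    >>> 1 << 64 - 1
    9223372036854775808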
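The last expression in the session above is easy to misread, so here is a minimal sketch of what is going on. It assumes Python 3; the variable names are illustrative only and do not come from the original post.

```python
import sys

# sys.maxsize is the largest Py_ssize_t value, not a ceiling on int.
word_limit = sys.maxsize

# int keeps growing past that limit without overflowing.
beyond = word_limit + 1
huge = 2 ** 100

# Binary minus binds tighter than <<, so `1 << 64 - 1` means `1 << 63`.
minus_first = 1 << 64 - 1
shift_first = (1 << 64) - 1

print(beyond > word_limit)
print(huge)
print(minus_first, shift_first)
```

Adding explicit parentheses, as in `shift_first`, is usually the safer habit when shifts and arithmetic are mixed.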
float
    >>> float('inf')
    inf
    >>> float('Inf')
    inf
    >>> float('inf') > 0
    True
    >>> float('inf') < 0
    False
    >>> float('inf') > 9999999999
    True
    >>> float('inf') > 9999999999999999999999
    True
    >>> float('-inf') < -9999999999999999999999
    True
    >>> float('Inf') == float('inf') == -float('-inf') == -float('-Inf')
    True
split
    >>> 'a b c'.split(' ')
    ['a', 'b', 'c']
    >>> 'a b c'.split(' ', 1)
    ['a', 'b c']
    >>> 'a b c'.split(' ', 2)
    ['a', 'b', 'c']
Type conversion
    >>> int(1)
    1
    >>> float(1.0)
    1.0
    >>> b"yuzhouwan.com".decode("utf-8")
    'yuzhouwan.com'
Placeholders
    >>> "speed: %skm/h" % 16.8
    'speed: 16.8km/h'
    >>> "(%s, %s)" % ("percent", 99.97)
    '(percent, 99.97)'
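For comparison, the same strings can be produced with `str.format` and with f-strings. This is a small sketch assuming Python 3.6 or later; the variable names are made up for the example.

```python
speed = 16.8
name, ratio = "percent", 99.97

# printf-style formatting, as in the session above
old_style = "speed: %skm/h" % speed

# str.format with positional fields
new_style = "({}, {})".format(name, ratio)

# f-string with a format spec: fixed two decimal places
f_style = f"speed: {speed:.2f}km/h"

print(old_style)
print(new_style)
print(f_style)
```

All three styles coexist in modern Python; f-strings tend to be the most readable when the values are already local variables.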
Iterating over characters
    >>> for i, c in enumerate('yuzhouwan.com'):
    ...     print(i, c)
    ...
    0 y
    1 u
    2 z
    3 h
    4 o
    5 u
    6 w
    7 a
    8 n
    9 .
    10 c
    11 o
    12 m
Printing
Without a trailing newline
    >>> print("[]", end="")
    []>>>
Centering
    print("asdf2014".center(50, '-'))
    print("yuzhouwan.com".center(50, '-'))

    ---------------------asdf2014---------------------
    ------------------yuzhouwan.com-------------------
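Operating system
    os.sep                                  # path separator of the current OS
    os.name                                 # name of the OS-dependent module, e.g. 'posix' or 'nt'
    os.linesep                              # line terminator of the current OS
    os.system(shell)                        # run a shell command
    os.getcwd()                             # current working directory
    os.getenv(key) / os.putenv(key, value)  # read / set an environment variable
    os.getpid()                             # ID of the current process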
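Getting file / path information
    os.listdir(path)                            # names of the entries in a directory
    os.path.split(path)                         # (directory, file name) pair
    os.path.isfile(path) / os.path.isdir(path)  # regular file / directory check
    os.path.islink(path)                        # symbolic link check
    os.path.exists(path)                        # existence check
    os.path.getsize(path)                       # size in bytes
    os.path.abspath(path)                       # absolute path
    os.path.normpath(path)                      # normalized path
    os.path.splitext(path)                      # (root, extension) pair
    os.path.join(path, file)                    # join path components
    os.path.basename(path)                      # final path component
    os.path.dirname(path)                       # everything before the final component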
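To make the path helpers above concrete, here is a short sketch. It uses `posixpath`, the POSIX flavour of `os.path`, so that the results do not depend on the host operating system; the sample path is hypothetical.

```python
import posixpath  # same functions as os.path, with fixed POSIX semantics

path = "/home/yuzhouwan/blog/../blog/post.tar.gz"  # hypothetical path

normalized = posixpath.normpath(path)               # collapses "blog/../blog"
directory, filename = posixpath.split(normalized)   # (head, tail)
stem, extension = posixpath.splitext(filename)      # only the last suffix is split off
rebuilt = posixpath.join(directory, filename)

print(normalized)
print(directory, filename)
print(stem, extension)
print(posixpath.basename(normalized), posixpath.dirname(normalized))
```

Note that `splitext` only separates the final suffix, so a name like `post.tar.gz` yields `.gz` rather than `.tar.gz`.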
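Operating on files / paths
    os.curdir            # the string that refers to the current directory ('.')
    os.chdir(path)       # change the working directory
    os.remove(path)      # delete a file
    os.rmdir(path)       # remove an empty directory
    os.removedirs(path)  # remove a directory, then any parent directories left empty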
Reading a file
    import os

    def open_file(f=""):
        if not os.path.exists(f):
            print("File does not exist, path is %s!" % f)
            return
        # "r" is enough for reading; "r+" would also require write permission
        with open(f, "r", encoding="utf8") as of:
            return of.readlines()
Running a shell command
    >>> import os
    >>> exit_code = os.system("source ~/.bashrc")
    >>> exit_code
    0
Loading and extracting
    >>> import json
    >>> user = json.loads('{"name":"benedict","infos":{"age":0,"blog":"yuzhouwan.com"}}')
    >>> user['name']
    'benedict'
    >>> user['infos']['blog']
    'yuzhouwan.com'
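Converting to and from YAML
    import json
    import sys
    import yaml

    # JSON -> YAML (use one of the two conversions per run; each one consumes stdin)
    sys.stdout.write(yaml.dump(json.load(sys.stdin)))
    # YAML -> JSON (safe_load avoids constructing arbitrary Python objects)
    sys.stdout.write(json.dumps(yaml.safe_load(sys.stdin)))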
Loading a JSON file
    {
      "a": [1, true],
      "b": [0, false]
    }

    import json

    with open("./test.json", "r+", encoding="utf8") as f:
        content = json.load(f)
        print(content)
        if content['a'][1]:
            print(content['b'][0])

    {'a': [1, True], 'b': [0, False]}
    0
Collections
map
Assigning / reading values
    >>> kv_map = {}
    >>> kv_map["k"] = "v"
    >>> kv_map
    {'k': 'v'}
    >>> kv_map["k"]
    'v'
Sorting
    >>> costs = {"b": 2, "a": 1, "c": 3}
    >>> costs
    {'b': 2, 'c': 3, 'a': 1}
    >>> sorted(costs)
    ['a', 'b', 'c']
    >>> sorted(costs.keys())
    ['a', 'b', 'c']
    >>> sorted(costs.values())
    [1, 2, 3]
    >>> [(k, costs[k]) for k in sorted(costs, key=costs.get, reverse=False)]
    [('a', 1), ('b', 2), ('c', 3)]
    >>> sorted(costs.items(), key=lambda item: item[1], reverse=True)
    [('c', 3), ('b', 2), ('a', 1)]
Iterating
    >>> costs_sorted = sorted(costs.items())
    >>> for k, v in costs_sorted:
    ...     print(k, v)
    ...
    a 1
    b 2
    c 3
Summing
    >>> sum({"b": 2, "a": 1, "c": 3}.values())
    6
list
Flat list
    >>> [_ for _ in range(3, 0, -1)]
    [3, 2, 1]
Nested list
    >>> [['' for _ in range(2)] for _ in range(2)]
    [['', ''], ['', '']]
Joining a nested list
    >>> '.'.join(str(x) for inner_arr in ['yuzhouwan', 'com'] for x in inner_arr)
    'y.u.z.h.o.u.w.a.n.c.o.m'
set
    >>> s = set()
    >>> s.add(1)
    >>> s.add(2)
    >>> s.add(2)
    >>> s.add(3)
    >>> print(s)
    {1, 2, 3}
    >>>
iter
    >>> blog = "yuzhouwan"
    >>> iter_blog = iter(blog)
    >>> print(next(iter_blog))
    y
    >>> print(next(iter_blog))
    u
    >>> print(next(iter_blog))
    z
    >>> print(next(iter_blog))
    h
    >>> print(next(iter_blog))
    o
    >>> print(next(iter_blog))
    u
    >>> print(next(iter_blog))
    w
    >>> print(next(iter_blog))
    a
    >>> print(next(iter_blog))
    n
    >>> print(next(iter_blog))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> print(next(iter_blog, None))
    None
Flow control
if-else
    >>> -1 if True else 0
    -1
    >>> -1 if False else 0
    0
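try-except
    import sys
    import requests

    try:
        resp = requests.get("https://yuzhouwan.com/", timeout=10)
        # without this call, 4xx/5xx responses do not raise HTTPError
        resp.raise_for_status()
        print(resp.text)
    except requests.RequestException as error:
        # RequestException is the base class of HTTPError, ConnectionError and Timeout
        print("Cannot visit this url!", error)
        sys.exit(1)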
retry
    import time
    import requests

    retry_max = 3
    for i in range(retry_max):
        try:
            requests.get("https://yuzhouwan.com/").json()
        except Exception as e:
            print("Exception:", e)
            print("Retry (%s / %s)..." % (i + 1, retry_max))
            time.sleep(1)
        else:
            break

    Exception: Expecting value: line 1 column 1 (char 0)
    Retry (1 / 3)...
    Exception: Expecting value: line 1 column 1 (char 0)
    Retry (2 / 3)...
    Exception: Expecting value: line 1 column 1 (char 0)
    Retry (3 / 3)...
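The retry loop can also be wrapped in a small reusable helper. The following is only a sketch using the standard library: `retry` and `flaky` are hypothetical names introduced for this example, and `flaky` stands in for a network call that fails twice before succeeding.

```python
import time


def retry(func, retry_max=3, delay=0.0):
    """Call func until it succeeds or retry_max attempts are used up."""
    last_error = None
    for attempt in range(1, retry_max + 1):
        try:
            return func()
        except Exception as e:
            last_error = e
            print("Retry (%s / %s)..." % (attempt, retry_max))
            time.sleep(delay)
    # every attempt failed: surface the last exception to the caller
    raise last_error


calls = {"count": 0}


def flaky():
    # Hypothetical operation: fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("not ready")
    return "ok"


result = retry(flaky)
print(result, calls["count"])
```

Unlike the inline loop, the helper re-raises the last exception when all attempts fail, so a caller cannot mistake exhaustion for success.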
Arithmetic operations
Floor division (divide and return the integer part of the quotient)
    >>> 1 // 1
    1
    >>> 2 // 1
    2
    >>> 3 // 1
    3
    >>> 1 // 2
    0
    >>> 2 // 2
    1
    >>> 3 // 2
    1
    >>> 4 // 2
    2
    >>> 5 // 2
    2
    >>> 6 // 2
    3

    >>> round(111111 / 1024, 2)
    108.51
    >>> round(111111 / 1024, 0)
    109.0
    >>> int(round(111111 / 1024, 0))
    109
Logical operations
& vs and
    >>> True & False
    False
    >>> True and False
    False
    >>> 10 > 1 & 10 < 1
    True
    >>> 10 > 1 and 10 < 1
    False
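The surprising `True` in the session above comes from operator precedence rather than from `&` itself. A minimal sketch (variable names are illustrative) that spells out how each expression is grouped:

```python
# & binds tighter than comparisons, so this is a chained comparison:
#   10 > 1 & 10 < 1  ->  10 > (1 & 10) < 1  ->  (10 > 0) and (0 < 1)
bitwise = 10 > 1 & 10 < 1
explicit = 10 > (1 & 10) < 1

# `and` binds looser than comparisons, so both comparisons run first.
logical = 10 > 1 and 10 < 1

# With the comparisons parenthesised, & agrees with `and` for bool operands.
parenthesised = (10 > 1) & (10 < 1)

print(bitwise, explicit, logical, parenthesised)
```

When combining comparisons with `&` (common with NumPy or pandas masks), wrapping each comparison in parentheses avoids this trap.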
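Bitwise operations
    Operation    Operator    Rule
    AND          &           The result bit is 1 only when both A and B are 1; otherwise 0
    OR           |           The result bit is 1 when A or B (or both) is 1; otherwise 0
    XOR          ^           The result bit is 1 only when A and B differ; otherwise 0
    NOT          ~           Inverts every bit: 0 becomes 1 and 1 becomes 0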
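The rules in the table can be checked on two small 4-bit values. This is an illustrative sketch; `a`, `b` and the 4-bit mask are choices made for the example.

```python
a, b = 0b1100, 0b1010

and_bits = a & b    # 0b1000: bits set in both
or_bits = a | b     # 0b1110: bits set in either
xor_bits = a ^ b    # 0b0110: bits that differ

# Python ints behave like infinite two's complement, so ~a equals -(a + 1).
not_bits = ~a
# Masking restricts the inversion to the low 4 bits.
masked_not = ~a & 0b1111

print(bin(and_bits), bin(or_bits), bin(xor_bits), not_bits, bin(masked_not))
```

The mask in the last step matters because `~` on its own never yields a "4-bit" result in Python; it always produces a negative number for a non-negative input.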
Slicing
Getting part of a list
    >>> [1, 2, 3][:1]
    [1]
    >>> [1, 2, 3][:2]
    [1, 2]
    >>> [1, 2, 3][:3]
    [1, 2, 3]
    >>> [1, 2, 3][1::]
    [2, 3]
Getting the whole list
    >>> [1, 2, 3][:]
    [1, 2, 3]
Reversing
    >>> [1, 2, 3][::-1]
    [3, 2, 1]
    >>> 'nawuohzuy'[::-1]
    'yuzhouwan'
Assigning to a list slice
    >>> l = list(range(10))
    >>> l
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> l[0:3] = [0, -1, -2]
    >>> l
    [0, -1, -2, 3, 4, 5, 6, 7, 8, 9]
    >>> l[2::3] = [0, 0, 0]
    >>> l
    [0, -1, 0, 3, 4, 0, 6, 7, 0, 9]
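One detail the session above does not show is that plain slices and extended slices (with a step) follow different length rules on assignment. A short sketch, with illustrative variable names:

```python
l = list(range(10))

# A plain slice may be replaced by a sequence of a different length.
l[0:3] = [-1]
after_shrink = list(l)

# An extended slice (step != 1) needs exactly as many items as it selects.
try:
    l[::3] = [0, 0]   # selects indices 0, 3 and 6, so three items are required
    error = None
except ValueError as e:
    error = str(e)

print(after_shrink)
print(error)
```

The failed assignment leaves the list unchanged, so it is safe to catch the `ValueError` and carry on.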
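Python standard library
    >>> import binascii
    >>> binascii.b2a_hex(u"宇宙湾".encode("utf8"))
    b'e5ae87e5ae99e6b9be'
    >>> bytes.fromhex('e5ae87e5ae99e6b9be')
    b'\xe5\xae\x87\xe5\xae\x99\xe6\xb9\xbe'
    >>> print(bytes.fromhex('e5ae87e5ae99e6b9be').decode('utf8'))
    宇宙湾
In Python 2 the same round trip was written as 'e5ae87e5ae99e6b9be'.decode('hex'); the 'hex' codec on str no longer exists in Python 3.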
    import datetime

    start = datetime.datetime.now()
    end = datetime.datetime.now()
    print((end - start).microseconds)
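A caveat when timing with `timedelta`: the `microseconds` attribute is only the sub-second component, not the whole duration. The sketch below uses fixed instants instead of `now()` so that the values are reproducible; the names are illustrative.

```python
import datetime

start = datetime.datetime(2024, 1, 1, 0, 0, 0)
end = start + datetime.timedelta(seconds=2, microseconds=500)

elapsed = end - start
only_micro_part = elapsed.microseconds                        # sub-second part only
total_micro = elapsed // datetime.timedelta(microseconds=1)   # whole duration in microseconds
total_seconds = elapsed.total_seconds()                       # whole duration as a float

print(only_micro_part, total_micro, total_seconds)
```

For durations that may exceed one second, `total_seconds()` or the integer division shown here is the safer choice.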
Creating the PO file
    $ python D:\apps\Python\Python35\Tools\i18n\pygettext.py
    $ cat messages.pot
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "POT-Creation-Date: 2017-12-28 11:24+0800\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=cp936\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"
    $ vim messages.pot
    msgid ""
    msgstr ""
    "Project-Id-Version: Yuzhouwan v1.0.2\n"
    "POT-Creation-Date: 2017-12-28 11:24+0800\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: Benedict Jin <benedictjin2016@gmail.com>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"
    $ mv messages.po locale/cn/LC_MESSAGES
    $ vim messages.po
    msgid ""
    msgstr ""
    "Project-Id-Version: Yuzhouwan v1.0.2\n"
    "POT-Creation-Date: 2017-12-28 11:39+0800\n"
    "PO-Revision-Date: 2017-12-28 11:43+0800\n"
    "Language-Team: \n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "Generated-By: pygettext.py 1.5\n"
    "X-Generator: Poedit 2.0.1\n"
    "Last-Translator: \n"
    "Plural-Forms: nplurals=2; plural=(n != 1);\n"
    "Language: zh\n"
    msgid "Hello, world!"
    msgstr "世界,你好!"
    msgid "yuzhouwan.com"
    msgstr "宇宙湾"
Writing the program that uses the PO file
    import gettext
    import os


    def getLocStrings():
        current_dir = os.path.dirname(os.path.realpath(__file__))
        locale_dir = os.path.join(current_dir, "locale")
        print("Locale directory:", locale_dir)
        return gettext.translation('messages', locale_dir, ["zh_CN", "en-US"]).gettext


    _ = getLocStrings()
    print(_("Hello, world!"))
    print(_("yuzhouwan.com"))

    Locale directory: E:\Core Code\leetcode\i18n\locale
    世界,你好!
    宇宙湾
    import re

    print(re.compile(r'(\d{4}-\d{1,2}-\d{1,2})').findall("[2023-1-1, 2023-11-1, 2023-12-03]"))
    print(re.compile(r'(?<=blog: https://)(.*)(?=/)').findall("blog: https://yuzhouwan.com/")[0])
    print(re.sub(r"posts/.*", '', "https://yuzhouwan.com/posts/43687/"))
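As a variation on the patterns above, named groups and a compiled pattern that is reused can make the intent easier to read. This is a sketch, not the original author's code; the group names and variables are assumptions made for the example.

```python
import re

# Compile once and reuse; named groups label the parts of each date.
date_pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})')

text = "[2023-1-1, 2023-11-1, 2023-12-03]"
dates = [m.group(0) for m in date_pattern.finditer(text)]
months = [int(m.group('month')) for m in date_pattern.finditer(text)]

# A negated character class stops at the first '/', with no lookaround needed.
host = re.search(r'https://([^/]+)/', "blog: https://yuzhouwan.com/").group(1)

print(dates)
print(months)
print(host)
```

The negated character class is an alternative to the lookbehind/lookahead pair and avoids relying on backtracking of a greedy `.*`.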
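    import datetime
    import time

    # current local time as a formatted string
    time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
    # formatted local time -> Unix timestamp in seconds
    int(time.mktime(time.strptime("2016-3-1 0:0:0", "%Y-%m-%d %H:%M:%S")))
    # current time of day
    datetime.datetime.now().time()
    # pause for one second
    time.sleep(1)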
Third-party Python libraries
File handling
    from jproperties import Properties

    properties = """
    user=yuzhouwan
    github=asdf2014
    """
    with open('./test.properties', 'w') as f:
        f.write(properties)

    configs = Properties()
    with open('./test.properties', 'rb') as last_result:
        configs.load(last_result)

    user = str(configs.get('user').data)
    github = str(configs.get('github').data)
    print("user:", user)
    print("github:", github)
1 博客名是yuzhouwan,地址是`https://yuzhouwan.com`。
1 博客名是 yuzhouwan,地址是 `https://yuzhouwan.com`。
Server side
Flask
A complete Flask server example:
    from flask import Flask

    app = Flask(__name__)


    @app.route('/', methods=['GET', 'POST'])
    def welcome():
        return "Welcome!"


    @app.route('/blog', methods=['GET', 'POST'])
    def blog():
        return "<a href='https://yuzhouwan.com/'>https://yuzhouwan.com/</a>"


    if __name__ == "__main__":
        app.run(host="127.0.0.1", port=65533, debug=True)
Core data analysis libraries
    import numpy as np

    arr = [2, 4, 6, 8, 10]
    print(np.mean(arr))
    print(np.median(arr))
    print(np.std(arr))

    6.0
    6.0
    2.8284271247461903
Tips: Full code is here .
NLP
Machine learning
Visualization
Basic plotting
    import matplotlib.pyplot as plt

    byte_arr_for_point = [1, 2, 3, 4, 5]
    byte_arr_for_multi_point = [1.1, 1.2, 1.3, 1.4, 1.5]
    plt.plot([i for i in range(0, 5)], byte_arr_for_point, ls='-', label='line1')
    plt.plot([i for i in range(0, 5)], byte_arr_for_multi_point, ls='--', label='line2')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend()
    plt.show()
(Screenshot of the Matplotlib output window)
Trigonometric functions
    import numpy as np
    import matplotlib.pyplot as plt

    plt.figure(1)
    plt.figure(2)
    plt.figure(3)
    x = np.linspace(0, 6, 100)
    for i in range(3):
        plt.figure(1)
        plt.plot(x, np.sin(i * x))
        plt.figure(2)
        plt.plot(x, np.cos(i * x))
        plt.figure(3)
        plt.plot(x, np.tan(i * x))
    plt.show()
    plt.close()
(Screenshot of the Matplotlib output window)
Integrals
    import numpy as np
    from matplotlib.patches import Polygon
    from matplotlib.pyplot import *


    def f(_):
        return pow(np.e, (-1 * _))


    a = 0
    b = 1
    x = np.linspace(0, 2)
    y = f(x)
    fig_size = 6
    fig, ax = subplots(figsize=(fig_size, fig_size * 0.618))
    plot(x, y, 'b', linewidth=1)
    ylim()
    x_ = np.linspace(a, b)
    y_ = f(x_)
    # polygon that shades the area under the curve between a and b
    shadow = [(a, 0)] + list(zip(x_, y_)) + [(b, 0)]
    poly = Polygon(shadow, facecolor='0.8', edgecolor='0.4')
    ax.add_patch(poly)
    text(1.4 * (a + b), 1.2,
         r"$Cost(X, Y) = \int_{x_0}^{x_1} \int_{y_0}^{y_1} e^{-\lambda|x-y|}{\rm d}x{\rm d}y$",
         horizontalalignment='center', fontsize=16)
    figtext(0.95, 0.03, '$x$')
    figtext(0.075, 0.82, '$f(x)$')
    ax.set_xticks((a, b))
    ax.set_xticklabels(['$x=%d$' % a, '$y=%d$' % b])
    ax.set_yticks([f(a), f(b)])
    title('')
    show()
(Screenshot of the Matplotlib output window)
Maps
Image processing
Web crawling
    from lxml import etree
    import requests


    def get_ide_id(job_id, tag_name):
        url = "http://historyserver-yuzhouwan:19888/jobhistory/conf/" + job_id
        page = requests.get(url)
        html = page.text
        selector = etree.HTML(html)
        tds = selector.xpath("//*[@id='conf']//tbody//tr//td//text()")
        exist = False
        for td in tds:
            # the value is the cell that follows the one containing the tag name
            if tag_name in td:
                exist = True
                continue
            if exist:
                return td.strip()


    print(get_ide_id("job_1010101010101_0101010", "hive.ide.job.id"))
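If the timeout parameter of requests.get is not specified, the call has no time limit and can block indefinitely, so it is worth passing one explicitly, for example requests.get(url, timeout=10).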
    import requests

    url = 'http://localhost:8082/druid/v2/sql'
    auth = ('xxx', 'yyy')
    headers = {'Content-Type': 'application/json'}
    body = {
        "query": "SELECT 1",
        "resultFormat": "array",
        "header": 'true',
        "context": {
            "sqlOuterLimit": 1
        }
    }
    print(int(requests.post(url, headers=headers, auth=auth, json=body).json()[1][0]))
Scientific analysis tools
Installation
Windows
    $ python -m pip install --upgrade pip
    $ PATH=D:\apps\Enthought\Canopy\App;%PATH%
    $ pip install "ipython[all]"
    $ mkdir ipython
    $ cd ipython
    $ ipython notebook
    $ ipython notebook --pylab
    $ ipython notebook --pylab inline
macOS
    $ pip3 install notebook
    $ jupyter notebook
Configuration
    $ jupyter notebook --generate-config
    Writing default config to: C:\Users\BenedictJin\.jupyter\jupyter_notebook_config.py
    $ vim ~/.jupyter/jupyter_notebook_config.py
    c.NotebookApp.notebook_dir = r'F:\Github\_draft\ipython'
    $ ipython notebook
Format conversion
    $ jupyter nbconvert --to markdown --execute Basic.ipynb
    $ pip install notedown
Practical tips
Embedding in Markdown
Once the .ipynb file has been created in IPython, it can be embedded in a Markdown document with an <iframe> tag.
    <iframe src="https://nbviewer.jupyter.org/github/asdf2014/yuzhouwan/blob/master/yuzhouwan-hacker/yuzhouwan-hacker-python/src/main/resources/ipython/Basic.ipynb" width="640" height="700" frameborder="0"></iframe>
This way the figures drawn with matplotlib are displayed as well, rather than just a block of Python script. The actual result looks like this:
Tips: If your blog is served entirely over HTTPS, the resources loaded inside the iframe must also use https; otherwise Chrome blocks them as mixed content.
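Help documentation
A single question mark (?) shows the documentation of the corresponding function, class, or variable, while a double question mark (??) shows its source code.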
    $ a = 1
    $ a?
    Type: int
    String form: 1
    Docstring:
    int(x=0) -> int or long
    int(x, base=10) -> int or long

    Convert a number or string to an integer, or return 0 if no arguments are given. If x is floating point, the conversion truncates towards zero. If x is outside the integer range, the function returns a long instead.

    If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in the given base. The literal can be preceded by '+' or '-' and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.
    >>> int('0b100', base=0)
    4
    $ a??
    Type: int
    String form: 1
    $ which python
    /d/apps/Python/Python35/python
    $ python -m pip install ipykernel
    $ python -m ipykernel install --user
    $ which pip
    /d/apps/Python/Python35/Scripts/pip
    $ pip install notebook
Python engineering tools
Practical tips
Setting a proxy
Via environment variables
    $ export http_proxy="http://127.0.0.1:1080"
    $ export https_proxy="https://127.0.0.1:1080"
    $ export socks5_proxy="socks5://127.0.0.1:1080"
Programmatically
    import socket
    import socks

    socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9876)
    socket.socket = socks.socksocket
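Remote Debug
The goal is to debug and modify the Python code locally with breakpoints, have every Ctrl+S upload the change to the remote server over SFTP, and, once all changes are deployed, let Flask reload the latest code and restart the remote Python process automatically, so that the effect of the change on the live system is visible straight from the local machine. (Airbnb's Superset project is used as the basis for this walkthrough.)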
PyCharm
Windows development machine
    $ cd .\JetBrains\PyCharm 2016.2.3\debug-eggs\pycharm-debug.egg
    $ easy_install pycharm-debug.egg
    import pydevd
    pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    E:/Core Code/superset=/root/superset
    192.168.1.10
    SFTP 192.168.1.10 22 /root/superset-0.15.4 root/****** UTF-8
    Starting debug server at port 12345
    Use the following code to connect to the debugger:
    import pydevd
    pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    Waiting for process connection...
    Connected to pydev debugger (build 162.1967.10)
    Starting server with command : gunicorn -w 2 --timeout 60 -b 0.0.0.0:9097 --limit-request-line 0 --limit-request-field_size 0 superset:app
Remote Linux runtime environment
    $ cd /root/superset
    $ source bin/activate
    $ cd /root/superset/lib
    $ easy_install pycharm-debug.egg
    >>> import pydevd
    $ vim /root/superset/bin/superset
    import pydevd
    pydevd.settrace('192.168.3.10', port=12345, stdoutToServer=True, stderrToServer=True)
    $ mkdir logs
    $ nohup superset runserver -a 0.0.0.0 -p 9097 2>&1 > logs/superset.log &
    2017-02-07 15:47:03,905:WARNING:werkzeug: * Debugger is active!
    2017-02-07 15:47:03,905:INFO:werkzeug: * Debugger pin code: 330-765-812
    $ pip install django-debug-toolbar
    $ vim lib/python2.7/site-packages/pycharm-debug.egg/tests_pydevd_python/my_django_proj_17/my_django_proj_17/settings.py
    INSTALLED_APPS = (
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'debug_toolbar',
        'my_app',
    )
    Setting - Language & Frameworks - Django - "Enable Django Support"
    E:\Core Code\superset-0.15.4\bin\superset runserver -a '0.0.0.0' -p 9097
    File - Settings - Project: superset-0.15.4 - Project Interpreter - show all(+) - name: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
    SSH Credentials
    Host: 192.168.1.10
    Port: 22
    User name: root
    Auth type: Password
    Python interpreter path: /root/superset-0.15.4/bin/python
    PyCharm helpers path: /root/superset-0.15.4/.pycharm_helpers
    $ cd /root/superset-0.15.4/bin && chmod 777 *
PyCharm configuration
    Run - Run/Debug Configurations(+) - Python
    Name: superset
    Script: E:\Core Code\superset-0.15.4\bin\superset
    Script parameters: runserver -d -p 9097
    Environment Variables: VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
    Python interpreter: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
    Working directory: E:\Core Code\superset-0.15.4\bin
    Path mapping: E:/Core Code/superset-0.15.4=/root/superset-0.15.4
    File - Settings - Tools - Terminal - Shell path
    /bin/bash --rcfile ~/.pycharmrc
    $ vim '/e/Core Code/superset-0.15.4/.pycharmrc'
    VIRTUAL_ENV="/root/superset-0.15.4"
    export VIRTUAL_ENV
    $ ps -ef | grep superset | grep -v grep
    root 8638 10912 0 15:24 pts/1 00:00:00 bash -c cd /root/superset-0.15.4/bin; env "IDE_PROJECT_ROOTS" ="/root/superset-0.15.4" "IPYTHONENABLE" ="True" "PYTHONPATH" ="/root/superset-0.15.4:/root/superset-0.15.4/.pycharm_helpers/pydev" "PYTHONUNBUFFERED" ="1" "PYCHARM_HOSTED" ="1" "VIRTUALENVWRAPPER_PYTHON" ="E:\Core Code\superset-0.15.4\bin\python" "LIBRARY_ROOTS" 
="C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/544046706;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/964856790;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1532312494;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/368920028/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.2/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/2125044534;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/550610069;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/421221282;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1386076807;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-900005478;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/77779222;C:/Users/yuzhouwan/.PyCharm2016.3/system/remote_sources/250609560/-1783908167;C:/Users/yuzhouwan/.PyCharm2016.3/system/python_stubs/250609560;D:/apps/JetBrains/PyCharm 2016.3.2/helpers/python-skeletons" "PYTHONDONTWRITEBYTECODE" ="1" "JETBRAINS_REMOTE_RUN" ="1" "PYTHONIOENCODING" ="UTF-8" /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client '0.0.0.0' --port 39925 --file 
/root/superset-0.15.4/bin/superset runserver -d -p 9097 root 8660 8638 11 15:24 pts/1 00:00:17 /root/superset-0.15.4/bin/python -u /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097 root 8715 8660 28 15:24 pts/1 00:00:38 /root/superset-0.15.4/bin/python /root/superset-0.15.4/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support --client 0.0.0.0 --port 39925 --file /root/superset-0.15.4/bin/superset runserver -d -p 9097
Done
    http://192.168.1.10:9097/login/
Visual Studio Code Not good for me! You can still try it if you are interested.
Pitfalls
Gunicorn pre-forks multiple worker processes, so Remote Debug does not work
Description
From the local Windows development machine I connected remotely to Superset running inside a virtualenv on Linux. Debugging worked in principle, but the Gunicorn used by Superset follows the prefork model and starts many worker processes.
Solutions
a) Use plain remote debugging: not ok
    Connected to pydev debugger (build 162.1967.10)
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Starting gunicorn 19.6.0
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Listening at: http://0.0.0.0:9097 (13609)
    [2017-02-06 18:13:22 +0000] [13609] [INFO] Using worker: sync
    [2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13624)
    [2017-02-06 18:14:23 +0000] [13609] [CRITICAL] WORKER TIMEOUT (pid:13623)
b) Debug with "Django server" instead of "Python Remote Debug": not ok
The configured Remote Python is clearly /root/superset/bin/python, yet the error message shows that /usr/local/bin/python is being used.
c) ipdb: not good
Bring the gunicorn process to the foreground and debug with ipdb on the command line.
d) Add the -w parameter to control the number of workers: not ok
    @manager.option(
        '-w', '--workers', default=config.get("SUPERSET_WORKERS", 2),
        help="Number of gunicorn web server workers to fire up")
    $ superset runserver -a 0.0.0.0 -p 9097 -w 0
e) Turn gunicorn off: ok
Gunicorn only needs to be enabled for load testing; otherwise start Superset in debug mode:
    $ superset runserver -d -p 9097
Trying to add breakpoint to file that does not exist
Description
    pydev debugger: warning: trying to add breakpoint to file that does not exist: /root/superset/d:/apps/python27/lib/site-packages/gunicorn/arbiter.py
Solutions
a) Add a path mapping for Python's site-packages: not good
    E:/Core Code/superset=/root/superset;D:/apps/Python27=/root/superset/lib/python2.7
b) Use the Python that belongs to the Superset project instead of the local machine's Python: ok
The Python synced to the local machine is not a python.exe: no. Use the remote Python: ok.
Couldn't obtain remote socket
Description
    Error running superset
    Can't run remote python interpreter: Couldn't obtain remote socket from output ('0.0.0.0', 52703), stderr /usr/local/bin/python: No module named virtualenvwrapper
    virtualenvwrapper.sh: There was a problem running the initialization hooks. If Python could not import the module virtualenvwrapper.hook_loader, check that virtualenvwrapper has been installed for VIRTUALENVWRAPPER_PYTHON=/usr/local/bin/python and that PATH is set properly.
Solution
Vagrant
Vagrant is a tool that automates the installation and configuration of virtual machines.
Downloads
    https://www.vagrantup.com/downloads.html
    https://www.virtualbox.org/wiki/Downloads
    http://download.virtualbox.org/virtualbox/5.1.12/
    https://hashicorp-files.hashicorp.com/lucid32.box
    https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box
    https://atlas.hashicorp.com/boxes/search
    http://chef.github.io/bento/
Usage
    $ vagrant box add superset /f/软件库/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
    ==> box: Box file was not detected as metadata. Adding it directly...
    ==> box: Adding box 'superset' (v0) for provider:
        box: Unpacking necessary files from: file:///F:/%C8%ED%BC%FE%BF%E2/python/trusty-server-cloudimg-amd64-juju-vagrant-disk1.box
        box:
    ==> box: Successfully added box 'superset' (v0) for 'virtualbox'!
    $ vagrant box list
    superset (virtualbox, 0)
    $ vagrant init
    A `Vagrantfile` has been placed in this directory. You are now ready to `vagrant up` your first virtual environment! Please read the comments in the Vagrantfile as well as documentation on `vagrantup.com` for more information on using Vagrant.
    $ vim /e/vagrant/superset-0.15.4/Vagrantfile
    $ vagrant up --provider virtualbox
    Bringing machine 'default' up with 'virtualbox' provider...
    ==> default: Importing base box 'superset'...
    ==> default: Matching MAC address for NAT networking...
    ==> default: Setting the name of the VM: superset-0154_default_1486969836220_44233
    ==> default: Clearing any previously set forwarded ports...
    ==> default: Clearing any previously set network interfaces...
    ==> default: Preparing network interfaces based on configuration...
        default: Adapter 1: nat
        default: Adapter 2: hostonly
    ==> default: Forwarding ports...
        default: 22 (guest) => 2122 (host) (adapter 1)
        default: 80 (guest) => 6080 (host) (adapter 1)
        default: 6079 (guest) => 6079 (host) (adapter 1)
        default: 22 (guest) => 2222 (host) (adapter 1)
    ==> default: Running 'pre-boot' VM customizations...
    ==> default: Booting VM...
    ==> default: Waiting for machine to boot. This may take a few minutes...
        default: SSH address: 127.0.0.1:2222
        default: SSH username: vagrant
        default: SSH auth method: private key
Pitfalls
Provider 'virtualbox' not found
Description
    $ vagrant up
    ==> Provider 'virtualbox' not found. We'll automatically install it now...
    The installation process will start below. Human interaction may be required at some points. If you're uncomfortable with automatically installing this provider, you can safely Ctrl-C this process and install it manually.
    ==> Downloading VirtualBox 5.0.10...
    This may not be the latest version of VirtualBox, but it is a version that is known to work well. Over time, we'll update the version that is installed.
Solution
    $ vagrant up --provider=virtualbox
Timed out while waiting for the machine to boot
Description
    A subdirectory or file -p already exists.
    Error occurred while processing: -p.
    A subdirectory or file charms already exists.
    Error occurred while processing: charms.
    Timed out while waiting for the machine to boot. This means that Vagrant was unable to communicate with the guest machine within the configured ("config.vm.boot_timeout" value) time period.
    If you look above, you should be able to see the error(s) that Vagrant had when attempting to connect to the machine. These errors are usually good hints as to what may be wrong.
    If you're using a custom box, make sure that networking is properly working and you're able to connect to the machine. It is a common problem that networking isn't setup properly in these boxes. Verify that authentication configurations are also setup properly, as well.
    If the box appears to be booting properly, you may want to increase the timeout ("config.vm.boot_timeout") value.
Solution
Upgrade VirtualBox to 5.1.12.
default: stdin: is not a tty
Description
    default: stdin: is not a tty
Solution
    config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
-t: changing the top-level package directory
    The discover sub-command has the following options:
    -v, --verbose                         Verbose output
    -s, --start-directory directory       Directory to start discovery (. default)
    -p, --pattern pattern                 Pattern to match test files (test*.py default)
    -t, --top-level-directory directory   Top level directory of project (defaults to start directory)

    Name: druid_tests
    Script: E:\Core Code\superset-0.15.4\code\tests\druid_tests.py
    Environment variables: VIRTUALENVWRAPPER_PYTHON=E:\Core Code\superset-0.15.4\bin\python;PYTHONUNBUFFERED=1
    Python interpreter: Remote Python 2.7.12 (ssh://root@192.168.1.10:22/root/superset-0.15.4/bin/python)
    Interpreter options: -m tests.druid_tests
    Working directory: E:\Core Code\superset-0.15.4\code\
    Path mappings: E:/Core Code/superset-0.15.4=/root/superset-0.15.4

    $ export SUPERSET_CONFIG=tests.superset_test_config
    $ python -m tests.druid_tests discover . "druid_tests.py"
    $ unset SUPERSET_CONFIG
Argument parsing
    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument('--ip', '-i', type=str, help='ip', default="localhost", required=False)
    parser.add_argument('--port', '-p', type=int, help='port', default=80, required=False)
    parser.add_argument('--debug', '-d', action='store_true', help='enable debug mode', default=False, required=False)
    args = parser.parse_args()

    ip = str(args.ip)
    port = int(args.port)
    debug = args.debug
    if port == 80:
        blog = "https://%s" % ip
    else:
        blog = "https://%s:%s" % (ip, port)
    if debug:
        print(blog)
    os.system("open '%s'" % blog)

    usage: blog.py [-h] [--ip IP] [--port PORT] [--debug]

    optional arguments:
      -h, --help            show this help message and exit
      --ip IP, -i IP        ip
      --port PORT, -p PORT  port
      --debug, -d           enable debug mode

    $ python3 blog.py -i yuzhouwan.com -d
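The same parser can be exercised without a command line by passing a list to `parse_args`, which is convenient in tests. This sketch mirrors the options above; `build_parser` is a hypothetical helper name introduced for the example.

```python
import argparse


def build_parser():
    # Same three options as the script above, wrapped in a factory function.
    parser = argparse.ArgumentParser(prog="blog.py")
    parser.add_argument('--ip', '-i', type=str, default="localhost", help='ip')
    parser.add_argument('--port', '-p', type=int, default=80, help='port')
    parser.add_argument('--debug', '-d', action='store_true', help='enable debug mode')
    return parser


# An explicit argument list bypasses sys.argv entirely.
args = build_parser().parse_args(['-i', 'yuzhouwan.com', '-d'])
defaults = build_parser().parse_args([])

print(args.ip, args.port, args.debug)
print(defaults.ip, defaults.port, defaults.debug)
```

Building the parser in a function keeps it importable, so the script body and its tests can share one definition.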
Pitfalls
UnicodeDecodeError: 'gbk' codec can't decode byte 0x87 in position illegal multibyte sequence
Solution
    open(fname, "r", encoding="utf8")
connection broken by SSLError
Solution
    $ python -m pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org --upgrade pip
ModuleNotFoundError: No module named 'yaml'
Solution
The yaml module is provided by the PyYAML package:
    $ pip install pyyaml
ERROR: Gateway Timeout
Solution
    import os
    import requests

    os.environ['NO_PROXY'] = '127.0.0.1'
    r = requests.get('http://127.0.0.1:8080')