Customizing Apache Superset

What is Apache Superset?

Apache Superset™ is a modern data exploration and visualization platform.

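Core components

Flask

 One of the best-known Python web frameworks, noted for being lightweight and highly extensible

  • Jinja2
    Template engine

  • Werkzeug
    WSGI toolkit

Gunicorn

 Gunicorn is an open-source Python WSGI HTTP server that uses the pre-fork worker model; it was ported from Ruby's Unicorn project

WSGI

 WSGI (the Python Web Server Gateway Interface) is an interface between Python applications or frameworks and web servers. There is no single official implementation because WSGI is a specification: any application that follows it can run on any compliant server, and vice versa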
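
To make the interface concrete, here is a minimal sketch of a WSGI application and a tiny driver that calls it the way a server would. The names `application` and `call_app` are illustrative and do not come from Superset or Gunicorn; the sketch targets Python 3 and uses only the standard library.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI application is just a callable taking (environ, start_response)."""
    body = ("Hello from %s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # an iterable of byte strings


def call_app(app, path="/"):
    """Drive the app like a server would, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fill in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Because the server only needs this callable, the same `application` object could be handed to any WSGI server, Gunicorn included.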

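Pre-Fork

 Each worker process handles one request at a time. Implementations that multiplex with select() are bound by its FD_SETSIZE limit, typically 1024 file descriptors
 Worker processes are forked in advance, before any request arrives. Each request is served by one child process, and the processes are independent of one another
 Because no process has to be created at request time, responses start somewhat faster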
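
The pre-fork idea can be sketched in a few lines: the parent opens one listening socket, forks the workers up front, and every worker calls accept() on the inherited socket. This is a Unix-only illustration under that assumption, not Gunicorn's actual implementation; all function names are hypothetical.

```python
import os
import signal
import socket


def start_prefork_server(num_workers=2):
    """Bind one listening socket, then fork the workers before any request arrives."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: serve forever on the inherited socket
            try:
                while True:
                    conn, _addr = listener.accept()
                    conn.sendall(("handled by %d" % os.getpid()).encode())
                    conn.close()
            finally:
                os._exit(0)  # never fall back into the parent's code path
        worker_pids.append(pid)
    return listener, worker_pids


def send_request(port):
    """Connect as a client and return the reply of whichever worker accepted."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        return client.recv(1024).decode()


def stop_workers(listener, worker_pids):
    """Terminate the workers and release the listening socket."""
    for pid in worker_pids:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    listener.close()
```

The reply carries the PID of the worker that happened to win the accept() call, which shows that the kernel, not the parent, distributes connections among the pre-forked children.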

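Django

 Django is an open-source web application framework written in Python. It follows an MVC-style design pattern (often described as MTV: model, template, view), which simplifies building complex, database-driven websites
 Django emphasizes component reuse and "pluggability", rapid development, and the DRY principle (Don't Repeat Yourself)

 Core components

  • An object-relational mapper that mediates between data models (defined as Python classes) and a relational database
  • A regular-expression-based URL dispatcher
  • A view system for handling requests
  • A template system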

PyDruid

 A Python connector for Druid
 Exposes a simple API to create, execute, and analyze Druid queries

Pandas

 Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive

SciPy

 SciPy is a Python module built on NumPy that bundles a wide range of mathematical algorithms and convenience functions

Scikit-learn

 Machine Learning in Python

D3.js

 D3.js is a JavaScript library for manipulating documents based on data

Installation

Base environment

OS

$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013

# For Fedora and RHEL-derivatives
# [Doc]: Other System https://superset.apache.org/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y

Machines

# External network (http://192.168.1.10:9097/)
superset01 192.168.1.10 Superset
druid01 192.168.1.11 Druid
druid02 192.168.1.12 MySQL

# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.1.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.1.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend


# Production (http://192.168.2.10:9097)
druid-prd01 192.168.2.10 Superset
druid-prd02 192.168.2.11 Druid

# Cluster configuration
Cluster druid cluster
Coordinator Host 192.168.2.11
Coordinator Port 8081
Coordinator Endpoint druid/coordinator/v1/metadata
Broker Host 192.168.2.13
Broker Port 8082
Broker Endpoint druid/v2
Cache Timeout 86400 # 1day: result_backend
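
As a quick sanity check of a cluster configuration, the host, port, and endpoint fields can be joined into the URLs that will actually be requested. This is a hypothetical helper, assuming plain HTTP; `endpoint_url` is not part of Superset.

```python
def endpoint_url(host, port, endpoint, scheme="http"):
    """Join host, port and endpoint path into a single URL."""
    return "%s://%s:%d/%s" % (scheme, host, int(port), endpoint.strip("/"))


# Values copied from the production cluster configuration above
coordinator = endpoint_url("192.168.2.11", 8081, "druid/coordinator/v1/metadata")
broker = endpoint_url("192.168.2.13", 8082, "druid/v2")
```

Requesting both URLs from the Superset host (for example with curl) before saving the cluster helps catch firewall or typo problems early.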

Python setup

Python

$ python --version
Python 2.7.8

[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version, Python 2.6 won't be supported.

## Upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/

# Download the desired Python version from the python.org FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz

# Build
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install

$ ls /usr/local/python27/ -al

drwxr-xr-x. 6 root root 4096 12月 15 14:22 .
drwxr-xr-x. 13 root root 4096 12月 15 14:20 ..
drwxr-xr-x. 2 root root 4096 12月 15 14:22 bin
drwxr-xr-x. 3 root root 4096 12月 15 14:21 include
drwxr-xr-x. 4 root root 4096 12月 15 14:22 lib
drwxr-xr-x. 3 root root 4096 12月 15 14:22 share


# Replace the original python
$ which python
/usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
Python 2.7.12

# Point yum back at the old Python 2.6 interpreter
$ vim /usr/bin/yum

# Change the first line to python2.6
#!/usr/bin/python2.6

$ yum --version | sed '2,$d'
3.2.29

Pip

$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip

## Installing pip offline
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install

# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip==9.0.1
Finished processing dependencies for pip==9.0.1

$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

Virtualenv

$ pip install virtualenv

# virtualenv is shipped in Python 3 as pyvenv
$ virtualenv venv
$ source venv/bin/activate

## Installing virtualenv offline
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
15.1.0

Superset setup

Initializing Superset

$ pip install superset

## Installing superset offline
# Download superset-0.15.0.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.0.tar.gz
$ cd superset-0.15.0
$ python setup.py install

# Create an admin user
$ fabmanager create-admin --app superset

Username [admin]: # login name
User first name [admin]: # first name
User last name [user]: # lastname
Email [admin@fab.org]: # email, must be unique
Password:
Repeat for confirmation:
Error: the two entered values do not match
Password: #superset
Repeat for confirmation: #superset
// ...
Recognized Database Authentications.
2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
Admin User superset db upgrade created.

# Initialize the database
$ superset db upgrade

// ...
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.


# Load some data to play with
$ superset load_examples

Loading examples into <SQLA engine=u'sqlite:////root/.superset/superset.db'>
Creating default CSS templates
Loading energy related dataset
Creating table [wb_health_population] reference
2016-12-14 17:58:09,568:INFO:root:Creating database reference
2016-12-14 17:58:09,575:INFO:root:sqlite:////root/.superset/superset.db
Loading [World Bank's Health Nutrition and Population Stats]
Creating table [wb_health_population] reference
2016-12-14 17:58:30,840:INFO:root:Creating database reference
2016-12-14 17:58:30,846:INFO:root:sqlite:////root/.superset/superset.db


# Create default roles and permissions
$ superset init

Loading examples into <SQLA engine=u'sqlite:////root/.superset/superset.db'>
Creating default CSS templates
Loading energy related dataset
Creating table [wb_health_population] reference
2016-12-14 17:58:09,568:INFO:root:Creating database reference
2016-12-14 17:58:09,575:INFO:root:sqlite:////root/.superset/superset.db
Loading [World Bank's Health Nutrition and Population Stats]
Creating table [wb_health_population] reference
2016-12-14 17:58:30,840:INFO:root:Creating database reference
2016-12-14 17:58:30,846:INFO:root:sqlite:////root/.superset/superset.db
Creating slices
Creating a World's Health Bank dashboard
Loading [Birth names]
Done loading table!
--------------------------------------------------------------------------------
Creating table [birth_names] reference
2016-12-14 17:58:52,276:INFO:root:Creating database reference
2016-12-14 17:58:52,280:INFO:root:sqlite:////root/.superset/superset.db
Creating some slices
Creating a dashboard
Loading [Random time series data]
Done loading table!
--------------------------------------------------------------------------------
Creating table [random_time_series] reference
2016-12-14 17:58:53,953:INFO:root:Creating database reference
2016-12-14 17:58:53,957:INFO:root:sqlite:////root/.superset/superset.db
Creating a slice
Loading [Random long/lat data]
Done loading table!
--------------------------------------------------------------------------------
Creating table reference
2016-12-14 17:59:09,732:INFO:root:Creating database reference
2016-12-14 17:59:09,736:INFO:root:sqlite:////root/.superset/superset.db
Creating a slice
Loading [Multiformat time series]
Done loading table!
--------------------------------------------------------------------------------
Creating table [multiformat_time_series] reference
2016-12-14 17:59:10,421:INFO:root:Creating database reference
2016-12-14 17:59:10,426:INFO:root:sqlite:////root/.superset/superset.db
Creating some slices
Loading [Misc Charts] dashboard
Creating the dashboard


# Start the web server on port 8088
$ superset runserver -p 8088

# To start a development web server, use the -d switch
# superset runserver -d

# Refresh the Druid datasources (after configuring the cluster)
$ superset refresh_druid

Virtualenv workspace

# superset01 192.168.1.10
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages --always-copy superset
$ source superset/bin/activate

# See "Installing superset requires downloading dependencies" under "Pitfalls" below
# Older pip
# pip install --download package -r requirements.txt
# Newer pip (v19.0.3)
# pip download -d package -r requirements.txt
$ pip install -r /root/requirements.txt

$ superset runserver -a 0.0.0.0 -p 8088

# rsync is recommended instead of scp; see the "Deployment" section
$ cd /root
$ tar zcvf virtualenv.tar.gz virtualenv/
$ scp virtualenv.tar.gz root@192.168.1.13:/root/

# 192.168.1.13
$ cd /root/virtualenv/superset
$ source bin/activate

VirtualenvWrapper

## [Extra]
# virtualenvwrapper is an extension of virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/virtualenv
$ vim ~/.bashrc
# Add
export WORKON_HOME=~/virtualenv
source /usr/local/bin/virtualenvwrapper.sh

$ mkvirtualenv --python=/usr/bin/python superset
Running virtualenv with interpreter /usr/bin/python
New python executable in /root/virtualenv/superset/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [root@superset01 virtualenv]#
(superset) [root@superset01 virtualenv]# deactivate

$ workon superset
(superset) [root@superset01 virtualenv]# lsvirtualenv -b
superset

Deployment

Copying

# Using rsync instead of scp ensures that symlinks are copied as well
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ yuzhouwan@middle:/home/yuzhouwan/superset-0.15.4

//...
sent 142935894 bytes received 180102 bytes 3920986.19 bytes/sec
total size is 359739823 speedup is 2.51

# Verify the file count under the superset directory on both the local and the target machine
$ find | wc -l
10113
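
Where running find on both ends is inconvenient, the same count can be sketched in Python. The helper name `count_entries` is hypothetical; it counts the root plus every file and directory below it, which is what `find | wc -l` reports (symlinks are counted but not followed).

```python
import os


def count_entries(root):
    """Count the root itself plus every file and directory beneath it."""
    total = 1  # `find` also prints the starting directory
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total
```

If the numbers on the source and target machines differ, rerunning rsync with the same arguments is usually the quickest way to see which entries were skipped.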

# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ root@192.168.2.10:/home/superset/superset-0.15.4

# The Python installation that the virtualenv depends on
$ rsync -avuz -e ssh /root/software yuzhouwan@middle:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software root@druid-prd01:/root

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep / # necessary!
$ python -V
Python 2.7.12

Shared libraries

# The symlinks were rsynced over, but the target machine lacks the corresponding Python shared libraries in that directory
$ file /root/superset/lib/python2.7/lib-dynload

/root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`

# The global Python environment must match the one used when the virtualenv was created on the networked machine
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /

$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
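
A copied virtualenv can contain more than one dangling link. The following sketch walks a directory tree and lists every symlink whose target does not exist; `broken_symlinks` is a hypothetical helper, not part of virtualenv.

```python
import os


def broken_symlinks(root):
    """Return the sorted paths of all symlinks under root whose target is missing."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target
            if os.path.islink(path) and not os.path.exists(path):
                found.append(path)
    return sorted(found)
```

Running it against the virtualenv directory on the target machine should return an empty list once Python has been installed under the same prefix as on the source machine.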

User permissions

# Create the user
$ adduser superset
$ cd /home/superset
# If the directory name carries a version number, create a symlink
$ chown -R superset:superset superset-0.15.4
$ ln -s superset-0.15.4 superset

$ chown -h superset:superset superset
$ su - superset

Metadata storage

# Switch the metadata database
$ vim ./lib/python2.7/site-packages/superset/config.py

# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://user:password@mysql01:3306/superset1?charset=utf8'

$ mysql -hmysql01 -P3306 -uuser -ppassword
> use superset1;
> show tables;
+-------------------------+
| Tables_in_superset1 |
+-------------------------+
| ab_permission |
| ... |
| url |
+-------------------------+
28 rows in set (0.00 sec)

# mysqldump -hmysql01 -P3306 -uuser -ppassword superset1 > superset1.sql
$ mysqldump -hmysql01 -P3306 -uuser -ppassword --single-transaction superset1 > superset1.sql
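
A malformed SQLALCHEMY_DATABASE_URI typically only surfaces when Superset starts. As a rough pre-check, the URI can be split into its parts with the standard library. This is a Python 3 sketch with a hypothetical helper name; SQLAlchemy has its own URL parser, which is not used here.

```python
from urllib.parse import parse_qs, urlsplit


def describe_db_uri(uri):
    """Break a database URI into dialect, driver, host, port, database and options."""
    parts = urlsplit(uri)
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "options": {key: values[0] for key, values in parse_qs(parts.query).items()},
    }


info = describe_db_uri("mysql+pymysql://user:password@mysql01:3306/superset1?charset=utf8")
```

The host, port, and database shown here are the same values that go into the mysql and mysqldump commands, with the port passed via the uppercase -P option.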

Startup

$ cd /home/superset/superset-0.15.4
$ source bin/activate
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &
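
In the shell, redirections are applied left to right, so stdout must be pointed at the log file before stderr is duplicated onto it. The same intent can be expressed without ordering pitfalls in Python; this is a sketch with a hypothetical `start_logged` helper, not something Superset ships.

```python
import subprocess
import sys


def start_logged(command, log_path):
    """Start command with stdout and stderr both appended to log_path."""
    with open(log_path, "ab") as log:
        # stderr=STDOUT sends errors to wherever stdout goes, i.e. the log file
        return subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
```

For example, `start_logged(["superset", "runserver", "-a", "0.0.0.0", "-p", "9097", "-w", "4"], "logs/superset.log")` would start the server from the startup step above; the returned process object can be kept to check or stop it later.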

Running locally

Dependencies

Windows

Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat)
Description

 error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

Solution
$ pip install wheel setuptools
# Download and install VCForPython27.msi
‘openssl/opensslv.h’: No such file or directory
Solution
# download openssl-0.9.8h-1-setup.exe from http://gnuwin32.sourceforge.net/packages/openssl.htm
Cannot open include file: ‘stdint.h’: No such file or directory
Solution
# Microsoft Visual C++ 2015 Redistributable Update 3
# download vc_redist.x64.exe from https://www.microsoft.com/zh-CN/download/details.aspx?id=53840
$ vim D:\apps\Python27\Lib\distutils\msvc9compiler.py

def get_build_version():
    return 9.0
def find_vcvarsall(version):
    return r'C:\Users\yuzhouwan\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\vcvarsall.bat'

$ cd superset-0.15.4
$ python setup.py install

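# The VCForPython27.msi provided by Microsoft is based on VC2008, whereas stdint.h only ships with later Visual C++ releases (VC2010 onward)
# VCForPython27.msi has not been maintained since 2014, so I decided to try Ubuntu or remote debugging instead ...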

Python

Make sure that you use the correct version of ‘pip’
Description
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'D:\apps\Python27\python.exe'
Solution
# Install pip: download the installer from https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py

$ pip --version
pip 8.1.1 from d:\apps\python27\lib\site-packages (python 2.7)
‘Connection to pypi.python.org timed out. (connect timeout=15)’
Description
$ pip install --upgrade pip
'Connection to pypi.python.org timed out. (connect timeout=15)'
Solution
# Set a proxy
$ export https_proxy="http://10.10.10.10:8080"
$ pip install --upgrade pip
$ pip --version
pip 9.0.1 from d:\apps\python27\lib\site-packages (python 2.7)
setup.py failed with error code 1
Description
Command "d:\apps\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\yuzhouwan\\appdata\\local\\temp\\pip-build-zzbhrq\\sasl\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\yuzhouwan\appdata\local\temp\pip-erwavd-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\yuzhouwan\appdata\local\temp\pip-build-zzbhrq\sasl\
Solution
$ pip install --upgrade setuptools pip
$ pip install superset

# Download superset-0.15.4.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.4.tar.gz
$ cd superset-0.15.4
$ python setup.py install

Deploying on Kubernetes

$ helm install superset stable/superset --version 1.1.11
NAME: superset
LAST DEPLOYED: Mon Jul 19 08:45:33 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Superset can be accessed via port 9000 on the following DNS name from within your cluster:
superset.default.svc.cluster.local

Initially you can login with username/password: admin/admin.
WARNING: Persistence is DISABLED !
$ kubectl exec -it superset-65c7696586-lhwpp bash
$ export FLASK_APP=superset && flask fab create-admin
$ exit
$ kill `ps -ef | grep 8088 | grep -v grep | awk '{print $2}'`; export POD_NAME=$(kubectl get pods | grep superset | awk '{print $1}') ; nohup kubectl port-forward $POD_NAME 8088:8088 --address 0.0.0.0 2>&1 &
$ open 'http://localhost:8088/login/'

(Screenshots of the Apache Superset™ UI: the login page and the dashboards page)

Setting up the development environment

Dependencies

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12

$ mv /usr/local/bin/python /usr/local/bin/python_bak
$ ln -s /usr/local/python27/bin/python /usr/local/bin/python

Virtual environment

$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages env
$ cd env
$ mkdir code

Code

# windows
$ cd E:\Github\super\env
$ git init
$ git remote add origin https://github.com/asdf2014/superset.git
$ git pull origin master

# SFTP
# Upload to /root/env/code

Installation

$ cd /root/env/code
$ source /root/env/bin/activate

$ cd /root/env/code/superset/static
$ mv assets assets_bak
$ ln -s ../assets assets

$ cd /root/env/code
$ python setup.py develop

Finished processing dependencies for superset==0.15.4

$ pip freeze | grep superset
superset==0.15.4

# Create an admin user
$ fabmanager create-admin --app superset

Username [admin]: # login name
User first name [admin]: # first name
User last name [user]: # lastname
Email [admin@fab.org]: # email, must be unique
Password:
Repeat for confirmation:
Error: the two entered values do not match
Password: #superset
Repeat for confirmation: #superset
// ...
Recognized Database Authentications.
2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
Admin User superset db upgrade created.

$ superset db upgrade
$ superset init
$ superset load_examples

Npm

# [Linuxbrew]
$ sudo yum group install "Development Tools" --setopt=group_package_types=mandatory,default,optional --skip-broken -y
$ sudo yum install curl git m4 ruby texinfo bzip2-devel curl-devel expat-devel ncurses-devel zlib-devel -y

# ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install)" # Do not run this as root!
$ wget https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install --no-check-certificate
$ mv install install.rb
$ vim install.rb

 # abort "Don't run this as root!" if Process.uid == 0

$ mkdir -p /root/.linuxbrew/bin
$ export PATH="/root/.linuxbrew/bin:$PATH"
$ ruby install.rb

$ vim ~/.bashrc

 export PATH="$HOME/.linuxbrew/bin:$PATH"
 export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
 export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"


# [CentOS]
$ yum install npm
$ cd /root/env/code/superset/assets # package.json
$ npm install

# If requests to https://github.com/jquery/jquery.git time out, pin the GitHub hosts
$ vim /etc/hosts

 192.30.253.112 github.com
 151.101.100.133 assets-cdn.github.com
 192.30.253.117 api.github.com
 192.30.253.121 codeload.github.com

Tests

$ cd /root/env/code
$ chmod 777 *sh
$ cd /root/env/code/superset/bin
$ chmod 777 superset

$ cd /root/env/code
$ bash run_tests.sh

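Remote development in an IDE

Remote Debug

 See the Remote Debug section of my other blog post: 《Python

Customization

Others Category

Description

 Aggregating at the HBase Region level produces a large number of Regions in the group-by result; rendering them all in DistributionPieViz is sluggish and hard to read

Solution

Adding row_limit excludes the data outside the top N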
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

fieldsets = ({
    'label': None,
    'fields': (
        'metrics', 'groupby',
        'limit',
        'pie_label_type',
        ('donut', 'show_legend'),
        'labels_outside',
        'row_limit',
    )
},)
others_category aggregates the data outside the top N into a single category
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

fieldsets = ({
    'label': None,
    'fields': (
        'metrics', 'groupby',
        'limit',
        'pie_label_type',
        ('donut', 'show_legend'),
        'labels_outside',
        'row_limit',
        'others_category',
    )
},)

$ vim ./lib/python2.7/site-packages/superset/forms.py

'others_category': (BetterBooleanField, {
    "label": _("Others category"),
    "default": True,
    "description": _("Aggregate data outside of topN into a single category")
}),


# models.py
# Known issue: the Others category does not stay in last place, because the result is sorted once more afterwards
# Known issue: the "others_category": "y" attribute is not passed down

self.status = None
self.error_message = None
self.others_category = form_data.get("others_category")

top_n = 10
if top_n > 0 and len(df) > top_n:
    df_head = df.head(top_n)
    df_tail = df.tail(len(df) - top_n)
    # Sum every metric over the rows outside the top N
    other_metrics_sum = [df_tail[metric].sum() for metric in metrics]
    # Assumes df holds one group-by column followed by the metric columns
    df_other = pd.DataFrame([['Others'] + other_metrics_sum], columns=df.columns)
    df = pd.concat([df_head, df_other], ignore_index=True)

Tips: PR#2176 has been submitted: Aggregate data outside of topN into a single category
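
The aggregation itself is independent of pandas. The sketch below shows the idea on plain (group, value) pairs and appends the "Others" row after sorting, so it always stays last. It is an illustration only, not the code from the pull request; `with_others` is a hypothetical name.

```python
def with_others(rows, top_n, label="Others"):
    """Keep the top_n largest (group, value) pairs and sum the rest into one final row."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    head, tail = ranked[:top_n], ranked[top_n:]
    if not tail:
        return head  # nothing to fold away
    return head + [(label, sum(value for _group, value in tail))]
```

Appending the aggregated row only after the final sort is what keeps "Others" at the end, even when its sum exceeds some of the top N values.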

Abnormal Y-axis values

Description

 The Y axis should start at 0 but instead starts at a negative value, -997m

Solution

 PR#2307 has been submitted: Some problem in Y Axis

Later optimizations

MySQL timezone issues

Queries

Description
$ vim lib/python2.7/site-packages/superset/config.py

 from dateutil import tz

 # Druid query timezone
 # tz.tzutc() : Using utc timezone
 # tz.tzlocal() : Using local timezone
 # other tz can be overridden by providing a local_config
 DRUID_IS_ACTIVE = True
 DRUID_TZ = tz.tzlocal() # +08:00

 # DRUID_TZ = tz.gettz('Asia/Shanghai')
Solution

 PR#2143 has been submitted: Using the time zone with specific name for querying Druid
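
Why the choice of timezone matters for query intervals can be seen with the standard library alone. The sketch below uses a fixed +08:00 offset as a stand-in for `tz.gettz('Asia/Shanghai')`; that substitution and the helper name are assumptions made to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")  # fixed-offset stand-in


def to_query_time(naive_wall_clock, tz):
    """Attach tz to a naive wall-clock time and render it as ISO 8601."""
    return naive_wall_clock.replace(tzinfo=tz).isoformat()


wall_clock = datetime(2016, 12, 15, 8, 0, 0)
as_shanghai = to_query_time(wall_clock, SHANGHAI)
as_utc = to_query_time(wall_clock, UTC)
```

The two strings describe instants eight hours apart, so a dashboard that interprets local wall-clock input as UTC queries a window shifted by the full offset.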

Display

Description
dttm.tz_convert(dttm.tzinfo._filename.split('zoneinfo/')[1]) - pytz.timezone(dttm.tzinfo._filename.split('zoneinfo/')[1]).localize(EPOCH)
Solution

 PR#2370 has been submitted: Fix timezone issues in slices

Upgrading Superset

# Upgrade directly with pip install
$ pip freeze | grep superset
superset==0.13.2

$ pip install superset==-1
versions: 0.12.0, 0.13.0, 0.13.1, 0.13.2, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.3, 0.15.4

$ pip install superset==0.15.4

# The previous configuration was gone after the upgrade, so config.py needs adjusting again
$ vim ./lib/python2.7/site-packages/superset/config.py

# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:root@192.168.1.12:3306/superset?charset=utf8'

$ vim /root/superset-0.15.4/bin/activate

# VIRTUAL_ENV="/root/superset"
VIRTUAL_ENV="/root/superset-0.15.4"

# then could just run "superset runserver -a 0.0.0.0 -p 9097"

Unknown column ‘datasources.filter_select_enabled’ in ‘field list’

Description
InternalError: (pymysql.err.InternalError) (1054, u"Unknown column 'datasources.filter_select_enabled' in 'field list'") [SQL: u'SELECT datasources.created_on AS datasources_created_on, datasources.changed_on AS datasources_changed_on, datasources.id AS datasources_id, datasources.datasource_name AS datasources_datasource_name, datasources.is_featured AS datasources_is_featured, datasources.is_hidden AS datasources_is_hidden, datasources.filter_select_enabled AS datasources_filter_select_enabled, datasources.description AS datasources_description, datasources.default_endpoint AS datasources_default_endpoint, datasources.user_id AS datasources_user_id, datasources.cluster_name AS datasources_cluster_name, datasources.offset AS datasources_offset, datasources.cache_timeout AS datasources_cache_timeout, datasources.params AS datasources_params, datasources.perm AS datasources_perm, datasources.changed_by_fk AS datasources_changed_by_fk, datasources.created_by_fk AS datasources_created_by_fk \nFROM datasources \nWHERE datasources.datasource_name = %(datasource_name_1)s \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'datasource_name_1': u'bi-dfp-oms-detail'}]
Solution
$ superset db upgrade
$ superset refresh_druid
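
The error appears because the upgraded code expects a column that the old schema does not have yet; `superset db upgrade` runs the migrations that add it. The sketch below reproduces the situation with SQLite from the standard library. It only mimics what a migration does, and the helper names are hypothetical.

```python
import sqlite3


def column_names(conn, table):
    """Return the column names of table (the second field of PRAGMA table_info)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]


def add_column_if_missing(conn, table, column, ddl_type):
    """Add the column unless it already exists; report whether anything changed."""
    if column in column_names(conn, table):
        return False
    conn.execute("ALTER TABLE %s ADD COLUMN %s %s" % (table, column, ddl_type))
    return True
```

Running the real migrations remains the right fix; patching columns by hand would leave the alembic_version table out of step with the schema.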

Issues with Druid timezones

Description

 The tzutc and tzlocal methods in tz work for me…
 Oh no.. They stopped working after I upgraded superset from v0.13.2 to v0.15.4, even when I try DRUID_TZ = tz.gettz('Asia/Shanghai') :-(

 See: Issues with Druid timezones #1369

Solution
$ cd /root/superset-0.15.4
$ ./bin/python -m pip freeze | grep superset

superset==0.13.2

$ ./bin/python -m pip uninstall superset
$ ./bin/python -m pip install superset==0.15.4
$ ./bin/python -m pip freeze | grep superset

superset==0.15.4

$ ./bin/python ./bin/easy_install lib/pycharm-debug.egg
# config remote python

$ ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097
# nohup ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &

$ ./bin/python ./bin/superset db upgrade
$ ./bin/python ./bin/superset refresh_druid

pydevd cannot remote debug

Description

 After upgrading from 0.13.2 to 0.15.4, debug mode starts two processes, which prevents pydevd from remote debugging

$ ps -ef | grep superset | grep -v grep

root 22567 1632 19 12:05 pts/0 00:00:03 ./bin/python ./bin/superset runserver -d -p 9097
root 22578 22567 24 12:05 pts/0 00:00:03 /root/superset-0.15.4/bin/python ./bin/superset runserver -d -p 9097
Solution
Launching directly via cli.py (did not work)
$ vim ./lib/python2.7/site-packages/superset/cli.py

# append
manager.run()

$ ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -a 0.0.0.0 -p 9097

$ ps -ef | grep superset | grep -v grep

root 25238 1632 35 13:07 pts/0 00:00:03 ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
root 25247 25238 55 13:07 pts/0 00:00:03 /root/superset-0.15.4/bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
Attempting to fix the WARNING:werkzeug: * Debugger is active! problem
$ vim lib/python2.7/site-packages/werkzeug/serving.py

class ThreadedWSGIServer(ThreadingMixIn, BaseWSGIServer):

    """A WSGI server that does threading."""
    multithread = True

$ vim lib/python2.7/site-packages/flask/app.py

options.setdefault('use_reloader', self.debug)

$ vim superset/__init__.py

 PR#2136 has been submitted: Fix werkzeug instance was created twice in Debug Mode
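
The second process comes from Werkzeug's reloader, which re-executes the program in a child and marks that child with the WERKZEUG_RUN_MAIN environment variable. A debugger hook can use this to attach only once. The helpers below are a hypothetical sketch and are not taken from the pull request.

```python
import os


def in_reloader_child(environ=None):
    """True in the process that Werkzeug's reloader spawned to serve requests."""
    environ = os.environ if environ is None else environ
    return environ.get("WERKZEUG_RUN_MAIN") == "true"


def should_attach_debugger(use_reloader, environ=None):
    """With the reloader on, attach only in the serving child; otherwise always."""
    if not use_reloader:
        return True
    return in_reloader_child(environ)
```

A remote-debug call such as pydevd's settrace would then be guarded by `should_attach_debugger(...)`; disabling the reloader altogether is the simpler alternative when auto-reload is not needed.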

Switching from SQLite3 to MySQL

Trying SQLite's built-in dump command

# superset01				192.168.1.10		Superset
$ cd /root/.superset
$ ll -sail

1285 43256 -rw-r--r-- 1 root root 44288000 Jan 22 14:06 superset.db

$ sqlite3 superset.db
sqlite> .databases
seq name file
--- --------------- ----------------------------------------------------------
0 main /root/.superset/superset.db

sqlite> .tables
ab_permission columns multiformat_time_series
ab_permission_view css_templates query
ab_permission_view_role dashboard_slices random_time_series
ab_register_user dashboard_user slice_user
ab_role dashboards slices
ab_user datasources sql_metrics
ab_user_role dbs table_columns
ab_view_menu energy_usage tables
access_request favstar url
alembic_version logs wb_health_population
birth_names long_lat
clusters metrics

# not suitable for MySQL
# sqlite> .output superset.sql
# sqlite> .dump

$ vim dump_for_mysql.py

# https://github.com/EricHigdon/sqlite3tomysql

$ sqlite3 superset.db .dump | python dump_for_mysql.py > superset.sql

$ ls -sail

1285 43256 -rw-r--r-- 1 root root 44288000 Jan 22 14:06 superset.db
18631 76968 -rw-r--r-- 1 root root 78812197 Jan 22 14:35 superset.sql

$ vim superset.sql

id INTEGER NOT NULL,
# Replace with an auto-incrementing primary key
id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,

$ scp superset.sql root@192.168.1.12:/home/mysql
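
The kind of rewriting such a conversion script performs can be sketched as follows. This is not the dump_for_mysql.py referenced above; it is a simplified Python 3 illustration that handles only three differences (SQLite-only statements, identifier quoting in INSERTs, and the id column edit described above) and would need more rules for a real schema.

```python
import re
import sqlite3

# Statements that only make sense to SQLite
SKIP_PREFIXES = ("PRAGMA", "BEGIN TRANSACTION", "COMMIT")


def sqlite_dump_to_mysql(statements):
    """Convert an iterable of SQLite dump statements into MySQL-friendlier ones."""
    converted = []
    for statement in statements:
        if statement.startswith(SKIP_PREFIXES) or "sqlite_sequence" in statement:
            continue
        # MySQL quotes identifiers with backticks rather than double quotes
        statement = re.sub(r'^INSERT INTO "?(\w+)"?', r"INSERT INTO `\1`", statement)
        # Mirror the manual edit above: turn a bare integer id into an auto-increment key
        statement = re.sub(
            r"\bid INTEGER NOT NULL,",
            "id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,",
            statement,
        )
        converted.append(statement)
    return converted
```

Tables that already declare a separate PRIMARY KEY constraint would end up with two primary-key definitions under this rule, which is one reason the generated SQL should be reviewed before importing.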

Writing our own sqlite3tomysql.py

# druid02    192.168.1.12    MySQL
$ ps -ef | grep mysql | grep -v druid | grep -v grep

mysql 11435 8530 0 14:13 pts/4 00:00:00 /bin/sh /home/mysql/bin/mysqld_safe --defaults-file=/home/mysql/my.cnf
mysql 12192 11435 0 14:13 pts/4 00:00:00 /home/mysql/bin/mysqld --defaults-file=/home/mysql/my.cnf --basedir=/home/mysql --datadir=/home/mysql/data --plugin-dir=/home/mysql/lib/mysql/plugin --log-error=/home/mysql/data/druid02.err --open-files-limit=8192 --pid-file=/home/mysql/data/druid02.pid --socket=/home/mysql/data/mysql.sock --port=3306
mysql 12223 8530 0 14:13 pts/4 00:00:00 mysql -uroot -p -S /home/mysql/data/mysql.sock


$ su - mysql
$ mysql -uroot -p -S /home/mysql/data/mysql.sock
mysql> show databases;
mysql> create database superset;
mysql> show databases;
mysql> use superset;

# Run sqlite3tomysql.py, then import the schema and data files
mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock --default-character-set=utf8 < superset.sql.schema.sql
mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock --default-character-set=utf8 < superset.sql.data.sql

# To work around foreign-key dependencies between tables, run source superset.sql.schema.sql from the mysql prompt and repeat the import several times

Metadata storage

# superset01				192.168.1.10		Superset
$ cd /root/superset
$ find ./ -name config.py
./lib/python2.7/site-packages/caravel/config.py
./lib/python2.7/site-packages/sqlalchemy/testing/config.py
./lib/python2.7/site-packages/pandas/core/config.py
./lib/python2.7/site-packages/superset/config.py
./lib/python2.7/site-packages/setuptools/config.py
./lib/python2.7/site-packages/numpy/distutils/command/config.py
./lib/python2.7/site-packages/gunicorn/config.py
./lib/python2.7/site-packages/panoramix/config.py
./lib/python2.7/site-packages/flask/config.py
./lib/python2.7/site-packages/alembic/testing/config.py
./lib/python2.7/site-packages/alembic/config.py

$ vim ./lib/python2.7/site-packages/superset/config.py
# SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:root@192.168.1.12:3306/superset?charset=utf8'

Startup

# First complete the Superset initialization steps
$ nohup superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &

Tips: For the code and the detailed steps, see: Convert SQLite into MySQL

Parameter tuning

# Increase the number of gunicorn workers as appropriate (default: 2)
$ cd /root/superset
$ source bin/activate
$ mkdir logs
$ nohup ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &
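
For choosing the value passed to -w, the Gunicorn documentation suggests (2 x number of cores) + 1 as a starting point. A small sketch of that rule of thumb follows; `suggested_workers` is a hypothetical helper, and the right number for a given deployment still depends on memory and query load.

```python
import multiprocessing


def suggested_workers(cpu_cores=None):
    """Return (2 x cores) + 1, using the local core count when none is given."""
    if cpu_cores is None:
        cpu_cores = multiprocessing.cpu_count()
    return 2 * cpu_cores + 1
```

On a two-core machine this gives 5, slightly above the -w 4 used in the command above.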

Logs

ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.

Description
(superset) [root@superset01 superset-0.15.4]# ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.script is deprecated, use flask_script instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.sqlalchemy is deprecated, use flask_sqlalchemy instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.sqlalchemy._compat is deprecated, use flask_sqlalchemy._compat instead.
.format(x=modname), ExtDeprecationWarning
/root/superset-0.15.4/lib/python2.7/site-packages/flask_cache/__init__.py:152: UserWarning: Flask-Cache: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn("Flask-Cache: CACHE_TYPE is set to null, "
/root/superset-0.15.4/lib/python2.7/site-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
.format(x=modname), ExtDeprecationWarning
Solution
$ vim ./bin/superset

+import warnings
+from flask.exthook import ExtDeprecationWarning
+warnings.simplefilter('ignore', ExtDeprecationWarning)
+
from superset.cli import manager

 Submitted as PR #2138: Fix ExtDeprecationWarning
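The patch works because `warnings.simplefilter('ignore', SomeWarningClass)` puts an "ignore" rule for that class at the front of the filter list. A minimal sketch of the mechanism, using a local stand-in class instead of `flask.exthook.ExtDeprecationWarning` (which only exists in old Flask releases); the helper names are hypothetical:

```python
import warnings


class ExtDeprecationWarning(DeprecationWarning):
    """Local stand-in for flask.exthook.ExtDeprecationWarning (assumption for this demo)."""


def noisy_import():
    # Mimics the message printed in the log above
    warnings.warn(
        "Importing flask.ext.cache is deprecated, use flask_cache instead.",
        ExtDeprecationWarning,
    )


def count_warnings(suppress):
    """Return how many warnings reach the user with or without the filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # show everything by default
        if suppress:
            # Same call as in the patched ./bin/superset
            warnings.simplefilter("ignore", ExtDeprecationWarning)
        noisy_import()
    return len(caught)


if __name__ == "__main__":
    print(count_warnings(suppress=False), count_warnings(suppress=True))
```

The filter must be installed before the module that triggers the warning is imported, which is why the patch places it above `from superset.cli import manager`.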

Pitfalls

The email must be unique when creating a user

Recognized Database Authentications.
2016-12-14 18:12:36,007:ERROR:flask_appbuilder.security.sqla.manager:Error adding new user to database. (sqlite3.IntegrityError) column email is not unique [SQL: u'INSERT INTO ab_user (first_name, last_name, username, password, active, email, last_login, login_count, fail_login_count, created_on, changed_on, created_by_fk, changed_by_fk) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (u'superset', u'yuzhouwan', u'superset', 'pbkdf2:sha1:1000$e3imUMx0$83b38fb2a0f628d1379379bb353fc80697c435a1', 1, u'yuzhouwan@gmail.com', None, None, None, '2016-12-14 18:12:36.004721', '2016-12-14 18:12:36.004773', None, None)]
No user created an error occured

 Log in as the admin / admin user and make the change there
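The error comes from a UNIQUE constraint on the `email` column. A minimal sketch that reproduces it with an in-memory SQLite database; the table here is a simplified assumption with only two data columns, not the real Flask-AppBuilder `ab_user` schema, and the addresses are placeholders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for ab_user: the real table has many more columns
conn.execute(
    "CREATE TABLE ab_user ("
    "id INTEGER PRIMARY KEY, username TEXT UNIQUE, email TEXT UNIQUE)"
)


def add_user(connection, username, email):
    """Insert a user; return False when a UNIQUE constraint rejects the row."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO ab_user (username, email) VALUES (?, ?)",
                (username, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    print(add_user(conn, "admin", "someone@example.com"))     # first insert succeeds
    print(add_user(conn, "superset", "someone@example.com"))  # same email is rejected
```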

Missing Dependencies

Description

RuntimeError: Compression requires the (missing) zlib module

Solution

$ yum install zlib
$ yum install zlib-devel

# Go to the Python 2.7 source directory, then rebuild and reinstall; the symlinks do not need to be recreated
$ cd /root/software/Python-2.7.12
$ make
$ make install

# Go to the setuptools directory and reinstall
$ cd /root/software/setuptools-32.0.0
$ python setup.py install
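After rebuilding, it helps to confirm that the interpreter can actually import the optional C-backed modules. A minimal sketch (the function name `missing_modules` is hypothetical; it works for `zlib` here and for other modules that depend on system development packages):

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means the rebuild picked up the required development headers
    print(missing_modules(["zlib", "ssl", "sqlite3"]))
```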

Python Cannot Load Modules (RedHat Problem)

pip: command not found

# Use pip by running it as a module
$ python -m pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

# Add a command alias
$ vim ~/.bashrc

# If the alias has not taken effect yet, run it directly in the current shell
alias pip='python -m pip'

$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

virtualenv: command not found

$ vim ~/.bashrc
alias virtualenv='python -m virtualenv'

$ virtualenv --version
15.1.0
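Both aliases rely on the same idea: `python -m <module>` runs a tool with a specific interpreter even when its console script is not on `PATH`. A minimal sketch of the same approach from inside Python (written for Python 3; the helper names are hypothetical):

```python
import subprocess
import sys


def module_command(module, *args):
    """Build the argv for running a module with the current interpreter."""
    return [sys.executable, "-m", module] + list(args)


def run_module(module, *args):
    """Run the module and return (exit code, combined output)."""
    result = subprocess.run(
        module_command(module, *args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Equivalent of `python -m pip --version`
    print(run_module("pip", "--version"))
```

Using `sys.executable` avoids ambiguity about which Python the tool belongs to, which is the same problem the aliases work around.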

Dependencies That Must Be Downloaded to Install Superset

sasl/sasl.h: No such file or directory

Description
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1

cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
In file included from sasl/saslwrapper.cpp:254:
sasl/saslwrapper.h:22:23: error: sasl/sasl.h: No such file or directory
Solution
$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

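# Install g++
# g++ is the C++ compiler; once it is installed, gcc can find the build environment that C++ sources need, and the compilation succeeds
# wget ftp://rpmfind.net/linux/centos/6.8/os/x86_64/Packages/gcc-c++-4.4.7-17.el6.x86_64.rpm (the version must match gcc 4.4.7-4 exactly)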
# http://rpm.pbone.net/index.php3/stat/4/idpl/25438297/dir/scientific_linux_6/com/gcc-c++-4.4.7-4.el6.x86_64.rpm.html
# http://rpm.pbone.net/index.php3/stat/4/idpl/25440518/dir/scientific_linux_6/com/libstdc++-devel-4.4.7-4.el6.x86_64.rpm.html
$ rpm -ivh libstdc++-devel-4.4.7-4.el6.x86_64.rpm
$ rpm -ivh gcc-c++-4.4.7-4.el6.x86_64.rpm

$ g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)

Command line option -Wstrict-prototypes is valid for Ada/C/ObjC but not for C++

Description

 cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++

Solution
# The cmake version is too old (in this case it was not installed at all)
# https://cmake.org/ (stable: 3.6.3, latest: 3.7.1, date: 2016/12/16)
# https://cmake.org/cmake/help/v3.6/
$ wget --no-check-certificate https://cmake.org/files/v3.6/cmake-3.6.3.tar.gz # To connect to cmake.org insecurely
$ tar zxvf cmake-3.6.3.tar.gz
$ cd cmake-3.6.3
$ ./bootstrap
$ make
$ gmake install

$ cmake -version
cmake version 3.6.3
CMake suite maintained and supported by Kitware (kitware.com/cmake).

# reboot (should)

$ cd ~
$ mkdir virtualenv
$ cd virtualenv
$ virtualenv env1
$ virtualenv --python=/usr/bin/python env1

# new problem
# IOError: [Errno 40] Too many levels of symbolic links: '/root/virtualenv/env1/bin/python'
# Do not simply rm -rf env1; it has to be removed with rmvirtualenv
$ rmvirtualenv env1
$ cd env1
$ source bin/activate # run deactivate to exit
(env1) [root@edeppreapp01 env1] # python -V
Python 2.7.12

Could not find a version that satisfies the requirement pytz>dev

Description
# Installing the dependencies one at a time is very tedious
Could not find a version that satisfies the requirement pytz>dev (from celery==3.1.23) (from versions: )
Could not find a version that satisfies the requirement billiard<3.4,>=3.3.0.23 (from celery==3.1.23) (from versions: )
No matching distribution found for amqp<2.0,>=1.4.9 (from kombu==3.0.35)
No matching distribution found for anyjson>=0.3.3 (from kombu==3.0.35)
No matching distribution found for kombu<3.1,>=3.0.34 (from celery==3.1.23)
No matching distribution found for celery==3.1.23 (from superset)
Could not find suitable distribution for Requirement.parse('werkzeug==0.11.10')
pip install thrift-0.9.3.tar.gz
No matching distribution found for six (from sasl==0.2.1)
No matching distribution found for sasl>=0.2.1 (from thrift-sasl==0.2.1)
No local packages or working download links found for thrift-sasl>=0.2.1
Solution
$ pip list
$ pip freeze > requirements.txt
$ mkdir packages
$ pip install --download packages -r requirements.txt

$ cd packages
$ scp celery-3.1.23-py2.py3-none-any.whl root@druid01:/root/software/packages

# With --find-links, pip looks up Superset's dependencies in the given directory and installs them in turn
$ python -m pip install --no-index --find-links=packages superset # -r requirements.txt
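Before copying the `packages` directory to the offline machine, it can be worth checking that every pinned requirement has a matching file. The following is a rough sketch under stated assumptions: the helper `missing_from_wheelhouse` is hypothetical, it only understands plain `name==version` lines as produced by `pip freeze`, and it matches by file-name prefix rather than using pip's own resolver:

```python
import re


def _normalize(text):
    """Lowercase and collapse '-', '_' and '.' so wheel and sdist names compare equal."""
    return re.sub(r"[-_.]+", "-", text).lower()


def missing_from_wheelhouse(requirements_text, filenames):
    """Return the 'name==version' lines that have no matching file name."""
    missing = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and unpinned entries
        name, version = line.split("==", 1)
        key = _normalize(name.strip() + "-" + version.strip()) + "-"
        if not any(_normalize(f).startswith(key) for f in filenames):
            missing.append(line)
    return missing


if __name__ == "__main__":
    reqs = "celery==3.1.23\nkombu==3.0.35\nthrift==0.9.3\n"
    files = ["celery-3.1.23-py2.py3-none-any.whl", "thrift-0.9.3.tar.gz"]
    print(missing_from_wheelhouse(reqs, files))
```

In practice the two arguments would come from `open("requirements.txt").read()` and `os.listdir("packages")`.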

ImportError: No module named ssl

Solution

# Install SSL
$ yum install yum-downloadonly -y

$ yum -y install ncurses ncurses-devel gcc-c++ libxml2-devel gd gd-devel libpng libpng-devel libjpeg libjpeg-devel libmcrypt libmcrypt-devel openldap-devel openldap-servers openldap-clients autoconf freetype-devel libtool-ltdl-devel openssl openssl-devel gcc automake autoconf libtool make --downloadonly --downloaddir=.

$ yum -y install GeoIP gmp libevent libmcrypt libtidy libXpm libxslt mhash mysql mysql-server nfs-utils nginx perl-DBD-MySQL perl-DBI php php-common php-fpm php-gd php-mbstring php-mcrypt php-mhash php-mysql php-pdo php-xml t1lib --downloadonly --downloaddir=.

$ rpm -Uvh --force --nodeps *.rpm


# Recompile Python
$ cd /root/software/Python-2.7.12
$ vim Modules/Setup.dist

# Uncomment the following lines
SSL=/usr/local/ssl
_ssl _ssl.c \
    -DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
    -L$(SSL)/lib -lssl -lcrypto

$ ./configure --enable-shared CFLAGS=-fPIC # the --enable-shared option generates the dynamic library libpython2.7.so.1.0
$ make && make install

# Did not work
$ python --version
Python 2.7.12

$ python
Python 2.7.12 (default, Dec 19 2016, 10:58:27)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/ssl.py", line 97, in <module>
    import _ssl # if we can't import it, let the error propagate
ImportError: No module named _ssl
>>> quit()

# Install the missing openssl-devel
$ rpm -aq | grep openssl
openssl-1.0.1e-42.el6_7.4.x86_64

$ yum install openssl-devel -y

$ rpm -aq | grep openssl
openssl-1.0.1e-42.el6_7.4.x86_64
openssl-devel-1.0.1e-42.el6_7.4.x86_64

# Edit the Setup file
$ vim /root/software/Python-2.7.12/Modules/Setup
# Socket module helper for socket(2)
_socket socketmodule.c timemodule.c

# Socket module helper for SSL support; you must comment out the other
# socket line above, and possibly edit the SSL variable:
#SSL=/usr/local/ssl
_ssl _ssl.c \
-DUSE_SSL -I$(SSL)/include -I$(SSL)/include/openssl \
-L$(SSL)/lib -lssl -lcrypto

# Recompile
$ cd /root/software/Python-2.7.12
$ make && make install

$ python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>>

$ cd /root/virtualenv/superset/bin
[root@olap03-sit bin]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl

$ /root/virtualenv/superset/bin/python
Python 2.7.12 (default, Dec 16 2016, 16:23:17)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/python27/lib/python2.7/ssl.py", line 97, in <module>
    import _ssl # if we can't import it, let the error propagate
ImportError: No module named _ssl


$ mv /root/virtualenv/superset/bin/python /root/virtualenv/superset/bin/python_old
$ ln -s /usr/local/bin/python /root/virtualenv/superset/bin/

$ ./python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()
[root@olap03-sit bin]# pwd
/root/virtualenv/superset/bin
[root@olap03-sit bin]# /root/virtualenv/superset/bin/python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()
[root@olap03-sit bin]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>>

$ source bin/activate
(superset) [root@olap03-sit superset]# which python
/root/virtualenv/superset/bin/python
(superset) [root@olap03-sit superset]# python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> quit()

# ImportError: No module named gunicorn.app.base
import gunicorn.app.base
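In the transcript above, the virtualenv's `python` was an older build (Dec 16, GCC 4.4.6) than the freshly compiled one (Dec 19, GCC 4.4.7), which is why only one of them could import `ssl`. A minimal sketch for telling interpreters apart; the function name `interpreter_report` is hypothetical:

```python
import platform
import sys


def interpreter_report():
    """Collect the details that distinguish one Python build from another."""
    try:
        import ssl
        openssl = ssl.OPENSSL_VERSION
    except ImportError:
        openssl = None  # this build was compiled without _ssl
    return {
        "executable": sys.executable,            # which binary is actually running
        "prefix": sys.prefix,                    # virtualenv or install prefix
        "build": platform.python_build(),        # (build number, build date)
        "compiler": platform.python_compiler(),  # e.g. the GCC version string
        "openssl": openssl,
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("{0}: {1}".format(key, value))
```

Running it once with the system `python` and once with the virtualenv's `bin/python` makes a stale interpreter copy easy to spot.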

python: error while loading shared libraries: libpython2.7.so.1.0

Description

$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC # the --enable-shared option generates the dynamic library libpython2.7.so.1.0
$ make && make install
$ python -V
python: error while loading shared libraries: libpython2.7.so.1.0: cannot open shared object file: No such file or directory

Solution

$ yum reinstall python-libs    # did not work

$ ll /usr/local/python27/lib/libpython2.7.so.1.0    # did not work
$ vim /etc/ld.so.conf
include ld.so.conf.d/*.conf
include /usr/local/Python2.7/lib

$ /sbin/ldconfig -v | grep /
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12
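Once an interpreter does start, its build configuration shows whether it was compiled with `--enable-shared` and where the loader needs to find `libpython`. A minimal sketch using the standard `sysconfig` module (the function name `shared_library_info` is hypothetical; the values are platform-dependent and may be `None` on some systems):

```python
import sysconfig


def shared_library_info():
    """Report the build settings relevant to the libpython shared library."""
    return {
        # 1 when the interpreter was configured with --enable-shared
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # directory that must be visible to the dynamic loader (ldconfig / LD_LIBRARY_PATH)
        "libdir": sysconfig.get_config_var("LIBDIR"),
        # file name of the library, e.g. libpython2.7.so on a shared build
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    print(shared_library_info())
```

If `libdir` is not one of the directories printed by `/sbin/ldconfig -v`, the "cannot open shared object file" error above is the expected result.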

ImportError: No module named pysqlite2

Solution

$ vim /root/superset/lib/python2.7/site-packages/sqlalchemy/dialects/sqlite/pysqlite.py

# Change the sqlite3 import
@classmethod
def dbapi(cls):
    try:
        # Change this line to: from sqlite3 import dbapi2 as sqlite
        from pysqlite2 import dbapi2 as sqlite
    except ImportError as e:
        try:
            from sqlite3 import dbapi2 as sqlite # try 2.5+ stdlib name.
        except ImportError:
            raise e
    return sqlite

# On Red Hat 5.3, sqlite3 has to be installed from source before Python is built; only then is the _sqlite3.so file produced
$ wget https://sqlite.org/snapshot/sqlite-snapshot-201612131847.tar.gz
$ sqlite3 --version
3.6.20

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Solution

Method 1
# Remove all aliases and revert the Python path changes made in the Superset sources
$ which pip
$ alias pip='python -m pip'
$ /root/superset/bin/python

$ vim ~/.bashrc
# alias pip='python -m pip'
# alias virtualenv='python -m virtualenv'

# Source global definitions
# export WORKON_HOME=~/virtualenv
# source /usr/local/bin/virtualenvwrapper.sh

$ source ~/.bashrc
$ deactivate
$ yum install python-pip

$ unalias pip
$ which pip
/usr/bin/pip

$ superset runserver -a 0.0.0.0 -p 9999

$ cd /usr/local/lib/python2.7/site-packages

$ python
Python 2.7.12 (default, Dec 19 2016, 11:08:33)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssl
>>> ssl
<module 'ssl' from '/usr/local/lib/python2.7/ssl.pyc'>
>>> quit()

$ vim mypkpath.pth
/usr/local/lib/python2.7

$ vim ~/.bashrc
alias python=/usr/local/bin/python
alias pip=/usr/bin/pip

$ source ~/.bashrc    # did not work (Superset's Python scripts all start with #!/root/superset/bin/python)
$ vim /root/superset/bin/superset
#!/usr/local/bin/python
Method 2
# Use --prefix so that Python's third-party libraries are installed under /usr/lib
$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12

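Error while processing cluster ‘druid cluster’ (sqlite3.OperationalError) database is locked

Description

 [Web UI] Sources - Druid Clusters configuration - Refresh Druid Metadata

Cause

 The web request cannot hold a long-lived connection, so it times out

Solution

 superset refresh_druid

Tips: In v0.22.1, the latest version at the time of writing, this problem has been fixed; the operation can be completed directly from the page by clicking the “Sources - Refresh Druid Metadata” button (2017-12-12)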
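SQLite allows only one writer at a time, and "database is locked" is what a second writer sees when it cannot obtain the lock before its timeout expires. The following is a minimal sketch of that behavior, not Superset code; the function name and the 0.1 s timeout are assumptions for illustration, and whether the Druid metadata refresh hits exactly this path is not confirmed by the notes above:

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(timeout=0.1):
    """Return True if a second connection fails with 'database is locked'."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # isolation_level=None: transactions are controlled explicitly with BEGIN
    first = sqlite3.connect(path, isolation_level=None)
    second = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    try:
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        try:
            second.execute("BEGIN IMMEDIATE")  # waits up to `timeout` seconds
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        return False
    finally:
        first.close()
        second.close()
        os.remove(path)


if __name__ == "__main__":
    print(second_writer_is_blocked())
```

A server database such as the MySQL backend configured earlier does not serialize writers on a single file lock in this way.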

An unknown error occurred. (Status: 0) Maybe the request timed out?

Description

 Some charts fail to render

Solution

# Enable debug mode to see detailed logs and locate the problem
$ vim ./lib/python2.7/site-packages/superset/config.py

# DEBUG = False
DEBUG = True

ImportError: No module named pymysql

Solution

 pip install pymysql

Host 'druid01' is not allowed to connect to this MySQL server

Description

 nohup superset runserver -a 0.0.0.0 -p 8888 2>&1 &

2017-01-22 16:36:53,013:ERROR:flask_appbuilder.security.sqla.manager:DB Creation and initialization failed: (pymysql.err.InternalError) (1130, u"Host 'druid01' is not allowed to connect to this MySQL server")

Solution

GRANT ALL PRIVILEGES ON *.* TO 'root'@'druid01' IDENTIFIED BY 'root' WITH GRANT OPTION;

Permission for Druid

Solution

 After adding a new data source, run superset init to update the permission-related tables

Update Druid Cluster’s Name

Solution

alter table datasources drop FOREIGN KEY `datasources_ibfk_2`;
update clusters set cluster_name='Druid Cluster' where cluster_name='druid cluster';
update datasources set cluster_name ='Druid Cluster' where cluster_name ='druid cluster';
alter table datasources add constraint `datasources_ibfk_2` FOREIGN KEY (`cluster_name`) REFERENCES `clusters` (`cluster_name`);
# show create table datasources; # troubleshooting

An unexpected error occurred: “https://registry.yarnpkg.com/convert-source-map: ETIMEDOUT”

Description

$ yarn
yarn install v1.3.2
info No lockfile found.
[1/4] Resolving packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/@vx%2fbounds: ETIMEDOUT".
info If you think this is a bug, please open a bug report with the information provided in "/home/superset/software/incubator-superset-0.22.1/superset/assets/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.

Solution

# Owing to mysterious outside forces, first override the IP address that the registry resolves to
$ vim /etc/hosts
104.16.59.173 registry.yarnpkg.com

# Limit network concurrency to reduce the chance of a timeout
$ yarn --network-concurrency 1

Community Follow-up

 See: "How to Become an Apache PMC"

References

Doc

Book

Source
