Apache Druid: An Efficient OLAP Engine

Basic Concepts

Overview

 Apache Druid™ is a highly popular, high-performance, distributed, column-oriented OLAP framework (more precisely, a MOLAP engine). It provides fast (real-time) access to large volumes of rarely-changing data, and it is designed to remain 100% available in the face of code deployments, machine failures, and the other hazards of a production system.

Druid Pumpkin

(Image credit: Vadim Ogievetsky's personal Halloween artwork, used with permission)

Features

Analyzing event streams

 Druid supports fast, highly concurrent queries over event-driven data. It can also ingest streaming data in real time and answer queries with sub-second latency, enabling powerful interactive UIs.

An innovative architecture

 Druid is a new kind of database that combines ideas from OLAP/analytic databases, time-series databases, and full-text search to support most application scenarios in a streaming architecture.

Building an event-driven data stack

 Druid integrates natively with message queues (Kafka, AWS Kinesis, etc.) and data lakes (HDFS, AWS S3, etc.), which makes it an excellent query layer for a streaming bus or a stream processor.

Unlocking new workflows

 Druid is built for fast, ad-hoc analysis of both real-time and historical data. Rapidly iterated queries let you explain trends, explore data, and respond to all kinds of analytical demands.

Deploying anywhere

 Druid can be deployed on any *NIX commodity hardware, in the cloud or on premises. Druid is cloud-native: scaling a cluster in or out is as simple as adding or removing processes.

Ingesting from many data sources

 Druid can ingest data from a range of external systems, including Hadoop, Spark, Storm, and Kafka.

Multi-version concurrency control

 Multi-Version Concurrency Control (MVCC) exists to solve the concurrency problems that arise when multiple users operate on the same row. The design idea is that, when multiple tenants access the database concurrently, a transactional update creates a new version of the data instead of taking a blunt row lock. Reads and writes are thereby separated and never block each other, which improves concurrency. An additional constraint, that at any moment only the newest version of a record is valid, also keeps the data consistent.

 Druid distinguishes versions by the time the data was updated, and Historical nodes load only the latest version. At the same time, its Lambda-style design, where real-time indexing runs alongside batch overwrites of offline data, satisfies the need for real-time responses while guaranteeing data accuracy.
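 As an illustration (the dataSource and timestamps below are hypothetical), two segments covering the same interval carry different versions in the metadata store, and Historicals only load the newer one that is marked used:

{ "dataSource": "yuzhouwan", "start": "2017-04-04", "end": "2017-04-05", "version": "2017-04-05T02:00:00.000Z", "used": false }
{ "dataSource": "yuzhouwan", "start": "2017-04-04", "end": "2017-04-05", "version": "2017-04-06T01:00:00.000Z", "used": true }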

Easy to operate

 Druid clusters are self-healing and self-balancing. If a Druid server fails, the system automatically routes around the damage until the machine can be replaced. When scaling the cluster in or out, you only add or decommission servers; the cluster re-balances itself in the background. Druid is designed to run around the clock, with no downtime for any reason, including configuration changes and cluster upgrades.

Core Components

Historical process

 The Historical process is the main workhorse for storing and querying "historical" data, including any streamed data that has been in the system long enough to be committed; think of it as the backbone of the whole Druid cluster. Historical processes download Segments from Deep Storage and answer queries about those Segments (the queries come from Broker processes). Historical processes do not handle writes.

 Historical processes follow a shared-nothing architecture: each one knows how to load and drop Segments and how to answer queries based on them. So even if the underlying Deep Storage becomes unavailable, Historicals can still serve queries against the Segments they have already synced.

MiddleManager process

 The MiddleManager process is responsible for ingesting new data into the cluster and publishing new Segments.

 A MiddleManager is the worker node that executes submitted tasks. MiddleManagers forward tasks to Peon processes running in separate JVMs (which isolates resources and logs). The mapping between MiddleManagers, Peons, and Tasks is: each Peon runs exactly one task at a time, while a single MiddleManager manages many Peons.
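 The Peon count per MiddleManager is capped by druid.worker.capacity; a hedged runtime.properties sketch (the values are illustrative):

# conf/druid/middleManager/runtime.properties
druid.worker.capacity=4
druid.indexer.runner.javaOpts=-server -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8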

Broker process

 The Broker process receives queries from clients and forwards them to the Historical and MiddleManager processes (from the metadata kept in Zookeeper, it knows exactly which node holds which Segment). After receiving the results, the Broker merges them and returns them to the caller. End users usually query the Broker rather than the Historical or MiddleManager processes directly.

Coordinator process

 The Coordinator process watches over the Historical nodes. It assigns Segments to specific Historicals and keeps Segments balanced across them. The Coordinator also has Historicals load new Segments, and drops outdated Segments according to the configured Rules.

 The Coordinator runs periodically (the interval is set by druid.coordinator.period, 60s by default). Since it must assess the cluster's current state before deciding which policy to apply, it keeps a connection to Zookeeper for cluster information; and since Segment and Rule information lives in the metadata store, it keeps a connection to the metadata database as well.
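 A hedged sketch of the relevant settings (the values shown are the defaults):

# conf/druid/coordinator/runtime.properties
druid.coordinator.period=PT60S        # how often a Coordinator run is triggered
druid.coordinator.startDelay=PT300S   # grace period before the first run, while cluster state loads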

Overlord process

 The Overlord process watches over the MiddleManagers and is the controller of data ingestion. It assigns ingestion tasks to MiddleManagers and coordinates the publication of Segments. The Overlord accepts, splits, and assigns Tasks, creates the locks a Task needs, and reports Task status. The rough flow is shown below:

Overlord and MiddleManagers with Zookeeper in Apache Druid

(Image source: the Apache Druid™ website)
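 For example, after submitting a task, its status can be polled through the Overlord's task API (host and task id are placeholders):

$ curl http://<overlord host>:8090/druid/indexer/v1/task/<task id>/status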

Router process

 The Router process provides a unified API gateway in front of the Brokers, Overlords, and Coordinators. It is optional, but worth considering once the cluster's data reaches the TB scale. Router nodes can route queries to different Broker nodes; by default, the proxy routes queries according to how the Rules are set up. For example, if the most recent month of data is loaded onto a hot cluster, queries within the last month can be routed to a dedicated set of Brokers while everything else goes to another set, giving hot/cold query isolation.
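 A hedged sketch of such hot/cold routing (the Broker service names are illustrative):

# conf/druid/router/runtime.properties
druid.router.defaultBrokerServiceName=druid/broker-cold
druid.router.tierToBrokerMap={"hot":"druid/broker-hot","_default_tier":"druid/broker-cold"}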

Core Extensions

Kafka Indexing Service

 The Kafka Indexing Service lets you configure Supervisors on the Overlord. (The supervisor here is a KafkaSupervisor, which watches the KafkaIndexTasks of a single DataSource; it is constructed from a KafkaSupervisorSpec that carries the Kafka topic configuration and the ingestion rules used to generate KafkaIndexTask indexing tasks.) A Supervisor manages the creation and lifecycle of the Kafka indexing tasks. These tasks read events using Kafka's own partition and offset mechanism, so they can guarantee exactly-once ingestion (under the old push-based Tranquility, no-loss, no-duplication was simply unattainable). They can also read non-recent events from Kafka and are not constrained by the window period that other ingestion mechanisms impose. The Supervisor monitors the state of the indexing tasks so it can manage failures, and it keeps the pipeline scalable and easy to replicate.
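 A hedged sketch of how this looks in practice (topic, dataSource, and field names are illustrative): a supervisor spec is POSTed to the Overlord, which then creates and manages the indexing tasks.

$ curl -X POST -H 'Content-Type: application/json' -d @kafka-supervisor.json http://<overlord host>:8090/druid/indexer/v1/supervisor

# kafka-supervisor.json (skeleton)
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "yuzhouwan_metrics",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": { "dimensions": ["serverName", "level", "metric"] }
      }
    },
    "metricsSpec": [ { "type": "doubleSum", "name": "sum", "fieldName": "value" } ],
    "granularitySpec": { "type": "uniform", "segmentGranularity": "HOUR", "queryGranularity": "MINUTE" }
  },
  "ioConfig": {
    "topic": "yuzhouwan-metrics",
    "consumerProperties": { "bootstrap.servers": "kafka01:9092" },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  },
  "tuningConfig": { "type": "kafka", "maxRowsPerSegment": 5000000 }
}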

External Dependencies

Metadata Storage

 Stores metadata, including DataSources, Segments, Tasks, and assorted configuration. Derby is the default; MySQL is the usual choice in practice.
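 Switching from Derby to MySQL is a matter of common.runtime.properties (connection details are illustrative; the mysql-metadata-storage extension must be loaded):

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://<mysql host>:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid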

Table                  Purpose
druid_dataSource       Stores dataSources, so the Kafka Indexing Service can look them up
druid_pendingSegments  Stores pending Segments
druid_segments         Stores the metadata of every Segment
                       (columns: id, dataSource, created_date, start, end, partitioned, version, used, payload)
druid_rules            Load / drop rules for Segments
druid_config           Runtime configuration
druid_tasks            Task information for the Indexing Service
druid_taskLog          Task logs for the Indexing Service
druid_taskLock         Task locks for the Indexing Service
druid_supervisors      Supervisor information for the Indexing Service
druid_audit            Records changes to configuration and Coordinator rules

druid_tasks, druid_taskLog, and druid_taskLock are all used by the Overlord to persist indexing Task state, so that Tasks are not lost if the Overlord dies.

Deep Storage

 Deep Storage is shared file storage that every Druid server can access, typically a distributed object store such as S3 or HDFS, or a network file system (NFS). Druid uses it to store any data that has been ingested into the system.

 Druid uses Deep Storage only as a backup of the data and as a way to transfer data between Druid processes in the background. When a query arrives, a Historical process does not read from Deep Storage; it reads the Segments it pre-fetched to local disk before serving any query. This means Druid never needs to touch Deep Storage during a query, which greatly reduces query latency. It also means that both Deep Storage and every Historical node must have enough disk space.
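 A hedged sketch of a Deep Storage configuration, using HDFS as the example (the path is illustrative; the HDFS extension must be loaded):

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://<namenode host>:8020/druid/segments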

Zookeeper

 Manages the cluster's current state, covering the following main features (a minimal connection config follows the list):

  • the protocol by which Historical and Realtime nodes announce Segments
  • the load / drop Segment protocol between the Coordinator and Historicals
  • leader election for Coordinator nodes
  • leader election for Overlord nodes
  • Task management for the Indexing Service
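 The connection itself is a two-line configuration (the hosts are illustrative):

druid.zk.service.host=zk01:2181,zk02:2181,zk03:2181
druid.zk.paths.base=/druid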

Data Structure

Apache Druid Data Structure

(Drawn with Keynote)

Timestamp

 Druid is at heart a time-series database: every query and every indexing operation involves the time dimension. Internally, Druid stores timestamps as absolute milliseconds and by default displays them in ISO-8601 format: yyyy-MM-dd'T'HH:mm:ss.SSSZ

Dimensions

 Druid's notion of a dimension matches the general OLAP definition: string, numeric, and multi-value fields of a record can all be treated as dimension columns. Dimensions are used to filter and group the data.

Metrics

 Druid's notion of a metric also matches the general OLAP definition: numeric fields of a record can be treated as metric columns, which are used for aggregation and computation.
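 The three column types come together in the dataSchema of an ingestion spec; a minimal sketch (field names are illustrative):

{
  "timestampSpec": { "column": "timestamp", "format": "auto" },
  "dimensionsSpec": { "dimensions": ["serverName", "level", "metric"] },
  "metricsSpec": [
    { "type": "count", "name": "count" },
    { "type": "doubleSum", "name": "sum", "fieldName": "value" }
  ]
}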

Configuration

Common ports

Description                                                                                Port
Derby on your Coordinator; not needed if you are using a separate
metadata store like MySQL or PostgreSQL                                                    1527
ZooKeeper; not needed if you are using a separate ZooKeeper cluster                        2181
Coordinator                                                                                8081
Broker                                                                                     8082
Historical                                                                                 8083
Standalone Realtime, if used                                                               8084
Router, if used                                                                            8088
Overlord                                                                                   8090
Middle Manager; you may need more than port 8199 if druid.worker.capacity is very high     8091, 8100 ~ 8199
Tranquility Server, if used                                                                8200
In production, deploy ZooKeeper and the Metadata Storage on dedicated machines rather than co-locating them on the Coordinator node

selectStrategy

 This parameter defaults to fillCapacity, which means Tasks keep being assigned to one MiddleManager until it is full before any Task goes to another. Consider the equalDistribution strategy instead, which spreads Tasks evenly across the MiddleManagers:

$ cd $DRUID_HOME
$ vim conf/druid/overlord/runtime.properties
druid.indexer.selectStrategy=equalDistribution
Note that since version 0.11.0 the default strategy is equalDistribution, so this issue no longer exists. See WorkerBehaviorConfig #DEFAULT_STRATEGY

Coordinator Rule configuration

Retaining the last 30 days of data

Apache Druid Coordinator UI

(Screenshot of the Coordinator console)
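 The screenshot above corresponds to a rule chain like the following (a hedged sketch; the tier name and replicant count are illustrative). Rules are evaluated top-down: segments newer than 30 days match the load rule, everything else falls through to the drop rule.

[
  { "type": "loadByPeriod", "period": "P30D", "tieredReplicants": { "_default_tier": 2 } },
  { "type": "dropForever" }
]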

Queries

PlyQL

Basic usage

 Use --host and -q to specify the Broker address and the query statement, respectively:

$ cd /home/druid/software/imply-1.3.0
$ bin/plyql --host <broker host>:8082 -q "show tables" # --host <broker>:<port>
┌─────────────────────────┐
│ Tables_in_database │
├─────────────────────────┤
│ COLUMNS │
│ SCHEMATA │
│ TABLES │
│ yuzhouwan_metrics │
└─────────────────────────┘

Describing a table

$ bin/plyql --host <broker host>:8082 -q "describe yuzhouwan_metrics"
┌────────────┬────────┬──────┬─────┬─────────┬───────┐
│ Field │ Type │ Null │ Key │ Default │ Extra │
├────────────┼────────┼──────┼─────┼─────────┼───────┤
│ __time │ TIME │ YES │ │ │ │
│ metric01 │ NUMBER │ YES │ │ │ │
│ metric02 │ NUMBER │ YES │ │ │ │
│ // ... │ │ │ │ │ │
└────────────┴────────┴──────┴─────┴─────────┴───────┘

Aggregation queries

Simple aggregations

 A simple max / min / count query:

$ bin/plyql --host <broker host>:8082 -q "select max(gcCount_max) from yuzhouwan_metrics where serverName='druid01'"
┌──────────────────┐
│ max(gcCount_max) │
├──────────────────┤
│ 39710 │
└──────────────────┘
Aggregating along the time dimension

 Use TIME_PART to aggregate along the time dimension:

$ bin/plyql --host <broker host>:8082 -q "select TIME_PART(__time, MINUTE_OF_DAY, 'Asia/Shanghai'), max(gcCount_max) from yuzhouwan_metrics where serverName='druid01' and __time>='2017-04-04' and __time<'2017-04-05' group by 1" -Z Asia/Shanghai

# metrics that are not in the group by must be wrapped in an aggregate such as sum / min / max
$ bin/plyql --host <broker host>:8082 -q "select TIME_PART(__time, MINUTE_OF_DAY, 'Asia/Shanghai'), metric, sum(sum) as sum_value from yuzhouwan_metrics where level='level1' and metric='metric1' and __time>='2017-04-04' and __time<'2017-04-05' group by 1, 2 order by sum_value desc limit 10" -Z Asia/Shanghai -v

Showing the JSON behind a query

 Add the -v flag to print the JSON behind the query, which helps verify that a plyql statement does what you expect:

$ bin/plyql --host <broker host>:8082 -q "select distinct level from yuzhouwan_metrics where __time>='2017-01-16 03:00'" -Z Asia/Shanghai -v

plyql version 0.9.6 (plywood version 0.15.4)
Received query:
select distinct level from yuzhouwan_metrics where __time>='2017-01-16 03:00'
---------------------------
Parsed query as the following plywood expression (as JSON):
{
"op": "split",
"operand": {
"op": "filter",
"operand": {
"op": "ref",
"name": "yuzhouwan_metrics"
},
"expression": {
"op": "greaterThanOrEqual",
"operand": {
"op": "ref",
"name": "__time",
"ignoreCase": true
}, // ...
{
"version": "v1",
"timestamp": "2017-01-16T03:00:00.000Z",
"event": {
"level": "level1",
"!DUMMY": 1608
}
}
}
]
^^^^^^^^^^^^^^^^^^^^^^^^^^
┌────────────────┐
│ level │
├────────────────┤
│ level1 │
│ level2 │
└────────────────┘

Timing a query

 The time command measures how long a query takes:

$ time bin/plyql -h <broker host>:8082 -q "select * from yuzhouwan_metrics where __time>='2017-03-18' and __time<'2017-03-19' and level='level01' limit 100 " -Z Asia/Shanghai

real 0m0.886s
user 0m0.684s
sys 0m0.062s

RESTful API

Querying with curl

$ vim query.body
# write the query body

$ curl -X POST "http://<broker host>:8080/druid/v2/?pretty" -H 'content-type: application/json' -d @query.body

JSON query body

 Instead of a file holding the query, you can also pass the JSON string inline:

$ curl -X POST "http://<broker host>:8082/druid/v2/?pretty" -H 'content-type: application/json' -d '{
"dimensions": [
"dimensions1",
"dimensions2"
],
"aggregations": [
{
"filter": {
"type": "selector",
"dimension": "metric",
"value": "metrics01"
},
"aggregator": {
"type": "doubleSum",
"fieldName": "sum",
"name": "metric01"
},
"type": "filtered"
}
],
"filter": {
"type": "selector",
"dimension": "level",
"value": "day"
},
"intervals": "2017-02-09T15:03:12+08:00/2017-02-09T16:03:12+08:00",
"limitSpec": {
"limit": 10,
"type": "default",
"columns": [
{
"direction": "descending",
"dimension": "metric01"
}
]
},
"granularity": "all",
"postAggregations": [],
"queryType": "groupBy",
"dataSource": "yuzhouwan_metrics"
}'
The post-aggregators in postAggregations run serially, and each can consume the results computed before it
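 A hedged sketch of such a chain (field names are illustrative): the JavaScript post-aggregator consumes the output of the arithmetic one before it.

"postAggregations": [
  { "type": "arithmetic", "name": "avg", "fn": "/",
    "fields": [
      { "type": "fieldAccess", "fieldName": "sum" },
      { "type": "fieldAccess", "fieldName": "count" }
    ] },
  { "type": "javascript", "name": "avg_rounded",
    "fieldNames": ["avg"],
    "function": "function(avg) { return Math.round(avg); }" }
]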

Druid Client

 Once you have a client, many things become possible, such as rate limiting, access control, and a unified SQL layer (under discussion by the community in #5006, join in!). The community already has client libraries for Java, Python, Golang, Scala, Clojure, and other languages.

Architecture

Overall design

Apache Druid architecture diagram

(Image source: the Apache Druid™ website)

Lambda streaming architecture

Apache Druid Lambda

(Drawn with Axure™)

OLTP vs. OLAP

                      OLTP (Online Transaction Processing)              OLAP (Online Analytical Processing)
Purpose of the data   Controls and runs fundamental business tasks      Helps with planning, problem solving, and decision support
What the data is      A snapshot of ongoing business processes          Multi-dimensional views of business activities
Inserts and updates   Initiated by end users                            Triggered periodically by long-running batch jobs
Queries               Simple queries, returning relatively few records  Often complex queries involving aggregations
Processing speed      Typically very fast                               Depends on the data volume involved and whether indexes exist
Space requirements    Historical data can be archived to save space     Relatively larger, due to aggregation structures and history
Database design       Highly normalized                                 Typically de-normalized, with fewer tables

Knowledge map

Apache Druid

(Drawn with MindNode™)

Reading the Source

Data structures

R-tree

Introduction

 A spatial database consists of tuples corresponding to spatial objects, each with a unique tuple identifier (tupleID) through which the database can fetch it. What an R-Tree does is index those tupleIDs.


HyperLogLog

Introduction

 Cardinality counting yields an approximately exact result; for instance, Count / Distinct Count queries return a floating-point estimate.
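 For example, a distinct count through plyql comes back as a HyperLogLog-backed estimate rather than an exact integer (dataSource and column reuse the examples above):

$ bin/plyql --host <broker host>:8082 -q "select count(distinct serverName) from yuzhouwan_metrics"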


Dependent frameworks

Apache Calcite – query engine

Google Guice – dependency injection

Google Guava – extension library

JMH – benchmarking

Query flow

Druid Query Process

(Drawn with StarUML™)

Visualization

Pivot

Configuration and startup

$ whereis node
node: /usr/local/bin/node

$ /usr/local/bin/node -v
v4.2.2

# load the user's environment variables, instead of sudo su druid
$ su - druid

# the `--with-comments` flag can be dropped to avoid broken comment generation (some comment lines missing the leading #)
$ /usr/local/bin/node /home/druid/software/druid/dist/pivot/bin/pivot --druid <druid.broker.host>:8082 --print-config > /home/druid/software/druid/dist/pivot/bin/yuzhouwan_metrics.yaml

# a relative path is required
$ cd /home/druid/software/imply-2.0.0
$ nohup dist/pivot/bin/pivot -c /home/druid/software/druid/dist/pivot/bin/config_yuzhouwan_metrics.yaml >> /home/druid/software/druid/dist/pivot/bin/nohup.log 2>&1 &

$ vim /home/druid/software/druid/dist/pivot/bin/config_yuzhouwan_metrics.yaml

# simple expressions can be used in the pivot config for derived metrics, e.g. divide by the collection window to get `OPS`
- name: metrics02_OPS
title: metrics02 ops
expression: $main.sum($metrics02_Sum) / $main.sum($period_Sum)

Result

HBase Metrics in Pivot

(Screenshot of the Pivot™ UI)

Pitfalls

Too many metrics make the config hard to maintain
Solution

 You can pivot columns into rows: add a metric dimension in dimensions and use it to carry the metric name. This avoids maintaining a huge list in metricsSpec, and also makes it easy to add new metrics dynamically.

 That said, pivoting columns into rows inflates the data volume. If resources are tight, you may still have to maintain the metrics in metricsSpec; in that case, you can use my DruidUtils to generate the configuration file quickly instead of maintaining it by hand.
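 A hedged sketch of the two layouts (field names are illustrative): one column per metric versus one row per metric keyed by a metric dimension.

# wide: every metric is a column, each declared in metricsSpec
{"timestamp": "2017-04-04T00:00:00Z", "serverName": "druid01", "gcCount": 3, "gcTime": 120}

# narrow: a single value column, with the metric name as a dimension
{"timestamp": "2017-04-04T00:00:00Z", "serverName": "druid01", "metric": "gcCount", "value": 3}
{"timestamp": "2017-04-04T00:00:00Z", "serverName": "druid01", "metric": "gcTime", "value": 120}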

Graphite

Environment

OS
$ uname -a
Linux olap03-sit.yuzhouwan.com 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 (mockbuild@c6b8.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013

# For Fedora and RHEL-derivatives
# [Doc]: Other System http://airbnb.io/superset/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
$ sudo yum install openssl openssl-devel install zlib zlib-devel readline readline-devel sqlite-devel libffi-devel -y

# Machines
druid.yuzhouwan.com 10.10.10.1 Druid
graphite.yuzhouwan.com 192.168.1.101 Graphite
Python
$ python --version
Python 2.7.8

[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version, Python 2.6 won't be supported.

## upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/

# download the matching Python release from the python.org FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz

# build from source
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make && make install

$ ls /usr/local/python27/ -al

drwxr-xr-x. 6 root root 4096 12月 15 14:22 .
drwxr-xr-x. 13 root root 4096 12月 15 14:20 ..
drwxr-xr-x. 2 root root 4096 12月 15 14:22 bin
drwxr-xr-x. 3 root root 4096 12月 15 14:21 include
drwxr-xr-x. 4 root root 4096 12月 15 14:22 lib
drwxr-xr-x. 3 root root 4096 12月 15 14:22 share


# replace the original Python 2.6 binary
$ which python
/usr/local/bin/python

$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/

$ python -V
Python 2.7.12

# point yum back at the old Python 2.6
$ vim /usr/bin/yum
# change the first line (shebang) to python2.6
#!/usr/bin/python2.6

$ yum --version | sed '2,$d'
3.2.29
Pip
$ pip --version
pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip

## installing pip offline
# download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0

$ python setup.py install

# download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install

Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
Processing dependencies for pip==9.0.1
Finished processing dependencies for pip==9.0.1

$ pip --version
pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)
VirtualEnv
$ pip install virtualenv

# virtualenv is shipped in Python 3 as pyvenv
$ virtualenv venv
$ . ./venv/bin/activate

## installing virtualenv offline
# download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
15.1.0

Graphite setup

Installation
# root@graphite-sit.yuzhouwan.com (192.168.1.102)
$ cd /opt
$ virtualenv -p /usr/local/bin/python --system-site-packages graphite
$ cd graphite
$ source bin/activate

$ pip install https://github.com/graphite-project/ceres/tarball/master (ceres-0.10.0rc1)
$ pip install whisper (whisper-0.9.15)

# trouble shooting
$ which python
/root/graphite/bin/python (in virtualenv, otherwise "/usr/local/bin/python")

$ ll /root/graphite/bin/whisper*py
-rwxr-xr-x 1 root root 2847 Jan 3 17:06 /root/graphite/bin/whisper-create.py
-rwxr-xr-x 1 root root 2208 Jan 3 17:06 /root/graphite/bin/whisper-diff.py
-rwxr-xr-x 1 root root 2912 Jan 3 17:06 /root/graphite/bin/whisper-dump.py
-rwxr-xr-x 1 root root 1790 Jan 3 17:06 /root/graphite/bin/whisper-fetch.py
-rwxr-xr-x 1 root root 4309 Jan 3 17:06 /root/graphite/bin/whisper-fill.py
-rwxr-xr-x 1 root root 1081 Jan 3 17:06 /root/graphite/bin/whisper-info.py
-rwxr-xr-x 1 root root 685 Jan 3 17:06 /root/graphite/bin/whisper-merge.py
-rwxr-xr-x 1 root root 5994 Jan 3 17:06 /root/graphite/bin/whisper-resize.py
-rwxr-xr-x 1 root root 929 Jan 3 17:06 /root/graphite/bin/whisper-set-aggregation-method.py
-rwxr-xr-x 1 root root 980 Jan 3 17:06 /root/graphite/bin/whisper-update.py


$ pip install carbon (carbon-0.9.15 constantly-15.1.0 incremental-16.10.1 twisted-16.6.0 txamqp-0.6.2 zope.interface-4.3.3)

# trouble shooting
$ ll /root/graphite/bin/carbon*py
-rwxr-xr-x 1 root root 1095 Jan 3 17:12 /root/graphite/bin/carbon-aggregator.py
-rwxr-xr-x 1 root root 1095 Jan 3 17:12 /root/graphite/bin/carbon-cache.py
-rwxr-xr-x 1 root root 4498 Jan 3 17:12 /root/graphite/bin/carbon-client.py
-rwxr-xr-x 1 root root 1095 Jan 3 17:12 /root/graphite/bin/carbon-relay.py

$ pip install graphite-web
$ pip install cairocffi

# pip freeze | grep graphite-web
# graphite-web==0.9.15
graphite configuration
$ cd /root/graphite/conf      (otherwise /opt/graphite/conf)

$ ls -sail
total 72
-rw-r--r-- 1 root root 1798 Jan 3 17:54 aggregation-rules.conf.example
-rw-r--r-- 1 root root 274 Jan 3 17:54 blacklist.conf.example
-rw-r--r-- 1 root root 2594 Jan 3 17:54 carbon.amqp.conf.example
-rw-r--r-- 1 root root 17809 Jan 3 17:54 carbon.conf.example
-rw-r--r-- 1 root root 888 Jan 3 17:54 relay-rules.conf.example
-rw-r--r-- 1 root root 558 Jan 3 17:54 rewrite-rules.conf.example
-rw-r--r-- 1 root root 827 Jan 3 17:54 storage-aggregation.conf.example
-rw-r--r-- 1 root root 489 Jan 3 17:54 storage-schemas.conf.example
-rw-r--r-- 1 root root 315 Jan 3 17:54 whitelist.conf.example

$ cp aggregation-rules.conf.example aggregation-rules.conf
$ cp blacklist.conf.example blacklist.conf
$ cp carbon.amqp.conf.example carbon.amqp.conf
$ cp carbon.conf.example carbon.conf
# following 3 conf files need to install graphite-web firstly
$ cp dashboard.conf.example dashboard.conf
$ cp graphite.wsgi.example graphite.wsgi
$ cp graphTemplates.conf.example graphTemplates.conf
#
$ cp relay-rules.conf.example relay-rules.conf
$ cp rewrite-rules.conf.example rewrite-rules.conf
$ cp storage-aggregation.conf.example storage-aggregation.conf
$ cp storage-schemas.conf.example storage-schemas.conf
$ cp whitelist.conf.example whitelist.conf

$ /root/graphite/bin/carbon-cache.py start

Starting carbon-cache (instance a)

# trouble shooting
$ ps -ef | grep carbon
root 12074 1 0 18:58 ? 00:00:00 /root/graphite/bin/python /root/graphite/bin/carbon-cache.py start

$ vim /root/graphite/conf/carbon.conf
# in carbon.conf, under the [cache] section, the receiver port line carries a default used to accept incoming metrics over the plaintext protocol
[cache]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003

$ yum install nc -y
# echo "<metric path> <metric value> <metric timestamp>" | nc -q0 ${SERVER} ${PORT}
$ echo "carbon.agents.graphite-tutorial.metricsReceived 28198 `date +%s`" | nc -c localhost 2003

# Carbon hands the time series to Whisper for storage on the filesystem; the whisper-info script reads back the metadata of the Whisper file created for a metric
$ /root/graphite/bin/whisper-info.py /root/graphite/storage/whisper/carbon/agents/graphite-tutorial/metricsReceived.wsp
graphite web application
# dependency
# pip install Django==1.9.12 causes a 'WSGIRequest' object has no attribute 'REQUEST' error
$ pip install django==1.8.17
$ pip install django-tagging

# configure
$ cd /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite
$ cp local_settings.py.example local_settings.py
# create the sqlite3 database, grant read/write permissions, and edit local_settings.py
# see Question 3 below for details


$ cd /root/graphite/conf
$ cp dashboard.conf.example dashboard.conf
$ cp graphTemplates.conf.example graphTemplates.conf

# init database
$ cd /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/
$ python /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/manage.py syncdb

Would you like to create one now? (yes/no): yes
Username (leave blank to use 'root'): graphite
Email address: bj@yuzhouwan.com
Password:
Password (again):
Superuser created successfully.

# start
$ mkdir -p /root/graphite/storage/log/webapp/
$ echo '' > /root/graphite/storage/log/webapp/process.log
$ cd /root/graphite
$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=8085 --libs=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &
# or: python /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/manage.py runserver 0.0.0.0:8085

# trouble shooting
$ tail -f /root/graphite/storage/log/webapp/process.log
http://192.168.1.102:8085/
graphite events
# PYTHONPATH=$GRAPHITE_ROOT/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb
$ PYTHONPATH=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

Operations to perform:
Synchronize unmigrated apps: account, cli, render, whitelist, metrics, url_shortener, dashboard, composer, events, browser
Apply all migrations: admin, contenttypes, tagging, auth, sessions
Synchronizing apps without migrations:
Creating tables...
Running deferred SQL...
Running migrations:
Rendering model states... DONE
Applying admin.0002_logentry_remove_auto_add... OK
Applying auth.0007_alter_validators_add_error_messages... OK


$ curl -X POST "http://10.10.10.2:8085/events/" -d '{ "what": "Event - deploy", "tags": ["deploy"], "when": 1467844481, "data": "deploy of master branch happened at Wed Jul 6 22:34:41 UTC 2016" }'

# trouble shooting
# http://10.10.10.2:8085/events/ then lists the event:
# when                      what             tags
# 22:34:41 Wed 06 Jul 2016  Event - deploy   [u'deploy']


$ curl -s "http://10.10.10.2:8085/render/?target=events('exception')&format=json" | json_pp
[
{
"target" : "events(exception)",
"datapoints" : [
[
1, 1388966651
],
[
3, 1388966652
]
]
}
]
graphite-index
# douban new UI for graphite
$ git clone https://github.com/douban/graph-index.git
$ cd graph-index

$ vim config.py
graphite_url = 'http://192.168.1.101:9097'

$ crontab -e
*/5 * * * * python /root/software/graphite-index

$ python /graph-index.py

Integrating with Druid

Migrating to the intranet environment
# 192.168.1.102 to 10.10.10.2 (sit)

# ps -ef | grep graphite # stop all the processes first
# use rsync instead of scp so that symlinks are copied as well (note: packing with tar zcvf does not solve this either)
$ rsync -avuz -e ssh /root/graphite root@10.10.10.2:/root

# 192.168.1.102 to 192.168.2.101 to 192.168.1.101 (product)
$ rsync -avuz -e ssh /root/graphite jinjy@192.168.2.101:/home/jinjy
$ rsync -avuz -e ssh /home/jinjy/graphite root@192.168.1.101:/root

# default: --port=8085
$ /root/graphite/bin/carbon-cache.py start
$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=9097 --libs=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &

# trouble shooting
$ ps -ef | grep graphite
root 30754 1 0 15:42 ? 00:00:00 /root/graphite/bin/python /root/graphite/bin/carbon-cache.py start
root 30825 28048 3 15:43 pts/1 00:00:00 /root/graphite/bin/python /root/graphite/bin/django-admin runserver --pythonpath /root/graphite/webapp --settings graphite.settings 0.0.0.0:9097
root 30829 30825 5 15:43 pts/1 00:00:00 /root/graphite/bin/python /root/graphite/bin/django-admin runserver --pythonpath /root/graphite/webapp --settings graphite.settings 0.0.0.0:9097

$ cd /root/graphite/storage/log/carbon-cache/carbon-cache-a
tail -f console.log creates.log listener.log # carbon logs for received events
tail -f /root/graphite/storage/log/webapp/process.log

http://192.168.1.101:9097/


# virtualenv
$ rsync -avuz -e ssh /root/software jinjy@192.168.1.101:/home/jinjy
$ rsync -avuz -e ssh /home/jinjy/software/Python-2.7.12.tgz root@192.168.1.101:/root/software

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make -j4 && make -j4 install
$ /sbin/ldconfig -v | grep /
$ python -V
Python 2.7.12


# the symlinks came over via rsync, but the target machine lacks the corresponding Python shared libraries in the relevant directory
$ file /root/graphite/lib/python2.7/lib-dynload
/root/graphite/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload'


# must match the global Python environment used when the virtualenv was created on the connected machine
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make -j4 && make -j4 install
$ /sbin/ldconfig -v | grep /

$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
Updating the Druid configuration
$ sudo su druid
$ cd /home/druid/software/druid

$ find | grep common.runtime.properties | grep -v quickstart | grep -v dist
./conf/druid/_common/common.runtime.properties

$ cp /home/druid/software/druid/conf/druid/_common/common.runtime.properties /home/druid/software/druid/conf/druid/_common/common.runtime.properties.bak
$ vim /home/druid/software/druid/conf/druid/_common/common.runtime.properties

# module
druid.extensions.loadList=[..., "graphite-emitter"]
#
# Monitoring
#
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.emitter=http
#druid.emitter=logging
druid.emitter.logging.logLevel=info
druid.emitter.http.recipientBaseUrl=http://10.37.2.142:9999/metrics

# monitor
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.emitter=composing
druid.emitter.composing.emitters=["graphite", "logging"]
druid.emitter.graphite.hostname=localhost
# mind the port: it is not 2003 (i.e., not LINE_RECEIVER_PORT in /root/graphite/conf/carbon.conf, but PICKLE_RECEIVER_PORT)
druid.emitter.graphite.port=2004
# druid.emitter.graphite.eventConverter={"type":"whiteList", "namespacePrefix": "cluster_x", "ignoreHostname":true, "ignoreServiceName":false, "mapFile":"/a/b/c"}
druid.emitter.graphite.eventConverter={"ingest/events/thrownAway":["dataSource"],"ingest/events/unparseable":["dataSource"],"ingest/events/processed":["dataSource"],"ingest/handoff/failed":["dataSource"],"ingest/persists":[],"ingest/rows/output":[],"jvm/gc":[],"jvm/mem":[],"query/cpu/time":["dataSource","type"],"query/node/time":["dataSource","type"],"query/node/ttfb":["dataSource","type"],"query/partial/time":["dataSource","type"],"query/segment/time":["dataSource","type"],"query/segmentAndCache/time":["dataSource","type"],"query/time":["dataSource","type"],"query/wait/time":["dataSource","type"],"segment/count":[],"segment/dropQueue/count":[],"segment/loadQueue/count":[],"segment/loadQueue/failed":[],"segment/loadQueue/size":[],"segment/scan/pending":[],"segment/size":[],"segment/usedPercent":[]}
druid.emitter.logging.logLevel=info


druid.emitter.graphite.eventConverter={"type":"all", "namespacePrefix": "druid", "ignoreHostname": false, "ignoreServiceName": false}
## pretty format start ##
{
"ingest/events/thrownAway": ["dataSource"],
"ingest/events/unparseable": ["dataSource"],
"ingest/events/processed": ["dataSource"],
"ingest/handoff/failed": ["dataSource"],
"ingest/persists": [],
"ingest/rows/output": [],
"jvm/gc": [],
"jvm/mem": [],
"query/cpu/time": [
"dataSource",
"type"
],
"query/node/time": [
"dataSource",
"type"
],
"query/node/ttfb": [
"dataSource",
"type"
],
"query/partial/time": [
"dataSource",
"type"
],
"query/segment/time": [
"dataSource",
"type"
],
"query/segmentAndCache/time": [
"dataSource",
"type"
],
"query/time": [
"dataSource",
"type"
],
"query/wait/time": [
"dataSource",
"type"
],
"segment/count": [],
"segment/dropQueue/count": [],
"segment/loadQueue/count": [],
"segment/loadQueue/failed": [],
"segment/loadQueue/size": [],
"segment/scan/pending": [],
"segment/size": [],
"segment/usedPercent": []
}
## pretty format end ##


# kill the historical process so the new config takes effect
$ jps -m
1867 Main server historical
26339 Main server middleManager

$ kill 1867
Verification
# trouble shooting
$ tail -f var/sv/supervise.log
# you should see:
[Thu Jan 5 11:18:17 2017] Running command[historical], logging to[/home/druid/software/imply-2.0.0/var/sv/historical.log]: bin/run-druid historical conf
[Thu Jan 5 11:18:21 2017] Command[historical] exited (pid = 1752, exited = 1)
[Thu Jan 5 11:18:21 2017] Command[historical] failed, see logfile for more details: /home/druid/software/imply-2.0.0/var/sv/historical.log


$ tail -f /home/druid/software/imply-2.0.0/var/sv/historical.log
2017-01-05T11:34:29,203 INFO [GraphiteEmitter-1] io.druid.emitter.graphite.GraphiteEmitter - trying to connect to graphite server
# if it cannot connect, it logs: ERROR [GraphiteEmitter-1] io.druid.emitter.graphite.GraphiteEmitter - Connection refused
# in that case, check whether the graphite processes are healthy

Tuning the configuration

Dependencies
Django
$ pip freeze | grep Django
Django==1.8

$ pip install --upgrade Django
Successfully installed Django-1.10.5

$ pip uninstall Django
$ pip install Django==1.8.17 # newer versions cause the 'WSGIRequest' object has no attribute 'REQUEST' error
Graphite-Web
$ vim requirements.txt

python-memcached==1.47
txAMQP==0.4
simplejson==2.1.6
django-tagging==0.4.3
gunicorn
pytz
pyparsing==1.5.7
cairocffi
whitenoise


$ pip install -r requirements.txt
Collection
$ vim /root/graphite/conf/storage-schemas.conf

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default_1min_for_1day]
pattern = .*
# retentions = 60s:1d
# switch to 3 retention granularities
retentions = 10s:6h,1m:7d,10m:1y
Monitoring
$ python /root/graphite/examples/example-client.py

sending message

-----------------------------------------------------
system.loadavg_1min 0.26 1483690449
system.loadavg_5min 0.30 1483690449
system.loadavg_15min 0.35 1483690449
Startup commands at a glance
$ python /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/manage.py syncdb
$ PYTHONPATH=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp django-admin.py migrate --settings=graphite.settings --run-syncdb

$ /root/graphite/bin/carbon-cache.py start
$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=9097 --libs=/root/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &

Pitfalls

ImportError: No module named carbon.util
Description
(graphite) [root@graphite-sit.yuzhouwan.com conf]# /root/graphite/bin/carbon-cache.py start
Traceback (most recent call last):
File "/root/graphite/bin/carbon-cache.py", line 28, in <module>
from carbon.util import run_twistd_plugin
ImportError: No module named carbon.util
Solution
  • Was the carbon installation incomplete? — not ok
$ pip freeze

carbon==0.9.15
ceres==0.10.0rc1
constantly==15.1.0
incremental==16.10.1
Twisted==16.6.0
txAMQP==0.6.2
whisper==0.9.15
zope.interface==4.3.3
  • graphite's default prefix (/opt/graphite) — ok
$ mv /root/graphite/lib/python2.7/site-packages/opt/graphite/lib/carbon /root/graphite/lib/python2.7/site-packages/
$ mv /root/graphite/lib/python2.7/site-packages/opt/graphite/lib/twisted/plugins/carbon_* /root/graphite/lib/python2.7/site-packages/twisted/plugins/
django.db.utils.OperationalError: unable to open database file
Description
$ python manage.py syncdb

/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/settings.py:246: UserWarning: SECRET_KEY is set to an unsafe default. This should be set in local_settings.py for better security
warn('SECRET_KEY is set to an unsafe default. This should be set in local_settings.py for better security')
Traceback (most recent call last):
File "manage.py", line 13, in <module>
execute_from_command_line(sys.argv)
File "/root/graphite/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
utility.execute()
// ...
django.db.utils.OperationalError: unable to open database file
Solution
  • change the default SECRET_KEY in settings.py — ok
$ vim /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/settings.py

# Django 1.5 requires this so we set a default but warn the user
# SECRET_KEY = 'UNSAFE_DEFAULT'
SECRET_KEY = 'graphite'
  • change DATABASE_NAME in sqlite
$ mkdir /root/graphite/sqlite
$ cd /root/graphite/sqlite

# create database
$ sqlite3 graphite.db
$ sqlite3
sqlite>.help
sqlite>.databases
seq name file
--- --------------- ------------------------------------------
0 main /root/graphite/sqlite/graphite.db

Ctrl + D (exit, as in python)

# change DATABASE_NAME --not ok
DATABASE_NAME='/root/graphite/sqlite/graphite.db'
echo $DATABASE_NAME

# run 'python manage.py syncdb' again, then the graphite database disappeared
  • modify settings.py for sqlite database —ok
$ cd /root/graphite/storage
$ mkdir db
$ cd db
$ sqlite3 graphite.db
$ vim /root/graphite/lib/python2.7/site-packages/django/conf/project_template/project_name/settings.py

# GRAPHITE_STORAGE_DIR = '/root/graphite/sqlite/graphite.db' --not ok

#DATABASES = {
# 'default': {
# 'ENGINE': 'django.db.backends.sqlite3',
# 'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
# }
#}
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
'NAME': '/root/graphite/storage/db/graphite.db', # Or path to database file if using sqlite3.
'USER': '', # Not used with sqlite3.
'PASSWORD': '', # Not used with sqlite3.
'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
'PORT': '', # Set to empty string for default. Not used with sqlite3.
}
}

# trouble shooting
sqlite3 /root/graphite/storage/db/graphite.db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .databases
seq name file
--- --------------- --------------------------------------
0 main /root/graphite/storage/db/graphite.db


$ cd /root/graphite/
$ find | grep /settings.py | grep -v pyc
./lib/python2.7/site-packages/opt/graphite/webapp/graphite/settings.py
./lib/python2.7/site-packages/tagging/tests/settings.py
./lib/python2.7/site-packages/tagging/settings.py
./lib/python2.7/site-packages/django/conf/project_template/project_name/settings.py

# once all of them are updated, the problem is fixed
  • django version too low (<= v1.4) —no
$ django-admin version
1.8
  • File access permissions — not ok
$ cut -d: -f1 /etc/passwd | grep graphite
$ echo $USER
root

$ cd /root/graphite/storage/db
$ sudo chown root:root graphite.db
$ sudo chmod o+rw graphite.db
$ sudo chmod o+rwx db/
$ sudo chmod o+rwx ../webapp/
ImportError: No module named graphite.settings
Description
$ ./bin/run-graphite-devel-server.py --port=8085 --libs=/root/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &

tail: /root/graphite/storage/log/webapp/process.log: file truncated
Traceback (most recent call last):
File "/root/graphite/bin/django-admin", line 11, in <module>
sys.exit(execute_from_command_line())
File "/root/graphite/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
utility.execute()
File "/root/graphite/lib/python2.7/site-packages/django/core/management/__init__.py", line 303, in execute
settings.INSTALLED_APPS
File "/root/graphite/lib/python2.7/site-packages/django/conf/__init__.py", line 48, in __getattr__
self._setup(name)
File "/root/graphite/lib/python2.7/site-packages/django/conf/__init__.py", line 44, in _setup
self._wrapped = Settings(settings_module)
File "/root/graphite/lib/python2.7/site-packages/django/conf/__init__.py", line 92, in __init__
mod = importlib.import_module(self.SETTINGS_MODULE)
File "/usr/local/python27/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
ImportError: No module named graphite.settings
Solution
  • Set PYTHONPATH — ok
$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=8085 --libs=/root/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &

# new problem
ImportError: Cannot import either sping or piddle.

$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=8085 --libs=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &
  • Edit local_settings.py — not ok
$ vim /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/local_settings.py

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
'NAME': '/root/graphite/storage/db/graphite.db', # Or path to database file if using sqlite3.
'USER': '', # Not used with sqlite3.
'PASSWORD': '', # Not used with sqlite3.
'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
'PORT': '', # Set to empty string for default. Not used with sqlite3.
}
}
IOError: [Errno 2] No such file or directory: '/root/graphite/lib/python2.7/site-packages/opt/graphite/storage/log/webapp/info.log'
Description
# visit http://192.168.1.102:8085/

Traceback (most recent call last):
File "/root/graphite/lib/python2.7/site-packages/django/core/handlers/base.py", line 119, in get_response
resolver_match = resolver.resolve(request.path_info)
// ...
File "/usr/local/python27/lib/python2.7/logging/__init__.py", line 943, in _open
stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/root/graphite/lib/python2.7/site-packages/opt/graphite/storage/log/webapp/info.log'
Solution
  • Create the info.log file — ok
$ mkdir -p /root/graphite/lib/python2.7/site-packages/opt/graphite/storage/log/webapp/
$ echo '' > /root/graphite/lib/python2.7/site-packages/opt/graphite/storage/log/webapp/info.log
No event data on the Graphite Web page
Description
# visit http://192.168.1.101:9097/events/

# on the Druid side, the historical process is indeed producing data and has connected to graphite successfully
$ tail -f /home/druid/software/imply-2.0.0/var/sv/historical.log

2017-01-05T11:34:29,203 INFO [GraphiteEmitter-1] io.druid.emitter.graphite.GraphiteEmitter - trying to connect to graphite server
# if it cannot connect, it logs: ERROR [GraphiteEmitter-1] io.druid.emitter.graphite.GraphiteEmitter - Connection refused
# in that case, check whether the graphite processes are healthy


# on the Graphite side, the data is arriving as well
$ cd /root/graphite/storage/log/carbon-cache/carbon-cache-a
$ tail -f console.log creates.log listener.log

05/01/2017 20:05:18 :: Sorted 75 cache queues in 0.000208 seconds
# if the data is malformed, you get: 05/01/2017 20:32:32 :: invalid line ((L1483619493L) received from client 10.10.10.1:41752, ignoring
# in that case, check the druid emitter configuration
Solution
  • Is SQLite failing to store the events?
# the graphite config itself is fine
$ vim /root/graphite/lib/python2.7/site-packages/django/conf/project_template/project_name/settings.py

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
'NAME': '/root/graphite/storage/db/graphite.db', # Or path to database file if using sqlite3.
'USER': '', # Not used with sqlite3.
'PASSWORD': '', # Not used with sqlite3.
'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
'PORT': '', # Set to empty string for default. Not used with sqlite3.
}
}

# it turns out sqlite is not recording the events at all
(graphite) [root@kylin03-pre db]# sqlite3 /root/graphite/storage/db/graphite.db
SQLite version 3.6.20
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .databases
seq name file
--- --------------- ----------------------------------------
0 main /root/graphite/storage/db/graphite.db
sqlite> .tables
account_mygraph dashboard_dashboard
account_profile dashboard_dashboard_owners
account_variable django_admin_log
account_view django_content_type
account_window django_migrations
auth_group django_session
auth_group_permissions events_event
auth_permission tagging_tag
auth_user tagging_taggeditem
auth_user_groups url_shortener_link
auth_user_user_permissions
sqlite> select * from auth_user;
1|pbkdf2_sha256$20000$oEgzveEmcg9B$8xbilUymXlwVBAaB48xpUQwsfIucmeP/4C4YF3U6SlI=|1|graphite|||bj@yuzhouwan.com|1|1|2017-01-04 05:59:10.615950|2017-01-05 08:24:54.957631
2|pbkdf2_sha256$20000$gG1lK6FNg0h7$dXH47Wqc+Gj/qTyI6EKOajd+Pj1kKN+U5CtnmDo0K/0=|0|default|||default@localhost.localdomain|0|1|2017-01-04 06:53:34.687401|
3|pbkdf2_sha256$20000$fcQ5sYbw0cjk$anjZc4J0eRE51HGJ6D50c0c9+d08iY7lhWseke9RmEY=|0|druid||||0|1|2017-01-05 09:03:48.696161|
sqlite> select * from events_event; # no data!

# try replacing SQLite with MySQL
# 192.168.1.102
$ mkdir -p /root/software/mysql
yum install -y --downloadonly --downloaddir=/root/software/mysql mysql
yum install -y --downloadonly --downloaddir=/root/software/mysql mysql-server
yum install -y --downloadonly --downloaddir=/root/software/mysql MySQL-python
# 192.168.1.101
yum install -y mysql mysql-server MySQL-python

$ cd /root/software/mysql
$ wget http://dev.mysql.com/get/mysql57-community-release-el5-7.noarch.rpm
$ yum localinstall mysql57-community-release-el5-7.noarch.rpm
$ yum repolist enabled | grep "mysql.*-community.*"
$ yum install mysql-community-server
$ vim /usr/bin/yum-config-manager
#!/usr/bin/python2.6 -tt

$ yum-config-manager --enable mysql57-community
$ service mysqld start

$ mysql -uroot -p -S /home/mysql/data/mysql.sock

# for a proper deployment later, create a graphite user and grant it privileges
CREATE DATABASE graphite;
# GRANT ALL PRIVILEGES ON graphite.* TO 'graphite'@'localhost' IDENTIFIED BY 'sysadmin';
GRANT ALL PRIVILEGES ON graphite.* TO 'root'@'localhost' IDENTIFIED BY 'sysadmin';
FLUSH PRIVILEGES;

$ vim /root/graphite/lib/python2.7/site-packages/django/conf/project_template/project_name/settings.py

DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
# 'NAME': 'jdbc:mysql://192.168.1.101:3306/graphite',
'NAME': 'graphite',
'USER': 'root',
# 'HOST': 'localhost',
'PASSWORD': 'root'
}
}
# TIME_ZONE = 'UTC'
TIME_ZONE = 'Asia/Shanghai'
# DEBUG = False
DEBUG = True

$ cd /root/graphite/
$ find | grep /settings.py | grep -v pyc
$ vim /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/settings.py
$ vim /root/graphite/lib/python2.7/site-packages/tagging/tests/settings.py
$ vim /root/graphite/lib/python2.7/site-packages/tagging/settings.py
# ./lib/python2.7/site-packages/django/conf/project_template/project_name/settings.py

# once all of them are updated, the problem is fixed

$ python /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/manage.py syncdb
# to add another superuser (admin/admin), use the following command
# echo "from django.contrib.auth.models import User; User.objects.create_superuser('admin', 'admin@hihuron.com', 'sysadmin')" | python /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp/graphite/manage.py shell

$ /root/graphite/bin/carbon-cache.py start
$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=9097 --libs=/root/graphite/lib/python2.7/site-packages/opt/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &

$ cd /root/graphite/webapp
$ cp -r content/ /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp
$ cd /root/graphite/lib/python2.7/site-packages/opt/graphite/webapp
$ cp -r graphite/ /root/graphite/webapp

$ PYTHONPATH=/root/graphite/storage/whisper /root/graphite/bin/run-graphite-devel-server.py --port=9097 --libs=/root/graphite/webapp /root/graphite 1>/root/graphite/storage/log/webapp/process.log 2>&1 &
ImportError: No module named twisted.python.util
Description
$ python carbon-cache.py start
Traceback (most recent call last):
File "carbon-cache.py", line 28, in <module>
from carbon.util import run_twistd_plugin
File "/opt/graphite/lib/carbon/util.py", line 20, in <module>
from twisted.python.util import initgroups
ImportError: No module named twisted.python.util
Solution
# pip freeze | grep zope.interface # install it if missing
# pip install zope.interface==3.6.0
$ wget https://pypi.python.org/packages/source/T/Twisted/Twisted-14.0.0.tar.bz2#md5=9625c094e0a18da77faa4627b98c9815 --no-check-certificate
$ tar -jxf Twisted-14.0.0.tar.bz2
$ cd Twisted-14.0.0;
$ python setup.py install
'WSGIRequest' object has no attribute 'REQUEST'
Description
# visit http://192.168.1.102:9097/

AttributeError at /render/
'WSGIRequest' object has no attribute 'REQUEST'
Request Method: GET
Request URL: http://192.168.1.102:9097/render/?width=586&height=308&_salt=1483685265.903
Django Version: 1.9.12
Exception Type: AttributeError
Exception Value:
'WSGIRequest' object has no attribute 'REQUEST'
Exception Location: /root/graphite/webapp/graphite/render/views.py in parseOptions, line 236
Python Executable: /root/graphite/bin/python
Python Version: 2.7.12
Python Path:
['/root/graphite/webapp',
'/root/graphite/webapp',
'/root/graphite/webapp',
'/root/graphite/bin',
'/root/graphite/webapp',
'/root/graphite/storage/whisper',
'/root/graphite/lib/python27.zip',
'/root/graphite/lib/python2.7',
'/root/graphite/lib/python2.7/plat-linux2',
'/root/graphite/lib/python2.7/lib-tk',
'/root/graphite/lib/python2.7/lib-old',
'/root/graphite/lib/python2.7/lib-dynload',
'/usr/local/python27/lib/python2.7',
'/usr/local/python27/lib/python2.7/plat-linux2',
'/usr/local/python27/lib/python2.7/lib-tk',
'/root/graphite/lib/python2.7/site-packages',
'/root/graphite/lib/python2.7/site-packages/graphite-0.71-py2.7.egg',
'/root/graphite/lib/python2.7/site-packages/spring-5.8.7-py2.7-linux-x86_64.egg',
'/root/graphite/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-linux-x86_64.egg',
'/root/graphite/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg',
'/root/graphite/lib/python2.7/site-packages/numpy-1.12.0rc2-py2.7-linux-x86_64.egg',
'/root/graphite/lib/python2.7/site-packages/logger-1.4-py2.7.egg',
'/root/graphite/lib/python2.7/site-packages/decorator-4.0.10-py2.7.egg',
'/root/graphite/lib/python2.7/site-packages/sping-1.1.15-py2.5.egg',
'/usr/local/python27/lib/python2.7/site-packages',
'/root/graphite/webapp/graphite/thirdparty']
Server time: Fri, 6 Jan 2017 14:47:46 +0800
Solution
# possibly a Django version mismatch?

django==1.10.5 --no
django==1.9.12 --no
django==1.8.17 --ok


Apache Superset

 To keep this post a reasonable length, that topic has its own write-up; see Apache Superset 二次开发 (customizing Apache Superset).

Pitfalls

true / false stored as a dimension becomes NULL

Solution

 Druid itself cannot use boolean true / false values as a dimension; consider storing the strings "true" / "false" instead.
 However, if your custom bean has a field like String isTimeout = "false", you cannot simply call JSON.toJSONString on it: toJSONString recognizes the "true"/"false" strings and automatically converts them to booleans. Put all the fields into a Map<String, Object> first, and then call JSON.toJSONString on the map.

$ bin/plyql --host localhost:8082 -q "select * from log"
┌─────────────────────────────────────────┬───────┬───────────┬─────┬─────┬──────┬──────────────────────────────────────┐
│ __time │ count │ isTimeout │ max │ min │ sum │ uuid │
├─────────────────────────────────────────┼───────┼───────────┼─────┼─────┼──────┼──────────────────────────────────────┤
│ Wed Aug 02 2017 17:35:00 GMT+0800 (CST) │ 4 │ NULL │ 860 │ 860 │ 3440 │ 4621a23d-8270-4bc3-948a-f577b460d72b │
│ Wed Aug 02 2017 17:42:00 GMT+0800 (CST) │ 1 │ NULL │ 860 │ 860 │ 860 │ 4621a23d-8270-4bc3-948a-f577b460d72b │
│ Wed Aug 02 2017 17:44:00 GMT+0800 (CST) │ 1 │ NULL │ 860 │ 860 │ 860 │ 4621a23d-8270-4bc3-948a-f577b460d72b │
│ Wed Aug 02 2017 18:03:00 GMT+0800 (CST) │ 3 │ NULL │ 0 │ 0 │ 0 │ 85f030bd-d737-4863-9af1-e6fd8bd3b15c │
│ Wed Aug 02 2017 19:01:24 GMT+0800 (CST) │ 2 │ NULL │ 0 │ 0 │ 0 │ 85f030bd-d737-4863-9af1-e6fd8bd3b15c │
│ Wed Aug 02 2017 19:09:49 GMT+0800 (CST) │ 1 │ false │ 0 │ 0 │ 0 │ ba11de00-7faf-4eaf-a8ea-1cf3c5033de5 │
└─────────────────────────────────────────┴───────┴───────────┴─────┴─────┴──────┴──────────────────────────────────────┘
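 A minimal sketch of the Map-based workaround described above (fastjson; the class and field names are illustrative):

import com.alibaba.fastjson.JSON;
import java.util.HashMap;
import java.util.Map;

public class LogEvent {

    private String uuid = "ba11de00-7faf-4eaf-a8ea-1cf3c5033de5";
    private String isTimeout = "false";   // must reach Druid as the string "false"

    public String toJson() {
        // copy every field into a Map before serializing,
        // so that "false" stays a string instead of becoming a boolean
        Map<String, Object> fields = new HashMap<String, Object>();
        fields.put("uuid", uuid);
        fields.put("isTimeout", isTimeout);
        return JSON.toJSONString(fields);
    }
}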

Pool was initialized with limit = 0

Description

 Executing a RESTful query fails with:

{
"error": "Unknown exception",
"errorClass": "java.lang.IllegalStateException",
"errorMessage": "Pool was initialized with limit = 0, there are no objects to take.",
"host": "druid01:8101"
}

Solution

 Check whether the Broker, Historical, and MiddleManager have all been configured with the druid.processing.numMergeBuffers parameter:

$ cd /home/druid/software/druid/conf/druid
$ cat broker/runtime.properties historical/runtime.properties middleManager/runtime.properties | grep numMergeBuffers
druid.processing.numMergeBuffers=4
druid.processing.numMergeBuffers=4
druid.processing.numMergeBuffers=4

No buckets?? seems there is no data to index

Solution

 Apart from the case where HDFS really contains no data, this can also happen when the Hadoop job's TimeZone is not set correctly:

"mapreduce.map.java.opts":"-Duser.timezone=UTC -Dfile.encoding=UTF-8"
"mapreduce.reduce.java.opts":"-Duser.timezone=UTC -Dfile.encoding=UTF-8"


hyperUniqueCardinality outputs floats, so having cannot filter with equalTo

Description

 In the query below, the approximate hyperUniqueCardinality makes alias-6 a floating-point value (e.g. 10.0000213), so having cannot filter it exactly with equalTo (=10):

{
"jsonClass": "GroupByQuerySpec",
"queryType": "groupBy",
"dataSource": "yuzhouwan",
"dimensions": [{
"jsonClass": "ExtractionDimensionSpec",
"type": "extraction",
"dimension": "__time",
"outputName": "alias-4",
"extractionFn": {
"jsonClass": "TimeFormatExtractionFunctionSpec",
"type": "timeFormat",
"format": "yyyy-MM-dd",
"timeZone": "Asia/Shanghai",
"locale": "en_US"
}
}],
"having": {
"jsonClass": "ComparisonHavingSpec",
"type": "greaterThan",
"aggregation": "alias-6",
"value": 10
},
"granularity": "all",
"aggregations": [{
"type": "hyperUnique",
"name": "alias-5",
"fieldName": "__HLL_booksnumber1"
}],
"postAggregations": [{
"jsonClass": "HyperUniqueCardinalityPostAggregationSpec",
"type": "hyperUniqueCardinality",
"fieldName": "alias-5",
"name": "alias-6"
}],
"intervals": ["2018-08-08T00:00:00.000+08:00/2018-08-18T00:00:00.000+08:00"],
"context": {
"queryId": "yuzhouwan-127.0.0.1-3395157377882475",
"groupByStrategy": "v2"
}
}

Solution

 Add a JavaScript post-aggregator to postAggregations and round the result with Math.round:

{
"jsonClass": "GroupByQuerySpec",
"queryType": "groupBy",
"dataSource": "yuzhouwan",
"dimensions": [{
"jsonClass": "ExtractionDimensionSpec",
"type": "extraction",
"dimension": "__time",
"outputName": "alias-4",
"extractionFn": {
"jsonClass": "TimeFormatExtractionFunctionSpec",
"type": "timeFormat",
"format": "yyyy-MM-dd",
"timeZone": "Asia/Shanghai",
"locale": "en_US"
}
}],
"having": {
"jsonClass": "ComparisonHavingSpec",
"type": "equalTo",
"aggregation": "alias-6",
"value": 10
},
"granularity": "all",
"aggregations": [{
"type": "hyperUnique",
"name": "alias-5",
"fieldName": "__HLL_booksnumber1"
}],
"postAggregations": [{
"jsonClass": "HyperUniqueCardinalityPostAggregationSpec",
"type": "hyperUniqueCardinality",
"fieldName": "alias-5",
"name": "alias6"
}, {
"type": "javascript",
"name": "alias-6",
"fieldNames": ["alias6"],
"function": "function(alias6) { return Math.round(alias6); }"
}],
"intervals": ["2018-08-08T00:00:00.000+08:00/2018-08-16T00:00:00.000+08:00"],
"context": {
"queryId": "yuzhouwan-127.0.0.1-3395157377882475",
"groupByStrategy": "v2"
}
}

Community Involvement

 See: How to Become an ASF Committer (如何成为 ASF 的 Committer)

