实用技巧

List

1
2
3
4
5
6
7
List(1, 9, 2, 4, 5) span (_ < 3)       // (List(1), List(9, 2, 4, 5))  碰到不符合就结束

List(1, 9, 2, 4, 5) partition (_ < 3) // (List(1, 2), List(9, 4, 5)) 扫描所有

List(1, 9, 2, 4, 5) splitAt 2 // (List(1, 9), List(2, 4, 5)) 以下标为分割点

List(1, 9, 2, 4, 5) groupBy (5 < _) // Map(false -> List(1, 2, 4, 5), true -> List(9)) 分割成 Map 对象,以 Boolean 类型为 Key

Iterator

grouped

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
import scala.collection.{AbstractIterator, mutable}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.BigquerySparkSession._

val conf = new SparkConf()
val builder = SparkSession.builder().config(conf).enableHiveSupport()
val spark = builder.getOrCreateBigquerySparkSession()
val df = spark.sql("use db; select * from table")

val dataset = df.rdd.mapPartitions(iter => {

// 将每个 partition 中的多行数据,以 100 为长度作为一组,进行一次批处理
iter.grouped(100)
.flatMap(rows => {
val records = new mutable.MutableList[String]()
rows.foreach(row => records.add(JSON.toJSONString(row, false)))
records
})
})

val filteredEmptyLine = dataset
.filter(_ != null)
.map(JSON.toJSONString(_, false))
.filter(_.trim.length != 0)
阅读全文 »

Nginx 是什么?

Nginx™ [engine x] is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server

环境搭建

下载

 在 Nginx Archive 下载页面,下载 nginx-1.13.12.tar.gz 安装包

安装依赖

1
2
$ yum -y install openssl openssl-devel
$ yum -y install pcre-devel

编译安装

1
2
3
4
5
$ tar zxvf nginx-1.13.12.tar.gz
# 必须要跳转到 nginx 安装目录下
$ cd nginx-1.13.12
$ ./configure --prefix=/usr/local/nginx --conf-path=/usr/local/nginx/nginx.conf
$ make -j4 && make -j4 install

启动

1
2
$ cd /usr/local/nginx/
$ sbin/nginx -c /usr/local/nginx/nginx.conf
1
$ ps -ef | grep nginx
1
2
3
4
root     107034      1  0 Oct31 ?        00:00:00 nginx: master process sbin/nginx
nobody 107036 107034 0 Oct31 ? 00:00:00 nginx: worker process
nobody 107266 107265 0 Oct31 ? 00:00:00 tsar --check --apache --cpu --mem --load --io --traffic --tcp --partition --nginx --swap
root 107270 97588 0 Oct31 pts/1 00:00:00 grep nginx
阅读全文 »

Node.js 是什么?

 Node.js® is a platform built on Chrome’s JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices.

Current Version: v0.10.33

为什么要有 Node 模块?

 模块,是 Node 让代码易于重用的一种组织和包装方式

阅读全文 »