$ cd /opt
$ wget http://ftp.tc.edu.tw/pub/Apache/kylin/apache-kylin-1.6.0/apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ tar xf apache-kylin-1.6.0-hbase1.x-bin.tar.gz
$ vim /etc/profile
export KYLIN_HOME=/opt/apache-kylin-1.6.0-hbase1.x-bin
$ source /etc/profile
(二)环境检查
$cd /opt/apache-kylin-1.6.0-hbase1.x-bin
$./bin/check-env.sh
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-binmkdir: Permission denied: user=root, access=WRITE, inode="/kylin":hdfs:hdfs:drwxr-xr-xfailed to create /kylin, Please make sure the user has right to access /kylin
#提示使用hdfs用户
#check-env.sh脚本执行的是检查本地hive,hbase,hadoop等环境情况。
#并会在hdfs中创建一个kylin的工作目录。
$ su hdfs
$ ./bin/check-env.sh
KYLIN_HOME is set to /opt/apache-kylin-1.6.0-hbase1.x-bin
$ hadoop fs -ls / #多了一个/kylin的目录drwxr-xr-x - hdfs hdfs 0 2017-01-19 10:08 /kylin
(三)启动
$ chown hdfs.hadoop /opt/apache-kylin-1.6.0-hbase1.x-bin
$ ./bin/kylin.sh start
A new Kylin instance is started by hdfs, stop it using "kylin.sh stop"Please visit
You can check the log at /opt/apache-kylin-1.6.0-hbase1.x-bin/logs/kylin.log
接下来选择维度和度量,这是构建预计算模型cube中最为重要的两个属性。 度量: 度量是具体考察的聚合数量值,例如:销售数量、销售金额、人均购买量。计算机一点描述就是在SQL中就是聚合函数。
例如:select cate,count(1),sum(num) from fact_table where date>’20161112’ group by cate;
count(1)、sum(num)是度量 维度: 维度是观察数据的角度。例如:销售日期、销售地点。计算机一点的描述就是在SQL中就是where、group by里的字段
例如:select cate,count(1),sum(num) from fact_table where date>’20161112’ group by cate;
date、cate是维度
比较一下在kylin中查询和直接在hive中查询的速度。
执行一个group by order by的查询。
SQL:select ip, max(loadmax) as loadmax,max(connectmax) as connectmax, max(eth0max) as eth0max, max(eth1max) as eth1max ,max(rospace) as rospace,max(team) as team from resource group by ip order by loadmax asc ;
在Kylin预计算之后,这条查询只用了0.11s
直接在hive中进行计算时间是30.05s
时间相差270倍!!!
(六)样例数据
#kylin自带一个样例,包含1w条数据的样本
$ ./bin/sample.sh
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin server or reload the metadata from web UI to see the change.
$ ./bin/kylin.sh stop
stopping Kylin:15334
$ ./bin/kylin.sh start 可以在Kylin中看到learn_kylin这个项目。并且有创建好的model和cube,可以供参考和学习。