OLAP analysis engine: Kylin 4.0
2022-06-25 04:45:00 【Youpei】
Video: https://www.bilibili.com/video/BV1Wt4y1s7rZ?share_source=copy_web
Kylin 4.0

- The underlying build engine uses Spark to connect to data sources such as Hadoop, Hive, and CSV.
- The constructed cube is stored as Parquet files on HDFS. A cube here is a special structure composed of dimensions; it is effectively a precomputed model in which every combination of the dimensions has been arranged in advance.
- Kylin's metadata, much like Hive's, is stored in MySQL.
- The routing layer works like a dispatcher: if an existing cube satisfies the query, Kylin reads the Parquet files; otherwise the query is converted to Spark SQL and run against the connected data source.
- The query engine parses the SQL statements sent from the user's client.
- The REST server exposes an API for clients, which can also be used to submit SQL.
- Because the build is a Spark program, its resources can be managed and scheduled by YARN or Kubernetes.
In short: given the fact table and the dimensions and measures you specify, Kylin automatically precomputes a system of 2^n - 1 cuboids, one per non-empty combination of the n dimensions.
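The 2^n - 1 figure is just the number of non-empty subsets of the n dimensions, each subset being one cuboid. A minimal sketch of that enumeration (the class and method names here are illustrative, not part of Kylin's API):

```java
import java.util.ArrayList;
import java.util.List;

public class CuboidEnum {
    // Enumerate every non-empty subset of the given dimensions.
    // Each subset corresponds to one precomputed cuboid in the cube.
    public static List<List<String>> cuboids(List<String> dims) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>();
        for (int mask = 1; mask < (1 << n); mask++) { // 2^n - 1 non-empty bitmasks
            List<String> cuboid = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) cuboid.add(dims.get(i));
            }
            result.add(cuboid);
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 dimensions -> 2^3 - 1 = 7 cuboids
        System.out.println(cuboids(List.of("year", "region", "product")).size());
    }
}
```

With three dimensions you get 7 cuboids; each extra dimension roughly doubles the count, which is why the pruning techniques discussed later (aggregation groups, derived dimensions) matter.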
Quick start
Docker setup
# Pull the image
docker pull apachekylin/apache-kylin-standalone:4.0.0
# Start the container
docker run -d \
-m 8G \
-p 7070:7070 \
-p 8088:8088 \
-p 50070:50070 \
-p 8032:8032 \
-p 8042:8042 \
-p 2181:2181 \
apachekylin/apache-kylin-standalone:4.0.0
After startup, open http://ip:7070/kylin in a browser to reach the web UI.
The supported data sources are currently Hive tables and CSV files; reading from Hive also synchronizes the Hive table metadata.

Import the prepared test tables directly.
- Model creation flow: create the model — select the relevant tables — select dimensions — select measures — set partitioning and filters

- Cube creation flow: create the cube (a model must be selected) — select dimensions (from the model's dimensions) — select measures (from the model's fact-table fields) — refresh settings (Parquet file merge intervals: by default 7 days for a small merge and 28 days for a large one) — advanced settings (aggregation groups, rowkey, cuboids) — Kylin parameter overrides (defaults come from kylin.properties) — final confirmation

- Build the cube

- While the build runs, a corresponding application also appears in YARN

- Once the build completes, you can run SQL queries; compared with the same query in Hive, it is very fast
Query considerations
The join type is specified when the model is created; unless query pushdown is enabled, SQL written with a different join type will report an error.
The queryable fields are specified when the model and cube are defined; unless query pushdown is enabled, selecting other fields will also report an error.
The aggregation functions are likewise specified in advance; unless query pushdown is enabled, aggregating other fields will also report an error.
The fact table must come first in the join, with dimension tables after it.
REST API calls
- Execute a SQL query with curl
curl -X POST -H "Authorization: Basic QURNSU46S1lMSU4=" -H 'Content-Type: application/json' -d '{"sql":"select dname,sum(sal) from emp e join dept d on e.deptno = d.deptno group by dname","project":"FirstProject"}' http://172.16.6.14:7070/kylin/api/query
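The `Authorization: Basic QURNSU46S1lMSU4=` header used above is simply the Base64 encoding of the default `ADMIN:KYLIN` credentials. A small sketch showing how to derive it for your own user (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KylinAuth {
    // Build the value for the "Authorization: Basic ..." HTTP header
    public static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(basicAuth("ADMIN", "KYLIN")); // Basic QURNSU46S1lMSU4=
    }
}
```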
- A scheduled build script, which can be run with a scheduling tool such as Azkaban or Oozie
#!/bin/bash
# Get cube_name from the 1st argument
cube_name=$1
# Get the build date from the 2nd argument, defaulting to yesterday
if [ -n "$2" ]
then
do_date=$2
else
do_date=`date -d '-1 day' +%F`
fi
# Timestamp of the build date at 00:00:00 UTC (see the note below)
start_date_unix=`date -d "$do_date 08:00:00" +%s`
# Convert the second-level timestamp to milliseconds
start_date=$(($start_date_unix*1000))
# Timestamp of the build date at 24:00 (one day later, in ms)
stop_date=$(($start_date+86400000))
curl -X PUT -H "Authorization: Basic QURNSU46S1lMSU4=" -H 'Content-Type: application/json' -d '{"startTime":'$start_date', "endTime":'$stop_date', "buildType":"BUILD"}' http://172.16.6.14:7070/kylin/api/cubes/$cube_name/build
# Note: Kylin's time zone was not changed, so it only understands UTC. 00:00 UTC is 08:00 in UTC+8, which is why the script uses "$do_date 08:00:00" to compensate for the offset.
Query pushdown
With query pushdown enabled, queries that no built cube can answer are executed with Spark SQL directly against Hive.
- In kylin.properties under the conf directory:
kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushDownRunnerSparkImpl
Query engine
- Sparder behaves like a long-running spark-shell: once started, it holds on to its resources. By default Kylin launches it on the first query, so the first query is usually slow; you can start it eagerly instead by setting the parameter to true:
kylin.query.auto-sparder-context-enabled=true
- HDFS directories
- Temporary job files: /project_name/job_tmp
- Cuboid file storage: /project_name/parquet/cube_name/segment_name_XXX
- Dimension table snapshots: /project_name/table_snapshot
- Spark run logs: /project_name/spark_logs

- Kylin query-related parameters
#### Spark master ####
#kylin.query.spark-conf.spark.master=yarn
#### Driver cores ####
#kylin.query.spark-conf.spark.driver.cores=1
#### Driver memory ####
#kylin.query.spark-conf.spark.driver.memory=4G
#### Driver off-heap (overhead) memory ####
#kylin.query.spark-conf.spark.driver.memoryOverhead=1G
#### Executor cores ####
#kylin.query.spark-conf.spark.executor.cores=1
#### Number of executors ####
#kylin.query.spark-conf.spark.executor.instances=1
#### Executor memory ####
#kylin.query.spark-conf.spark.executor.memory=4G
#### Executor off-heap (overhead) memory ####
#kylin.query.spark-conf.spark.executor.memoryOverhead=1G
Cube build optimization
Derived dimensions
- Normally, n dimension fields require 2^n - 1 cuboids to be created.
- With derived dimensions, the primary key of a dimension table stands in for that table's other dimension columns. Suppose two dimension tables each have three dimension fields: normally that means 2^6 - 1 = 63 cuboids, but using only the two primary keys it is 2^2 - 1 = 3 cuboids.
- This is not generally recommended: Kylin's strength lies in precomputing the cube, and derived dimensions shift work from build time to query time, reducing precomputation, which is the wrong way around.
- When building the cube and choosing each dimension's type, the Derived keyword marks a derived dimension.

Aggregation groups
- Aggregation groups target cuboids that will never be used in the actual environment, so they can be skipped during precomputation.
- Mandatory dimensions: every built cuboid must include this dimension; cuboids without it are not built. Note that a mandatory dimension cannot appear alone; the following is also wrong

- Hierarchy dimensions: a prerequisite relationship between dimensions; whenever B appears, A must appear (A -> B)

- Joint dimensions: the listed dimensions must always appear together

- A practical cube-building application

Cube parameter tuning
Use appropriate Spark resources; the cube build can also tune the Spark resources it runs with through its own parameters.

Global dictionary: used mainly for count-distinct. Integer values can be deduplicated directly with a bitmap; for String values, Kylin first builds a string-to-integer mapping and then deduplicates the mapped values with the bitmap.
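The idea behind the global dictionary can be sketched with standard Java classes: assign each string a dense integer id (the "dictionary"), then set that id's bit in a bitmap, and the distinct count is the bitmap's cardinality. This is only an illustration of the technique, not Kylin's actual implementation:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class GlobalDictSketch {
    private final Map<String, Integer> dict = new HashMap<>(); // string -> dense id
    private final BitSet bitmap = new BitSet();                // ids seen so far

    public void add(String value) {
        // Assign the next free id on first sight, then mark it in the bitmap.
        int id = dict.computeIfAbsent(value, v -> dict.size());
        bitmap.set(id);
    }

    public int distinctCount() {
        return bitmap.cardinality();
    }

    public static void main(String[] args) {
        GlobalDictSketch d = new GlobalDictSketch();
        for (String s : new String[]{"a", "b", "a", "c", "b"}) d.add(s);
        System.out.println(d.distinctCount()); // 3
    }
}
```

Mapping strings to dense integers is what makes the bitmap small: without the dictionary, arbitrary strings have no natural bit position to occupy.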

- Snapshot table optimization: each snapshot table corresponds to a Hive dimension table

Query performance optimization
- Sort columns: the column order used when building the rowkey; you can drag to reorder the columns. On the Rowkeys page, besides the column ordering, you can also shard by a chosen column; sharding can improve query performance and maps to how Kylin stores the underlying Parquet files.

- Reduce small or unevenly sized Parquet files

Connection tool integration
JDBC
<dependencies>
<dependency>
<groupId>org.apache.kylin</groupId>
<artifactId>kylin-jdbc</artifactId>
<version>4.0.1</version>
</dependency>
</dependencies>
import java.sql.*;

public class KylinTest {
    public static void main(String[] args) throws Exception {
        // Kylin JDBC driver
        String KYLIN_DRIVER = "org.apache.kylin.jdbc.Driver";
        // Kylin URL
        String KYLIN_URL = "jdbc:kylin://172.16.6.14:7071/FirstProject";
        // Kylin username
        String KYLIN_USER = "ADMIN";
        // Kylin password
        String KYLIN_PASSWD = "KYLIN";
        // Register the driver
        Class.forName(KYLIN_DRIVER);
        // Get a connection
        Connection connection = DriverManager.getConnection(KYLIN_URL, KYLIN_USER, KYLIN_PASSWD);
        // Prepare the SQL
        PreparedStatement ps = connection.prepareStatement(
                "select dname, sum(sal) from emp e join dept d on e.deptno = d.deptno group by dname");
        // Execute the query
        ResultSet resultSet = ps.executeQuery();
        // Iterate and print the results
        while (resultSet.next()) {
            System.out.println(resultSet.getString(1) + ":" + resultSet.getDouble(2));
        }
    }
}
MDX integration
It can be deployed with Docker. Pull the image:
docker pull apachekylin/apache-kylin-standalone:kylin-4.0.1-mondrian
Start the container:
docker run -d \
-m 8G \
-p 7070:7070 \
-p 7080:7080 \
-p 8088:8088 \
-p 50070:50070 \
-p 8032:8032 \
-p 8042:8042 \
-p 2181:2181 \
apachekylin/apache-kylin-standalone:kylin-4.0.1-mondrian
- Kylin page: http://127.0.0.1:7070/kylin/login
- MDX for Kylin page: http://127.0.0.1:7080
- HDFS NameNode page: http://127.0.0.1:50070
- YARN ResourceManager page: http://127.0.0.1:8088