HTAP Quick Start Guide
2022-06-27 06:34:00 【Tianxiang shop】
This guide describes how to quickly get started with TiDB's one-stop Hybrid Transactional and Analytical Processing (HTAP) capability.
Note
The steps in this guide are only for a quick start experience and are not suitable for production environments. To explore more HTAP features, see Explore HTAP in Depth.
Basic concepts
Before you begin, you should have a basic understanding of TiDB's row-based storage engine TiKV, which serves online transaction processing, and its columnar storage engine TiFlash, which serves real-time analytics:
- HTAP storage engines: the row store and the column store coexist, synchronize automatically, and remain strongly consistent. The row store is optimized for online transaction processing (OLTP), and the column store is optimized for online analytical processing (OLAP).
- HTAP data consistency: as a distributed transactional key-value database, TiKV provides a distributed transaction interface that satisfies ACID constraints, and uses the Raft protocol to ensure consistency and high availability of data across multiple replicas. TiFlash replicates data from TiKV in real time through the Multi-Raft Learner protocol, which keeps its data strongly consistent with TiKV.
- HTAP data isolation: TiKV and TiFlash can be deployed on different machines as needed, which solves the problem of resource isolation in HTAP.
- MPP computing engine: starting from v5.0, TiFlash introduces MPP, a distributed computing framework that allows data exchange between nodes and provides high-performance, high-throughput SQL operators, which can greatly reduce the execution time of analytical queries.
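To make the row-store versus column-store distinction concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: the order-like records and all function names are hypothetical, and the real TiKV and TiFlash storage formats are far more involved. It shows why an aggregate over one column reads less data in a columnar layout, while a row layout keeps a whole record together for transactional point access.

```python
from typing import Dict, List, Tuple

# Hypothetical toy records: (orderkey, orderstatus, totalprice).
Row = Tuple[int, str, float]

ROWS: List[Row] = [
    (1, "O", 173665.47),
    (2, "O", 46929.18),
    (3, "F", 193846.25),
]


def to_columns(records: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column (a columnar layout)."""
    keys, statuses, prices = zip(*records)
    return {"orderkey": list(keys), "status": list(statuses), "totalprice": list(prices)}


def point_lookup_row_store(records: List[Row], key: int) -> Row:
    # Row layout: the whole record lives in one place, which suits OLTP reads/writes.
    return next(r for r in records if r[0] == key)


def sum_price_row_store(records: List[Row]) -> float:
    # Row layout: every full record is visited even though only one field is needed.
    return sum(r[2] for r in records)


def sum_price_column_store(columns: Dict[str, list]) -> float:
    # Columnar layout: only the single column the aggregate needs is scanned.
    return sum(columns["totalprice"])


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(point_lookup_row_store(ROWS, 2))
    print(sum_price_row_store(ROWS), sum_price_column_store(cols))
```

Both layouts hold the same data and give the same answer; what differs is how much of it each access pattern has to touch.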
Steps
The steps in this guide use the TPC-H dataset as an example and walk through one of its query scenarios to show the convenience and high performance of TiDB HTAP. TPC-H is a popular decision support benchmark in the industry. It contains a large volume of data and models a business decision-analysis system that must answer a variety of highly complex ad hoc queries. To try all 22 TPC-H SQL queries, visit the tidb-bench repository or read the TPC-H official documentation on how to generate the queries and data.
Step 1: Deploy a trial environment
Before trying TiDB HTAP features, follow the steps in the TiDB Database Quick Start Guide to prepare a local TiDB test environment, and then run the following command to start a TiDB cluster:
tiup playground
Note
The tiup playground command is only for a quick start experience and is not suitable for production environments.
Step 2: Prepare trial data
The following steps generate a TPC-H dataset for trying out TiDB HTAP features. If you are interested in TPC-H, see its description.
Note
If you want to analyze and query your existing data, you can migrate the data to TiDB. If you want to design and generate your own data, you can generate it with SQL statements or related tools.
Install the data generation tool with the following command:
tiup install bench
Generate the data with the following command:
tiup bench tpch --sf=1 prepare
When the command line outputs Finished, data generation is complete.
Run the following SQL statement to view the generated data:
SELECT CONCAT(table_schema,'.',table_name) AS 'Table Name',
    table_rows AS 'Number of Rows',
    CONCAT(ROUND(data_length/(1024*1024*1024),4),'G') AS 'Data Size',
    CONCAT(ROUND(index_length/(1024*1024*1024),4),'G') AS 'Index Size',
    CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),4),'G') AS 'Total'
FROM information_schema.TABLES
WHERE table_schema LIKE 'test';
As the output shows, eight tables were generated in total, and the largest table has about 6 million rows. Because the data is randomly generated by the tool, the actual amount of data is whatever the SQL query returns.
+---------------+----------------+-----------+------------+---------+
| Table Name    | Number of Rows | Data Size | Index Size | Total   |
+---------------+----------------+-----------+------------+---------+
| test.nation   |             25 | 0.0000G   | 0.0000G    | 0.0000G |
| test.region   |              5 | 0.0000G   | 0.0000G    | 0.0000G |
| test.part     |         200000 | 0.0245G   | 0.0000G    | 0.0245G |
| test.supplier |          10000 | 0.0014G   | 0.0000G    | 0.0014G |
| test.partsupp |         800000 | 0.1174G   | 0.0119G    | 0.1293G |
| test.customer |         150000 | 0.0242G   | 0.0000G    | 0.0242G |
| test.orders   |        1514336 | 0.1673G   | 0.0000G    | 0.1673G |
| test.lineitem |        6001215 | 0.7756G   | 0.0894G    | 0.8651G |
+---------------+----------------+-----------+------------+---------+
8 rows in set (0.06 sec)
This is the database of a commercial ordering system, in which:
- test.nation holds country information
- test.region holds region information
- test.part holds part information
- test.supplier holds supplier information
- test.partsupp holds information about the parts each supplier provides
- test.customer holds customer information
- test.orders holds order information
- test.lineitem holds order line item information
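The Data Size, Index Size, and Total columns come from dividing byte counts by 1024*1024*1024 and rounding to four decimal places. A small Python sketch of that arithmetic follows. The helpers `to_g_label` and `size_columns` are hypothetical, and the code assumes TiDB follows MySQL's rule that `ROUND` on exact (DECIMAL) values rounds half away from zero.

```python
from decimal import Decimal, ROUND_HALF_UP

# Divisor used by the query: 1024*1024*1024 bytes.
BYTES_PER_G = 1024 * 1024 * 1024


def to_g_label(num_bytes: int) -> str:
    """Mimic CONCAT(ROUND(num_bytes/(1024*1024*1024), 4), 'G').

    Assumes MySQL-style rounding for exact values (half away from zero);
    ROUND_HALF_UP gives that for the non-negative byte counts used here.
    """
    value = Decimal(num_bytes) / Decimal(BYTES_PER_G)
    return f"{value.quantize(Decimal('0.0001'), rounding=ROUND_HALF_UP)}G"


def size_columns(data_length: int, index_length: int) -> tuple:
    """Return the (Data Size, Index Size, Total) labels for one table.

    Total is computed from the raw byte sum and rounded once, just like
    ROUND((data_length+index_length)/...) in the query.
    """
    return (to_g_label(data_length),
            to_g_label(index_length),
            to_g_label(data_length + index_length))
```

Because the query adds raw byte counts before rounding, Total can differ from the sum of the two rounded labels. That is why test.lineitem shows 0.8651G even though 0.7756G + 0.0894G = 0.8650G.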
Step 3: Query data using row storage only
Run the following SQL statement to see how TiDB performs when it uses only row storage, as most databases do:
SELECT
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    customer,
    orders,
    lineitem
WHERE
    c_mktsegment = 'BUILDING'
    AND c_custkey = o_custkey
    AND l_orderkey = o_orderkey
    AND o_orderdate < DATE '1996-01-01'
    AND l_shipdate > DATE '1996-02-01'
GROUP BY
    l_orderkey,
    o_orderdate,
    o_shippriority
ORDER BY
    revenue DESC,
    o_orderdate
LIMIT 10;
This is a shipping priority query. It returns the priority and potential revenue of the highest-revenue orders that had not been shipped as of a specified date. Potential revenue is defined as the sum of l_extendedprice * (1 - l_discount). Orders are listed in descending order of revenue. In this example, the query lists the top 10 unshipped orders by potential revenue.
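To see what the shipping priority query computes without a cluster, the following sketch runs the same query shape against a handful of hand-made rows in an in-memory SQLite database. This is an assumption-laden stand-in: the schema is trimmed to the columns the query uses, the rows are invented, the function names are hypothetical, and the `DATE '...'` literals are replaced with plain ISO-8601 strings, which compare correctly as text. It says nothing about TiDB's performance.

```python
import sqlite3

SHIPPING_PRIORITY_QUERY = """
SELECT l_orderkey,
       SUM(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate,
       o_shippriority
FROM customer, orders, lineitem
WHERE c_mktsegment = 'BUILDING'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < '1996-01-01'   -- ISO dates stored as TEXT compare correctly
  AND l_shipdate > '1996-02-01'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT ?
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a few invented TPC-H-like rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_mktsegment TEXT);
        CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER,
                             o_orderdate TEXT, o_shippriority INTEGER);
        CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                               l_discount REAL, l_shipdate TEXT);
    """)
    conn.executemany("INSERT INTO customer VALUES (?, ?)",
                     [(1, "BUILDING"), (2, "AUTOMOBILE")])
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
        (10, 1, "1995-12-01", 0),
        (11, 1, "1995-11-15", 0),
        (12, 2, "1995-12-01", 0),   # customer not in BUILDING: filtered out
        (13, 1, "1996-03-01", 0),   # ordered after the cutoff: filtered out
    ])
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", [
        (10, 1000.0, 0.25, "1996-02-10"),
        (10, 500.0, 0.0, "1996-02-20"),
        (11, 2000.0, 0.5, "1996-03-01"),
        (11, 300.0, 0.0, "1996-01-15"),   # shipped before 1996-02-01: excluded
        (12, 9999.0, 0.0, "1996-02-10"),
        (13, 9999.0, 0.0, "1996-04-01"),
    ])
    return conn


def top_unshipped_orders(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return (orderkey, revenue, orderdate, shippriority) rows, highest revenue first."""
    return conn.execute(SHIPPING_PRIORITY_QUERY, (limit,)).fetchall()


if __name__ == "__main__":
    for row in top_unshipped_orders(build_toy_db()):
        print(row)
```

With these rows, order 10 earns 1000 * (1 - 0.25) + 500 * (1 - 0) = 1250 and ranks above order 11, which earns 2000 * (1 - 0.5) = 1000. The 1996-01-15 line of order 11 is excluded by the l_shipdate filter.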
Step 4: Replicate data to the column store
After TiFlash is deployed, it does not automatically replicate data from TiKV. Send the following DDL statements to TiDB from a MySQL client to specify which tables to replicate to TiFlash. Once the tables are specified, TiDB creates the corresponding TiFlash replicas.
ALTER TABLE test.customer SET TIFLASH REPLICA 1;
ALTER TABLE test.orders SET TIFLASH REPLICA 1;
ALTER TABLE test.lineitem SET TIFLASH REPLICA 1;
To check the TiFlash replication status of these tables, use the following SQL statements:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'customer';
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'orders';
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lineitem';
In the query results above:
- The AVAILABLE field indicates whether the table's TiFlash replicas are available: 1 means available and 0 means unavailable. Once the replicas become available, this state no longer changes. If you modify the number of replicas with a DDL statement, the replication progress is recalculated.
- The PROGRESS field indicates the replication progress, with a value between 0.0 and 1.0. A value of 1 means that at least one replica has been fully replicated.
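If you script the trial, you may want to wait until replication finishes before moving on to Step 5. The following Python sketch encodes the rule described above. `fetch_rows` is a hypothetical callback that you would implement with any MySQL-compatible client to run the status query and return `(TABLE_NAME, AVAILABLE, PROGRESS)` tuples; the timeout and polling interval are arbitrary defaults, not TiDB settings.

```python
import time
from typing import Callable, Iterable, Tuple

# (TABLE_NAME, AVAILABLE, PROGRESS): a subset of the columns returned by
# information_schema.tiflash_replica.
ReplicaRow = Tuple[str, int, float]


def replicas_ready(rows: Iterable[ReplicaRow]) -> bool:
    """True when there is at least one row and every table has AVAILABLE = 1
    and PROGRESS = 1 (at least one replica fully replicated)."""
    materialized = list(rows)
    return bool(materialized) and all(
        available == 1 and progress >= 1.0 for _, available, progress in materialized
    )


def wait_for_replicas(fetch_rows: Callable[[], Iterable[ReplicaRow]],
                      timeout_s: float = 600.0,
                      interval_s: float = 5.0) -> bool:
    """Poll the hypothetical fetch_rows callback until replicas are ready.

    Returns True once ready, or False if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if replicas_ready(fetch_rows()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

An empty result is treated as "not ready" on purpose, because a table without a TiFlash replica does not appear in information_schema.tiflash_replica at all.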
Step 5: Analyze data faster with HTAP
Run the SQL statement from Step 3 again to see how TiDB HTAP performs.
For tables that have TiFlash replicas, the TiDB optimizer automatically decides, based on cost estimation, whether to use the TiFlash replicas. To check whether the TiFlash replicas are actually selected, you can use the desc or explain analyze statement, for example:
explain analyze SELECT
    l_orderkey,
    SUM(l_extendedprice * (1 - l_discount)) AS revenue,
    o_orderdate,
    o_shippriority
FROM
    customer,
    orders,
    lineitem
WHERE
    c_mktsegment = 'BUILDING'
    AND c_custkey = o_custkey
    AND l_orderkey = o_orderkey
    AND o_orderdate < DATE '1996-01-01'
    AND l_shipdate > DATE '1996-02-01'
GROUP BY
    l_orderkey,
    o_orderdate,
    o_shippriority
ORDER BY
    revenue DESC,
    o_orderdate
LIMIT 10;
If the result contains the ExchangeSender and ExchangeReceiver operators, MPP is in effect.
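To check for MPP programmatically, you can scan the operator names in the id column of the explain analyze output. The sketch below assumes each id cell consists of an operator name followed by `_` and a number, optionally preceded by tree-drawing characters such as `└─`. The function names are hypothetical, and the sample plan is illustrative, not real TiDB output.

```python
import re
from typing import Iterable, Set

# Operator name at the start of an EXPLAIN "id" cell, after any tree-drawing
# prefix: "    └─ExchangeSender_75" -> "ExchangeSender". Assumed cell format.
_OPERATOR = re.compile(r"^[\s│├└─]*([A-Za-z]+)")


def operator_names(plan_ids: Iterable[str]) -> Set[str]:
    """Collect operator names from the id column of desc / explain analyze."""
    names: Set[str] = set()
    for cell in plan_ids:
        match = _OPERATOR.match(cell)
        if match:
            names.add(match.group(1))
    return names


def uses_mpp(plan_ids: Iterable[str]) -> bool:
    """True if the plan contains both ExchangeSender and ExchangeReceiver."""
    return {"ExchangeSender", "ExchangeReceiver"} <= operator_names(plan_ids)


if __name__ == "__main__":
    # Illustrative id cells only; real plans have more operators and columns.
    sample = ["Projection_14", "└─TopN_17", "  └─TableReader_76",
              "    └─ExchangeSender_75", "      └─ExchangeReceiver_60"]
    print(uses_mpp(sample))
```

Requiring both operators, rather than either one, follows the condition stated above.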
In addition, you can require every computation part of the entire query to use only the TiFlash engine. For details, see Use TiDB to Read TiFlash.
Compare the results and the performance of the two queries.