HTAP Quick Start Guide
2022-06-27 06:34:00 【Tianxiang shop】
This guide describes how to quickly get started with TiDB's one-stop Hybrid Transactional and Analytical Processing (HTAP) solution.
Note
The steps in this guide are intended only for a quick-start trial and are not suitable for production environments. To explore more HTAP features, see Explore HTAP.
Basic concepts
Before the trial, you need some basic knowledge of TiKV, TiDB's row-based storage engine for online transaction processing, and TiFlash, its columnar storage engine for real-time analytics:
- HTAP storage engines: The row store and the column store coexist, are synchronized automatically, and stay strongly consistent. The row store is optimized for online transaction processing (OLTP), and the column store is optimized for online analytical processing (OLAP).
- HTAP data consistency: As a distributed transactional key-value database, TiKV provides a distributed transaction interface that satisfies ACID constraints, and it uses the Raft protocol to keep multi-replica data consistent and highly available. TiFlash replicates data from TiKV in real time through the Multi-Raft Learner protocol, which keeps TiFlash data strongly consistent with TiKV data.
- HTAP data isolation: TiKV and TiFlash can be deployed on different machines as needed, which solves the problem of HTAP resource isolation.
- MPP computing engine: Since v5.0, TiFlash has included MPP, a distributed computing framework that allows data exchange between nodes and provides high-performance, high-throughput SQL algorithms. It can greatly shorten the execution time of analytical queries.
Steps
The steps in this guide use the TPC-H dataset as an example and walk through one of its query scenarios to show the convenience and high performance of TiDB HTAP. TPC-H is a popular decision support benchmark in the industry. It models a business decision analysis system that holds a large volume of data and must answer many kinds of highly complex ad hoc queries. To try all 22 TPC-H SQL queries, visit the tidb-bench repository or read the TPC-H official website for instructions on generating the query statements and data.
Step 1: Deploy the trial environment
Before trying TiDB HTAP, follow the steps in the TiDB Database Quick Start Guide to prepare a local TiDB test environment, and then run the following command to start a TiDB cluster:
tiup playground
Note
The tiup playground command is intended only for a quick-start trial and is not suitable for production environments.
Step 2: Prepare the trial data
The following steps generate a TPC-H dataset for trying out TiDB HTAP. If you are interested in TPC-H, see its specification.
Note
If you want to run analytical queries on your existing data, you can migrate the data to TiDB. If you want to design and generate your own data, you can do so with SQL statements or related tools.
Use the following command to install the data generation tool:
tiup install bench
Use the following command to generate data:
tiup bench tpch --sf=1 prepare
When the command line outputs Finished, data generation is complete.
Run the following SQL statement to view the generated data:
SELECT
  CONCAT(table_schema,'.',table_name) AS 'Table Name',
  table_rows AS 'Number of Rows',
  CONCAT(ROUND(data_length/(1024*1024*1024),4),'G') AS 'Data Size',
  CONCAT(ROUND(index_length/(1024*1024*1024),4),'G') AS 'Index Size',
  CONCAT(ROUND((data_length+index_length)/(1024*1024*1024),4),'G') AS 'Total'
FROM
  information_schema.TABLES
WHERE
  table_schema LIKE 'test';
As the output shows, eight tables are generated, and the largest table contains about 6 million rows. Because the tool generates the data randomly, the actual row counts are whatever the SQL query returns.
+---------------+----------------+-----------+------------+---------+
| Table Name    | Number of Rows | Data Size | Index Size | Total   |
+---------------+----------------+-----------+------------+---------+
| test.nation   |             25 | 0.0000G   | 0.0000G    | 0.0000G |
| test.region   |              5 | 0.0000G   | 0.0000G    | 0.0000G |
| test.part     |         200000 | 0.0245G   | 0.0000G    | 0.0245G |
| test.supplier |          10000 | 0.0014G   | 0.0000G    | 0.0014G |
| test.partsupp |         800000 | 0.1174G   | 0.0119G    | 0.1293G |
| test.customer |         150000 | 0.0242G   | 0.0000G    | 0.0242G |
| test.orders   |        1514336 | 0.1673G   | 0.0000G    | 0.1673G |
| test.lineitem |        6001215 | 0.7756G   | 0.0894G    | 0.8651G |
+---------------+----------------+-----------+------------+---------+
8 rows in set (0.06 sec)
This is the database of a commercial ordering system. In it, the test.nation table holds country information, test.region holds region information, test.part holds part information, test.supplier holds supplier information, test.partsupp holds the parts offered by each supplier, test.customer holds customer information, test.orders holds order information, and test.lineitem holds information about online items.
Step 3: Query data with the row store
Run the following SQL statement to see how TiDB performs when it uses only the row store, as most databases do:
SELECT
  l_orderkey,
  SUM(
    l_extendedprice * (1 - l_discount)
  ) AS revenue,
  o_orderdate,
  o_shippriority
FROM
  customer,
  orders,
  lineitem
WHERE
  c_mktsegment = 'BUILDING'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < DATE '1996-01-01'
  AND l_shipdate > DATE '1996-02-01'
GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority
ORDER BY
  revenue DESC,
  o_orderdate
limit 10;
This is a shipping priority query. It returns the shipping priority and potential revenue of the highest-revenue orders that had not been shipped as of the specified date. Potential revenue is defined as the sum of l_extendedprice * (1 - l_discount). Orders are listed in descending order of revenue. In this example, the query lists the 10 unshipped orders with the highest potential revenue.
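To make the revenue definition concrete, the following Python sketch runs a query of the same shape against a tiny in-memory SQLite database. It is only an illustration under stated assumptions: the three tables are cut down to the columns the query touches, the sample rows are invented, and dates are compared as ISO-8601 strings because SQLite has no DATE literal. It says nothing about TiDB performance.

```python
import sqlite3

# Invented sample data; only the columns used by the query are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER, c_mktsegment TEXT);
CREATE TABLE orders (o_orderkey INTEGER, o_custkey INTEGER,
                     o_orderdate TEXT, o_shippriority INTEGER);
CREATE TABLE lineitem (l_orderkey INTEGER, l_extendedprice REAL,
                       l_discount REAL, l_shipdate TEXT);
INSERT INTO customer VALUES (1, 'BUILDING'), (2, 'MACHINERY');
INSERT INTO orders VALUES (10, 1, '1995-12-01', 0), (11, 2, '1995-12-05', 0);
INSERT INTO lineitem VALUES
    (10, 1000.0, 0.25, '1996-03-01'),  -- counted: 1000 * 0.75 = 750
    (10,  500.0, 0.00, '1996-02-15'),  -- counted: 500 * 1.00 = 500
    (10,  200.0, 0.50, '1996-01-15'),  -- skipped: shipped before 1996-02-01
    (11,  900.0, 0.00, '1996-03-01');  -- skipped: customer is not BUILDING
""")

# Same joins, filters, grouping, and ordering as the query in this step,
# with string comparisons standing in for the DATE literals.
QUERY = """
SELECT l_orderkey,
       SUM(l_extendedprice * (1 - l_discount)) AS revenue,
       o_orderdate,
       o_shippriority
FROM customer, orders, lineitem
WHERE c_mktsegment = 'BUILDING'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < '1996-01-01'
  AND l_shipdate > '1996-02-01'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(10, 1250.0, '1995-12-01', 0)]
```

Only order 10 qualifies, and its revenue is the sum over the two line items shipped after the cutoff: 750 + 500 = 1250.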
Step 4: Replicate data to the column store
After TiFlash is deployed, it does not replicate TiKV data automatically. From a MySQL client, send the following DDL statements to TiDB to specify which tables to replicate to TiFlash. TiDB then creates the corresponding TiFlash replicas.
ALTER TABLE test.customer SET TIFLASH REPLICA 1;
ALTER TABLE test.orders SET TIFLASH REPLICA 1;
ALTER TABLE test.lineitem SET TIFLASH REPLICA 1;
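If you later want replicas for more tables, the same DDL can be generated rather than typed by hand. The Python sketch below only builds the statement text. The helper name tiflash_replica_ddl is hypothetical, and the schema and table names are interpolated without quoting, so it assumes trusted identifiers.

```python
def tiflash_replica_ddl(schema, tables, replicas=1):
    """Build one ALTER TABLE ... SET TIFLASH REPLICA statement per table.

    Hypothetical helper: names are inserted as-is, so pass only trusted
    identifiers.
    """
    return [
        f"ALTER TABLE {schema}.{table} SET TIFLASH REPLICA {replicas};"
        for table in tables
    ]


statements = tiflash_replica_ddl("test", ["customer", "orders", "lineitem"])
for statement in statements:
    print(statement)
```

The printed statements match the three DDL statements in this step and can be pasted into a MySQL client session.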
To check the replication status of the TiFlash tables, use the following SQL statements:
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'customer';
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'orders';
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test' and TABLE_NAME = 'lineitem';
In the results of these queries:
- The AVAILABLE field indicates whether the table's TiFlash replica is available. 1 means available and 0 means unavailable. Once a replica becomes available, this state no longer changes. If you change the number of replicas with a DDL statement, the replication progress is recalculated.
- The PROGRESS field indicates the replication progress, a value between 0.0 and 1.0. 1 means that at least one replica has been fully replicated.
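The two fields can be checked together before moving on. The Python sketch below assumes the rows come from a query such as SELECT TABLE_NAME, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = 'test', fetched with whatever MySQL client library you use. The helper name pending_replicas and the sample values are invented for illustration.

```python
def pending_replicas(rows):
    """Return the tables whose TiFlash replica is not yet usable.

    Each row is a (table_name, available, progress) tuple. A table counts
    as ready only when AVAILABLE is 1 and PROGRESS has reached 1.0, which
    is a deliberately conservative reading of the two fields.
    """
    return [
        name
        for name, available, progress in rows
        if available != 1 or progress < 1.0
    ]


# Invented sample values, not real query output.
sample_rows = [
    ("customer", 1, 1.0),
    ("orders", 1, 1.0),
    ("lineitem", 0, 0.42),
]
print(pending_replicas(sample_rows))  # ['lineitem']
```

An empty result means every listed table is ready to be read from TiFlash.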
Step 5: Analyze data faster with HTAP
Run the SQL statement from Step 3 again to see the performance of TiDB HTAP.
For tables that have TiFlash replicas, the TiDB optimizer decides automatically, based on cost estimation, whether to use the TiFlash replicas. To check whether a TiFlash replica is actually selected, use the desc or explain analyze statement, for example:
explain analyze SELECT
  l_orderkey,
  SUM(
    l_extendedprice * (1 - l_discount)
  ) AS revenue,
  o_orderdate,
  o_shippriority
FROM
  customer,
  orders,
  lineitem
WHERE
  c_mktsegment = 'BUILDING'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < DATE '1996-01-01'
  AND l_shipdate > DATE '1996-02-01'
GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority
ORDER BY
  revenue DESC,
  o_orderdate
limit 10;
If the result contains the ExchangeSender and ExchangeReceiver operators, MPP mode has taken effect.
You can also force every part of the query to be computed by the TiFlash engine only. For details, see Use TiDB to Read TiFlash.
You can compare the results and the performance of the two queries.
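When the plan is long, the check for these two operators can be automated. The Python sketch below assumes you have collected the operator names from the first column of the plan output into a list. The function name uses_mpp is hypothetical, and the operator ids are illustrative, not real output.

```python
MPP_OPERATORS = ("ExchangeSender", "ExchangeReceiver")


def uses_mpp(operator_ids):
    """Return True if both exchange operators appear among the plan's ids.

    Substring matching is used so that tree-drawing prefixes and numeric
    suffixes around the operator name do not matter.
    """
    return all(
        any(operator in operator_id for operator_id in operator_ids)
        for operator in MPP_OPERATORS
    )


# Illustrative operator ids, not real EXPLAIN output.
mpp_plan = [
    "TableReader_1",
    "ExchangeSender_2",
    "HashJoin_3",
    "ExchangeReceiver_4",
    "TableFullScan_5",
]
row_plan = ["HashJoin_1", "TableReader_2", "TableFullScan_3"]

print(uses_mpp(mpp_plan), uses_mpp(row_plan))  # True False
```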