Doris' table creation and data division
2022-06-13 03:28:00 【TRX1024】
Basic concepts
Row & Column
A table consists of rows (Row) and columns (Column). A Row is one row of user data; a Column describes one field in a row of data.
Columns fall into two categories: Key and Value. From a business perspective, Key and Value correspond to dimension columns and metric columns, respectively. From the perspective of the aggregation model, rows with identical Key columns are merged into one row, and the aggregation method of each Value column is specified by the user at table creation time. For more detail, refer to the Doris data model documentation.
Tablet & Partition
In Doris' storage engine, user data is horizontally divided into data shards (Tablet, also called data buckets). Each Tablet contains several rows of data; the data of different Tablets does not intersect, and each Tablet is stored physically independently.
Multiple Tablets logically belong to different partitions (Partition). A Tablet belongs to exactly one Partition, and a Partition contains several Tablets. Since Tablets are stored physically independently, Partitions can also be regarded as physically independent. A Tablet is the smallest physical storage unit for operations such as data movement and replication.
Several Partitions make up a Table. A Partition can be regarded as the smallest logical management unit: data import and deletion can (or can only) be performed against a single Partition.
Data partitioning
The following table creation statement illustrates Doris' data division.
CREATE TABLE IF NOT EXISTS example_db.example_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "date the data was ingested",
    `timestamp` DATETIME NOT NULL COMMENT "timestamp of data ingestion",
    `city` VARCHAR(20) COMMENT "user's city",
    `age` SMALLINT COMMENT "user's age",
    `sex` TINYINT COMMENT "user's gender",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spend",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time"
)
ENGINE=olap
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
    PARTITION `p202001` VALUES LESS THAN ("2020-02-01"),
    PARTITION `p202002` VALUES LESS THAN ("2020-03-01"),
    PARTITION `p202003` VALUES LESS THAN ("2020-04-01")
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
    "replication_num" = "3",
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2021-01-01 12:00:00"
);
Column definition
In the AGGREGATE KEY data model, all columns that do not specify an aggregation method (SUM, REPLACE, MAX, MIN) are treated as Key columns, and the rest are Value columns.
When defining columns, consider the following suggestions:
- Key columns must come before all Value columns.
- Prefer integer types where possible, since integers are far more efficient to compute and search than strings.
- When choosing among integer types of different widths, pick the smallest that suffices.
- Likewise, for the length of VARCHAR and STRING types, pick the smallest that suffices.
- The total byte length of all columns (Key and Value) must not exceed 100KB.
Partition and bucket
Doris supports two levels of data division. The first level is Partition, which supports only Range division. The second level is Bucket (Tablet), which supports only Hash division.
You can also use just a single level of division; in that case, only Bucket division is supported.
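As a sketch, a single-level (bucket-only) table could be created as follows; the table name example_db.flat_tbl and its columns are hypothetical, not part of the example above:

```sql
-- Single-level division: no PARTITION BY clause, only hash bucketing.
-- Table and column names here are illustrative.
CREATE TABLE IF NOT EXISTS example_db.flat_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spend"
)
ENGINE=olap
AGGREGATE KEY(`user_id`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES ("replication_num" = "3");
```

With no PARTITION BY clause, the whole table behaves as one partition, and the DISTRIBUTED clause divides the entire table's data.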
1.Partition
- Partition columns can be one or more columns, and they must be KEY columns. The use of multi-column partitioning is introduced later in the multi-column partition section.
- Regardless of the partition column's type, partition values must be written in double quotes.
- Partition columns are usually time columns, to make managing old and new data easier.
- There is theoretically no upper limit on the number of partitions.
- When a table is created without a Partition clause, the system automatically generates a Partition with the same name as the table, covering the full value range. This Partition is invisible to users and cannot be deleted.
Partition supports VALUES LESS THAN (...), which specifies only the upper bound; the system takes the upper bound of the previous partition as this partition's lower bound, generating a left-closed, right-open interval. It also supports VALUES [...), which specifies both the upper and lower bound and likewise generates a left-closed, right-open interval; this form is easier to understand because both bounds are explicit.
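As a sketch, the two forms might be written as follows inside a PARTITION BY RANGE clause (the partition names and bounds are illustrative):

```sql
-- Upper bound only: the previous partition's upper bound
-- becomes this partition's lower bound.
PARTITION `p202001` VALUES LESS THAN ("2020-02-01"),
-- Both bounds written explicitly as a left-closed,
-- right-open interval [lower, upper).
PARTITION `p202002` VALUES [("2020-02-01"), ("2020-03-01"))
```

Both partitions above cover the same ranges; only the notation differs.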
2.Bucket
If Partition is used, the DISTRIBUTED ... clause describes the division rules of the data within each partition; if Partition is not used, it describes the division rules of the whole table's data. Bucket columns can be multiple columns, but they must be Key columns. Bucket columns can be the same as or different from the Partition columns.
The choice of bucket columns is a trade-off between query throughput and query concurrency:
- If multiple bucket columns are selected, data is distributed more evenly. If a query condition does not contain equality conditions on all bucket columns, the query triggers a simultaneous scan of all buckets, so query throughput increases and single-query latency decreases. This approach suits high-throughput, low-concurrency query scenarios.
- If only one or a few bucket columns are selected, a corresponding point query can trigger a scan of just one bucket. Then, when multiple point queries run concurrently, they are likely to scan different buckets, so the IO impact between queries is small (especially when different buckets sit on different disks). This approach suits high-concurrency point-query scenarios.
There is theoretically no upper limit on the number of buckets.
3. Suggestions on Partition and Bucket counts and data volume
- The total number of Tablets in a table equals (Partition num * Bucket num).
- The recommended Tablet count for a table, without considering expansion, is slightly more than the number of disks in the whole cluster.
- A single Tablet theoretically has no upper or lower bound on data volume, but 1GB - 10GB is recommended. If a single Tablet holds too little data, data aggregation works poorly and metadata management pressure is high. If it holds too much, replica migration and repair are hampered, and the cost of retries when a Schema Change or Rollup operation fails increases (these retries happen at Tablet granularity).
- When the data-volume principle and the count principle conflict, the data-volume principle takes priority.
- Within a table, the Bucket count is specified uniformly for every partition. However, when dynamically adding a partition (ADD PARTITION), the Bucket count of the new partition can be specified separately. This feature can be used to cope with shrinking or growing data volumes.
- Once a Partition's Bucket count is specified, it cannot be modified. So cluster expansion must be considered in advance when choosing the Bucket count. For example, suppose there are only 3 hosts, each with 1 disk. If the Bucket count is set to 3 or less, then even if more machines are added later, concurrency cannot be improved.
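As a sketch, adding a partition with its own Bucket count might look like this (the table name, partition name, and bound are illustrative; check the ALTER TABLE syntax against your Doris version):

```sql
-- The new partition uses 32 buckets instead of the table's default 16.
ALTER TABLE example_db.example_tbl
ADD PARTITION `p202004` VALUES LESS THAN ("2020-05-01")
DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;
```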
- For example, suppose there are 10 BEs, each with one disk. If a table's total size is:
  - 500MB: consider 4-8 tablets.
  - 5GB: 8-16 tablets.
  - 50GB: 32 tablets.
  - 500GB: partition the table, with each partition around 50GB and 16-32 tablets per partition.
  - 5TB: partition the table, with each partition around 50GB and 16-32 tablets per partition.
Note: a table's data volume can be viewed with the show data command; dividing the result by the number of replicas gives the table's actual data volume.
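For instance, assuming the example table above:

```sql
-- Reports the table's size including all replicas;
-- with "replication_num" = "3", divide the reported size by 3
-- to get the table's actual data volume.
SHOW DATA FROM example_db.example_tbl;
```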
PROPERTIES
In the PROPERTIES section at the end of the CREATE TABLE statement, the following two parameters can be specified:
1.replication_num
- The number of replicas of each Tablet. The default is 3, and keeping the default is recommended. In the CREATE TABLE statement, the Tablet replica count of all Partitions is specified uniformly; when adding a new partition, the Tablet replica count of that partition can be specified separately.
- The replica count can be modified at runtime. Keeping it an odd number is strongly recommended.
- The maximum replica count depends on the number of independent IPs in the cluster (note: not the number of BEs). The replica-distribution principle in Doris is that replicas of the same Tablet may not be placed on the same physical machine, and physical machines are identified by IP. Therefore, even if 3 or more BE instances are deployed on the same physical machine, if those BEs share one IP, the replica count can only be set to 1.
- For small, infrequently updated dimension tables, consider setting more replicas. This raises the probability of a local-data Join when running Join queries.
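As a sketch of these per-partition and runtime settings (the table and partition names are illustrative, and the exact ALTER forms should be checked against your Doris version):

```sql
-- Add a partition whose Tablets have their own replica count.
ALTER TABLE example_db.example_tbl
ADD PARTITION `p202005` VALUES LESS THAN ("2020-06-01")
("replication_num" = "1");

-- Modify the table's default replica count at runtime.
ALTER TABLE example_db.example_tbl
SET ("default.replication_num" = "3");
```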
2.storage_medium & storage_cooldown_time
- A BE's data storage directories can be explicitly marked as SSD or HDD (distinguished by the .SSD or .HDD suffix). When creating a table, the initial storage medium of all its Partitions can be specified uniformly. Note that the suffix only declares the disk medium explicitly; whether it matches the actual medium type is not checked.
- The default initial storage medium can be specified via default_storage_medium=xxx in the FE configuration file fe.conf; if not specified, the default is HDD. If specified as SSD, data is initially stored on SSD.
- If storage_cooldown_time is not specified, data automatically migrates from SSD to HDD after 30 days by default. If storage_cooldown_time is specified, data migrates once that time is reached.
- Note: when storage_medium is specified, if the FE parameter enable_strict_storage_medium_check is False, the parameter is only a "best effort" setting. Even if no SSD storage medium is configured in the cluster, no error is reported, and data is automatically stored in whatever data directories are available. Likewise, if the SSD medium is inaccessible or out of space, the initial data may be stored directly on other available media; and when data is due to migrate to HDD, if the HDD medium is inaccessible or out of space, the migration may fail (but will keep retrying). If enable_strict_storage_medium_check is True and no SSD storage medium is configured in the cluster, the error Failed to find enough host in all backends with storage medium is SSD is reported.
ENGINE
In this example, the ENGINE type is olap, the default ENGINE type. In Doris, only this ENGINE type has its data managed and stored by Doris itself. Other ENGINE types, such as mysql, broker, es, and so on, are essentially just mappings to tables in external databases or systems, so that Doris can read their data; Doris does not create, manage, or store any table or data of a non-olap ENGINE type.
`IF NOT EXISTS` means the table is created only if it does not already exist. Note that this checks only whether the table name exists; it does not check whether the new schema matches that of an existing table. So if a table with the same name but a different schema already exists, the command still returns success, but that does not mean a new table with the new schema was created.