Doris: creating OLAP, MySQL, and broker tables
2022-06-13 03:28:00 【TRX1024】
Notes taken from the output of HELP CREATE TABLE in the MySQL client.
I. CREATE TABLE syntax
Syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [database.]table_name
(column_definition1[, column_definition2, ...])
[ENGINE = [olap|mysql|broker]]
[key_desc]
[partition_desc]
[distribution_desc]
[PROPERTIES ("key"="value", ...)]
[BROKER PROPERTIES ("key"="value", ...)];
Description:
1. column_definition
Defines a table column.
Syntax:
col_name col_type [agg_type] [NULL | NOT NULL] [DEFAULT "default_value"]
Explanation:
col_name: Column name
col_type: Column type
TINYINT (1 byte)
Range: -2^7 + 1 ~ 2^7 - 1
SMALLINT (2 bytes)
Range: -2^15 + 1 ~ 2^15 - 1
INT (4 bytes)
Range: -2^31 + 1 ~ 2^31 - 1
BIGINT (8 bytes)
Range: -2^63 + 1 ~ 2^63 - 1
LARGEINT (16 bytes)
Range: 0 ~ 2^127 - 1
FLOAT (4 bytes)
Supports scientific notation
DOUBLE (12 bytes)
Supports scientific notation
DECIMAL[(precision, scale)] (40 bytes)
Exact decimal type that guarantees precision. The default is DECIMAL(10, 0)
precision: 1 ~ 27
scale: 0 ~ 9
The integer part is 1 ~ 18 digits
Scientific notation is not supported
DATE (3 bytes)
Range: 1900-01-01 ~ 9999-12-31
DATETIME (8 bytes)
Range: 1900-01-01 00:00:00 ~ 9999-12-31 23:59:59
CHAR[(length)]
Fixed-length string. Length range: 1 ~ 255. The default is 1
VARCHAR[(length)]
Variable-length string. Length range: 1 ~ 65533
HLL (1 ~ 16385 bytes)
HLL column type. Neither a length nor a default value needs to be specified. The system controls the length internally according to how aggregated the data is. HLL columns can only be queried or used through the matching functions hll_union_agg, hll_cardinality, and hll_hash
agg_type: aggregation type
- If not specified, the column is a key column; otherwise, it is a value column
- Supported values: SUM, MAX, MIN, REPLACE, HLL_UNION (only for HLL columns; the aggregation method specific to HLL)
- Only meaningful for the aggregate model (key_desc type AGGREGATE KEY); other models do not need it
NULL | NOT NULL: whether the column may be NULL. NULL is not allowed by default. In imported data, NULL values are represented by \N
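The column rules above can be collected into a small checker. This is a minimal Python sketch, not Doris's own validation logic. The `Column` dataclass and `check_column` function are hypothetical names. The limits are transcribed from the list above: CHAR 1 ~ 255, VARCHAR 1 ~ 65533, DECIMAL precision 1 ~ 27, scale 0 ~ 9, integer part 1 ~ 18, HLL_UNION only on HLL columns, and agg_type only for AGGREGATE KEY.

```python
from dataclasses import dataclass
from typing import List, Optional

# agg_type values listed in this section
AGG_TYPES = {"SUM", "MAX", "MIN", "REPLACE", "HLL_UNION"}


@dataclass
class Column:
    name: str
    col_type: str                   # e.g. "INT", "CHAR", "VARCHAR", "DECIMAL", "HLL"
    length: Optional[int] = None    # CHAR/VARCHAR length, if given
    precision: int = 10             # DECIMAL defaults to DECIMAL(10, 0)
    scale: int = 0
    agg_type: Optional[str] = None  # None means the column is a key column


def check_column(col: Column, key_type: str) -> List[str]:
    """Return the rule violations found in one column definition (empty list if none)."""
    problems = []
    t = col.col_type.upper()
    if t == "CHAR" and col.length is not None and not 1 <= col.length <= 255:
        problems.append("CHAR length must be 1 ~ 255")
    if t == "VARCHAR" and col.length is not None and not 1 <= col.length <= 65533:
        problems.append("VARCHAR length must be 1 ~ 65533")
    if t == "DECIMAL":
        if not 1 <= col.precision <= 27:
            problems.append("DECIMAL precision must be 1 ~ 27")
        if not 0 <= col.scale <= 9:
            problems.append("DECIMAL scale must be 0 ~ 9")
        if not 1 <= col.precision - col.scale <= 18:
            problems.append("DECIMAL integer part must be 1 ~ 18 digits")
    if col.agg_type is not None:
        agg = col.agg_type.upper()
        if agg not in AGG_TYPES:
            problems.append("unknown agg_type " + col.agg_type)
        if agg == "HLL_UNION" and t != "HLL":
            problems.append("HLL_UNION is only allowed on HLL columns")
        if key_type.upper() != "AGGREGATE":
            problems.append("agg_type is only needed for AGGREGATE KEY tables")
    return problems


# Two columns from example 1 in part II; both pass under AGGREGATE KEY
print(check_column(Column("k2", "DECIMAL", precision=10, scale=2), "AGGREGATE"))  # []
print(check_column(Column("v1", "CHAR", length=10, agg_type="REPLACE"), "AGGREGATE"))  # []
```

Returning a list of problems, rather than raising on the first one, makes it easy to report every issue in a column definition at once.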
2. ENGINE type
Specifies the table type. The default is olap; mysql and broker are also available.
1) For mysql, the following information must be provided in properties:
PROPERTIES (
"host" = "mysql_server_host",
"port" = "mysql_server_port",
"user" = "your_user_name",
"password" = "your_password",
"database" = "database_name",
"table" = "table_name"
)
- The "table_name" in the "table" entry is the real table name in MySQL, while the table_name in the CREATE TABLE statement is the name of this MySQL table in Palo. The two can differ.
- A mysql table is created in Palo so that the MySQL database can be accessed through Palo. Palo itself does not maintain or store any MySQL data.
2) For broker, the table is accessed through the specified broker, and the following information must be provided in properties:
PROPERTIES (
"broker_name" = "broker_name",
"paths" = "file_path1[,file_path2]",
"column_separator" = "value_separator",
"line_delimiter" = "value_delimiter"
)
In addition, any properties the broker requires must be passed via BROKER PROPERTIES. For example, HDFS needs:
BROKER PROPERTIES(
"username" = "name",
"password" = "password"
)
- The content to pass in differs by broker type.
- If "paths" contains multiple files, separate them with commas [,].
- If a file name contains a comma, replace it with %2c.
- If a file name contains %, replace it with %25.
- The file content currently supports CSV format, including GZ, BZ2, LZ4, and LZO (LZOP) compression.
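The two escaping rules for "paths" can be sketched as follows. `escape_file_name`, `build_paths`, and `parse_paths` are hypothetical helpers. The `%` is escaped before the `,` so that the `%` introduced by `%2c` is not escaped a second time. Reading a value back with Python's `urllib.parse.unquote` is an assumption made for illustration, not a description of the broker's code.

```python
from typing import List
from urllib.parse import unquote


def escape_file_name(name: str) -> str:
    # Escape "%" first; otherwise the "%" produced by "%2c" would be escaped again.
    return name.replace("%", "%25").replace(",", "%2c")


def build_paths(prefix: str, file_names: List[str]) -> str:
    """Join escaped file paths into a single comma-separated "paths" value."""
    return ",".join(prefix + escape_file_name(n) for n in file_names)


def parse_paths(paths: str) -> List[str]:
    """Split on the separating commas, then undo %2c / %25 in each path."""
    return [unquote(p) for p in paths.split(",")]


value = build_paths("hdfs://hdfs_host:hdfs_port/", ["data1", "data2", "data3,4"])
print(value)
# hdfs://hdfs_host:hdfs_port/data1,hdfs://hdfs_host:hdfs_port/data2,hdfs://hdfs_host:hdfs_port/data3%2c4
```

The printed value has the same form as the path list in the broker table example in part II, where the file name `data3,4` appears as `data3%2c4`.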
3. key_desc
Specifies the data model (see the data model concepts documentation).
Syntax:
key_type(k1[,k2 ...])
Data is sorted by the specified key columns, and each key_type behaves differently. key_type supports the following types:
- AGGREGATE KEY: for records with identical key columns, the value columns are aggregated according to their specified aggregation types. Suitable for reporting, multidimensional analysis, and similar business scenarios.
- UNIQUE KEY: for records with identical key columns, the value columns are overwritten in import order. Suitable for point-query workloads that insert, delete, update, and query by key columns.
- DUPLICATE KEY: records with identical key columns are all kept in Palo. Suitable for storing detailed data or for scenarios where data is not aggregated.
Note: only AGGREGATE KEY tables need an aggregation type on their value columns; in tables of the other key_types, value columns do not specify one.
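The behavior of the three key_types can be simulated on a handful of rows. The Python sketch below only models what a query would see after the rows are imported in order. It is not how Doris stores or compacts data. `apply_model` and `AGG_FUNCS` are hypothetical names.

```python
from typing import Dict, List, Optional

# Merge functions for agg_type in the AGGREGATE KEY model.
AGG_FUNCS = {
    "SUM": lambda old, new: old + new,
    "MAX": max,
    "MIN": min,
    "REPLACE": lambda old, new: new,  # the later import wins
}


def apply_model(rows: List[dict], key_cols: List[str], model: str,
                agg_types: Optional[Dict[str, str]] = None) -> List[dict]:
    """Return the rows a table would expose after importing `rows` in order."""
    def key_of(row):
        return tuple(row[k] for k in key_cols)

    if model == "DUPLICATE":
        # Every record is kept; the output is only sorted by the key columns.
        return sorted(rows, key=key_of)

    merged = {}
    for row in rows:
        key = key_of(row)
        if key not in merged or model == "UNIQUE":
            # First record for this key, or UNIQUE: overwrite with the newer record.
            merged[key] = dict(row)
        else:
            # AGGREGATE: combine each value column using its agg_type.
            current = merged[key]
            for col, agg in agg_types.items():
                current[col] = AGG_FUNCS[agg](current[col], row[col])
    return [merged[k] for k in sorted(merged)]


rows = [
    {"k1": 1, "v1": "a", "v2": 10},
    {"k1": 1, "v1": "b", "v2": 5},
    {"k1": 2, "v1": "c", "v2": 7},
]
print(apply_model(rows, ["k1"], "AGGREGATE", {"v1": "REPLACE", "v2": "SUM"}))
# [{'k1': 1, 'v1': 'b', 'v2': 15}, {'k1': 2, 'v1': 'c', 'v2': 7}]
print(apply_model(rows, ["k1"], "UNIQUE"))
# [{'k1': 1, 'v1': 'b', 'v2': 5}, {'k1': 2, 'v1': 'c', 'v2': 7}]
print(len(apply_model(rows, ["k1"], "DUPLICATE")))
# 3
```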
4. partition_desc
Specifies partitions.
1) Range partitioning
Syntax:
PARTITION BY RANGE (k1)
(
PARTITION partition_name1 VALUES LESS THAN MAXVALUE|("value1"),
PARTITION partition_name2 VALUES LESS THAN MAXVALUE|("value2"),
...
)
Explanation:
Partitions are defined by the specified key column and value ranges.
1) Partition names must start with a letter and may contain only letters, digits, and underscores.
2) Currently only the following column types can be used as Range partition columns, and only one partition column can be specified:
TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME
3) Partitions are left-closed, right-open intervals, and the left boundary of the first partition is the minimum value.
4) NULL values are stored only in the partition that contains the minimum value. If that partition is dropped, NULL values can no longer be imported.
Note:
1) Partitions are generally used to manage data along the time dimension.
2) If historical data may need to be backfilled, consider making the first partition an empty partition so that partitions can be added later.
5. distribution_desc
Hash bucketing
Syntax:
DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]
Explanation:
Rows are hashed into buckets using the specified key columns. The default number of buckets is 10.
Suggestion: hash bucketing is recommended.
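Hash bucketing maps the values of the distribution columns to a bucket index. The Python sketch below illustrates the idea. `bucket_of` is a hypothetical helper, and CRC32 (`zlib.crc32`) is only a stand-in for whatever hash function Doris actually uses.

```python
import zlib
from typing import List


def bucket_of(row: dict, dist_cols: List[str], buckets: int = 10) -> int:
    """Pick a bucket for `row` from its distribution columns (default 10 buckets)."""
    # Assumption: CRC32 of the "|"-joined values stands in for Doris's own hash function.
    key = "|".join(str(row[c]) for c in dist_cols).encode("utf-8")
    return zlib.crc32(key) % buckets


rows = [{"k1": i % 50, "k2": i} for i in range(1000)]
placement = [bucket_of(r, ["k1"], buckets=32) for r in rows]
# Rows that share the same k1 value always land in the same bucket.
```

Because the bucket depends only on the distribution columns, a column with few distinct values (here 50 values of `k1`) limits how evenly rows can spread. A higher-cardinality column, or a combination such as `(k1, k2)`, spreads them more evenly.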
6. PROPERTIES
1) If ENGINE is olap, the storage type can be specified in properties (currently only column storage is supported):
PROPERTIES (
"storage_type" = "[column]"
)
2) If ENGINE is olap, the initial storage medium, storage cooldown time, and number of replicas for the table data can be set in properties:
PROPERTIES (
"storage_medium" = "[SSD|HDD]",
["storage_cooldown_time" = "yyyy-MM-dd HH:mm:ss"],
["replication_num" = "3"]
)
- storage_medium: specifies the initial storage medium of the partition, either SSD or HDD. The default is HDD.
- storage_cooldown_time: when the storage medium is SSD, specifies when the partition's data expires from SSD. By default, data stays on SSD for 7 days. The format is "yyyy-MM-dd HH:mm:ss".
- replication_num: specifies the number of replicas of the partition. The default is 3.
- For a single-partition table, these properties are properties of the table.
- For a table with two-level partitioning, these properties are attached to each partition.
- To give different partitions different properties, use ADD PARTITION or MODIFY PARTITION.
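The defaults above can be made explicit when generating DDL. This Python sketch renders a PROPERTIES clause and computes the 7-day SSD cooldown time itself instead of leaving it to the server. `storage_properties` is a hypothetical helper. Its `now` parameter exists only so the computed time is reproducible.

```python
from datetime import datetime, timedelta
from typing import Optional


def storage_properties(medium: str = "HDD",
                       cooldown: Optional[datetime] = None,
                       replication_num: int = 3,
                       now: Optional[datetime] = None) -> str:
    """Render a PROPERTIES clause, filling in the documented defaults explicitly."""
    props = {"storage_medium": medium, "replication_num": str(replication_num)}
    if medium == "SSD":
        if cooldown is None:
            # Documented default: data stays on SSD for 7 days.
            cooldown = (now or datetime.now()) + timedelta(days=7)
        props["storage_cooldown_time"] = cooldown.strftime("%Y-%m-%d %H:%M:%S")
    body = ",\n".join(f'"{k}" = "{v}"' for k, v in props.items())
    return "PROPERTIES (\n" + body + "\n)"


print(storage_properties("SSD", now=datetime(2015, 5, 28)))
```

With `now` set to 2015-05-28 00:00:00, the rendered cooldown time is "2015-06-04 00:00:00", the same value used in examples 2 and 3 in part II.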
3) If ENGINE is olap and storage_type is column, you can specify columns that use a bloom filter index.
A bloom filter index only helps when the query condition is IN or equality. It works better the more dispersed (higher-cardinality) the column's values are.
Currently only the following columns are supported: key columns of any type except TINYINT, FLOAT, and DOUBLE, and value columns whose aggregation method is REPLACE.
PROPERTIES (
"bloom_filter_columns"="k1,k2,k3"
)
4) To use the Colocate Join feature, specify it in properties:
PROPERTIES (
"colocate_with"="table1"
)
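The bloom filter column restrictions in item 3) can be expressed as a small check. This Python sketch applies the rule literally as stated above: key columns except TINYINT, FLOAT, and DOUBLE, plus REPLACE value columns. `bloom_filter_eligible` and `bloom_filter_property` are hypothetical helpers, not part of Doris.

```python
from typing import List, Optional, Tuple

EXCLUDED_KEY_TYPES = {"TINYINT", "FLOAT", "DOUBLE"}


def bloom_filter_eligible(col_type: str, is_key: bool, agg_type: Optional[str] = None) -> bool:
    if is_key:
        # Key columns: any type except TINYINT, FLOAT, DOUBLE
        return col_type.upper() not in EXCLUDED_KEY_TYPES
    # Value columns: only those aggregated with REPLACE
    return agg_type is not None and agg_type.upper() == "REPLACE"


def bloom_filter_property(columns: List[Tuple[str, str, bool, Optional[str]]]) -> str:
    """columns: (name, col_type, is_key, agg_type) tuples; returns the PROPERTIES entry."""
    names = [name for name, col_type, is_key, agg in columns
             if bloom_filter_eligible(col_type, is_key, agg)]
    return '"bloom_filter_columns"="' + ",".join(names) + '"'


# Columns of example 1 in part II
example1 = [
    ("k1", "TINYINT", True, None),
    ("k2", "DECIMAL", True, None),
    ("v1", "CHAR", False, "REPLACE"),
    ("v2", "INT", False, "SUM"),
]
print(bloom_filter_property(example1))  # "bloom_filter_columns"="k2,v1"
```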
II. Table creation examples
1. Create an olap table that uses hash bucketing and column storage, and aggregates records with the same key
CREATE TABLE example_db.table_hash
(
k1 TINYINT,
k2 DECIMAL(10, 2) DEFAULT "10.5",
v1 CHAR(10) REPLACE,
v2 INT SUM
)
ENGINE=olap
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES ("storage_type"="column");
2. Create an olap table that uses hash bucketing and column storage, overwrites records with the same key, and sets the initial storage medium and cooldown time
CREATE TABLE example_db.table_hash
(
k1 BIGINT,
k2 LARGEINT,
v1 VARCHAR(2048),
v2 SMALLINT DEFAULT "10"
)
ENGINE=olap
UNIQUE KEY(k1, k2)
DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
PROPERTIES(
"storage_type"="column",
"storage_medium" = "SSD",
"storage_cooldown_time" = "2015-06-04 00:00:00"
);
3. Create an olap table that uses key range partitioning and hash bucketing, with column storage by default,
keeps all records with the same key, and sets the initial storage medium and cooldown time
CREATE TABLE example_db.table_range
(
k1 DATE,
k2 INT,
k3 SMALLINT,
v1 VARCHAR(2048),
v2 DATETIME DEFAULT "2014-02-04 15:36:00"
)
ENGINE=olap
DUPLICATE KEY(k1, k2, k3)
PARTITION BY RANGE (k1)
(
PARTITION p1 VALUES LESS THAN ("2014-01-01"),
PARTITION p2 VALUES LESS THAN ("2014-06-01"),
PARTITION p3 VALUES LESS THAN ("2014-12-01")
)
DISTRIBUTED BY HASH(k2) BUCKETS 32
PROPERTIES(
"storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00"
);
Explanation:
This statement divides the data into the following 3 partitions:
( { MIN }, {"2014-01-01"} )
[ {"2014-01-01"}, {"2014-06-01"} )
[ {"2014-06-01"}, {"2014-12-01"} )
Data outside these partitions is treated as invalid and filtered out.
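The partition lookup for this example can be sketched with a binary search over the partitions' upper bounds. This is a Python illustration of left-closed, right-open ranges, not Doris's internal code. `partition_for`, `PARTITIONS`, and `UPPER_BOUNDS` are hypothetical names, and the bounds are copied from p1, p2, and p3 above.

```python
import bisect
from datetime import date
from typing import Optional

# (name, exclusive upper bound) for p1, p2, p3 in example 3
PARTITIONS = [
    ("p1", date(2014, 1, 1)),
    ("p2", date(2014, 6, 1)),
    ("p3", date(2014, 12, 1)),
]
UPPER_BOUNDS = [upper for _, upper in PARTITIONS]


def partition_for(k1: date) -> Optional[str]:
    """Return the partition that holds k1, or None if the row would be filtered out."""
    # bisect_right returns the index of the first upper bound strictly greater than k1,
    # so a value equal to a bound belongs to the next partition (left-closed, right-open).
    idx = bisect.bisect_right(UPPER_BOUNDS, k1)
    return PARTITIONS[idx][0] if idx < len(PARTITIONS) else None


for d in [date(2013, 7, 1), date(2014, 1, 1), date(2014, 11, 30), date(2014, 12, 1)]:
    print(d, partition_for(d))
```

This prints p1, p2, p3, and None for the four dates. 2014-01-01 falls into p2 because p1's upper bound is exclusive, and 2014-12-01 is outside every partition and would be filtered.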
4. Create a mysql table
CREATE TABLE example_db.table_mysql
(
k1 DATE,
k2 INT,
k3 SMALLINT,
k4 VARCHAR(2048),
k5 DATETIME
)
ENGINE=mysql
PROPERTIES
(
"host" = "127.0.0.1",
"port" = "8239",
"user" = "mysql_user",
"password" = "mysql_passwd",
"database" = "mysql_db_test",
"table" = "mysql_table_test"
);
5. Create a broker external table whose data files are stored on HDFS, with columns separated by "|" and lines terminated by "\n"
CREATE EXTERNAL TABLE example_db.table_broker (
k1 DATE,
k2 INT,
k3 SMALLINT,
k4 VARCHAR(2048),
k5 DATETIME
)
ENGINE=broker
PROPERTIES (
"broker_name" = "hdfs",
"path" = "hdfs://hdfs_host:hdfs_port/data1,hdfs://hdfs_host:hdfs_port/data2,hdfs://hdfs_host:hdfs_port/data3%2c4",
"column_separator" = "|",
"line_delimiter" = "\n"
)
BROKER PROPERTIES (
"username" = "hdfs_user",
"password" = "hdfs_password"
);
6. Create a table with HLL columns
CREATE TABLE example_db.example_table
(
k1 TINYINT,
k2 DECIMAL(10, 2) DEFAULT "10.5",
v1 HLL HLL_UNION,
v2 HLL HLL_UNION
)
ENGINE=olap
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES ("storage_type"="column");
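An HLL column stores a HyperLogLog sketch rather than raw values. The Python sketch below shows why HLL_UNION can merge rows with the same key: merging two sketches is an element-wise maximum of their registers. This is the textbook HyperLogLog algorithm (Flajolet et al.), not Doris's storage format. The register count (`P = 10`), the SHA-1-based hash, and the function names are assumptions made for illustration.

```python
import hashlib
import math
from typing import List

P = 10       # 2^P registers; an arbitrary choice for this sketch
M = 1 << P   # 1024 registers


def _hash64(value) -> int:
    # Assumption: the first 8 bytes of SHA-1 serve as a well-mixed 64-bit hash.
    return int.from_bytes(hashlib.sha1(str(value).encode("utf-8")).digest()[:8], "big")


def hll_add(registers: List[int], value) -> None:
    h = _hash64(value)
    idx = h >> (64 - P)                       # top P bits select a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    registers[idx] = max(registers[idx], rank)


def hll_union(a: List[int], b: List[int]) -> List[int]:
    # Merging two sketches is an element-wise max, so values for rows with
    # the same key can be combined without keeping the raw data.
    return [max(x, y) for x, y in zip(a, b)]


def hll_estimate(registers: List[int]) -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)  # small-cardinality (linear counting) correction
    return raw


a, b = [0] * M, [0] * M
for i in range(6000):
    hll_add(a, i)
for i in range(4000, 10000):
    hll_add(b, i)
print(round(hll_estimate(hll_union(a, b))))  # roughly 10000 (typical error about 3%)
```

The union of the two sketches is identical to a sketch built directly from all 10000 values. This is why a distinct count can be computed from pre-aggregated HLL columns.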