Doris: creating OLAP, MySQL, and broker tables

        CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [database.]table_name
        (column_definition1[, column_definition2, ...])
        [ENGINE = [olap|mysql|broker]]
        [key_desc]
        [partition_desc]
        [distribution_desc]
        [PROPERTIES ("key"="value", ...)]
        [BROKER PROPERTIES ("key"="value", ...)];

Description

1. column_definition

Defines a column of the table.

Syntax:

col_name col_type [agg_type] [NULL | NOT NULL] [DEFAULT "default_value"]

Notes:

col_name: column name
col_type: column type

TINYINT (1 byte)
     Range: -2^7 + 1 ~ 2^7 - 1
SMALLINT (2 bytes)
     Range: -2^15 + 1 ~ 2^15 - 1
INT (4 bytes)
     Range: -2^31 + 1 ~ 2^31 - 1
BIGINT (8 bytes)
     Range: -2^63 + 1 ~ 2^63 - 1
LARGEINT (16 bytes)
     Range: 0 ~ 2^127 - 1
FLOAT (4 bytes)
     Supports scientific notation
DOUBLE (8 bytes)
     Supports scientific notation
DECIMAL[(precision, scale)] (40 bytes)
     Decimal type that preserves precision. The default is DECIMAL(10, 0)
     precision: 1 ~ 27
     scale: 0 ~ 9
     The integer part is 1 ~ 18 digits
     Scientific notation is not supported
DATE (3 bytes)
     Range: 1900-01-01 ~ 9999-12-31
DATETIME (8 bytes)
     Range: 1900-01-01 00:00:00 ~ 9999-12-31 23:59:59
CHAR[(length)]
     Fixed-length string. Length range: 1 ~ 255. The default is 1
VARCHAR[(length)]
     Variable-length string. Length range: 1 ~ 65533
HLL (1 ~ 16385 bytes)
     HLL column type. No length or default value needs to be specified; the length is controlled internally by the system according to how aggregated the data is.
     HLL columns can only be queried or used through the companion functions hll_union_agg, hll_cardinality, and hll_hash.

agg_type: aggregation type

If omitted, the column is a key column; otherwise it is a value column.
Supported types: SUM, MAX, MIN, REPLACE, HLL_UNION (only for HLL columns; it is the aggregation method specific to HLL).
The aggregation type is meaningful only for the aggregate model (key_desc of type AGGREGATE KEY); other models do not need it.

Whether NULL is allowed: by default, NULL is not allowed. In imported data, NULL values are represented by \N.
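
As a rough illustration of the column syntax above, the following sketch declares key columns without an agg_type and value columns with one, and states nullability explicitly. The table name example_db.table_nullable and all of its columns are made up for this example.

```sql
-- Hypothetical table; names and types are illustrative only.
CREATE TABLE example_db.table_nullable
(
    k1 INT NOT NULL,                     -- no agg_type: key column
    k2 VARCHAR(32) NULL,                 -- key column that accepts NULL (\N in import files)
    v1 BIGINT SUM NOT NULL DEFAULT "0",  -- value column, summed per key
    v2 VARCHAR(255) REPLACE NULL         -- value column, overwritten by later imports
)
ENGINE=olap
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 10;
```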

2. ENGINE type

Specifies the table type. The default is olap; mysql and broker are also available.

1) For mysql, the following information must be provided in properties:

PROPERTIES (
"host" = "mysql_server_host",
"port" = "mysql_server_port",
"user" = "your_user_name",
"password" = "your_password",
"database" = "database_name",
"table" = "table_name"
)

The "table_name" in the "table" entry is the real table name in mysql, whereas the table_name in the CREATE TABLE statement is the name of that mysql table inside Palo; the two may differ.
The purpose of creating a mysql table in Palo is to access the mysql database through Palo. Palo itself does not maintain or store any mysql data.

2) For broker, the table is accessed through the specified broker, and the following information must be provided in properties:

PROPERTIES (
"broker_name" = "broker_name",
"paths" = "file_path1[,file_path2]",
"column_separator" = "value_separator",
"line_delimiter" = "value_delimiter"
)

In addition, the property information required by the broker must be passed through BROKER PROPERTIES. For example, HDFS requires:

BROKER PROPERTIES(
"username" = "name",
"password" = "password"
)

The content to pass in differs by broker type.
If "paths" contains multiple files, separate them with commas [,].
If a file name contains a comma, replace it with %2c.
If a file name contains %, replace it with %25.
Currently the supported file content format is CSV, and the GZ, BZ2, LZ4, and LZO (LZOP) compression formats are supported.

3. key_desc

Specifies the data model.

Syntax:

key_type(k1[,k2 ...])

Data is sorted by the specified key columns and takes on different characteristics depending on key_type.

key_type supports the following types:

AGGREGATE KEY: for records with identical key columns, the value columns are aggregated according to the specified aggregation type. Suited to reports, multidimensional analysis, and similar scenarios.
UNIQUE KEY: for records with identical key columns, the value columns are overwritten in import order. Suited to point-query workloads that insert, delete, update, and look up rows by key.
DUPLICATE KEY: records with identical key columns coexist in Palo. Suited to scenarios that store detail data or have no aggregation needs.

Note: except for AGGREGATE KEY, value columns in tables of the other key_type values do not need an aggregation type.

4. partition_desc

Specifies the partitions.

1) Range partition
Syntax:

            PARTITION BY RANGE (k1)
            (
            PARTITION partition_name VALUES LESS THAN MAXVALUE|("value1"),
            PARTITION partition_name VALUES LESS THAN MAXVALUE|("value2")
            ...
            )

Notes:

            Partitions data by the specified key column and the specified value ranges.
            1) Partition names must begin with a letter and may contain only letters, digits, and underscores
            2) Currently only columns of the following types can be used as the Range partition column, and only one partition column can be specified
                TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME
            3) Partitions are left-closed and right-open; the left boundary of the first partition is the minimum value
            4) NULL values are stored only in the partition that contains the minimum value. Once that partition is deleted, NULL values can no longer be imported.

Note:

1) Partitions are generally used to manage data along the time dimension
2) If data backfill may be needed, consider making the first partition an empty partition so that partitions can be added later

5. distribution_desc

Hash bucketing
Syntax:
DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]
Notes:
Buckets data by hashing the specified key columns. The default number of buckets is 10.

Recommendation: hash bucketing is the recommended method.

6. PROPERTIES

1) If the ENGINE type is olap, column storage can be specified in properties (currently only column storage is supported)

            PROPERTIES (
            "storage_type" = "[column]"
            )

2) If the ENGINE type is olap, the initial storage medium, storage cooldown time, and replica count of the table data can be set in properties.

           PROPERTIES (
           "storage_medium" = "[SSD|HDD]",
           ["storage_cooldown_time" = "yyyy-MM-dd HH:mm:ss"],
           ["replication_num" = "3"]
           )

storage_medium: specifies the initial storage medium of the partition, either SSD or HDD. The default is HDD.
storage_cooldown_time: when the storage medium is SSD, specifies when the partition's storage on SSD expires. By default, data stays on SSD for 7 days. The format is "yyyy-MM-dd HH:mm:ss".
replication_num: specifies the number of replicas of the partition. The default is 3.
When the table is a single-partition table, these properties are properties of the table.
When the table uses two-level partitioning, these properties are attached to each partition.
To give different partitions different properties, use ADD PARTITION or MODIFY PARTITION.
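
As a sketch of the per-partition adjustment mentioned above, the statement below changes the replica count of a single partition. The table example_db.my_table and the partition p1 are placeholder names, and which properties can be modified this way may depend on the Palo/Doris version in use.

```sql
-- Placeholder table and partition names; adjust to an existing partitioned table.
ALTER TABLE example_db.my_table
MODIFY PARTITION p1 SET ("replication_num" = "1");
```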

3) If the ENGINE type is olap and storage_type is column, bloom filter indexes can be specified for certain columns

A bloom filter index takes effect only when the query condition is in or equal; the more dispersed the values in the column, the better the effect

Currently only the following columns are supported: key columns other than those of type TINYINT, FLOAT, or DOUBLE, and value columns whose aggregation method is REPLACE

PROPERTIES (
   "bloom_filter_columns"="k1,k2,k3"
)
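
Following the column restrictions above, a table might enable bloom filter indexes as sketched below. The table name example_db.table_bloom and its columns are hypothetical: k1 and k2 qualify as key columns that are not TINYINT, FLOAT, or DOUBLE, and v1 qualifies as a REPLACE value column.

```sql
-- Hypothetical table used only to illustrate "bloom_filter_columns".
CREATE TABLE example_db.table_bloom
(
    k1 INT,
    k2 VARCHAR(64),
    v1 VARCHAR(2048) REPLACE,
    v2 INT SUM
)
ENGINE=olap
AGGREGATE KEY(k1, k2)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES (
    "storage_type" = "column",
    "bloom_filter_columns" = "k1,k2,v1"
);

-- Equality and IN predicates on the indexed columns are the cases the index targets.
SELECT k1, k2, v2
FROM example_db.table_bloom
WHERE k2 = 'user_42' AND k1 IN (1, 2, 3);
```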

4) To use the Colocate Join feature, specify it in properties

           PROPERTIES (
           "colocate_with"="table1"
           )

II. Table creation examples

1. Create an olap table using hash bucketing and column storage, aggregating records with the same key

        CREATE TABLE example_db.table_hash
        (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        v1 CHAR(10) REPLACE,
        v2 INT SUM
        )
        ENGINE=olap
        AGGREGATE KEY(k1, k2)
        DISTRIBUTED BY HASH(k1) BUCKETS 32
        PROPERTIES ("storage_type"="column");
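
To make the effect of AGGREGATE KEY concrete, the commented sketch below assumes that two rows with the same key were loaded into the table above in two separate import batches. The row values are invented for illustration.

```sql
-- Assumed input (hypothetical), loaded in two separate batches:
--   batch 1: k1=1, k2=10.50, v1='a', v2=5
--   batch 2: k1=1, k2=10.50, v1='b', v2=7
-- Both rows share the key (k1, k2), so they collapse into one record:
--   v1 uses REPLACE, so the value from the later batch ('b') is kept
--   v2 uses SUM, so the values are added (5 + 7 = 12)
SELECT k1, k2, v1, v2
FROM example_db.table_hash
WHERE k1 = 1 AND k2 = 10.50;
```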

2. Create an olap table using hash bucketing and column storage, in which records with the same key are overwritten, and set the initial storage medium and cooldown time

        CREATE TABLE example_db.table_hash
        (
        k1 BIGINT,
        k2 LARGEINT,
        v1 VARCHAR(2048),
        v2 SMALLINT DEFAULT "10"
        )
        ENGINE=olap
        UNIQUE KEY(k1, k2)
        DISTRIBUTED BY HASH (k1, k2) BUCKETS 32
        PROPERTIES(
        "storage_type"="column",
        "storage_medium" = "SSD",
        "storage_cooldown_time" = "2015-06-04 00:00:00"
        );

3. Create an olap table using Key Range partitioning and hash bucketing, with column storage used by default,
in which records with the same key coexist, and set the initial storage medium and cooldown time

        CREATE TABLE example_db.table_range
        (
        k1 DATE,
        k2 INT,
        k3 SMALLINT,
        v1 VARCHAR(2048),
        v2 DATETIME DEFAULT "2014-02-04 15:36:00"
        )
        ENGINE=olap
        DUPLICATE KEY(k1, k2, k3)
        PARTITION BY RANGE (k1)
        (
        PARTITION p1 VALUES LESS THAN ("2014-01-01"),
        PARTITION p2 VALUES LESS THAN ("2014-06-01"),
        PARTITION p3 VALUES LESS THAN ("2014-12-01")
        )
        DISTRIBUTED BY HASH(k2) BUCKETS 32
        PROPERTIES(
        "storage_medium" = "SSD", "storage_cooldown_time" = "2015-06-04 00:00:00"
        );

Notes:
This statement divides the data into the following 3 partitions:
( { MIN }, {"2014-01-01"} )
[ {"2014-01-01"}, {"2014-06-01"} )
[ {"2014-06-01"}, {"2014-12-01"} )

Data that falls outside these partitions is treated as invalid and filtered out
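
Because rows outside the defined ranges are filtered, a row dated 2014-12-01 or later would be rejected by the table above. One way to accept such rows is to add a partition with a higher upper bound. The partition name p4 and its boundary date are assumptions made for this sketch.

```sql
-- Adds a partition covering [2014-12-01, 2015-06-01) to table_range.
-- p4 and the boundary date are illustrative choices.
ALTER TABLE example_db.table_range
ADD PARTITION p4 VALUES LESS THAN ("2015-06-01");
```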

4. Create a mysql table

        CREATE TABLE example_db.table_mysql
        (
        k1 DATE,
        k2 INT,
        k3 SMALLINT,
        k4 VARCHAR(2048),
        k5 DATETIME
        )
        ENGINE=mysql
        PROPERTIES
        (
        "host" = "127.0.0.1",
        "port" = "8239",
        "user" = "mysql_user",
        "password" = "mysql_passwd",
        "database" = "mysql_db_test",
        "table" = "mysql_table_test"
        )
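
Once created, the table can be queried from Palo like any other table, with the rows read from mysql at query time rather than stored in Palo. A minimal sketch, where the filter value is arbitrary:

```sql
-- Reads through to mysql_db_test.mysql_table_test on the mysql server.
SELECT k1, k2, k4
FROM example_db.table_mysql
WHERE k2 > 100;
```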

5. Create a broker external table whose data files are stored on HDFS, with fields separated by "|" and lines terminated by "\n"

        CREATE EXTERNAL TABLE example_db.table_broker (
        k1 DATE,
        k2 INT,
        k3 SMALLINT,
        k4 VARCHAR(2048),
        k5 DATETIME
        )
        ENGINE=broker
        PROPERTIES (
        "broker_name" = "hdfs",
        "path" = "hdfs://hdfs_host:hdfs_port/data1,hdfs://hdfs_host:hdfs_port/data2,hdfs://hdfs_host:hdfs_port/data3%2c4",
        "column_separator" = "|",
        "line_delimiter" = "\n"
        )
        BROKER PROPERTIES (
        "username" = "hdfs_user",
        "password" = "hdfs_password"
        )

6. Create a table with HLL columns

        CREATE TABLE example_db.example_table
        (
        k1 TINYINT,
        k2 DECIMAL(10, 2) DEFAULT "10.5",
        v1 HLL HLL_UNION,
        v2 HLL HLL_UNION
        )
        ENGINE=olap
        AGGREGATE KEY(k1, k2)
        DISTRIBUTED BY HASH(k1) BUCKETS 32
        PROPERTIES ("storage_type"="column");
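
HLL columns are read through the companion functions rather than selected as plain values. A minimal sketch of an approximate distinct count per key over the table above, assuming data has already been loaded into v1 using hll_hash:

```sql
-- Approximate number of distinct values folded into v1, per k1.
SELECT k1, HLL_UNION_AGG(v1) AS approx_distinct
FROM example_db.example_table
GROUP BY k1;
```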

Copyright notice
This article was written by [TRX1024]. Please include a link to the original when reposting.
https://yzsam.com/2022/02/202202280529584069.html