Trying new features | Demystifying the Doris complex data type ARRAY
2022-07-26 22:06:00 【ApacheDoris】

About the ARRAY type
Complex data types are usually combinations of existing types, and they generally allow the combined data to be accessed and processed directly. Common complex types include ARRAY, MAP, and STRUCT.
The Doris complex data type ARRAY is already available for early trial on the master branch and is expected to be released in version 1.2. This article introduces the basic usage of ARRAY so that anyone who needs it can try it out.
Basic ARRAY usage
Enable the ARRAY switch
To try ARRAY, first enable the ARRAY type switch (enable_array_type):
sql> set enable_array_type=true;
Step 1: Create a table with an ARRAY column
The syntax is ARRAY<T>, where T is the element type of the ARRAY. The element types currently supported are: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, DATETIME, CHAR, VARCHAR, STRING
Note: ARRAY can currently be used only on the DUPLICATE KEY data model.
sql> CREATE TABLE `array_test` (
    `id` INT NULL,
    `c_array` ARRAY<INT> NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 5
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
Step 2: Insert ARRAY data
An ARRAY column value can be NULL or empty, and an ARRAY can also contain NULL elements.
sql> INSERT INTO `array_test` VALUES(1, [1, 2, 3]), (2, [4, NULL, 5, NULL]), (3, []), (4, NULL);
Step 3: View the ARRAY data
sql> SELECT * FROM `array_test` ORDER BY `id`;
+------+--------------------+
| id   | c_array            |
+------+--------------------+
|    1 | [1, 2, 3]          |
|    2 | [4, NULL, 5, NULL] |
|    3 | []                 |
|    4 | NULL               |
+------+--------------------+
Importing ARRAY data
ARRAY data can currently be imported in JSON, Parquet, ORC, and CSV formats. The following uses JSON as an example to show how ARRAY data is imported:
Step 1: Prepare the data
Prepare JSON-format data locally that includes an ARRAY column:
[{"id": 1, "c_array": [1,2,3]}, {"id": 2, "c_array": [4,5]}]
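If the input file is generated rather than written by hand, a short script can produce the same shape. The sketch below is only an illustration in Python and is not part of the original walkthrough: the helper name `to_json_payload` is hypothetical, and it assumes the JSON keys should match the `id` and `c_array` column names of `array_test`.

```python
import json

# Rows to load; the keys mirror the column names of array_test.
ROWS = [
    {"id": 1, "c_array": [1, 2, 3]},
    {"id": 2, "c_array": [4, 5]},
]


def to_json_payload(rows):
    """Serialize rows as a single outer JSON array, one object per row.

    (Hypothetical helper.) This is the layout that goes together with the
    "strip_outer_array: true" header used in the stream load step.
    """
    return json.dumps(rows)


# Print the payload so it can be redirected into local_json_input.json.
print(to_json_payload(ROWS))
```

Redirecting the script's output into local_json_input.json (the script name itself is arbitrary) gives a file equivalent to the hand-written one above.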
Step 2: Import the data
Use stream load to import the prepared JSON test data into Doris:
curl --location-trusted -u $user:$pwd -H "strip_outer_array: true" -H "format: json" -T local_json_input.json -XPUT http://127.0.0.1:$port/api/$db/$table/_stream_load
{
    "TxnId": 13021,
    "Label": "1f83c4a1-43ad-49d6-8134-5d40f3fc35c3",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 61,
    "LoadTimeMs": 36,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 2,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 16,
    "CommitAndPublishTimeMs": 14
}
Step 3: View the import result
Querying the table shows that the ARRAY data has been imported successfully:
sql> SELECT * FROM `array_test`;
+------+-----------+
| id   | c_array   |
+------+-----------+
|    1 | [1, 2, 3] |
|    2 | [4, 5]    |
+------+-----------+
ARRAY-related functions
The advantage of complex types is that their internal elements can be accessed and processed directly, and Doris provides a rich set of ARRAY functions. Because Doris users overlap heavily with Hive and Spark users, the syntax of the Doris ARRAY functions closely resembles that of Hive and Spark.
The ARRAY functions currently supported are summarized in the table below. For more detailed semantics and usage, look up the corresponding function in the Doris manual.
Note: ARRAY functions are supported only in the vectorized engine (enable it with set enable_vectorized_engine=true).
Summary of Doris ARRAY functions | |
Function | Description |
element_at | Returns the Nth element of an ARRAY |
array_slice | Returns a specified range of elements from an ARRAY |
array_contains | Checks whether an ARRAY contains a given element |
array_position | Returns the position at which an element first appears in an ARRAY |
cardinality(size) | Returns the number of elements in an ARRAY |
array_min | Returns the smallest element in an ARRAY |
array_max | Returns the largest element in an ARRAY |
array_sum | Returns the sum of the elements in an ARRAY |
array_avg | Returns the average of the elements in an ARRAY |
array_product | Returns the product of the elements in an ARRAY |
arrays_overlap | Checks whether two ARRAYs have any elements in common |
array_union | Returns the union of two ARRAYs |
array_intersect | Returns the intersection of two ARRAYs |
array_except | Returns the set difference of two ARRAYs |
array_sort | Sorts the elements of an ARRAY |
reverse(todo) | Reverses the elements of an ARRAY |
array_distinct | Removes duplicate elements from an ARRAY |
array_remove | Removes the specified element from an ARRAY |
concat_ws | Joins the elements of an ARRAY into a string |
explode, explode_outer | Converts an ARRAY column to rows (must be used with LATERAL VIEW) |
collect_list, collect_set | Converts rows to an ARRAY column |
ARRAY row/column conversion examples
In data analysis, converting rows to columns and columns to rows is a very common need. The examples below show how to do both with ARRAY in Doris.
ARRAY column to rows
For example, suppose a table (array_demo_col) records the companies where each person has worked:
sql> SELECT * FROM `array_demo_col` ORDER BY `name`;
+----------+--------------------------------------+
| name     | companies                            |
+----------+--------------------------------------+
| lisi     | ['companyB', 'companyT', 'companyD'] |
| zhangsan | ['companyA', 'companyT']             |
+----------+--------------------------------------+
Now suppose we want to find out who has worked at companyB. We can use explode to turn the column into rows and then filter on the condition we need:
sql> SELECT `name` FROM `array_demo_col` LATERAL VIEW explode(`companies`) comTable AS company WHERE company='companyB';
+------+
| name |
+------+
| lisi |
+------+
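To make the semantics of this query concrete, here is a plain-Python model of what LATERAL VIEW explode followed by a WHERE filter does. It is a sketch for intuition only, not Doris code and not a description of how Doris executes the query: the `explode` and `worked_at` helpers and the in-memory `ARRAY_DEMO_COL` list are hypothetical stand-ins for the table above, and producing no rows for NULL or empty arrays is an assumption based on Hive/Spark behavior.

```python
# In-memory stand-in for array_demo_col: (name, companies) rows.
ARRAY_DEMO_COL = [
    ("lisi", ["companyB", "companyT", "companyD"]),
    ("zhangsan", ["companyA", "companyT"]),
]


def explode(rows):
    """Yield one (name, company) row per array element.

    Rows whose array is None or empty produce no output (assumed here,
    following Hive/Spark explode semantics).
    """
    for name, companies in rows:
        for company in companies or []:
            yield name, company


def worked_at(rows, target):
    """Names whose exploded rows match the target company (the WHERE step)."""
    return [name for name, company in explode(rows) if company == target]


print(worked_at(ARRAY_DEMO_COL, "companyB"))  # ['lisi']
```

The point of the model is that each array element becomes its own row carrying the original `name`, so an ordinary row filter is enough afterwards.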
ARRAY rows to column
Continuing with the table above (array_demo_col), suppose we want to group by company and put the people who have worked at each company into an ARRAY column. We can use collect_list for this row-to-column conversion. The example SQL and its result are as follows:
sql> SELECT `company`, collect_list(`name`) AS names
FROM (
    SELECT `name`, `company`
    FROM `array_demo_col` LATERAL VIEW explode(`companies`) comTable AS company
) AS array_demo_row
GROUP BY `company` ORDER BY `company`;
+----------+----------------------+
| company  | names                |
+----------+----------------------+
| companyA | ['zhangsan']         |
| companyB | ['lisi']             |
| companyD | ['lisi']             |
| companyT | ['lisi', 'zhangsan'] |
+----------+----------------------+
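The same query can be modeled in plain Python to show how the two steps compose: explode flattens the arrays into rows, then collect_list gathers the names of each group back into a list. Again, this is a sketch under assumptions rather than how Doris runs the query: `explode`, `collect_list_by_company`, and the in-memory table are hypothetical, and the order of names inside each list simply follows input order here, which should not be relied on in SQL.

```python
from collections import defaultdict

# In-memory stand-in for array_demo_col: (name, companies) rows.
ARRAY_DEMO_COL = [
    ("lisi", ["companyB", "companyT", "companyD"]),
    ("zhangsan", ["companyA", "companyT"]),
]


def explode(rows):
    """Yield one (name, company) row per array element (inner subquery)."""
    for name, companies in rows:
        for company in companies or []:
            yield name, company


def collect_list_by_company(rows):
    """Model GROUP BY company + collect_list(name) + ORDER BY company."""
    groups = defaultdict(list)
    for name, company in explode(rows):
        groups[company].append(name)
    # Company keys are unique, so sorting the pairs orders by company.
    return sorted(groups.items())


for company, names in collect_list_by_company(ARRAY_DEMO_COL):
    print(company, names)
```

Swapping the list for a set in this model would correspond to collect_set, which keeps only distinct values.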
Summary
This article introduced the basic usage of the Doris complex data type ARRAY, the functions it supports, and some of its current limitations, to help everyone better understand and try out the ARRAY data type. If you run into any problems or have needs that are not yet covered, please contact us through the channels listed at the end of the article.
Because space is limited, the implementation principles of ARRAY will be covered gradually in follow-up articles. Follow the official Apache Doris WeChat account for the latest content.
- About the author -
Zhu Xiaoli
Senior R&D engineer on the Baidu PALO team, with extensive experience in storage engine and big data development, specializing in the development of the Doris execution engine and storage engine. A main implementer of the Apache Doris complex data type ARRAY.
If you run into any problems while using it, feel free to contact us at any time through the GitHub discussion forum or the dev mailing list.
GitHub forum: https://github.com/apache/doris/discussions
Dev mailing list: [email protected]
Apache Doris official website:
http://doris.apache.org
Apache Doris GitHub:
https://github.com/apache/doris
This article comes from the WeChat official account ApacheDoris (gh_80d448709a68).