Use and cases of partitions
2022-07-27 00:59:00 【A photographer who can't play is not a good programmer】
The use of partitioned tables
Why use partitions
When a SELECT query has no restricting condition, the entire table is scanned. Much of that data is not what we want, so this wastes a lot of time.
Each partition of a partitioned table corresponds to a separate folder on HDFS (an example follows later). When a query filters on the partition column, Hive only needs to scan the data in the matching folder, which greatly improves query efficiency. Partitions come in two kinds, static and dynamic; later we focus on the differences between them.
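To make the folder idea concrete, here is a toy Python model (an illustration only, not Hive's implementation) that counts how many rows are read to answer `where dt = '2022-07-20'` on the six example rows used below. A flat table must read every row and then filter, while a partitioned table only opens the `dt=2022-07-20` folder.

```python
from typing import Dict, List, Tuple

# (id, name, age, dt): the six rows of the student example.
Row = Tuple[int, str, int, str]

ROWS: List[Row] = [
    (1, "zs", 18, "2022-07-20"), (2, "li", 19, "2022-07-20"),
    (3, "ws", 20, "2022-07-20"), (4, "qq", 21, "2022-07-21"),
    (5, "ww", 22, "2022-07-22"), (6, "ee", 23, "2022-07-23"),
]

def query_flat(rows: List[Row], dt: str) -> Tuple[List[Row], int]:
    """Unpartitioned table: every row is read, then filtered on dt."""
    hits = [r for r in rows if r[3] == dt]
    return hits, len(rows)

def to_folders(rows: List[Row]) -> Dict[str, List[Row]]:
    """Group rows into one 'folder' per dt value, named dt=<value>."""
    folders: Dict[str, List[Row]] = {}
    for r in rows:
        folders.setdefault(f"dt={r[3]}", []).append(r)
    return folders

def query_partitioned(folders: Dict[str, List[Row]], dt: str) -> Tuple[List[Row], int]:
    """Partitioned table: only the folder dt=<value> is read."""
    hits = folders.get(f"dt={dt}", [])
    return list(hits), len(hits)

flat_hits, flat_read = query_flat(ROWS, "2022-07-20")
part_hits, part_read = query_partitioned(to_folders(ROWS), "2022-07-20")
print(len(flat_hits), flat_read)  # 3 6
print(len(part_hits), part_read)  # 3 3
```

Both versions return the same three rows; the difference is only how much data is touched, which is what partitioning saves on a large table.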
Let's learn about partitions through a simple example:
Static partitioning
1. Create a partitioned table:
CREATE TABLE student(id int, name string, age int)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
2. Insert data. The local file /opt/data/student.txt contains:
1,zs,18
2,li,19
3,ws,20
Load this file into partition dt='2022-07-20', then insert one row into each of three more partitions:
load data local inpath '/opt/data/student.txt' overwrite into table student partition(dt='2022-07-20');
insert into table student partition(dt='2022-07-21') values(4,'qq',21);
insert into table student partition(dt='2022-07-22') values(5,'ww',22);
insert into table student partition(dt='2022-07-23') values(6,'ee',23);
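A minimal sketch of what a static partition insert does, modelled in Python with a dictionary standing in for the table directory (the names `Table` and `static_insert` are hypothetical). Every row goes into the one folder named in `partition(dt='...')`, and the dt value itself is not written into the data lines, because it is carried by the folder name.

```python
from typing import Dict, List

# Hypothetical stand-in for the table directory: folder name -> data lines.
Table = Dict[str, List[str]]

def static_insert(table: Table, dt: str, lines: List[str], overwrite: bool = False) -> None:
    """Write every line into the single partition named in PARTITION(dt='...').

    The dt value is not added to the lines; it lives only in the folder name.
    overwrite=True mimics OVERWRITE by replacing that partition's contents.
    """
    folder = f"dt={dt}"
    if overwrite or folder not in table:
        table[folder] = []
    table[folder].extend(lines)

student: Table = {}
static_insert(student, "2022-07-20", ["1,zs,18", "2,li,19", "3,ws,20"], overwrite=True)
static_insert(student, "2022-07-21", ["4,qq,21"])
static_insert(student, "2022-07-22", ["5,ww,22"])
static_insert(student, "2022-07-23", ["6,ee,23"])
print(sorted(student))
# ['dt=2022-07-20', 'dt=2022-07-21', 'dt=2022-07-22', 'dt=2022-07-23']
```

The partition is chosen entirely by the statement, not by the data, which is why static partitioning needs the partition value to be known in advance.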
3. Query the data in the table:
hive > select * from student;
OK
student.id student.name student.age student.dt
1 zs 18 2022-07-20
2 li 19 2022-07-20
3 ws 20 2022-07-20
4 qq 21 2022-07-21
5 ww 22 2022-07-22
6 ee 23 2022-07-23
Time taken: 0.293 seconds, Fetched: 6 row(s)
Query the data in a single partition:
hive > select * from student where dt = '2022-07-20';
OK
student.id student.name student.age student.dt
1 zs 18 2022-07-20
2 li 19 2022-07-20
3 ws 20 2022-07-20
Time taken: 0.361 seconds, Fetched: 3 row(s)
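Since dt is not stored in the data lines, the `student.dt` column in these results has to come from the folder name. Here is a toy Python model of that read path; it is a hypothetical sketch rather than Hive code, and assumes the comma-separated text layout declared in the table definition.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, int, str]  # (id, name, age, dt)

def select_all(table: Dict[str, List[str]]) -> List[Row]:
    """SELECT *: data columns come from each line, dt from the folder name."""
    rows: List[Row] = []
    for folder in sorted(table):
        _, dt = folder.split("=", 1)          # "dt=2022-07-20" -> "2022-07-20"
        for line in table[folder]:
            id_, name, age = line.split(",")  # FIELDS TERMINATED BY ','
            rows.append((int(id_), name, int(age), dt))
    return rows

def select_where_dt(table: Dict[str, List[str]], dt: str) -> List[Row]:
    """WHERE dt = '...': only the matching folder is opened."""
    folder = f"dt={dt}"
    return select_all({folder: table[folder]}) if folder in table else []

student = {
    "dt=2022-07-20": ["1,zs,18", "2,li,19", "3,ws,20"],
    "dt=2022-07-21": ["4,qq,21"],
    "dt=2022-07-22": ["5,ww,22"],
    "dt=2022-07-23": ["6,ee,23"],
}
print(len(select_all(student)))  # 6
print(select_where_dt(student, "2022-07-20"))
```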
4. (1) View the partitions of the table:
hive > show partitions student;
OK
partition
dt=2022-07-20
dt=2022-07-21
dt=2022-07-22
dt=2022-07-23
Time taken: 0.104 seconds, Fetched: 4 row(s)
(2) View the corresponding directory structure on HDFS. Each partition corresponds to a different directory there.
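A small Python sketch of how partition names and directories line up. It assumes the default warehouse location `/user/hive/warehouse` and the default database (check `hive.metastore.warehouse.dir` on your own cluster); the helper names are hypothetical, and real Hive also escapes special characters in partition values, which this sketch skips. The two-column form is shown only to illustrate nesting, with `hr` as a made-up second column.

```python
from typing import List, Tuple

Spec = List[Tuple[str, str]]  # ordered (partition column, value) pairs

def partition_name(spec: Spec) -> str:
    """'dt=2022-07-20', or 'dt=.../hr=...' when there are two partition columns."""
    return "/".join(f"{col}={val}" for col, val in spec)

def partition_dir(table_location: str, spec: Spec) -> str:
    """One sub-directory level per partition column under the table location."""
    return f"{table_location.rstrip('/')}/{partition_name(spec)}"

def parse_partition_name(name: str) -> Spec:
    """Inverse of partition_name: split on '/', then on the first '='."""
    spec: Spec = []
    for part in name.split("/"):
        col, val = part.split("=", 1)
        spec.append((col, val))
    return spec

# Assumption: default warehouse directory and default database.
location = "/user/hive/warehouse/student"
for dt in ["2022-07-20", "2022-07-21", "2022-07-22", "2022-07-23"]:
    print(partition_dir(location, [("dt", dt)]))
```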
Dynamic partitioning
Create a partitioned table:
create table student_dync(
id int,
name string,
age int
) partitioned by(dt string)
row format delimited fields terminated by ','
stored as textfile;
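Insert data into the table. Dynamic partitioning must be enabled, and because the partition column dt is given no static value here, the partition mode must be set to nonstrict (the default strict mode requires at least one static partition column). Here student_tmp is a source table (its definition is not given) whose last selected column, dt, supplies each row's partition value:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;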
insert overwrite table student_dync partition (dt)
select
id,
name,
age,
dt
from student_tmp;
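The dynamic insert above can be modelled in Python as routing each SELECT row by its last value. This is a hypothetical sketch (`dynamic_insert` is not a Hive API), and because the contents of student_tmp are not shown, the input rows below are invented; only their dt values are chosen to match the four partitions listed next. The strict-mode check mirrors why `hive.exec.dynamic.partition.mode` is set to nonstrict: here every partition column is dynamic.

```python
from typing import Dict, List, Sequence, Tuple

def dynamic_insert(
    rows: Sequence[Tuple[object, ...]],
    partition_cols: Sequence[str] = ("dt",),
    mode: str = "strict",
) -> Dict[str, List[Tuple[object, ...]]]:
    """Route each SELECT row to a partition named by its trailing values.

    Models PARTITION(dt) with no static value: the last len(partition_cols)
    values of each row pick the folder and are not kept in the data.
    """
    if mode == "strict":
        # In strict mode, an insert whose partition columns are all dynamic is rejected.
        raise ValueError("strict mode: at least one static partition column required")
    n = len(partition_cols)
    table: Dict[str, List[Tuple[object, ...]]] = {}
    for row in rows:
        data, values = tuple(row[:-n]), row[-n:]
        folder = "/".join(f"{c}={v}" for c, v in zip(partition_cols, values))
        table.setdefault(folder, []).append(data)  # partitions appear as values are seen
    return table

# Invented stand-in for student_tmp: (id, name, age, dt).
student_tmp = [
    (7, "a1", 20, "2022-08-01"),
    (8, "a2", 21, "2022-08-02"),
    (9, "a3", 22, "2022-08-03"),
    (10, "a4", 23, "2022-08-04"),
    (11, "a5", 24, "2022-08-04"),
]
result = dynamic_insert(student_tmp, mode="nonstrict")
print(sorted(result))
# ['dt=2022-08-01', 'dt=2022-08-02', 'dt=2022-08-03', 'dt=2022-08-04']
```

Unlike the static case, the set of partitions is only known after the data has been read, which matches point 1 of the summary.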
No partition value is specified in the insert; Hive takes each row's dt from the last column of the SELECT. Now look at the partitions of the table:
hive > show partitions student_dync;
OK
partition
dt=2022-08-01
dt=2022-08-02
dt=2022-08-03
dt=2022-08-04
Time taken: 0.058 seconds, Fetched: 4 row(s)
On HDFS, each of these partitions again corresponds to its own directory.
Summary: differences between dynamic and static partitioning
1. With static partitioning, the partition values are fixed at compile time because they are specified manually; with dynamic partitioning they are determined only when the SQL is executed.
2. Because the partition must be specified manually, static partitioning suits data with few partitions and clearly known partition names.
3. Dynamic partitioning suits cases with many partitions. The dynamic partition columns must come last in the SELECT list, and with several partition columns they must appear in the same order as in the PARTITION clause.
4. For inserting data, static partitions support both LOAD and INSERT, while dynamic partitions only support INSERT.
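Point 3 can be made concrete with another toy Python sketch (hypothetical helper `route`, hypothetical second partition column `hr`). With `partition(dt, hr)`, the last two SELECT values are assigned to dt and hr by position, not by name, so in this model swapping them yields mislabelled partitions instead of an error.

```python
from typing import Sequence, Tuple

def route(row: Sequence[object], partition_cols: Sequence[str]) -> Tuple[Tuple[object, ...], str]:
    """Split one SELECT row into (data columns, partition folder).

    The trailing values are matched to partition_cols by position only:
    the i-th trailing value goes to the i-th column of PARTITION(...).
    """
    n = len(partition_cols)
    if n == 0 or len(row) <= n:
        raise ValueError("row needs data columns plus one value per partition column")
    data, values = tuple(row[:-n]), row[-n:]
    folder = "/".join(f"{c}={v}" for c, v in zip(partition_cols, values))
    return data, folder

# Correct order: SELECT id, name, age, dt, hr ... for PARTITION(dt, hr)
print(route((1, "zs", 18, "2022-08-01", "10"), ["dt", "hr"]))
# ((1, 'zs', 18), 'dt=2022-08-01/hr=10')

# Wrong order: SELECT id, name, age, hr, dt ... -> values land in the wrong columns
print(route((1, "zs", 18, "10", "2022-08-01"), ["dt", "hr"]))
# ((1, 'zs', 18), 'dt=10/hr=2022-08-01')
```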