What are partitioning and bucketing?
2022-07-25 22:05:00 【Big gray wolf】
I. Partitioning

1. What is a partition
When the data of a whole table is stored, it is split into multiple subdirectories according to the value of the partition key column. A partition can be pictured as a folder.
Note:
The subdirectory name is the partition name, which is derived from the value of the partition key column.
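The directory layout described above can be sketched in a few lines of Python. This is only a toy model, not Hive code: the table path /warehouse/logs, the partition key dt and the helper partition_layout are all made up for illustration, and the key=value directory naming follows the convention Hive commonly uses.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def partition_layout(table_dir, partition_key, rows):
    """Group rows by partition value and map each group to its subdirectory."""
    layout = defaultdict(list)
    for row in rows:
        row = dict(row)
        # The partition value ends up in the directory name, not in the stored row.
        value = row.pop(partition_key)
        subdir = PurePosixPath(table_dir) / f"{partition_key}={value}"
        layout[str(subdir)].append(row)
    return dict(layout)

# Hypothetical log rows; "dt" plays the role of the partition key.
rows = [
    {"dt": "2022-07-24", "url": "/home"},
    {"dt": "2022-07-25", "url": "/cart"},
    {"dt": "2022-07-25", "url": "/pay"},
]
layout = partition_layout("/warehouse/logs", "dt", rows)
for path, part_rows in sorted(layout.items()):
    print(path, len(part_rows))
# /warehouse/logs/dt=2022-07-24 1
# /warehouse/logs/dt=2022-07-25 2
```

Each distinct value of the partition key gets its own folder, and the rows inside that folder no longer need to carry the value themselves.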
2. Why partition
The longer a system runs, the more data its tables hold. Hive normally answers a query with a full table scan, so query efficiency drops sharply as a table grows. Hive introduced partitioning so that a query can avoid the full table scan and run faster.
For example, suppose we collect the log data of a large website and store each day's logs in the same table. Because a large volume of logs is generated every day, the table becomes huge, and a full table scan at query time consumes a lot of resources. In this case we can partition the table by date so that data from different dates is stored in different partitions. A query can then read directly from the relevant partition by specifying the value of the partition field.
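The saving from the date-partitioned log table can be illustrated with a small Python model. It is a sketch under simplifying assumptions: the partitions are held in a dictionary keyed by date, and the function query_errors and the sample rows are hypothetical. It counts how many rows a query touches with and without a partition filter.

```python
def query_errors(partitions, dt=None):
    """Find rows with status 500; return (matches, rows_scanned)."""
    if dt is None:
        # No partition filter: every partition is read (full table scan).
        selected = list(partitions.values())
    else:
        # Partition filter: only the matching partition is read.
        selected = [partitions.get(dt, [])]
    scanned = 0
    matches = []
    for part_rows in selected:
        for row in part_rows:
            scanned += 1
            if row["status"] == 500:
                matches.append(row)
    return matches, scanned

# Hypothetical log table partitioned by date.
partitions = {
    "2022-07-23": [{"url": "/home", "status": 200}, {"url": "/pay", "status": 500}],
    "2022-07-24": [{"url": "/cart", "status": 200}, {"url": "/home", "status": 200}],
    "2022-07-25": [{"url": "/pay", "status": 500}, {"url": "/home", "status": 200}],
}
_, full_scan = query_errors(partitions)
hits, pruned_scan = query_errors(partitions, dt="2022-07-25")
print(full_scan, pruned_scan, len(hits))  # 6 2 1
```

With the date given, only one of the three partitions is read, so the work grows with the size of a single day rather than the size of the whole table.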
3. Create a partitioned table
When creating a partitioned table, the keywords partitioned by (name string) declare that the table is partitioned by the field name. All records with the same value of name are stored in one partition, and the type of the partition column name is string. A table can also be partitioned by multiple columns, in which case the data of one partition is partitioned again by the next column.
Note:
Do not think of partitioning as splitting the table on the values of one of its real columns. The partition column, name in this example, does not actually exist in the table's data files. It is a pseudo column added to make the data easier to manage, and its value is specified when the data is loaded into a partition rather than read from the data files. We cannot partition by a column that is already declared as a real column of the table, such as userid.
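The pseudo-column idea can be demonstrated with a toy model in Python that uses ordinary directories and CSV files. This is a sketch, not how Hive is implemented: the helpers load_into_partition and read_table, the file name data.csv and the sample rows are all invented for illustration. The value of name is passed in by hand at load time, never written into the file, and added back from the directory name when the table is read.

```python
import csv
import tempfile
from pathlib import Path

def load_into_partition(table_dir, name, rows):
    """Write rows into the partition directory for `name`; the file holds only real columns."""
    part_dir = Path(table_dir) / f"name={name}"
    part_dir.mkdir(parents=True, exist_ok=True)
    with open(part_dir / "data.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

def read_table(table_dir):
    """Read every partition, appending the partition value taken from the directory name."""
    result = []
    for part_dir in sorted(Path(table_dir).iterdir()):
        name = part_dir.name.split("=", 1)[1]
        with open(part_dir / "data.csv", newline="") as f:
            for userid, action in csv.reader(f):
                result.append((userid, action, name))
    return result

with tempfile.TemporaryDirectory() as table_dir:
    # The partition value ("web", "app") is specified by the caller, not read from the rows.
    load_into_partition(table_dir, "web", [("1", "click"), ("2", "view")])
    load_into_partition(table_dir, "app", [("3", "click")])
    raw = (Path(table_dir) / "name=web" / "data.csv").read_text()
    table = read_table(table_dir)

print("web" in raw)  # False: the data file stores only userid and action
print(table)
```

The stored file contains only the real columns userid and action, yet every row read back carries a name value, which is why the partition column behaves like a column in queries without existing in the data.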
II. Bucketing
1. What is bucketing
Bucketing divides data at a finer granularity than partitioning. It works by hashing a specified column: the data is split into a group of buckets according to the hash value of that column, and each bucket corresponds to one storage file.
Note:
In the HDFS directory, buckets exist as files, not as folders the way partitions do.
2. Why use buckets
When the number of partitions grows so large that it may overwhelm the file system, buckets can be used to solve the problem.
The data in a partition can be further split into buckets. Partitioning splits the data directly by column value, whereas bucketing usually spreads the data by the hash value of a column and distributes it to different buckets.
Note:
Hive hashes the value of the bucketing column and takes the remainder of that hash divided by the number of buckets to choose the bucket. This spreads the rows across the buckets, but the number of rows in each bucket is not necessarily equal.
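The hash-then-remainder rule can be sketched in Python as follows. The sketch assumes integer user IDs and uses the integer itself as a stand-in hash; Hive's real hash function depends on the column type and the Hive version, so the function names bucket_for and assign_buckets and the resulting assignments are illustrative only.

```python
def bucket_for(userid, num_buckets):
    """Pick a bucket: hash the bucketing column, then take the remainder."""
    hashed = userid  # stand-in hash (identity on integers), NOT Hive's actual hash
    return hashed % num_buckets

def assign_buckets(userids, num_buckets):
    """Distribute user IDs into num_buckets lists, one list per bucket."""
    buckets = {b: [] for b in range(num_buckets)}
    for uid in userids:
        buckets[bucket_for(uid, num_buckets)].append(uid)
    return buckets

buckets = assign_buckets([1, 2, 3, 4, 5, 6, 8, 12], num_buckets=4)
for b, uids in buckets.items():
    print(f"bucket {b}: {uids}")
# bucket 0: [4, 8, 12]
# bucket 1: [1, 5]
# bucket 2: [2, 6]
# bucket 3: [3]
```

Rows with the same key always land in the same bucket, while the bucket sizes (3, 2, 2 and 1 here) differ because they depend on how the key values happen to fall.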
3. Create buckets
Bucketing differs from partitioning in the column it uses. A partition is based on a pseudo column that we specify, not on a column in the table's data files, whereas a bucket is based on a real column in the table. That is why the type of a partition column must be declared: the column does not exist in the data files, so declaring it amounts to creating a new column. A bucketing column already exists in the table and its type is known, so the type does not need to be specified again.