Understand the execution principle of show create table
2022-06-09 22:42:00 【Data warehouse practitioner】
This article walks through the source-code flow behind the show create table command: how Spark SQL interacts with the Hive metastore, queries the metadata of the target table, and then assembles the final result that is shown to the user.
Today's article also comes out of a discussion in the 【Source Code Co-reading Group】. Some background first:
We usually pay close attention to query statements such as select, but rarely to how a statement like show create table is executed, and it is hard to find blog posts that cover it. So I took this question as a reason to dig into how it works: I spent 2 hours going through the source code and reached the basic conclusions:
Haha, thanks for the encouragement. Everyone asked for a video, so I will write the article first and record a short screencast afterwards.
Let's dig in. Source code can be dull, but it is also a window through which we can see what is really going on.
This article is based on Spark 3.2.
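Outline of this article
1、 Writing a local test class that simulates querying tables from Hive
2、 Correspondence between Hive entity classes and metastore tables and fields
3、 Source code analysis of the execution process
1、 Writing a local test class that simulates querying tables from Hive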
When reading the Spark SQL source, for convenience we usually generate some data with something like df.createOrReplaceTempView("XXX"). That is enough to study more than 90% of the rules, but it cannot simulate Hive, and setting up a Hive environment to connect to remotely takes a lot of effort.
Fortunately, inside the Spark SQL source project we can extend TestHiveSingleton and simulate Hive without building a Hive environment at all.
We will cover this in the 【Source Code Co-reading Group】.
The test class code is as follows :
2、 Correspondence between Hive entity classes and metastore tables and fields
MTable (class) --> TBLS (table)
MDatabase (class) --> DBS (table)
MStorageDescriptor (class) --> SDS (table)
MFieldSchema (class) --> TYPE_FIELDS (table)
partitionKeys (MTable class field) --> PARTITION_KEYS (table)
parameters (MTable class field) --> TABLE_PARAMS (table)
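The correspondence above is declared in an XML mapping, quoted in full later in this section. As a rough sketch only, the field-to-column pairs can be read out of such a mapping with Python's standard XML parser. The `MAPPING` string is a trimmed excerpt of the MTable configuration, and `field_mapping` is a hypothetical helper written for this illustration, not part of Hive or Spark.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt in the same shape as the MTable mapping quoted in this
# section; only a few fields are kept so the sketch stays short.
MAPPING = """
<class name="MTable" table="TBLS">
  <field name="tableName">
    <column name="TBL_NAME" length="256" jdbc-type="VARCHAR"/>
  </field>
  <field name="owner">
    <column name="OWNER" length="767" jdbc-type="VARCHAR"/>
  </field>
  <field name="tableType">
    <column name="TBL_TYPE" length="128" jdbc-type="VARCHAR"/>
  </field>
  <field name="parameters" table="TABLE_PARAMS">
    <join>
      <column name="TBL_ID"/>
    </join>
  </field>
</class>
"""


def field_mapping(xml_text):
    """Hypothetical helper: return {field name: (table, column or None)}."""
    cls = ET.fromstring(xml_text)
    default_table = cls.get("table")
    mapping = {}
    for field in cls.findall("field"):
        # A field that carries its own table attribute lives in a side table.
        table = field.get("table", default_table)
        # find() with a plain tag name only looks at direct children, so the
        # join column of a side table is deliberately not picked up here.
        column = field.find("column")
        mapping[field.get("name")] = (
            table,
            column.get("name") if column is not None else None,
        )
    return mapping


mapping = field_mapping(MAPPING)
for name, (table, column) in mapping.items():
    target = table if column is None else f"{table}.{column}"
    print(f"MTable.{name} -> {target}")
```

Fields such as parameters have no column of their own in TBLS, so the sketch reports only the side table they are stored in.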
The following configuration shows how the fields of the class map to the columns of the table:
<class name="MTable" table="TBLS" identity-type="datastore" detachable="true">
  <datastore-identity>
    <column name="TBL_ID"/>
  </datastore-identity>
  <index name="UniqueTable" unique="true">
    <column name="TBL_NAME"/>
    <column name="DB_ID"/>
  </index>
  <field name="tableName">
    <column name="TBL_NAME" length="256" jdbc-type="VARCHAR"/>
  </field>
  <field name="database">
    <column name="DB_ID"/>
  </field>
  <field name="partitionKeys" table="PARTITION_KEYS" >
    <collection element-type="MFieldSchema"/>
    <join>
      <primary-key name="PARTITION_KEY_PK">
        <column name="TBL_ID"/>
        <column name="PKEY_NAME"/>
      </primary-key>
      <column name="TBL_ID"/>
    </join>
    <element>
      <embedded>
        <field name="name">
          <column name="PKEY_NAME" length="128" jdbc-type="VARCHAR"/>
        </field>
        <field name="type">
          <column name="PKEY_TYPE" length="767" jdbc-type="VARCHAR" allows-null="false"/>
        </field>
        <field name="comment" >
          <column name="PKEY_COMMENT" length="4000" jdbc-type="VARCHAR" allows-null="true"/>
        </field>
      </embedded>
    </element>
  </field>
  <field name="sd" dependent="true">
    <column name="SD_ID"/>
  </field>
  <field name="owner">
    <column name="OWNER" length="767" jdbc-type="VARCHAR"/>
  </field>
  <field name="createTime">
    <column name="CREATE_TIME" jdbc-type="integer"/>
  </field>
  <field name="lastAccessTime">
    <column name="LAST_ACCESS_TIME" jdbc-type="integer"/>
  </field>
  <field name="retention">
    <column name="RETENTION" jdbc-type="integer"/>
  </field>
  <field name="parameters" table="TABLE_PARAMS">
    <map key-type="java.lang.String" value-type="java.lang.String"/>
    <join>
      <column name="TBL_ID"/>
    </join>
    <key>
      <column name="PARAM_KEY" length="256" jdbc-type="VARCHAR"/>
    </key>
    <value>
      <column name="PARAM_VALUE" length="32672" jdbc-type="VARCHAR"/>
    </value>
  </field>
  <field name="viewOriginalText" default-fetch-group="false">
    <column name="VIEW_ORIGINAL_TEXT" jdbc-type="LONGVARCHAR"/>
  </field>
  <field name="viewExpandedText" default-fetch-group="false">
    <column name="VIEW_EXPANDED_TEXT" jdbc-type="LONGVARCHAR"/>
  </field>
  <field name="rewriteEnabled">
    <column name="IS_REWRITE_ENABLED"/>
  </field>
  <field name="tableType">
    <column name="TBL_TYPE" length="128" jdbc-type="VARCHAR"/>
  </field>
</class>
3、 Source code analysis of the execution process
Printing the physical execution plan of show create table orders with println shows that what actually runs is the ShowCreateTableCommand class.
Code flow:
Two core methods:
Query the Hive metastore (ObjectStore.getMTable)
- mtbl = (MTable) query.execute(table, db), corresponding SQL:
Gets the table's basic information (tbl_id, tbl_type, etc.)
SELECT
DISTINCT 'org.apache.hadoop.hive.metastore.model.MTable' AS NUCLEUS_TYPE,
A0.CREATE_TIME,
A0.LAST_ACCESS_TIME,
A0.OWNER,
A0.RETENTION,
A0.IS_REWRITE_ENABLED,
A0.TBL_NAME,
A0.TBL_TYPE,
A0.TBL_ID
FROM
TBLS A0
LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID
WHERE
A0.TBL_NAME = ?
AND B0."NAME" = ?
SQL captured while debugging:
Correspondence between the SQL fields and the entity class:
The debug process is as follows:
As you can see, after this method runs, some basic fields have been populated.
- pm.retrieve(mtbl), corresponding SQL:
Gets database (MDatabase), sd (MStorageDescriptor), parameters and partitionKeys
SELECT
B0."DESC",
B0.DB_LOCATION_URI,
B0."NAME",
B0.OWNER_NAME,
B0.OWNER_TYPE,
B0.DB_ID,
C0.INPUT_FORMAT,
C0.IS_COMPRESSED,
C0.IS_STOREDASSUBDIRECTORIES,
C0.LOCATION,
C0.NUM_BUCKETS,
C0.OUTPUT_FORMAT,
C0.SD_ID,
A0.VIEW_EXPANDED_TEXT,
A0.VIEW_ORIGINAL_TEXT
FROM
TBLS A0
LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID
LEFT OUTER JOIN SDS C0 ON A0.SD_ID = C0.SD_ID
WHERE
A0.TBL_ID = ?
SQL captured while debugging:
Correspondence between the SQL fields and the entity class:
The debug process is as follows:
When parameters and partitionKeys are actually evaluated, another callback is made to fetch the schema:
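The two-step lookup described above can be tried out without a real metastore. The following is a minimal sketch, assuming an in-memory SQLite database with heavily reduced TBLS, DBS and SDS tables. The sample rows (owner, locations, IDs) are invented for illustration, and the helper names `get_basic_info` and `retrieve_details` are hypothetical. The SQL keeps the join shape of the two statements shown above but selects fewer columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-ins for the metastore tables; all sample values are made up.
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, "NAME" TEXT, DB_LOCATION_URI TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, INPUT_FORMAT TEXT, LOCATION TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER,
                   TBL_NAME TEXT, TBL_TYPE TEXT, OWNER TEXT);
INSERT INTO DBS VALUES (1, 'default', 'file:/tmp/warehouse');
INSERT INTO SDS VALUES (10, 'example.InputFormat', 'file:/tmp/warehouse/orders');
INSERT INTO TBLS VALUES (100, 1, 10, 'orders', 'MANAGED_TABLE', 'test_user');
""")


def get_basic_info(conn, table, db):
    """Step 1: look the table up by table name and database name."""
    return conn.execute(
        '''SELECT A0.OWNER, A0.TBL_NAME, A0.TBL_TYPE, A0.TBL_ID
           FROM TBLS A0
           LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID
           WHERE A0.TBL_NAME = ? AND B0."NAME" = ?''',
        (table, db),
    ).fetchone()


def retrieve_details(conn, tbl_id):
    """Step 2: use the TBL_ID from step 1 to load database and storage info."""
    return conn.execute(
        '''SELECT B0."NAME", B0.DB_LOCATION_URI, B0.DB_ID,
                  C0.INPUT_FORMAT, C0.LOCATION, C0.SD_ID
           FROM TBLS A0
           LEFT OUTER JOIN DBS B0 ON A0.DB_ID = B0.DB_ID
           LEFT OUTER JOIN SDS C0 ON A0.SD_ID = C0.SD_ID
           WHERE A0.TBL_ID = ?''',
        (tbl_id,),
    ).fetchone()


basic = get_basic_info(conn, "orders", "default")
owner, tbl_name, tbl_type, tbl_id = basic
details = retrieve_details(conn, tbl_id)
print(basic)
print(details)
```

The point of the sketch is the ordering: the first query resolves the name pair to a TBL_ID, and only then is that ID used to pull in the related database and storage descriptor rows.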
Generate the final output from the Hive metadata (ShowCreateTableCommand.run)
private def showCreateDataSourceTable(metadata: CatalogTable, builder: StringBuilder): Unit = {
  // columns
  showDataSourceTableDataColumns(metadata, builder)
  // table options: storage format, etc.
  showDataSourceTableOptions(metadata, builder)
  // non-data columns: partitioning and bucketing
  showDataSourceTableNonDataColumns(metadata, builder)
  // table comment
  showTableComment(metadata, builder)
  // location
  showTableLocation(metadata, builder)
  // for example: TBLPROPERTIES
  showTableProperties(metadata, builder)
}
The final assembled result:
CREATE TABLE `default`.`orders` (
  `id` INT,
  `make` STRING,
  `type` STRING,
  `price` INT,
  `pdate` STRING,
  `customer` STRING,
  `city` STRING,
  `state` STRING,
  `month` INT
)
USING parquet
PARTITIONED BY (state, month)
TBLPROPERTIES (
  'transient_lastDdlTime' = '1651553453')
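To make the assembly step concrete, here is a small Python sketch of the same idea: a series of helpers that each append one clause to a shared buffer. It is not Spark's implementation. `TableMeta` is a hypothetical, much simplified stand-in for CatalogTable, the helper names are invented, and the column list is shortened to keep the example brief.

```python
from dataclasses import dataclass, field


@dataclass
class TableMeta:
    """Hypothetical, heavily simplified stand-in for Spark's CatalogTable."""
    database: str
    name: str
    columns: list  # (column name, type) pairs, in declaration order
    provider: str
    partition_columns: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


def show_columns(meta, out):
    cols = ",\n".join(f"  `{n}` {t}" for n, t in meta.columns)
    out.append(f"(\n{cols}\n)\n")


def show_options(meta, out):
    out.append(f"USING {meta.provider}\n")


def show_non_data_columns(meta, out):
    # Only emitted when the table is partitioned.
    if meta.partition_columns:
        out.append(f"PARTITIONED BY ({', '.join(meta.partition_columns)})\n")


def show_properties(meta, out):
    if meta.properties:
        props = ",\n".join(f"  '{k}' = '{v}'" for k, v in meta.properties.items())
        out.append(f"TBLPROPERTIES (\n{props})\n")


def show_create_table(meta):
    """Run each clause helper in order against one shared buffer."""
    out = [f"CREATE TABLE `{meta.database}`.`{meta.name}` "]
    for step in (show_columns, show_options, show_non_data_columns, show_properties):
        step(meta, out)
    return "".join(out)


orders = TableMeta(
    database="default",
    name="orders",
    columns=[("id", "INT"), ("make", "STRING"), ("state", "STRING"), ("month", "INT")],
    provider="parquet",
    partition_columns=["state", "month"],
    properties={"transient_lastDdlTime": "1651553453"},
)
ddl = show_create_table(orders)
print(ddl)
```

Because every helper only appends to the buffer, the order of the calls alone determines the order of the clauses in the output. The same is true of the builder-based Scala method.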