How to query data imported into ClickHouse within a specified time period
2022-06-24 05:08:00 【jasong】
1 Purpose
- Data query
- Data migration and import
2 Why discuss ClickHouse data migration
- clickhouse-copier does not support incremental import
- The ClickHouse remote table function is slow, and it targets ClickHouse internal tables
- The dimensions available for filtering data are limited
3 ClickHouse MergeTreeData
QueryPlanPtr MergeTreeDataSelectExecutor::readFromParts(
    MergeTreeData::DataPartsVector parts,
    const Names & column_names_to_return,
    const StorageMetadataPtr & metadata_snapshot,
    const SelectQueryInfo & query_info,
    const Context & context,
    const UInt64 max_block_size,
    const unsigned num_streams,
    const PartitionIdToMaxBlock * max_block_numbers_to_read) const
{
    // Split the requested columns into virtual columns and real (stored) columns.
    for (const String & name : column_names_to_return)
    {
        if (name == "_part")
        {
            part_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_part_index")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_partition_id")
        {
            virt_column_names.push_back(name);
        }
        else if (name == "_part_uuid")
        {
            part_uuid_column_queried = true;
            virt_column_names.push_back(name);
        }
        else if (name == "_sample_factor")
        {
            sample_factor_column_queried = true;
            virt_column_names.push_back(name);
        }
        else
        {
            real_column_names.push_back(name);
        }
    }
    // ... (the rest of the function is omitted in this excerpt)
}
3.1 How to use it
- ClickHouse MergeTree tables expose the virtual columns listed above (_part, _part_index, _partition_id, _part_uuid, _sample_factor)
- So, without modifying any code, we can directly restrict a query to data at part granularity
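As a minimal sketch of the point above, the virtual columns can be selected by name like ordinary columns, although `SELECT *` does not return them. The table `db_1.test_26` is the example table used later in this article; the `LIMIT` is only there to keep the output short.

```sql
-- Sketch: read virtual columns next to a real column.
-- Assumes the example table db_1.test_26 from section 4.
SELECT
    id,
    _part,          -- name of the data part that stores the row
    _partition_id,  -- ID of the partition the part belongs to
    _part_index     -- sequential index of the part within this query
FROM db_1.test_26
LIMIT 10;
```

Because `_part` can also appear in a `WHERE` clause, it can serve as a filter key, which is what the following sections rely on.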
4 Operations
4.1 Create the table and import data
## 1 View table fields
DESCRIBE TABLE db_1.test_26
Query id: 856af95b-cb07-43d9-a776-5e6fd3d3c456
┌─name──┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id    │ UInt16 │              │                    │         │                  │                │
│ value │ UInt32 │              │                    │         │                  │                │
│ dt    │ Date   │              │                    │         │                  │                │
└───────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
3 rows in set. Elapsed: 0.004 sec.
## The write (insert) step is omitted here
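The original omits the DDL and the inserts. For readers who want to reproduce the example, the following is one possible setup, not the author's actual statements. The column types come from the `DESCRIBE` output above; the monthly partition key is inferred from part names such as `197506_1_4_1` seen later, and the `ORDER BY` key is purely an assumption.

```sql
-- Hypothetical setup; only the column list is taken from the article.
CREATE DATABASE IF NOT EXISTS db_1;

CREATE TABLE db_1.test_26
(
    id    UInt16,
    value UInt32,
    dt    Date
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(dt)  -- inferred from partition ID 197506
ORDER BY id;               -- assumption: the article does not show the sorting key

-- Each INSERT creates a new data part, which background merges may later combine.
INSERT INTO db_1.test_26 VALUES (11, 2013, '1975-06-12');
```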
4.2 Query
## 2 View all data
SELECT * FROM db_1.test_26
Query id: 6211055b-02af-482e-bc55-ccd765b0b929
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │  2013 │ 1975-06-12 │
└────┴───────┴────────────┘
6 rows in set. Elapsed: 0.148 sec.
4.3 The hidden virtual column _part
## 3 View the data part each row belongs to
SELECT
id,
value,
dt,
_part
FROM db_1.test_26
Query id: b7d81a80-089a-4434-b82e-a0e27c60c8ac
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │ 2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │ 2013 │ 1975-06-12 │ 197506_6_6_0 │
└────┴───────┴────────────┴──────────────┘
6 rows in set. Elapsed: 0.111 sec.
4.4 Using system.parts
DESCRIBE TABLE system.parts
Query id: 2dea5ab6-6857-4708-8919-a09f2382f059
┌─name──────────────────────────────────┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ partition │ String │ │ │ │ │ │
│ name │ String │ │ │ │ │ │
│ uuid │ UUID │ │ │ │ │ │
│ part_type │ String │ │ │ │ │ │
│ active │ UInt8 │ │ │ │ │ │
│ marks │ UInt64 │ │ │ │ │ │
│ rows │ UInt64 │ │ │ │ │ │
│ bytes_on_disk │ UInt64 │ │ │ │ │ │
│ data_compressed_bytes │ UInt64 │ │ │ │ │ │
│ data_uncompressed_bytes │ UInt64 │ │ │ │ │ │
│ marks_bytes │ UInt64 │ │ │ │ │ │
│ modification_time │ DateTime │ │ │ │ │ │
│ remove_time │ DateTime │ │ │ │ │ │
│ refcount │ UInt32 │ │ │ │ │ │
│ min_date │ Date │ │ │ │ │ │
│ max_date │ Date │ │ │ │ │ │
│ min_time │ DateTime │ │ │ │ │ │
│ max_time │ DateTime │ │ │ │ │ │
│ partition_id │ String │ │ │ │ │ │
│ min_block_number │ Int64 │ │ │ │ │ │
│ max_block_number │ Int64 │ │ │ │ │ │
│ level │ UInt32 │ │ │ │ │ │
│ data_version │ UInt64 │ │ │ │ │ │
│ primary_key_bytes_in_memory │ UInt64 │ │ │ │ │ │
│ primary_key_bytes_in_memory_allocated │ UInt64 │ │ │ │ │ │
│ is_frozen │ UInt8 │ │ │ │ │ │
│ database │ String │ │ │ │ │ │
│ table │ String │ │ │ │ │ │
│ engine │ String │ │ │ │ │ │
│ disk_name │ String │ │ │ │ │ │
│ path │ String │ │ │ │ │ │
│ hash_of_all_files │ String │ │ │ │ │ │
│ hash_of_uncompressed_files │ String │ │ │ │ │ │
│ uncompressed_hash_of_compressed_files │ String │ │ │ │ │ │
│ delete_ttl_info_min │ DateTime │ │ │ │ │ │
│ delete_ttl_info_max │ DateTime │ │ │ │ │ │
│ move_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ move_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ move_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ default_compression_codec │ String │ │ │ │ │ │
│ recompression_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ recompression_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ recompression_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ group_by_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ group_by_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ group_by_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ rows_where_ttl_info.expression │ Array(String) │ │ │ │ │ │
│ rows_where_ttl_info.min │ Array(DateTime) │ │ │ │ │ │
│ rows_where_ttl_info.max │ Array(DateTime) │ │ │ │ │ │
│ bytes │ UInt64 │ ALIAS │ bytes_on_disk │ │ │ │
│ marks_size │ UInt64 │ ALIAS │ marks_bytes │ │ │ │
└───────────────────────────────────────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
51 rows in set. Elapsed: 0.006 sec.
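Several of the columns above explain how a part name is built. A MergeTree part name has the form `<partition_id>_<min_block_number>_<max_block_number>_<level>`, so `197506_1_4_1` covers blocks 1 to 4 of partition `197506` and has been merged once. A small sketch that shows the pieces side by side, assuming the example table `db_1.test_26`:

```sql
-- Sketch: decompose part names using system.parts columns.
SELECT
    name,
    partition_id,
    min_block_number,
    max_block_number,
    level,   -- 0 for a freshly inserted part, higher after merges
    rows
FROM system.parts
WHERE database = 'db_1'
  AND table = 'test_26'
  AND active               -- skip parts already replaced by a merge
ORDER BY min_block_number;
```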
## 4 View the modification time of each part
SELECT
name,
modification_time
FROM system.parts
WHERE (database = 'db_1') AND (table = 'test_26')
Query id: 3e8b8a92-cfbe-4a87-bdc3-8a3b420a29a4
┌─name─────────┬───modification_time─┐
│ 197506_1_4_1 │ 2021-08-14 23:39:19 │
│ 197506_5_5_0 │ 2021-08-17 09:55:16 │
│ 197506_6_6_0 │ 2021-08-24 16:54:11 │ ### This part's data will be filtered out later
└──────────────┴─────────────────────┘
3 rows in set. Elapsed: 0.020 sec.
4.5 Filter
### 5 Filter the data we want
### e.g. data in parts modified before 2021-08-24 16:00:00
### Done by joining the original table with the system table system.parts
### The data in part 197506_6_6_0 is filtered out
SELECT
id,
value,
dt,
_part
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'
Query id: 8f9345dd-3529-4d80-beaf-bc0457d64dc9
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │ 2013 │ 1975-06-12 │ 197506_5_5_0 │
└────┴───────┴────────────┴──────────────┘
┌─id─┬─value─┬─────────dt─┬─_part────────┐
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
│ 11 │ 2013 │ 1975-06-12 │ 197506_1_4_1 │
└────┴───────┴────────────┴──────────────┘
4.6 Get our data
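The join in 4.5 matches on the part name alone. Since `system.parts` lists parts of every MergeTree table on the server, and part names are only unique within one table, a more defensive variant also restricts `database` and `table`. The sketch below uses an `IN` subquery instead of a join; this is a suggested alternative rather than the author's query, and it assumes the same example table and cutoff.

```sql
-- Sketch: same filter as 4.5, scoped to one table and to active parts.
SELECT
    id,
    value,
    dt
FROM db_1.test_26
WHERE _part IN
(
    SELECT name
    FROM system.parts
    WHERE database = 'db_1'
      AND table = 'test_26'
      AND active
      AND modification_time < '2021-08-24 16:00:00'
);
```

One caveat worth keeping in mind: a merged part such as `197506_1_4_1` is written at merge time, so its `modification_time` may be later than the time the underlying rows were first inserted.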
### 6 The final SQL to execute
SELECT
id,
value,
dt
FROM db_1.test_26 AS A
INNER JOIN system.parts AS B ON A._part = B.name
WHERE B.modification_time < '2021-08-24 16:00:00'
Query id: 29794880-0ccb-43c9-8618-65b8c438086a
┌─id─┬─value─┬─────────dt─┐
│ 11 │ 2013 │ 1975-06-12 │
│ 11 │ 2013 │ 1975-06-12 │
│ 11 │ 2013 │ 1975-06-12 │
│ 11 │ 2013 │ 1975-06-12 │
└────┴───────┴────────────┘
┌─id─┬─value─┬─────────dt─┐
│ 11 │ 2013 │ 1975-06-12 │
└────┴───────┴────────────┘
5 rows in set. Elapsed: 0.138 sec.
5 CDW-ClickHouse
Data in Tencent Cloud CDW-ClickHouse can be passed to Oceanus for ETL.
Oceanus connects to ClickHouse through ClickHouse-JDBC.
We can then control the time range from Oceanus.
This enables full and incremental imports into ClickHouse, as well as ClickHouse-to-ClickHouse migration.
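To make the time-range control concrete, an incremental step could be expressed as a bounded window over `modification_time`, as sketched below. This is plain ClickHouse SQL rather than an Oceanus job definition; the target table `db_1.test_26_copy` and both timestamps are hypothetical, and the target is assumed to have the same three columns.

```sql
-- Sketch: copy only rows from parts modified inside one time window.
-- db_1.test_26_copy is a hypothetical target table.
INSERT INTO db_1.test_26_copy
SELECT
    id,
    value,
    dt
FROM db_1.test_26
WHERE _part IN
(
    SELECT name
    FROM system.parts
    WHERE database = 'db_1'
      AND table = 'test_26'
      AND active
      AND modification_time >= '2021-08-17 00:00:00'  -- end of the previous run
      AND modification_time <  '2021-08-24 16:00:00'  -- cutoff of this run
);
```

Dropping the lower bound turns the same statement into a full import up to the cutoff. Because background merges rewrite parts and reset their modification time, rows already copied in an earlier window may reappear in a later one, so a real pipeline would likely need deduplication on the target side.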