Detailed explanation of the ClickHouse DELETE problem: deleted data is not removed immediately
2022-07-30 21:32:00 【Ajekseg】
Background:
A table in ClickHouse receives about 70 million rows a day. We used the DELETE command to remove one week of data. The SQL was submitted successfully, but for a short time the data was still present in the original table. A query run a while later showed that the deletion had succeeded.
SQL submitted successfully.
For a short time afterwards [40s], queries still returned the data.
After consulting the documentation, we learned that ClickHouse does provide DELETE and UPDATE. These operations are called mutations and can be regarded as a variant of the ALTER statement. Although a mutation does eventually modify or delete the data, it should not be understood as UPDATE or DELETE in the usual sense. The differences are:
- A mutation is a "heavy" operation, better suited to modifying or deleting data in batches;
- Mutations do not support transactions. Once the statement is submitted for execution, it affects the existing data and cannot be rolled back;
- A mutation runs as an asynchronous background process; the statement returns immediately after it is submitted.
Because test data sets are usually very small, a DELETE feels no different from one in a typical OLTP database. But it is an asynchronous background operation: successful submission of the statement does not mean that the deletion has actually been carried out. Its progress has to be queried from the system.mutations system table.
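Since submission and completion are separate events, a client that needs the data to be gone has to wait for the mutation explicitly. Below is a minimal sketch of such a wait loop, not a definitive implementation. The names `wait_for_mutations` and `run_query` are hypothetical: `run_query` stands for whatever function your ClickHouse client offers to run a query and return a single integer, and the `%(name)s` placeholder style is an assumption that depends on the client library. Only the `system.mutations` table and its `database`, `table` and `is_done` columns come from ClickHouse itself.

```python
import time
from typing import Any, Callable, Dict

# Counts mutations on one table that have not finished yet.
# The %(name)s placeholders are an assumption; adapt them to your client.
PENDING_MUTATIONS_SQL = (
    "SELECT count() FROM system.mutations "
    "WHERE database = %(db)s AND table = %(table)s AND is_done = 0"
)


def wait_for_mutations(
    run_query: Callable[[str, Dict[str, Any]], int],  # hypothetical client hook
    database: str,
    table: str,
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll system.mutations until no unfinished mutation is left.

    Returns True when all mutations are done, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pending = run_query(PENDING_MUTATIONS_SQL, {"db": database, "table": table})
        if pending == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A caller would submit the ALTER TABLE ... DELETE statement first and then call `wait_for_mutations(run_query, "default", "test_table")` before querying the table again. ClickHouse also has a `mutations_sync` setting that makes the ALTER statement itself wait for the mutation to finish, which may be simpler when the client can set it.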
DELETE statement syntax:
ALTER TABLE [db_name.]table_name DELETE WHERE filter_expr
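To make the syntax concrete, here is a minimal sketch that builds a DELETE mutation covering one week of data, as in the background scenario. The function name `build_week_delete` and the date column `event_date` are assumptions for illustration, since the article does not show the real schema. Table and column names are interpolated directly into the statement, so they must come from trusted code and never from user input.

```python
from datetime import date, timedelta


def build_week_delete(table: str, date_column: str, start: date) -> str:
    """Build an ALTER TABLE ... DELETE statement for the 7 days starting at `start`.

    The range is half-open: start <= date_column < start + 7 days.
    """
    end = start + timedelta(days=7)
    return (
        f"ALTER TABLE {table} DELETE "
        f"WHERE {date_column} >= '{start.isoformat()}' "
        f"AND {date_column} < '{end.isoformat()}'"
    )


if __name__ == "__main__":
    # event_date is a hypothetical column name.
    print(build_week_delete("default.test_table", "event_date", date(2022, 7, 1)))
```

The half-open range avoids deleting the first day of the following week by accident.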
The scope of data deletion is determined by the WHERE query clause. Deletion is implemented as follows:
After a DELETE operation, the table's data directory [/chbase/data/default/test_table] changes. Each original data directory gains a sibling directory with the same name plus a _[number] suffix. In addition, the table directory contains a file named mutation_[number].txt with the following content:
# cat mutation_6.txt
format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
mutation_6.txt is a log file that records the statement and the time of this DELETE operation in full. The _6 suffix of the file name matches the suffix of the newly added directories. So where does the number come from? From the system.mutations system table:
SELECT database, table, mutation_id, block_numbers.number AS num, is_done FROM system.mutations
In summary, the logic of a mutation is fairly clear. Every time ClickHouse executes an ALTER DELETE statement, a corresponding entry is recorded in the system.mutations table; when is_done equals 1, the mutation has finished. At the same time, a log file named after the mutation_id is written to the root directory of the table to record the details. Deletion works on each partition directory of the table: every directory is rewritten as a new directory, named by appending _ and the value of system.mutations.block_numbers.number to the original name. During the rewrite, the rows to be deleted are left out. The old directories are not deleted immediately but are marked inactive (active is 0). These inactive directories are physically removed only when the next merge of the MergeTree engine is triggered.
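The naming rule for the rewritten directories can be sketched as a small function. This is an illustration under stated assumptions, not ClickHouse's own code: the function name `mutated_part_name` is hypothetical, and the assumed directory layout `partition_minblock_maxblock_level`, with an optional trailing mutation number that a later mutation replaces, goes beyond what the article states.

```python
def mutated_part_name(part_name: str, mutation_number: int) -> str:
    """Return the directory name a part gets after a mutation.

    Assumed layout (not taken from the article):
        <partition>_<min_block>_<max_block>_<level>[_<mutation_number>]
    """
    fields = part_name.split("_")
    if len(fields) == 5:
        # Already rewritten by an earlier mutation: drop the old suffix.
        fields = fields[:4]
    elif len(fields) != 4:
        raise ValueError(f"unexpected part name: {part_name!r}")
    return "_".join(fields + [str(mutation_number)])


if __name__ == "__main__":
    # Mutation number 6 matches mutation_6.txt in the example above.
    print(mutated_part_name("20220216_1_1_0", 6))
```

The old and new directories then exist side by side until the merge cleans up. The system.parts table exposes an active column that shows which of the two is currently in use.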