A detailed look at why ClickHouse DELETE does not remove data immediately
2022-07-30 21:32:00 【Ajekseg】
Background:
A table in ClickHouse receives about 70 million rows (7000w) a day. A delete command was used to remove one week of data. The SQL executed successfully, yet for a short time afterwards the data was still present in the original table. Only a later query showed that the data had been deleted.
The SQL was submitted successfully, but a query run shortly afterwards (about 40 s later) still returned the data.
Some research shows that ClickHouse does provide DELETE and UPDATE. Operations of this type are called mutations and can be regarded as a variant of the ALTER statement. Although a mutation does eventually modify or delete data, it should not be understood as an UPDATE or DELETE in the usual sense. The differences are:
- A mutation is a "heavy" operation, better suited to modifying or deleting data in batches;
- It does not support transactions. Once the statement is submitted for execution, it immediately affects the existing data and cannot be rolled back;
- A mutation runs as an asynchronous background process: the statement returns as soon as it has been submitted.
Because test data sets are usually very small, a DELETE on them feels no different from one in a typical OLTP database. It is still an asynchronous background operation. A successfully submitted statement does not mean the deletion has actually been carried out; its progress has to be checked in the system.mutations system table.
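As a minimal sketch of such a progress check, the query below lists the mutations that are still running for one table. It assumes the default.test_table table that appears in the directory path later in this article; substitute your own database and table names.

```sql
-- List mutations that have not finished yet for one table.
-- 'default' and 'test_table' are taken from the example path in this article.
SELECT
    mutation_id,
    command,
    create_time,
    parts_to_do,          -- number of data parts still waiting to be rewritten
    latest_fail_reason    -- non-empty if the most recent attempt on a part failed
FROM system.mutations
WHERE database = 'default'
  AND table = 'test_table'
  AND is_done = 0
ORDER BY create_time;
```

An empty result suggests that every mutation submitted for the table has completed.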
DELETE statement syntax:
ALTER TABLE [db_name.]table_name DELETE WHERE filter_expr
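If the caller needs the statement to block until the deletion has actually been applied, ClickHouse versions that support the mutations_sync setting allow this. The sketch below assumes such a version and reuses the default.test_table table and the ID = '1' filter from the example in this article.

```sql
-- Make the client wait until the mutation has finished.
-- mutations_sync: 0 = asynchronous (default),
--                 1 = wait on the current server,
--                 2 = wait on all replicas.
SET mutations_sync = 1;

ALTER TABLE default.test_table DELETE WHERE ID = '1';
```

With the setting at its default of 0, the same ALTER statement returns immediately, which is the behavior described in the background section.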
The scope of data deletion is determined by the WHERE query clause. Deletion is implemented as follows:
After a DELETE operation the table's data directory [/chbase/data/default/test_table] changes. Each original part directory gains a sibling directory with the same name plus a _[number] suffix. In addition, the directory contains a file named mutation_[number].txt with the following content:
# cat mutation_6.txt
format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
mutation_6.txt is a log file that records the statement and the time of this DELETE operation in full. The suffix _6 in the file name matches the suffix of the newly added directories. So where does the number come from? It comes from the system.mutations system table:
SELECT database, table, mutation_id, block_numbers.number AS num, is_done FROM system.mutations
In summary, the logic of a mutation is fairly clear. Every time ClickHouse executes an ALTER DELETE statement, a corresponding entry is created in the system.mutations table; when is_done equals 1, the mutation has finished. At the same time, a log file named after the mutation_id is written to the root directory of the table to record the details. Deletion works on each partition directory of the table: every directory is rewritten as a new directory whose name is the original name with the value of system.mutations.block_numbers.number appended. During the rewrite, the rows to be deleted are left out. The old directories are not removed immediately; they are marked as inactive (active is 0). These inactive directories are not physically deleted until the next merge of the MergeTree engine is triggered.
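The active and inactive directories described above can be inspected through the system.parts system table. This is a small sketch, again assuming the default.test_table table from the example:

```sql
-- Show every data part of the table with its state.
-- active = 1: the part is in use; active = 0: an old part awaiting removal.
SELECT
    partition,
    name,
    active,
    rows
FROM system.parts
WHERE database = 'default'
  AND table = 'test_table'
ORDER BY name;
```

After a mutation, the rewritten parts should appear with the extra numeric suffix and active = 1, while the originals remain listed with active = 0 until they are cleaned up.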