Detailed explanation of why ClickHouse DELETE does not remove data immediately
2022-07-30 21:32:00 【Ajekseg】
Background:
A table in ClickHouse receives about 70 million rows a day. A DELETE command was used to remove one week of data. The SQL statement executed successfully, yet for a short time the data could still be queried from the original table. After a while, a query showed that the deletion had succeeded.
[Screenshot: the DELETE statement is submitted successfully.]
[Screenshot: a query run shortly afterwards (about 40 s) still returns the data.]
After consulting the references, it turns out that ClickHouse does provide DELETE and UPDATE capabilities. Operations of this kind are called mutation queries and can be regarded as a variant of the ALTER statement. Although a mutation ultimately modifies or deletes data, it should not be understood as an UPDATE or DELETE in the usual sense. The differences are:
- A mutation is a "heavy" operation, better suited to modifying or deleting data in batches;
- It does not support transactions. Once the statement is submitted, it starts to affect existing data right away and cannot be rolled back;
- A mutation runs as an asynchronous background process, and the statement returns immediately after it is submitted.
Because test data sets are usually very small, a DELETE feels no different from one in a typical OLTP database. But it is an asynchronous background action: a successfully submitted statement does not mean the work has been carried out. Its actual progress has to be checked in the system.mutations system table.
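The asynchronous behaviour described above suggests polling system.mutations before trusting a follow-up query. The following is only a minimal sketch: `wait_for_mutations` and `run_scalar` are hypothetical names, and `run_scalar` stands for whatever function your ClickHouse client offers to run a query and return a single value. The only ClickHouse details relied on are the `database`, `table` and `is_done` columns of system.mutations.

```python
import time
from typing import Callable

# Counts mutations that are still running for one table.
# db and table are interpolated directly, so pass trusted identifiers only.
PENDING_SQL = (
    "SELECT count() FROM system.mutations "
    "WHERE database = '{db}' AND table = '{table}' AND is_done = 0"
)


def wait_for_mutations(
    run_scalar: Callable[[str], int],  # hypothetical: runs SQL, returns one value
    db: str,
    table: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once no mutation on db.table is pending, False on timeout."""
    deadline = clock() + timeout_s
    while True:
        pending = int(run_scalar(PENDING_SQL.format(db=db, table=table)))
        if pending == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller would submit the ALTER TABLE ... DELETE statement first and then call `wait_for_mutations(run_scalar, "default", "test_table")` before querying the table again. ClickHouse also has a `mutations_sync` setting that makes the ALTER statement itself wait; polling is shown here because it matches the default behaviour observed in this article.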
DELETE statement syntax:
ALTER TABLE [db_name.]table_name DELETE WHERE filter_expr
The scope of the deletion is determined by the WHERE clause. Deletion is implemented as follows:
After a DELETE operation, the table's data directory [/chbase/data/default/test_table] has changed. Each original data directory now has a counterpart with the same name plus a _[number] suffix. In addition, the directory contains a file named mutation_[number].txt, whose content is as follows:
# cat mutation_6.txt
format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
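If you want to inspect such a file from a script, it can be read as plain `key: value` lines. This is a rough sketch that assumes each field sits on its own line, as in the listing above; `parse_mutation_file` is a hypothetical helper, not part of ClickHouse.

```python
def parse_mutation_file(text: str) -> dict:
    """Split the text of a mutation_N.txt file into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and a pasted shell prompt such as "# cat mutation_6.txt".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:  # keep only lines that really look like "key: value"
            fields[key] = value
    return fields


sample = """format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
"""
info = parse_mutation_file(sample)
print(info["create time"], "|", info["commands"])
```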
mutation_6.txt is a log file that records the full statement and the time of this DELETE operation. The _6 suffix in the file name matches the suffix of the newly added directories. So where does the suffix number come from? From the system.mutations system table:
SELECT database, table, mutation_id, block_numbers.number AS num, is_done FROM system.mutations
In summary, the logic of a mutation is fairly clear. Every time ClickHouse executes an ALTER DELETE statement, a corresponding execution plan is recorded in the system.mutations table. When is_done equals 1, execution is complete. At the same time, a log file named after the mutation_id is created in the table's root directory to record the relevant information.
The deletion itself works on each partition directory of the table: every directory is rewritten as a new directory. The new directory is named by appending the block_numbers.number value from system.mutations to the original name. During the rewrite, the rows to be deleted are left out. The old data directory is not deleted immediately. It is marked as inactive (active is 0), and these inactive directories are not physically removed until the next merge of the MergeTree engine is triggered.
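The naming rule can be illustrated with a few lines of Python. This is a sketch under assumptions: directory names are taken to follow the usual MergeTree pattern `<partition_id>_<min_block>_<max_block>_<level>`, the partition ID is assumed to contain no underscore, and the example names are made up. Replacing an existing suffix on a directory that was already mutated goes beyond what the article states and is my assumption.

```python
def mutated_part_name(part_name: str, mutation_number: int) -> str:
    """Return the directory name expected after a mutation rewrites part_name.

    Assumes names look like <partition_id>_<min>_<max>_<level>, optionally
    followed by _<mutation> from an earlier mutation.
    """
    pieces = part_name.split("_")
    if len(pieces) == 5:
        # Assumption: an earlier mutation suffix is replaced, not stacked.
        pieces = pieces[:4]
    elif len(pieces) != 4:
        raise ValueError(f"unexpected directory name: {part_name!r}")
    return "_".join(pieces + [str(mutation_number)])


# Hypothetical directory names; 6 plays the role of block_numbers.number.
print(mutated_part_name("202207_1_1_0", 6))
print(mutated_part_name("202207_1_1_0_6", 9))
```

To see which directories are currently active on a real server, the `name` and `active` columns of the system.parts table can be queried for the table in question.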