ClickHouse data deletion: the DELETE problem explained in detail
2022-07-30 21:32:00 【Ajekseg】
Background:
A table in ClickHouse receives about 70 million rows a day (7000w). A DELETE command was used to remove one week of data. The SQL statement executed successfully, but for a short time afterwards the data was still present in the original table. A query run a little later showed that the deletion had succeeded.
SQL submitted successfully:
For a short time afterwards [40s], a query still returns the data.
The documentation shows that ClickHouse does provide DELETE and UPDATE. Operations of this type are called Mutation queries and can be regarded as a variant of the ALTER statement. Although a Mutation does eventually modify or delete the data, it should not be understood as UPDATE or DELETE in the usual sense. The differences are:
- A Mutation statement is a "heavy" operation, better suited to modifying or deleting data in bulk;
- It does not support transactions. Once the statement has been submitted, it acts on the existing data and cannot be rolled back;
- A Mutation statement runs as an asynchronous background process. The statement returns as soon as it has been submitted.
Because test data sets are usually very small, a DELETE appears to behave just as it does in a typical OLTP database. It is nevertheless an asynchronous background action. Successful submission of the statement does not mean that the work has been done. The actual progress has to be looked up in the system.mutations system table.
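The asynchronous behaviour described above can be handled by polling system.mutations until no unfinished mutation remains for the table. The following is a minimal sketch, not a definitive implementation. The `run_query` callable is a hypothetical stand-in for whatever client is in use (it is assumed to take a SQL string and return rows as tuples), and the function names and the timeout values are illustrative assumptions. Only the `system.mutations` table and its `database`, `table` and `is_done` columns come from ClickHouse itself.

```python
import time
from typing import Callable, List, Sequence

# Hypothetical client interface: takes a SQL string, returns rows as tuples.
RunQuery = Callable[[str], List[Sequence]]


def pending_mutations_sql(database: str, table: str) -> str:
    """Build a query that counts unfinished mutations for one table.

    The names are interpolated directly, so they are assumed to be trusted
    identifiers, not user input.
    """
    return (
        "SELECT count() FROM system.mutations "
        f"WHERE database = '{database}' AND table = '{table}' AND is_done = 0"
    )


def wait_for_mutations(
    run_query: RunQuery,
    database: str,
    table: str,
    timeout_s: float = 300.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every mutation on the table reports is_done = 1.

    Returns True when nothing is pending, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rows = run_query(pending_mutations_sql(database, table))
        if int(rows[0][0]) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A caller would submit the ALTER TABLE ... DELETE statement first and then call `wait_for_mutations(run_query, "default", "test_table")` before querying the table again. ClickHouse also has a `mutations_sync` setting that makes the ALTER statement itself wait for completion, which may be simpler where it fits.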
DELETE statement syntax:
ALTER TABLE [db_name.]table_name DELETE WHERE filter_expr
The scope of data deletion is determined by the WHERE query clause. Deletion is implemented as follows:
After a DELETE operation, the data directory [/chbase/data/default/test_table] changes. Each original data directory gains a sibling with the same name plus a _[number] suffix. In addition, a file named mutation_[number].txt appears in the directory, with the following content:
# cat mutation_6.txt
format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
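The file content shown above can be read programmatically. The sketch below assumes that each field sits on its own "key: value" line, as in the listing. The function name `parse_mutation_file` is hypothetical, and the parser is an illustration of the layout shown here, not a statement about every format version ClickHouse may write.

```python
from typing import Dict


def parse_mutation_file(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a mutation_N.txt file into a dict.

    Blank lines and lines starting with '#' (such as a pasted shell prompt)
    are skipped. Lines without a ': ' separator are ignored.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return fields


# Sample content copied from the listing above.
sample = """# cat mutation_6.txt
format version: 1
create time: 2022-02-16 13:33:27
commands: DELETE WHERE ID = '1'
"""

info = parse_mutation_file(sample)
print(info["create time"], "|", info["commands"])
```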
mutation_6.txt is a log file that records the full statement and the time of this DELETE operation. The _6 suffix of the file name matches the suffix of the newly added directories. So where does the number in the suffix come from? It can be found in the system.mutations system table:
SELECT database, table, mutation_id, block_numbers.number AS num, is_done FROM system.mutations
In summary, the logic of a mutation is fairly clear. Every time ClickHouse executes an ALTER DELETE statement, a corresponding entry is created in the system.mutations table. When is_done equals 1, execution is complete. At the same time, a log file named after the mutation_id is written to the root directory of the table to record the details. Deletion works directory by directory: every data directory of the table is rewritten as a new directory, whose name is the original name with the system.mutations block_numbers.number value appended. The rows to be deleted are left out during the rewrite. The old data directories are not deleted immediately. They are marked as inactive (active is 0), and are only physically removed when the next merge action of the MergeTree engine is triggered.
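The naming rule and the active flag described above can be illustrated with a short sketch. It follows the rule exactly as stated in this article (append the mutation's block number to the original directory name) and does not handle directories that already carry a suffix from an earlier mutation. The function names, the sample directory name `202202_1_1_0` and the sample rows are hypothetical. The `system.parts` table and its `name` and `active` columns are real and can be used to check which directories are still active.

```python
from typing import Iterable, List, Tuple

# Query whose (name, active) rows split_parts_by_state() expects.
# 'default' and 'test_table' are the names used in this article.
PARTS_SQL = (
    "SELECT name, active FROM system.parts "
    "WHERE database = 'default' AND table = 'test_table'"
)


def mutated_part_name(original: str, block_number: int) -> str:
    """Apply the rule from the text: original name + '_' + block number."""
    return f"{original}_{block_number}"


def split_parts_by_state(
    rows: Iterable[Tuple[str, int]]
) -> Tuple[List[str], List[str]]:
    """Separate part names into (active, inactive) lists by the active flag."""
    active: List[str] = []
    inactive: List[str] = []
    for name, is_active in rows:
        (active if int(is_active) == 1 else inactive).append(name)
    return active, inactive


# Hypothetical state right after mutation 6 has rewritten one directory.
old_name = "202202_1_1_0"
new_name = mutated_part_name(old_name, 6)
rows = [(old_name, 0), (new_name, 1)]
active, inactive = split_parts_by_state(rows)
print("active:", active, "inactive:", inactive)
```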