MySQL case: analysis of full-text indexing
2022-06-24 07:33:00
Preface
Full-text indexing is a way to match document content quickly by building an inverted index. Like a B+ tree index, an inverted index is an index structure; it consists of every distinct token in the documents together with a mapping from each token to the documents that contain it. Inverted indexes generally come in two forms: the inverted file index and the full inverted index.
(1) Inverted file index: the stored mapping is {token, (IDs of the documents containing the token)}.
| Number | Text | Documents |
|---|---|---|
| 1 | how | (1,3) |
| 2 | are | (1,3) |
| 3 | you | (1,3) |
| 4 | fine | (2,4) |
| 5 | thanks | (2,4) |
(2) Full inverted index: the stored mapping is {token, (ID of the document containing the token : position of the token within that document)}.
| Number | Text | Documents |
|---|---|---|
| 1 | how | (1:1),(3:1) |
| 2 | are | (1:2),(3:2) |
| 3 | you | (1:3),(3:3) |
| 4 | fine | (2:1),(4:1) |
| 5 | thanks | (2:2),(4:2) |
Implementation principle
Auxiliary tables
In MySQL InnoDB, when a full-text index is created, a set of auxiliary tables is created along with it to store the inverted index data.
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name | space |
+----------+----------------------------------------------------+-------+
| 333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 | 289 |
| 334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 | 290 |
| 335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 | 291 |
| 336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 | 292 |
| 337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 | 293 |
| 338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 | 294 |
| 330 | test/FTS_0000000000000147_BEING_DELETED | 286 |
| 331 | test/FTS_0000000000000147_BEING_DELETED_CACHE | 287 |
| 332 | test/FTS_0000000000000147_CONFIG | 288 |
| 328 | test/FTS_0000000000000147_DELETED | 284 |
| 329 | test/FTS_0000000000000147_DELETED_CACHE | 285 |
| 327 | test/opening_lines | 283 |
+----------+----------------------------------------------------+-------+
(1) FTS_0000000000000147_00000000000001c9_INDEX_1 to INDEX_6: these six auxiliary tables store the inverted index itself, that is, the tokens, document IDs, and positions. In other words, InnoDB uses a full inverted index.
(2) FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE: FTS_0000000000000147_DELETED stores the IDs of documents that have been deleted but whose data has not yet been removed from the full-text index; FTS_0000000000000147_DELETED_CACHE is its cache table.
(3) FTS_0000000000000147_BEING_DELETED / FTS_0000000000000147_BEING_DELETED_CACHE: FTS_0000000000000147_BEING_DELETED stores the IDs of documents that have been deleted and whose data is currently being removed from the full-text index; FTS_0000000000000147_BEING_DELETED_CACHE is its cache table.
(4) FTS_0000000000000147_CONFIG: stores internal state of the full-text index. Most importantly, it stores FTS_SYNCED_DOC_ID, which identifies the documents that have been parsed and flushed to disk. During crash recovery, FTS_SYNCED_DOC_ID is used to determine which documents were not flushed to disk and must be re-parsed and added back to the full-text index cache.
Data insertion
If every document insert triggered tokenization and updates to the auxiliary tables, the cost could be high. To avoid this, InnoDB introduces a full-text index cache that holds recently inserted data; the data is written to the auxiliary tables in a batch only once the cache is full. Recently inserted data can be queried through INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE. The innodb_ft_cache_size and innodb_ft_total_cache_size parameters control the full-text index cache size for a single table and for all tables, respectively. Note also that the full-text index cache holds only recently inserted data, not data from the auxiliary tables, so when a query returns results InnoDB has to merge the data in the auxiliary tables with the recently inserted data in the cache.
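The cache described above can be inspected directly. The following is a minimal sketch, assuming the `opening_lines` table from earlier lives in the `test` schema and that the account is allowed to set global variables and read the INFORMATION_SCHEMA InnoDB tables; the two sample rows are illustrative and not part of the original article.

```sql
-- Tell the INNODB_FT_* views which table to report on (schema/table form).
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Illustrative sample rows (an assumption made for this sketch).
INSERT INTO opening_lines (opening_line, author, title) VALUES
  ('Call me Ishmael.', 'Herman Melville', 'Moby-Dick'),
  ('It was a bright cold day in April.', 'George Orwell', '1984');

-- Tokens still held in the full-text index cache,
-- each with the document ID and position where it occurs.
SELECT word, doc_count, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY word, doc_id, position;

-- Current cache size limits (per table and total), in bytes.
SHOW VARIABLES LIKE 'innodb_ft%cache_size';
```

Stopwords and tokens shorter than `innodb_ft_min_token_size` are not indexed, so they should not be expected in the result.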
Data deletion
Deleting a document would also require updating the auxiliary tables, which can be costly. To avoid this, InnoDB only records the deleted document in the FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE tables and does not remove its entries from the auxiliary index tables. To purge deleted data completely, the full-text index has to be rebuilt with OPTIMIZE TABLE.
mysql> set GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)
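One way to observe the deferred cleanup is to compare the list of deleted document IDs before and after the optimize run. This is a sketch under the same assumptions as before (table `test/opening_lines`, sufficient privileges); the `id` value is illustrative.

```sql
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Delete one row; the id is an illustrative value.
DELETE FROM opening_lines WHERE id = 1;

-- Document IDs marked as deleted but not yet purged from the index.
SELECT doc_id FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;

-- Purge them from the full-text index only, without rebuilding the whole table.
SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE opening_lines;

-- Check again after the optimize run.
SELECT doc_id FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;

-- Restore the default so later OPTIMIZE TABLE runs behave as usual.
SET GLOBAL innodb_optimize_fulltext_only = OFF;
```

On a large index, a single run may not process every word (the `innodb_ft_num_word_optimize` variable limits the work per run), so OPTIMIZE TABLE may need to be repeated.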
Data update
For an update, InnoDB first deletes the old data and then inserts the new data; see the two sections above for how each operation works.
Monitoring
As mentioned earlier, creating a full-text index also creates a set of auxiliary tables that store the index data. These auxiliary tables cannot be queried directly, however; the state of a full-text index can only be monitored through the wrapper tables provided under information_schema, which are:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
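A short sketch of how these monitoring tables might be queried, assuming the `test/opening_lines` table from earlier and an account with the privileges needed to set `innodb_ft_aux_table` and read INFORMATION_SCHEMA:

```sql
-- Most INNODB_FT_* tables stay empty until a target table is selected.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Internal state of the index, including the last synced document ID.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;

-- Tokens already flushed from the cache to the on-disk auxiliary index tables.
SELECT word, doc_count, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE
LIMIT 20;

-- The built-in default stopword list (independent of innodb_ft_aux_table).
SELECT value FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
```

INNODB_FT_INDEX_TABLE only shows data that has been flushed; rows inserted recently may still be visible only in INNODB_FT_INDEX_CACHE.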
Basic syntax
The syntax for full-text indexes differs little from that of ordinary indexes:
(1) Create a full-text index
alter table $table_name add fulltext index $index_name($column_name);
create fulltext index $index_name on $table_name($column_name);
(2) Drop a full-text index
alter table $table_name drop index $index_name;
(3) Query
select xxx from $table_name where match($column_name) against(xxx);
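To make the query template concrete, here is a sketch against the `opening_lines` table defined earlier. The search terms are illustrative; whether any rows match depends on the data in the table.

```sql
-- Natural language mode (the default when no modifier is given).
SELECT id, author, title
FROM opening_lines
WHERE MATCH(opening_line) AGAINST('Ishmael' IN NATURAL LANGUAGE MODE);

-- Boolean mode: '+' requires a term, '-' excludes one.
SELECT id, author, title
FROM opening_lines
WHERE MATCH(opening_line) AGAINST('+bright -Ishmael' IN BOOLEAN MODE);

-- MATCH ... AGAINST can also be selected to expose the relevance score.
SELECT id, title,
       MATCH(opening_line) AGAINST('bright cold day') AS score
FROM opening_lines
ORDER BY score DESC;
```

The column list passed to MATCH() has to correspond to the column list of an existing FULLTEXT index, here `idx (opening_line)`.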
Summary
In certain scenarios full-text indexes are very useful and can speed up queries considerably. However, MySQL full-text indexes have significant limitations: for example, the token delimiter cannot be customized (tokens are split on spaces by default), and although the ngram parser can split text into fixed-length tokens, it is still of limited practical use. For scenarios with demanding full-text search requirements, a dedicated product such as ES (Elasticsearch) is still the recommended choice.
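For reference, a minimal sketch of the ngram parser mentioned above. The table name `articles_ngram`, the index name `ft_body`, and the sample text are hypothetical and chosen only for illustration; the parser assumes a MySQL version that ships it (5.7.6 or later).

```sql
-- Token length used by the ngram parser; it is read-only at runtime
-- and has to be set at server startup or in the configuration file.
SHOW VARIABLES LIKE 'ngram_token_size';

-- Hypothetical table, for illustration only.
CREATE TABLE articles_ngram (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  body TEXT,
  FULLTEXT INDEX ft_body (body) WITH PARSER ngram
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO articles_ngram (body) VALUES ('全文索引');

-- Search for a two-character term in text that has no space delimiters.
SELECT id, body
FROM articles_ngram
WHERE MATCH(body) AGAINST('索引' IN BOOLEAN MODE);
```

Because every token has the same fixed length, the choice of `ngram_token_size` affects which search terms can be matched, which is one reason the summary above calls the parser of limited practical use.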