MySQL case: analysis of full-text indexing
2022-06-24 07:33:00 【[email protected]】
Preface
Full-text indexing builds an inverted index so that document content can be matched quickly. Like a B+ tree index, an inverted index is an index structure: it consists of every distinct term that appears in the documents, each mapped to the documents that contain it. Inverted indexes generally take one of two forms: the inverted file index and the full inverted index.
(1) Inverted file index: each entry maps {term, (IDs of the documents containing the term)}.
| Number | Term | Documents |
|---|---|---|
1 | how | (1,3) |
2 | are | (1,3) |
3 | you | (1,3) |
4 | fine | (2,4) |
5 | thanks | (2,4) |
(2) Full inverted index: each entry maps {term, (ID of a document containing the term: position of the term in that document)}.
| Number | Term | Documents (ID:position) |
|---|---|---|
1 | how | (1:1),(3:1) |
2 | are | (1:2),(3:2) |
3 | you | (1:3),(3:3) |
4 | fine | (2:1),(4:1) |
5 | thanks | (2:2),(4:2) |
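The following minimal Python sketch builds both structures from four sample documents, which makes the difference between them concrete. Several parts are assumptions. The documents ("how are you" for IDs 1 and 3, "fine thanks" for IDs 2 and 4) are reconstructed from the tables. Tokenizing is a plain whitespace split, not MySQL's parser. Positions are counted in words starting from 1, as in the tables.

```python
from collections import defaultdict

# Sample documents reconstructed from the two tables above (an assumption).
DOCS = {
    1: "how are you",
    2: "fine thanks",
    3: "how are you",
    4: "fine thanks",
}


def tokenize(text):
    """Whitespace split; a stand-in for a real full-text parser."""
    return text.lower().split()


def inverted_file_index(docs):
    """term -> sorted list of IDs of the documents containing the term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def full_inverted_index(docs):
    """term -> list of (document ID, 1-based word position) pairs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for pos, term in enumerate(tokenize(docs[doc_id]), start=1):
            index[term].append((doc_id, pos))
    return dict(index)


if __name__ == "__main__":
    files = inverted_file_index(DOCS)
    full = full_inverted_index(DOCS)
    for term in ("how", "are", "you", "fine", "thanks"):
        print(term, files[term], full[term])
```

The second structure carries strictly more information. The first can be derived from it by dropping the positions, but not the other way round.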
Implementation principles
Auxiliary tables
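In MySQL InnoDB, creating a full-text index also creates a set of auxiliary tables that store the inverted index data. (The query below uses INFORMATION_SCHEMA.INNODB_SYS_TABLES, which is the MySQL 5.7 name; in MySQL 8.0 the table is called INFORMATION_SCHEMA.INNODB_TABLES.)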
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+
(1) FTS_0000000000000147_00000000000001c9_INDEX_1 to FTS_0000000000000147_00000000000001c9_INDEX_6: these six auxiliary tables store the inverted index itself. For each term they hold the IDs of the documents that contain it and its positions in those documents. In other words, InnoDB uses a full inverted index.
(2) FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE: FTS_0000000000000147_DELETED stores the IDs of documents that have been deleted but whose data has not yet been removed from the full-text index. FTS_0000000000000147_DELETED_CACHE is its cache table.
(3) FTS_0000000000000147_BEING_DELETED / FTS_0000000000000147_BEING_DELETED_CACHE: FTS_0000000000000147_BEING_DELETED stores the IDs of documents that have been deleted and whose data is currently being removed from the full-text index. FTS_0000000000000147_BEING_DELETED_CACHE is its cache table.
(4) FTS_0000000000000147_CONFIG: stores internal state of the full-text index. The most important value it holds is FTS_SYNCED_DOC_ID, which marks the documents that have already been parsed and flushed to disk. During crash recovery, FTS_SYNCED_DOC_ID is used to determine which documents have not been flushed yet. Those documents are then re-parsed and added back to the full-text index cache.
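The hex segments in these names tie each auxiliary table to its owner. 0x147 equals 327, which is the table_id of test/opening_lines in the listing above. The MySQL manual describes the second segment (0x1c9 = 457 here) as the index_id of the full-text index. Below is a small Python helper that decodes such names. The helper and its regular expression are hypothetical and based only on the name layout visible in the listing.

```python
import re

# Hypothetical pattern inferred from the listing above:
# FTS_<16 hex digits>[_<16 hex digits>]_<SUFFIX>
FTS_NAME = re.compile(r"^FTS_([0-9a-fA-F]{16})(?:_([0-9a-fA-F]{16}))?_(\w+)$")


def decode_fts_name(name):
    """Return (table_id, index_id or None, suffix) for an FTS auxiliary table."""
    # Drop the "database/" prefix reported by INNODB_SYS_TABLES, e.g. "test/".
    short = name.split("/", 1)[-1]
    match = FTS_NAME.match(short)
    if match is None:
        raise ValueError(f"not an FTS auxiliary table name: {name!r}")
    table_hex, index_hex, suffix = match.groups()
    index_id = int(index_hex, 16) if index_hex else None
    return int(table_hex, 16), index_id, suffix


if __name__ == "__main__":
    print(decode_fts_name("test/FTS_0000000000000147_00000000000001c9_INDEX_1"))
    print(decode_fts_name("test/FTS_0000000000000147_CONFIG"))
```

Only the INDEX_1 to INDEX_6 tables carry an index_id. The DELETED, BEING_DELETED and CONFIG tables are shared per table, so they have just the table_id segment.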
Inserting data
If every inserted document had to be tokenized and written to the auxiliary tables immediately, inserts would be expensive. To avoid this, InnoDB maintains a full-text index cache that buffers recently inserted data. The cached data is written to the auxiliary tables in batches only when the cache fills up. Recently inserted data can be inspected through INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE. The parameters innodb_ft_cache_size and innodb_ft_total_cache_size control the cache size for a single table and for all tables, respectively. Note that the cache holds only recently inserted data, not the contents of the auxiliary tables. Before a query returns, the results from the auxiliary tables must therefore be merged with the recently inserted data in the cache.
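The insert path can be modeled roughly as follows. This is a toy sketch, not InnoDB code. The class FtsIndexModel is hypothetical, and so is its cache_limit, which counts cached (term, document) entries, whereas innodb_ft_cache_size is measured in bytes. The sketch shows the three behaviors described above: inserts go only to the cache, a full cache is flushed to the auxiliary tables in one batch, and searches merge both sources.

```python
from collections import defaultdict


class FtsIndexModel:
    """Toy model of a full-text cache sitting in front of the auxiliary tables."""

    def __init__(self, cache_limit):
        self.cache_limit = cache_limit  # hypothetical stand-in for innodb_ft_cache_size
        self.cache = defaultdict(set)   # recently inserted data only
        self.aux = defaultdict(set)     # stands in for the INDEX_1..6 tables
        self.cached_entries = 0

    def insert(self, doc_id, text):
        # Tokenize into the cache; the auxiliary tables are not touched yet.
        for term in set(text.lower().split()):
            self.cache[term].add(doc_id)
            self.cached_entries += 1
        if self.cached_entries >= self.cache_limit:
            self.flush()

    def flush(self):
        # Batch-write everything buffered so far, then empty the cache.
        for term, ids in self.cache.items():
            self.aux[term] |= ids
        self.cache.clear()
        self.cached_entries = 0

    def search(self, term):
        # Merge on-disk postings with the not-yet-flushed cache.
        term = term.lower()
        return sorted(self.aux.get(term, set()) | self.cache.get(term, set()))


if __name__ == "__main__":
    m = FtsIndexModel(cache_limit=4)
    m.insert(1, "how are you")  # 3 entries: stays in the cache
    m.insert(2, "fine thanks")  # 5 entries: cache is flushed
    m.insert(3, "how now")      # cached again
    print(m.search("how"))      # doc 1 from "disk", doc 3 from the cache
```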
Deleting data
Deleting a document would likewise require updating the auxiliary tables, which can also be expensive. To avoid this, InnoDB only records the IDs of deleted documents in the FTS_0000000000000147_DELETED / FTS_0000000000000147_DELETED_CACHE tables and leaves their entries in the auxiliary index tables. To clean up deleted data completely, the full-text index has to be rebuilt with OPTIMIZE TABLE. Setting innodb_optimize_fulltext_only=ON first makes OPTIMIZE TABLE process only the full-text index data rather than rebuilding the whole table.
mysql> set GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)
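The deferred-delete strategy can be sketched the same way. In this hypothetical model (FtsDeleteModel is an illustrative name, not an InnoDB structure), deleting a document only records its ID. Queries are assumed to filter the recorded IDs out of their results, and optimize() plays the role of OPTIMIZE TABLE by physically removing the stale entries.

```python
class FtsDeleteModel:
    """Toy model of deferred deletion with a DELETED list and a purge step."""

    def __init__(self):
        self.index = {}       # term -> set of doc IDs (the auxiliary tables)
        self.deleted = set()  # stands in for FTS_..._DELETED

    def insert(self, doc_id, text):
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        # Cheap: record the ID only; the index entries stay in place.
        self.deleted.add(doc_id)

    def search(self, term):
        # Assumed query-time filtering of documents on the deleted list.
        return sorted(self.index.get(term.lower(), set()) - self.deleted)

    def optimize(self):
        # Expensive but rare: strip deleted IDs from every posting list.
        for term in list(self.index):
            self.index[term] -= self.deleted
            if not self.index[term]:
                del self.index[term]
        self.deleted.clear()


if __name__ == "__main__":
    m = FtsDeleteModel()
    m.insert(1, "how are you")
    m.insert(2, "how now")
    m.delete(1)
    print(m.search("how"), m.index["how"])  # filtered result vs. stored postings
    m.optimize()
    print(m.index["how"], m.deleted)
```

The trade-off is the same one the text describes: each delete stays cheap, but stale postings accumulate until an explicit optimize pass cleans them up.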
Updating data
InnoDB handles an update as a delete followed by an insert; each step works as described in the two sections above.
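A hypothetical extension of the same idea shows an update as a delete plus an insert. For illustration, the model assumes that the new version of a row is indexed under a fresh, increasing document ID, while the old ID goes onto the deleted list. FtsUpdateModel and its fields are invented names.

```python
class FtsUpdateModel:
    """Toy model: update = record old doc ID as deleted + insert under a new ID."""

    def __init__(self):
        self.index = {}       # term -> set of doc IDs
        self.deleted = set()  # deleted but not yet purged doc IDs
        self.next_doc_id = 1  # assumption: doc IDs only ever increase
        self.row_to_doc = {}  # row primary key -> current doc ID

    def _insert(self, text):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)
        return doc_id

    def insert_row(self, row_id, text):
        self.row_to_doc[row_id] = self._insert(text)

    def update_row(self, row_id, new_text):
        # Step 1: "delete" the old version by recording its doc ID.
        self.deleted.add(self.row_to_doc[row_id])
        # Step 2: insert the new version like any other new document.
        self.row_to_doc[row_id] = self._insert(new_text)

    def search(self, term):
        live = self.index.get(term.lower(), set()) - self.deleted
        return sorted(row for row, doc in self.row_to_doc.items() if doc in live)


if __name__ == "__main__":
    m = FtsUpdateModel()
    m.insert_row(10, "how are you")
    m.update_row(10, "fine thanks")
    print(m.search("how"), m.search("fine"))
```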
Monitoring
As mentioned above, creating a full-text index also creates a set of auxiliary tables that store its data. However, these auxiliary tables cannot be queried directly. The state of a full-text index can only be monitored through the following INFORMATION_SCHEMA tables:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
Apart from INNODB_FT_DEFAULT_STOPWORD, these tables only return data after the innodb_ft_aux_table variable has been set to the indexed table, for example SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';
Basic syntax
The syntax for full-text indexes differs little from that of ordinary indexes:
(1) Creating a full-text index
alter table $table_name add fulltext index $index_name($column_name);
create fulltext index $index_name on $table_name($column_name);
(2) Dropping a full-text index
alter table $table_name drop index $index_name;
(3) Querying
select $columns from $table_name where match($column_name) against('$search_string');
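Storing positions is what lets a full inverted index answer phrase queries. An example against the sample table is MATCH(opening_line) AGAINST('"how are"' IN BOOLEAN MODE). An inverted file index can only say that a document contains both words, while positions show whether those words are adjacent. Here is a simplified Python sketch of that adjacency check. It illustrates the idea and is not MySQL's actual query code; positions are 1-based word offsets, as in the tables at the top.

```python
def build_positional_index(docs):
    """term -> {doc_id: [1-based word positions]} (a full inverted index)."""
    index = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split(), start=1):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """IDs of documents where the phrase's words occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents that contain every term can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            # Term k of the phrase must sit at position start + k.
            if all(start + k in index[t][doc_id] for k, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits


if __name__ == "__main__":
    idx = build_positional_index({1: "how are you", 2: "are you how", 3: "you are fine"})
    print(phrase_search(idx, "how are"), phrase_search(idx, "are you"))
```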
Summary
In certain scenarios full-text indexing is still very useful and can greatly speed up queries. However, MySQL's full-text indexing has significant limitations. For example, you cannot specify your own word delimiters: the built-in parser splits words on delimiters such as spaces and punctuation. The ngram parser can split text into fixed-length tokens, but it is still of limited practical use. For scenarios with demanding full-text search requirements, a dedicated product such as Elasticsearch (ES) is still the better choice.
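The sketch below illustrates what "fixed-length tokens" means. It splits text into contiguous n-character tokens with n = 2, which is the default value of ngram_token_size. It is a simplified model rather than MySQL's parser. It treats whitespace as a separator, and it assumes that chunks shorter than n produce no token.

```python
def ngram_tokens(text, n=2):
    """Contiguous n-character tokens for each whitespace-separated chunk.

    Simplified model of an ngram parser; n plays the role of ngram_token_size.
    """
    tokens = []
    for chunk in text.split():
        if len(chunk) < n:
            continue  # assumption: chunks shorter than n yield no token
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens


if __name__ == "__main__":
    print(ngram_tokens("信息系统"))  # overlapping two-character tokens
    print(ngram_tokens("ab cd"))     # whitespace separates chunks
```

Every token has the same length regardless of where the real word boundaries lie. This is the practical weakness the summary points to: many tokens are meaningless fragments, and the index grows accordingly.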
Copyright notice
This article was written by [[email protected]]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/06/20210630195005941p.html