Full-text search in MySQL
2022-07-05 12:34:00 【just4you】
Overview
Version notes
- In MySQL 5.5, only MyISAM supports full-text search.
- In MySQL 5.6, InnoDB begins to support full-text search.
- In MySQL 5.7 (5.7.6 and later), the built-in ngram parser plugin can be used for full-text search. This plugin supports Chinese, Japanese, and Korean.
Restrictions on full-text search
- A full-text index can only be created on columns of type CHAR, VARCHAR, or TEXT.
- Only the InnoDB and MyISAM engines are supported.
- Each MATCH() must name exactly the columns of one full-text index. To search several columns in one query, create a single index that covers all of them. (A table may hold more than one full-text index, but each is searched separately.)
N-Gram parser
- MySQL uses the global variable "ngram_token_size" to set the size n of the n-grams. Valid range: 1 to 10; default: 2.
- ngram_token_size is usually set to the length of the shortest term you need to search for. To search for single characters, set ngram_token_size to 1. With the default value of 2, a single-character search returns no results.
- Chinese words are usually at least two characters long, so the default value of 2 is recommended.
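Parameter settings
Edit the MySQL configuration file (my.ini or my.cnf):
[myisamchk]
ft_min_word_len=2
[mysqld]
ft_min_word_len=2
ngram_token_size=2
Remember to restart the MySQL service after the change. ngram_token_size is read-only at runtime, so it can only be set in the configuration file or at startup. ft_min_word_len applies to MyISAM full-text indexes and is ignored by indexes that use the ngram parser.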
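After the restart, it can help to confirm that the server actually picked up the new values. A minimal check, assuming a client session connected to the restarted server:

```sql
# Show the values the running server is using.
SHOW VARIABLES LIKE 'ngram_token_size';
SHOW VARIABLES LIKE 'ft_min_word_len';
```

If the values shown are still the defaults, the server is probably reading a different configuration file than the one that was edited.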
Full-text index operations
Create
The three statements below are alternatives; use whichever fits, not all of them.
# Option 1: define the index when creating the table
CREATE TABLE t_member (
`id` INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
`cn_name` VARCHAR(100),
`remark` TEXT,
FULLTEXT `ft_idx_1`(`cn_name`, `remark`) WITH PARSER ngram
) ENGINE = INNODB;
# Option 2: add the index by altering an existing table
ALTER TABLE t_member ADD FULLTEXT INDEX `ft_idx_1`(`cn_name`,`remark`) WITH PARSER ngram;
# Option 3: create the index directly
CREATE FULLTEXT INDEX `ft_idx_1` ON t_member (`cn_name`,`remark`) WITH PARSER ngram;
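To see how the ngram parser actually splits text for an InnoDB table, the tokens can be read from INFORMATION_SCHEMA. The sketch below is illustrative only: the sample rows are made up, and the database name 'test' is an assumption that should be replaced with the real schema name.

```sql
# Hypothetical sample rows, for illustration only.
INSERT INTO t_member (`cn_name`, `remark`) VALUES
  ('Zhang San', 'general search'),
  ('Li Si', 'simple search');

# Point the INFORMATION_SCHEMA full-text tables at t_member.
# 'test' is an assumed database name; the format is 'db_name/table_name'.
SET GLOBAL innodb_ft_aux_table = 'test/t_member';

# Newly inserted rows are held in the full-text index cache,
# so their tokens are visible in INNODB_FT_INDEX_CACHE.
SELECT word, doc_id, position
FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
ORDER BY doc_id, position;
```

With ngram_token_size=2, each listed token should be two characters long, which makes it easy to see why a single-character search finds nothing.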
Delete
DROP INDEX `ft_idx_1` ON t_member;
Rebuild
Applies only to the MyISAM engine. Run this after changing the table's full-text search settings.
REPAIR TABLE t_member QUICK;
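For InnoDB tables the statement above does not apply. A minimal sketch of the usual alternative, which is to drop the full-text index and create it again, reusing the index definition from the Create section:

```sql
# InnoDB: rebuild the full-text index by dropping and re-creating it.
ALTER TABLE t_member DROP INDEX `ft_idx_1`;
ALTER TABLE t_member ADD FULLTEXT INDEX `ft_idx_1`(`cn_name`, `remark`) WITH PARSER ngram;
```

On a large table the re-creation step can take a while, so it is probably best run during a quiet period.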
Using full-text search
Basic syntax
SELECT <column list> FROM <table name> WHERE MATCH(<columns>) AGAINST ('<search keywords>' [search mode]);
Note: the columns listed in MATCH must be exactly the columns in the full-text index definition.
# The full-text index is defined on `cn_name` and `remark`, so both columns go in MATCH
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST('Zhang San');
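To illustrate the note about MATCH columns, here is a query that names only one of the two indexed columns. It assumes the InnoDB table and the single index `ft_idx_1` defined earlier:

```sql
# Expected to fail: no full-text index covers `cn_name` on its own.
# MySQL reports error 1191 (Can't find FULLTEXT index matching the column list).
SELECT `cn_name` FROM t_member WHERE MATCH(`cn_name`) AGAINST('Zhang San');
```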
Querying the relevance score
MATCH ... AGAINST can also be placed in the SELECT list to return each row's relevance score.
SELECT `cn_name`, `remark`, MATCH(`cn_name`, `remark`) AGAINST('Zhang San') FROM t_member;
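The score is often more useful when combined with a filter and an explicit sort. A small sketch, where the alias `score` is just an illustrative name:

```sql
# Return only matching rows, highest relevance first.
SELECT `cn_name`, `remark`,
       MATCH(`cn_name`, `remark`) AGAINST('Zhang San') AS score
FROM t_member
WHERE MATCH(`cn_name`, `remark`) AGAINST('Zhang San')
ORDER BY score DESC;
```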
Full-text search modes
1. Natural language mode (NATURAL LANGUAGE MODE)
The default MySQL full-text search mode. Operators cannot be used in this mode; it suits simple queries.
2. Boolean mode (BOOLEAN MODE)
This mode supports operators, which can specify that a keyword must appear, must not appear, or should carry more or less weight, and so on.
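The mode is chosen with a modifier inside AGAINST. A brief sketch of both forms against the t_member table used in this article:

```sql
# Natural language mode, written out explicitly (same as omitting the modifier).
SELECT `cn_name`, `remark` FROM t_member
WHERE MATCH(`cn_name`, `remark`) AGAINST('Zhang San' IN NATURAL LANGUAGE MODE);

# Boolean mode: the + operator requires the keyword to be present.
SELECT `cn_name`, `remark` FROM t_member
WHERE MATCH(`cn_name`, `remark`) AGAINST('+Zhang' IN BOOLEAN MODE);
```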
The 50% threshold
Discussions of full-text search often mention the "50% threshold". I found the following explanation:
Words that appear in more than half of the rows are dropped from matching. For example, if every row contains the word "this", searching for "this" returns no results. This is useful when there are many records, because the database sees no point in returning every row; "this" is effectively treated as a stopword. But if the table has only two rows, nothing can ever be found, because every word appears in 50% (or more) of the rows. To avoid this, use IN BOOLEAN MODE.
In my tests, however, queries with Chinese keywords returned results even when every record contained the keyword. A likely explanation is that the 50% threshold applies only to MyISAM full-text indexes, while the test table here uses InnoDB.
Boolean mode syntax
- +: The keyword must be present (rows without it are ignored).
- -: The keyword must not be present (rows containing it are excluded).
- >: Increases the keyword's contribution to the relevance of matching rows.
- <: Decreases the keyword's contribution to the relevance of matching rows.
- ~: Turns the keyword's relevance contribution from positive to negative. Rows containing the word rank lower, but unlike "-" they are not excluded.
- *: Wildcard. Appended to the end of a keyword, it matches words that begin with that keyword.
- "": A phrase in double quotes must match exactly as written; the words are not split.
Examples
# No operator: the keywords are ORed
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('general search' IN BOOLEAN MODE);
# Must contain both "general" and "search"
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('+general +search' IN BOOLEAN MODE);
# Must contain "search"; rows that also contain "general" rank higher.
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('+search general' IN BOOLEAN MODE);
# Must contain "general" and must not contain "search".
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('+general -search' IN BOOLEAN MODE);
# Must contain "search"; rows that also contain "general" rank lower than rows that do not.
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('+search ~general' IN BOOLEAN MODE);
# Must contain "general" together with either "simple" or "search"; rows with "general" and "simple" rank higher than rows with "general" and "search".
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('+general +(>simple <search)' IN BOOLEAN MODE);
# Asterisk (*): matches rows containing words that begin with "search".
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('search*' IN BOOLEAN MODE);
# Double quotes: enclose the phrase to search for. The effect is similar to LIKE '%some words%': "some words of wisdom" matches, while "some noise words" does not. For Chinese, though, the results are mediocre.
# Without double quotes: returns results
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('general simple search' IN BOOLEAN MODE);
# With double quotes: no results. Chinese is generally not queried this way.
SELECT `cn_name`, `remark` FROM t_member WHERE MATCH(`cn_name`, `remark`) AGAINST ('"general simple search"' IN BOOLEAN MODE);
Summary
- For small projects, MySQL full-text search should be enough; often even LIKE is enough.
- For more demanding full-text search needs, or to keep up with newer technology, use Elasticsearch (ES) :).