How to implement spelling correction in a search engine
2022-06-30 00:03:00 【Zhaohan】
When a user types a misspelled word into the search box, we compare that word with every word in the vocabulary and compute the edit distance to each. The word with the smallest edit distance is taken as the correction and suggested to the user.
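The basic principle can be sketched as follows. This is a minimal illustration, assuming that "edit distance" means Levenshtein distance (insert, delete and substitute each cost 1); the names `edit_distance` and `correct` and the sample vocabulary are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary word with the smallest edit distance to word."""
    return min(vocabulary, key=lambda w: edit_distance(word, w))


# Hypothetical vocabulary, for illustration only.
suggestion = correct("serch", ["search", "engine", "spelling"])
```

Scanning the whole vocabulary costs one distance computation per word on every query, which is why performance becomes a concern once the vocabulary is large.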
That is the basic principle of spelling correction. In a real commercial search engine, however, the feature is not that simple. On the one hand, correcting purely by edit distance does not necessarily give good results. On the other hand, the vocabulary may be very large and the search engine has to serve a huge number of queries every day, so the performance requirements for correction are very high.
For the problem of poor correction quality, there are many optimization ideas. Here are a few.
Instead of taking only the single word with the smallest edit distance, we take the top 10 words with the smallest edit distances and then use other signals to decide which one to suggest. For example, search popularity can determine which candidate becomes the correction.
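A rough sketch of this top 10 re-ranking is given here, assuming Levenshtein distance and a hypothetical `popularity` table that maps each vocabulary word to its search count. The `max_distance` cut-off is an addition of this sketch, not part of the description above; it only keeps a tiny demo vocabulary from promoting an unrelated but popular word.

```python
import heapq
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(word: str, popularity: dict[str, int],
            k: int = 10, max_distance: int = 2) -> Optional[str]:
    """Take the k nearest words, then pick the most searched one."""
    scored = [(edit_distance(word, w), w) for w in popularity]
    nearest = heapq.nsmallest(k, scored)  # k smallest (distance, word) pairs
    # Assumption of this sketch: drop candidates that are too far away.
    close = [w for d, w in nearest if d <= max_distance]
    if not close:
        return None
    return max(close, key=lambda w: popularity[w])


# Hypothetical search counts, for illustration only.
popularity = {"there": 900, "three": 500, "threw": 40, "banana": 1000}
best = suggest("thre", popularity)
```

Here "there", "three" and "threw" are all one edit away from "thre", so the popularity count breaks the tie.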
We can also use several edit-distance measures, for example the two kinds covered today. We take the top 10 candidates under each measure, intersect the candidate sets, and continue optimizing on the intersection.
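The intersection idea might look like the sketch below. The text does not name the two measures at this point, so the example assumes they are Levenshtein distance (smaller is better) and longest common subsequence length (larger is better); the function names and the vocabulary are hypothetical.

```python
import heapq


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def candidates_by_intersection(word: str, vocabulary: list[str],
                               k: int = 10) -> list[str]:
    """Words that are in the top k under both measures, nearest first."""
    by_lev = heapq.nsmallest(k, vocabulary,
                             key=lambda w: edit_distance(word, w))
    by_lcs = heapq.nlargest(k, vocabulary,
                            key=lambda w: lcs_length(word, w))
    keep = set(by_lcs)
    return [w for w in by_lev if w in keep]


# Hypothetical vocabulary; k is kept small so the two lists differ.
vocabulary = ["search", "starch", "searching", "church", "engine"]
shared = candidates_by_intersection("serch", vocabulary, k=2)
```

With `k=2`, Levenshtein distance favours "search" and "starch" while LCS length favours "search" and "searching", so only "search" survives the intersection.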
We can also analyze users' search logs to build a list of the most commonly misspelled words together with their correct spellings. When the search engine corrects a query, it first looks the word up in this list. If the word is found, the corresponding correct word is returned directly. This kind of correction works very well.
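One way to sketch the log-based lookup is shown here. It assumes the search log can be reduced to `(typed, fixed)` pairs, for instance a query followed by the user's own retyped query; that log format, the `min_count` threshold and both function names are assumptions of this example.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable


def build_misspelling_table(pairs: Iterable[tuple[str, str]],
                            min_count: int = 2) -> dict[str, str]:
    """Map each frequent misspelling to its most common correction."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for typed, fixed in pairs:
        if typed != fixed:
            counts[typed][fixed] += 1
    table = {}
    for typed, fixes in counts.items():
        fixed, n = fixes.most_common(1)[0]
        if n >= min_count:  # ignore corrections seen too rarely
            table[typed] = fixed
    return table


def correct(word: str, table: dict[str, str],
            fallback: Callable[[str], str]) -> str:
    """Check the misspelling table first; otherwise use the slower path."""
    hit = table.get(word)
    return hit if hit is not None else fallback(word)


# Hypothetical log entries, for illustration only.
log = [("recieve", "receive")] * 3 + [("recieve", "relieve"), ("teh", "the")]
table = build_misspelling_table(log)
```

The `fallback` argument stands for whatever edit-distance search the engine already has, so the table acts as a fast path in front of it.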
A more advanced approach is to introduce personalization. For each user, we maintain that user's own search preferences, that is, the search keywords the user commonly uses. When the user types a misspelled word, we first compute edit distances against the user's frequently used keywords and pick the one with the smallest edit distance.
For correction performance, there are corresponding optimizations as well. Here are two ideas based on divide and conquer.
If the throughput (TPS) of the correction service is too low, we can deploy multiple machines, each running an independent copy of the correction service. When a correction request arrives, load balancing assigns it to one of the machines, which computes the edit distances and returns the corrected word.
If the response time of the correction system is too long, that is, each correction request takes too long to process, we can split the correction vocabulary across many machines. When a correction request arrives, we send the misspelled word to all of these machines at the same time and let them process it in parallel. Each machine returns the word with the smallest edit distance in its own part of the vocabulary, and the results are compared and merged to determine the single best correction.
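The split-and-merge scheme can be sketched on a single machine, with worker threads standing in for the separate machines. This only illustrates the data flow: the thread pool, the function names and the sample shards are assumptions of the example, and Python threads will not speed up this CPU-bound work the way real machines would.

```python
from concurrent.futures import ThreadPoolExecutor


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def best_in_shard(word: str, shard: list[str]) -> tuple[int, str]:
    """Work done by one 'machine': nearest word in its own shard."""
    return min((edit_distance(word, w), w) for w in shard)


def correct_sharded(word: str, shards: list[list[str]]) -> str:
    """Send the word to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(lambda s: best_in_shard(word, s), shards))
    return min(partial)[1]  # smallest distance wins across all shards


# Hypothetical vocabulary split into three non-empty shards.
vocabulary = ["search", "engine", "spelling", "starch", "correction"]
shards = [vocabulary[i::3] for i in range(3)]
winner = correct_sharded("serch", shards)
```

Each shard returns only its local best as a `(distance, word)` pair, so the merge step compares one candidate per machine rather than the whole vocabulary.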
Spelling correction in real search engines certainly involves more than what I have described above, but the underlying ideas are the same. Once you master the core principles, you have the method for solving the problem. The rest depends on applying it flexibly and practicing.