An analysis of query intent recognition
2020-11-06 01:32:00 【Elementary school students in IT field】
Overview
I have recently been studying search technology; my work mainly involves implementing information search features. We use the Elasticsearch search engine (see ES Basics and ES Advanced 1). Because the search feature has to be iterated on, I have kept studying search principles and performance optimization. This article covers the following points:
What is search
Search evaluation metrics
Intent recognition
Query rewriting
What is search
Technically, a search engine consists of three main parts:
(1) Understanding the query
(2) Understanding the content (documents)
(3) Matching and ranking the query against the content (documents)
General evaluation metrics for search
Basic metrics:
Recall = number of relevant documents retrieved / total number of relevant documents, R∈[0,1]
Precision = number of relevant documents retrieved / total number of documents retrieved, P∈[0,1]
F value: the harmonic mean of recall R and precision P
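The three metrics above can be sketched for a single query as follows. This is a minimal illustration, assuming documents are identified by IDs and that the "F value" means the balanced F1 score; the function name and the sample IDs are made up for the example.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for one query from collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall (0 when there are no hits)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: 4 documents returned, 3 documents are truly relevant
p, r, f1 = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(p, r, f1)
```

Here two of the four returned documents are relevant, so precision is 2/4, recall is 2/3, and F1 is 4/7.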
Stages of search development:
Early stage of the application: keyword-based search
Growth stage: full-text search over titles and subtitles
Mature stage: ranking optimization for search
LTR (learning to rank)
Evolution stage: personalized search
Intent recognition / "a thousand faces for a thousand users" (per-user results) / search suggestions, etc.
Intent recognition
What is it?
Classifying sentences, or what we usually call queries, into the corresponding intent categories
It belongs to the "understanding the query" part
It is essentially a classification problem
General flow of a search with intent recognition:
S1. The user's original query is "michal jrdan"
S2. The Query Correction module returns: "Michael Jordan"
S3. The Query Suggestion module shows the drop-down suggestions "Michael Jordan berkley" and "Michael Jordan NBA"; suppose the user chooses "Michael Jordan berkley"
S4. The Query Expansion module expands the query to: "Michael Jordan berkley" and "Michael I. Jordan berkley"
S5. The Query Classification module classifies the query as: academic
S6. Finally, the Semantic Tagging module performs named entity recognition and attribute recognition, giving: [Michael Jordan: person name][berkley: location]: academic
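Steps S1 to S6 can be mimicked with a toy pipeline. This is only a sketch under strong assumptions: every table below (vocabulary, popular queries, aliases, category keywords, entities) is hypothetical sample data, queries are lowercased for simplicity, and spelling correction uses `difflib.get_close_matches` from the Python standard library rather than whatever a production engine would use.

```python
from difflib import get_close_matches

# All tables below are hypothetical toy data, not taken from a real system.
VOCABULARY = ["michael", "jordan", "berkley", "nba"]
POPULAR_QUERIES = ["michael jordan berkley", "michael jordan nba", "michael jackson"]
ALIASES = {"michael jordan": "michael i. jordan"}
CATEGORY_KEYWORDS = {"berkley": "academic", "nba": "sports"}
ENTITIES = {"michael jordan": "person", "berkley": "location"}


def correct(query):
    """S2: replace each token with its closest vocabulary word, if there is one."""
    fixed = []
    for token in query.lower().split():
        match = get_close_matches(token, VOCABULARY, n=1, cutoff=0.6)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


def suggest(query):
    """S3: popular queries that extend the corrected query."""
    return [q for q in POPULAR_QUERIES if q.startswith(query + " ")]


def expand(query):
    """S4: add variants in which a known name is replaced by its alias."""
    variants = [query]
    for name, alias in ALIASES.items():
        if name in query:
            variants.append(query.replace(name, alias))
    return variants


def classify(query):
    """S5: category of the first keyword found in the query, else 'unknown'."""
    for token in query.split():
        if token in CATEGORY_KEYWORDS:
            return CATEGORY_KEYWORDS[token]
    return "unknown"


def tag(query):
    """S6: dictionary-based entity tagging using substring matches."""
    return [(entity, label) for entity, label in ENTITIES.items() if entity in query]


corrected = correct("michal jrdan")      # S1 -> S2
suggestions = suggest(corrected)         # S3
chosen = suggestions[0]                  # assume the user picks the first suggestion
expanded = expand(chosen)                # S4
category = classify(chosen)              # S5
tags = tag(chosen)                       # S6
print(corrected, suggestions, expanded, category, tags, sep="\n")
```

Each stage is a plain function, so any one of them could later be swapped for a learned model without touching the others.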
Prerequisites for intent recognition
How intents are divided: by skill / domain
Classification of user query needs:
(1) Navigational
(2) Informational
(3) Transactional
A concept to introduce:
A complete interaction between a user and the search engine is called a Search Session. The information available in a session includes the user's query terms (Query) and the titles of the search results the user clicked (Title). If the user changes the query during the session (for example, from Query1 to Query2), the subsequent searches and clicks are recorded as well, until the user leaves the search and the session ends.
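One way to represent such a session in code is sketched below. The class and field names (`SearchSession`, `SearchEvent`, `user_id`, and so on) are assumptions made for this illustration, not a format taken from any particular logging system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    query: str                                                # query the user issued
    clicked_titles: List[str] = field(default_factory=list)  # titles of clicked results


@dataclass
class SearchSession:
    user_id: str
    events: List[SearchEvent] = field(default_factory=list)

    def search(self, query):
        """Record a new query (Query1 -> Query2 -> ...) within the same session."""
        self.events.append(SearchEvent(query))

    def click(self, title):
        """Attach a clicked result title to the most recent query."""
        self.events[-1].clicked_titles.append(title)

    def reformulations(self):
        """Consecutive (previous query, next query) pairs in this session."""
        queries = [event.query for event in self.events]
        return list(zip(queries, queries[1:]))


# Hypothetical session: the user fixes the query, then clicks one result
session = SearchSession(user_id="u1")
session.search("michal jrdan")
session.search("michael jordan berkley")
session.click("Example result title")
print(session.reformulations())
```

Query pairs such as those returned by `reformulations()` are the kind of raw material that query correction and query expansion can be mined from.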
Methods of intent recognition
1. Word lists / rule-based parsing
2. Query click logs: a search log record generally includes the time, the query string, the clicked URL, and the position of that URL in the results.
3. Machine learning methods (rule mining, or classifiers such as Bayes, LR, and SVM): treated as a classification problem
Query classification
eg: identify the attribute of each entity word, then match it exactly against the corresponding field in the index, which improves the precision of the recalled results
4. Neural network based methods (deep learning): FastText
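The first method, word lists with rules, is the simplest to illustrate. The sketch below assigns a query to the navigational or transactional class when it contains cue words from a list and falls back to informational otherwise. The cue words are hypothetical; a real list would be curated from query logs.

```python
# Hypothetical cue words per intent class; a real list would be curated from data.
INTENT_WORDS = {
    "navigational": {"login", "homepage", "official", "website"},
    "transactional": {"buy", "download", "price", "order"},
}


def classify_intent(query, default="informational"):
    """Return the intent whose word list overlaps the query most, else the default."""
    tokens = set(query.lower().split())
    scores = {intent: len(tokens & words) for intent, words in INTENT_WORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first intent listed
    return best if scores[best] > 0 else default


print(classify_intent("buy mineral water"))        # matches a transactional cue word
print(classify_intent("github official website"))  # matches navigational cue words
print(classify_intent("what is recall"))           # no cue word, falls back to default
```

Rules like these are easy to audit and need no training data, which helps during cold start, but they miss any phrasing that is not in the lists. That gap is what the log-based and learned methods try to close.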
Difficulties of intent recognition
1. Input is not standardized: different users express the same need in different ways.
2. Multiple intents: for the query "water", the user may want mineral water, or may want a cosmetic toner.
3. Data cold start: when there is little user behavior data, it is hard to infer intent accurately.
4. No fixed evaluation standard: quantifiable metrics such as PV, IPV, CTR, and CVR evaluate the search system as a whole; there is no standard quantitative metric for user intent prediction itself.
Query rewriting
Query rewriting, category relevance, named entity recognition, etc.
Query rewriting includes:
Query correction: if the search engine returns no results, or too few, spelling correction should be applied
Query expansion:
eg. "Michael Jordan berkley" and "Michael I. Jordan berkley"
(1) Synonym expansion table
(2) Expanding synonyms with word vectors
(3) If the query returns nothing, expand the original query based on the user's historical data
Query deletion: decide which word or words to discard (entity recognition)
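A rough sketch of expansion and deletion, used only as a fallback when the original query returns too few results, might look like this. The synonym table, the droppable word list, and the `search` callable are all hypothetical stand-ins; in particular `fake_search` just looks queries up in a small dictionary and does not represent any real search engine API.

```python
from itertools import product

# Hypothetical synonym table; it could be curated by hand or mined from word vectors.
SYNONYMS = {"laptop": ["notebook"], "cheap": ["affordable", "budget"]}
# Hypothetical words considered safe to discard.
DROPPABLE = {"a", "the", "for", "please"}


def expand_query(query):
    """Query expansion: the original query plus every synonym-substituted variant."""
    options = [[tok] + SYNONYMS.get(tok, []) for tok in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


def prune_query(query):
    """Query deletion: drop unimportant words, keeping the rest in order."""
    kept = [tok for tok in query.lower().split() if tok not in DROPPABLE]
    return " ".join(kept) if kept else query


def search_with_rewrite(query, search, min_hits=1):
    """Try the original query first; fall back to rewritten variants if hits are too few."""
    candidates = [query] + expand_query(prune_query(query))
    for candidate in candidates:
        hits = search(candidate)
        if len(hits) >= min_hits:
            return candidate, hits
    return query, []


# Stand-in for a real search backend: a dictionary from query string to document IDs.
FAKE_INDEX = {"affordable notebook": ["doc7"]}


def fake_search(query):
    return FAKE_INDEX.get(query, [])


print(search_with_rewrite("a cheap laptop please", fake_search))
```

Trying the user's own wording first keeps the rewrite from overriding a query that already works; the variants are only consulted when the result list would otherwise be empty.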
Reference material
https://www.jianshu.com/p/e46eae028af3
https://blog.csdn.net/shijing_0214/article/details/71250327
https://blog.csdn.net/shijing_0214/article/details/71080642
Copyright notice
This article was written by [Elementary school students in IT field]. Please include a link to the original when reposting. Thank you.