Analysis of query intent recognition
2020-11-06 01:32:00 【Elementary school students in IT field】
Overview
I have recently been studying search technology; my work mainly involves implementing information search features. We use the Elasticsearch search engine (see the earlier posts "es Basics" and "es Advanced 1"). Because the search feature needs to be iterated on, I have continued to study search principles and performance optimization. This article covers the following points:
What search is
Search evaluation metrics
Intent recognition
Query rewriting
What search is
The technology behind a search engine consists of three main parts:
(1) Understanding the query
(2) Understanding the content (documents)
(3) Matching and ranking the query against the content (documents)
General evaluation metrics for search
Basic metrics:
Recall = number of relevant documents retrieved / total number of relevant documents, R∈[0,1]
Precision = number of relevant documents retrieved / total number of documents retrieved, P∈[0,1]
F-score (F1): the harmonic mean of recall R and precision P, F = 2PR / (P + R)
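The three metrics above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `precision_recall_f1` and the document IDs are made up for the example, and results are treated as unordered sets of IDs.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for one query.

    retrieved: IDs of the documents the engine returned (hypothetical input).
    relevant:  IDs of the documents that are actually relevant.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved

    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Usage: 4 documents returned, 3 relevant overall, 2 in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p, r, f)
```

In this example two of the four returned documents are relevant (P = 1/2) and two of the three relevant documents were found (R = 2/3), so F = 4/7.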
Stages of search development:
Early stage of an application: keyword-based search
Growth stage: full-text search based on titles and subtitles
Mature stage: ranking optimization for search
LTR (learning to rank)
Evolution stage: personalized search
Intent recognition / "a thousand faces for a thousand users" (per-user personalization) / search suggestions, etc.
Intent recognition
What is it?
It classifies a sentence, or what we usually call a query, into the corresponding intent category.
It belongs to the "understanding the query" part.
It is essentially a classification problem.
General flow of a search with intent recognition:
S1. The user's original query is "michal jrdan".
S2. The Query Correction module outputs "Michael Jordan".
S3. The Query Suggestion module shows the drop-down suggestions "Michael Jordan berkley" and "Michael Jordan NBA"; suppose the user chooses "Michael Jordan berkley".
S4. The Query Expansion module expands the query to "Michael Jordan berkley" and "Michael I. Jordan berkley".
S5. The Query Classification module classifies the query as: academic.
S6. Finally, the Semantic Tagging module performs named entity recognition and attribute recognition, giving: [Michael Jordan: person name][berkley: location]: academic
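Steps S2, S4 and S5 can be imitated with a toy pipeline. The sketch below is only an illustration under strong assumptions: the vocabulary, the expansion table and the category keywords are hypothetical hand-written stand-ins for resources that a real system would mine from logs and corpora, and the spelling correction simply uses fuzzy matching from Python's standard `difflib` module.

```python
import difflib

# Hypothetical toy resources, hand-written for this example only.
VOCAB = ["michael", "jordan", "berkley", "nba"]
EXPANSIONS = {"michael jordan": ["michael i. jordan"]}
CATEGORY_KEYWORDS = {"academic": {"berkley"}, "sports": {"nba"}}


def correct(query):
    """S2: replace each token with its closest vocabulary word, if any."""
    fixed = []
    for token in query.lower().split():
        match = difflib.get_close_matches(token, VOCAB, n=1, cutoff=0.6)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


def expand(query):
    """S4: return the query plus variants from the expansion table."""
    variants = [query]
    for phrase, alternatives in EXPANSIONS.items():
        if phrase in query:
            variants.extend(query.replace(phrase, alt) for alt in alternatives)
    return variants


def classify(query):
    """S5: pick the first category whose keywords appear in the query."""
    tokens = set(query.split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "unknown"


corrected = correct("michal jrdan")
chosen = corrected + " berkley"  # stands in for the suggestion picked in S3
print(corrected, expand(chosen), classify(chosen))
```

Each stage is a plain function from query to query (or to a label), which is what lets the modules be chained and replaced independently.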
Prerequisites for intent recognition
How intents are divided: skill / domain
Classification of user query needs:
(1) Navigational
(2) Informational
(3) Transactional
Concept introduced:
A complete interaction between a user and the search engine is called a Search Session. The information available in a session includes the user's query terms (Query) and the titles of the search results the user clicked (Title). If the user changes the query during the session (for example, from Query1 to Query2), the subsequent searches and clicks are also recorded, until the user leaves the search and the session ends.
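One possible way to model a Search Session in code is shown below. The class and method names (`QueryRecord`, `SearchSession`, `rewrites`) are assumptions made for this sketch and do not come from any particular search engine.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRecord:
    """One query issued in a session and the result titles clicked for it."""
    query: str
    clicked_titles: list = field(default_factory=list)


@dataclass
class SearchSession:
    """All queries and clicks between a user entering and leaving search."""
    records: list = field(default_factory=list)

    def search(self, query):
        self.records.append(QueryRecord(query))

    def click(self, title):
        if not self.records:
            raise ValueError("a click must follow at least one query")
        self.records[-1].clicked_titles.append(title)

    def rewrites(self):
        """Consecutive (Query1, Query2) pairs where the user changed the query."""
        queries = [r.query for r in self.records]
        return [(a, b) for a, b in zip(queries, queries[1:]) if a != b]


session = SearchSession()
session.search("michal jrdan")
session.search("Michael Jordan")
session.click("Michael I. Jordan's home page")
print(session.rewrites())
```

The `(Query1, Query2)` pairs collected this way are the kind of signal that click-log-based intent recognition and query rewriting can be built on.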
Methods of intent recognition
1. Dictionaries (word lists) / rule-based parsing
2. Query click logs: a search log record generally includes the time, the query string, the clicked URL, and the position of the click in the results.
3. Machine learning methods (rule mining, or classifiers such as Bayes, LR, SVM): treated as a classification problem
Query classification
e.g. identify the attribute of each entity word, then match exactly against the corresponding field in the index, which improves the precision of the recalled results
4. Neural networks (deep learning): FastText
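To make the machine learning approach concrete, here is a minimal multinomial Naive Bayes intent classifier with add-one smoothing, written with the standard library only. It is a sketch, not a production method: the class name and the six training queries are invented for the example, and the labels reuse the navigational / informational / transactional split described earlier.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # number of queries per intent
        self.word_counts = defaultdict(Counter)  # token counts per intent
        self.vocab = set()

    def fit(self, examples):
        for query, label in examples:
            tokens = query.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, query):
        tokens = query.lower().split()
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log P(label) + sum of log P(token | label)
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled queries; real training data would come from search logs.
TRAIN = [
    ("facebook login", "navigational"),
    ("youtube homepage", "navigational"),
    ("what is elasticsearch", "informational"),
    ("how does bm25 work", "informational"),
    ("buy mineral water", "transactional"),
    ("order running shoes online", "transactional"),
]

clf = NaiveBayesIntentClassifier().fit(TRAIN)
print(clf.predict("buy running shoes"))
print(clf.predict("what is bm25"))
```

With so little data the model only reacts to words it has seen, which is the cold-start problem in miniature; LR, SVM or FastText would replace the scoring step but keep the same query-in, label-out interface.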
Difficulties of intent recognition
1. Input is not standardized: different users express the same need in different ways.
2. Multiple intents: for the query "water", the user may mean mineral water or a cosmetic toner.
3. Data cold start: when there is little user behavior data, it is hard to infer intent accurately.
4. No fixed evaluation standard: quantifiable metrics such as PV, IPV, CTR, and CVR evaluate the search system as a whole; there is no standard quantitative metric for user intent prediction.
Query rewriting
Query rewriting, category relevance, named entity recognition, and ...
Query rewriting includes:
Query correction: if the search engine returns no results, or too few, spelling correction should be applied.
Query expansion:
e.g. "Michael Jordan berkley" and "Michael I. Jordan berkley"
(1) Synonym expansion tables
(2) Expanding synonyms with word vectors
(3) If the query returns no results, expand the original query based on the user's historical data
Query deletion: decide which word or words to drop (entity recognition)
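Synonym-table expansion and query deletion can be sketched as follows. Everything here is a simplifying assumption: the synonym table and the list of droppable terms are hypothetical and hand-written, `search` is a placeholder for whatever function queries the real index, and deletion is reduced to removing listed terms rather than using entity recognition.

```python
from itertools import product

# Hypothetical resources for illustration only.
SYNONYMS = {"laptop": ["notebook"], "cheap": ["affordable", "budget"]}
DROPPABLE_TERMS = {"the", "a", "for", "please"}


def expand_with_synonyms(query):
    """Return every combination of the original tokens and their synonyms."""
    options = [[t] + SYNONYMS.get(t, []) for t in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


def delete_terms(query):
    """Drop terms that carry little meaning for retrieval."""
    return " ".join(t for t in query.lower().split() if t not in DROPPABLE_TERMS)


def rewrite(query, search, min_hits=1):
    """Rewrite only when the original query returns too few results.

    `search` is assumed to be a callable that takes a query string and
    returns a list of hits.
    """
    if len(search(query)) >= min_hits:
        return [query]
    return expand_with_synonyms(delete_terms(query))


# Usage with a fake search function that never finds anything.
print(rewrite("a cheap laptop please", search=lambda q: []))
```

Rewriting only when the original query falls short keeps well-behaved queries untouched, at the cost of one extra search call for the sparse ones.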
References
https://www.jianshu.com/p/e46eae028af3
https://blog.csdn.net/shijing_0214/article/details/71250327
https://blog.csdn.net/shijing_0214/article/details/71080642