Analysis of query intention recognition
2020-11-06 01:32:00 【Elementary school students in IT field】
Outline
Recently I have been studying search technology; at work this mainly involves implementing information search. We use the Elasticsearch (ES) search engine; see "ES Basics" and "ES Advanced 1". Because the search feature needs to be iterated on, I have continued to study search principles and performance optimization. This article covers the following points:
What is search
Search metrics
Intention recognition
Query rewriting
What is search
The technology behind a search engine consists of three main parts:
(1) Understanding the query
(2) Understanding the content (documents)
(3) Matching and ranking the query against the content (documents)
General evaluation metrics for search
Basic metrics:
Recall (R) = number of relevant documents retrieved / number of relevant documents, R ∈ [0, 1]
Precision (P) = number of relevant documents retrieved / number of documents retrieved, P ∈ [0, 1]
F value: the harmonic mean of recall R and precision P, F = 2PR / (P + R)
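The three metrics above can be computed per query from two sets of document ids. The following is a minimal sketch; the function name `precision_recall_f`, the `beta` parameter, and the document ids are assumptions made for illustration, not part of the original article.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F) for one query.

    retrieved: iterable of document ids returned by the search engine
    relevant:  iterable of document ids judged relevant (ground truth)
    beta:      weight of recall relative to precision; 1.0 gives the harmonic mean
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical example: 4 documents returned, 5 relevant overall, 3 in common.
p, r, f = precision_recall_f(
    ["d1", "d2", "d3", "d9"],
    ["d1", "d2", "d3", "d4", "d5"],
)
print(p, r, round(f, 3))
```

With `beta=1.0` the last value is the plain harmonic mean of P and R; other values of `beta` shift the balance toward recall or precision.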
Stages of search development:
Early stage: keyword-based search
Growth stage: full-text search over titles and subtitles
Mature stage: ranking optimization for search
LTR (learning to rank)
Evolution stage: personalized search
Intention recognition / "a thousand faces for a thousand users" (per-user results) / search suggestions, etc.
Intention recognition
What is it?
Classifying a sentence, or what we usually call a query, into the corresponding intention category
It belongs to the "understanding the query" part
It is essentially a classification problem
General flow of intention recognition in search:
S1. The user's original query is "michal jrdan"
S2. The Query Correction module returns "Michael Jordan"
S3. The Query Suggestion module's drop-down suggestions are "Michael Jordan berkley" and "Michael Jordan NBA"; suppose the user chooses "Michael Jordan berkley"
S4. The Query Expansion module expands the query to "Michael Jordan berkley" and "Michael I. Jordan berkley"
S5. The Query Classification module classifies the query as: academic
S6. Finally, the Semantic Tagging module performs named entity recognition and attribute recognition, giving: [Michael Jordan: person name][berkley: location]: academic
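The S1 to S6 flow can be sketched as a chain of small functions. This is only an illustration of how data moves between the modules: every table below (`VOCABULARY`, `SUGGESTIONS`, `EXPANSIONS`, `CATEGORY_KEYWORDS`, `ENTITY_TYPES`) is hypothetical toy data, real systems use trained models and logs instead of lookups, and for simplicity the sketch works in lowercase.

```python
import difflib

# Toy resources; all contents are hypothetical and exist only for this example.
VOCABULARY = ["michael", "jordan", "berkley", "nba"]
SUGGESTIONS = {"michael jordan": ["michael jordan berkley", "michael jordan nba"]}
EXPANSIONS = {"michael jordan": ["michael i. jordan"]}
CATEGORY_KEYWORDS = {"academic": {"berkley"}, "sports": {"nba"}}
ENTITY_TYPES = {"michael jordan": "person name", "berkley": "location"}


def correct(query):
    """S2: replace each token with its closest vocabulary word, if any."""
    fixed = []
    for token in query.lower().split():
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.7)
        fixed.append(match[0] if match else token)
    return " ".join(fixed)


def suggest(query):
    """S3: drop-down suggestions for the corrected query."""
    return SUGGESTIONS.get(query, [])


def expand(query):
    """S4: add variants in which a known phrase is replaced by an alias."""
    variants = [query]
    for phrase, aliases in EXPANSIONS.items():
        if phrase in query:
            variants.extend(query.replace(phrase, alias) for alias in aliases)
    return variants


def classify(query):
    """S5: pick the first category whose keywords overlap the query tokens."""
    tokens = set(query.split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if tokens & keywords:
            return category
    return "unknown"


def tag(query):
    """S6: label the known entities found in the query."""
    return [(entity, etype) for entity, etype in ENTITY_TYPES.items() if entity in query]


corrected = correct("michal jrdan")   # S1 -> S2
chosen = suggest(corrected)[0]        # S3: assume the user picks the first suggestion
expanded = expand(chosen)             # S4
category = classify(chosen)           # S5
tags = tag(chosen)                    # S6
print(corrected, chosen, expanded, category, tags, sep="\n")
```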
Prerequisites for intention recognition
Dividing intentions: skill / domain
Classification of user query needs:
(1) Navigational
(2) Informational
(3) Transactional
A concept to introduce:
A complete interaction between a user and the search engine is called a search session. The information available in a session includes the user's query terms (Query) and the titles of the search results the user clicked (Title). If the user changes the query during the session (for example, from Query1 --> Query2), the subsequent searches and clicks are recorded as well, until the user leaves the search, at which point the session ends.
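One possible way to represent a search session in code is shown below. The article only says that a session ends when the user leaves the search; this sketch approximates "leaving" with an inactivity timeout of 30 minutes, which is an assumption, as are the class names `Event` and `Session` and the sample events.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    user: str
    timestamp: float                     # seconds since an arbitrary start
    query: str
    clicked_title: Optional[str] = None  # None when the user searched without clicking


@dataclass
class Session:
    user: str
    queries: List[str] = field(default_factory=list)
    clicked_titles: List[str] = field(default_factory=list)
    last_seen: float = 0.0


def build_sessions(events, timeout=1800.0):
    """Group events into per-user sessions, splitting after `timeout` seconds idle."""
    sessions = []
    open_sessions = {}  # user -> that user's current Session
    for ev in sorted(events, key=lambda e: e.timestamp):
        current = open_sessions.get(ev.user)
        if current is None or ev.timestamp - current.last_seen > timeout:
            current = Session(user=ev.user)
            sessions.append(current)
            open_sessions[ev.user] = current
        # Record a query change (Query1 --> Query2) once per new query.
        if not current.queries or current.queries[-1] != ev.query:
            current.queries.append(ev.query)
        if ev.clicked_title is not None:
            current.clicked_titles.append(ev.clicked_title)
        current.last_seen = ev.timestamp
    return sessions


# Hypothetical log: three events close together, then one much later.
events = [
    Event("u1", 0, "michael jordan"),
    Event("u1", 20, "michael jordan", "Michael Jordan - NBA profile"),
    Event("u1", 60, "michael jordan berkley", "Michael I. Jordan - home page"),
    Event("u1", 10_000, "mineral water"),
]
sessions = build_sessions(events)
for s in sessions:
    print(s.queries, s.clicked_titles)
```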
Methods of intention recognition
1. Word lists / rule-based parsing
2. Query click logs: a search log record generally includes the time, the query string, the clicked URL, and the position of the click in the results.
3. Machine learning methods (rule mining, or classifiers such as Bayes, LR, and SVM): treated as a classification problem
Query classification
e.g., identify the attribute of each entity word, then match exactly against the corresponding field in the index, which improves the precision of the recalled results
4. Neural networks (deep learning), e.g. FastText
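To make the "classification problem" framing of method 3 concrete, here is a small multinomial Naive Bayes classifier written from scratch. It is a sketch of the Bayes approach mentioned above, not the author's implementation; the class name, the whitespace tokenization, the add-one smoothing, and the nine training queries are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, queries, labels):
        self.class_counts = Counter(labels)       # number of queries per intention
        self.token_counts = defaultdict(Counter)  # intention -> token frequencies
        self.vocabulary = set()
        for query, label in zip(queries, labels):
            tokens = query.lower().split()
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, query):
        tokens = query.lower().split()
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)       # log prior
            n_tokens = sum(self.token_counts[label].values())
            for token in tokens:                  # smoothed log likelihoods
                seen = self.token_counts[label][token]
                score += math.log((seen + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled queries for the three need types.
train_queries = [
    "facebook login", "youtube homepage", "gmail sign in",
    "what is elasticsearch", "how does recall work", "michael jordan biography",
    "buy mineral water", "download python installer", "order pizza online",
]
train_labels = ["navigational"] * 3 + ["informational"] * 3 + ["transactional"] * 3

clf = NaiveBayesIntentClassifier().fit(train_queries, train_labels)
for q in ["gmail login", "what is recall", "buy pizza"]:
    print(q, "->", clf.predict(q))
```

In practice the training labels would come from word lists, rules, or click logs (methods 1 and 2), and a library classifier or FastText would replace this hand-written model.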
Difficulties of intention recognition
1. Input is not standardized: different users express the same need in different ways.
2. Multiple intentions: for the query "water", the user may mean mineral water or cosmetic toner ("make-up water").
3. Cold start: when there is little user behavior data, it is hard to infer intentions accurately.
4. No fixed evaluation standard: quantifiable metrics such as PV, IPV, CTR, and CVR evaluate the search system as a whole; there is no standard quantitative metric for user-intention prediction.
Query rewriting
Query rewriting, category relevance, named entity recognition, and ...
Query rewriting includes:
Query error correction: if the search engine returns no results, or too few, apply spelling correction
Query expansion:
e.g. "Michael Jordan berkley" and "Michael I. Jordan berkley"
(1) Synonym expansion table
(2) Expanding synonyms with word vectors
(3) If the query returns no results, expand the original query based on the user's historical data
Query deletion: decide which word or words to discard (entity recognition)
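The three rewriting steps can be combined into one fallback routine: delete low-value words first, correct spelling only when the search returns too few results, then try expanded variants. The sketch below uses an in-memory list in place of a real index; `DOCUMENTS`, `SYNONYMS`, `DROPPABLE`, the function names, and the `min_results` threshold are hypothetical, and expansion by word vectors or user history (items 2 and 3) is not covered.

```python
import difflib

# Hypothetical toy data standing in for a real index and synonym table.
DOCUMENTS = [
    "michael jordan nba career",
    "michael i. jordan berkley machine learning",
]
VOCABULARY = sorted({token for doc in DOCUMENTS for token in doc.split()})
SYNONYMS = {"michael jordan": ["michael i. jordan"]}
DROPPABLE = {"the", "of", "about"}


def search(query):
    """Toy search: return the documents that contain every query token."""
    tokens = query.split()
    return [doc for doc in DOCUMENTS if all(t in doc.split() for t in tokens)]


def correct(query):
    """Error correction: map each token to its closest vocabulary word."""
    out = []
    for token in query.split():
        match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.7)
        out.append(match[0] if match else token)
    return " ".join(out)


def expand(query):
    """Expansion: add variants taken from the synonym table."""
    variants = [query]
    for phrase, alternatives in SYNONYMS.items():
        if phrase in query:
            variants += [query.replace(phrase, alt) for alt in alternatives]
    return variants


def delete_terms(query):
    """Deletion: drop the words on the discard list."""
    return " ".join(t for t in query.split() if t not in DROPPABLE)


def search_with_rewrite(query, min_results=1):
    query = delete_terms(query.lower())
    results = search(query)
    if len(results) < min_results:   # empty or too few results
        query = correct(query)       # so apply spelling correction
        results = search(query)
    if len(results) < min_results:   # still too few: try expanded variants
        for variant in expand(query)[1:]:
            results += [d for d in search(variant) if d not in results]
    return query, results


rewritten, hits = search_with_rewrite("about michal jrdan")
print(rewritten, hits)
print(expand("michael jordan berkley"))
```

Running correction only as a fallback keeps correctly spelled but rare queries from being "fixed" into something the user did not ask for.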
Reference material
https://www.jianshu.com/p/e46eae028af3
https://blog.csdn.net/shijing_0214/article/details/71250327
https://blog.csdn.net/shijing_0214/article/details/71080642