Query term weighting: computing search term weights
2022-07-02 02:15:00 【AI Zeng Xiaojian】
Query term weighting computes the importance of each term after a query has been segmented into words. A commonly used metric is tf*idf (within a query, a term's tf is almost always 1): the more often a term occurs in the corpus, the less information it carries, and the less often it occurs, the more information it carries. However, a term's importance is not strictly monotonic in its number of occurrences, and idf ignores context. For example, "windows" is fairly important in "windows application software" but much less important in "windows xp system iphone xs import photos". Term weights are a basic resource that plays an important role in text relevance, term dropping, and other tasks. Methods for improving them fall into three main categories:
1) Based on corpus statistics
2) Based on click logs
3) Based on supervised learning
This article first introduces several methods based on corpus statistics.
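As a baseline for the methods that follow, here is a minimal sketch of tf*idf weighting over a corpus of segmented queries. The function names `idf_weights` and `query_term_weights`, the add-one smoothing, and the normalization to a sum of 1 are assumptions made for illustration, not details taken from the article.

```python
import math
from collections import Counter


def idf_weights(queries):
    """Smoothed idf per term, computed over a list of tokenized queries."""
    n = len(queries)
    df = Counter()
    for terms in queries:
        df.update(set(terms))  # count each term at most once per query
    # Add-one smoothing (an assumption) keeps every weight positive.
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def query_term_weights(terms, idf):
    """tf*idf weights for one query, normalized to sum to 1."""
    tf = Counter(terms)  # usually 1 for every term in a short query
    raw = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()} if total else raw


corpus = [
    ["windows", "application", "software"],
    ["windows", "xp", "system"],
    ["iphone", "xs"],
]
idf = idf_weights(corpus)
weights = query_term_weights(["windows", "xp", "system"], idf)
```

Because these weights depend only on corpus counts, "windows" receives the same idf in every query, which is the lack of context described above.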
I. imp (short for importance)
One drawback of idf is that it relies solely on term frequency. imp instead starts from the share of importance that each term holds within a query, and refines the static term weights by iterative computation. The procedure is as follows:

Here BT is the imp value of a term, which can be initialized to 1; Tmp_i is the share of importance held by the i-th term within a query; and N is the number of queries that contain the i-th term.
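One way to read these definitions is sketched below. It assumes that, within each query, Tmp_i is the term's current BT divided by the sum of BT over the query's terms, and that the new BT of a term is the average of its Tmp_i over the N queries containing it. This update rule is an interpretation of the variable descriptions, not a formula confirmed by the article, and the name `compute_imp` and the fixed iteration count are hypothetical.

```python
def compute_imp(queries, n_iter=20):
    """Iteratively estimate a static imp weight for every term.

    queries: list of tokenized queries (lists of terms).
    Assumed update: Tmp_i = BT_i / sum of BT over the query's terms,
    then BT_i = mean of Tmp_i over the N queries containing term i.
    """
    vocab = {t for q in queries for t in q}
    bt = {t: 1.0 for t in vocab}  # BT starts at 1 for every term
    for _ in range(n_iter):
        share_sum = {t: 0.0 for t in vocab}
        n_queries = {t: 0 for t in vocab}
        for q in queries:
            terms = set(q)
            total = sum(bt[t] for t in terms)
            for t in terms:
                share_sum[t] += bt[t] / total  # Tmp_i in this query
                n_queries[t] += 1              # contributes to N
        bt = {t: share_sum[t] / n_queries[t] for t in vocab}
    return bt


imp = compute_imp([["iphone"], ["iphone", "case"]])
```

In this toy corpus "iphone" also appears as a complete query on its own, so its share is pulled upward on every pass and it ends up with a larger weight than "case".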
II. DIMP (dynamic imp)
A shortcoming shared by idf and imp is that both assign static weights. DIMP weights each term dynamically according to the context of the query. Its main assumption is that the term weights in any query can be determined from the term weights of its related queries. The computation has two parts:
1) Top-down query tree construction
The construction method depends on the actual scenario; a search-based approach is described here. As shown in the figure, the given query is the root node. The queries related to it are retrieved first and form the second layer. The sub-queries of each related query are then enumerated to form the third layer, and the last layer consists of the term nodes produced by word segmentation. The nodes of the query tree are therefore text strings of different granularity, and the edges are the relevance relations between those strings. In the bid keyword recommendation task, user queries are relatively short keywords, so the corresponding query tree can be built from the co-purchase relations between bid keywords.
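The four layers described above can be sketched as a small tree builder. The lookup tables `related` and `sub_queries` stand in for whatever retrieval or co-purchase source is actually used, and `Node`, `build_query_tree`, and the whitespace tokenizer are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    is_term: bool = False
    children: list = field(default_factory=list)


def build_query_tree(query, related, sub_queries, tokenize=str.split):
    """Build root -> related queries -> sub-queries -> term leaves.

    related:     dict mapping a query to its related queries (assumed input)
    sub_queries: dict mapping a related query to its sub-queries (assumed input)
    tokenize:    word segmenter; whitespace splitting is only a placeholder
    """
    root = Node(query)
    for rel in related.get(query, []):
        rel_node = Node(rel)
        # Fall back to the related query itself when no sub-queries are known.
        for sub in sub_queries.get(rel, [rel]):
            sub_node = Node(sub)
            sub_node.children = [Node(t, is_term=True) for t in tokenize(sub)]
            rel_node.children.append(sub_node)
        root.children.append(rel_node)
    return root


tree = build_query_tree(
    "windows xp system",
    related={"windows xp system": ["windows xp install"]},
    sub_queries={"windows xp install": ["windows xp", "xp install"]},
)
```

Each edge in the result links a coarser text string to a finer one, which matches the description of nodes as strings of different granularity.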