Query term weighting: calculating search term weights
2022-07-02 02:15:00 【AI Zeng Xiaojian】
Query term weighting computes the importance of each term after a query has been segmented into words. A commonly used indicator is tf*idf (the tf of a term within a query is usually 1): the more often a term occurs in the corpus, the less information it carries, and the less often it occurs, the more information it carries. However, a term's importance is not strictly monotonic in its number of occurrences, and idf also ignores context. For example, "windows" is fairly important in "windows application software" but relatively unimportant in "windows xp system iphone xs guide photos". As a basic resource, term weighting plays an important role in text relevance, term dropping, and other tasks. Its optimization methods fall into three main categories:
1) Based on corpus statistics
2) Based on click logs
3) Based on supervised learning
This article first introduces some calculation methods based on corpus statistics.
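As a baseline for the tf*idf indicator described above, here is a minimal sketch in Python. The toy corpus, the function names `idf_weights` and `query_term_weights`, and the smoothed idf form `log((1 + N) / (1 + df)) + 1` are assumptions made for illustration, not something the article specifies.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Return a smoothed idf value for every term in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def query_term_weights(query_tokens, idf):
    """Compute tf*idf for each query term, normalized to sum to 1."""
    tf = Counter(query_tokens)
    default_idf = max(idf.values())  # unseen terms are treated as rare
    raw = {t: tf[t] * idf.get(t, default_idf) for t in tf}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


# Hypothetical toy corpus: "windows" appears in three of four documents.
corpus = [
    ["windows", "application", "software"],
    ["windows", "xp", "system"],
    ["iphone", "xs", "photos"],
    ["windows", "update"],
]
idf = idf_weights(corpus)
weights = query_term_weights(["windows", "application", "software"], idf)
print(weights)
```

Because the weight depends only on corpus frequency, "windows" receives the same idf in every query, which is exactly the lack of context the article points out.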
I. imp (short for importance)
One drawback of idf is that it relies solely on word frequency. imp instead starts from each term's share of importance within a query and optimizes the static term weights through iterative calculation. The calculation proceeds as follows:

Here BT is the imp value of a term, whose initial value can be set to 1; Tmp_i is the share of importance of the i-th term within a query; and N is the number of queries that contain the i-th term.
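The update formula itself is not reproduced in the text above, so the sketch below assumes one plausible reading of the description: within each query, the share Tmp_i of a term is its current imp value divided by the sum of imp values of all terms in that query, and the new imp value of a term is the average of its shares over the N queries that contain it. The function name `compute_imp`, the fixed iteration count, and the toy queries are hypothetical.

```python
from collections import defaultdict


def compute_imp(queries, n_iter=20):
    """Iteratively estimate a static imp value per term (assumed update rule)."""
    # BT starts at 1 for every term, as the text suggests.
    imp = {t: 1.0 for q in queries for t in q}
    for _ in range(n_iter):
        share_sum = defaultdict(float)  # sum of Tmp_i over queries containing the term
        query_count = defaultdict(int)  # N: number of queries containing the term
        for q in queries:
            terms = set(q)
            total = sum(imp[t] for t in terms)
            for t in terms:
                share_sum[t] += imp[t] / total  # Tmp_i within this query
                query_count[t] += 1
        # Assumed update: new BT is the mean share across those N queries.
        imp = {t: share_sum[t] / query_count[t] for t in imp}
    return imp


# Hypothetical segmented query log.
queries = [
    ["windows", "software"],
    ["windows", "xp"],
    ["windows", "update"],
    ["software", "download"],
    ["xp"],
]
imp = compute_imp(queries)
for term, value in sorted(imp.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {value:.3f}")
```

A fixed number of rounds keeps the sketch short; a real implementation would more likely stop once the values change by less than some tolerance.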
II. DIMP (Dynamic imp)
A shared drawback of idf and imp is that both assign static weights. DIMP weights each term dynamically according to the context of the query. Its main assumption is that the term weights in any query can be determined from the term weights of its related queries. The calculation can be divided into two parts:
1) Top-down query tree construction
Different construction methods are used depending on the actual scenario; here is one for search. As shown in the figure below, the given query is the root node. First, the query's related queries are retrieved as the second-layer nodes. Then, for each second-layer node, the sub-queries of that related query are enumerated as the third-layer nodes. The last layer consists of the term nodes obtained after word segmentation. The nodes of the query tree are therefore text strings of different granularity, and the edges are the relations between those strings. In the bid keyword recommendation task, user queries are mostly short keywords, so the corresponding query tree can be built from the co-purchase relationships between bid keywords.
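The four-layer structure described above can be sketched as a small tree builder. Everything concrete here is an assumption for illustration: the `Node` class, the `related` lookup table standing in for a related-query service, the choice of contiguous token spans as "sub-queries", and whitespace splitting as the word segmenter.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    layer: str  # "root", "related", "sub", or "term"
    children: list = field(default_factory=list)


def contiguous_sub_queries(query):
    """Hypothetical sub-query enumeration: all contiguous token spans."""
    tokens = query.split()
    n = len(tokens)
    return [" ".join(tokens[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def build_query_tree(query, related, sub_queries, tokenize):
    """Build root -> related queries -> sub-queries -> terms, top down."""
    root = Node(query, "root")
    for rq in related.get(query, []):
        rq_node = Node(rq, "related")
        for sq in sub_queries(rq):
            sq_node = Node(sq, "sub")
            sq_node.children = [Node(t, "term") for t in tokenize(sq)]
            rq_node.children.append(sq_node)
        root.children.append(rq_node)
    return root


def print_tree(node, depth=0):
    print("  " * depth + f"[{node.layer}] {node.text}")
    for child in node.children:
        print_tree(child, depth + 1)


# Hypothetical related-query table for a single root query.
related = {"windows xp system": ["windows xp", "xp system install"]}
tree = build_query_tree("windows xp system", related, contiguous_sub_queries, str.split)
print_tree(tree)
```

Passing `sub_queries` and `tokenize` as functions keeps the builder independent of the scenario, so a co-purchase lookup between bid keywords could replace the `related` table without changing the tree code.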