Query term weighting: calculating search term weights
2022-07-02 02:15:00 【AI Zeng Xiaojian】
Query term weighting computes the importance of each term after a query has been segmented into words. A commonly used measure is tf*idf (the tf of a term within a query is usually 1): the more often a term appears in the corpus, the less information it carries, and the less often it appears, the more information it carries. However, a term's importance is not strictly monotonic in its number of occurrences, and idf does not take context into account. For example, "windows" is more important in "windows application software" than in "windows xp system iphone xs guide photos". Term weighting is a basic resource that plays an important role in text relevance, term dropping, and other tasks. Its optimization methods fall into three main categories:
1) Based on corpus statistics
2) Based on click logs
3) Based on supervised learning
This article first introduces some calculation methods based on corpus statistics.
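As a baseline for the corpus-statistics methods, the tf*idf weighting mentioned above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `idf_table` and `query_term_weights`, the smoothed idf formula, and the final normalization to a sum of 1 are all assumptions made for the example.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Compute a smoothed idf for every term in a tokenized corpus.

    corpus: list of documents (or queries), each a list of terms.
    The smoothing (+1 terms) is an assumption chosen to avoid zero weights.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    return {t: math.log((n_docs + 1) / (d + 1)) + 1.0 for t, d in df.items()}


def query_term_weights(query_terms, idf, default_idf=1.0):
    """Weight each query term by tf * idf, normalized to sum to 1."""
    tf = Counter(query_terms)  # usually 1 for every term in a query
    raw = {t: tf[t] * idf.get(t, default_idf) for t in tf}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


corpus = [["windows", "xp"], ["windows", "software"], ["iphone", "xs"]]
idf = idf_table(corpus)
weights = query_term_weights(["windows", "iphone"], idf)
```

Here "windows" appears in two of the three documents and "iphone" in one, so "iphone" receives the larger weight. The weight of "windows" is the same in every query, which is exactly the lack of context described above.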
I. imp (short for importance)
One drawback of idf is that it relies solely on comparing word frequencies. imp instead starts from the share of importance that a term holds within each query, and refines the static term weights by iterative calculation. In the calculation, BT is the imp value of a term (its initial value can be set to 1), Tmp_i is the share of importance of the i-th term within a query, and N is the number of queries that contain the i-th term.
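One plausible reading of this iteration is sketched below. The exact update rule is not given in the text, so the code makes two assumptions: Tmp_i is computed as the term's BT divided by the sum of BT over the terms of the same query, and the new BT of a term is the average of its Tmp_i over the N queries that contain it. The function name `imp_weights` and the parameter `n_iters` are hypothetical.

```python
def imp_weights(queries, n_iters=10):
    """Iteratively estimate a static imp value (BT) for each term.

    queries: list of segmented queries, each a list of terms.
    Assumed update (not confirmed by the source):
      Tmp_i = BT_i / sum of BT over the terms of the query
      BT_i  = average of Tmp_i over the N queries containing term i
    """
    vocab = {t for q in queries for t in q}
    bt = {t: 1.0 for t in vocab}  # initial BT value of 1

    for _ in range(n_iters):
        share_sum = {t: 0.0 for t in vocab}
        n_queries = {t: 0 for t in vocab}
        for q in queries:
            terms = set(q)
            if not terms:
                continue
            total = sum(bt[t] for t in terms)
            for t in terms:
                share_sum[t] += bt[t] / total  # Tmp_i for this query
                n_queries[t] += 1              # contributes to N
        bt = {t: share_sum[t] / n_queries[t] for t in vocab}
    return bt


queries = [["windows", "xp"], ["windows"], ["xp", "system"]]
bt = imp_weights(queries)
```

In this toy log, "windows" also occurs as a query on its own, where it holds the full share of importance, so its imp value ends up above that of "xp".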
II. DIMP (Dynamic imp)
A shared drawback of idf and imp is that both assign static weights. DIMP weights each term dynamically according to the context of the query. Its main assumption is that the term weights in any query can be determined from the term weights of its related queries. The calculation process can be divided into two parts:
1) Top-down query tree construction
Different construction methods can be used depending on the actual scenario; here is a search-based one. Given a query as the root node, first retrieve its related queries as the second-layer nodes. Then, for each second-layer node, enumerate the sub-queries of that related query as the third-layer nodes. The last layer consists of the term nodes obtained after word segmentation. The nodes of the query tree are therefore text strings of different granularity, and the edges represent the relevance between text strings. In the bid keyword recommendation task, user queries are relatively short keywords, so the corresponding query tree can be built from the co-purchase relationships between bid keywords.
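The four layers described above can be sketched as a small tree builder. Everything concrete here is an assumption for illustration: the `Node` class, the `related` dictionary standing in for a related-query or sub-query lookup service, the `tokenize` callable standing in for a word segmenter, and the `max_children` limit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    kind: str  # "query" or "term"
    children: list = field(default_factory=list)


def build_query_tree(query, related, tokenize, max_children=3):
    """Build a top-down query tree.

    Layer 1: the given query (root).
    Layer 2: related queries of the root.
    Layer 3: sub-queries of each related query.
    Layer 4: term nodes from segmenting each third-layer query.
    `related` maps a text string to its related strings (hypothetical lookup).
    """
    root = Node(query, "query")
    for rel in related.get(query, [])[:max_children]:
        second = Node(rel, "query")
        for sub in related.get(rel, [])[:max_children]:
            if sub == query:
                continue  # do not loop back to the root
            third = Node(sub, "query")
            third.children = [Node(t, "term") for t in tokenize(sub)]
            second.children.append(third)
        root.children.append(second)
    return root


related = {
    "windows xp system": ["windows xp install"],
    "windows xp install": ["xp install disk"],
}
tree = build_query_tree("windows xp system", related, str.split)
```

In this sketch a second-layer node with no sub-queries simply has no children. A real system would need to decide how such nodes reach the term layer, which the text does not specify.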