Query word weight, search word weight calculation
2022-07-02 02:15:00 【AI Zeng Xiaojian】
Query term weighting computes how important each term is after a query has been segmented into terms. A commonly used metric is tf*idf. Within a query the tf of a term is almost always 1, so the weight is dominated by idf: the more often a term occurs across the corpus, the less information it carries, and the rarer it is, the more information it carries. However, a term's importance is not strictly monotonic in its number of occurrences, and idf takes no account of context. For example, "windows" is important in the query "windows application software" but much less important in "windows xp system iphone xs guide photos". Term weighting is a basic resource for tasks such as text relevance and term dropping, and methods for improving it fall into the following three categories:
1) Based on corpus statistics
2) Based on click logs
3) Based on supervised learning
This article first introduces some methods based on corpus statistics.
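As a baseline for the methods that follow, the tf*idf weighting described above can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function names `idf_weights` and `query_term_weights`, the add-one smoothing, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def idf_weights(queries):
    """Smoothed idf for every term in a corpus of pre-segmented queries."""
    n = len(queries)
    df = Counter()
    for terms in queries:
        df.update(set(terms))  # count each term at most once per query
    # Add-one smoothing is an assumption; it keeps every idf value positive.
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def query_term_weights(terms, idf):
    """tf*idf weights for one query, normalised to sum to 1."""
    tf = Counter(terms)
    raw = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    total = sum(raw.values()) or 1.0
    return {t: w / total for t, w in raw.items()}


# Hypothetical toy corpus of segmented queries.
corpus = [
    ["windows", "application", "software"],
    ["windows", "xp", "system"],
    ["iphone", "xs", "photo", "guide"],
    ["windows", "update"],
]
idf = idf_weights(corpus)
weights = query_term_weights(["windows", "application", "software"], idf)
```

Because "windows" occurs in three of the four toy queries, it receives the lowest weight in every query it appears in, whatever the surrounding terms are. That is exactly the lack of context sensitivity noted above.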
I. imp (short for importance)
One drawback of idf is that it relies solely on term frequency. imp instead starts from the share of importance that each term holds within the queries containing it, and refines the static term weights through iterative calculation. The calculation is as follows:

Here BT is the imp value of a term, which can be initialized to 1; Tmp_i is the share of importance held by the i-th term of a query; and N is the number of queries that contain the i-th term.
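The formula image is not reproduced here, so the sketch below is only one plausible reading of the definitions above, not a confirmed transcription. It assumes that, within each query, Tmp_i is the term's BT divided by the sum of BT over the query's terms, and that the updated BT of a term is the average of its Tmp_i over the N queries that contain it. The function name `compute_imp` and the fixed iteration count are hypothetical.

```python
from collections import defaultdict


def compute_imp(queries, iterations=10):
    """Iteratively estimate a static importance score (BT) for each term.

    Assumed update rule (a reading of the text, not the original formula):
      Tmp_i = BT_i / sum of BT over the terms of the query
      BT_i  = mean of Tmp_i over the N queries containing term i
    """
    queries = [set(q) for q in queries if q]   # unique terms, skip empty queries
    bt = {t: 1.0 for q in queries for t in q}  # BT initialised to 1
    for _ in range(iterations):
        share_sum = defaultdict(float)
        n_queries = defaultdict(int)
        for q in queries:
            total = sum(bt[t] for t in q)
            for t in q:
                share_sum[t] += bt[t] / total  # Tmp_i for this query
                n_queries[t] += 1              # contributes to N
        bt = {t: share_sum[t] / n_queries[t] for t in bt}
    return bt


# Hypothetical toy corpus of segmented queries.
imp = compute_imp([
    ["windows", "software"],
    ["windows", "xp"],
    ["iphone", "xs"],
    ["software"],
])
```

In this toy corpus "software" also appears as a query on its own, where it holds the full share of importance, so its score ends up above that of "windows" even though both occur in two queries. Plain idf would give them the same weight.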
II. DIMP (Dynamic imp)
A drawback shared by idf and imp is that both assign static weights. DIMP weights each term dynamically according to the context of the query. Its main assumption is that the term weights in any query can be determined from the term weights of its related queries. The calculation can be divided into two parts:
1) Top-down query tree construction
The construction method depends on the actual scenario; a search-based approach is described here. As shown in the figure, the given query is the root node. First, the queries related to it are retrieved as the second-layer nodes. Then, for each second-layer node, the child queries of that related query are enumerated as the third-layer nodes. The last layer consists of the term nodes obtained after word segmentation. The nodes of the query tree are therefore text strings of different granularity, and the edges are the relatedness between those text strings. In the bid keyword recommendation task, user queries are relatively short keywords, so the corresponding query tree can be built from the co-purchase relationships between bid keywords.
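The four-layer structure described above can be sketched as a small tree builder. Everything named here is illustrative: the `Node` class, the `related` and `sub_queries` lookup tables (stand-ins for whatever retrieval or co-purchase source is actually used), and whitespace splitting as the segmenter are assumptions, not part of the original method.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    kind: str                      # "query" or "term"
    children: list = field(default_factory=list)


def build_query_tree(query, related, sub_queries, segment):
    """Build root query -> related queries -> child queries -> terms."""
    root = Node(query, "query")
    for rel in related.get(query, []):           # layer 2: related queries
        rel_node = Node(rel, "query")
        root.children.append(rel_node)
        for sub in sub_queries.get(rel, []):     # layer 3: child queries
            sub_node = Node(sub, "query")
            # layer 4: term nodes obtained by segmenting the child query
            sub_node.children = [Node(t, "term") for t in segment(sub)]
            rel_node.children.append(sub_node)
    return root


# Hypothetical lookup tables standing in for a real retrieval step.
related = {
    "windows xp system": ["windows xp install", "xp system download"],
}
sub_queries = {
    "windows xp install": ["xp install"],
    "xp system download": ["xp download", "system download"],
}
tree = build_query_tree("windows xp system", related, sub_queries, str.split)
```

In this sketch a related query with no child queries contributes no term leaves. A real system would need to decide how to handle that case, for example by segmenting the related query directly.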