当前位置:网站首页>[information retrieval] link analysis
[information retrieval] link analysis
2022-07-04 14:28:00 【Alex_ SCY】
(1). Reading materials 《Introduction to Information Retrieval》 The first 464-470 page 21.2 As described in section PageRank computing method ( adopt power iteration To achieve ), use Java Language or other commonly used languages to implement the algorithm . Take the structure shown in the following figure as an example to calculate each document Of PageRank value , among teleportation rate=0.05.
Preset some program parameters :
Create adjacency matrix according to the graph given in the title :
For this question , The adjacency matrix is shown below :
linkMatrix[i][j]=1 It indicates that there is a slave node i Point to the node j The directed side of .
Then start to calculate the transition probability matrix :
There are three steps :
- Use 1 Take out each 1
- The processed result matrix is multiplied by 1-α
- Add alpha/N
Finally, the transition probability matrix corresponding to the ontology can be obtained :
Perform power iteration :
Initialize the probability distribution vector :
Then iterate according to the following formula , Until the probability distribution vector converges :
The final calculation results are as follows : It can converge after one iteration
namely Pagerank(d1)=0.017,Pagerank(d2)=0.492,Pagerank(d3)=0.492. A simple analysis shows that ,d2 And d3 It's symmetrical . At the same time, because there is no document Point to d1, Only when you encounter random jump, you will jump to document1, therefore Pagerank(d1) Will be significantly smaller than the other two values .
(2). In another way ( No power iteration The way ) Calculate with pen ( No program calculation ) topic (1) Each of them document Of PageRank value . Detailed description and calculation process are required .(10 branch )
It can be calculated according to algebraic algorithm :
PageRank The definition of :
therefore :
among M Is the transition probability matrix ( No random jump ),1 by [1*N] The holonomic matrix of ,I Is the unit matrix
And that gives us ,Pagerank(d1)=0.017,Pagerank(d2)=0.492,Pagerank(d3)=0.492, The results are obtained by the power iteration method .
(3). Reading materials 《Introduction to Information Retrieval》 The first 474-477 page 21.3 As described in section HITS computing method ( adopt power iteration To achieve ), use Java Language or other commonly used languages to implement the algorithm . Take the structure shown in the following figure as an example to calculate each document Of authority Values and hub value .
Preset some program parameters :
Then label the nodes in the graph , And generate the corresponding adjacency matrix , As shown below :
Adjacency matrix :
initialization hub as well as authority vector :
Start the iteration according to the following formula , At the same time, after each iteration, the vector needs to be normalized , until hub And authority Vector convergence :
The final running results are as follows :
Final hub The largest value is the node 8( be in base set) , node 8 Pointing to the node 7 and 9, meanwhile 7 and 9 It is pointed to by multiple nodes (authority High value ), So nodes 8 Of hub The highest value is very reasonable .
authority The largest value is the node 7 ( be in root set), node 7 By node 2,3,8 Point to , At the same time node 2,3,8 Of hub High value , So nodes 7 Of authority The highest value is very reasonable .
hub/authority Value can reflect the navigation and authority of a web page . Different website purposes should focus on different indicators , For example, navigation websites should focus on hub value , This can point more accurately ; Portal websites should focus on authority value , Let more navigation websites point to it , Improve Authority .
边栏推荐
- GCC [6] - 4 stages of compilation
- [MySQL from introduction to proficiency] [advanced chapter] (V) SQL statement execution process of MySQL
- 电商系统中红包活动设计
- Golang uses JSON unmarshal number to interface{} number to become float64 type (turn)
- Chapter 16 string localization and message Dictionary (2)
- Leetcode T49: 字母异位词分组
- Popular framework: the use of glide
- Use of tiledlayout function in MATLAB
- PyTorch的自动求导机制详细解析,PyTorch的核心魔法
- 商业智能BI财务分析,狭义的财务分析和广义的财务分析有何不同?
猜你喜欢
随机推荐
Nowcoder rearrange linked list
失败率高达80%,企业数字化转型路上有哪些挑战?
Detailed analysis of pytorch's automatic derivation mechanism, pytorch's core magic
Data center concept
LiveData
Visual Studio调试方式详解
Supprimer les lettres dupliquées [avidité + pile monotone (maintenir la séquence monotone avec un tableau + Len)]
MySQL的触发器
Ml: introduction, principle, use method and detailed introduction of classic cases of snap value
LifeCycle
【云原生】我怎么会和这个数据库杠上了?
Solutions to the problems of miui12.5 red rice k20pro using Au or povo2
92.(cesium篇)cesium楼栋分层
R language uses dplyr package group_ The by function and the summarize function calculate the mean and standard deviation of the target variables based on the grouped variables
R语言使用epiDisplay包的dotplot函数通过点图的形式可视化不同区间数据点的频率、使用by参数指定分组参数可视化不同分组的点图分布
AI and Life Sciences
第十七章 进程内存
DDD application and practice of domestic hotel transactions -- Code
Data warehouse interview question preparation
递增的三元子序列[贪心训练]