[information retrieval] link analysis
2022-07-04 14:28:00 【Alex_ SCY】
(1). Read Section 21.2 (pages 464-470) of "Introduction to Information Retrieval" and implement the PageRank computation described there (via power iteration) in Java or another commonly used language. Using the graph shown in the figure as an example, compute the PageRank value of each document, with teleportation rate = 0.05.
First, set the program parameters:
Build the adjacency matrix from the graph given in the problem:
For this problem, the adjacency matrix is shown below:
linkMatrix[i][j] = 1 indicates that there is a directed edge from node i to node j.
Next, compute the transition probability matrix.
There are three steps:
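- Divide each 1 by the number of 1s in its row, so that every row with out-links sums to 1
- Multiply the resulting matrix by 1-α, where α is the teleportation rate
- Add α/N to every entry, where N is the number of documents
This yields the transition probability matrix for this problem: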
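The three steps above can be sketched in Python as follows. This is a minimal illustration, not the original program: the function name `transition_matrix` and the example graph (d1 -> d2, d1 -> d3, d2 -> d3, d3 -> d2) are assumptions chosen to be consistent with the description in this post, since the original figure is not reproduced here. The handling of rows without out-links follows the textbook convention and is not one of the three steps listed.

```python
def transition_matrix(link_matrix, alpha):
    """Build a teleporting transition matrix from a 0/1 adjacency matrix."""
    n = len(link_matrix)
    result = []
    for row in link_matrix:
        out_degree = sum(row)
        if out_degree == 0:
            # Assumption (textbook convention): a node with no out-links
            # jumps to every node with equal probability.
            result.append([1.0 / n] * n)
            continue
        # Step 1: divide each 1 by the number of 1s in its row.
        # Step 2: multiply by (1 - alpha).
        # Step 3: add alpha / n to every entry.
        result.append([(1 - alpha) * (x / out_degree) + alpha / n for x in row])
    return result


# Hypothetical graph: d1 -> d2, d1 -> d3, d2 -> d3, d3 -> d2.
link_matrix = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
P = transition_matrix(link_matrix, alpha=0.05)
for row in P:
    print([round(p, 4) for p in row])
```

Each row of the result sums to 1, and every entry in the d1 column equals α/N because no document links to d1 in this assumed graph.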
Perform the power iteration:
Initialize the probability distribution vector:
Then iterate with the update x(t+1) = x(t)·P, where P is the transition probability matrix, until the probability distribution vector converges:
The final result is as follows; the vector converges after a single iteration:
That is, PageRank(d1) = 0.017, PageRank(d2) = 0.492, and PageRank(d3) = 0.492. A simple analysis shows that d2 and d3 are symmetric. Moreover, because no document points to d1, the surfer reaches d1 only through a random jump, so PageRank(d1) is much smaller than the other two values.
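A minimal Python sketch of the power iteration is shown below. It is an illustration under stated assumptions rather than the original program: the helper name `power_iteration`, the convergence tolerance, the uniform starting vector, and the link-only matrix `M` (for the hypothetical graph d1 -> d2, d1 -> d3, d2 -> d3, d3 -> d2) are all chosen for this example.

```python
def power_iteration(P, tol=1e-10, max_iter=1000):
    """Repeat x <- x * P (x is a row vector) until x stops changing."""
    n = len(P)
    x = [1.0 / n] * n  # assumed start: uniform distribution
    for _ in range(max_iter):
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


ALPHA = 0.05
# Hypothetical link-only transition matrix (rows already sum to 1).
M = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
N = len(M)
# Add teleportation: P = (1 - alpha) * M + alpha / N.
P = [[(1 - ALPHA) * m + ALPHA / N for m in row] for row in M]

ranks = power_iteration(P)
print([round(r, 3) for r in ranks])  # [0.017, 0.492, 0.492]
```

With a uniform start on this assumed graph, the vector after the first update is already stationary; the loop needs one further pass only to detect that nothing changed.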
(2). Compute the PageRank value of each document in problem (1) by hand (without a program), using a method other than power iteration. A detailed description and the full calculation are required. (10 points)
The values can be computed algebraically:
The definition of PageRank:
Therefore:
where M is the transition probability matrix (without random jumps), 1 is the 1×N all-ones matrix, and I is the identity matrix.
This gives PageRank(d1) = 0.017, PageRank(d2) = 0.492, and PageRank(d3) = 0.492, which agrees with the result of the power iteration method.
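One way to write out the algebraic method with the symbols defined above is sketched here. It assumes that π denotes the 1×N row vector of PageRank values, that its entries sum to 1, and that every row of M sums to 1; the symbol π is introduced for this illustration.

```latex
\begin{aligned}
\pi &= (1-\alpha)\,\pi M + \frac{\alpha}{N}\,\mathbf{1} \\
\pi\,\bigl(I - (1-\alpha)\,M\bigr) &= \frac{\alpha}{N}\,\mathbf{1} \\
\pi &= \frac{\alpha}{N}\,\mathbf{1}\,\bigl(I - (1-\alpha)\,M\bigr)^{-1}
\end{aligned}
```

For this problem the numbers can be checked without inverting a matrix. No document points to d1, so the first column of M is zero and the first equation gives PageRank(d1) = α/N = 0.05/3 ≈ 0.0167. Since d2 and d3 are symmetric and the values sum to 1, PageRank(d2) = PageRank(d3) = (1 - 0.0167)/2 ≈ 0.4917.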
(3). Read Section 21.3 (pages 474-477) of "Introduction to Information Retrieval" and implement the HITS computation described there (via power iteration) in Java or another commonly used language. Using the graph shown in the figure as an example, compute the authority and hub values of each document.
First, set the program parameters:
Then label the nodes in the graph and generate the corresponding adjacency matrix, as shown below:
Adjacency matrix:
Initialize the hub and authority vectors:
Then iterate with the updates h ← A·a and a ← Aᵀ·h, where A is the adjacency matrix, h is the hub vector, and a is the authority vector. Normalize both vectors after each iteration, and stop when the hub and authority vectors converge:
The final results are as follows:
The node with the largest hub value is node 8 (in the base set). Node 8 points to nodes 7 and 9, both of which are pointed to by several nodes (and therefore have high authority values), so it is reasonable that node 8 has the highest hub value.
The node with the largest authority value is node 7 (in the root set). Node 7 is pointed to by nodes 2, 3, and 8, all of which have high hub values, so it is reasonable that node 7 has the highest authority value.
Hub and authority values reflect how well a web page serves as a navigation page and how authoritative it is. Different kinds of websites should focus on different metrics. For example, a navigation site should focus on its hub value, so that its links point to relevant pages more accurately; a portal site should focus on its authority value, getting more navigation sites to link to it and thereby raising its authority.
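The HITS iteration can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the original program: the function names, the tolerance, the choice of Euclidean (L2) normalization, and the 4-node graph are all hypothetical, because the graph from the exercise is not reproduced here.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def hits(adj, tol=1e-10, max_iter=1000):
    """Return (hub, authority) scores for a 0/1 adjacency matrix."""
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(max_iter):
        # h <- A a: a hub is good if it points to good authorities.
        new_hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        # a <- A^T h: an authority is good if good hubs point to it.
        new_auth = [sum(adj[i][j] * new_hub[i] for i in range(n)) for j in range(n)]
        new_hub, new_auth = normalize(new_hub), normalize(new_auth)
        delta = max(abs(p - q) for p, q in zip(new_hub + new_auth, hub + auth))
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


# Hypothetical graph: 0 -> 2, 1 -> 2, 3 -> 1, 3 -> 2.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
hub, auth = hits(adj)
print("best hub:", hub.index(max(hub)))          # best hub: 3
print("best authority:", auth.index(max(auth)))  # best authority: 2
```

In this small graph, node 3 links to both pages that receive links and so gets the highest hub value, while node 2 is linked to by every hub and gets the highest authority value, which mirrors the reasoning used for nodes 8 and 7 above.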