[Paper Reading] RAW+: A Two-View Graph Propagation Method with Word Coupling for Readability Assessment
2022-07-26 13:53:00 【Muxi Krystal】
Key Points:
- Graph-based classification (exploits the relationships between documents)
- Coupled bag-of-words model (captures how words relate to one another in reading difficulty)
- Two-view propagation method (uses the bag-of-words model and linguistic features together)
Method
Overall, the method consists of two stages:
- Feature representation (mapping each document to feature vectors)
- Readability classification (graph construction, graph merging, and label propagation)
I. The Coupled Bag-of-Words Model
1. Constructing the word coupling matrix
- Hypothesis: simple words tend to appear in simple sentences, and difficult words tend to appear in difficult sentences.
- Idea: estimate a word's reading difficulty from its co-occurrence distribution over sentences of different difficulty levels.
- Note: the method uses unlabeled sentences and assigns them reading-difficulty labels with heuristic functions.
Step 1: Per-sentence reading difficulty estimation
- Heuristic functions give a rough estimate, producing a weak label for each sentence.
- Eight linguistic features that can be computed at the sentence level are used to build eight heuristic functions, which compute difficulty scores for sentences.

- The continuous scores are then discretized by a formula into reading levels, giving each sentence's level (with 3 levels as the example).
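The post does not reproduce the eight heuristic functions or the discretization formula, so the Python sketch below rests on two loudly stated assumptions: a single hypothetical heuristic (`heuristic_score`, mean token length, standing in for one surface feature) and equal-frequency binning into 3 levels (`discretize`). It only illustrates the shape of Step 1 (score each unlabeled sentence, then turn scores into weak labels); it is not the paper's formula.

```python
from typing import List


def heuristic_score(sentence: str) -> float:
    """Hypothetical heuristic: mean token length in characters (higher = harder).

    Tokens are whitespace-separated, so punctuation counts toward length.
    """
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return sum(len(t) for t in tokens) / len(tokens)


def discretize(scores: List[float], num_levels: int = 3) -> List[int]:
    """Assign weak labels 1..num_levels by equal-frequency binning (an assumption).

    Sentences are ranked by score and split into num_levels groups of roughly
    equal size; the lowest-scoring group becomes level 1.
    """
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    labels = [0] * len(scores)
    for rank, idx in enumerate(order):
        labels[idx] = rank * num_levels // len(scores) + 1
    return labels


sentences = [
    "The cat sat.",
    "A dog ran home.",
    "Photosynthesis converts electromagnetic radiation.",
    "We like to play.",
    "Thermodynamic equilibrium characterizes isolated systems.",
    "Birds can fly high.",
]
scores = [heuristic_score(s) for s in sentences]
weak_labels = discretize(scores)
print(weak_labels)
```

Equal-frequency binning is chosen here only because it always yields non-empty levels; a threshold-based rule would fit the same slot.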

Step 2: Per-word difficulty distribution estimation
- Each word $t$ corresponds to a vector whose length equals the number of reading levels.
- The $i$-th dimension of the vector is a probability $p_t(i)$: the number of level-$i$ sentences in which the word appears, divided by the total number of sentences in which it appears. That is, $p_t(i) = n_t(i) / \sum_j n_t(j)$, where $n_t(i)$ is the number of level-$i$ sentences containing $t$.
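A minimal Python sketch of the per-word distribution $p_t(i)$, assuming sentences are already tokenized and carry weak labels 1..3 (for example, from Step 1). The function name `word_difficulty_distribution` is hypothetical; each sentence is counted at most once per word, matching "number of sentences" in the definition.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def word_difficulty_distribution(
    sentences: List[List[str]], labels: List[int], num_levels: int = 3
) -> Dict[str, List[float]]:
    """Return p_t(i) for every word t: the share of sentences containing t
    whose weak label is i (levels are 1..num_levels)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, level in zip(sentences, labels):
        for t in set(tokens):  # count each sentence at most once per word
            counts[t][level] += 1
    dist = {}
    for t, c in counts.items():
        total = sum(c.values())
        dist[t] = [c[i] / total for i in range(1, num_levels + 1)]
    return dist


sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "theorem", "holds"]]
labels = [1, 1, 3]
p = word_difficulty_distribution(sentences, labels)
print(p["the"])      # about [0.667, 0.0, 0.333]
print(p["theorem"])  # [0.0, 0.0, 1.0]
```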

Step 3: Word coupling matrix construction
- Each element of the word coupling matrix encodes the relationship between two words: its value is the similarity between the two words' vectors, derived from the difference between their probability distributions.
- Three word coupling matrices are built in total: $C^{sur}$, $C^{lex}$, and $C^{syn}$.
- Filtering strategy (to keep computation tractable when the vocabulary is very large): words are ranked by entropy, and a preset percentage of low-entropy words is filtered out.
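The post names neither the similarity measure nor the exact filter, so this NumPy sketch assumes both: similarity is taken as 1 minus the Hellinger distance between two words' difficulty distributions (a bounded similarity built from a distribution difference), and the `drop_fraction` lowest-entropy words are removed first, following the post's wording. `entropy` and `coupling_matrix` are hypothetical helpers, and one such matrix would be built per feature group.

```python
import numpy as np
from typing import Dict, List, Tuple


def entropy(p: np.ndarray) -> float:
    """Shannon entropy (natural log) of a discrete distribution; 0 * log 0 counts as 0."""
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())


def coupling_matrix(
    words: List[str], dist: Dict[str, List[float]], drop_fraction: float = 0.0
) -> Tuple[List[str], np.ndarray]:
    """Return the kept vocabulary and C, where C[a, b] = 1 - Hellinger(p_a, p_b).

    Assumptions (not from the paper): Hellinger-based similarity is the measure,
    and the `drop_fraction` lowest-entropy words are filtered out first.
    """
    ents = np.array([entropy(np.asarray(dist[w], dtype=float)) for w in words])
    order = np.argsort(ents, kind="stable")        # indices by ascending entropy
    n_drop = int(len(words) * drop_fraction)
    kept = sorted(int(k) for k in order[n_drop:])  # restore original word order
    kept_words = [words[k] for k in kept]
    # Square roots of the distributions, one row per kept word
    root = np.sqrt(np.array([dist[w] for w in kept_words], dtype=float))
    # Pairwise Hellinger distance: ||sqrt(p) - sqrt(q)||_2 / sqrt(2), in [0, 1]
    diff = root[:, None, :] - root[None, :, :]
    hellinger = np.sqrt((diff ** 2).sum(axis=2)) / np.sqrt(2.0)
    return kept_words, 1.0 - hellinger


dist = {
    "the": [1 / 3, 1 / 3, 1 / 3],
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "theorem": [0.0, 0.0, 1.0],
}
words = list(dist)
kept, C = coupling_matrix(words, dist)
print(kept)
print(np.round(C, 3))
```

Words with similar difficulty profiles ("cat", "kitten") get a high coupling value, while words concentrated at opposite levels ("cat", "theorem") get 0.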
2. Generating the coupled bag-of-words model
Multiplying the word coupling matrix by the basic bag-of-words matrix (BoW matrix) yields the coupled bag-of-words matrix (coupled BoW matrix).

This gives three coupled TF-IDF matrices, $M^{sur}$, $M^{lex}$, and $M^{syn}$. The coupled bag-of-words (cBoW) matrices are dense, and they capture how similar words are in reading difficulty.
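A small NumPy sketch of the multiplication, assuming the TF-IDF matrix has one row per word and one column per document (the same documents-as-columns layout as $M^l \in R^{n_l \times |D|}$), and that the coupling matrix indexes the vocabulary in the same order. The helper `coupled_bow` and all values are hypothetical; running it once per coupling matrix would produce the three coupled matrices.

```python
import numpy as np


def coupled_bow(C: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Coupled BoW matrix M = C @ X.

    C: (|V|, |V|) word coupling matrix; X: (|V|, |D|) TF-IDF matrix with one
    column per document. Both must index the vocabulary in the same order.
    """
    if C.shape[0] != C.shape[1] or C.shape[1] != X.shape[0]:
        raise ValueError("C must be |V| x |V| and X must be |V| x |D|")
    return C @ X


# Hypothetical toy values: 3 words, 2 documents; word 2 occurs in neither document.
X = np.array([[0.5, 0.0],
              [0.0, 0.7],
              [0.0, 0.0]])
C = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
M = coupled_bow(C, X)
print(M)  # every entry is non-zero: the sparse X has become dense
```

Even word 2, absent from both documents, receives weight through its coupling with words that do occur, which is why the cBoW matrix is dense.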
II. The Linguistic Features
- Build a matrix $M^l \in \mathbb{R}^{n_l \times |D|}$, where $n_l$ is the number of linguistic features and $|D|$ is the number of documents.
- The linguistic features chosen in the paper are language-independent, to support the claim that the proposed method itself does not depend on a particular language.
- The features fall into three groups: surface features, lexical features, and syntactic features.
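To show the shape of $M^l$, here is a Python sketch with three illustrative, language-independent features; they are assumptions standing in for the paper's feature list, which the post does not give. The helper `linguistic_feature_matrix` is hypothetical, splits sentences on "." and tokens on whitespace, and expects non-empty documents.

```python
import numpy as np
from typing import List


def linguistic_feature_matrix(docs: List[str]) -> np.ndarray:
    """Return M^l with shape (n_l, |D|): one column of feature values per document.

    Illustrative, language-independent features (not the paper's list):
      row 0: mean sentence length in tokens    (surface)
      row 1: mean token length in characters   (surface)
      row 2: type-token ratio                  (lexical)
    Sentences are split on '.', tokens on whitespace; docs must be non-empty.
    """
    columns = []
    for doc in docs:
        sentences = [s.split() for s in doc.split(".") if s.strip()]
        tokens = [t.lower() for s in sentences for t in s]
        columns.append([
            len(tokens) / len(sentences),
            sum(len(t) for t in tokens) / len(tokens),
            len(set(tokens)) / len(tokens),
        ])
    return np.array(columns, dtype=float).T


docs = ["The cat sat. The cat ran.", "Dogs bark loudly."]
M_l = linguistic_feature_matrix(docs)
print(M_l.shape)  # (3, 2)
```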
III. Two-View Graph Propagation
Graph construction

Graph merging
Merging has two parts: intra-view merging of homogeneous graphs and inter-view merging of heterogeneous graphs.
Intra-view homogeneous graph merging
The basic idea is to keep the common edges and remove edges that carry redundant information. For each node v, first keep the neighbors that appear in all 3 graphs; then, among nodes that are neighbors of v in at least one graph, select those that share the fewest common neighbors with v (so that the merged graph contains as few triangles as possible). Each edge weight is the average of the corresponding edge weights in the 3 graphs.
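The post gives only the idea of intra-view merging, so this Python sketch fills in the details with stated assumptions: a hypothetical target of `k` neighbors per node, common neighbors counted in the union of the input graphs, absent edges counting as weight 0 in the average, and an edge kept if either endpoint selects it. `merge_homogeneous` and `from_edges` are illustrative helpers, not the authors' code.

```python
from typing import Dict, List, Set

Graph = Dict[int, Dict[int, float]]  # node -> {neighbour: weight}; undirected


def merge_homogeneous(graphs: List[Graph], k: int) -> Graph:
    """Merge same-view graphs into one, aiming for about k neighbours per node."""
    nodes: Set[int] = set()
    for g in graphs:
        for v, nbrs in g.items():
            nodes.add(v)
            nodes.update(nbrs)
    # Neighbours of each node in the union of all graphs
    union_nbrs = {v: set().union(*(g.get(v, {}).keys() for g in graphs)) for v in nodes}
    merged: Graph = {v: {} for v in nodes}
    for v in sorted(nodes):
        # 1) neighbours present in every graph are always kept
        shared = set.intersection(*(set(g.get(v, {})) for g in graphs))
        # 2) other candidates, fewest common neighbours with v first (fewer triangles)
        ranked = sorted(union_nbrs[v] - shared,
                        key=lambda u: (len(union_nbrs[v] & union_nbrs[u]), u))
        chosen = sorted(shared) + ranked[: max(0, k - len(shared))]
        for u in chosen:
            # 3) weight = mean over all graphs, absent edges counting as 0
            w = sum(g.get(v, {}).get(u, 0.0) for g in graphs) / len(graphs)
            merged[v][u] = w
            merged[u][v] = w  # keep the merged graph symmetric
    return merged


def from_edges(edges) -> Graph:
    """Build a symmetric adjacency dict from (a, b, weight) triples."""
    g: Graph = {}
    for a, b, w in edges:
        g.setdefault(a, {})[b] = w
        g.setdefault(b, {})[a] = w
    return g


g1 = from_edges([(0, 1, 1.0), (0, 2, 0.8), (1, 2, 0.5), (0, 3, 0.6)])
g2 = from_edges([(0, 1, 0.9), (0, 3, 0.4), (2, 4, 0.7)])
g3 = from_edges([(0, 1, 0.8), (0, 4, 0.3), (1, 2, 0.6)])
merged = merge_homogeneous([g1, g2, g3], k=2)
print(sorted(merged[0].items()))
```

In this toy run, edge (0, 1) appears in all three graphs and is kept with weight 0.9, while edge (0, 2) is dropped, so the merged graph avoids the triangle 0-1-2 that exists in the union.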