当前位置:网站首页>论文笔记: 极限多标签学习 GalaXC (暂存, 还没学完)
论文笔记: 极限多标签学习 GalaXC (暂存, 还没学完)
2022-07-06 02:00:00 【闵帆】
摘要: 分享对论文的理解. 原文见 D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in WWW 2021. 7 位作者中 6 位是微软研究院的人, 跟他们杠, 我觉得自己简直脑袋秀逗了.
1. 论文贡献
- 处理标签存在于文档内的情况: labels and documents cohabit the same space.
- 利用标签文本与标签相关性: label text and label correlations, label metadata.
- 标签级注意力机制: label-wise attention mechanism.
- 热启动 (部分标签已知) 时效果好: warm-start scenarios where predictions need to be made on data points with partially revealed label sets,
- 能处理几百万个标签.
- 又快又好.
2. 动机
- 已有工作表明,与使用与应用程序无关的特征(例如传统的词袋特征)相比,学习密集的特定于应用程序的文档表示可以带来更好的预测。These works have demonstrated that learning dense application-specific document representations can lead to better predictions than using application-agnostic features such as the traditional bag-of-words features.
- 5-10 个标记的短文本. 如使用标题进行相关网页或广告的预测. Short textual descriptions with typically only 5-10 tokens. Examples include applications such as predicting related webpages or related products using only the title of a given webpage/product and predicting relevant ads/keywords/searches for
user queries. - 使用多种元数据如标签文本、标签相关性、标签层次结构, 更好地服务于尾部标签. XC applications often make available label metadata in various forms such as label text, label correlations or label hierarchies.
- 标签特征. Contemporary XC algorithms have explored utilizing label features.
- 热启动与辅助数据源. Warm-start and auxiliary sources of data.
- 已有工作多数使用文档图而不是文档-标签图 (见 Table 1). existing works mostly use document-document graphs and not joint document-label graphs at extreme scales.
2. 基本符号
符号 | 含义 | 备注 |
---|---|---|
G \mathbb{G} G | 二部图 | G = ( D ∪ L , E ) \mathbb{G} = (\mathbb{D} \cup \mathbb{L}, \mathbb{E}) G=(D∪L,E) |
D \mathbb{D} D | 文本节点集合 | 元素记作 d d d, 基数为 N N N |
L \mathbb{L} L | 标签节点集合 | 元素记作 l l l, 基数为 L L L |
y i \mathbf{y}_i yi | 第 i i i 个文本的真实标签向量 | 取值范围为 { − 1 , + 1 } L \{-1, +1\}^L { −1,+1}L |
x ^ i 0 \hat{\mathbf{x}}_i^0 x^i0 | 第 i i i 个文档的特征向量 | D D D 维 |
z ^ l 0 \hat{\mathbf{z}}_l^0 z^l0 | 第 l l l 个标签的特征向量 | D D D 维 |
v ^ n 0 \hat{\mathbf{v}}_n^0 v^n0 | x ^ i 0 \hat{\mathbf{x}}_i^0 x^i0 与 z ^ l 0 \hat{\mathbf{z}}_l^0 z^l0 的统一表示 | D D D 维 |
N \mathcal{N} N | 求邻居操作 | V → 2 V \mathbb{V} \to 2^\mathbb{V} V→2V |
C \mathcal{C} C | 卷积操作 | |
T \mathcal{T} T | 转型操作 | transformation |
a ^ n k \hat{\mathbf{a}}_n^k a^nk | C k ( { v ^ m k − 1 , a ^ m k − 1 : m ∈ N ( n ) } ) \mathcal{C}_k(\{\hat{\mathbf{v}}_m^{k-1}, \hat{\mathbf{a}}_m^{k-1}: m \in \mathcal{N}(n)\}) Ck({ v^mk−1,a^mk−1:m∈N(n)}) | GNN 操作 |
v ^ n k \hat{\mathbf{v}}_n^k v^nk | T k ( { v ^ n k − 1 , a ^ n k − 1 } ) \mathcal{T}_k(\{\hat{\mathbf{v}}_n^{k-1}, \hat{\mathbf{a}}_n^{k-1}\}) Tk({ v^nk−1,a^nk−1}) | GNN 操作 |
W \mathbf{W} W | 系数矩阵 | D × L D \times L D×L 维 |
K K K | hop 数 | |
e l k e_{lk} elk | 标签 l l l 在第 k k k 个 hop 的标量 |
3. 方案
Graph convolution block 的具体操作是
a ^ n k = C k ( a ^ n k − 1 ) = ( 1 + ϵ k ) ⋅ a ^ n k − 1 + ∑ m ∈ N ( n ) a ^ m k − 1 \hat{\mathbf{a}}_n^k = \mathcal{C}_k(\hat{\mathbf{a}}_n^{k-1}) = (1 + \epsilon_k) \cdot \hat{\mathbf{a}}_n^{k-1} + \sum_{m \in \mathcal{N}(n)}\hat{\mathbf{a}}_m^{k-1} a^nk=Ck(a^nk−1)=(1+ϵk)⋅a^nk−1+m∈N(n)∑a^mk−1
Embedding 的具体操作是
v ^ n k = T k ( a ^ n k ) \hat{\mathbf{v}}_n^k = \mathcal{T}_k(\hat{\mathbf{a}}_n^k) v^nk=Tk(a^nk)
令
α l k = exp ( e l k ) / ∑ k ′ ∈ [ K ] exp e l k ′ \alpha_{lk} = \exp(e_{lk}) / \sum_{k' \in [K]} \exp e_{lk'} αlk=exp(elk)/k′∈[K]∑expelk′
它表示第 k k k 个 hop 时的占比.
标签嵌入计算式为
x ^ ( l ) = ∑ k ∈ [ k ] α l k ⋅ x ^ k \hat{\mathbf{x}}^{(l)} = \sum_{k \in [k]} \alpha_{lk} \cdot \hat{\mathbf{x}}^{k} x^(l)=k∈[k]∑αlk⋅x^k
注意: 这里的 k k k 次方还未理解.
标签得分为
s l = * w l , x ^ ( l ) * s_l = \langle \mathbf{w}_l, \hat{\mathbf{x}}^{(l)} \rangle sl=*wl,x^(l)*
4. 小结
在读懂程序之前, 根本无法理解这篇论文.
边栏推荐
- 【Flask】响应、session与Message Flashing
- How to set an alias inside a bash shell script so that is it visible from the outside?
- Redis-字符串类型
- PHP campus financial management system for computer graduation design
- [solution] every time idea starts, it will build project
- Cadre du Paddle: aperçu du paddlelnp [bibliothèque de développement pour le traitement du langage naturel des rames volantes]
- 500 lines of code to understand the principle of mecached cache client driver
- [network attack and defense training exercises]
- How does redis implement multiple zones?
- [ssrf-01] principle and utilization examples of server-side Request Forgery vulnerability
猜你喜欢
Overview of spark RDD
The ECU of 21 Audi q5l 45tfsi brushes is upgraded to master special adjustment, and the horsepower is safely and stably increased to 305 horsepower
selenium 等待方式
Force buckle 1020 Number of enclaves
Grabbing and sorting out external articles -- status bar [4]
02. Go language development environment configuration
【Flask】官方教程(Tutorial)-part1:项目布局、应用程序设置、定义和访问数据库
Force buckle 9 palindromes
Alibaba canal usage details (pit draining version)_ MySQL and ES data synchronization
It's wrong to install PHP zbarcode extension. I don't know if any God can help me solve it. 7.3 for PHP environment
随机推荐
leetcode-两数之和
Force buckle 1020 Number of enclaves
Folio. Ink is a free, fast and easy-to-use image sharing tool
You are using pip version 21.1.1; however, version 22.0.3 is available. You should consider upgradin
Basic operations of databases and tables ----- primary key constraints
PHP campus movie website system for computer graduation design
A basic lintcode MySQL database problem
同一个 SqlSession 中执行两条一模一样的SQL语句查询得到的 total 数量不一样
How to upgrade kubernetes in place
MySQL lethal serial question 1 -- are you familiar with MySQL transactions?
NumPy 数组索引 切片
Overview of spark RDD
LeetCode 322. Change exchange (dynamic planning)
Have a look at this generation
Selenium waiting mode
selenium 元素定位(2)
Install redis
TrueType字体文件提取关键信息
阿裏測開面試題
Computer graduation design PHP college classroom application management system