Paper notes: extreme multi-label learning with GalaXC (draft, not finished studying yet)
2022-07-06 02:00:00 【闵帆】
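Abstract: These notes share my understanding of the paper. The original is D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in WWW 2021. Six of the seven authors are from Microsoft Research; picking a fight with them makes me feel I must be out of my mind.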
1. Contributions of the paper
- Handles the setting in which labels and documents cohabit the same space.
- Exploits label metadata, in particular label text and label correlations.
- Uses a label-wise attention mechanism.
- Performs well in warm-start scenarios, where predictions need to be made on data points with partially revealed label sets.
- Scales to several million labels.
- Is both fast and accurate.
2. Motivation
- Prior work has demonstrated that learning dense, application-specific document representations can lead to better predictions than using application-agnostic features such as the traditional bag-of-words features.
- Inputs are often short textual descriptions with typically only 5-10 tokens. Examples include predicting related webpages or related products using only the title of a given webpage/product, and predicting relevant ads/keywords/searches for user queries.
- Extreme classification (XC) applications often make label metadata available in various forms, such as label text, label correlations, or label hierarchies. Using this metadata serves tail labels better.
- Label features: contemporary XC algorithms have explored utilizing label features.
- Warm-start and auxiliary sources of data.
- Existing works mostly use document-document graphs and not joint document-label graphs at extreme scales (see Table 1 of the paper).
2. Basic notation
| Symbol | Meaning | Remarks |
|---|---|---|
| $\mathbb{G}$ | Bipartite graph | $\mathbb{G} = (\mathbb{D} \cup \mathbb{L}, \mathbb{E})$ |
| $\mathbb{D}$ | Set of document nodes | Elements are written $d$; cardinality $N$ |
| $\mathbb{L}$ | Set of label nodes | Elements are written $l$; cardinality $L$ |
| $\mathbf{y}_i$ | Ground-truth label vector of the $i$-th document | Takes values in $\{-1, +1\}^L$ |
| $\hat{\mathbf{x}}_i^0$ | Feature vector of the $i$-th document | $D$-dimensional |
| $\hat{\mathbf{z}}_l^0$ | Feature vector of the $l$-th label | $D$-dimensional |
| $\hat{\mathbf{v}}_n^0$ | Unified notation for $\hat{\mathbf{x}}_i^0$ and $\hat{\mathbf{z}}_l^0$ | $D$-dimensional |
| $\mathcal{N}$ | Neighborhood operator | $\mathbb{V} \to 2^{\mathbb{V}}$, where $\mathbb{V}$ is presumably the node set $\mathbb{D} \cup \mathbb{L}$ |
| $\mathcal{C}$ | Convolution operator | |
| $\mathcal{T}$ | Transformation operator | Original term: transformation |
| $\hat{\mathbf{a}}_n^k$ | $\mathcal{C}_k(\{\hat{\mathbf{v}}_m^{k-1}, \hat{\mathbf{a}}_m^{k-1}: m \in \mathcal{N}(n)\})$ | GNN operation |
| $\hat{\mathbf{v}}_n^k$ | $\mathcal{T}_k(\{\hat{\mathbf{v}}_n^{k-1}, \hat{\mathbf{a}}_n^{k-1}\})$ | GNN operation |
| $\mathbf{W}$ | Coefficient matrix | $D \times L$ |
| $K$ | Number of hops | |
| $e_{lk}$ | Scalar for label $l$ at hop $k$ | |
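To make the graph notation concrete, here is a minimal sketch of a neighborhood map $\mathcal{N}$ on a bipartite document-label graph. The edge rule is an assumption made purely for illustration (document $i$ and label $l$ are connected whenever $y_{il} = +1$); these notes do not say how the paper builds $\mathbb{E}$. The function name `build_neighbors` and the node numbering are hypothetical.

```python
def build_neighbors(Y):
    """Build a neighbourhood map N: V -> 2^V for a bipartite document-label graph.

    Y is a list of N label vectors, each of length L with entries in {-1, +1}.
    Nodes 0..N-1 are documents; nodes N..N+L-1 are labels.
    Assumption (illustration only): document i and label l share an edge
    iff Y[i][l] == +1.
    """
    num_docs = len(Y)
    num_labels = len(Y[0])
    nbrs = {n: set() for n in range(num_docs + num_labels)}
    for i, y_i in enumerate(Y):
        for l, y_il in enumerate(y_i):
            if y_il == 1:
                nbrs[i].add(num_docs + l)   # document -> label
                nbrs[num_docs + l].add(i)   # label -> document
    return nbrs


# Two documents (nodes 0, 1) and three labels (nodes 2, 3, 4).
Y = [[+1, -1, +1],
     [-1, +1, -1]]
nbrs = build_neighbors(Y)
```

Plain dictionaries and sets keep the sketch dependency-free; at the scale the paper targets, a sparse matrix would be the more realistic choice.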
3. Method
The graph convolution block computes
$$\hat{\mathbf{a}}_n^k = \mathcal{C}_k(\hat{\mathbf{a}}_n^{k-1}) = (1 + \epsilon_k) \cdot \hat{\mathbf{a}}_n^{k-1} + \sum_{m \in \mathcal{N}(n)} \hat{\mathbf{a}}_m^{k-1}$$
The embedding step computes
$$\hat{\mathbf{v}}_n^k = \mathcal{T}_k(\hat{\mathbf{a}}_n^k)$$
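The convolution above is a sum aggregation over neighbors plus a $(1 + \epsilon_k)$-weighted self term, the same form as the aggregation used in GIN. The sketch below runs one hop on a toy graph. The transformation $\mathcal{T}_k$ is not specified in these notes, so `transform` (a linear map followed by ReLU) is a hypothetical stand-in, as are the function names and the toy data.

```python
def graph_convolution(prev, nbrs, eps):
    """One hop: a_n^k = (1 + eps_k) * a_n^{k-1} + sum_{m in N(n)} a_m^{k-1}.

    prev: dict mapping node -> list of floats (the vectors a^{k-1}).
    nbrs: dict mapping node -> set of neighbouring nodes.
    """
    out = {}
    for n, a_n in prev.items():
        agg = [(1.0 + eps) * x for x in a_n]          # weighted self term
        for m in nbrs[n]:
            agg = [s + x for s, x in zip(agg, prev[m])]  # add each neighbour
        out[n] = agg
    return out


def transform(a, weight):
    """Hypothetical T_k: linear map then ReLU (an assumption, not from the notes)."""
    return [max(0.0, sum(w * x for w, x in zip(row, a))) for row in weight]


# Toy path graph 0 - 1 - 2 with 2-dimensional vectors.
nbrs = {0: {1}, 1: {0, 2}, 2: {1}}
a0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
a1 = graph_convolution(a0, nbrs, eps=0.5)
v1 = {n: transform(a, [[1.0, -1.0], [0.0, 1.0]]) for n, a in a1.items()}
```

For node 1, the self term is $1.5 \cdot (0, 1)$ and the neighbors contribute $(1, 0) + (1, 1)$, giving $\hat{\mathbf{a}}_1^1 = (2, 2.5)$.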
Let
$$\alpha_{lk} = \frac{\exp(e_{lk})}{\sum_{k' \in [K]} \exp(e_{lk'})}$$
It is the share of attention that label $l$ places on hop $k$.
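The weights $\alpha_{lk}$ are a softmax over the $K$ hop scalars of one label. A minimal sketch follows; the function name `hop_attention` and the sample values are hypothetical. Subtracting the maximum before exponentiating does not change the ratio but avoids overflow for large $e_{lk}$.

```python
import math


def hop_attention(e_l):
    """alpha_{lk} = exp(e_{lk}) / sum_{k'} exp(e_{lk'}) for a single label l.

    e_l: list of the K scalars e_{l1}, ..., e_{lK}.
    """
    m = max(e_l)                              # shift for numerical stability
    exps = [math.exp(e - m) for e in e_l]
    total = sum(exps)
    return [x / total for x in exps]


# Three hops; the third scalar is log(2) larger, so it gets twice the weight.
alpha = hop_attention([0.0, 0.0, math.log(2.0)])
```

The weights are non-negative and sum to 1, so they can be read as the proportion each hop contributes for that label.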
The label-wise embedding is computed as
$$\hat{\mathbf{x}}^{(l)} = \sum_{k \in [K]} \alpha_{lk} \cdot \hat{\mathbf{x}}^{k}$$
Note: I do not yet understand the superscript $k$ (the "$k$-th power") here.
The score of label $l$ is
$$s_l = \langle \mathbf{w}_l, \hat{\mathbf{x}}^{(l)} \rangle$$
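The last two formulas can be sketched together. This sketch reads the superscript $k$ on $\hat{\mathbf{x}}^{k}$ as a hop index, so that $\hat{\mathbf{x}}^{k}$ is the document's embedding after hop $k$; that reading is an assumption, since the notes leave it open. It also takes $\mathbf{w}_l$ to be the $l$-th column of $\mathbf{W}$. All names and numbers are hypothetical.

```python
def labelwise_embedding(alpha_l, hop_embeddings):
    """x^(l) = sum_k alpha_{lk} * x^k, with k read as a hop index (assumption)."""
    dim = len(hop_embeddings[0])
    out = [0.0] * dim
    for a, x_k in zip(alpha_l, hop_embeddings):
        for d in range(dim):
            out[d] += a * x_k[d]
    return out


def label_score(w_l, x_l):
    """s_l = <w_l, x^(l)>, an ordinary inner product."""
    return sum(w * x for w, x in zip(w_l, x_l))


# One document, K = 3 hops, 2-dimensional embeddings x^1, x^2, x^3.
hop_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
alpha_l = [0.25, 0.25, 0.5]          # attention of label l over the hops
x_l = labelwise_embedding(alpha_l, hop_embeddings)
s_l = label_score([1.0, -2.0], x_l)  # w_l = (1, -2)
```

Because $\alpha_{lk}$ depends on the label, each label sees its own mixture of hop embeddings for the same document, which is what makes the attention label-wise.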
4. Summary
Until I have read and understood the code, I cannot really understand this paper.