Paper notes: extreme multi-label learning with GalaXC (draft, not finished studying yet)
2022-07-06 02:00:00 【闵帆】
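Abstract: Sharing my understanding of the paper. For the original, see D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in WWW 2021. Six of the seven authors are from Microsoft Research; to argue with them, I must have lost my mind.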
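1. Contributions
- Handles the case where labels and documents share one space: "labels and documents cohabit the same space".
- Exploits label text and label correlations: "label text and label correlations, label metadata".
- Label-wise attention: "label-wise attention mechanism".
- Works well under warm start (part of the label set is already known): "warm-start scenarios where predictions need to be made on data points with partially revealed label sets".
- Scales to several million labels.
- Fast and accurate.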
2. Motivation
- Prior work has shown that learning dense, application-specific document representations can lead to better predictions than using application-agnostic features such as the traditional bag-of-words features.
- Short texts of 5-10 tokens, for example using a title to predict related webpages or ads. "Short textual descriptions with typically only 5-10 tokens. Examples include applications such as predicting related webpages or related products using only the title of a given webpage/product and predicting relevant ads/keywords/searches for user queries."
- Use several kinds of metadata, such as label text, label correlations and label hierarchies, to serve tail labels better. "XC applications often make available label metadata in various forms such as label text, label correlations or label hierarchies."
- Label features. "Contemporary XC algorithms have explored utilizing label features."
- Warm start and auxiliary data sources. "Warm-start and auxiliary sources of data."
- Most existing work uses document graphs rather than document-label graphs (see Table 1 of the paper). "Existing works mostly use document-document graphs and not joint document-label graphs at extreme scales."
2. Notation
| Symbol | Meaning | Notes |
|---|---|---|
| $\mathbb{G}$ | Bipartite graph | $\mathbb{G} = (\mathbb{D} \cup \mathbb{L}, \mathbb{E})$ |
| $\mathbb{D}$ | Set of document nodes | Elements are written $d$; cardinality $N$ |
| $\mathbb{L}$ | Set of label nodes | Elements are written $l$; cardinality $L$ |
| $\mathbf{y}_i$ | Ground-truth label vector of the $i$-th document | Takes values in $\{-1, +1\}^L$ |
| $\hat{\mathbf{x}}_i^0$ | Feature vector of the $i$-th document | $D$-dimensional |
| $\hat{\mathbf{z}}_l^0$ | Feature vector of the $l$-th label | $D$-dimensional |
| $\hat{\mathbf{v}}_n^0$ | Unified notation for $\hat{\mathbf{x}}_i^0$ and $\hat{\mathbf{z}}_l^0$ | $D$-dimensional |
| $\mathcal{N}$ | Neighborhood operation | $\mathbb{V} \to 2^{\mathbb{V}}$ |
| $\mathcal{C}$ | Convolution operation | |
| $\mathcal{T}$ | Transformation operation | |
| $\hat{\mathbf{a}}_n^k$ | $\mathcal{C}_k(\{\hat{\mathbf{v}}_m^{k-1}, \hat{\mathbf{a}}_m^{k-1}: m \in \mathcal{N}(n)\})$ | GNN operation |
| $\hat{\mathbf{v}}_n^k$ | $\mathcal{T}_k(\{\hat{\mathbf{v}}_n^{k-1}, \hat{\mathbf{a}}_n^{k-1}\})$ | GNN operation |
| $\mathbf{W}$ | Coefficient matrix | $D \times L$ |
| $K$ | Number of hops | |
| $e_{lk}$ | Scalar for label $l$ at the $k$-th hop | |
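To make the bipartite graph and the neighborhood operation $\mathcal{N}$ concrete, here is a minimal sketch. The function name `build_neighbors` and the node names `d1`, `l1`, ... are hypothetical and chosen only for illustration; the paper's own code may store the graph quite differently.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Return N as a dict: node -> set of adjacent nodes.

    `edges` is an iterable of (document, label) pairs, so every edge
    joins a node of D to a node of L (a bipartite graph).
    """
    neighbors = defaultdict(set)
    for doc, label in edges:
        neighbors[doc].add(label)   # label is a neighbor of the document
        neighbors[label].add(doc)   # and the document is a neighbor of the label
    return dict(neighbors)

# Toy graph with N = 3 documents and L = 2 labels (made-up data).
edges = [("d1", "l1"), ("d2", "l1"), ("d2", "l2"), ("d3", "l2")]
neighbors = build_neighbors(edges)
```

Here `neighbors["d2"]` contains both labels, and `neighbors["l1"]` contains the two documents attached to `l1`, which is the sense in which $\mathcal{N}$ maps a node to a subset of nodes.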
3. Approach
The graph convolution block computes
$$\hat{\mathbf{a}}_n^k = \mathcal{C}_k(\hat{\mathbf{a}}_n^{k-1}) = (1 + \epsilon_k) \cdot \hat{\mathbf{a}}_n^{k-1} + \sum_{m \in \mathcal{N}(n)}\hat{\mathbf{a}}_m^{k-1}$$
The embedding step computes
$$\hat{\mathbf{v}}_n^k = \mathcal{T}_k(\hat{\mathbf{a}}_n^k)$$
Let
$$\alpha_{lk} = \frac{\exp(e_{lk})}{\sum_{k' \in [K]} \exp(e_{lk'})}$$
It is the share of weight given to the $k$-th hop.
The label-wise embedding is computed as
$$\hat{\mathbf{x}}^{(l)} = \sum_{k \in [K]} \alpha_{lk} \cdot \hat{\mathbf{x}}^{k}$$
Note: I do not yet understand the superscript $k$ here (it is written like a $k$-th power).
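A small sketch of the attention weights and the weighted sum, assuming that $\hat{\mathbf{x}}^{k}$ is the embedding of one document after hop $k$ (my reading of the notation table, not something I have confirmed against the code). The array layout and the function names are hypothetical.

```python
import numpy as np

def hop_attention(e):
    """Softmax over hops: alpha[l, k] = exp(e[l, k]) / sum_k' exp(e[l, k']).

    e: (L, K) array of the scalars e_{lk}.
    """
    shifted = e - e.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def labelwise_embedding(alpha, X_hops):
    """x^(l) = sum_k alpha[l, k] * x^k for every label l.

    alpha:  (L, K) attention weights
    X_hops: (K, D) array, row k is the document embedding at hop k
    Returns an (L, D) array whose row l is x^(l).
    """
    return alpha @ X_hops

# Toy values: L = 2 labels, K = 2 hops, D = 2 dimensions.
e = np.array([[0.0, 0.0],
              [0.0, np.log(3.0)]])
X_hops = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
alpha = hop_attention(e)
X_label = labelwise_embedding(alpha, X_hops)
```

With these toy scores the first label weights both hops equally, while the second label puts weight 0.25 on the first hop and 0.75 on the second, so one document gets a different embedding for each label.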
The label score is
$$s_l = \langle \mathbf{w}_l, \hat{\mathbf{x}}^{(l)} \rangle$$
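The score is one inner product per label. A sketch for all labels at once, assuming $\mathbf{w}_l$ is column $l$ of the $D \times L$ matrix $\mathbf{W}$ and that the label-wise embeddings are stacked as rows (the name `label_scores` and this layout are my own choices):

```python
import numpy as np

def label_scores(W, X_label):
    """s_l = <w_l, x^(l)> for every label l.

    W:       (D, L) array, column l is w_l
    X_label: (L, D) array, row l is x^(l)
    Returns a length-L vector of scores.
    """
    # For each l, multiply W[:, l] with X_label[l, :] elementwise and sum over d.
    return np.einsum("dl,ld->l", W, X_label)

# Toy values with D = 2 and L = 2.
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_label = np.array([[1.0, 1.0],
                    [1.0, 0.0]])
scores = label_scores(W, X_label)
```

Each label is scored against its own embedding $\hat{\mathbf{x}}^{(l)}$, so this is not a single matrix-vector product as in a plain linear classifier.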
4. Summary
Until I have read and understood the code, I simply cannot understand this paper.