当前位置:网站首页>论文笔记: 极限多标签学习 GalaXC (暂存, 还没学完)
论文笔记: 极限多标签学习 GalaXC (暂存, 还没学完)
2022-07-06 02:00:00 【闵帆】
摘要: 分享对论文的理解. 原文见 D. Saini, A. K. Jain, K. Dave, J. Jiao, A. Singh, R. Zhang and M. Varma, GalaXC: Graph neural networks with labelwise attention for extreme classification, in WWW 2021. 7 位作者中 6 位是微软研究院的人, 跟他们杠, 我觉得自己简直脑袋秀逗了.
1. 论文贡献
- 处理标签存在于文档内的情况: labels and documents cohabit the same space.
- 利用标签文本与标签相关性: label text and label correlations, label metadata.
- 标签级注意力机制: label-wise attention mechanism.
- 热启动 (部分标签已知) 时效果好: warm-start scenarios where predictions need to be made on data points with partially revealed label sets,
- 能处理几百万个标签.
- 又快又好.
2. 动机
- 已有工作表明,与使用与应用程序无关的特征(例如传统的词袋特征)相比,学习密集的特定于应用程序的文档表示可以带来更好的预测。These works have demonstrated that learning dense application-specific document representations can lead to better predictions than using application-agnostic features such as the traditional bag-of-words features.
- 5-10 个标记的短文本. 如使用标题进行相关网页或广告的预测. Short textual descriptions with typically only 5-10 tokens. Examples include applications such as predicting related webpages or related products using only the title of a given webpage/product and predicting relevant ads/keywords/searches for
user queries. - 使用多种元数据如标签文本、标签相关性、标签层次结构, 更好地服务于尾部标签. XC applications often make available label metadata in various forms such as label text, label correlations or label hierarchies.
- 标签特征. Contemporary XC algorithms have explored utilizing label features.
- 热启动与辅助数据源. Warm-start and auxiliary sources of data.
- 已有工作多数使用文档图而不是文档-标签图 (见 Table 1). existing works mostly use document-document graphs and not joint document-label graphs at extreme scales.
2. 基本符号
| 符号 | 含义 | 备注 |
|---|---|---|
| G \mathbb{G} G | 二部图 | G = ( D ∪ L , E ) \mathbb{G} = (\mathbb{D} \cup \mathbb{L}, \mathbb{E}) G=(D∪L,E) |
| D \mathbb{D} D | 文本节点集合 | 元素记作 d d d, 基数为 N N N |
| L \mathbb{L} L | 标签节点集合 | 元素记作 l l l, 基数为 L L L |
| y i \mathbf{y}_i yi | 第 i i i 个文本的真实标签向量 | 取值范围为 { − 1 , + 1 } L \{-1, +1\}^L { −1,+1}L |
| x ^ i 0 \hat{\mathbf{x}}_i^0 x^i0 | 第 i i i 个文档的特征向量 | D D D 维 |
| z ^ l 0 \hat{\mathbf{z}}_l^0 z^l0 | 第 l l l 个标签的特征向量 | D D D 维 |
| v ^ n 0 \hat{\mathbf{v}}_n^0 v^n0 | x ^ i 0 \hat{\mathbf{x}}_i^0 x^i0 与 z ^ l 0 \hat{\mathbf{z}}_l^0 z^l0 的统一表示 | D D D 维 |
| N \mathcal{N} N | 求邻居操作 | V → 2 V \mathbb{V} \to 2^\mathbb{V} V→2V |
| C \mathcal{C} C | 卷积操作 | |
| T \mathcal{T} T | 转型操作 | transformation |
| a ^ n k \hat{\mathbf{a}}_n^k a^nk | C k ( { v ^ m k − 1 , a ^ m k − 1 : m ∈ N ( n ) } ) \mathcal{C}_k(\{\hat{\mathbf{v}}_m^{k-1}, \hat{\mathbf{a}}_m^{k-1}: m \in \mathcal{N}(n)\}) Ck({ v^mk−1,a^mk−1:m∈N(n)}) | GNN 操作 |
| v ^ n k \hat{\mathbf{v}}_n^k v^nk | T k ( { v ^ n k − 1 , a ^ n k − 1 } ) \mathcal{T}_k(\{\hat{\mathbf{v}}_n^{k-1}, \hat{\mathbf{a}}_n^{k-1}\}) Tk({ v^nk−1,a^nk−1}) | GNN 操作 |
| W \mathbf{W} W | 系数矩阵 | D × L D \times L D×L 维 |
| K K K | hop 数 | |
| e l k e_{lk} elk | 标签 l l l 在第 k k k 个 hop 的标量 |
3. 方案
Graph convolution block 的具体操作是
a ^ n k = C k ( a ^ n k − 1 ) = ( 1 + ϵ k ) ⋅ a ^ n k − 1 + ∑ m ∈ N ( n ) a ^ m k − 1 \hat{\mathbf{a}}_n^k = \mathcal{C}_k(\hat{\mathbf{a}}_n^{k-1}) = (1 + \epsilon_k) \cdot \hat{\mathbf{a}}_n^{k-1} + \sum_{m \in \mathcal{N}(n)}\hat{\mathbf{a}}_m^{k-1} a^nk=Ck(a^nk−1)=(1+ϵk)⋅a^nk−1+m∈N(n)∑a^mk−1
Embedding 的具体操作是
v ^ n k = T k ( a ^ n k ) \hat{\mathbf{v}}_n^k = \mathcal{T}_k(\hat{\mathbf{a}}_n^k) v^nk=Tk(a^nk)
令
α l k = exp ( e l k ) / ∑ k ′ ∈ [ K ] exp e l k ′ \alpha_{lk} = \exp(e_{lk}) / \sum_{k' \in [K]} \exp e_{lk'} αlk=exp(elk)/k′∈[K]∑expelk′
它表示第 k k k 个 hop 时的占比.
标签嵌入计算式为
x ^ ( l ) = ∑ k ∈ [ k ] α l k ⋅ x ^ k \hat{\mathbf{x}}^{(l)} = \sum_{k \in [k]} \alpha_{lk} \cdot \hat{\mathbf{x}}^{k} x^(l)=k∈[k]∑αlk⋅x^k
注意: 这里的 k k k 次方还未理解.
标签得分为
s l = * w l , x ^ ( l ) * s_l = \langle \mathbf{w}_l, \hat{\mathbf{x}}^{(l)} \rangle sl=*wl,x^(l)*
4. 小结
在读懂程序之前, 根本无法理解这篇论文.
边栏推荐
- [solution] every time idea starts, it will build project
- Pangolin Library: subgraph
- Flutter Doctor:Xcode 安装不完整
- A basic lintcode MySQL database problem
- Tensorflow customize the whole training process
- 一题多解,ASP.NET Core应用启动初始化的N种方案[上篇]
- Apicloud openframe realizes the transfer and return of parameters to the previous page - basic improvement
- Virtual machine network, networking settings, interconnection with host computer, network configuration
- Redis list
- RDD conversion operator of spark
猜你喜欢

Blue Bridge Cup embedded_ STM32 learning_ Key_ Explain in detail

leetcode-两数之和

2 power view

Basic operations of databases and tables ----- unique constraints

Computer graduation design PHP campus restaurant online ordering system

Using SA token to solve websocket handshake authentication

Alibaba canal usage details (pit draining version)_ MySQL and ES data synchronization

Exness: Mercedes Benz's profits exceed expectations, and it is predicted that there will be a supply chain shortage in 2022

TrueType字体文件提取关键信息
![NLP fourth paradigm: overview of prompt [pre train, prompt, predict] [Liu Pengfei]](/img/11/a01348dbfcae2042ec9f3e40065f3a.png)
NLP fourth paradigm: overview of prompt [pre train, prompt, predict] [Liu Pengfei]
随机推荐
安装Redis
Ali test Open face test
NiO related knowledge (II)
【网络攻防实训习题】
Folio. Ink is a free, fast and easy-to-use image sharing tool
【clickhouse】ClickHouse Practice in EOI
【Flask】获取请求信息、重定向、错误处理
NLP第四范式:Prompt概述【Pre-train,Prompt(提示),Predict】【刘鹏飞】
PHP campus financial management system for computer graduation design
Know MySQL database
Have a look at this generation
Initialize MySQL database when docker container starts
How to upgrade kubernetes in place
leetcode-两数之和
Card 4G industrial router charging pile intelligent cabinet private network video monitoring 4G to Ethernet to WiFi wired network speed test software and hardware customization
NLP fourth paradigm: overview of prompt [pre train, prompt, predict] [Liu Pengfei]
Leetcode3. Implement strstr()
Accelerating spark data access with alluxio in kubernetes
Paddle framework: paddlenlp overview [propeller natural language processing development library]
[Jiudu OJ 09] two points to find student information