爱可可 AI Frontier Picks (6.22)
2022-06-22 10:40:00 【智源社区】
LG - Machine Learning · CV - Computer Vision · CL - Computation and Language · AS - Audio and Speech · RO - Robotics
Reposted from 爱可可爱生活
In this issue: offline reinforcement learning with Bootstrapped Transformer; a general neural audio upsampling model for various sampling rates; Gaussian blue noise; lossy compression with Gaussian diffusion; pretraining and distilling multi-billion-parameter encoders for natural language understanding systems; generator-classifiers for supervised language modeling; neural architecture inductive biases for abstract relational tasks; spatially-adaptive multilayer selection for GAN inversion and editing; prefix language models as unified modal learners.
1、[LG] Bootstrapped Transformer for Offline Reinforcement Learning
K Wang, H Zhao, X Luo, K Ren, W Zhang, D Li
[Shanghai Jiao Tong University & Microsoft Research Asia]
Offline reinforcement learning (RL) aims at learning policies from previously collected static trajectory data without interacting with the real environment. Recent works provide a novel perspective by viewing offline RL as a generic sequence generation problem, adopting sequence models such as the Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. However, the training datasets used in general offline RL tasks are quite limited and often suffer from insufficient distribution coverage, which can be harmful to training sequence generation models yet has not drawn enough attention in previous works. In this paper, we propose a novel algorithm named Bootstrapped Transformer, which incorporates the idea of bootstrapping and leverages the learned model to self-generate more offline data to further boost sequence model training. We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing offline RL training limitations and beat other strong baseline methods. We also analyze the generated pseudo data, and the revealed characteristics may shed some light on offline RL training. The code is available at https://seqml.github.io/bootorl.
https://arxiv.org/abs/2206.08569
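The bootstrapping loop behind the paper is straightforward to sketch: train a trajectory-level sequence model on the offline data, let it generate extra trajectories, and fold the confidently modeled ones back into the training set. A minimal Python sketch of that loop, assuming a model object with hypothetical fit/generate/log_likelihood methods (not the authors' released code):

```python
import random

def bootstrap_train(model, dataset, n_rounds, n_pseudo, confidence_threshold):
    """Sketch of bootstrapped sequence-model training for offline RL.

    `model` is any trajectory-level sequence model exposing fit(),
    generate() and log_likelihood(); these names are placeholders,
    not the paper's actual API.
    """
    for _ in range(n_rounds):
        model.fit(dataset)                      # train on the current data
        pseudo = []
        while len(pseudo) < n_pseudo:
            seed = random.choice(dataset)       # condition on a real prefix
            traj = model.generate(prefix=seed[: len(seed) // 2])
            # keep only confidently modeled trajectories to limit noise
            if model.log_likelihood(traj) > confidence_threshold:
                pseudo.append(traj)
        dataset = dataset + pseudo              # self-generated data augments training
    return model
```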
2、[AS] NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates
S Han, J Lee
[MINDsLab Inc]
Conventionally, audio super-resolution models fix the initial and target sampling rates, which necessitates training a model for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that enables the generation of 48 kHz audio signals from inputs of various sampling rates with a single model. Based on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics to resolve the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition on the bandwidths of inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the sampling rate of the input while requiring fewer parameters than other models. The official code and audio samples are available at https://mindslab-ai.github.io/nuwave2.
https://arxiv.org/abs/2206.08545
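The abstract only names BSFT, but SFT-style conditioning layers generally learn a per-location scale and shift from the conditioning signal. A minimal PyTorch sketch of a bandwidth-conditioned feature transform, assuming a binary bandwidth mask over frequency bins; the layer shape and names are guesses, not NU-Wave 2's official implementation:

```python
import torch
import torch.nn as nn

class BandwidthSFT(nn.Module):
    """Toy bandwidth spectral feature transform: modulate spectral
    features with a scale/shift predicted from a bandwidth mask.
    A generic SFT-style layer, not the official NU-Wave 2 code."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs map the binary bandwidth mask to per-bin gamma/beta
        self.to_gamma = nn.Conv1d(1, channels, kernel_size=1)
        self.to_beta = nn.Conv1d(1, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, band_mask: torch.Tensor):
        # feats: (batch, channels, freq_bins); band_mask: (batch, 1, freq_bins),
        # 1 where the input signal has energy, 0 above its Nyquist cutoff
        return self.to_gamma(band_mask) * feats + self.to_beta(band_mask)

x = torch.randn(2, 32, 513)             # fake spectral features
mask = torch.zeros(2, 1, 513)
mask[:, :, :256] = 1.0                   # lower half of the spectrum present
print(BandwidthSFT(32)(x, mask).shape)   # torch.Size([2, 32, 513])
```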
3、[LG] Gaussian Blue Noise
A G. M. Ahmed, J Ren, P Wonka
[KAUST]
Among the various approaches for producing point distributions with a blue noise spectrum, we argue for an optimization framework using Gaussian kernels. We show that with a wise selection of optimization parameters, this approach attains unprecedented quality, provably surpassing the current state of the art attained by the optimal transport (BNOT) approach. Further, we show that our algorithm scales smoothly and feasibly to high dimensions while maintaining the same quality, realizing unprecedented high-quality high-dimensional blue noise sets. Finally, we show an extension to adaptive sampling.
https://arxiv.org/abs/2206.07798
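The optimization view is easy to make concrete: treat each point as a particle repelled through a Gaussian kernel and run gradient descent on the summed pairwise energy. A toy NumPy sketch on the unit torus; the kernel width and step size are made-up heuristics, not the paper's tuned parameters:

```python
import numpy as np

def gaussian_blue_noise(n=256, dim=2, sigma=None, steps=500, lr=0.002, seed=0):
    """Relax random points by gradient descent on a Gaussian-kernel
    pairwise energy; a toy version of the optimization in the paper."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, dim))
    if sigma is None:
        sigma = 0.5 * n ** (-1.0 / dim)     # heuristic: ~half the mean spacing
    for _ in range(steps):
        diff = pts[:, None, :] - pts[None, :, :]
        diff -= np.round(diff)              # toroidal wrap-around
        d2 = (diff ** 2).sum(-1)
        w = np.exp(-d2 / (2 * sigma ** 2))
        np.fill_diagonal(w, 0.0)            # no self-interaction
        # gradient of sum_ij exp(-|xi-xj|^2 / (2 sigma^2)) w.r.t. xi
        grad = -(w[:, :, None] * diff).sum(1) / sigma ** 2
        pts = (pts - lr * grad) % 1.0       # descend, stay on the torus
    return pts

print(gaussian_blue_noise(64).shape)  # (64, 2)
```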
4、[LG] Lossy Compression with Gaussian Diffusion
L Theis, T Salimans, M D. Hoffman, F Mentzer
[Google Research]
We describe a novel lossy compression approach called DiffC, which is based on unconditional diffusion generative models. Unlike modern compression schemes that rely on transform coding and quantization to restrict the transmitted information, DiffC relies on the efficient communication of pixels corrupted by Gaussian noise. We implement a proof of concept and find that it works surprisingly well despite the lack of an encoder transform, outperforming the state-of-the-art generative compression method HiFiC on ImageNet 64x64. DiffC uses only a single model to encode and denoise corrupted pixels at arbitrary bitrates. The approach further supports progressive coding, that is, decoding from partial bit streams. We perform a rate-distortion analysis to gain a deeper understanding of its performance, providing analytical results for multivariate Gaussian data as well as initial results for general distributions. Furthermore, we show that a flow-based reconstruction achieves a 3 dB gain over ancestral sampling at high bitrates.
https://arxiv.org/abs/2206.08889
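DiffC's channel view is clearest in the analytically tractable Gaussian case: transmit pixels corrupted by Gaussian noise, then denoise them with a model of the source. For a Gaussian source the optimal denoiser is the closed-form posterior mean, which the method replaces with a learned diffusion model. A toy 1-D illustration in NumPy (not the actual codec, which additionally needs an efficient coding scheme for the noisy values):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_x, sigma_n = 1.0, 0.5               # source and channel-noise std (made up)
x = rng.normal(0, sigma_x, 10_000)        # "pixels" drawn from a Gaussian source

y = x + rng.normal(0, sigma_n, x.shape)   # communicate noise-corrupted pixels

# With a Gaussian source the optimal denoiser is the posterior mean;
# DiffC replaces this closed form with a learned diffusion model.
x_hat = (sigma_x**2 / (sigma_x**2 + sigma_n**2)) * y

mse_noisy = np.mean((y - x) ** 2)
mse_denoised = np.mean((x_hat - x) ** 2)
print(f"raw channel MSE {mse_noisy:.3f} -> denoised MSE {mse_denoised:.3f}")
```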
5、[CL] Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems
J FitzGerald, S Ananthakrishnan, K Arkoudas, D Bernardi, A Bhagia...
[Amazon]
We present results from a large-scale experiment on pretraining encoders with non-embedding parameter counts ranging from 700M to 9.3B, their subsequent distillation into smaller models with 17M-170M parameters, and their application to the Natural Language Understanding (NLU) component of a virtual assistant system. Though we train using 70% spoken-form data, our teacher models perform comparably to XLM-R and mT5 when evaluated on the written-form Cross-lingual Natural Language Inference (XNLI) corpus. We perform a second stage of pretraining on our teacher models using in-domain data from our system, improving error rates by 3.86% relative for intent classification and 7.01% relative for slot filling. We find that even a 170M-parameter model distilled from our Stage 2 teacher model has 2.88% better intent classification and 7.69% better slot filling error rates than the 2.3B-parameter teacher trained only on public data (Stage 1), emphasizing the importance of in-domain data for pretraining. When evaluated offline using labeled NLU data, our 17M-parameter Stage 2 distilled model outperforms both XLM-R Base (85M params) and DistilBERT (42M params) by 4.23% to 6.14%, respectively. Finally, we present results from a full virtual assistant experimentation platform, where we find that models trained using our pretraining and distillation pipeline outperform models distilled from 85M-parameter teachers by 3.74%-4.91% on an automatic measure of full-system user dissatisfaction.
https://arxiv.org/abs/2206.07808
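The distillation stage in pipelines like this typically trains the student on temperature-softened teacher logits blended with the ordinary supervised loss. A generic PyTorch sketch of that objective (standard knowledge distillation, not Amazon's internal recipe; alpha and temperature are illustrative defaults):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Standard soft-target distillation: KL to the teacher's softened
    distribution blended with cross-entropy on gold labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature**2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

s = torch.randn(8, 10)                   # student logits (batch, classes)
t = torch.randn(8, 10)                   # teacher logits
y = torch.randint(0, 10, (8,))           # gold intent labels
print(distillation_loss(s, t, y))
```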
Several more papers worth noting:
[CL] DIRECTOR: Generator-Classifiers For Supervised Language Modeling
K Arora, K Shuster, S Sukhbaatar, J Weston
[Meta AI]
https://arxiv.org/abs/2206.07694
[LG] On Neural Architecture Inductive Biases for Relational Tasks
G Kerg, S Mittal, D Rolnick, Y Bengio, B Richards, G Lajoie
[Mila, Quebec AI Institute]
https://arxiv.org/abs/2206.05056
[CV] Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing
G Parmar, Y Li, J Lu, R Zhang, J Zhu, K K Singh
[CMU & Adobe Research]
https://arxiv.org/abs/2206.0835
[CV] Prefix Language Models are Unified Modal Learners
S Diao, W Zhou, X Zhang, J Wang
[The Hong Kong University of Science and Technology & ByteDance AI Lab & Shanghai Jiao Tong University]
https://arxiv.org/abs/2206.07699