A curated collection of selected papers, systems, and applications on Mixture-of-Experts (MoE) models
2022-07-04 21:44:00 【lqfarmer】

Sparsity means that a model has a very large capacity, but only some parts of it are activated for a given task, sample, or token. This can significantly increase the model's capacity and capability without a proportional increase in computation.
In 2017, Google introduced the Sparsely-Gated Mixture-of-Experts (MoE) layer. This layer achieved better results on several translation benchmarks while using about 10 times less computation than the previous state-of-the-art LSTM models.
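The idea of activating only part of the model per input can be illustrated with a minimal top-k gating sketch. This is not the implementation from the 2017 work: the function names (`top_k_gate`, `moe_forward`, `make_expert`), the toy "experts" that merely scale their input, and the random gate weights are all assumptions made for illustration, and details such as gating noise and load balancing are left out.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Keep the k largest gate logits and renormalize them with softmax.

    Returns a dict {expert_index: weight}; the weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def moe_forward(x, gate_weights, experts, k=2):
    """Route input x to k experts and mix their outputs by gate weight."""
    # One gate logit per expert, computed as a linear function of the input.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gate(logits, k)
    out = [0.0] * len(x)
    for idx, g in gates.items():
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, sorted(gates)


def make_expert(scale):
    """Hypothetical toy expert: multiplies every input component by `scale`."""
    return lambda x: [scale * xi for xi in x]


experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
rng = random.Random(0)  # fixed seed so the toy gate weights are reproducible
gate_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in experts]

output, used = moe_forward([0.5, -1.0, 2.0], gate_weights, experts, k=2)
print("experts used:", used)
print("output:", output)
```

With four experts and k=2, only half of the experts run for each input, so the parameter count can grow with the number of experts while the per-input computation stays roughly fixed.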
This resource collects papers on Mixture-of-Experts (MoE) models from recent years and classifies them in detail. Bookmark the repository to keep up with the latest developments in this fast-growing research field.
The resources were compiled from the Internet; see the source repository to download them: https://github.com/codecaution/Awesome-Mixture-of-Experts-Papers#awesome-mixture-of-experts-papers