The most comprehensive curated list ever of Mixture-of-Experts (MoE) papers, systems, and applications
2022-07-04 21:44:00 【lqfarmer】

Sparsity means that a model has a very large capacity, but only some parts of it are activated for a given task, sample, or token. This can greatly increase model capacity without a proportional increase in computation.
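To make the capacity-versus-compute trade-off concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that each expert is a bias-free two-layer feed-forward block and that every token is routed to `top_k` experts; the function name `moe_param_counts` and the sizes used are hypothetical, not taken from any particular paper.

```python
def moe_param_counts(num_experts: int, top_k: int, d_model: int, d_ff: int) -> tuple:
    """Return (total, active_per_token) weight counts for the expert FFNs.

    Assumption: each expert is a bias-free FFN (d_model -> d_ff -> d_model),
    so it holds 2 * d_model * d_ff weights.
    """
    per_expert = 2 * d_model * d_ff
    total = num_experts * per_expert   # what the layer stores (capacity)
    active = top_k * per_expert        # what one token actually uses (compute)
    return total, active


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    total, active = moe_param_counts(num_experts=64, top_k=2, d_model=1024, d_ff=4096)
    print(f"total expert params: {total:,}")
    print(f"active per token:    {active:,}")
    print(f"capacity / compute:  {total / active:.0f}x")
```

With these made-up sizes, the layer stores 32 times more expert parameters than any single token uses, which is the sense in which capacity grows without a proportional increase in computation.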
In 2017, Google introduced the Sparsely-Gated Mixture-of-Experts layer (MoE), which achieved better results on various translation benchmarks while using 10 times less computation than the previous state-of-the-art dense LSTM models.
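The core idea of the sparsely-gated layer, routing each input to only a few experts chosen by a learned gate, can be sketched in plain Python as below. This is a simplified illustration, not the paper's implementation: the gate here is a plain linear layer followed by top-k selection and a softmax over the kept logits, while the original design also adds trainable noise to the gate logits and an auxiliary load-balancing loss, both omitted. The toy "scaling" experts and all helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def matvec(x: Sequence[float], w: Sequence[Sequence[float]]) -> Vector:
    """Compute x @ w, where w has shape (len(x), n_out)."""
    n_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n_out)]


def top_k_gates(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k largest logits and softmax over them only.

    Equivalent to setting the other logits to -inf before a softmax,
    so every non-selected expert gets an exact zero gate.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [(i, exps[i] / z) for i in top]


def moe_forward(
    x: Sequence[float],
    w_gate: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Sequence[float]], Vector]],
    k: int,
) -> Tuple[Vector, List[int]]:
    """Sum of gate_i * expert_i(x) over the top-k experts only.

    Returns the output vector and the indices of the experts that were run.
    """
    gates = top_k_gates(matvec(x, w_gate), k)
    out: Vector = [0.0] * len(x)
    for i, g in gates:
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, [i for i, _ in gates]


def make_scaling_expert(scale: float) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical toy expert: multiplies its input by a constant."""
    return lambda v: [scale * t for t in v]


if __name__ == "__main__":
    experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    # Gate weights of shape (d_model=2, num_experts=4); values are made up.
    w_gate = [[0.1, 0.9, 0.0, 0.5],
              [0.2, 0.8, 0.0, 0.4]]
    y, used = moe_forward([1.0, 1.0], w_gate, experts, k=2)
    print(used, y)
```

Only experts 1 and 3 are evaluated for this input; experts 0 and 2 receive a zero gate and are skipped, so the cost of the forward pass depends on `k` rather than on the total number of experts.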
This resource collects recent papers on Mixture-of-Experts (MoE) models and classifies them in detail. Bookmark this knowledge base to keep up with the latest developments in this fast-growing research area.
The resources were collected from the Internet; see the source repository to download them: https://github.com/codecaution/Awesome-Mixture-of-Experts-Papers#awesome-mixture-of-experts-papers
Contents

[Screenshot of the contents]

Recommended past posts
A detailed guide to reproducing baseline papers in practice (NLP)
Some advice for current and future PhD students
The most complete curated resources on deep reinforcement learning in 2021
Federated learning: a machine learning architecture based on distributed private data