A curated collection of selected papers, systems, and applications on Mixture-of-Experts (MoE) models
2022-07-04 21:44:00 【lqfarmer】

Sparsity means that a model has a very large capacity, but only some parts of it are activated for a given task, sample, or token. This allows the capacity and capability of the model to grow significantly without a proportional increase in computation.
In 2017, Google introduced the Sparsely-Gated Mixture-of-Experts Layer (MoE). This layer achieved better results on various translation benchmarks while using 10 times less computation than the previous state-of-the-art dense LSTM models.
This resource collects Mixture-of-Experts (MoE) papers from recent years and classifies them in detail. Bookmark the repository to keep up with the latest developments in this fast-growing research field.
The resources are compiled from the Internet. See the source address to download them: https://github.com/codecaution/Awesome-Mixture-of-Experts-Papers#awesome-mixture-of-experts-papers
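To make the idea of sparse activation concrete, here is a minimal sketch of top-k gating in plain Python. It is an illustration under simplifying assumptions, not the implementation from the 2017 paper: it leaves out the noise term and the load-balancing losses, and the names `top_k_gate`, `moe_forward`, and `make_expert` are hypothetical helpers, with toy experts that only scale their input.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    """Keep the k largest gate logits and renormalize them with softmax.

    Returns {expert_index: weight}; every other expert has weight 0.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def moe_forward(x, gate_weights, experts, k=2):
    """Sparse MoE forward pass for one input vector x (a list of floats).

    gate_weights holds one weight vector per expert (a linear gate);
    experts is a list of callables mapping a vector to a vector.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gate(logits, k)
    out = [0.0] * len(x)
    for idx, g in gates.items():
        y = experts[idx](x)  # only the k selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, gates


def make_expert(scale):
    """Toy expert (assumption for illustration): scales its input."""
    return lambda x: [scale * v for v in x]


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(s) for s in range(1, num_experts + 1)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 2.0, 0.1]
y, gates = moe_forward(x, gate_weights, experts, k=2)
print("active experts:", sorted(gates))
print("gate weights sum to:", sum(gates.values()))
```

With 8 experts and `k=2`, the layer holds the parameters of all 8 experts but runs only 2 of them per input, which is why capacity can grow without a proportional increase in computation.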
Table of contents

Content screenshot
