442 authors, a 100-page paper! Google takes 2 years to release BIG-bench, a new benchmark for large models | Open source
2022-06-12 12:50:00 【QbitAl】
Bai Jiao, reporting from Aofeisi
QbitAI | WeChat official account QbitAI
One AI paper, 442 authors.
It even has a chapter devoted to author contributions.
Of its 100 pages, more than half are references...
Wait, is this kind of paper in fashion now?
Take a look at Google's latest paper: Beyond the Imitation Game: Quantifying and Extrapolating the Capabilities of Language Models.
With that many authors, the author list ends up looking like this...

Researchers from 132 institutions spent two years proposing BIG-bench, a new benchmark for large language models.
Using it, they evaluated OpenAI's GPT models, Google-internal dense transformer architectures, and others, with model sizes spanning six orders of magnitude.
The results show that although model performance improves as scale grows, it still falls far short of human performance.
Jeff Dean retweeted and liked the work, commenting: "Great Work."

A new benchmark for large language models
Let's take a look at what the paper actually says.
As scale grows, model performance and quality improve, and there may also be transformative effects, but these capabilities have not been well characterized so far.
Existing benchmarks have limitations: their evaluation scope is narrow, and performance scores saturate quickly.
SuperGLUE is one example: within 18 months of the benchmark's release, models had achieved "superhuman" performance.

Against this background, BIG-bench was born.
It currently consists of 204 tasks covering linguistics, childhood development, mathematics, common-sense reasoning, biology, physics, social bias, software development, and more.

A panel of human experts also performed all the tasks to provide a baseline.
To make the benchmark easier for more institutions to use, the researchers also provide BIG-bench Lite, a small but representative subset of tasks that allows faster evaluation.

They also open-sourced the code implementing the benchmark API, which supports evaluating tasks on publicly available models as well as lightweight creation of new tasks.
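To make "lightweight creation of new tasks" more concrete, here is a minimal sketch of what a small question-answer task and an exact-match scorer could look like. It does not use the BIG-bench library: the field names (`name`, `description`, `examples`, `input`, `target`) and the helpers `exact_match_score` and `dummy_model` are assumptions made up for illustration, so consult the GitHub repository for the actual task format and API.

```python
from typing import Callable, Dict, List

# Hypothetical toy task. The field names are illustrative assumptions,
# not a confirmed BIG-bench schema.
toy_task = {
    "name": "toy_arithmetic",
    "description": "Answer simple addition questions.",
    "examples": [
        {"input": "2 + 2 =", "target": "4"},
        {"input": "3 + 5 =", "target": "8"},
        {"input": "10 + 7 =", "target": "17"},
    ],
}


def exact_match_score(task: Dict, model: Callable[[str], str]) -> float:
    """Return the fraction of examples where the model output equals the target."""
    examples: List[Dict[str, str]] = task["examples"]
    if not examples:
        return 0.0
    correct = sum(
        1 for ex in examples if model(ex["input"]).strip() == ex["target"].strip()
    )
    return correct / len(examples)


def dummy_model(prompt: str) -> str:
    """Stand-in for a language model: parses 'a + b =' and adds the numbers."""
    left, right = prompt.rstrip("= ").split("+")
    return str(int(left) + int(right))


if __name__ == "__main__":
    print(exact_match_score(toy_task, dummy_model))
```

Any callable that maps a prompt string to an answer string can be passed as `model`, so the same scorer works for a real model wrapper or a trivial baseline.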
The final evaluation shows that, across six orders of magnitude of scale, overall performance on BIG-bench improves as model size and the number of training examples increase.
Compared with the human baseline, however, the models still perform poorly.

On some tasks, model performance improves steadily as scale increases. On others, a sudden breakthrough appears at a specific scale.

The benchmark can also assess social bias in models.

The researchers were also surprised to find that models can pick up some hidden skills, such as how to make legal moves in chess.

Author contributions run 14 pages
Notably, perhaps because there are so many authors, the paper ends with a chapter devoted to author contributions.
It runs a full 14 pages, covering core contributors, reviewers, task authors...

Beyond that, there are another 50 pages of references.
That's all; interested readers can follow the links below to read the paper.
Paper link:
https://arxiv.org/abs/2206.04615
GitHub link:
https://github.com/google/BIG-bench
Reference link:
https://twitter.com/jaschasd/status/1535055886913220608