AI scores 81 on advanced mathematics. Netizens: even AI models can't escape "involution"!
2022-07-03 13:21:00 【CSDN News】

Compiled by | Hemumu
Produced by | AI Technology Base (ID: rgznai100)
Advanced mathematics is a nightmare for many science and engineering students, and this editor was no exception back in the day.
So how hard is it to get AI to solve math problems, let alone advanced math?
Not long ago, we came across this trending topic on Weibo:

Isn't that even harder to accept?!
For years, scientists have tried to get AI to pass math exams, but the attempts kept failing, with scores sometimes as low as the 20s. As a result, it was widely believed that AI could not handle advanced mathematics. Recently, however, MIT researchers used few-shot learning with the OpenAI Codex pre-trained model to reach 81% accuracy on university-level math problems. The work has been posted on arXiv. The courses range from introductory calculus to differential equations, probability theory and linear algebra, and besides calculations, the questions even include plotting tasks.

The Minerva Language Model
The researchers found that there are several ways to get AI to solve math problems.
First, the latest GPT-3 language model on its own reached only 18.8% accuracy. Next, the researchers tried few-shot learning combined with the newer chain-of-thought prompting, which raised accuracy to 30.8%. Finally, they turned to code: using few-shot learning with Codex, a model fine-tuned on code, they had the AI tackle 210 questions from six MIT math courses, and accuracy rose to 81.1%.
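To make "few-shot learning with chain-of-thought prompting" concrete, here is a minimal sketch of how such a prompt can be assembled: a handful of worked examples, each with step-by-step reasoning, are placed before the new question so the model imitates the pattern. The example problems, the `Q:`/`A:` layout and the names `WorkedExample` and `build_few_shot_prompt` are illustrative assumptions, not the prompts used in the MIT study.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    """One solved problem shown to the model as a demonstration (hypothetical format)."""
    question: str
    reasoning: str  # the step-by-step "chain of thought"
    answer: str


def build_few_shot_prompt(examples: list[WorkedExample], question: str) -> str:
    """Concatenate worked examples, then the new question, in an assumed Q/A layout."""
    parts = []
    for ex in examples:
        parts.append(f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}.")
    # Leave the last answer open so the model continues with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


examples = [
    WorkedExample(
        question="What is the derivative of x^3?",
        reasoning="By the power rule, d/dx x^n = n x^(n-1), so d/dx x^3 = 3x^2.",
        answer="3x^2",
    ),
    WorkedExample(
        question="Solve 2x + 6 = 10.",
        reasoning="Subtract 6 from both sides to get 2x = 4, then divide by 2.",
        answer="x = 2",
    ),
]

prompt = build_few_shot_prompt(examples, "What is the integral of 2x dx?")
print(prompt)
```

The string returned here is what would be sent to the language model; the model's continuation after the final `A:` is its reasoning and answer.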
The research team's approach uses a neural network pre-trained on text and fine-tuned on code. It transforms each math question into an equivalent programming task, has the AI automatically generate any supplementary context the question needs, produces a program suited to the model, and then runs the generated code to solve the original problem. Next, the team plans to extend the technique to more courses and to explore practical applications in teaching.
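The key idea of this pipeline is that the model writes a program and the program, not the model, computes the answer. A minimal sketch of the final step follows, assuming the model has already returned its code as a string. Here `generated_code` is hard-coded as a stand-in for real model output, and the convention that the program stores its result in a variable named `answer`, as well as the helper name `run_generated_program`, are our own assumptions.

```python
# Stand-in for code a model might generate for the question
# "What is the probability that two fair six-sided dice sum to 7?"
generated_code = """
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [roll for roll in outcomes if sum(roll) == 7]
answer = Fraction(len(favorable), len(outcomes))
"""


def run_generated_program(code: str) -> object:
    # Execute the program in a fresh namespace; by our assumed convention
    # the program must bind its final result to a variable named `answer`.
    namespace: dict[str, object] = {}
    exec(code, namespace)
    if "answer" not in namespace:
        raise ValueError("generated program did not define 'answer'")
    return namespace["answer"]


result = run_generated_program(generated_code)
print(result)  # 1/6
```

Using exact `Fraction` arithmetic avoids floating-point rounding in the answer. Note that `exec` provides no isolation, so a real system would run model-written code in a sandbox.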
In a separate paper, Google Research introduced Minerva, a language model that solves mathematical and scientific problems using step-by-step reasoning. By collecting training data relevant to quantitative reasoning problems, training models at scale and using advanced inference techniques, this work achieved significant performance gains on a variety of difficult quantitative reasoning tasks.
Minerva solves problems by generating solutions that include numerical calculations and symbolic manipulation, without relying on external tools such as a calculator. It combines natural language and mathematical notation to analyze and answer math questions.
In addition, Minerva combines several techniques, including few-shot prompting, chain-of-thought prompting, scratchpad prompting and majority voting, to achieve state-of-the-art (SOTA) performance on STEM reasoning tasks.
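Majority voting is simple to illustrate: sample several independent solutions to the same question, extract each final answer, and return the most common one. The sketch below assumes the final answers have already been extracted as strings; the `normalize` function is a deliberately crude, hypothetical stand-in for the answer normalization a real system would need (for example, treating `0.5` and `1/2` as equal).

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Crude, hypothetical normalization: drop spaces and lowercase."""
    return answer.strip().replace(" ", "").lower()


def majority_vote(sampled_answers: list[str]) -> str:
    """Return the most frequent normalized answer among the samples.

    Ties go to the answer seen first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in sampled_answers)
    winner, _ = counts.most_common(1)[0]
    return winner


samples = ["1/2", "1/3", " 1/2", "1/2 ", "2/3"]
print(majority_vote(samples))  # 1/2
```

Voting only over final answers means that different reasoning paths reaching the same result reinforce each other, which is why it tends to filter out one-off calculation slips.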
Minerva can solve not only algebra problems but also problems in physics, number theory, geometry, biology, chemistry, astronomy and many other subjects.

Here is Minerva solving a geometry problem:

For word problems, it can set up equations:

It can even carry out derivations and proofs.
To test Minerva's quantitative reasoning ability, the researchers evaluated it on a range of STEM benchmarks, with difficulty spanning grade-school problems to graduate-level coursework. They also evaluated Minerva on OCWCourses, a set of STEM problems collected from MIT OpenCourseWare covering solid-state chemistry, astronomy, differential equations and special relativity.
The results show that the 540-billion-parameter Minerva achieved SOTA results on every evaluation dataset, sometimes by a wide margin.
However, Minerva still makes plenty of mistakes.
To better identify where the model could be improved, the researchers analyzed a sample of questions the model got wrong and found that most of the errors are easy to interpret. About half were calculation errors; the other half were reasoning errors, where the solution steps did not follow a logical chain of thought.
Minerva can also arrive at the correct final answer through faulty reasoning. The analysis shows that this is relatively rare: for Minerva 62B on the MATH dataset, such false positives occurred in fewer than 8% of cases on average.

Conclusion
AI is not only advancing within the tech world but also showing its strength in other fields: earlier, AI wrote 40 college entrance exam essays in 40 seconds, and AI has been used to restore many precious photos and images.
Students look forward to the day they can use AI to do their homework, and teachers, too, hope to use AI to write papers.
Some netizens even said they want to challenge it.
What do you think?
Reference links:
https://s.weibo.com/weibo/%2523AI%25E8%2580%2583%25E9%25AB%2598%25E6%2595%25B0%25E4%25BB%2585%25E5%25BE%259781%25E5%2588%2586%2523?topnav=1&wvr=6&Refer=top_hot&sudaref=weibo.com
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html