AI scores 81 in advanced mathematics. Netizens: even AI models can't escape "involution"!
2022-07-03 13:21:00 【CSDN information】
Compiled by | Hemu wood
Produced by | AI Technology Base (ID: rgznai100)
Is advanced mathematics a nightmare for many science students? This editor was certainly one of those who struggled with it.
So how hard is it for an AI to solve a math problem, let alone an advanced mathematics problem?
Not long ago, this topic appeared in the trending searches:
Does that make it even harder to accept?!
For years, scientists have tried to get AI to take math exams, but it kept failing, sometimes scoring only a little over 20 points. As a result, it was widely believed that AI could not handle advanced mathematics. Recently, however, researchers at MIT used few-shot learning with the pre-trained OpenAI Codex model to reach 81% accuracy on advanced mathematics problems. The research has been posted on arXiv. The courses range from elementary calculus to differential equations, probability theory, and linear algebra, and the questions include not only calculations but even plotting.
Language model Minerva
The researchers found that there are several ways to get AI to solve math problems.
First, the latest GPT-3 language model on its own achieves only 18.8% accuracy. Second, the researchers tried few-shot learning together with the latest chain-of-thought prompting, which raised accuracy to 30.8%. Finally, using few-shot learning with Codex, a model fine-tuned on code, the AI took on 210 questions from six MIT math courses and accuracy rose to 81.1%.
The research team's approach uses a model that is pre-trained on text and then fine-tuned on code. Each math problem is rephrased as an equivalent problem: the AI automatically generates supplementary context to produce text the model can work with, then generates the corresponding code and runs it to solve the problem. The team's next step is to extend the technique to more courses and to consider practical applications in teaching.
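The generate-then-run step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the researchers' code: `build_prompt`, `run_generated_program`, `EXAMPLES`, and the hard-coded `GENERATED` string (a stand-in for what a code model might return) are all hypothetical, and no real model API is called.

```python
import contextlib
import io


def build_prompt(examples, question):
    """Assemble a few-shot prompt from (question, program) example pairs."""
    parts = []
    for example_question, example_program in examples:
        parts.append(f"# Question: {example_question}\n{example_program}\n")
    # The new question goes last so the model continues with a program for it.
    parts.append(f"# Question: {question}\n")
    return "\n".join(parts)


def run_generated_program(source):
    """Execute generated Python source and return whatever it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # WARNING: exec on untrusted model output is unsafe outside a sandbox.
        exec(source, {"__name__": "__generated__"})
    return buffer.getvalue().strip()


# Hypothetical stand-in for a program a code model might return for:
# "What is the slope of f(x) = x^3 at x = 2?"
GENERATED = """
def f(x):
    return x ** 3

h = 1e-6
slope = (f(2 + h) - f(2 - h)) / (2 * h)  # central difference
print(round(slope, 3))
"""

EXAMPLES = [("Compute 2 + 3.", "print(2 + 3)")]
prompt = build_prompt(EXAMPLES, "What is the slope of f(x) = x^3 at x = 2?")
answer = run_generated_program(GENERATED)
print(answer)  # 12.0
```

In a real setup the prompt would be sent to a code model and its reply would take the place of `GENERATED`; the execution step should run in an isolated sandbox.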
Separately, according to the Google AI Blog post listed in the reference links, researchers have introduced a language model called Minerva, which solves mathematical and scientific problems by reasoning step by step. By collecting training data relevant to quantitative reasoning problems, training models at scale, and using advanced inference techniques, this work achieves significant performance gains on a variety of difficult quantitative reasoning tasks.
Minerva solves problems by generating solutions that include numerical calculation and symbolic manipulation, without relying on external tools such as a calculator. It can parse and answer mathematical questions using a mix of natural language and mathematical notation.
In addition, Minerva combines several techniques, including few-shot prompting, chain-of-thought or scratchpad prompting, and majority voting, to achieve SOTA performance on STEM reasoning tasks.
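Majority voting, one of the techniques listed above, can be illustrated with a small sketch: sample several solutions to the same question, extract each final answer, and keep the most frequent one. The function name `majority_vote` and the sample answers are hypothetical, and the whitespace stripping here is a simplifying assumption rather than Minerva's actual answer normalization.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common final answer among several sampled solutions."""
    # Strip surrounding whitespace and drop empty answers before counting.
    normalized = [answer.strip() for answer in answers if answer.strip()]
    if not normalized:
        raise ValueError("no answers to vote on")
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


# Hypothetical final answers extracted from five sampled solutions.
samples = ["12", "12", "11", " 12 ", "13"]
print(majority_vote(samples))  # 12
```

The idea is that independent samples tend to make different mistakes, so wrong answers scatter across many values while correct ones agree.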
Minerva can solve not only algebra problems but also problems in physics, number theory, geometry, biology, chemistry, astronomy, and many other areas.
Here is Minerva solving a geometry problem:
For word problems, it can set up equations:
It can even carry out derivations and proofs.
To test Minerva's quantitative reasoning ability, the researchers evaluated it on several STEM benchmarks, ranging from elementary-school-level problems to graduate-level coursework. They also evaluated Minerva on OCWCourses, a set of problems collected from MIT OpenCourseWare that covers STEM topics such as solid-state chemistry, astronomy, differential equations, and special relativity.
The results show that the 540-billion-parameter Minerva achieves SOTA on all the datasets evaluated, sometimes by a wide margin.
However, Minerva still makes plenty of mistakes.
To better identify where the model can improve, the researchers analyzed a sample of problems the model got wrong and found that most of the errors are easy to interpret. About half are calculation errors; the other half are reasoning errors, where the solution steps do not follow a logical chain of thought.
Minerva can also arrive at the correct final answer through faulty reasoning. The analysis shows that this happens relatively rarely: for Minerva 62B, the average rate on the MATH dataset is below 8%.
Conclusion
AI is developing well not only in tech circles; it is also showing its strength in other fields. Earlier, an AI was asked to write 40 college entrance exam essays in 40 seconds, and AI has been used to restore many precious photos and pictures.
Students look forward to one day having AI do their homework, and teachers likewise hope to use AI to write papers.
Some netizens also said they want to challenge it.
What do you think ?
Reference links:
https://s.weibo.com/weibo/%2523AI%25E8%2580%2583%25E9%25AB%2598%25E6%2595%25B0%25E4%25BB%2585%25E5%25BE%259781%25E5%2588%2586%2523?topnav=1&wvr=6&Refer=top_hot&sudaref=weibo.com
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html