AI Scores 81 on an Advanced Mathematics Exam. Netizens: Even AI Models Can't Escape "Involution"!
2022-07-03 13:21:00 【CSDN information】

Compiled by | Hemu wood
Produced by | AI Technology Base (ID: rgznai100)
Is advanced mathematics a nightmare for many science students? This editor, for one, was poor at it back then.
So how hard is it for AI to solve a math problem, let alone advanced mathematics?
Not long ago, this trending search appeared:

Isn't that even harder to accept?!
Over the years, scientists have tried to get AI to take math exams, but the results were poor year after year, with scores as low as the 20s. As a result, it was widely believed that AI could not handle advanced mathematics. Recently, however, researchers at MIT used OpenAI's pre-trained Codex model with few-shot learning to reach 81% accuracy on advanced mathematics problems. The research has been posted on arXiv. The courses range from elementary calculus to differential equations, probability theory, and linear algebra, and the questions include not only calculations but even plotting.

The Minerva language model
The researchers found that there are several ways to get AI to solve mathematical problems.
First, the latest GPT-3 language model on its own reaches only 18.8% accuracy. Next, adding few-shot learning and the recent chain-of-thought prompting raises accuracy to 30.8%. Finally, using Codex, which is fine-tuned on code, with few-shot learning on 210 questions from six MIT math courses raises accuracy to 81.1%.
The team's approach is to pre-train on text, fine-tune on code, and turn each math problem into an equivalent programming task. The AI automatically generates supplementary context so that the question is rephrased in a form the model can work with, then generates the corresponding code and runs it to solve the problem. Next, the team plans to extend the technique to more courses and to consider practical applications in teaching.
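The pipeline described above (rephrase the question as a programming task, add few-shot examples, generate code, run it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the researchers' implementation: the few-shot examples, the prompt wording, and the names `build_prompt`, `solve`, and `fake_model` are all hypothetical, and `fake_model` is a stand-in that returns a fixed program instead of calling a real code model.

```python
from typing import Callable

# Hypothetical few-shot pairs of (question, program); not taken from the paper.
FEW_SHOT_EXAMPLES = [
    ("What is the sum of the integers from 1 to 10?",
     "answer = sum(range(1, 11))"),
    ("How many ways can 2 items be chosen from 5?",
     "import math\nanswer = math.comb(5, 2)"),
]


def build_prompt(question: str) -> str:
    """Rephrase the question as a programming task and prepend few-shot examples."""
    parts = []
    for example_question, program in FEW_SHOT_EXAMPLES:
        parts.append(f"# Write a program that answers: {example_question}\n{program}\n")
    parts.append(f"# Write a program that answers: {question}\n")
    return "\n".join(parts)


def solve(question: str, generate_code: Callable[[str], str]) -> object:
    """Ask the model for a program, run it, and read back the `answer` variable."""
    program = generate_code(build_prompt(question))
    namespace: dict = {}
    # exec is acceptable only for trusted code; real model output needs a sandbox.
    exec(program, namespace)
    return namespace["answer"]


def fake_model(prompt: str) -> str:
    """Stand-in for a code model (assumption): always returns one fixed program."""
    return "import math\nanswer = math.factorial(5)"


result = solve("What is 5 factorial?", fake_model)
print(result)  # prints 120
```

Passing the model in as a plain callable keeps the sketch independent of any particular API; a real setup would replace `fake_model` with a call to a code-generation model and run the returned program in isolation.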
According to a separately published paper, Google researchers have introduced a language model, Minerva, that solves mathematical and scientific problems by reasoning step by step. By collecting training data relevant to quantitative reasoning problems, training models at scale, and using advanced inference techniques, the research achieves significant performance gains on a variety of difficult quantitative reasoning tasks.
Minerva solves problems by generating the solution itself, including numerical calculation and symbolic manipulation, rather than relying on external tools such as a calculator. It can parse and answer mathematical questions using a mix of natural language and mathematical notation.
In addition, Minerva combines several techniques, including few-shot prompting, chain-of-thought or scratchpad prompting, and majority voting, to achieve SOTA performance on STEM reasoning tasks.
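Of the techniques listed, majority voting is the easiest to illustrate: sample several solutions to the same problem and keep the final answer that appears most often. The sketch below assumes the final answers have already been extracted as strings; the function name `majority_vote` and the sample data are hypothetical, and how Minerva extracts or normalizes answers is not covered here.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among sampled solutions."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Strip whitespace so "42" and "42 " count as the same answer.
    normalized = [answer.strip() for answer in answers]
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


# Hypothetical final answers from five sampled solutions to one problem.
sampled = ["42", "41", "42 ", "40", "42"]
print(majority_vote(sampled))  # prints 42
```

On a tie, `Counter.most_common` returns the answer that was seen first, so the order of the samples decides the winner in that case.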
Minerva can solve not only algebra problems but also problems in physics, number theory, geometry, biology, chemistry, astronomy, and many other areas.

Here Minerva solves a geometry problem:

For word problems, it can set up equations:

It can even carry out derivations and proofs.
To test Minerva's quantitative reasoning ability, the researchers evaluated it on different STEM benchmarks, with problems ranging from grade school level to graduate level. They also evaluated Minerva on OCWCourses, which covers STEM topics such as solid state chemistry, astronomy, differential equations, and special relativity, collected from MIT OpenCourseWare.
The results show that across all evaluated datasets, the 540-billion-parameter Minerva achieves SOTA, sometimes by a wide margin.
However, Minerva still makes plenty of mistakes.
To better identify where the model can be improved, the researchers analyzed a sample of problems the model got wrong and found that most errors are easy to interpret: about half are calculation errors, and the other half are reasoning errors in which the solution steps do not follow a logical chain of thought.
Minerva can also reach the correct final answer through faulty reasoning. The analysis shows that this happens relatively rarely: for Minerva 62B, the rate on the MATH dataset is below 8%.

Conclusion
AI is thriving not only in the technology world but is also showing its strength in other fields. Earlier, an AI wrote 40 college entrance exam essays in 40 seconds, and AI has been used to restore many precious photos and pictures.
Students look forward to using AI to do their homework one day, and teachers also hope to use AI to write papers.
Some netizens even said they want to challenge it.
What do you think?
Reference links:
https://s.weibo.com/weibo/%2523AI%25E8%2580%2583%25E9%25AB%2598%25E6%2595%25B0%25E4%25BB%2585%25E5%25BE%259781%25E5%2588%2586%2523?topnav=1&wvr=6&Refer=top_hot&sudaref=weibo.com
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html