How hard can advanced math be? AI enters the mathematics arena with 81% accuracy on advanced mathematics exams!
2022-07-02 23:19:00 【AI technology base camp】
Compiled by | Hemu wood
Produced by | AI Technology Base Camp (ID: rgznai100)
OpenAI's Codex has reached 81.1% accuracy on problems from 7 advanced mathematics courses at MIT, a solid MIT undergraduate level. The courses range from elementary calculus to differential equations, probability theory, and linear algebra, and the problems involve not only calculation but even plotting.
Advanced mathematics is a nightmare for many science students. The editor was once one of those who struggled with it.
So how hard is it for an AI to do math problems, let alone advanced mathematics?
Yesterday, this trending search appeared:
Is that even harder to accept?!
For years, scientists tried to get AI to take math exams, but it kept failing, with scores as low as 20-odd points. As a result, scientists generally believed that AI could not handle advanced mathematics. Recently, however, scientists at MIT used few-shot learning on top of the pre-trained OpenAI Codex model to reach 81% accuracy on advanced mathematics. The research has been posted on arXiv.
Language model Minerva
The researchers found that there are several ways to have AI solve mathematical problems.
First, using the latest GPT-3 language model on its own achieves only 18.8% accuracy. Second, adding few-shot learning and the recent chain-of-thought prompting raises accuracy to 30.8%. Finally, using Codex, a model fine-tuned on code, with few-shot learning, the AI tackled 210 questions from six MIT math courses, and accuracy improved to 81.1%.
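The few-shot, chain-of-thought setup mentioned above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Example` class, the `build_few_shot_prompt` helper, the sample questions, and the prompt wording are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One worked example shown to the model (hypothetical structure)."""
    question: str
    reasoning: str  # intermediate steps, i.e. the "chain of thought"
    answer: str


def build_few_shot_prompt(examples, new_question):
    """Concatenate worked examples, then append the unanswered question."""
    parts = []
    for ex in examples:
        parts.append(
            f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        )
    # The final question is left open so the model continues with its own steps.
    parts.append(f"Q: {new_question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


# Illustrative examples only; a real prompt would use course problems.
EXAMPLES = [
    Example("What is 2 + 3 * 4?", "First 3 * 4 = 12, then 2 + 12 = 14.", "14"),
    Example("What is the derivative of x**2?", "Apply the power rule.", "2*x"),
]

prompt = build_few_shot_prompt(EXAMPLES, "What is the derivative of x**3?")
print(prompt)
```

The resulting string would be sent to a language model; which model and which sampling settings are used is outside the scope of this sketch.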
The research team's approach is to pre-train on text, fine-tune on code, and transform each mathematical problem into an equivalent programming task. The AI automatically generates supplementary context, producing text the model can work with, then generates the corresponding code and runs it to solve the problem. The team's next step is to extend this technique to more courses and to consider practical applications in teaching.
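The generate-code-then-run idea described above can be sketched roughly as follows. This is not the authors' pipeline: `build_code_prompt`, `fake_code_model`, and `run_generated_code` are hypothetical names, and the "model" is a stub that returns a canned program so the example runs without any external service.

```python
def build_code_prompt(question, context):
    """Rephrase a math question as a programming task with added context."""
    return (
        f"# Context: {context}\n"
        f"# Task: write Python code that solves: {question}\n"
        "# Store the final result in a variable named `answer`.\n"
    )


def fake_code_model(prompt):
    """Hypothetical stand-in for a code model; returns a canned program."""
    return (
        "import math\n"
        "a, b, c = 1, -5, 6\n"
        "disc = b * b - 4 * a * c\n"
        "answer = sorted([(-b - math.sqrt(disc)) / (2 * a),\n"
        "                 (-b + math.sqrt(disc)) / (2 * a)])\n"
    )


def run_generated_code(code):
    """Execute generated code and read back `answer`.

    Assumption: in practice, untrusted model output must run in a sandbox;
    plain exec() is used here only to keep the sketch short.
    """
    namespace = {}
    exec(code, namespace)
    return namespace["answer"]


prompt = build_code_prompt(
    "Find the roots of x**2 - 5*x + 6 = 0.",
    "Use the quadratic formula from the math module.",
)
result = run_generated_code(fake_code_model(prompt))
print(result)
```

Executing a program instead of asking the model for the number directly moves the arithmetic out of the language model and into the interpreter.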
Separately, researchers at Google have introduced a language model called Minerva, which can solve mathematical and scientific problems by working through them step by step. By collecting training data relevant to quantitative reasoning problems, training models at scale, and using advanced inference techniques, this research achieves significant performance gains on a variety of difficult quantitative reasoning tasks.
Minerva solves problems by generating solutions that include numerical calculation and symbolic manipulation, without relying on external tools such as a calculator. It can combine natural language and mathematical notation to parse and answer mathematical questions.
In addition, Minerva combines several techniques, including few-shot prompting, chain-of-thought prompting, scratchpad prompting, and majority voting, to achieve state-of-the-art (SOTA) performance on STEM reasoning tasks.
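Majority voting can be sketched in a few lines: sample several solutions to the same question, extract each final answer, and keep the most frequent one. The `majority_vote` helper and the sample answers are illustrative assumptions, not Minerva's actual code.

```python
from collections import Counter


def majority_vote(final_answers):
    """Return the most frequent final answer and its share of the samples."""
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)


# Hypothetical final answers extracted from five sampled solutions.
sampled = ["12", "12", "15", "12", "9"]
winner, share = majority_vote(sampled)
print(winner, share)
```

The intuition is that independent reasoning paths tend to agree on the correct answer more often than on any particular wrong one.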
Minerva can solve not only algebra problems but also problems in physics, number theory, geometry, biology, chemistry, astronomy, and many other areas.
Here is Minerva solving a geometry problem:
For word problems, it can set up equations:
It can even carry out derivations and proofs.
To test Minerva's quantitative reasoning ability, the researchers evaluated it on several STEM benchmarks, ranging from grade-school-level problems to graduate-level coursework. They also evaluated Minerva on OCWCourses, a set of problems collected from MIT OpenCourseWare covering STEM topics such as solid-state chemistry, astronomy, differential equations, and special relativity.
The results show that, across all evaluated datasets, the 540-billion-parameter Minerva achieves SOTA, sometimes by a substantial margin.
However, Minerva still makes plenty of mistakes.
To better identify where the model can improve, the researchers analyzed a sample of problems the model got wrong and found that most of the errors are easy to interpret. About half are calculation errors, and the other half are reasoning errors, in which the solution steps do not follow a logical chain of thought.
It is also possible for Minerva to reach the correct final answer through faulty reasoning. The analysis shows that this is relatively rare: for Minerva 62B, the average rate on the MATH dataset is below 8%.
Conclusion
AI is not only thriving in tech circles but also showing its strength in other fields: earlier, AI was asked to write a college entrance exam essay, and AI was used to restore precious images of the PLA garrison in Hong Kong.
Not only are students looking forward to using AI for their homework one day; teachers also hope to use AI to write papers.
Some netizens also said they want to challenge it.
What do you think ?
Reference links:
https://s.weibo.com/weibo/%2523AI%25E8%2580%2583%25E9%25AB%2598%25E6%2595%25B0%25E4%25BB%2585%25E5%25BE%259781%25E5%2588%2586%2523?topnav=1&wvr=6&Refer=top_hot&sudaref=weibo.com
https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html