Coggle 30 Days of ML: July Competition Learning
2022-06-27 07:57:00 【Datawhale】
Part1 Introduction
While sharing knowledge, I have found that many students run into the same problems when learning to compete:
Unfamiliar with data processing in Pandas and Numpy
Unfamiliar with the models in Sklearn and LightGBM
Don't know how to do feature engineering or feature selection
Don't know how to build NLP models
These are all skills that a competitor or an algorithm engineer must have. So this month we are organizing a competition training camp to help you get started with data competitions. During the event we will set specific competition tasks, and participants work through them one level at a time.
The July camp is built around the following two competitions:
Diabetes Genetic Risk Detection Challenge
Multilingual Transfer Learning Challenge in the Automotive Field
Part2 Activity arrangements
This is a free learning activity; there is no charge.
To participate, add the WeChat account below and reply 【Competitive learning】.

Part3 Points and rewards
To motivate learners to complete the tasks, each task is graded by difficulty: high-, medium- and low-difficulty tasks are worth 3, 2 and 1 points respectively. After the July study period ends (this activity runs until July 24), the Top3 learners will be determined by ranking on points.
Check-ins can all be written at one address; each time you complete a new task, you can submit again to check in!
The Top1 learner will receive the following rewards:
Coggle competition interview opportunity
《Machine Learning Algorithm Competition Practice》
Top10 learners will receive the following rewards:
An outstanding participant certificate jointly issued by "IFLYTEK x Datawhale".
Coggle merchandise
Coggle competition interview opportunity
Part4 Diabetes Genetic Risk Detection Challenge
Learning content
This course focuses on the Diabetes Genetic Risk Detection Challenge and explains the specific knowledge points and details used in data competitions. You will learn feature engineering, feature selection, and model hyperparameter tuning.
Competition registration: http://challenge.xfyun.cn/topic/info?type=diabetes&ch=ds22-dw-gzh01
Competition overview
In this competition, you need to build a diabetes genetic risk prediction model from the training set and then predict whether each individual in the test set has diabetes, helping diabetes patients with this "sweet trouble". For each individual in the test set, you must predict whether they have diabetes (has diabetes: 1, no diabetes: 0); the predicted value must be the integer 1 or 0.
The training set (the competition training set CSV file) contains 5070 records and is used to build your prediction model (you may need to do some data analysis first). Its fields are ID, gender, year of birth, body mass index, family history of diabetes, diastolic blood pressure, oral glucose tolerance test, insulin release test, triceps skinfold thickness, and the diabetes label (the last column). You can also use feature engineering techniques to build new features.
The test set (the competition test set CSV file) contains 1000 records and is used to verify the performance of the prediction model. Its fields are ID, gender, year of birth, body mass index, family history of diabetes, diastolic blood pressure, oral glucose tolerance test, insulin release test, and triceps skinfold thickness.
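As a rough illustration of building new features from the fields listed above, the sketch below derives an age and a coarse BMI band from one record, and converts a predicted probability into the required integer label. The English field names (`birth_year`, `bmi`), the reference year 2022, the BMI cut-offs and the 0.5 threshold are all assumptions made for this example; they are not part of the competition data or rules.

```python
def add_features(row, reference_year=2022):
    """Return a copy of `row` with two derived features added.

    `row` is a dict with hypothetical keys "birth_year" and "bmi".
    """
    out = dict(row)
    # Age is often easier for a model to use than the raw birth year.
    out["age"] = reference_year - row["birth_year"]
    # Coarse BMI band; the cut-offs below are illustrative assumptions.
    bmi = row["bmi"]
    if bmi < 18.5:
        band = 0
    elif bmi < 24:
        band = 1
    elif bmi < 28:
        band = 2
    else:
        band = 3
    out["bmi_band"] = band
    return out


def to_label(probability, threshold=0.5):
    """Convert a predicted probability into the integer label 1 or 0."""
    return 1 if probability >= threshold else 0


example = add_features({"birth_year": 1980, "bmi": 25.0})
print(example["age"], example["bmi_band"], to_label(0.7))
```

Whether features like these actually help should be checked on a validation split rather than assumed.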
Check-in summary
| Task | Difficulty, points |
|---|---|
| Task 1: Sign up for the competition | Low, 1 |
| Task 2: Competition data analysis | Low, 1 |
| Task 3: Logistic regression baseline | Low, 1 |
| Task 4: Feature engineering | Medium, 2 |
| Task 5: Feature selection | Medium, 2 |
| Task 6: Advanced tree models | Medium, 2 |
| Task 7: Multi-fold training and ensembling | High, 3 |
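The multi-fold training and ensembling idea in the last task can be sketched as follows: split the training rows into folds, train one model per fold, and average the per-fold predictions on the test set. This is a minimal standard-library sketch under stated assumptions; the helper names `k_fold_indices` and `average_predictions` are hypothetical, the folds are contiguous and unshuffled, and the model training step is left out entirely. In practice you would normally shuffle or stratify the folds, for example with scikit-learn's `KFold` or `StratifiedKFold`.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, valid_indices) for contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def average_predictions(fold_predictions):
    """Average several equally long lists of probabilities element-wise."""
    n_models = len(fold_predictions)
    return [sum(column) / n_models for column in zip(*fold_predictions)]


for train_idx, valid_idx in k_fold_indices(10, 3):
    print(len(train_idx), len(valid_idx))
```

Each fold's model would be trained on `train_idx`, checked on `valid_idx`, and used to predict the test set; `average_predictions` then combines those test predictions.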
Check-in requirements
Notes:
All tasks must be written up in a single blog post
We recommend including your thought process in the check-in, along with a record of what you tried and the data you observed
Part5 Multilingual Transfer Learning Challenge in the Automotive Field
Learning content
This course focuses on the Multilingual Transfer Learning Challenge in the Automotive Field and explains the specific knowledge points and details used in data competitions. You will learn text classification and keyword extraction.
Competition registration: http://challenge.xfyun.cn/topic/info?type=car-multilingual&ch=ds22-dw-gzh05
Competition overview
To improve product competitiveness and better enter overseas markets, domestic automobile companies have raised a demand for intelligent interaction in overseas markets. However, countries around the world place strict legal restrictions on "data security", so the biggest challenge for domestic companies in building overseas intelligent interaction is the lack of data. This competition requires participants to use NLP-related artificial intelligence algorithms to achieve multilingual transfer learning in the automotive field.
In this transfer learning task, the IFLYTEK smart car BU provides a relatively large Chinese corpus of in-car human-computer interaction, plus a small amount of Chinese-English, Chinese-Japanese and Chinese-Arabic parallel corpora, as training sets. Participants build models from the provided data to perform intent classification and key information extraction, and the models are finally tested and judged on English, Japanese and Arabic.
1. Preliminary round
Training set: Chinese corpus, 30000 entries; Chinese-English parallel corpus, 1000 entries; Chinese-Japanese parallel corpus, 1000 entries
Test set A: English corpus, 500 entries; Japanese corpus, 500 entries
Test set B: English corpus, 500 entries; Japanese corpus, 500 entries
2. Semi-final round
Training set: the same Chinese corpus as in the preliminary round; Chinese-Arabic parallel corpus, 1000 entries
Test set A: Arabic corpus, 500 entries
Test set B: Arabic corpus, 500 entries
Models are evaluated on the submitted result file using accuracy.
Intent classification accuracy = number of correctly classified intents / total number of samples
Key information extraction accuracy = number of samples whose key information is completely correct / total number of samples
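Both metrics above are exact-match accuracies, which can be sketched with a single function. The function name `exact_match_accuracy`, the intent labels and the dictionary representation of key information are assumptions made for this example; the official scoring script and submission format may differ.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of samples whose prediction equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, r in zip(predictions, references) if p == r)
    return correct / len(references)


# Intent classification: one label per sample (labels are made up).
intent_acc = exact_match_accuracy(
    ["play_music", "navigate", "open_window", "navigate"],
    ["play_music", "navigate", "close_window", "navigate"],
)

# Key information extraction: a sample counts only if every extracted
# field matches, so a missing field makes the whole sample wrong.
slot_acc = exact_match_accuracy(
    [{"song": "Hello"}, {"city": "Tokyo"}],
    [{"song": "Hello"}, {"city": "Tokyo", "time": "now"}],
)

print(intent_acc, slot_acc)  # 0.75 0.5
```

Because extraction is scored per sample rather than per field, partially correct extractions earn no credit under this metric.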
Check-in summary
| Task | Difficulty, points |
|---|---|
| Task 1: Sign up for the competition | Low, 1 |
| Task 2: File reading and text tokenization | Low, 1 |
| Task 3: TF-IDF and text classification | Low, 1 |
| Task 4: Regular expressions | Medium, 2 |
| Task 5: Introduction to BERT models | Medium, 2 |
| Task 6: BERT text classification | Medium, 2 |
| Task 7: BERT entity extraction | Medium, 2 |
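For readers new to the TF-IDF task, the weighting can be sketched with the standard library alone. This uses the plain textbook form (term frequency times log of inverse document frequency) on already tokenized text; it is an illustrative assumption, not the competition baseline, and library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed and normalized variant that gives different numbers. The example sentences are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Returns one dict per document mapping term -> weight, where
    weight = (count / document length) * log(n_docs / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["open", "the", "window"],
    ["close", "the", "window"],
    ["play", "the", "music"],
]
weights = tf_idf(docs)
print(weights[2])
```

A term that occurs in every document ("the") gets weight 0, while a term unique to one document ("music") gets the highest weight, which is what makes these vectors useful as classifier inputs.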
Check-in requirements
Notes:
All tasks must be written up in a single blog post
We recommend including your thought process in the check-in, along with a record of what you tried and the data you observed
