Tianchi - student test score forecast

2022-06-11 04:47:00 Panbohhhhh

These are my personal study notes. My skill level is limited, so please treat them as a reference only. I am sharing them in case they are useful.

1: Purpose

Practice calling libraries, get familiar with data cleaning and data processing, and get familiar with Python programming.

The general workflow is as follows (worth memorizing, since it helps both with mastering the material and with interviews):

  1. Make sure the dataset itself is usable, including but not limited to:
    a) Check whether the data is balanced, and handle it if not
    b) Check the data for missing values, and handle them
    c) Check the data for obvious outliers, and handle them case by case

  2. Examine the nature of the dataset and choose an appropriate machine learning model
    a) Supervised model vs. unsupervised model
    b) Regression model vs. classification model

  3. Use data visualization to build intuition about and understanding of the dataset

  4. Use data visualization to get a rough idea of the relationship between the features and the target, and further narrow down the appropriate machine learning model

  5. Predict and verify the results of the model's future output

  6. Do a preliminary screening of the features used in the model

  7. Prepare for the feature engineering step
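The three checks in step 1 can be sketched with pandas as follows. This is only an illustration on a made-up toy DataFrame (the values are not from the real dataset), and the 1.5 × IQR rule for outliers is a common rule of thumb chosen here as an assumption, not something this walkthrough prescribes.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "F", "M", "F"],
    "absences": [0, 2, 4, 6, 3, 60],
    "G3": [11, 12, None, 14, 10, 13],
})

# a) Balance: share of each class in a categorical column.
balance = toy["sex"].value_counts(normalize=True)

# b) Missing values: number of NaN entries per column.
missing = toy.isnull().sum()

# c) Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = toy["absences"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = toy[(toy["absences"] < lower) | (toy["absences"] > upper)]

print(balance, missing, outliers, sep="\n\n")
```

Whether a flagged row is really an error or just an unusual student is a judgment call, which is why step 1c says to handle outliers case by case.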

2: Data

The data is a CSV file. I will upload it to the download section for anyone who needs it.

3: Start

# Import the libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('student-por.csv')

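After importing the libraries, take a first look at the data:

print(df.head(10))                    # first 10 rows
print(df.shape)                       # (rows, columns)
print(df.isnull().sum())              # missing values per column
print(df.describe(include = 'all'))   # summary statistics for every column
df.info()                             # info() prints its report itself and returns None

The key point is that this dataset is relatively clean: there are no missing values.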

The fields in the dataset are described below:

| Field name | Meaning | Type | Description |
|---|---|---|---|
| sex | Gender | string | F = female, M = male |
| address | Address | string | U = urban, R = rural |
| famsize | Family size | string | LE3 = 3 or fewer, GT3 = more than 3 |
| pstatus | Parents' cohabitation status | string | T = living together, A = apart |
| medu | Mother's education level | string | 0 to 4, increasing |
| fedu | Father's education level | string | 0 to 4, increasing |
| mjob | Mother's job | string | Teaching, health care, services, etc. |
| fjob | Father's job | string | Teaching, health care, services, etc. |
| guardian | Student's guardian | string | mother, father or other |
| traveltime | Travel time from home to school | double | In minutes |
| studytime | Weekly study time | double | In hours |
| failures | Number of failed courses | double | Count of failed courses |
| schoolsup | Extra educational support | string | yes or no |
| famsup | Family educational support | string | yes or no |
| paid | Extra paid classes in the course subject | string | yes or no |
| activities | Extracurricular activities | string | yes or no |
| higher | Wants to pursue higher education | string | yes or no |
| internet | Internet access at home | string | yes or no |
| famrel | Quality of family relationships | double | 1 to 5, from bad to good |
| freetime | Amount of free time | double | 1 to 5, from less to more |
| goout | How often the student goes out with friends | double | 1 to 5, from less to more |
| dalc | Workday alcohol consumption | double | 1 to 5, from less to more |
| walc | Weekend alcohol consumption | double | 1 to 5, from less to more |
| health | Health status | double | 1 to 5, from bad to good |
| absences | Number of absences | double | 0 to 93 |
| G1, G2, G3 | Grades (G1 and G2 are period grades, G3 is the final grade) | double | 20-point scale |
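Since the fields mix string and numeric types, it can help to separate them up front and encode the yes/no fields in one pass. The sketch below uses a made-up toy DataFrame with a few of the column names listed above; the names `categorical_cols`, `numeric_cols`, `yes_no_cols` and `encoded` are introduced here purely for illustration.

```python
import pandas as pd

# Toy rows using a few of the listed column names (values are made up).
toy = pd.DataFrame({
    "schoolsup": ["yes", "no", "no"],
    "internet": ["yes", "yes", "no"],
    "studytime": [2.0, 1.0, 3.0],
    "G3": [11.0, 12.0, 14.0],
})

# Split columns by dtype: numeric columns vs. everything else (here, strings).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Find the columns whose only values are "yes" / "no".
yes_no_cols = [c for c in categorical_cols if set(toy[c].unique()) <= {"yes", "no"}]

# Encode them as 1 / 0 on a copy, leaving the original frame untouched.
encoded = toy.copy()
for col in yes_no_cols:
    encoded[col] = encoded[col].map({"yes": 1, "no": 0})

print(numeric_cols)
print(categorical_cols)
print(encoded)
```

Multi-valued string fields such as mjob or guardian would need a different treatment (for example one-hot encoding), which this sketch deliberately leaves out.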

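1: Encode gender

# Plot the number of students of each gender
sns.countplot(x = 'sex', order = ['M','F'], data = df )
plt.show()
# map() returns a new Series, so assign the result back to the column
df['sex'] = df['sex'].map({'M': 0, 'F': 1})

This converts M (male) to 0 and F (female) to 1.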

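2: Encode address

# Plot the number of students at each address type
sns.countplot(x = 'address', order = ['U','R'], data = df )
plt.show()
# U (urban) becomes 1 and R (rural) becomes 0; assign the result back to the column
df['address'] = df['address'].map({'U': 1, 'R': 0})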
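One thing to watch in both encoding steps: `Series.replace` and `Series.map` return a new Series and do not modify the DataFrame, so the result only sticks if it is assigned back. The toy example below (made-up values) shows the difference. Using `map` with a dict and integer codes is one reasonable choice here, not the only one.

```python
import pandas as pd

toy = pd.DataFrame({"address": ["U", "R", "U"]})

# Calling replace without keeping the result leaves the column untouched.
toy["address"].replace("U", "1")
unchanged = toy["address"].tolist()

# Assigning the mapped result back makes the change stick.
# With map(), any value missing from the dict becomes NaN, so typos are easy to spot.
toy["address"] = toy["address"].map({"U": 1, "R": 0})
changed = toy["address"].tolist()

print(unchanged)
print(changed)
```

Integer codes (rather than the strings '0' and '1') keep the column numeric, which is generally what a model expects as input.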

 


 
