
Tianchi - student test score forecast

2022-06-11 04:47:00 【Panbohhhhh】

These are my personal study notes. My skill level is limited, so treat them as reference only. I am sharing them in case they are useful.

1: Purpose

Practice calling libraries, get familiar with data cleaning and data processing, and brush up on Python programming.

The general process is as follows (worth memorizing, since it helps both with mastering the material and in interviews):

Make sure the dataset itself is usable, including but not limited to:
a) Check whether the data is balanced, and handle it if not
b) Check the data for missing values, and handle them
c) Check the data for obvious outliers, and handle them case by case
Examine the nature of the dataset and choose a suitable machine learning model:
a) Supervised model vs. unsupervised model
b) Regression model vs. classification model
Use data visualization to build intuition about and understanding of the dataset
Use data visualization to get a rough idea of the relationship between features and results, and further narrow down the suitable machine learning model
Anticipate the results the model will output later, so that they can be verified
Do a preliminary screening of the features used in the model
Prepare for feature engineering
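
The three checks in the first step can be sketched roughly as follows. This is only a minimal illustration: the `toy` frame and its values are made up and stand in for the real CSV, and the 1.5 * IQR rule is just one common way to flag outliers, not necessarily the right one for this dataset.

```python
import pandas as pd

# Toy data standing in for the real CSV; the values are made up for illustration.
toy = pd.DataFrame({
    "sex": ["F", "M", "F", "F", "M", "F"],
    "absences": [0, 2, 4, 6, 2, 93],
    "G3": [11, 12, None, 14, 10, 13],
})

# a) Balance: share of each category in a column.
balance = toy["sex"].value_counts(normalize=True)

# b) Missing values: number of NaN entries per column.
missing = toy.isnull().sum()

# c) Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = toy["absences"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (toy["absences"] < q1 - 1.5 * iqr) | (toy["absences"] > q3 + 1.5 * iqr)
outliers = toy[is_outlier]

print(balance)
print(missing)
print(outliers)
```

On the real data, the same calls would run on `df` instead of `toy`. Whether a flagged row is really an error or just an unusual student still has to be judged case by case.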

2: Data

The data is a CSV file, described below. I will upload it to the download section for anyone who needs it.

3: Start

# Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv('student-por.csv')

After importing the libraries, take a first look at the data:

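# First rows, shape, missing values per column, and summary statistics
print(df.head(10))
print(df.shape)
print(df.isnull().sum())
print(df.describe(include = 'all'))
# info() prints its report directly and returns None, so no print() is needed
df.info()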

Note that this dataset is relatively clean: it has no missing values.

The meaning of each field is as follows:

Field name	Meaning	Type	Description
sex	Gender	string	F = female, M = male
address	Home address	string	U = urban, R = rural
famsize	Number of family members	string	LE3 = three or fewer, GT3 = more than three
pstatus	Whether the parents live together	string	T = living together, A = apart
medu	Mother's education level	string	0 to 4, increasing
fedu	Father's education level	string	0 to 4, increasing
mjob	Mother's job	string	Categories include teacher, health-related, services
fjob	Father's job	string	Categories include teacher, health-related, services
guardian	The student's guardian	string	mother, father or other
traveltime	Travel time from home to school	double	In minutes
studytime	Weekly study time	double	In hours
failures	Number of failed courses	double	Number of failed courses
schoolsup	Whether there is extra educational support	string	yes or no
famsup	Whether there is a tutor	string	yes or no
paid	Whether there is extra tutoring in the exam subject	string	yes or no
activities	Whether the student takes extracurricular activities	string	yes or no
higher	Whether the student wants to pursue higher education	string	yes or no
internet	Whether there is Internet access at home	string	yes or no
famrel	Quality of family relationships	double	1 to 5, from bad to good
freetime	Amount of free time	double	1 to 5, from less to more
goout	How often the student goes out with friends	double	1 to 5, from less to more
dalc	Daily alcohol consumption	double	1 to 5, from less to more
walc	Weekly alcohol consumption	double	1 to 5, from less to more
health	Health status	double	1 to 5, from bad to good
absences	Number of absences	double	0 to 93
G1,G2,G3	Period grades (G3 is the final grade)	double	Scored out of 20
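
Since the table mixes text fields and numeric fields, it can help to split the columns by type before deciding how to encode them. Here is a minimal sketch on a made-up frame (the `toy` frame and its values are illustrative, not taken from the real CSV):

```python
import pandas as pd

# Toy frame with a few of the fields from the table; values are made up.
toy = pd.DataFrame({
    "sex": ["F", "M"],
    "studytime": [2.0, 3.0],
    "internet": ["yes", "no"],
    "G3": [11.0, 14.0],
})

# Numeric columns can be used directly; everything else needs encoding first.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

The types pandas infers from the actual file may differ from the types listed in the table, so it is worth checking them against `df.info()`.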

1: Handle gender

sns.countplot(x = 'sex', order = ['M','F'], data = df)
plt.show()
# map returns a new Series, so assign it back to the column
df['sex'] = df['sex'].map({'M': 0, 'F': 1})

This converts M (male) and F (female) to 0 and 1.

2: Convert addresses

sns.countplot(x = 'address', order = ['U','R'], data = df)
plt.show()
# map returns a new Series, so assign it back to the column
df['address'] = df['address'].map({'U': 1, 'R': 0})
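
The same idea extends to the yes/no fields in the table above. A minimal sketch, assuming a made-up `toy` frame and a hand-picked `yes_no_cols` list (both are illustrative; on the real data the list would name whichever yes/no columns the CSV actually contains):

```python
import pandas as pd

# Toy frame standing in for the real data; values are made up.
toy = pd.DataFrame({
    "schoolsup": ["yes", "no", "no"],
    "internet": ["yes", "yes", "no"],
    "G3": [10, 12, 15],
})

# Hypothetical list of yes/no columns to encode.
yes_no_cols = ["schoolsup", "internet"]

# Map each yes/no column to 1/0 and assign the result back.
for col in yes_no_cols:
    toy[col] = toy[col].map({"yes": 1, "no": 0})

print(toy)
```

One thing to watch: `map` turns any value missing from the dictionary into NaN, so a quick `isnull().sum()` afterwards is a cheap way to catch unexpected labels.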

References:

1: https://tianchi.aliyun.com/course/video?spm=5176.12282042.0.0.3eb22042bd6YRi&liveId=7729

2: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12281897.0.0.209439a9IUXP6k&postId=7459

3: https://blog.csdn.net/jiangtianshe/article/details/77703450

