Data visualization practice: Experimental Report

2022-06-26 06:07:00 Ah, teacher Q

1. Project background

1.1 Project description

This project visualizes the data in movie.csv.

1.2 Data field description

id: serial number
movieId: movie number
title: movie title
cover: cover image URL
rate: rating
director: director
composer: screenwriter
actor: actors
category: genre (type)
district: region
language: language
showtime: release time
length: running time

2. Questions

What is the relationship between film genre and average rating? Which ten films released since 2000 have the highest ratings? How did the top five genres by number of films produced change over 2010-2015?

3. Understand the data

3.1 Collect data

Data source: movie.csv

3.2 Import data

Based on the questions above, the required data is read from the CSV file. pandas reads the file, and the usecols parameter restricts the import to the specified columns.
1.
 [Figure 1]
2.
 [Figure 2]
3.
 [Figure 3]
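
The original import code survives only as screenshots, so the following is a minimal sketch of reading selected columns with usecols, not the author's exact code. The inline sample rows and the per-question column choices are assumptions made for illustration; with the real file, the path "movie.csv" would be passed to pd.read_csv directly.

```python
import io
import pandas as pd

# Hypothetical stand-in for movie.csv so the sketch runs on its own.
# With the real file, call pd.read_csv("movie.csv", usecols=...) instead.
SAMPLE_CSV = """id,movieId,title,cover,rate,director,composer,actor,category,district,language,showtime,length
1,101,Film A,http://example.com/a.jpg,9.3,Dir A,Writer A,Actor A,Drama/Animation,USA,English,2008,98
2,102,Film B,http://example.com/b.jpg,7.1,Dir B,Writer B,Actor B,,China,Mandarin,2012,110
3,103,Film C,http://example.com/c.jpg,8.0,Dir C,Writer C,Actor C,Horror,Japan,Japanese,1999,90
"""


def load_columns(columns):
    """Read only the requested columns (column choices are assumed)."""
    return pd.read_csv(io.StringIO(SAMPLE_CSV), usecols=columns)


df_q1 = load_columns(["category", "rate"])            # genre vs. average rating
df_q2 = load_columns(["title", "rate", "showtime"])   # top ten since 2000
df_q3 = load_columns(["category", "showtime"])        # yearly genre output

print(df_q1.shape, df_q2.shape, df_q3.shape)
```

Note that usecols returns the columns in file order, not in the order of the list passed in.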

3.3 View dataset information

As shown in Figures 1, 2, and 3:
 [Figure: dataset information]
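
One possible way to inspect the dataset, sketched with a few hypothetical rows in place of movie.csv. The calls shown (info, head, isnull().sum()) are standard pandas; whether the original screenshots used exactly these is an assumption.

```python
import pandas as pd

# Hypothetical rows; the real DataFrame would come from movie.csv.
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "category": ["Drama/Animation", None, "Horror"],
    "rate": [9.3, 7.1, 8.0],
    "showtime": [2008, 2012, 1999],
})

df.info()                               # dtypes and non-null counts per column
print(df.head())                        # first rows, to check the parsed values
missing_per_column = df.isnull().sum()  # number of nulls in each column
print(missing_per_column)               # a non-zero entry marks a column to clean
```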

4. Data cleaning

4.1 Data preprocessing

4.1.1 Abnormal data processing

Because the category column contains null values, they are first filled with the label "Missing data for type".
 [Figure: code that fills the null values]
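
A minimal sketch of the fill step, assuming the nulls are handled with pandas fillna. The sample rows are hypothetical, and the fill label is the English rendering used in this report rather than the exact string from the original code.

```python
import pandas as pd

# Hypothetical rows; the second film has no genre recorded.
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "category": ["Drama/Animation", None, "Horror"],
    "rate": [9.3, 7.1, 8.0],
})

FILL_LABEL = "Missing data for type"   # assumed label, as rendered in the text

# Replace every null in the category column with the label.
df["category"] = df["category"].fillna(FILL_LABEL)
print(df)
```

Filling rather than dropping keeps the ratings of these films in the dataset, which is why the label later shows up as a genre of its own.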

4.2 Feature engineering

4.2.1 Feature extraction

1. What is the relationship between film genre and average rating?
Step one: because a single cell in the category column may hold several genres, split it on "/" and convert the result into a DataFrame; stack the split values into a single column; reset the index after this row-to-column conversion; finally, once the data is processed, let the new_category column replace the category column. Step two: compute the average rating of each genre and add it to the table as Average_rate; compute how many times each genre occurs and add it as Count_category; then deduplicate the genre column and sort by average rating from high to low. Step three: output the genres, the average ratings, and the genre counts, each as a list.
 [Figure: code for question 1]
The result is shown in the figure:
 [Figure: result for question 1]
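
The three steps above can be sketched as follows. This is not the author's code: it uses str.split with explode in place of the split/stack/reset-index sequence described, and a single groupby in place of splicing the columns in and deduplicating, which should give the same per-genre table. The sample rows are hypothetical; Average_rate and Count_category are the column names from the text.

```python
import pandas as pd

# Hypothetical rows; a cell may hold several genres separated by "/".
df = pd.DataFrame({
    "category": ["Drama/Animation", "Drama", "Horror/Drama", "Animation"],
    "rate": [9.0, 8.0, 6.0, 7.0],
})

# Step 1: one row per (film, genre) pair.
long_df = (
    df.assign(category=df["category"].str.split("/"))
    .explode("category")
    .reset_index(drop=True)
)

# Step 2: average rating and occurrence count per genre, best first.
stats = (
    long_df.groupby("category")["rate"]
    .agg(Average_rate="mean", Count_category="count")
    .sort_values("Average_rate", ascending=False)
    .reset_index()
)

# Step 3: plain lists, ready to hand to a charting library.
categories = stats["category"].tolist()
average_rates = stats["Average_rate"].round(2).tolist()
counts = stats["Count_category"].tolist()
print(categories, average_rates, counts)
```

A film with several genres contributes its rating to each of them, so the counts add up to more than the number of films.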

2. Which ten films released since 2000 have the highest ratings?
First, filter the rows whose release year (showtime) is greater than 2000; then sort them by rating from high to low and take the first ten rows; next, generate a 'rank' column to hold the ranking; finally, output the data as a list.
 [Figure: code for question 2]
The result is shown in the figure:
 [Figure: result for question 2]
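
A rough sketch of the filter, sort, and rank steps, using hypothetical films and assuming showtime holds the release year as a number. The text says "greater than 2000" while the question says "since 2000"; the sketch uses >= 2000, and > can be substituted if films from 2000 itself should be excluded.

```python
import pandas as pd

# Hypothetical data: twelve films from 2001-2012 plus one older film.
df = pd.DataFrame({
    "title": [f"Film {i}" for i in range(12)] + ["Old Film"],
    "rate": [9.3, 9.3, 9.2, 9.1, 9.0, 8.9, 8.8, 8.7, 8.6, 8.5, 8.4, 8.3, 9.9],
    "showtime": [2003, 2008, 2010, 2001, 2005, 2012, 2004,
                 2009, 2011, 2002, 2006, 2007, 1994],
})

# Keep films released in 2000 or later (assumed inclusive threshold).
recent = df[df["showtime"] >= 2000]

# Sort by rating, highest first, and keep the first ten rows.
top10 = recent.sort_values("rate", ascending=False).head(10).reset_index(drop=True)

# Add the rank column: 1 for the highest-rated film.
top10["rank"] = range(1, len(top10) + 1)

# Output as a list of [rank, title, rate, showtime] rows.
rows = top10[["rank", "title", "rate", "showtime"]].values.tolist()
for row in rows:
    print(row)
```

Films with equal ratings receive consecutive ranks here; Series.rank(method="min") is one option if tied films should share a rank.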

3. How did the top five genres by number of films produced change over 2010-2015?
Loop over the years 2010-2015, filter the rows for each year using the showtime column, and then process the genre column in the same way as in the first question.
 [Figure: code for question 3]
The result is shown in the figure:
 [Figure: result for question 3]
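
The per-year loop might look like the sketch below. The rows are hypothetical, showtime is assumed to be a numeric year, and the genre split reuses the explode approach rather than the author's original code.

```python
import pandas as pd

# Hypothetical rows; the 2016 film falls outside the range and is ignored.
df = pd.DataFrame({
    "category": ["Drama/Comedy", "Drama", "Comedy/Action",
                 "Action", "Drama/Romance", "Horror"],
    "showtime": [2010, 2010, 2010, 2011, 2011, 2016],
})

top5_by_year = {}
for year in range(2010, 2016):                      # 2010 through 2015 inclusive
    year_df = df[df["showtime"] == year]
    # One entry per genre, as in question 1.
    genres = year_df["category"].str.split("/").explode()
    # Count films per genre and keep the five most frequent.
    top5 = genres.value_counts().head(5)
    top5_by_year[year] = (top5.index.tolist(), top5.tolist())

for year, (names, totals) in top5_by_year.items():
    print(year, names, totals)
```

Keeping one (genres, counts) pair per year maps directly onto a timeline chart with one bar chart per year.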

5. Data visualization

5.1 The relationship between film type and average score

Bar-Mixed_bar_and_line chart:
The chart shows the average rating of each genre decreasing from left to right. The children's genre has the highest average rating and horror the lowest. Drama occurs most often, 2376 times in total, while the absurd, surprise, and suspense genres occur least often, only once each.
 [Figure: mixed bar and line chart]

Funnel-Funnel_chart chart:
This chart extracts the five genres with the highest average ratings from the data above: children, film noir, absurd, "Missing data for type" (the fill value used for nulls), and animation.
 [Figure: funnel chart]

5.2 The ten highest-rated films since 2000

Table-Table_base chart:
The table ranks the ten highest-rated films since 2000. The highest rating is 9.3, shared by two films, "Brilliant Life" and Wall-E, released in 2003 and 2008 respectively.
 [Figure: ranking table]

5.3 Changes in the output of film genres, 2010-2015

Timeline-Timeline_bar_reversal chart:
The chart shows how the output of the top five film genres changed from 2010 to 2015. Within each year, the genres are listed from top to bottom in order of decreasing output.

Copyright notice
This article was written by [Ah, teacher Q]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206260555075723.html