Data visualization practice: Experimental Report
2022-06-26 06:07:00 【Ah, teacher Q】
Data visualization experiment report
1. Project background
1.1 Project description
This project visualizes the data in movie.csv.
1.2 Data fields
id: serial number
movieId: movie ID
title: movie title
cover: cover image URL
rate: rating
director: director
composer: screenwriter(s)
actor: actors
category: type (genre)
district: region
language: language
showtime: release date
length: running time
2. Research questions
This report addresses three questions: What is the relationship between film type and average rating? What are the ten highest-rated films released since 2000? How did the production of the top five film types change from 2010 to 2015?
3. Understand the data
3.1 Collect data
Data source: movie.csv
3.2 Import data
Based on the questions above, the data are read from the CSV file: pandas reads movie.csv, and the usecols parameter restricts the import to the specified columns.
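The following is a minimal sketch of the import step described above. It assumes pandas is available and that the columns needed for the three questions are title, rate, category and showtime. The report does not show its actual usecols list, so that list is an assumption. The inline CSV holds two invented rows that share movie.csv's header, which lets the snippet run on its own. With the real file you would call load_movies("movie.csv") instead.

```python
import io

import pandas as pd

# Assumed column subset; the report does not show its actual usecols list.
USECOLS = ["title", "rate", "category", "showtime"]

# Two invented rows with movie.csv's header, used only so the sketch is runnable.
SAMPLE_CSV = """id,movieId,title,cover,rate,director,composer,actor,category,district,language,showtime,length
1,101,Film A,http://example.com/a.jpg,9.1,Dir A,Writer A,Actor A,Drama/Romance,China,Chinese,2003,120
2,102,Film B,http://example.com/b.jpg,8.4,Dir B,Writer B,Actor B,,USA,English,2012,95
"""


def load_movies(source) -> pd.DataFrame:
    """Read only the columns needed for the analysis from a path or file-like object."""
    return pd.read_csv(source, usecols=USECOLS)


df = load_movies(io.StringIO(SAMPLE_CSV))
print(df.shape)  # (2, 4)
print(df.dtypes)
```

Because usecols drops unused columns at parse time, the cover URLs and cast lists are never loaded into memory.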
3.3 View dataset information
See Figures 1, 2, and 3.
4. Data cleaning
4.1 Data preprocessing
4.1.1 Abnormal data processing
Because some values in the category column are null, they are first filled with the placeholder "Missing data for type".
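The fill step can be sketched with pandas' fillna. This sketch assumes the placeholder string is exactly the label the report uses. The helper name fill_missing_category and the three rows are invented for illustration.

```python
import pandas as pd

# Placeholder label as named in the report.
MISSING_LABEL = "Missing data for type"


def fill_missing_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with null category values replaced by the placeholder."""
    out = df.copy()
    out["category"] = out["category"].fillna(MISSING_LABEL)
    return out


# Invented rows for illustration; the second film has no category.
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "category": ["Drama/Romance", None, "Animation"],
})
print(movies["category"].isna().sum())  # 1 null before filling
filled = fill_missing_category(movies)
print(filled["category"].tolist())
```

Working on a copy leaves the raw data untouched, which makes it easy to compare counts before and after filling. Note one consequence of filling: the placeholder then behaves like an ordinary type in every later step.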
4.2 Feature Engineering
4.2.1 Feature extraction
1. What is the relationship between film type and average rating?
Because a single category cell may contain several types, the column is first split on "/" and the result converted into a DataFrame. The split pieces are then stacked so that each type occupies its own row. After this row/column conversion the index must be reset, and finally the processed new_category column replaces the category column. In the second step, the average rating of each type is computed and joined to the table as Average_rate, and the number of occurrences of each type is computed and joined as Count_category. The type column is then de-duplicated and the rows sorted by average rating from high to low. In the third step, the types, average ratings, and type counts are each extracted as a list for output.
The result is shown in the figure.
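The three steps can be sketched in pandas as follows. This is a sketch, not the author's code. It uses Series.str.split followed by DataFrame.explode instead of the split-then-stack row/column conversion described above, and both approaches produce the same one-row-per-type layout. The column names Average_rate and Count_category follow the report. The function name category_stats and the input rows are invented.

```python
import pandas as pd


def category_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per-type average rating and occurrence count, sorted by average rating (high to low)."""
    # Step 1: one row per (film, type); "Drama/Romance" becomes two rows.
    long = (
        df.assign(category=df["category"].str.split("/"))
        .explode("category")
        .reset_index(drop=True)
    )
    # Step 2: attach each type's mean rating and number of occurrences to every row.
    by_type = long.groupby("category")
    long["Average_rate"] = by_type["rate"].transform("mean")
    long["Count_category"] = by_type["category"].transform("count")
    # Keep one row per type, then sort by average rating from high to low.
    return (
        long.drop_duplicates("category")[["category", "Average_rate", "Count_category"]]
        .sort_values("Average_rate", ascending=False)
        .reset_index(drop=True)
    )


movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "rate": [9.0, 7.0, 7.5],
    "category": ["Drama/Romance", "Drama", "Animation"],
})
stats = category_stats(movies)

# Step 3: export each column as a plain list (e.g. for a chart's x and y axes).
types = stats["category"].tolist()
avg_rates = stats["Average_rate"].tolist()
counts = stats["Count_category"].tolist()
print(types)      # ['Romance', 'Drama', 'Animation']
print(avg_rates)  # [9.0, 8.0, 7.5]
print(counts)     # [1, 2, 1]
```

transform broadcasts each group's statistic back onto every row, which mirrors the report's "splice into the table, then de-duplicate" approach. A plain groupby(...).agg would produce the same summary in one step.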
2. What are the ten highest-rated films released since 2000?
First, the rows are filtered to keep films whose release date (showtime) is later than 2000. They are then sorted by rating from high to low and the first ten rows are taken. A 'rank' column is then generated to hold each film's position, and finally the data are output as lists.
The result is shown in the figure.
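A minimal sketch of the filter, sort, and rank steps follows. It assumes showtime holds a four-digit release year. If the column actually stores date strings, pd.to_datetime(df["showtime"]).dt.year would be needed first. The comparison uses > as described in the report, so films from 2000 itself are excluded; >= would include them. The helper name top_rated_since and the rows are hypothetical.

```python
import pandas as pd


def top_rated_since(df: pd.DataFrame, year: int = 2000, n: int = 10) -> pd.DataFrame:
    """Films released after `year`, sorted by rating (high to low), with a 1-based rank column."""
    recent = df[df["showtime"] > year]
    top = recent.sort_values("rate", ascending=False).head(n).reset_index(drop=True)
    top.insert(0, "rank", list(range(1, len(top) + 1)))
    return top


# Invented rows; showtime is assumed to hold a four-digit release year.
movies = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "rate": [9.3, 9.2, 9.5, 8.0],
    "showtime": [2003, 2008, 1994, 2010],
})
top = top_rated_since(movies, n=2)

# Output each column as a list, as the report does.
print(top["rank"].tolist(), top["title"].tolist(), top["rate"].tolist())
```

Ties in rating (two films at the same score) keep no particular guaranteed order here. A secondary sort key such as title would make the ranking deterministic.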
3. How did the production of the top five film types change from 2010 to 2015?
First, the years 2010 to 2015 are iterated over, and for each year the rows are filtered on the release date (showtime) column. The category column of each filtered subset is then processed in the same way as in the first question.
The result is shown in the figure.
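The per-year loop might look like the sketch below. It makes two assumptions: that showtime is a release year, and that "top five" is computed separately for each year rather than fixed from the overall counts. The report does not say which. The function name top_types_by_year and the sample rows are hypothetical.

```python
import pandas as pd


def top_types_by_year(df: pd.DataFrame, years=range(2010, 2016), k: int = 5) -> dict:
    """For each year, the k most frequent film types among films released that year."""
    result = {}
    for year in years:  # range(2010, 2016) covers 2010 through 2015 inclusive
        films = df[df["showtime"] == year]
        # Same split as question 1: "Drama/Comedy" counts once for each type.
        counts = films["category"].str.split("/").explode().value_counts()
        result[year] = counts.head(k)
    return result


movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "category": ["Drama/Comedy", "Drama/Comedy", "Drama", "Action", "Action/Drama"],
    "showtime": [2010, 2010, 2010, 2011, 2011],
})
yearly = top_types_by_year(movies, years=range(2010, 2012))
for year, counts in yearly.items():
    print(year, counts.to_dict())
```

value_counts already sorts from most to least frequent, so head(k) yields each year's top types. A timeline chart can then take one bar frame per dictionary entry.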
5. Data visualization
5.1 The relationship between film type and average rating
Bar-Mixed_bar_and_line chart:
The chart orders the film types by average rating, decreasing from left to right. The children's type has the highest average rating and horror the lowest. Drama is the most frequent type, occurring 2376 times, while absurd, surprise, and suspense are the least frequent, occurring only once each.
Funnel-Funnel_chart chart:
This chart takes the five film types with the highest average ratings from the data above: children's, film noir, absurd, "Missing data for type" (the placeholder used to fill null values), and animation.
5.2 The ten highest-rated films since 2000
Table-Table_base chart:
The table ranks the ten highest-rated films since 2000. The highest rating is 9.3, shared by Brilliant Life and Wall-E, released in 2003 and 2008 respectively.
5.3 Changes in film-type production, 2010-2015
Timeline-Timeline_bar_reversal chart:
The chart animates the production of the top five film types from 2010 to 2015. Within each year, the types are ordered by output, decreasing from top to bottom.