Crawling and visualizing Maoyan short reviews of 《悲伤逆流成河》 (Cry Me a Sad River) | Guo Jingming's most moving film in five years
2022-07-06 15:20:00 【Jane said Python】
All the source code for this article has been uploaded to GitHub.
I. My feelings
I first heard about 《悲伤逆流成河》 (Cry Me a Sad River) when a junior schoolmate posted about it on QQ Zone, which suddenly reminded me of my junior high days chasing Guo Jingming's books and reading until past 10 every night. Yesterday I watched a cam version of the film. The whole story is almost the same as the novel. The only difference is that in the original book Yi Yao commits suicide by jumping off a building, while in the film Yi Yao, under the crowd's verbal attacks and gloating stares, jumps into the river full of resentment and unwillingness to give in. As for the ending … I won't go into it. The film runs about an hour and 40 minutes without a single dull moment. I watched the cam version twice yesterday (and I'm going to find someone to watch it with again). It is also the first film that has made me want to write an article mixing technology and emotion.
Using the Maoyan short reviews of 《悲伤逆流成河》, let the data tell you that this film, 17 days into its run, deserves to be seen, and deserves to be seen twice.
II. Making it happen with technology (crawling)
1. The Maoyan short-review interface
http://maoyan.com/films/1217236
If we visit this link directly, the web page only shows the 10 hottest short reviews. How can we get all of them?
(1) Open the link above, press F12, then click the icon shown in the picture to switch the browsing mode (Responsive Design Mode, Firefox shortcut Ctrl+Shift+M) to mobile mode, and refresh the page.
(2) Switch to Google Chrome, press F12 and repeat the steps above. Once the page has loaded, scroll down through the short reviews and the page keeps loading more. Find the request whose URL contains offset and startTime; its Response contains the data we want, in JSON format.
2. Getting the short reviews
(1) Simple analysis
From the analysis above:
Request URL: http://m.maoyan.com/mmdb/comments/movie/1217236.json?_v_=yes&offset=0&startTime=0%2021%3A09%3A31
Request Method: GET
After scrolling down several times, I found the following pattern:
Request | offset | startTime |
---|---|---|
1st | 0 | 0 |
2nd | 15 | 2018-10-06 |
3rd | 30 | 2018-10-06 |
nth | 15 | 2018-10-05 |
(n+1)th | 30 | 2018-10-05 |
From this we can roughly infer that offset is the starting position of the comments returned by the interface, with 15 comments per page; for example, offset 15 returns the 15 comments from position 15 to 30. startTime is the date of the comments currently being requested, in a fixed format (2018-10-06).
In addition, the trailing %2021%3A09%3A31 of the interface URL, which is the URL-encoded form of " 21:09:31", stays the same across requests.
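The pattern above can be sketched as a small helper that builds the request URL for a given offset and date. The names `build_comment_url`, `API_BASE` and `FIXED_TIME` are hypothetical, introduced only for this illustration; the time of day is simply the one seen in the captured request.

```python
from urllib.parse import quote, unquote

# Base of the interface captured in the browser's network panel.
API_BASE = "http://m.maoyan.com/mmdb/comments/movie/1217236.json"
# Time of day that stayed constant across the captured requests.
FIXED_TIME = "21:09:31"


def build_comment_url(offset, start_date):
    """Build the short-review URL for one page (hypothetical helper)."""
    # quote() encodes the space as %20 and each colon as %3A.
    start_time = quote("{0} {1}".format(start_date, FIXED_TIME))
    return "{0}?_v_=yes&offset={1}&startTime={2}".format(API_BASE, offset, start_time)


url = build_comment_url(15, "2018-10-06")
print(url)
# Decode the startTime value back to its readable form.
print(unquote(url.split("startTime=")[1]))
```

Building the value with `quote` instead of pasting the encoded suffix keeps the date and time readable in the source, at the cost of one extra call.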
(2) Fetching the data with code
'''
date: 2018.10.06
author: The minimalist XksA
goal: crawl the Maoyan reviews of 《悲伤逆流成河》 and visualize them as a word cloud
'''
# Maoyan film introduction URL
# http://maoyan.com/films/1217236
import requests
from fake_useragent import UserAgent
import json
headers = {
    # Random User-Agent; verify_ssl is accepted by older fake_useragent releases
    # and may need to be dropped on newer ones
    "User-Agent": UserAgent(verify_ssl=False).random,
    "Host": "m.maoyan.com",
    "Referer": "http://m.maoyan.com/movie/1217236/comments?_v_=yes"
}
# Maoyan short-review interface
offset = 0
# The film was released on 2018.9.21
startTime = '2018-09-21'
comment_api = 'http://m.maoyan.com/mmdb/comments/movie/1217236.json?_v_=yes&offset={0}&startTime={1}%2021%3A09%3A31'.format(offset, startTime)
# Send the GET request
response_comment = requests.get(comment_api, headers=headers)
# Decode the JSON text of the response into a dict
json_comment = response_comment.text
json_comment = json.loads(json_comment)
print(json_comment)
The returned data:
(3) A brief introduction to the data
Field | Meaning |
---|---|
cityName | Reviewer's city |
content | Comment content |
gender | Reviewer's gender |
nickName | Reviewer's nickname |
userLevel | Reviewer's Maoyan user level |
score | Score (out of five stars) |
(4) Data extraction
# Extract the fields and store them (a method of the crawler class)
def get_data(self, json_comment):
    json_response = json_comment["cmts"]  # list of comment dicts
    list_info = []
    for data in json_response:
        cityName = data["cityName"]
        content = data["content"]
        # Some comments carry no gender field; fall back to 0 (unknown)
        if "gender" in data:
            gender = data["gender"]
        else:
            gender = 0
        nickName = data["nickName"]
        userLevel = data["userLevel"]
        score = data["score"]
        list_one = [self.time, nickName, gender, cityName, userLevel, score, content]
        list_info.append(list_one)
    self.file_do(list_info)
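The extraction step can also be tried offline, without the crawler class. The sketch below assumes the response has the shape described above (a `cmts` list of dictionaries); the sample payload and the function name `extract_rows` are made up for illustration, and `dict.get` stands in for the explicit `"gender" in data` check.

```python
def extract_rows(json_comment, comment_date):
    """Turn one decoded response into a list of CSV-ready rows."""
    rows = []
    for data in json_comment.get("cmts", []):
        rows.append([
            comment_date,
            data["nickName"],
            data.get("gender", 0),   # 0 when the comment has no gender field
            data["cityName"],
            data["userLevel"],
            data["score"],
            data["content"],
        ])
    return rows


# Made-up sample with the same keys as the fields listed in the table.
sample = {
    "cmts": [
        {"nickName": "user_a", "gender": 2, "cityName": "Shenzhen",
         "userLevel": 2, "score": 5, "content": "Worth watching twice."},
        {"nickName": "user_b", "cityName": "Chengdu",
         "userLevel": 3, "score": 4.5, "content": "Very moving."},
    ]
}

for row in extract_rows(sample, "2018-10-06"):
    print(row)
```

Returning the rows instead of writing them directly keeps parsing separate from storage, which makes the parsing easy to check on its own.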
3. Storing the data
import os
import csv
import pandas as pd
# Store the rows in a CSV file
def file_do(list_info):
    # Get the file size (the file must already exist, even if it is empty)
    file_size = os.path.getsize(r'G:\maoyan\maoyan.csv')
    if file_size == 0:
        # Header
        name = ['Comment date', "Reviewer's nickname", 'Gender', 'City', 'Maoyan user level', 'Score', 'Comment content']
        # Create the DataFrame object
        file_test = pd.DataFrame(columns=name, data=list_info)
        # Write the data; utf_8_sig matches the encoding used when the file is read back
        file_test.to_csv(r'G:\maoyan\maoyan.csv', encoding='utf_8_sig', index=False)
    else:
        with open(r'G:\maoyan\maoyan.csv', 'a+', newline='', encoding='utf_8_sig') as file_test:
            # Append to the existing file
            writer = csv.writer(file_test)
            # Write the rows
            writer.writerows(list_info)
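One fragile spot in `file_do` is that `os.path.getsize` raises an error when the CSV file does not exist yet. Below is a minimal sketch of a more forgiving variant that uses only the standard `csv` module. The function name `append_rows` and the English header names are illustrative assumptions, not the author's code.

```python
import csv
import os

# Illustrative header; the column order follows the rows built in get_data.
HEADER = ["date", "nickName", "gender", "cityName", "userLevel", "score", "content"]


def append_rows(path, rows, header=HEADER):
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)
```

A call such as `append_rows("maoyan.csv", rows)` can then be repeated after every page without checking the file first.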
4. Encapsulating the code
See the original post for the encapsulated code that crawls the Maoyan film data.
Maoyan's short reviews have almost no anti-crawling protection. The crawl broke off twice along the way; each time I adjusted the parameters and simply ran it again, and my IP was never banned.
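The encapsulated crawler is not reproduced here, but the paging schedule it needs can be sketched. This is an illustration under stated assumptions: it supposes every day is walked in pages of 15 starting from offset 0, up to the 1005 comments per day reported in the daily-count analysis of this article (67 pages of 15). The names `page_schedule`, `PAGE_SIZE` and `DAILY_CAP` are hypothetical.

```python
from datetime import date, timedelta

PAGE_SIZE = 15     # comments returned per request
DAILY_CAP = 1005   # assumed cap per day: 67 pages of 15


def page_schedule(first_day, last_day):
    """Yield (offset, 'YYYY-MM-DD') pairs, newest day first."""
    day = last_day
    while day >= first_day:
        for offset in range(0, DAILY_CAP, PAGE_SIZE):
            yield offset, day.isoformat()
        day -= timedelta(days=1)


# From the release date to the day the data was collected.
schedule = list(page_schedule(date(2018, 9, 21), date(2018, 10, 6)))
print(len(schedule))    # number of requests for the whole period
print(schedule[:3])
```

Each pair would be formatted into `comment_api`. If the crawl is interrupted, restarting the schedule from the last date reached avoids fetching earlier pages again.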
5. Results of the run
III. Making it happen with technology (data analysis and visualization)
1. Reading the data
- Code:
import csv
# Module-level lists filled by read_csv() (assumed; their definitions are not shown in the original)
time, nickName, gender, cityName, userLevel, score = [], [], [], [], [], []
def read_csv():
    content = ''
    # Open the CSV file
    with open(r'G:\maoyan\maoyan.csv', 'r', encoding='utf_8_sig', newline='') as file_test:
        # Read it row by row
        reader = csv.reader(file_test)
        i = 0
        for row in reader:
            # Skip the header row
            if i != 0:
                time.append(row[0])
                nickName.append(row[1])
                gender.append(row[2])
                cityName.append(row[3])
                userLevel.append(row[4])
                score.append(row[5])
                content = content + row[6]
                # print(row)
            i = i + 1
        print('Total: ' + str(i - 1) + ' records')
    return content
- Running results:
Total: 15195 records
2. Visualizing the gender distribution of reviewers
- Code:
# Visualize the gender distribution of reviewers
def sex_distribution(gender):
    # print(gender)
    from pyecharts import Pie  # pyecharts 0.5.x API
    list_num = []
    list_num.append(gender.count('0'))  # unknown
    list_num.append(gender.count('1'))  # male
    list_num.append(gender.count('2'))  # female
    attr = ["Other", "Male", "Female"]
    pie = Pie("Gender pie chart")
    pie.add("", attr, list_num, is_label_show=True)
    # Raw string so the backslashes in the Windows path are kept literally
    pie.render(r"H:\PyCoding\spider_maoyan\picture\sex_pie.html")
- Running results:
The data shows that most reviewers did not state a gender in their profile when registering on Maoyan. Among those who did, the raters are mainly women, which is understandable: this is a fairly literary, niche youth film that girls may prefer, while boys may prefer action movies.
3. Visualizing the distribution of reviewers' cities
- Code:
# Visualize the distribution of reviewers' cities
def city_distribution(cityName):
    city_list = list(set(cityName))
    city_dict = {city_list[i]: 0 for i in range(len(city_list))}
    for i in range(len(city_list)):
        city_dict[city_list[i]] = cityName.count(city_list[i])
    # Sort by count (the dictionary values), largest first
    sort_dict = sorted(city_dict.items(), key=lambda d: d[1], reverse=True)
    city_name = []
    city_num = []
    for i in range(len(sort_dict)):
        city_name.append(sort_dict[i][0])
        city_num.append(sort_dict[i][1])
    from pyecharts import Bar  # pyecharts 0.5.x API
    bar = Bar("Reviewer city distribution")
    bar.add("", city_name, city_num, is_label_show=True, is_datazoom_show=True)
    bar.render(r"H:\PyCoding\spider_maoyan\picture\city_bar.html")
# Map visualization
def render_city(cities):
    # See the original post for the complete code of this function
    pass
- Running results:
The chart shows that most viewers who rated the film are located in southeastern China. By city, Shenzhen, Chengdu, Beijing, Wuhan and Shanghai take the top five places. Because the chart includes many prefecture-level cities, the data is spread out (the largest count is only a few hundred). Even so, we can see that these people are mostly in first- and second-tier cities: they have spending power and are willing to spend on holidays. Having money is nice.
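The counting loop in `city_distribution` calls `list.count` once per distinct city, which rescans the whole list each time. As an alternative sketch (not the author's code), `collections.Counter` from the standard library produces the same sorted name and count lists in a single pass. The function name `top_cities` and the sample list are hypothetical.

```python
from collections import Counter


def top_cities(city_names, n=None):
    """Return (names, counts) sorted by count, largest first."""
    ranked = Counter(city_names).most_common(n)
    names = [name for name, _ in ranked]
    counts = [count for _, count in ranked]
    return names, counts


# Made-up sample; the real input would be the cityName list read from the CSV.
sample = ["Shenzhen", "Chengdu", "Shenzhen", "Beijing", "Chengdu", "Shenzhen"]
names, counts = top_cities(sample)
print(names, counts)
```

Passing `n=5` would return only the five most frequent cities, which is enough to reproduce a "top five" statement like the one above.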
4. Visual analysis of the total number of comments per day
- Code:
# Visual analysis of the total number of comments per day
def time_num_visualization(time):
    from pyecharts import Line  # pyecharts 0.5.x API
    time_list = list(set(time))
    time_dict = {time_list[i]: 0 for i in range(len(time_list))}
    for i in range(len(time_list)):
        time_dict[time_list[i]] = time.count(time_list[i])
    # Sort by date (the dictionary keys), oldest first
    sort_dict = sorted(time_dict.items(), key=lambda d: d[0], reverse=False)
    time_name = []
    time_num = []
    print(sort_dict)
    for i in range(len(sort_dict)):
        time_name.append(sort_dict[i][0])
        time_num.append(sort_dict[i][1])
    line = Line("Comments per day line chart")
    line.add(
        "Date - comments",
        time_name,
        time_num,
        is_fill=True,
        area_color="#000",
        area_opacity=0.3,
        is_smooth=True,
    )
    line.render(r"H:\PyCoding\spider_maoyan\picture\c_num_line.html")
- Running results:
Because the chart does not display all of the data, the change in the number of comments is hard to see, but basically the number of comments every day is 1005. My guess is that Maoyan limits the number of comments shown per day, or that my requests were limited while crawling. In the 16 days from 9.21 to 10.6, the number of new comments reached this maximum every day, so it is fair to say the film's popularity has not declined.
5. Visualizing reviewers' Maoyan user levels and scores
- Code:
# Visualize reviewers' Maoyan user levels and scores
def level_score_visualization(userLevel, score):
    from pyecharts import Pie  # pyecharts 0.5.x API
    # Count how many reviewers fall into each user level
    userLevel_list = list(set(userLevel))
    userLevel_num = []
    for i in range(len(userLevel_list)):
        userLevel_num.append(userLevel.count(userLevel_list[i]))
    # Count how many times each score was given
    score_list = list(set(score))
    score_num = []
    for i in range(len(score_list)):
        score_num.append(score.count(score_list[i]))
    pie01 = Pie("User level ring chart", title_pos='center', width=900)
    pie01.add(
        "Level",
        userLevel_list,
        userLevel_num,
        radius=[40, 75],
        label_text_color=None,
        is_label_show=True,
        legend_orient="vertical",
        legend_pos="left",
    )
    pie01.render(r"H:\PyCoding\spider_maoyan\picture\level_pie.html")
    pie02 = Pie("Score rose chart", title_pos='center', width=900)
    pie02.add(
        "Score",
        score_list,
        score_num,
        center=[50, 50],
        is_random=True,
        radius=[30, 75],
        rosetype="area",
        is_legend_show=False,
        is_label_show=True,
    )
    pie02.render(r"H:\PyCoding\spider_maoyan\picture\score_pie.html")
- Running results:
The visualization shows that 47.08% of the reviewers are Maoyan level-2 users, 31.5% are level-3 users, users at level 4 and above account for 11.82%, and level 0 or level 1 users (who can be regarded as newly registered) account for 9.6%. This suggests that very few raters are throwaway accounts; most are established Maoyan users, so the ratings and comments should be fairly credible.
As for the scores, with five stars as full marks, scores of 3 stars and above account for 93.8%, scores of 4 stars and above account for 87.7%, and 5-star (full-mark) scores account for 62.82%. Praise for the film is close to unanimous.
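The cumulative shares quoted above ("3 stars and above", "4 stars and above") are simple ratios. Here is a minimal sketch of the arithmetic, assuming the scores are the strings read from the CSV. The function name `share_at_least` and the sample scores are made up, so the printed numbers do not reproduce the article's percentages.

```python
def share_at_least(scores, threshold):
    """Percentage of scores that are >= threshold, rounded to 2 decimals."""
    values = [float(s) for s in scores]
    if not values:
        return 0.0
    hits = sum(1 for v in values if v >= threshold)
    return round(100.0 * hits / len(values), 2)


# Made-up sample of eight scores, stored as strings like CSV cells.
sample_scores = ["5", "5", "4.5", "4", "3", "2", "5", "1"]
for stars in (3, 4, 5):
    print(stars, share_at_least(sample_scores, stars))
```

Because the shares are cumulative, the value for a lower threshold can never be smaller than the value for a higher one, which is a quick sanity check on figures like those above.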
6. Visual analysis of the review text
- Code:
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
from os import path
from wordcloud import WordCloud, ImageColorGenerator
# Project directory holding static/ and picture/ (assumed; not shown in the original)
d = path.dirname(__file__)
# Word segmentation function
def jiebaclearText(text):
    # See the original post for the complete code of this function
    pass
# Generate the word cloud
def make_wordcloud(text1):
    # Remove the film title from the text
    text1 = text1.replace("悲伤逆流成河", "")
    bg = plt.imread(d + r"/static/znn1.jpg")
    # Generate
    wc = WordCloud(  # FFFAE3
        background_color="white",  # white background (the default is black)
        width=890,  # width of the picture
        height=600,  # height of the picture
        mask=bg,  # shape the cloud with the background image
        # margin=10,  # margin of the picture
        max_font_size=150,  # largest font size displayed
        random_state=50,  # fixed seed so the result is reproducible
        font_path=d + '/static/simkai.ttf'  # font that can render Chinese
    ).generate_from_text(text1)
    # Font properties for the figure
    my_font = fm.FontProperties(fname=d + '/static/simkai.ttf')
    # Take the colors from the background image
    bg_color = ImageColorGenerator(bg)
    # Start drawing
    plt.imshow(wc.recolor(color_func=bg_color))
    # Hide the axes
    plt.axis("off")
    # Show the word cloud
    plt.show()
    # Save the word cloud
    wc.to_file(d + r"/picture/word_cloud.png")
- Figure
- Running results:
Overall, the verdict is: a film made with conscience, good, very good-looking, super good-looking, cried while watching, moving, worth seeing … almost 100% praise. The themes stand out: school violence, sinister faces, and the exposure of the rotten "it's none of my business, so I stay out of it" mentality, all of which highlight today's impetuous society and its impetuous atmosphere.
IV. What I want to say
First, I have already written most of what I want to say in "My feelings". I highly recommend going to the cinema to see it. Besides reflecting campus violence and the impetuousness of today's junior high, high school and college students, and even adults, 《悲伤逆流成河》 also reflects, intentionally or not, how precious friendship was in that era. Like 《我不是药神》 (Dying to Survive), it even touches on the shortcomings of medical care. When Yi Yao goes to the male doctor in the small clinic, he says "100 per visit, and after 10 visits your pain will be completely gone"; I still remember Yi Yao's confused eyes. Then there is Yi Yao's mother. Her work is not a dirty business; she simply gives massages to those "decadent" people. There are many such scenes. Yi Yao's mother says "Every time I do business I deliberately put your underwear away, for fear that those scum will find out about you". When Yi Yao urgently needs money, she finds the registration fee her mother has saved, in notes from one yuan to 100, such a thick stack. Yi Yao's mother slaps herself in the face when she learns that Yi Yao contracted the disease because of her. Qi Ming's mother looks on in surprise when she sees Yi Yao's mother holding Yi Yao … there are too many to list. Finally, Yi Yao says "I don't know who the murderer of Gu Senxiang is, but you all know who my killer is", then turns and runs into the sea. I don't know whether that is liberation or foolishness. We can only blame ourselves for being timid and for doing whatever others do.
The world has never lacked warmth, but everyone is too eager, really far too eager, to get warmth for themselves: forming small cliques, building a "gang of four", "giving gifts" … I don't think it is only children who play this way; many adults are "playing" along too.
Whether you are a child, a junior high school student, a senior high school student, a college student, an adult, an employee, an official … or anyone else, please care for the vulnerable people around you, please remember to set a good example for the next generation, and please remember not to "plunder because of need". I believe that although the evil in the world cannot be completely eliminated, we can still discover as much goodness and beauty as possible.