Visual analysis of crawled Maoyan short comments on 《Sorrow Flows Back into a River》 | Guo Jingming's most moving film in five years

2022-07-06 15:20:00 Jane said Python

Welcome to follow the WeChat official account: Jane said Python
Account ID: xksnh888
Follow and reply 1024 to get a selection of e-books for learning programming.

All the source code for this article has been uploaded to GitHub.

One. My feelings

I first heard about 《Sorrow Flows Back into a River》 from a post a junior schoolmate shared on QQ Zone, and it suddenly reminded me of junior high school, when I followed Guo Jingming's books and read every night until after 10. Yesterday I watched a cam copy of the film. The story is almost the same as the novel; the only difference is that in the book Yi Yao commits suicide by jumping off a building, while in the film, under a barrage of verbal attacks and gloating stares, Yi Yao throws herself into the river, full of bitterness and resentment. As for the ending… I won't say how it plays out. The film runs about an hour and 40 minutes without a single dull moment. I watched the cam copy twice yesterday (and I plan to find someone to watch it with again). It is also the first film that has made me want to write an article that mixes technology and emotion.

Using Maoyan's short comments on 《Sorrow Flows Back into a River》, let the data tell you: 17 days after its release, this film deserves to be seen, and is worth seeing twice.

Two. Making it happen with technology (crawling)

1. The Maoyan short comment interface

http://maoyan.com/films/1217236

If we visit this page directly in a desktop browser, we can only see the 10 hottest short comments. How do we get all of them?
(1) Open the link above, press F12, then click the icon shown in the picture to switch the browsing mode to mobile (Responsive Design Mode, Firefox shortcut Ctrl+Shift+M), and refresh the page.
[Figure: first step]
[Figure: after refreshing]
(2) Switch to Chrome, press F12 and do the same. Scroll down once the short comments have loaded and the page keeps loading more. Find the request whose URL contains offset and startTime; its Response holds the data we want, in JSON format.
[Figure: the real comment interface]

2. Get short comments

(1) Simple analysis

From the analysis above:
Request URL: http://m.maoyan.com/mmdb/comments/movie/1217236.json?_v_=yes&offset=0&startTime=0%2021%3A09%3A31
Request Method: GET

After scrolling down several times, I found the following pattern:

Request      offset   startTime
1st          0        0
2nd          15       2018-10-06
3rd          30       2018-10-06
n-th         15       2018-10-05
(n+1)-th     30       2018-10-05

From this we can roughly infer that offset is the starting position of the comments returned by the interface, with 15 comments per page; for example, offset=15 returns the 15 comments from position 15 to 30. startTime is the date of the comments being requested, in a fixed format (2018-10-06).
In addition, the trailing %2021%3A09%3A31 of the URL (the URL-encoded time " 21:09:31") stays the same for every request.
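
The rule inferred above can be sketched in a few lines. This is only an illustration: the helper name build_comment_url and its 0-based page argument are assumptions, not part of the original script; the URL pattern and the _v_=yes parameter are taken from the captured request and the script in this article.

```python
from urllib.parse import quote, unquote

# URL pattern taken from the captured request; 1217236 is the film's Maoyan id
BASE = "http://m.maoyan.com/mmdb/comments/movie/1217236.json"
TIME_SUFFIX = " 21:09:31"  # decoded form of %2021%3A09%3A31


def build_comment_url(page, date, page_size=15):
    """Hypothetical helper: URL for the page-th batch (0-based) of comments on `date`."""
    offset = page * page_size
    start_time = quote(date + TIME_SUFFIX)  # space -> %20, colon -> %3A
    return "{0}?_v_=yes&offset={1}&startTime={2}".format(BASE, offset, start_time)


print(unquote("%2021%3A09%3A31"))
print(build_comment_url(2, "2018-10-06"))
```

Looping page over 0, 1, 2, … for each date would then walk through the batches of 15 comments described in the table.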
(2) Fetching the data with code

'''
date:   2018.10.06
author: The minimalist XksA
goal:   Crawl Maoyan comments on 《Sorrow Flows Back into a River》 and visualize them as a word cloud
'''

# Maoyan movie page url
# http://maoyan.com/films/1217236

import requests
from fake_useragent import UserAgent
import json

headers = {
    "User-Agent": UserAgent(verify_ssl=False).random,
    "Host": "m.maoyan.com",
    "Referer": "http://m.maoyan.com/movie/1217236/comments?_v_=yes"
}
# Maoyan short comment interface
offset = 0
# The movie was released on 2018.9.21
startTime = '2018-09-21'
comment_api = 'http://m.maoyan.com/mmdb/comments/movie/1217236.json?_v_=yes&offset={0}&startTime={1}%2021%3A09%3A31'.format(offset, startTime)
# Send the GET request
response_comment = requests.get(comment_api, headers=headers)
json_comment = response_comment.text
# Parse the JSON text into a Python dict
json_comment = json.loads(json_comment)
print(json_comment)

The returned data:
[Figure: JSON data]
(3) A brief introduction to the data

Field       Meaning
cityName    Reviewer's city
content     Comment content
gender      Reviewer's gender
nickName    Reviewer's nickname
userLevel   Reviewer's Maoyan level
score       Score (out of five stars)

(4) Data Extraction

#  Get data and store 
def get_data(self, json_comment):
    json_response = json_comment["cmts"]  # list of comments
    list_info = []
    for data in json_response:
        cityName = data["cityName"]
        content = data["content"]
        # gender is missing when the reviewer did not fill it in; use 0 for unknown
        gender = data.get("gender", 0)
        nickName = data["nickName"]
        userLevel = data["userLevel"]
        score = data["score"]
        # One row per comment, in the same column order as the CSV header
        list_one = [self.time, nickName, gender, cityName, userLevel, score, content]
        list_info.append(list_one)
    self.file_do(list_info)
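
The extraction step can also be tried without network access. A minimal sketch, assuming a response shaped like the one above (a "cmts" list of dicts); the function name extract_rows, the sample record, and the use of dict.get for every field are illustrative assumptions rather than the author's code.

```python
def extract_rows(json_comment, date):
    """Turn the "cmts" list into rows of
    [date, nickName, gender, cityName, userLevel, score, content]."""
    rows = []
    for data in json_comment.get("cmts", []):
        rows.append([
            date,
            data.get("nickName", ""),
            data.get("gender", 0),      # 0 = not specified, as in get_data above
            data.get("cityName", ""),
            data.get("userLevel", 0),
            data.get("score", 0),
            data.get("content", ""),
        ])
    return rows


# Made-up record, only to show the shape of the result
sample = {"cmts": [{"nickName": "user_a", "cityName": "Shenzhen",
                    "userLevel": 2, "score": 5, "content": "Worth seeing"}]}
print(extract_rows(sample, "2018-10-06"))
```

Using dict.get with a default for every field means a record with a missing key yields a placeholder instead of raising KeyError, at the cost of hiding incomplete records.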

3. Store the data

#  Storage file 
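import os
import csv
import pandas as pd

# Like get_data, this is a method of the encapsulated crawler class.
# The CSV file must already exist (it may be empty).
def file_do(self, list_info):
    # Get the file size
    file_size = os.path.getsize(r'G:\maoyan\maoyan.csv')
    if file_size == 0:
        # Header row
        name = ['Comment date', "Reviewer's nickname", 'Gender', 'City', 'Maoyan level', 'Score', 'Comment content']
        # Create the DataFrame object
        file_test = pd.DataFrame(columns=name, data=list_info)
        # Write the header and the first batch; utf_8_sig matches the encoding used when the file is read back
        file_test.to_csv(r'G:\maoyan\maoyan.csv', encoding='utf_8_sig', index=False)
    else:
        # Append later batches to the same file with a matching encoding
        with open(r'G:\maoyan\maoyan.csv', 'a+', encoding='utf-8', newline='') as file_test:
            writer = csv.writer(file_test)
            writer.writerows(list_info)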
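
The same "write the header once, then append" logic can be sketched with the standard csv module alone. This is an alternative under stated assumptions, not the author's implementation: append_rows, HEADER and the temporary demo path are hypothetical names, and the sketch also creates the file when it does not exist yet.

```python
import csv
import os
import tempfile

HEADER = ["date", "nickName", "gender", "cityName", "userLevel", "score", "content"]


def append_rows(csv_path, rows):
    """Append rows to csv_path, writing the header first if the file is new or empty."""
    need_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(HEADER)
        writer.writerows(rows)


# Demo in a temporary directory rather than the author's own data folder
demo_path = os.path.join(tempfile.mkdtemp(), "maoyan.csv")
append_rows(demo_path, [["2018-10-06", "user_a", 0, "Shenzhen", 2, 5, "Worth seeing"]])
append_rows(demo_path, [["2018-10-06", "user_b", 2, "Chengdu", 3, 4, "Moving"]])
with open(demo_path, newline="", encoding="utf-8") as f:
    print(list(csv.reader(f)))
```

Keeping one encoding for every write avoids decoding errors when the file is read back later.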

4. Encapsulating the code

Click through to the original article to get the encapsulated code for crawling the Maoyan data.
Maoyan's short comment interface has almost no anti-crawling measures. The crawl broke off twice along the way; I adjusted the parameters and simply ran it again, and my IP was not blocked.
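
The author restarted the script by hand after each interruption. One way to automate that is a small retry wrapper; the sketch below is an assumption-laden illustration, not part of the original crawler: fetch_with_retry and flaky_fetch are hypothetical names, and flaky_fetch merely simulates a request that fails twice.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), making at most `retries` attempts and
    waiting `delay` seconds between failed attempts."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception:  # a real crawler would catch requests.RequestException instead
            if attempt == retries:
                raise
            time.sleep(delay)


# Stand-in for requests.get that fails twice, then succeeds
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated dropped connection")
    return {"cmts": []}


print(fetch_with_retry(flaky_fetch, "http://example.invalid/", delay=0))
```

In a real run, fetch would wrap requests.get with the headers shown earlier, and a non-zero delay keeps the request rate modest.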

5. Results

[Figure: the crawled data]

Three. Making it happen with technology (data analysis and visualization)

1. Extract the data

  • Code :
import csv

# Columns collected from the CSV, shared by the visualization functions below
time, nickName, gender, cityName, userLevel, score = [], [], [], [], [], []

def read_csv():
    content = ''
    # Read the file contents
    with open(r'G:\maoyan\maoyan.csv', 'r', encoding='utf_8_sig', newline='') as file_test:
        reader = csv.reader(file_test)
        i = 0
        for row in reader:
            # Skip the header row
            if i != 0:
                time.append(row[0])
                nickName.append(row[1])
                gender.append(row[2])
                cityName.append(row[3])
                userLevel.append(row[4])
                score.append(row[5])
                # Concatenate all comment text for the word cloud
                content = content + row[6]
            i = i + 1
        print('Total: ' + str(i - 1) + ' records')
        return content
  • Running results :
Total: 15195 records

2. Visualization of gender distribution of commentators

  • Code :
#  Visualization of gender distribution of commentators 
def sex_distribution(gender):
    from pyecharts import Pie  # pyecharts 0.5.x API
    list_num = []
    list_num.append(gender.count('0'))  # unknown
    list_num.append(gender.count('1'))  # male
    list_num.append(gender.count('2'))  # female
    attr = ["Unknown", "Male", "Female"]
    pie = Pie("Gender pie chart")
    pie.add("", attr, list_num, is_label_show=True)
    # Raw string so the backslashes in the Windows path are kept literally
    pie.render(r"H:\PyCoding\spider_maoyan\picture\sex_pie.html")
  • Running results :

[Figure: gender distribution]

According to the data, most reviewers did not fill in a gender in their profile when registering on Maoyan. Among those who did, the reviewers are mainly female, which is understandable: this is a rather literary, niche youth film that girls may be more drawn to, while boys may prefer action movies.

3. Visualization of the distribution of commentators' cities

  • Code :
#  Visualization of the distribution of commentators' cities 
def city_distribution(cityName):
    city_list = list(set(cityName))
    city_dict = {city_list[i]: 0 for i in range(len(city_list))}
    for i in range(len(city_list)):
        city_dict[city_list[i]] = cityName.count(city_list[i])
    # Sort by count (the dictionary values), largest first
    sort_dict = sorted(city_dict.items(), key=lambda d: d[1], reverse=True)
    city_name = []
    city_num = []
    for i in range(len(sort_dict)):
        city_name.append(sort_dict[i][0])
        city_num.append(sort_dict[i][1])

    from pyecharts import Bar  # pyecharts 0.5.x API
    bar = Bar("Reviewer city distribution")
    bar.add("", city_name, city_num, is_label_show=True, is_datazoom_show=True)
    bar.render(r"H:\PyCoding\spider_maoyan\picture\city_bar.html")

# Map visualization
def render_city(cities):
    # Click through to the original article for the complete code of this function
    pass
  • Running results :

[Figure: bar chart of city distribution]
[Figure: geographical distribution]

The charts show that most viewers who rated the film are located in southeastern China. By city, Shenzhen, Chengdu, Beijing, Wuhan and Shanghai make up the top five. Because the chart includes many prefecture-level cities, the data is not concentrated (the largest count is only a few hundred). Still, we can see that these reviewers are mostly in first- and second-tier cities: they have spending power and are willing to spend it on holidays. Having money is nice.
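
Counting with list.count inside a loop rescans the whole list once per city. As a possible alternative, collections.Counter tallies everything in a single pass and can return the result already sorted. The helper name count_cities and the sample list below are illustrative assumptions, not the crawled data.

```python
from collections import Counter


def count_cities(city_names):
    """Return (names, counts) sorted by count, largest first."""
    ranked = Counter(city_names).most_common()
    names = [name for name, _ in ranked]
    counts = [count for _, count in ranked]
    return names, counts


# Made-up sample, not the crawled data
names, counts = count_cities(
    ["Shenzhen", "Chengdu", "Shenzhen", "Beijing", "Shenzhen", "Chengdu"]
)
print(names, counts)
```

The two returned lists can be passed to the bar chart in place of city_name and city_num. The same pattern applies to the per-day comment counts, sorting by date instead of by count.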

4. Visual analysis of the total number of daily comments

  • Code :
#  Visual analysis of the total number of daily comments 
def time_num_visualization(time):
    from pyecharts import Line  # pyecharts 0.5.x API
    time_list = list(set(time))
    time_dict = {time_list[i]: 0 for i in range(len(time_list))}
    for i in range(len(time_list)):
        time_dict[time_list[i]] = time.count(time_list[i])
    # Sort by date (the dictionary keys), earliest first
    sort_dict = sorted(time_dict.items(), key=lambda d: d[0], reverse=False)
    time_name = []
    time_num = []
    print(sort_dict)
    for i in range(len(sort_dict)):
        time_name.append(sort_dict[i][0])
        time_num.append(sort_dict[i][1])

    line = Line("Number of comments per day")
    line.add(
        "Date - comments",
        time_name,
        time_num,
        is_fill=True,
        area_color="#000",
        area_opacity=0.3,
        is_smooth=True,
    )
    line.render(r"H:\PyCoding\spider_maoyan\picture\c_num_line.html")
  • Running results :

[Figure: line chart of daily comments]

Because the chart does not display all of the data, the trend in the number of comments is hard to see, but the count for nearly every day is 1005. My guess is that Maoyan caps the number of comments available per day, or that my crawl was limited. For each of the 16 days from 9.21 to 10.6, the number of new comments reached this maximum, so the film's popularity has not declined.

5. Visualization of reviewers' Maoyan levels and scores

  • Code :
#  Reviewer cat's eye rating 、 Scoring Visualization 
def level_score_visualization(userLevel, score):
    from pyecharts import Pie  # pyecharts 0.5.x API
    # Count how many reviewers fall into each user level
    userLevel_list = list(set(userLevel))
    userLevel_num = []
    for i in range(len(userLevel_list)):
        userLevel_num.append(userLevel.count(userLevel_list[i]))

    # Count how many times each score was given
    score_list = list(set(score))
    score_num = []
    for i in range(len(score_list)):
        score_num.append(score.count(score_list[i]))

    pie01 = Pie("Level pie chart", title_pos='center', width=900)
    pie01.add(
        "Level",
        userLevel_list,
        userLevel_num,
        radius=[40, 75],
        label_text_color=None,
        is_label_show=True,
        legend_orient="vertical",
        legend_pos="left",
    )
    pie01.render(r"H:\PyCoding\spider_maoyan\picture\level_pie.html")
    pie02 = Pie("Score rose chart", title_pos='center', width=900)
    pie02.add(
        "Score",
        score_list,
        score_num,
        center=[50, 50],
        is_random=True,
        radius=[30, 75],
        rosetype="area",
        is_legend_show=False,
        is_label_show=True,
    )
    pie02.render(r"H:\PyCoding\spider_maoyan\picture\score_pie.html")
  • Running results :

[Figure: level distribution]
[Figure: score distribution]

The visualization shows that 47.08% of the reviewers are level-2 Maoyan users, 31.5% are level-3 users, users at level 4 and above account for 11.82%, and level-0 or level-1 users (who can be regarded as newly registered) account for 9.6%. So very few ratings come from fresh accounts; the reviewers are basically long-time Maoyan users, and the ratings and comments are unlikely to have been artificially inflated.
As for the scores (out of five stars), scores of 3 stars and above account for 93.8%, 4 stars and above for 87.7%, and 5 stars (full marks) for 62.82%. Reviewers are close to unanimous in their praise of the film.
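
The cumulative shares quoted above ("3 stars and above", "4 stars and above", and so on) can be computed directly from the score column. A minimal sketch, assuming the scores arrive as strings because they were read from the CSV; the function name share_at_least and the ten sample scores are made up for illustration and are not the crawled data.

```python
def share_at_least(scores, threshold):
    """Percentage of scores that are >= threshold, rounded to 2 decimals."""
    values = [float(s) for s in scores]  # the CSV reader yields strings
    if not values:
        return 0.0
    hits = sum(1 for v in values if v >= threshold)
    return round(100.0 * hits / len(values), 2)


# Made-up sample of ten scores, not the crawled data
sample_scores = ["5", "5", "4.5", "4", "3", "5", "2", "5", "5", "1"]
for stars in (3, 4, 5):
    print(stars, share_at_least(sample_scores, stars))
```

Calling share_at_least(score, 4) on the score list filled by read_csv would give the "4 stars and above" figure for the real data.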

6. Visual analysis of commentators' comments

  • Code :
import os
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
from wordcloud import WordCloud, ImageColorGenerator

# Assumption: d is the directory containing this script (it is not defined in this excerpt)
d = os.path.dirname(__file__)

# Word segmentation function
def jiebaclearText(text):
    # Click through to the original article for the complete code of this function
    pass

# Generate the word cloud
def make_wordcloud(text1):
    # Remove the film title itself (悲伤逆流成河) so it does not dominate the cloud
    text1 = text1.replace("悲伤逆流成河", "")
    bg = plt.imread(d + r"/static/znn1.jpg")
    wc = WordCloud(
        background_color="white",  # white background; the default is black
        width=890,  # picture width (ignored when a mask is given)
        height=600,  # picture height (ignored when a mask is given)
        mask=bg,  # shape of the cloud
        # margin=10,  # margin around the picture
        max_font_size=150,  # maximum font size
        random_state=50,  # seed for the random layout and colors
        font_path=d + '/static/simkai.ttf'  # a font that supports Chinese is required
    ).generate_from_text(text1)
    # Font for any text drawn on the figure
    my_font = fm.FontProperties(fname=d + '/static/simkai.ttf')
    # Take the word colors from the background image
    bg_color = ImageColorGenerator(bg)
    # Draw the recolored cloud
    plt.imshow(wc.recolor(color_func=bg_color))
    # Hide the axes
    plt.axis("off")
    # Save the word cloud
    wc.to_file(d + r"/picture/word_cloud.png")
  • Figure

[Figure: "I really like it: the film is full of phones and laughter, the feeling of first love"]

  • Running results :

[Figure: word cloud]

Overall, the most prominent words are: a film with a conscience, good, very good, beautiful, gorgeous, cried while watching, moving, worth seeing… almost 100% praise. The theme stands out: school violence, sinister faces, and the rotten "it's none of my business" mentality of looking the other way are all exposed, highlighting how restless today's society and its atmosphere have become.

Four. What I want to say

First, I have already written most of what I want to say in "My feelings". I strongly recommend going to the cinema to see it. Besides reflecting campus violence and the restlessness of today's junior high, high school and university students, and even adults, 《Sorrow Flows Back into a River》 also shows, intentionally or not, how precious friendship was in that era. There are even details, as in 《Dying to Survive》, that touch on the problems of medical care: Yi Yao goes to a male doctor at a small clinic, who tells her "100 yuan a session; after 10 sessions your pain will be completely gone", and I still remember the confusion in her eyes.
Then there is Yi Yao's mother. Her work is not some dirty trade; she simply gives ordinary massages to those "decadent" people. Many scenes stand out: her mother saying "Every time I have a customer, I deliberately put your underwear away, for fear that those scum will find out about you"; Yi Yao, desperate for money, finding the registration fee her mother had saved up, in notes from one yuan to 100, in such a thick stack; her mother slapping herself in the face on learning that Yi Yao caught the disease because of her; the surprise in Qi Ming's mother's eyes when she sees Yi Yao's mother holding Yi Yao… there are too many to list. Finally Yi Yao says "I don't know who killed Gu Senxiang, but you know who killed me", then turns and runs into the sea. I don't know whether that is release or foolishness. We can only blame ourselves for being timid and doing whatever others do.
The world has never lacked warmth, but everyone, really, wants warmth far too much: forming cliques, building a "gang of four", "giving gifts"… I don't think this is only children's play; many adults "play" the same way.
Whether you are a child, a junior high school student, a high school student, a university student, an adult, a working person, an official… or anything else, please care for the vulnerable people around you, please remember to set a good example for the next generation, and please remember not to "plunder because of need". I believe that although the evil in the world cannot be completely eliminated, we can find as much goodness and beauty as possible.

Copyright notice
This article was written by [Jane said Python]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/02/202202131319309485.html
