Beijing rental data analysis
2022-07-02 15:26:00 【Little doll】
1. Basic data processing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
# Use a font that can display Chinese characters
mpl.rcParams["font.sans-serif"] = ["SimHei"]
file_data=pd.read_csv("./data/ Lianjia Beijing rental data .csv")
file_data.head(10)


1.1 Handling duplicate and null values
# Check for duplicate rows
# file_data.duplicated()
# Delete duplicate rows
file_data=file_data.drop_duplicates()
# Delete rows that contain null values
file_data=file_data.dropna()

1.2 Data type conversion
1.2.1 Area data type conversion

# Try it on a single value: strip the last two characters (the unit suffix)
file_data[" area (㎡)"].values[0][:-2]

# Create an empty array
data_new=np.array([])
data_area=file_data[" area (㎡)"].values
for i in data_area:
    data_new=np.append(data_new,np.array(i[:-2]))
data_new
# Convert data_new to float
data_new=data_new.astype(np.float64)
data_new
# Assign the whole column so that it takes the float dtype
file_data[" area (㎡)"]=data_new
file_data
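The loop above can also be written as a single vectorized step. The following is a minimal sketch, assuming every value is a string that ends in a two-character unit; the sample rows, the `area` column name, and the `m2` suffix are made up for illustration and are not taken from the real data set.

```python
import pandas as pd

# Made-up sample rows; the column name and the unit suffix are assumptions
df = pd.DataFrame({"area": ["59.11m2", "56.92m2", "40.57m2"]})

# Drop the last two characters of every string, then convert the column to float
df["area"] = df["area"].str[:-2].astype("float64")
print(df["area"].tolist())
```

This avoids growing an array element by element with `np.append`, which copies the array on every call.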

1.2.2 House type expression replacement
file_data.head()

house_data=file_data[" House type "]
house_data.head()
temp_list=[]
for i in house_data:
    new_info=i.replace(" room "," room ")
    temp_list.append(new_info)
temp_list

file_data.loc[:," House type "]=temp_list
file_data

1.3 Chart analysis
1.3.1 Listing count and location distribution analysis
file_data[" Area "].unique() # List every area that appears in the listings

new_df=pd.DataFrame({" Area ":file_data[" Area "].unique()," Number ":[0]*13})

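# Get the number of listings in each area
area_count=file_data.groupby(by=" Area ").size()
area_count

# Look up each count by area name: groupby sorts the areas, while unique() keeps their order of appearance
new_df[" Number "]=new_df[" Area "].map(area_count)
new_df.sort_values(by=" Number ",ascending=False)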
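The per-area count can also be obtained directly with `value_counts`, which keeps each count attached to its area name. This is a minimal sketch on made-up listings; the `area` and `number` column names are hypothetical stand-ins for the columns used above.

```python
import pandas as pd

# Made-up listings; "area" is a hypothetical column name
listings = pd.DataFrame(
    {"area": ["Chaoyang", "Haidian", "Chaoyang", "Fengtai", "Chaoyang", "Haidian"]}
)

# value_counts returns counts indexed by area, sorted from most to fewest listings
counts = listings["area"].value_counts()

# Turn the result into a two-column table
count_table = counts.rename_axis("area").reset_index(name="number")
print(count_table)
```

Because the counts are indexed by area name, there is no risk of pairing a count with the wrong area.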

1.3.2 Analysis of house type quantity
house_data=file_data[" House type "]
house_data.head()

# Count how many times each house type occurs
def all_house(arr):
    key=np.unique(arr)
    result={}
    for k in key:
        mask=(arr==k)
        arr_new=arr[mask]
        v=arr_new.size
        result[k]=v
    return result
house_info=all_house(house_data)
np.unique(house_data)

# Keep only the house types with more than 50 listings
house_data=dict((key,value) for key,value in house_info.items() if value>50)
house_data

show_houses=pd.DataFrame({
    " House type ": [x for x in house_data.keys()],
    " Number ": [x for x in house_data.values()]})
show_houses=show_houses.head(11)
show_houses

# Bar chart of house types
house_type=show_houses[" House type "]
house_type_num=show_houses[" Number "]
n_types=len(show_houses)
plt.barh(range(n_types),house_type_num)
plt.yticks(range(n_types),house_type)
# Leave room to the right of the longest bar for its label
plt.xlim(0,house_type_num.max()*1.1)
plt.title("Number of rental listings by house type in Beijing")
plt.xlabel("Number")
plt.ylabel("House type")
# Add the count next to each bar
for x,y in enumerate(house_type_num):
    plt.text(y+0.5,x-0.2,"%s" %y)
plt.show()
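The counting and filtering steps above can be sketched more compactly with `value_counts` and a boolean mask. The house type strings and the threshold below are made up for illustration; the analysis above uses a threshold of 50.

```python
import pandas as pd

# Made-up house types; the labels are hypothetical
house_types = pd.Series(["2r1h", "1r1h", "2r1h", "3r1h", "2r1h", "1r1h"])

# Hypothetical threshold for this tiny sample
min_count = 1

# Count each house type, then keep only the types that occur more than min_count times
type_counts = house_types.value_counts()
common_types = type_counts[type_counts > min_count]
print(common_types.to_dict())
```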

1.3.3 Average rent analysis
df_all=pd.DataFrame({
" Area ":file_data[" Area "].unique(),
" Total rent ":[0]*13,
" Total area ":[0]*13})
df_all


sum_price=file_data[" Price ( element / month )"].groupby(file_data[" Area "]).sum()
sum_area=file_data[" area (㎡)"].groupby(file_data[" Area "]).sum()
sum_price
sum_area

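# Look up the totals by area name: groupby sorts the areas, while unique() keeps their order of appearance
df_all[" Total rent "]=df_all[" Area "].map(sum_price)
df_all[" Total area "]=df_all[" Area "].map(sum_area)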

# Calculate the rent per square meter of each area
df_all[" Rent per square meter ( element )"]=round(df_all[" Total rent "]/df_all[" Total area "],2) # keep two decimal places

df_merge=pd.merge(new_df,df_all) # Merge the two tables on their shared " Area " column
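The rent per square meter is total rent divided by total floor area within each area. The same calculation can be sketched with a single `groupby` on made-up numbers; the `area`, `price`, and `sqm` column names are hypothetical.

```python
import pandas as pd

# Made-up listings: monthly price and floor area in square meters
rentals = pd.DataFrame({
    "area": ["Haidian", "Chaoyang", "Haidian", "Chaoyang"],
    "price": [6000, 9000, 4000, 3000],
    "sqm": [60.0, 100.0, 40.0, 50.0],
})

# Sum rent and floor area per area in one pass; the result is indexed by area name
totals = rentals.groupby("area")[["price", "sqm"]].sum()

# Total rent divided by total floor area, rounded to two decimal places
totals["price_per_sqm"] = (totals["price"] / totals["sqm"]).round(2)
print(totals)
```

In this sample Haidian has 10000 in rent over 100 square meters, and Chaoyang has 12000 over 150 square meters.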

# Visualization
num=df_merge[" Number "]
price=df_merge[" Rent per square meter ( element )"]
lx=df_merge[" Area "]
l=list(range(len(df_merge)))
# Create a canvas
fig=plt.figure(figsize=(10,8),dpi=100)
# Line chart of the rent per square meter
ax1=fig.add_subplot(111)
ax1.plot(l,price,"or-",label="Price")
for _x,_y in zip(l,price):
    ax1.text(_x+0.2,_y,str(_y))
ax1.set_ylim([0,160])
ax1.set_ylabel("Price")
ax1.legend(loc="upper right")
# Bar chart of the listing count on a second y-axis
ax2=ax1.twinx()
ax2.bar(l,num,label="Number",alpha=0.2,color="green")
ax2.set_ylabel("Number")
ax2.legend(loc="upper left")
plt.xticks(l,lx)
plt.show()

1.3.4 Area interval analysis
# Check the maximum and minimum area of the houses
print('The largest house area is %d square meters'%(file_data[' area (㎡)'].max()))
print('The smallest house area is %d square meters'%(file_data[' area (㎡)'].min()))
# Check the maximum and minimum rent
print('The maximum monthly rent is %d yuan'%(file_data[' Price ( element / month )'].max()))
print('The minimum monthly rent is %d yuan'%(file_data[' Price ( element / month )'].min()))

# Area division
area_divide=[1,30,50,70,90,120,140,160,1200]
area_cut=pd.cut(list(file_data[" area (㎡)"]),area_divide)
area_cut

area_cut_num=area_cut.describe()
area_cut_num
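As a variation on the binning step, `pd.cut` can attach a readable label to each interval, and `value_counts(normalize=True)` gives the share of each interval. This is a minimal sketch on made-up floor areas that reuses the bin edges above; the label strings are illustrative.

```python
import pandas as pd

# Made-up floor areas in square meters
sizes = pd.Series([25, 45, 45, 65, 130, 300])

# Same bin edges as above, with one illustrative label per interval
bins = [1, 30, 50, 70, 90, 120, 140, 160, 1200]
labels = ["<=30", "30-50", "50-70", "70-90", "90-120", "120-140", "140-160", ">160"]

# Each value falls into the interval (left, right] and receives that interval's label
binned = pd.cut(sizes, bins=bins, labels=labels)

# Share of each interval, in percent
share = binned.value_counts(normalize=True, sort=False) * 100
print(share.round(2))
```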

# Pie chart of the share of each area interval
area_per=(area_cut_num["freqs"].values)*100
labels = ['Under 30 square meters', '30-50 square meters', '50-70 square meters', '70-90 square meters', '90-120 square meters', '120-140 square meters', '140-160 square meters', 'Over 160 square meters']
plt.figure(figsize=(20,8),dpi=100)
plt.axes(aspect=1)
plt.pie(x=area_per,labels=labels,autopct="%.2f %%")
plt.legend()
plt.show()
area_per
