Beijing rental data analysis
2022-07-02 15:26:00 【Little doll】
1. Basic data processing
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib as mpl
# Use a font that can render Chinese characters in plots
mpl.rcParams["font.sans-serif"] = ["SimHei"]
file_data=pd.read_csv("./data/ Lianjia Beijing rental data .csv")
file_data.head(10)
1.1 Processing of duplicate values and null values
# Check for duplicate rows
# file_data.duplicated()
# Drop duplicate rows
file_data=file_data.drop_duplicates()
# Drop rows that contain null values
file_data=file_data.dropna()
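Before dropping rows it can help to see how many are affected. The following is a minimal sketch on a made-up table; the column names `area` and `price` and all values are hypothetical and are not taken from the Lianjia file.

```python
import pandas as pd

# Hypothetical toy listings; names and values are illustrative only.
toy = pd.DataFrame({
    "area": ["Chaoyang", "Chaoyang", "Haidian", None],
    "price": [5000, 5000, 6200, 4300],
})

# Number of rows that exactly repeat an earlier row
n_duplicates = int(toy.duplicated().sum())
# Number of missing values in each column
n_missing = toy.isna().sum()

# Same two cleaning steps as in the text: drop duplicates, then drop nulls
cleaned = toy.drop_duplicates().dropna()
print(n_duplicates, n_missing.to_dict(), len(cleaned))
```

Comparing `len(toy)` with `len(cleaned)` shows how much data the cleaning step removes.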
1.2 Data type conversion
1.2.1 Area data type conversion
# Strip the unit from a single value (the last two characters)
file_data[" area (㎡)"].values[0][:-2]
# Collect the stripped values in an array
data_new=np.array([])
data_area=file_data[" area (㎡)"].values
for i in data_area:
    data_new=np.append(data_new,np.array(i[:-2]))
data_new
# Convert data_new to floating point
data_new=data_new.astype(np.float64)
data_new
# Assign the column directly so it takes the float64 dtype
file_data[" area (㎡)"]=data_new
file_data
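The loop above strips a two-character unit from every value and then converts the result to floats. The same conversion can be sketched with pandas string methods, which avoids growing an array one element at a time. The column names `area_raw` and `area` and the `m2` suffix are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical values: a number followed by a two-character unit suffix.
toy = pd.DataFrame({"area_raw": ["59.11m2", "56.92m2", "40.57m2"]})

# .str[:-2] slices every string at once; astype converts the result to floats
toy["area"] = toy["area_raw"].str[:-2].astype("float64")
print(toy["area"].tolist())
```

If the suffix length varies between rows, a fixed slice is no longer safe and a pattern-based extraction would be the more robust choice.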
1.2.2 House type expression replacement
file_data.head()
house_data=file_data[" House type "]
house_data.head()
temp_list=[]
for i in house_data:
    # Replace the alternative wording for "room" so every house type uses the same expression
    new_info=i.replace(" room "," room ")
    temp_list.append(new_info)
temp_list
file_data.loc[:," House type "]=temp_list
file_data
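The replacement loop can also be written as a single vectorized call. This is a sketch only; the labels below are invented for illustration and do not reproduce the wording in the dataset.

```python
import pandas as pd

# Hypothetical house-type labels that mix two wordings for the same thing.
house_type = pd.Series(["2 bedroom 1 hall", "1 room 1 hall", "3 bedroom 2 hall"])

# regex=False treats the pattern as a literal substring
normalized = house_type.str.replace("bedroom", "room", regex=False)
print(normalized.tolist())
```

After normalization, identical layouts share one label, so later counts per house type are not split across two spellings.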
1.3 Chart analysis
1.3.1 Number of listings and location distribution analysis
file_data[" Area "].unique() # Find out all the locations of the listing information
# Get the number of listings in each area
area_count=file_data.groupby(by=" Area ").size()
area_count
# Build the table from the grouped result so each count stays paired with its area
# (unique() returns areas in order of appearance, while groupby sorts them)
new_df=area_count.reset_index(name=" Number ")
new_df.sort_values(by=" Number ",ascending=False)
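One thing to watch when pairing `unique()` with grouped results: `unique()` lists values in order of first appearance, whereas `groupby` sorts its keys by default, so the two orders generally differ. A small sketch with made-up data (the column name `area` and the values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({"area": ["Haidian", "Chaoyang", "Haidian", "Daxing"]})

appearance_order = list(toy["area"].unique())  # order of first appearance
counts = toy.groupby("area").size()            # index is sorted by key
sorted_order = list(counts.index)

# Taking both labels and counts from the grouped result keeps them aligned
table = counts.reset_index(name="count")
print(appearance_order, sorted_order)
print(table)
```

Assigning `counts.values` to a column whose labels come from `unique()` would silently attach counts to the wrong areas whenever the two orders differ.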
1.3.2 Analysis of house type quantity
house_data=file_data[" House type "]
house_data.head()
def all_house(arr):
    # Count how many times each distinct value occurs in arr
    key=np.unique(arr)
    result={}
    for k in key:
        mask=(arr==k)
        arr_new=arr[mask]
        v=arr_new.size
        result[k]=v
    return result
house_info=all_house(house_data)
np.unique(house_data)
# Keep only house types with more than 50 listings
house_count=dict((key,value) for key,value in house_info.items() if value>50)
show_houses=pd.DataFrame({
    " House type ": list(house_count.keys()),
    " Number ": list(house_count.values())})
show_houses=show_houses.head(11)
show_houses
# Graphic display of house type
house_type=show_houses[" House type "]
house_type_num=show_houses[" Number "]
n=len(show_houses)
plt.barh(range(n),house_type_num)
plt.yticks(range(n),house_type)
# Leave room to the right of the longest bar for its label
plt.xlim(0,house_type_num.max()*1.1)
plt.title("Number of rental houses in Beijing by house type")
plt.xlabel("Number")
plt.ylabel("House type")
# Add the count next to each bar
for x,y in enumerate(house_type_num):
    plt.text(y+0.5,x-0.2,"%s" %y)
plt.show()
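The counting and filtering in this section can also be sketched with `value_counts`, which returns the count of each distinct value sorted from most to least frequent. The labels and the threshold below are hypothetical; the text uses a threshold of 50.

```python
import pandas as pd

# Hypothetical house-type labels
house_type = pd.Series(["A", "A", "A", "B", "B", "C"])
min_count = 1  # illustrative threshold

type_counts = house_type.value_counts()
# Keep only the types that occur more often than the threshold
common = type_counts[type_counts > min_count]

# Turn the filtered counts into a two-column table ready for plotting
show = common.rename_axis("house_type").reset_index(name="count")
print(show)
```

Because the labels and counts come from the same Series, each bar in a later chart is guaranteed to carry the count of its own house type.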
1.3.3 Average rent analysis
# Total rent and total floor area per area; both results share the same group index
sum_price=file_data[" Price ( element / month )"].groupby(file_data[" Area "]).sum()
sum_area=file_data[" area (㎡)"].groupby(file_data[" Area "]).sum()
sum_price
sum_area
# Build the table from the grouped results so each total stays paired with its area
df_all=pd.DataFrame({
    " Area ":sum_price.index,
    " Total rent ":sum_price.values,
    " Total area ":sum_area.values})
df_all
# Calculate the rent per square meter for each area
df_all[" Rent per square meter ( element )"]=round(df_all[" Total rent "]/df_all[" Total area "],2) # Round to two decimal places
df_merge=pd.merge(new_df,df_all) # Merge the two tables on their shared " Area " column
# Graphic visualization
# Take every series from the merged table so they share one row order
num=df_merge[" Number "]
price=df_merge[" Rent per square meter ( element )"]
lx=df_merge[" Area "]
l=list(range(len(df_merge)))
# Create a canvas
fig=plt.figure(figsize=(10,8),dpi=100)
# Show line chart
ax1=fig.add_subplot(111)
ax1.plot(l,price,"or-",label=" Price ")
for _x,_y in zip(l,price):
    ax1.text(_x+0.2,_y,_y)
ax1.set_ylim([0,160])
ax1.set_ylabel(" Price ")
plt.legend(loc="upper right")
# Show bar chart
ax2=ax1.twinx()
plt.bar(l,num,label=" Number ",alpha=0.2,color="green")
ax2.set_ylabel(" Number ")
plt.legend(loc="upper left")
plt.xticks(l,lx)
plt.show()
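The rent per square meter used in this section is total rent divided by total floor area within each area. A minimal sketch of that calculation on made-up numbers follows; the column names `area`, `price`, and `size` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "area": ["Chaoyang", "Chaoyang", "Haidian"],
    "price": [6000, 4000, 9000],   # monthly rent
    "size": [60.0, 40.0, 75.0],    # floor area in square meters
})

# Sum rent and floor area within each area in one pass
totals = toy.groupby("area")[["price", "size"]].sum()
# Total rent divided by total floor area, rounded to two decimals
totals["rent_per_sqm"] = (totals["price"] / totals["size"]).round(2)
print(totals)
```

Dividing the totals weights every listing by its floor area, which is not the same as averaging the per-listing ratios; large homes influence the result more.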
1.3.4 Area interval analysis
# Check the maximum and minimum area of the house
print('The largest house area is %d square meters'%(file_data[' area (㎡)'].max()))
print('The smallest house area is %d square meters'%(file_data[' area (㎡)'].min()))
# Check the maximum and minimum rent
print('The maximum monthly rent is %d yuan'%(file_data[' Price ( element / month )'].max()))
print('The minimum monthly rent is %d yuan'%(file_data[' Price ( element / month )'].min()))
# Area division
area_divide=[1,30,50,70,90,120,140,160,1200]
area_cut=pd.cut(list(file_data[" area (㎡)"]),area_divide)
area_cut
area_cut_num=area_cut.describe()
area_cut_num
# Image visualization
area_per=(area_cut_num["freqs"].values)*100
labels = ['Below 30 square meters', '30-50 square meters', '50-70 square meters', '70-90 square meters', '90-120 square meters', '120-140 square meters', '140-160 square meters', 'Above 160 square meters']
plt.figure(figsize=(20,8),dpi=100)
plt.axes(aspect=1)
plt.pie(x=area_per,labels=labels,autopct="%.2f %%")
plt.legend()
plt.show()
area_per
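The interval shares can also be sketched by passing the labels to `pd.cut` directly and normalizing the counts. The floor areas and the short labels below are hypothetical; the bin edges are the ones used in this section.

```python
import pandas as pd

# Hypothetical floor areas in square meters
sizes = pd.Series([25, 45, 45, 80, 150])
bins = [1, 30, 50, 70, 90, 120, 140, 160, 1200]
labels = ["<=30", "30-50", "50-70", "70-90", "90-120", "120-140", "140-160", ">160"]

# Each value is placed in a right-closed interval, e.g. (30, 50]
binned = pd.cut(sizes, bins=bins, labels=labels)
# normalize=True gives fractions; sort=False keeps the interval order
share = binned.value_counts(normalize=True, sort=False) * 100
print(share)
```

Attaching the labels at binning time keeps every percentage tied to its interval, including intervals with no listings. Values outside the outermost edges become missing and are left out of the shares.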