当前位置:网站首页>Data analysis course notes (III) array shape and calculation, numpy storage / reading data, indexing, slicing and splicing
Data analysis course notes (III) array shape and calculation, numpy storage / reading data, indexing, slicing and splicing
2022-07-07 00:21:00 【M Walker x】
Data analysis course notes
The shape of the array
The calculation of array
Calculate in different dimensions
Broadcasting principles
Axis (axis)
numpy Reading data
np.loadtxt(fname,dtype=np.float,delimiter=None,skiprows=0,usecols=None,unpack=False)
Data sources : https://www.kaggle.com/datasnaek/youtube/data
# coding=utf-8
import numpy as np
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
# t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t2 = np.loadtxt(us_file_path,delimiter=",",dtype="int")
# print(t1)
print(t2)
print("*"*100)
# Take row
# print(t2[2])
# Take consecutive multiple lines
# print(t2[2:])
# Take discontinuous multiple lines
# print(t2[[2,8,10]])
# print(t2[1,:])
# print(t2[2:,:])
# print(t2[[2,10,3],:])
# Fetch
# print(t2[:,0])
# Take consecutive Columns
# print(t2[:,2:])
# Take discontinuous multiple columns
# print(t2[:,[0,2]])
# Go to rows and columns , Take the first place 3 That's ok , The value of the fourth column
# a = t2[2,3]
# print(a)
# print(type(a))
# Fetching multiple rows and columns , Take the first place 3 Line to line five , The first 2 Column to the first 4 The results of the column
# Go to the intersection of rows and columns
b = t2[2:5,1:4]
# print(b)
# Take multiple non adjacent points
# The result is (0,0) (2,1) (2,3)
c = t2[[0,2,2],[0,1,3]]
print(c)
numpy Index and slice
numpy Boolean index in
numpy Ternary operator in
numpy Medium clip( tailoring )
numpy Medium nan and inf
numpy Medium nan Points for attention
numpy Statistical functions commonly used in
All statistical results of multidimensional array are returned by default , If specified axis Returns a result on the current axis
Missing value processing
# coding=utf-8
import numpy as np
# print(t1)
def fill_ndarray(t1):
for i in range(t1.shape[1]): # Traverse each column
temp_col = t1[:,i] # The current column
nan_num = np.count_nonzero(temp_col!=temp_col)
if nan_num !=0: # Not for 0, Indicates that there are... In the current column nan
temp_not_nan_col = temp_col[temp_col==temp_col] # The current column is not nan Of array
# Check that the current is nan The location of , Assign a value that is not nan The average of
temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()
return t1
if __name__ == '__main__':
t1 = np.arange(24).reshape((4, 6)).astype("float")
t1[1, 2:] = np.nan
print(t1)
t1 = fill_ndarray(t1)
print(t1)
- How to select one or more rows of data ( Column )?
- How to assign values to selected rows or columns ?
- How to make it bigger than 10 Replace the value of with 10?
- np.where How to use ?
- np.clip How to use ?
- How to transpose ( Exchange axis )?
- Read and save data as csv
- np.nan and np.inf What is it?
- How many common statistical functions do you remember ?
- What information does the standard deviation reflect about the data
#### numpy Index and slice of
- t[10,20]
- `t[[2,5],[4,8]]`
- t[3:]
- t[[2,5,6]]
- t[:,:4]
- t[:,[2,5,6]]
- t[2:3,5:7]
#### numpy Medium bool Indexes ,where,clip Use
- t[t<30] = 2
- np.where(t<10,20,5)
- t.clip(10,20)
#### Transpose and read local files
- t.T
- t.transpose()
- t.sawpaxes()
- np.loadtxt(file_path,delimiter,dtype)
#### nan and inf What is it?
- nan not a number
- np.nan != np.nan
- Any value sum nan All calculations are nan
- inf infinite
#### Commonly used statistical functions
- t.sum()
- t.mean()
- np.meadian()
- t.max()
- t.min()
- np.ptp()
- t.std()
import numpy as np
from matplotlib import pyplot as plt
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
# t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t_us = np.loadtxt(us_file_path,delimiter=",",dtype="int")
# Take the data of the comment
t_us_comments = t_us[:,-1]
# Choose more than 5000 Small data
t_us_comments = t_us_comments[t_us_comments<=5000]
print(t_us_comments.max(),t_us_comments.min())
d = 50
bin_nums = (t_us_comments.max()-t_us_comments.min())//d
# mapping
plt.figure(figsize=(20,8),dpi=80)
plt.hist(t_us_comments,bin_nums)
plt.show()
import numpy as np
from matplotlib import pyplot as plt
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
# t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t_uk = np.loadtxt(uk_file_path,delimiter=",",dtype="int")
# Choose to like books better than 50 Ten thousand small data
t_uk = t_uk[t_uk[:,1]<=500000]
t_uk_comment = t_uk[:,-1]
t_uk_like = t_uk[:,1]
plt.figure(figsize=(20,8),dpi=80)
plt.scatter(t_uk_like,t_uk_comment)
plt.show()
Data splicing
Data row column exchange
Now I hope to study and analyze the data methods of the two countries in the previous case , So what should I do ?
# coding=utf-8
import numpy as np
us_data = "./youtube_video_data/US_video_data_numbers.csv"
uk_data = "./youtube_video_data/GB_video_data_numbers.csv"
# Load country data
us_data = np.loadtxt(us_data,delimiter=",",dtype=int)
uk_data = np.loadtxt(uk_data,delimiter=",",dtype=int)
# Add country information
# The structure is all 0 The data of
zeros_data = np.zeros((us_data.shape[0],1)).astype(int)
ones_data = np.ones((uk_data.shape[0],1)).astype(int)
# Add a column with all 0,1 Array of
us_data = np.hstack((us_data,zeros_data))
uk_data = np.hstack((uk_data,ones_data))
# Splice two sets of data
final_data = np.vstack((us_data,uk_data))
print(final_data)
More easy-to-use methods
numpy Generate random number
# coding=utf-8
import numpy as np
np.random.seed(10)
t = np.random.randint(0,20,(3,4))
print(t)
numpy Points for attention copy and view
边栏推荐
- web渗透测试是什么_渗透实战
- Cas d'essai fonctionnel universel de l'application
- 什么是响应式对象?响应式对象的创建过程?
- DAY FOUR
- 【vulnhub】presidential1
- Use source code compilation to install postgresql13.3 database
- Introduction au GPIO
- Everyone is always talking about EQ, so what is EQ?
- Supersocket 1.6 creates a simple socket server with message length in the header
- Wind chime card issuing network source code latest version - commercially available
猜你喜欢
刘永鑫报告|微生物组数据分析与科学传播(晚7点半)
从外企离开,我才知道什么叫尊重跟合规…
Uniapp uploads and displays avatars locally, and converts avatars into Base64 format and stores them in MySQL database
Wind chime card issuing network source code latest version - commercially available
DAY FIVE
VTK volume rendering program design of 3D scanned volume data
uniapp中redirectTo和navigateTo的区别
[2022 the finest in the whole network] how to test the interface test generally? Process and steps of interface test
St table
DAY THREE
随机推荐
刘永鑫报告|微生物组数据分析与科学传播(晚7点半)
SQL的一种写法,匹配就更新,否则就是插入
System activity monitor ISTAT menus 6.61 (1185) Chinese repair
Use source code compilation to install postgresql13.3 database
MIT 6.824 - raft Student Guide
沉浸式投影在线下展示中的三大应用特点
GPIO简介
互动滑轨屏演示能为企业展厅带来什么
Leecode brushes questions and records interview questions 01.02 Determine whether it is character rearrangement for each other
js导入excel&导出excel
在Docker中分分钟拥有Oracle EMCC 13.5环境
How to answer the dualistic opposition of Zhihu
[CVPR 2022] semi supervised object detection: dense learning based semi supervised object detection
1000字精选 —— 接口测试基础
VTK volume rendering program design of 3D scanned volume data
在docker中快速使用各个版本的PostgreSQL数据库
What is web penetration testing_ Infiltration practice
Use package FY in Oracle_ Recover_ Data. PCK to recover the table of truncate misoperation
JS import excel & Export Excel
What is a responsive object? How to create a responsive object?