Machine learning - Data Science Library Day 3 - Notes
2022-07-01 12:04:00 【weixin_ forty-five million six hundred and forty-nine thousand 】
What is numpy?
numpy is a foundational library for scientific computing in Python that focuses on numerical computation. Most Python scientific computing libraries are built on it, and it is mostly used to perform numeric operations on large, multidimensional arrays.
Axis (axis)
In numpy an axis can be understood as a direction, denoted by the numbers 0, 1, 2, ... A one-dimensional array has only axis 0; a 2-dimensional array (shape (2, 2)) has axis 0 and axis 1; a three-dimensional array (shape (2, 2, 3)) has axes 0, 1 and 2.
With the concept of an axis, calculations become more convenient. For example, to compute the mean of a 2-dimensional array, you must specify along which direction the numbers are averaged.
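To make the axis idea concrete, here is a minimal sketch on a small made-up array (the names `t`, `mean_all`, `mean_axis0` and `mean_axis1` are illustrative and not from the original notes). It shows how the result of `mean` changes depending on which axis is collapsed.

```python
import numpy as np

# A small 2-dimensional array with shape (2, 3): [[0, 1, 2], [3, 4, 5]]
t = np.arange(6).reshape((2, 3))

mean_all = t.mean()          # no axis given: mean over every element
mean_axis0 = t.mean(axis=0)  # collapse axis 0 (the rows): one mean per column
mean_axis1 = t.mean(axis=1)  # collapse axis 1 (the columns): one mean per row

print(mean_all)    # 2.5
print(mean_axis0)  # [1.5 2.5 3.5]
print(mean_axis1)  # [1. 4.]
```

The axis passed to the function is the one that disappears from the result's shape.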
Creating arrays
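The original example for this heading did not survive, so the following is only a sketch of a few common ways to create an array; the variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])            # from a Python list
b = np.array(range(4))                # from a range object
c = np.arange(0, 10, 2)               # like range(), but returns an array
d = np.array([1, 0, 1], dtype=bool)   # with an explicit dtype
e = a.astype(float)                   # convert an existing array to another dtype

print(c)        # [0 2 4 6 8]
print(d)        # [ True False  True]
print(e.dtype)  # float64
```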
Modify the shape of the array
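The original example for this heading is missing; as a rough sketch, reshaping could look like this (array names are illustrative). The total number of elements must stay the same across shapes.

```python
import numpy as np

t = np.arange(12)                  # shape (12,)
t2 = t.reshape((3, 4))             # 3 rows, 4 columns
t3 = t2.reshape((2, 2, 3))         # a 3-dimensional view of the same 12 values
flat = t2.flatten()                # back to one dimension (returns a copy)

print(t.shape, t2.shape, t3.shape, flat.shape)
# (12,) (3, 4) (2, 2, 3) (12,)
```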
Operations between arrays
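The original example for this heading is missing. The sketch below, on made-up arrays, illustrates element-wise arithmetic and broadcasting against a scalar, a row and a column.

```python
import numpy as np

t = np.arange(6).reshape((2, 3))     # [[0, 1, 2], [3, 4, 5]]

plus = t + 10                        # a scalar is applied to every element
prod = t * t                         # same-shape arrays combine element by element

row = np.array([1, 2, 3])            # shape (3,): matches the columns of t
col = np.array([[10], [20]])         # shape (2, 1): matches the rows of t

t_minus_row = t - row                # row is subtracted from every row of t
t_plus_col = t + col                 # col is added to every column of t

print(t_minus_row)
# [[-1 -1 -1]
#  [ 2  2  2]]
print(t_plus_col)
# [[10 11 12]
#  [23 24 25]]
```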
Transpose matrix
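The original example for this heading is missing; a minimal sketch of three equivalent ways to transpose a 2-dimensional array (names are illustrative):

```python
import numpy as np

t = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

a = t.T                  # attribute form
b = t.transpose()        # method form
c = t.swapaxes(1, 0)     # swap axis 1 and axis 0 explicitly

print(a)
# [[0 3]
#  [1 4]
#  [2 5]]
```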
Reading data with numpy
CSV: Comma-Separated Values, a comma-separated value file
Displayed: as a table
Source file: formatted text in which newlines separate rows and commas separate columns; each row of data represents one record
Because CSV is easy to display, read and write, it is used in many places to store and transfer small and medium-sized data. For teaching convenience we will often work with CSV files, but working with data in a database is just as easy.
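As a self-contained sketch of reading CSV text with `np.loadtxt`, the example below uses an in-memory string instead of a real file (the contents of `csv_text` are made up for illustration). It also shows what `unpack=True` does: the result is transposed, so each column of the file becomes one row of the array.

```python
import io
import numpy as np

# Hypothetical CSV contents: two records with three fields each
csv_text = "1,2,3\n4,5,6\n"

rows = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
cols = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, unpack=True)

print(rows)
# [[1 2 3]
#  [4 5 6]]
print(cols)
# [[1 4]
#  [2 5]
#  [3 6]]
```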
Reading and storing data with numpy
# coding=utf-8
import numpy as np
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
# t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t2 = np.loadtxt(us_file_path,delimiter=",",dtype="int")
# print(t1)
print(t2)
print("*"*100)
b = t2[2:5,1:4]
# print(b)
# Select multiple non-adjacent points
# The selected elements are at (0,0), (2,1) and (2,3)
c = t2[[0,2,2],[0,1,3]]
print(c)
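Because the CSV files are not included here, the following sketch repeats the same slicing and index-array selection on a small generated array, so the results can be checked by hand. The array `t` and the extra `rows` / `cols` selections are illustrative additions.

```python
import numpy as np

t = np.arange(24).reshape((4, 6))

b = t[2:4, 1:4]              # rows 2-3, columns 1-3 (a contiguous block)
c = t[[0, 2, 2], [0, 1, 3]]  # the individual points (0,0), (2,1), (2,3)
rows = t[[0, 2]]             # rows 0 and 2, all columns
cols = t[:, [1, 3]]          # columns 1 and 3, all rows

print(b)
# [[13 14 15]
#  [19 20 21]]
print(c)  # [ 0 13 15]
```

Passing two index lists pairs them up position by position, which is why `c` holds three single elements rather than a 3x3 block.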
Boolean indexing in numpy
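The original example for this heading is missing. As a sketch on a made-up array, a comparison produces a boolean mask, and that mask can be used both to select and to assign elements.

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

mask = t < 5          # boolean array with the same shape as t
small = t[mask]       # the selected elements, as a 1-dimensional array

t2 = t.copy()         # work on a copy so t stays unchanged
t2[t2 < 5] = 0        # assign to every position where the condition holds

print(small)  # [0 1 2 3 4]
print(t2)
# [[ 0  0  0  0]
#  [ 0  5  6  7]
#  [ 8  9 10 11]]
```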
The ternary operator in numpy
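The original example for this heading is missing. Assuming the "ternary operator" refers to `np.where(condition, x, y)`, which picks `x` where the condition is true and `y` elsewhere, a minimal sketch looks like this:

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

# Element-wise "x if condition else y": 0 where t < 5, otherwise 10
r = np.where(t < 5, 0, 10)

print(r)
# [[ 0  0  0  0]
#  [ 0 10 10 10]
#  [10 10 10 10]]
```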
nan and inf in numpy
nan (NAN, Nan): not a number
When a local file is read as float, missing values appear as nan
nan also results from an invalid calculation (for example, infinity (inf) minus infinity)
inf (-inf, inf): infinity; inf is positive infinity, -inf is negative infinity
When does inf (either -inf or +inf) appear?
For example, when a number is divided by 0 (plain Python raises an error directly; numpy gives inf or -inf)
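The cases above can be sketched as follows. The `np.errstate` context manager is used here only to silence the runtime warnings numpy would otherwise print; the variable names are illustrative.

```python
import numpy as np

a = np.array([1.0, -1.0, 0.0])

with np.errstate(divide="ignore", invalid="ignore"):
    q = a / 0.0              # 1/0 -> inf, -1/0 -> -inf, 0/0 -> nan

diff = np.inf - np.inf       # an invalid calculation: the result is nan

# Plain Python raises instead of returning inf
try:
    1 / 0
except ZeroDivisionError as exc:
    python_error = type(exc).__name__

print(q)             # [ inf -inf  nan]
print(diff)          # nan
print(python_error)  # ZeroDivisionError
print(type(np.nan))  # <class 'float'>
```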
Points to note about nan in numpy
1. Two nan values are not equal to each other
2. np.nan != np.nan evaluates to True
3. Using the property above, you can count the number of nan values in an array
4. To test whether a value is nan, use np.isnan(a)
5. Any calculation involving nan yields nan
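A short sketch of the points above on a made-up array (names are illustrative). Both counting approaches give the same answer; `np.isnan` is usually the more readable of the two.

```python
import numpy as np

t = np.array([1.0, np.nan, 3.0, np.nan])

equal = (np.nan == np.nan)                  # False: nan never equals itself
count_a = np.count_nonzero(t != t)          # only nan satisfies x != x
count_b = np.count_nonzero(np.isnan(t))     # the same count via np.isnan
total = t.sum()                             # nan propagates through the sum

print(equal)             # False
print(count_a, count_b)  # 2 2
print(total)             # nan
```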
Commonly used statistical functions in numpy
Sum: t.sum(axis=None)
Mean: t.mean(axis=None), strongly affected by outliers
Median: np.median(t, axis=None)
Maximum: t.max(axis=None)
Minimum: t.min(axis=None)
Range: np.ptp(t, axis=None), i.e. the difference between the maximum and the minimum
Standard deviation: t.std(axis=None)
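The functions listed above can be tried on a small made-up array. The value 60 is included as a deliberate outlier to show how it pulls the mean of its row while leaving the median almost untouched.

```python
import numpy as np

t = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 60.0]])

total = t.sum()                   # 75.0
col_sums = t.sum(axis=0)          # one sum per column
row_means = t.mean(axis=1)        # one mean per row
row_medians = np.median(t, axis=1)
col_max = t.max(axis=0)
row_min = t.min(axis=1)
value_range = np.ptp(t)           # max - min over the whole array
col_std = t.std(axis=0)           # population standard deviation per column

print(row_means)    # [ 2. 23.]
print(row_medians)  # [2. 5.]
print(value_range)  # 59.0
```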
Filling nan in numpy
# coding=utf-8
import numpy as np
# print(t1)
def fill_ndarray(t1):
    for i in range(t1.shape[1]):  # Traverse each column
        temp_col = t1[:, i]  # The current column (a view into t1)
        nan_num = np.count_nonzero(temp_col != temp_col)  # nan != nan, so this counts the nan values
        if nan_num != 0:  # Non-zero means the current column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # The non-nan values of the current column
            # Assign the mean of the non-nan values to the positions that are nan
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()
    return t1


if __name__ == '__main__':
    t1 = np.arange(24).reshape((4, 6)).astype("float")
    t1[1, 2:] = np.nan
    print(t1)
    t1 = fill_ndarray(t1)
    print(t1)
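As an alternative to the column-by-column loop, the same fill can be sketched without an explicit loop. This is not the approach used in the notes: it swaps in `np.nanmean`, which computes a mean that ignores nan, and `np.where` to substitute the column means. The names `col_means` and `filled` are illustrative.

```python
import numpy as np

t = np.arange(24).reshape((4, 6)).astype(float)
t[1, 2:] = np.nan

# Mean of each column, ignoring nan values
col_means = np.nanmean(t, axis=0)

# Where t is nan take the column mean (broadcast along the rows), else keep t
filled = np.where(np.isnan(t), col_means, t)

print(filled[1])  # [ 6.  7. 12. 13. 14. 15.]
```

Unlike the loop version, this returns a new array and leaves `t` unmodified.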
【Hands on】 Using the YouTube data for 1000 videos each from the UK and the US, together with the matplotlib covered earlier, draw a histogram of the number of comments
import numpy as np
from matplotlib import pyplot as plt
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
# t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)
t_us = np.loadtxt(us_file_path,delimiter=",",dtype="int")
# Take the comment counts (the last column)
t_us_comments = t_us[:,-1]
# Keep only the values no greater than 5000
t_us_comments = t_us_comments[t_us_comments<=5000]
print(t_us_comments.max(),t_us_comments.min())
d = 50
bin_nums = (t_us_comments.max()-t_us_comments.min())//d
# Plot
plt.figure(figsize=(20,8),dpi=80)
plt.hist(t_us_comments,bin_nums)
plt.show()
【Hands on】 We want to understand the relationship between the number of comments and the number of likes on YouTube videos. How should the plot be drawn?
import numpy as np
from matplotlib import pyplot as plt
us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"
t_uk = np.loadtxt(uk_file_path,delimiter=",",dtype="int")
# Keep only the rows with no more than 500,000 likes
t_uk = t_uk[t_uk[:,1]<=500000]
t_uk_comment = t_uk[:,-1]
t_uk_like = t_uk[:,1]
plt.figure(figsize=(20,8),dpi=80)
plt.scatter(t_uk_like,t_uk_comment)
plt.show()
Row and column swapping of arrays
Splicing arrays horizontally or vertically is very simple, but what should we pay attention to before splicing?
When splicing vertically, each column must have the same meaning in both arrays. Otherwise the stacked columns will mix unrelated values.
If the columns have different meanings, the columns of one dataset should be swapped so that they match the other dataset.
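A minimal sketch of splicing and of swapping rows or columns, using two small made-up arrays (all names are illustrative). The swaps are done on copies so the original array stays unchanged.

```python
import numpy as np

t1 = np.arange(6).reshape((2, 3))       # [[0, 1, 2], [3, 4, 5]]
t2 = np.arange(6, 12).reshape((2, 3))   # [[6, 7, 8], [9, 10, 11]]

v = np.vstack((t1, t2))   # vertical splice: shape (4, 3)
h = np.hstack((t1, t2))   # horizontal splice: shape (2, 6)

# Swap rows 0 and 1
s = t1.copy()
s[[0, 1], :] = s[[1, 0], :]

# Swap columns 0 and 2
c = t1.copy()
c[:, [0, 2]] = c[:, [2, 0]]

print(s)
# [[3 4 5]
#  [0 1 2]]
print(c)
# [[2 1 0]
#  [5 4 3]]
```

The swap works because indexing with a list on the right-hand side produces a copy, so the values are read before they are overwritten.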
【Hands on】 Now we want to analyze the data of the two countries from the previous case together while keeping the country information (the country each record came from). What should we do?
import numpy as np
us_data = "./youtube_video_data/US_video_data_numbers.csv"
uk_data = "./youtube_video_data/GB_video_data_numbers.csv"
# Load country data
us_data = np.loadtxt(us_data,delimiter=",",dtype=int)
uk_data = np.loadtxt(uk_data,delimiter=",",dtype=int)
# Add country information
# Build a column of all 0 (US) and a column of all 1 (UK)
zeros_data = np.zeros((us_data.shape[0],1)).astype(int)
ones_data = np.ones((uk_data.shape[0],1)).astype(int)
# Append the all-0 / all-1 column to each dataset
us_data = np.hstack((us_data,zeros_data))
uk_data = np.hstack((uk_data,ones_data))
# Splice the two datasets vertically
final_data = np.vstack((us_data,uk_data))
print(final_data)
More handy numpy methods
1. Get the positions of the maximum and minimum values
np.argmax(t,axis=0)
np.argmin(t,axis=1)
2. Create an array of all 0: np.zeros((3,4))
3. Create an array of all 1: np.ones((3,4))
4. Create a square array (matrix) with 1 on the diagonal: np.eye(3)
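The methods in this list can be tried on a small made-up array (the values in `t` are chosen only so the positions are easy to verify by eye):

```python
import numpy as np

t = np.array([[3, 9, 1],
              [7, 2, 8]])

max_pos = np.argmax(t, axis=0)   # row index of the maximum in each column
min_pos = np.argmin(t, axis=1)   # column index of the minimum in each row

zeros = np.zeros((3, 4))         # 3x4 array of 0.0
ones = np.ones((3, 4))           # 3x4 array of 1.0
eye = np.eye(3)                  # 3x3 identity matrix

print(max_pos)  # [1 0 1]
print(min_pos)  # [2 1]
print(eye)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```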