Data analysis course notes (III): array shape and arithmetic, saving and reading data with numpy, indexing, slicing and splicing

2022-07-07 00:21:00 【M Walker x】

Data analysis course notes

Reading data with numpy
Indexing and slicing in numpy
Data splicing
- Swapping rows and columns
- More useful methods
Generating random numbers with numpy
- Things to watch out for in numpy: copies and views

The shape of the array

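Here is a minimal sketch of checking and changing an array's shape. The example values are made up for illustration.

```python
import numpy as np

# A 1-D array with 12 elements
t = np.arange(12)
print(t.shape)      # (12,)

# reshape returns an array with the requested shape; the total size must match
t2 = t.reshape((3, 4))
print(t2.shape)     # (3, 4)

# A 3-D shape: 2 blocks, each with 3 rows and 2 columns
t3 = t.reshape((2, 3, 2))
print(t3.shape)     # (2, 3, 2)

# flatten collapses any shape back into a 1-D copy
flat = t3.flatten()
print(flat.shape)   # (12,)
```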

Array arithmetic

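Arithmetic between an array and a single number works element by element. The same holds for arithmetic between two arrays of the same shape. Here is a small sketch with made-up values:

```python
import numpy as np

t = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

# An operation with a single number is applied to every element
print((t + 2).tolist())   # [[2, 3, 4], [5, 6, 7]]
print((t * 2).tolist())   # [[0, 2, 4], [6, 8, 10]]

# Arrays of the same shape are combined element by element
s = np.ones((2, 3), dtype=int)
print((t + s).tolist())   # [[1, 2, 3], [4, 5, 6]]
print((t * t).tolist())   # [[0, 1, 4], [9, 16, 25]]
```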

Arithmetic on arrays of different dimensions

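When the shapes differ, numpy repeats the smaller array along its missing or length-1 dimension. The sketch below assumes a (3, 4) array combined with a row of shape (4,) and with a column of shape (3, 1).

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

# A row of shape (4,) is combined with every row of t
row = np.arange(4)
print((t - row).tolist())   # [[0, 0, 0, 0], [4, 4, 4, 4], [8, 8, 8, 8]]

# A column of shape (3, 1) is combined with every column of t
col = np.arange(3).reshape((3, 1))
print((t - col).tolist())   # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```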

Broadcasting principles

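The rule works as follows. Compare the two shapes starting from the last dimension. Each pair of dimensions must be equal, or one of them must be 1; missing leading dimensions count as 1.

- The helper `can_broadcast` below is hypothetical, written only to spell the rule out.
- `np.broadcast_shapes` is numpy's own check and assumes NumPy 1.20 or later.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """Hypothetical helper that spells out the broadcasting rule by hand."""
    # zip stops at the shorter shape, so missing leading dimensions are skipped (they count as 1)
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


print(can_broadcast((3, 4), (4,)))       # True
print(can_broadcast((3, 4), (3, 1)))     # True
print(can_broadcast((3, 4), (3,)))       # False
print(can_broadcast((3, 3, 2), (3, 2)))  # True

# numpy's own check agrees
print(np.broadcast_shapes((3, 3, 2), (3, 2)))  # (3, 3, 2)
try:
    np.arange(12).reshape((3, 4)) + np.arange(3)
except ValueError:
    print("(3, 4) and (3,) cannot be broadcast")
```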

Axes (axis)

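This sketch shows what `axis` means for a 2-D array (values made up). `axis=0` collapses the rows and gives one result per column. `axis=1` collapses the columns and gives one result per row.

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

# axis=0 runs down the rows: one result per column
print(t.sum(axis=0).tolist())   # [12, 15, 18, 21]
# axis=1 runs across the columns: one result per row
print(t.sum(axis=1).tolist())   # [6, 22, 38]
# No axis: a single result over all elements
print(t.sum())                  # 66
```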

Reading data with numpy

np.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False)
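Here is a sketch of the main `np.loadtxt` parameters. To keep it self-contained, it reads from an in-memory string via `io.StringIO` instead of the YouTube CSV files. The header line and its column names are invented for the example.

```python
import io

import numpy as np

# A small in-memory CSV standing in for a file: one header line plus 3 rows
csv_text = """views,likes,dislikes,comments
100,10,1,5
200,20,2,8
300,30,3,9
"""

# skiprows=1 skips the header; delimiter="," splits the columns
t = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, skiprows=1)
print(t.shape)   # (3, 4)

# usecols keeps only the listed columns (here: likes and comments)
likes_comments = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int,
                            skiprows=1, usecols=(1, 3))
print(likes_comments.tolist())   # [[10, 5], [20, 8], [30, 9]]

# unpack=True returns the transpose, so each column can be assigned to its own variable
views, likes, dislikes, comments = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                                              dtype=int, skiprows=1, unpack=True)
print(views.tolist())   # [100, 200, 300]
```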

Data source:
https://www.kaggle.com/datasnaek/youtube/data

```python
# coding=utf-8
import numpy as np

us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"

# t1 = np.loadtxt(us_file_path, delimiter=",", dtype="int", unpack=True)
t2 = np.loadtxt(us_file_path, delimiter=",", dtype="int")

# print(t1)
print(t2)

print("*" * 100)

# Take one row (the third row, index 2)
# print(t2[2])

# Take consecutive rows (from the third row to the end)
# print(t2[2:])

# Take non-consecutive rows
# print(t2[[2, 8, 10]])

# Rows can also be selected with an explicit ":" for the columns
# print(t2[1, :])
# print(t2[2:, :])
# print(t2[[2, 10, 3], :])

# Take one column (the first column)
# print(t2[:, 0])

# Take consecutive columns
# print(t2[:, 2:])

# Take non-consecutive columns
# print(t2[:, [0, 2]])

# Take a single element: row 3, column 4
# a = t2[2, 3]
# print(a)
# print(type(a))

# Take several rows and columns: rows 3 to 5 and columns 2 to 4,
# i.e. the intersection of those rows and columns
b = t2[2:5, 1:4]
# print(b)

# Take several non-adjacent elements:
# the result contains the elements at (0,0), (2,1) and (2,3)
c = t2[[0, 2, 2], [0, 1, 3]]
print(c)
```

Indexing and slicing in numpy


Boolean indexing in numpy

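Boolean indexing selects or modifies the elements where a condition holds. Here is a sketch with made-up values:

```python
import numpy as np

t = np.arange(24).reshape((4, 6))

# A comparison gives a boolean array of the same shape as t
mask = t < 10
print(mask.shape)         # (4, 6)

# Indexing with the mask returns the matching elements as a 1-D array
print(t[mask].tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Assigning through a mask modifies t in place
t[t < 10] = 3
print(t[1].tolist())      # [3, 3, 3, 3, 10, 11]
```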

The ternary operator in numpy (np.where)

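`np.where(condition, x, y)` works like a vectorized ternary operator. It returns `x` where the condition is True and `y` elsewhere. Here is a sketch with made-up values:

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

# Take 0 where t < 5 and 10 elsewhere; t itself is not modified
result = np.where(t < 5, 0, 10)
print(result.tolist())   # [[0, 0, 0, 0], [0, 10, 10, 10], [10, 10, 10, 10]]

# x or y may also be arrays, e.g. replace values greater than 10 with 10
capped = np.where(t > 10, 10, t)
print(capped[-1].tolist())   # [8, 9, 10, 10]
```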

clip (clipping) in numpy

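`clip(min, max)` replaces values below `min` with `min` and values above `max` with `max`. `nan` values stay `nan`. Here is a sketch with made-up values:

```python
import numpy as np

t = np.arange(12).reshape((3, 4)).astype(float)
t[2, 3] = np.nan   # clip leaves nan untouched

# Values below 3 become 3, values above 8 become 8
clipped = t.clip(3, 8)
print(clipped[0].tolist())   # [3.0, 3.0, 3.0, 3.0]
print(clipped[1].tolist())   # [4.0, 5.0, 6.0, 7.0]
print(np.isnan(clipped[2, 3]))
```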

nan and inf in numpy

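`nan` (not a number) and `inf` (infinity) are both floating-point values. In numpy they typically show up when a float column has missing values or after a division by zero. In the sketch below, `np.errstate` only silences the division warnings.

```python
import numpy as np

# Both are ordinary Python floats
print(type(np.nan), type(np.inf))   # <class 'float'> <class 'float'>

# Division by zero gives inf, -inf, or nan for 0/0
with np.errstate(divide="ignore", invalid="ignore"):
    r = np.array([1.0, -1.0, 0.0]) / 0
print(r.tolist())   # [inf, -inf, nan]

# Any arithmetic involving nan gives nan
print(np.nan + 1, np.nan * 0)   # nan nan
```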

Things to watch out for with nan in numpy

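Two pitfalls are worth seeing in code. First, `nan` is not equal to itself. Second, a single `nan` turns a whole statistic into `nan`. Here is a sketch with made-up values:

```python
import numpy as np

print(np.nan == np.nan)   # False

t = np.arange(6).astype(float)   # [0, 1, 2, 3, 4, 5]
t[[1, 4]] = np.nan               # [0, nan, 2, 3, nan, 5]

# Since nan != nan, comparing the array with itself flags exactly the nan positions
print(np.count_nonzero(t != t))        # 2
print(np.count_nonzero(np.isnan(t)))   # 2

# nan propagates through statistics...
print(t.sum())                  # nan
# ...so use only the non-nan values, or the nan-aware variants
print(t[~np.isnan(t)].mean())   # 2.5
print(np.nanmean(t))            # 2.5
```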

Commonly used statistical functions in numpy


By default, these functions return a single statistic computed over all elements of a multidimensional array; if axis is specified, they return the results along that axis.
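Here is a sketch of the common statistical functions with and without `axis` (values made up). The standard deviation measures how spread out the values are around their mean. The larger it is, the less stable the data.

```python
import numpy as np

t = np.array([[1, 2, 3],
              [4, 5, 9]])

print(t.sum(), t.mean())               # 24 4.0
print(t.max(axis=0).tolist())          # [4, 5, 9]
print(t.min(axis=1).tolist())          # [1, 4]
print(np.median(t, axis=1).tolist())   # [2.0, 5.0]
print(np.ptp(t, axis=1).tolist())      # [2, 5]   (max - min per row)
print(t.std(axis=0).tolist())          # [1.5, 1.5, 3.0]
```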

Handling missing values

```python
# coding=utf-8
import numpy as np


def fill_ndarray(t1):
    """Replace every nan in each column with the mean of that column's non-nan values."""
    for i in range(t1.shape[1]):  # Traverse each column
        temp_col = t1[:, i]  # The current column (a view, so changes are written back to t1)
        nan_num = np.count_nonzero(temp_col != temp_col)  # nan != nan, so this counts the nans
        if nan_num != 0:  # Non-zero means the current column contains nan
            temp_not_nan_col = temp_col[temp_col == temp_col]  # The non-nan values of the current column

            # Select the nan positions in the current column and assign the mean of the non-nan values
            temp_col[np.isnan(temp_col)] = temp_not_nan_col.mean()
    return t1


if __name__ == '__main__':
    t1 = np.arange(24).reshape((4, 6)).astype("float")
    t1[1, 2:] = np.nan
    print(t1)
    t1 = fill_ndarray(t1)
    print(t1)
```

How do you select one or more rows (columns) of data?
How do you assign values to selected rows or columns?
How do you replace values greater than 10 with 10?
How is np.where used?
How is np.clip used?
How do you transpose an array (swap axes)?
How do you read and save data as CSV?
What are np.nan and np.inf?
How many common statistical functions can you remember?
What does the standard deviation tell you about the data?

#### Indexing and slicing in numpy
  - t[10,20]
  - `t[[2,5],[4,8]]`

  - t[3:]
  - t[[2,5,6]]

  - t[:,:4]
  - t[:,[2,5,6]]

  - t[2:3,5:7]


#### Boolean indexing, where and clip in numpy
  - t[t<30] = 2
  - np.where(t<10,20,5)
  - t.clip(10,20)


#### Transposing and reading local files
  - t.T
  - t.transpose()
  - t.swapaxes(1,0)

  - np.loadtxt(file_path,delimiter,dtype)
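A quick check that the three transpose spellings listed above agree for a 2-D array (values made up):

```python
import numpy as np

t = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]

# All three give the same (3, 2) result for a 2-D array
a = t.T
b = t.transpose()
c = t.swapaxes(1, 0)
print(a.tolist())                            # [[0, 3], [1, 4], [2, 5]]
print((a == b).all() and (a == c).all())     # True
```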


#### What are nan and inf?
  - nan: not a number
  - np.nan != np.nan
  - any calculation involving nan gives nan

  - inf: infinity

#### Commonly used statistical functions
  - t.sum()
  - t.mean()
  - np.median(t)
  - t.max()
  - t.min()
  - np.ptp(t)
  - t.std()


```python
import numpy as np
from matplotlib import pyplot as plt

us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"

# t1 = np.loadtxt(us_file_path, delimiter=",", dtype="int", unpack=True)
t_us = np.loadtxt(us_file_path, delimiter=",", dtype="int")

# Take the comment counts (last column)
t_us_comments = t_us[:, -1]

# Keep only values less than or equal to 5000
t_us_comments = t_us_comments[t_us_comments <= 5000]

print(t_us_comments.max(), t_us_comments.min())

d = 50  # bin width

# Number of bins = data range // bin width
bin_nums = int((t_us_comments.max() - t_us_comments.min()) // d)

# Plot the histogram
plt.figure(figsize=(20, 8), dpi=80)

plt.hist(t_us_comments, bin_nums)

plt.show()
```

```python
import numpy as np
from matplotlib import pyplot as plt

us_file_path = "./youtube_video_data/US_video_data_numbers.csv"
uk_file_path = "./youtube_video_data/GB_video_data_numbers.csv"

# t1 = np.loadtxt(us_file_path, delimiter=",", dtype="int", unpack=True)
t_uk = np.loadtxt(uk_file_path, delimiter=",", dtype="int")

# Keep only the rows with at most 500,000 likes (column 1)
t_uk = t_uk[t_uk[:, 1] <= 500000]

t_uk_comment = t_uk[:, -1]  # comment counts (last column)
t_uk_like = t_uk[:, 1]      # like counts

# Scatter plot of likes against comments
plt.figure(figsize=(20, 8), dpi=80)
plt.scatter(t_uk_like, t_uk_comment)

plt.show()
```

Data splicing

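The two splicing directions have different requirements:

- Vertical splicing (`np.vstack`) stacks rows and needs the same number of columns.
- Horizontal splicing (`np.hstack`) places columns side by side and needs the same number of rows.

Here is a sketch with made-up values:

```python
import numpy as np

t1 = np.arange(6).reshape((2, 3))
t2 = np.arange(6, 12).reshape((2, 3))

# Vertical splicing: stack the rows (column counts must match)
v = np.vstack((t1, t2))
print(v.shape)      # (4, 3)

# Horizontal splicing: put the columns side by side (row counts must match)
h = np.hstack((t1, t2))
print(h.shape)      # (2, 6)
print(h.tolist())   # [[0, 1, 2, 6, 7, 8], [3, 4, 5, 9, 10, 11]]
```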

Swapping rows and columns

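Rows or columns can be swapped with fancy indexing on both sides of an assignment. The right-hand side is evaluated as a copy first, so the swap is safe. Here is a sketch with made-up values:

```python
import numpy as np

t = np.arange(12).reshape((3, 4))

# Swap rows 1 and 2
t[[1, 2], :] = t[[2, 1], :]
print(t.tolist())   # [[0, 1, 2, 3], [8, 9, 10, 11], [4, 5, 6, 7]]

# Swap columns 0 and 2
t[:, [0, 2]] = t[:, [2, 0]]
print(t.tolist())   # [[2, 1, 0, 3], [10, 9, 8, 11], [6, 5, 4, 7]]
```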
Now suppose we want to analyze the data from both countries in the previous example together. How should we do that?

```python
# coding=utf-8
import numpy as np

us_data = "./youtube_video_data/US_video_data_numbers.csv"
uk_data = "./youtube_video_data/GB_video_data_numbers.csv"

# Load each country's data
us_data = np.loadtxt(us_data, delimiter=",", dtype=int)
uk_data = np.loadtxt(uk_data, delimiter=",", dtype=int)

# Add country information:
# a column of zeros for the US rows and a column of ones for the UK rows
zeros_data = np.zeros((us_data.shape[0], 1)).astype(int)
ones_data = np.ones((uk_data.shape[0], 1)).astype(int)

# Append the 0/1 column to each array
us_data = np.hstack((us_data, zeros_data))
uk_data = np.hstack((uk_data, ones_data))

# Splice the two data sets together vertically
final_data = np.vstack((us_data, uk_data))
print(final_data)
```

More useful methods

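The original figure for this part is not recoverable. As an assumption, the sketch below shows a few helpers commonly used alongside the material above:

- `np.argmax` and `np.argmin` give the position of the maximum or minimum.
- `np.zeros`, `np.ones` and `np.eye` create new arrays.

The values are made up.

```python
import numpy as np

t = np.array([[3, 7, 1],
              [8, 2, 5]])

# Position of the maximum / minimum along an axis
print(np.argmax(t, axis=0).tolist())   # [1, 0, 1]
print(np.argmin(t, axis=1).tolist())   # [2, 1]

# Arrays filled with 0 or 1 (float by default)
print(np.zeros((2, 3)).tolist())              # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(np.ones((2, 2), dtype=int).tolist())    # [[1, 1], [1, 1]]

# A square array with 1 on the diagonal
print(np.eye(3, dtype=int).tolist())   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```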

Generating random numbers with numpy


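```python
# coding=utf-8
import numpy as np

# Fixing the seed makes the "random" numbers the same on every run
np.random.seed(10)
# A 3x4 array of random integers in [0, 20)
t = np.random.randint(0, 20, (3, 4))
print(t)
```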

Things to watch out for in numpy: copies and views

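This sketch covers the three cases (values made up):

- Plain assignment does not create a new array.
- Slicing creates a view that shares data with the original.
- `copy()` creates independent data.

`np.shares_memory` is used only to make the sharing visible.

```python
import numpy as np

a = np.arange(6)

# Plain assignment: b is the very same object as a
b = a
b[0] = 100
print(a[0])                      # 100

# Slicing creates a view: a new array object sharing a's data
v = a[1:4]
v[0] = -1
print(a[1])                      # -1
print(np.shares_memory(a, v))    # True

# copy() creates independent data
c = a.copy()
c[2] = 999
print(a[2])                      # 2
print(np.shares_memory(a, c))    # False
```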


Copyright notice
This article was written by [M Walker x]; please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202131006208574.html
