Numpy basic package for data analysis

2022-07-25 14:07:00 【Ride Hago to travel】

First of all, why should we learn NumPy? Talk is cheap, so let's look at a small exercise.

Suppose you have a list of n values and want to extract the values that are greater than a certain number.

import numpy as np
import random

# Take the values greater than 60
a = [random.randint(1, 100) for i in range(50)]
# print(a)

# What you would write on your first day of learning Python
new_list = []
for i in a:
    if i > 60:
        new_list.append(i)
print(new_list)

#  After learning anonymous functions 
c = list(filter(lambda x: x > 60, a))
print(c)

# After learning NumPy
d = np.array(a)
print(d[d > 60])

[68, 69, 77, 69, 61, 77, 95, 96, 73, 88, 98, 74, 88, 98, 92, 63]
[68, 69, 77, 69, 61, 77, 95, 96, 73, 88, 98, 74, 88, 98, 92, 63]
[68 69 77 69 61 77 95 96 73 88 98 74 88 98 92 63]

Clearly, the code is much shorter with NumPy. In other words, NumPy makes many batch operations simpler and more efficient.
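To make the efficiency claim easier to check, here is a minimal sketch that times the three approaches on a larger list. The function names, the list size of 100,000 and the fixed seed are choices made for this illustration and are not part of the original exercise; the timings depend on your machine, so no numbers are quoted.

```python
import random
import timeit

import numpy as np

random.seed(0)  # fixed seed so the data is repeatable (an assumption for this sketch)
values = [random.randint(1, 100) for _ in range(100_000)]
arr = np.array(values)


def with_loop():
    # Plain for loop, as on "day one"
    result = []
    for v in values:
        if v > 60:
            result.append(v)
    return result


def with_filter():
    # filter plus an anonymous function
    return list(filter(lambda x: x > 60, values))


def with_numpy():
    # Boolean mask on an ndarray
    return arr[arr > 60]


# All three approaches select the same values in the same order.
assert with_loop() == with_filter() == with_numpy().tolist()

for func in (with_loop, with_filter, with_numpy):
    seconds = timeit.timeit(func, number=10)
    print(f"{func.__name__}: {seconds:.4f} s for 10 runs")
```

The conversion from list to array is done once, outside the timed functions, because in practice the data usually stays in an array for many operations.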

One, Installing NumPy

NumPy is a third-party Python library. To install it, enter the following command in a Windows command prompt or in the PyCharm Terminal:

pip install numpy

Two, Creating arrays

1, Import the numpy package

import numpy as np
import random

2, Create an array

a = [random.randint(10, 30) for i in range(10)]
b = np.array(a)  # Create an array from a list
print(b)
print(type(a), type(b))

[25 22 28 10 21 12 22 29 20 11]
<class 'list'> <class 'numpy.ndarray'>

a = np.array([1, 2, 3, 4, 5])
print(a)
b = np.array(range(10))
print(b)

[1 2 3 4 5]
[0 1 2 3 4 5 6 7 8 9]

3, Create multidimensional arrays

① Create a 2D array

a = np.array([[1, 2, 3, 4], [6, 7, 8, 9]])
print(a)

[[1 2 3 4]
 [6 7 8 9]]

a = np.arange(5, 20).reshape((3, 5))
print(a)

[[ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]]

② Create a 3D array

a = np.array([[[1, 2, 3, 4], [6, 7, 8, 9]], [[2, 4, 5, 6], [3, 4, 5, 6]]])
print(a)

[[[1 2 3 4]
  [6 7 8 9]]

 [[2 4 5 6]
  [3 4 5 6]]]

a = np.arange(5, 17).reshape((2, 3, 2))
print(a)

[[[ 5  6]
  [ 7  8]
  [ 9 10]]

 [[11 12]
  [13 14]
  [15 16]]]
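When working with multidimensional arrays it helps to inspect their layout. The following sketch prints a few standard ndarray attributes for the 3D array above; the failing reshape at the end is added only to illustrate that the element count must match.

```python
import numpy as np

a = np.arange(5, 17).reshape((2, 3, 2))

print(a.ndim)   # number of dimensions
print(a.shape)  # length along each dimension
print(a.size)   # total number of elements
print(a.dtype)  # element type (the exact integer type depends on the platform)

# reshape only works when the new shape holds exactly the same number of elements
try:
    np.arange(5, 17).reshape((5, 3))  # 12 elements cannot fill 5 * 3 = 15 slots
except ValueError as exc:
    print("reshape failed:", exc)
```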

4, Create an all-zeros array

print(np.array([0] * 10))
a = np.zeros(10)  # The dtype defaults to float
b = np.zeros(10, dtype=int)  # Ask for an int dtype instead
print(a)
print(b)

[0 0 0 0 0 0 0 0 0 0]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0 0 0 0 0 0 0 0 0 0]

5, Create an all-ones array

print(np.ones(10))  # The dtype defaults to float

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

6, Create an empty array

print(np.empty(10))  # Create an uninitialized array; the values are whatever was left in memory and are meaningless

[6.23042070e-307 4.67296746e-307 1.69121096e-306 3.11523921e-307
 7.56599128e-307 1.37961913e-306 8.01097889e-307 1.78019082e-306
 1.78020984e-306 1.60218627e-306]

7, Create an identity matrix

print(np.eye(5))

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

8, Others

① Get the number of elements in an array

b = np.array(range(20))
print(b.size)

20

② Generate evenly spaced values with linspace

print(np.linspace(1, 13, 10))  # 10 evenly spaced values from 1 to 13, both endpoints included

[ 1.          2.33333333  3.66666667  5.          6.33333333  7.66666667
  9.         10.33333333 11.66666667 13.        ]
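Because both endpoints are included, 10 values span only 9 intervals, so the spacing is (13 - 1) / (10 - 1) rather than 12 / 10. A small sketch using the `retstep` and `endpoint` parameters of `np.linspace` makes this visible; the variable names are chosen for the example.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring values
points, step = np.linspace(1, 13, 10, retstep=True)
print(step)  # (13 - 1) / (10 - 1)

# endpoint=False leaves out the stop value, so the spacing becomes (13 - 1) / 10
half_open = np.linspace(1, 13, 10, endpoint=False)
print(half_open)
```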

③ Generate a range of values with arange

print(np.arange(10))  # Works much like range; the main difference is that arange also accepts floats, for example as the step
print(np.arange(1, 10, 0.5))

[0 1 2 3 4 5 6 7 8 9]
[1.  1.5 2.  2.5 3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8.  8.5 9.  9.5]

Three, Batch operations on arrays

b = [random.randint(0, 100) for n in range(10)]
c = np.array(b)
print(c)
print(c + 1)
print(c / 3)

[72 50 91 95 35 63 66 65 53 15]
[73 51 92 96 36 64 67 66 54 16]
[24.         16.66666667 30.33333333 31.66666667 11.66666667 21.
 22.         21.66666667 17.66666667  5.        ]

a = np.arange(10)
b = np.arange(5, 15)
print(a, b)
print(a + b)
print(b / a)
print(a > b)
print(a > 5)
print(a**0.5) 

[0 1 2 3 4 5 6 7 8 9] [ 5  6  7  8  9 10 11 12 13 14]
[ 5  7  9 11 13 15 17 19 21 23]
G:/untitled/data_habding/numpy_study/day1/study2.py:76: RuntimeWarning: divide by zero encountered in true_divide
  print(b / a)  # [ inf 6. 3.5 2.66666667 2.25 2.
[       inf 6.         3.5        2.66666667 2.25       2.
 1.83333333 1.71428571 1.625      1.55555556]  # inf (infinity) is larger than any finite floating-point number
[False False False False False False False False False False]
[False False False False False False  True  True  True  True]
[0.         1.         1.41421356 1.73205081 2.         2.23606798
 2.44948974 2.64575131 2.82842712 3.        ]
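The RuntimeWarning above comes from dividing by the zero in the first position of a. Two common ways to deal with it are sketched below, under the assumption that you either accept inf in the result or want a placeholder instead. Using nan as the fill value is a choice made for this example.

```python
import numpy as np

a = np.arange(10)
b = np.arange(5, 15)

# Option 1: silence the warning inside this block only; the result still contains inf
with np.errstate(divide="ignore"):
    ratio = b / a

# Option 2: divide only where a is non-zero and keep a fill value elsewhere
safe = np.full(a.shape, np.nan)  # nan as the fill value is an assumption of this sketch
np.divide(b, a, out=safe, where=a != 0)

print(ratio)
print(safe)
```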

Four, Array indexing

1, Indexing a one-dimensional array

a = np.arange(5, 16)
print(a)  # [ 5 6 7 8 9 10 11 12 13 14 15]
print(a[0])  # 5

2, Indexing a two-dimensional array

a = np.arange(15).reshape((3, 5))  # Quickly create a 2D array with 3 rows and 5 columns
print(a)
# [[ 0 1 2 3 4]
# [ 5 6 7 8 9]
# [10 11 12 13 14]]

print(a[0][0])  # 0, list-style chained indexing
print(a[0, 0])  # 0, recommended (NumPy-style indexing with a row, column pair)
print(a[1, 2])  # 7

3, Boolean indexing

a = np.arange(5)
print(a)
b = a[[True, False, True, False, False]]
print(b)  # [0 2]

4, Fancy indexing

a = np.arange(5, 20).reshape((3, 5))
print(a)
# [[ 5 6 7 8 9]
# [10 11 12 13 14]
# [15 16 17 18 19]]
print(a[0, a[0] > 6])  # [7 8 9]

print(a[[0, 2], :][:, [1, 3]])
# [[ 6 8]
# [16 18]]
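The chained form `a[[0, 2], :][:, [1, 3]]` is needed because passing two index lists at once pairs them up element by element instead of selecting a block. As a sketch of an alternative, `np.ix_` builds the row and column selection in one step:

```python
import numpy as np

a = np.arange(5, 20).reshape((3, 5))

chained = a[[0, 2], :][:, [1, 3]]    # rows 0 and 2 first, then columns 1 and 3
with_ix = a[np.ix_([0, 2], [1, 3])]  # same block selected in a single indexing step
paired = a[[0, 2], [1, 3]]           # pairs the lists: elements (0, 1) and (2, 3) only

print(chained)
print(with_ix)
print(paired)
```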

Five, Array slicing

1, Slicing a one-dimensional array

a = np.arange(5, 15)  # [ 5 6 7 8 9 10 11 12 13 14]
print(a[: 3])  # [5 6 7]
print(a[-5:])  # [10 11 12 13 14]

2, Slicing a two-dimensional array

a = np.arange(5, 20).reshape((3, 5))
# [[ 5 6 7 8 9]
# [10 11 12 13 14]
# [15 16 17 18 19]]
print(a[:, 1])  # [ 6 11 16]
print(a[: 2, 1: 3])
# [[ 6 7]
# [11 12]]
print(a[1:, 3:])
# [[13 14]
# [18 19]]

3, A small difference between arrays and lists

a = list(range(5, 15))
print(a)

b = np.arange(5, 15)
print(b)

c = a[: 5]
d = b[: 5]

c[0] = 20  # a is unchanged: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
d[0] = 20  # b changes too: [20 6 7 8 9 10 11 12 13 14]
print(a)
print(b)

Summary: a slice of an array is a view of the same data, so modifying an element of the slice also modifies the original array. A slice of a list is a new list, so the original list is not modified.
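If you need a slice of an array that can be changed without touching the original, one option is to call `.copy()` on it. The sketch below contrasts the two cases and uses `np.shares_memory` to confirm which one shares data; the variable names are illustrative.

```python
import numpy as np

b = np.arange(5, 15)

view = b[:5]             # a view: shares memory with b
snapshot = b[:5].copy()  # an independent copy

snapshot[0] = 20
print(b[0])  # b is not affected by changes to the copy

view[0] = 20
print(b[0])  # b is affected by changes to the view

print(np.shares_memory(b, view), np.shares_memory(b, snapshot))
```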

Six, Universal functions in NumPy

1, abs: take the absolute value

a = np.arange(-20, 5)
print(np.abs(a))

[20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0  1  2  3
  4]

2, sqrt: take the square root

a = np.arange(-20, 5)
print(np.sqrt(a))

[       nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan
        nan        nan        nan        nan        nan        nan
        nan        nan 0.         1.         1.41421356 1.73205081
 2.        ]

Note: nan (not a number) is what the square root of a negative number returns here. It is not equal to any floating-point number, not even itself (nan != nan).

3, round: rounding

① In native Python

import math

print(round(3.4))  # 3, rounds to the nearest integer

print(int(1.9))  # 1, simply discards the fractional part

print(math.floor(1.8))  # 1, rounds down
print(math.floor(-1.8))  # -2, rounds down

print(math.ceil(1.1))  # 2, rounds up
print(math.ceil(-1.6))  # -1, rounds up

② In NumPy

a = np.arange(-5.5, 5)
print(a)
print(np.round(a))  # [-6. -4. -4. -2. -2. -0. 0. 2. 2. 4. 4.]
print(np.trunc(a))  # [-5. -4. -3. -2. -1. -0. 0. 1. 2. 3. 4.]
print(np.floor(a))  # [-6. -5. -4. -3. -2. -1. 0. 1. 2. 3. 4.]
print(np.ceil(a))  # [-5. -4. -3. -2. -1. -0. 1. 2. 3. 4. 5.]
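The np.round result above may look surprising: values exactly halfway between two integers go to the nearest even integer, which is why -4.5 becomes -4 but -3.5 also becomes -4. Python's built-in round follows the same rule. The sketch below shows this, together with one common workaround for "round half up" that is only meant for non-negative values and can still be affected by floating-point representation.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

print(np.round(halves))                     # ties go to the nearest even integer
print([round(x) for x in halves.tolist()])  # the built-in round uses the same rule

# A simple "round half up" for non-negative values (an illustrative convention, not a NumPy function)
half_up = np.floor(halves + 0.5)
print(half_up)
```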

③ Round to a given number of decimal places

a = np.linspace(1, 5, 10)
print(a)
print(np.round(a, 2))  # Round to two decimal places

[1.         1.44444444 1.88888889 2.33333333 2.77777778 3.22222222
 3.66666667 4.11111111 4.55555556 5.        ]
[1.   1.44 1.89 2.33 2.78 3.22 3.67 4.11 4.56 5.  ]

4, modf: split values into fractional and integer parts

a = np.arange(-5.5, 5.5)
print(a)
print(np.modf(a))
# (array([-0.5, -0.5, -0.5, -0.5, -0.5, -0.5, 0.5, 0.5, 0.5, 0.5, 0.5]),
# array([-5., -4., -3., -2., -1., -0., 0., 1., 2., 3., 4.]))

5, Two special floating-point numbers

①nan

# nan (not a number) is not equal to any floating-point number, not even itself (nan != nan)
print(float('nan'))  # nan

a = np.arange(0, 5)
print(a / a)  # [nan 1. 1. 1. 1.], because 0 / 0 gives nan

print(np.nan)  # nan, NumPy's nan constant
print(np.nan == np.nan)  # False

How to check for nan

a = np.array([0, 3, 6, 8, 0])
b = a / a
print(np.isnan(b))  # [ True False False False True], marks which elements are nan
print(b[~np.isnan(b)])  # [1. 1. 1.], keeps only the elements that are not nan
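Beyond filtering, it is often useful to ask whether an array contains any nan at all, or to compute a statistic that skips nan. A minimal sketch using `np.isnan` together with `np.nanmean` follows; the `np.errstate` block is added here only to suppress the 0 / 0 warning.

```python
import numpy as np

a = np.array([0, 3, 6, 8, 0])
with np.errstate(invalid="ignore"):  # suppress the 0 / 0 warning inside this block only
    b = a / a

print(np.isnan(b).any())  # is there at least one nan?
print(np.isnan(b).sum())  # how many nan values are there?
print(b.mean())           # nan: an ordinary mean is contaminated by nan
print(np.nanmean(b))      # mean of the remaining values, ignoring nan
```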

②inf

# inf (infinity) is larger than any finite floating-point number
print(float('inf'))  # inf

How to check for inf

a = np.array([3, 4, 6, 9])
b = np.array([0, 4, 0, 2])
c = a / b  # [inf 1. inf 4.5], dividing a positive number by zero gives inf (with a RuntimeWarning)
print(np.inf == np.inf)  # True
print(c[~np.isinf(c)])  # [1. 4.5], keeps only the elements that are not inf
print(c[c!=np.inf])  # [1. 4.5], another way to drop inf
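Note that comparing against np.inf drops only positive infinity. If the data may also contain -inf or nan, `np.isfinite` is a convenient way to keep just the ordinary numbers. The input values in this sketch are made up to produce all three special cases.

```python
import numpy as np

a = np.array([3.0, -4.0, 0.0, 9.0])
b = np.array([0.0, 0.0, 0.0, 2.0])
with np.errstate(divide="ignore", invalid="ignore"):  # silence the warnings for this block
    c = a / b  # inf, -inf, nan and one ordinary value

print(c[c != np.inf])     # -inf and nan are still present
print(c[np.isfinite(c)])  # only finite values remain
```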

6, maximum: return the larger value at each position of two arrays

a = np.array([3, 5, 6, 8])
b = np.array([5, 8, 2, 1])
print(np.maximum(a, b))  # [5 8 6 8]

7, minimum: return the smaller value at each position of two arrays

a = np.array([3, 5, 6, 8])
b = np.array([5, 8, 2, 1])
print(np.minimum(a, b))  # [3 5 2 1]

Seven, Mathematical and statistical methods

1, sum(): sum of the elements

a = np.array([3, 5, 6, 8])
print(a.sum())  # 22

2, mean(): average

a = np.array([3, 5, 6, 8])
print(a.mean())  # 5.5

3, var(): variance (how spread out the whole set of data is)

a = np.array([3, 5, 6, 8])
print(a.var())  # 3.25

4, std(): standard deviation (the square root of the variance)

a = np.array([3, 5, 6, 8])
print(a.std())  # 1.8027756377319946
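By default, var() and std() divide by n, which gives the population variance. When the data is a sample and you want the unbiased estimate that divides by n - 1, NumPy accepts a `ddof` argument. The sketch below compares the two and also recomputes the default by hand; the variable names are illustrative.

```python
import numpy as np

a = np.array([3, 5, 6, 8])

population_var = a.var()    # divides by n
sample_var = a.var(ddof=1)  # divides by n - 1

# The default variance written out by hand: mean of the squared deviations
manual = ((a - a.mean()) ** 2).sum() / a.size

print(population_var, sample_var, manual)
print(a.std(), population_var ** 0.5)  # std is the square root of var
```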

5, max() min(): maximum and minimum values

a = np.array([3, 5, 6, 8])
print(a.max())  # 8
print(a.min())  # 3

6, argmax() argmin(): index of the maximum value and index of the minimum value

a = np.arange(-10, 10)
print(a.argmax())  # 19
print(a.argmin())  # 0
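All of the methods in this section also work on multidimensional arrays, where an `axis` argument selects the direction to reduce along. Here is a short sketch on the 3 x 5 array used earlier: axis=0 reduces down the rows and gives one result per column, and axis=1 gives one result per row.

```python
import numpy as np

a = np.arange(5, 20).reshape((3, 5))

print(a.sum())           # sum of every element
print(a.sum(axis=0))     # one sum per column
print(a.mean(axis=1))    # one mean per row
print(a.argmax(axis=1))  # column index of the largest value in each row
```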

Eight, Generating random numbers

1, Using the random library

print(random.random())  # Return a random float in [0, 1)
print(random.randint(1, 10))  # Return a random integer in [1, 10]
print(random.choice(list(range(3, 10))))  # Return a randomly chosen element of the given list

a = list(range(5, 15))
random.shuffle(a)  # Shuffle the list in place
print(a)

print(random.uniform(1, 10))  # Return a random float between 1 and 10

2, Using numpy.random

print(np.random.randint(0, 10))  # 8 (the upper bound 10 is excluded, unlike random.randint)
print(np.random.randint(0, 10, 10))  # [0 1 5 8 4 7 3 6 8 6]
print(np.random.randint(0, 10, (3, 5)))
# [[2 9 2 3 6]
# [9 2 8 4 1]
# [5 3 1 6 5]]

print(np.random.random(10))  # [0.33031099 0.35918637 0.65868327 0.74442108 0.71771834 0.74782961
                                # 0.15125635 0.17983218 0.37393755 0.77529924]
print(np.random.random())  # 0.7375970998290451
print(np.random.rand())  # 0.4410851788926652
print(np.random.rand(10))  # [0.14405128 0.0463478 0.97758344 0.03100739 0.94210667 0.70551171
                                # 0.91935786 0.43250767 0.78710345 0.78756913]

print(np.random.randint(1, 19))  # 16
print(np.random.randint(1, 19, 10))  # [ 7 15 17 12 18 14 13 15 5 16]

print(np.random.choice(np.arange(1, 10)))  # Choose one element at random

a = np.arange(3, 13)
np.random.shuffle(a)  # Shuffle the array in place
print(a)

print(np.random.uniform(1, 10))  # Return one random float in [1, 10)
print(np.random.uniform(1, 10, 10))  # Return ten random floats in [1, 10)
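The outputs shown above change on every run. When you need repeatable results, for example in tests, you can fix the seed. The sketch below shows the legacy `np.random.seed` call and also the newer `np.random.default_rng` generator, available in NumPy 1.17 and later. The seed value 42 is arbitrary.

```python
import numpy as np

# Legacy interface: reseeding the global generator replays the same sequence
np.random.seed(42)
first = np.random.randint(0, 10, 5)
np.random.seed(42)
second = np.random.randint(0, 10, 5)
print(first, second)

# Newer interface: an independent generator object with its own state
rng = np.random.default_rng(42)
ints = rng.integers(0, 10, 5)  # upper bound excluded, like np.random.randint
floats = rng.uniform(1, 10, 3)
print(ints, floats)
```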