
Studying NumPy by imitating it, part 1

2022-06-29 21:43:00 【InfoQ】

I recently worked on a project that needed to extract a voiceprint feature vector from audio for speech recognition. That takes some mathematics, mainly the fast Fourier transform, high-frequency filtering, and the computation of Mel cepstral coefficients. The useful references I found online were implemented in Python, and NumPy was the library they used most. Our project, however, has to extract voiceprints on mobile phones, so the extraction can only be implemented in C. The hard part is writing a C version of (part of) NumPy's functionality and then porting the Python code to a C voiceprint extractor on top of it.

This article records how, in order to implement voiceprint extraction, I had to reimplement part of NumPy in C, and some of the interesting things I ran into while studying NumPy.

Analyzing the N-dimensional array

NumPy is the fundamental package for scientific computing in Python. One of its most important features is the N-dimensional array object, ndarray: a collection of elements of the same type, indexed from 0. Every element of an ndarray occupies a region of the same size in memory. How would one design an imitation of this in C?

The key is the N-dimensional array itself. N is not known in advance, so the object cannot contain an array with a fixed number of dimensions, which is a challenge in a statically typed language. One thing is certain, though: once the shape is given, the number of elements is fixed, and therefore so is the amount of memory required. For example, a 3*5*5 three-dimensional array of float needs 3*5*5*sizeof(float) = 75 * 4 = 300 bytes (with a 4-byte float). So internally we allocate one contiguous memory block of that size, and externally the interface presents it as a three-dimensional array.

First, let me give this C imitation of NumPy an imposing name: ultra_array. Its prototype is as follows:
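The size calculation above can be sketched in a few lines of C. This is only an illustration: element_count and float_block_bytes are names I made up for the example, not functions from ultra_array.

```c
#include <stddef.h>

/* Number of elements in an array whose dimensions are shape[0..axis_n-1].
   element_count is a hypothetical helper name used only in this sketch. */
size_t element_count(const size_t shape[], int axis_n) {
    size_t n = 1;
    for (int i = 0; i < axis_n; ++i) {
        n *= shape[i];
    }
    return n;
}

/* Bytes needed for one contiguous block holding that many float elements. */
size_t float_block_bytes(const size_t shape[], int axis_n) {
    return element_count(shape, axis_n) * sizeof(float);
}
```

For the shape {3, 5, 5}, element_count gives 75, and float_block_bytes gives 75 * sizeof(float), which is 300 on platforms where a float is 4 bytes.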

struct _u_array {
    char *start[2];   /* start[0]: shape array, start[1]: element data */
    int axis_n;       /* number of dimensions */
};

Yes, it is that simple. start is an array of just two pointers, each pointing to a separate block. start[0] points to the array that stores the shape, for example [3, 5, 5] for a 3*5*5 array; start[1] points to the memory block that holds the element data. axis_n is the number of dimensions: 3 for a three-dimensional array, 4 for a four-dimensional one. axis_n determines the bounds of start[0], and the shape stored in start[0] in turn determines the bounds of start[1].

Accessing elements of the multidimensional array

How do we access an element of the N-dimensional array? The data is stored in a one-dimensional block, so when the caller passes N-dimensional coordinates through the interface, we need a transformation vector that maps them to an index into that block. Suppose we have a 3*4*5*6 array and want to read the element at coordinates (2,2,2,3). We first have to work out the transformation vector for the 3*4*5*6 shape.

You might think the transformation vector is simply (3,4,5,6). It is not; assuming so is just going by intuition. Entry i of the vector is the product of all the dimensions after dimension i, and the entry for the last dimension is 1. So the transformation vector of a 3*4*5*6 array is [4*5*6, 5*6, 6, 1] => [120, 30, 6, 1]. (NumPy calls the same idea strides, although it measures them in bytes rather than in elements.)

The coordinates (2,2,2,3) are then converted to a one-dimensional offset with a dot product: (2,2,2,3) dot (120,30,6,1)

=> 2*120 + 2*30 + 2*6 + 3*1 = 315. The code is as follows:
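To make the rule concrete, here is a minimal sketch that builds the transformation vector from a shape by walking the dimensions from last to first. compute_strides is a name chosen for this example and does not appear in ultra_array.

```c
#include <stddef.h>

/* Fill strides[] so that strides[i] is the product of shape[i+1..axis_n-1].
   The entry for the last dimension is 1. compute_strides is a hypothetical
   name used only for this illustration. */
void compute_strides(const size_t shape[], int axis_n, size_t strides[]) {
    size_t running = 1;                 /* product of the dimensions seen so far */
    for (int i = axis_n - 1; i >= 0; --i) {
        strides[i] = running;
        running *= shape[i];
    }
}
```

Called with the shape {3, 4, 5, 6}, it fills strides with {120, 30, 6, 1}. Walking backward keeps a running product, so each dimension is multiplied only once.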

static size_t
__xd_coord_to_1d_offset(size_t coord[], size_t axes[], int axis_n) {
    size_t offset = 0, axis_mulitply;
    for (int i = 0; i < axis_n; ++i) {
        size_t co = coord[i];
        /* product of the dimensions after axis i (1 for the last axis) */
        axis_mulitply = __axis_mulitply(axes, axis_n, i + 1);
        offset += co * axis_mulitply;
    }
    return offset;
}
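The helper __axis_mulitply is not shown in this article. Judging from how it is called, it presumably returns the product of the dimensions from a given index to the end. The sketch below is a self-contained version written under that assumption; axis_product and coord_to_offset are stand-in names, not the project's own.

```c
#include <stddef.h>

/* Product of axes[from] .. axes[axis_n - 1]; an empty range gives 1.
   axis_product stands in for the article's __axis_mulitply, whose real
   body is not shown, so this behavior is an assumption. */
size_t axis_product(const size_t axes[], int axis_n, int from) {
    size_t product = 1;
    for (int i = from; i < axis_n; ++i) {
        product *= axes[i];
    }
    return product;
}

/* Row-major offset of coord[] inside an array with dimensions axes[]. */
size_t coord_to_offset(const size_t coord[], const size_t axes[], int axis_n) {
    size_t offset = 0;
    for (int i = 0; i < axis_n; ++i) {
        /* each coordinate is weighted by the size of everything after its axis */
        offset += coord[i] * axis_product(axes, axis_n, i + 1);
    }
    return offset;
}
```

With axes {3, 4, 5, 6} and coord {2, 2, 2, 3}, coord_to_offset returns 315, matching the hand calculation.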

So how do we turn a one-dimensional offset back into N-dimensional coordinates? Start from the first dimension and divide the offset by the product of all the dimensions after it. The quotient is the coordinate in that dimension, and the remainder is carried to the next dimension, where it is divided by the product of the dimensions after that one, and so on until the last dimension. For the offset 315 computed above:

315 / (4*5*6) = 2 remainder 75
 75 / (5*6)   = 2 remainder 15
 15 / 6       = 2 remainder 3
  3 / 1       = 3

So the coordinates are [2,2,2,3]. The code is:

static void
__1d_offset_to_xd_coord(size_t offset, size_t axes[], int axis_n, size_t coord[])
{
    size_t axis_mulitply, middle_value = offset;
    int i;
    for (i = 0; i < axis_n - 1; ++i) {
        axis_mulitply = __axis_mulitply(axes, axis_n, i + 1);
        coord[i] = middle_value / axis_mulitply;      /* quotient: coordinate on axis i */
        middle_value = middle_value % axis_mulitply;  /* remainder: carried to the next axis */
    }
    /* What is left is the coordinate on the last axis. Using middle_value
       here also covers the 1-D case, where the loop body never runs. */
    coord[i] = middle_value;
}
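An equivalent way to recover the coordinates, and not the one used in ultra_array, is to work from the last dimension backward with a modulo and a division per axis. The sketch below takes that route; offset_to_coord is a hypothetical name, and every dimension is assumed to be nonzero.

```c
#include <stddef.h>

/* Convert a row-major offset into coordinates, last axis first.
   Assumes every axes[i] is nonzero. offset_to_coord is an illustrative
   name, not part of the article's code. */
void offset_to_coord(size_t offset, const size_t axes[], int axis_n, size_t coord[]) {
    for (int i = axis_n - 1; i >= 0; --i) {
        coord[i] = offset % axes[i];   /* position along axis i */
        offset /= axes[i];             /* what remains belongs to the earlier axes */
    }
}
```

For the offset 315 in a 3*4*5*6 array this yields {2, 2, 2, 3}, the same result as the division chain above, without recomputing a product of dimensions at every step.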

Code implementation

Initialization

/**
 * axis_n: number of dimensions, for example 3
 * shape:  size of each dimension, for example [3, 3, 3]
 */
u_array_t UArray_create(int axis_n, size_t shape[])
{
    if (axis_n >= 0) {
        u_array_t n_array;
        n_array.axis_n = axis_n;
        /* both blocks belong to the new object, so assign through n_array */
        n_array.start[0] = __alloc_shape(axis_n, shape);
        n_array.start[1] = __alloc_data(__axis_mulitply(shape, axis_n, 0));

        return n_array;
    }
    return ua_unable;
}
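The allocation helpers __alloc_shape and __alloc_data are not shown here. As a rough idea of what creating and releasing such an object involves, here is a self-contained sketch. Everything in it (sketch_array_t, sketch_create, sketch_destroy, and the choice of malloc and calloc) is my own assumption for illustration, not the project's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as the article's struct, under a hypothetical name. */
typedef struct {
    char *start[2];   /* start[0]: shape as size_t[axis_n]; start[1]: float data */
    int axis_n;       /* number of dimensions; 0 marks a failed creation here */
} sketch_array_t;

/* Copy the shape and allocate a zero-filled data block.
   On bad input or allocation failure, returns an object with NULL pointers. */
sketch_array_t sketch_create(int axis_n, const size_t shape[]) {
    sketch_array_t a = { { NULL, NULL }, 0 };
    if (axis_n <= 0) {
        return a;
    }
    size_t count = 1;
    for (int i = 0; i < axis_n; ++i) {
        count *= shape[i];
    }
    size_t *dims = malloc((size_t)axis_n * sizeof *dims);
    float *data = calloc(count, sizeof *data);
    if (dims == NULL || data == NULL) {
        free(dims);   /* free(NULL) is a no-op, so this is safe either way */
        free(data);
        return a;
    }
    memcpy(dims, shape, (size_t)axis_n * sizeof *dims);
    a.start[0] = (char *)dims;
    a.start[1] = (char *)data;
    a.axis_n = axis_n;
    return a;
}

/* Release both blocks and reset the object. */
void sketch_destroy(sketch_array_t *a) {
    free(a->start[0]);
    free(a->start[1]);
    a->start[0] = NULL;
    a->start[1] = NULL;
    a->axis_n = 0;
}
```

Keeping the shape in its own heap block, as the article's design does, means the struct stays the same size no matter how many dimensions the array has.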

Load data

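u_array_t* UArray_load(u_array_t* arr, vfloat_t data[])
{
    /* memcpy takes a byte count, so UA_size is assumed here to return the
       size of the data block in bytes rather than the number of elements */
    size_t size_arr = UA_size(arr);
    vfloat_t* ptr = UA_data_ptr(arr);
    memcpy(ptr, data, size_arr);
    return arr;
}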

Access data

float UArray_get(u_array_t* arr, ...)
{
    va_list valist;
    va_start(valist, arr);
    size_t coord[UA_axisn(arr)];
    for (int i = 0; i < UA_axisn(arr); ++i) {
        /* callers pass plain int indices, so read an int and widen it */
        coord[i] = (size_t)va_arg(valist, int);
    }
    va_end(valist);
    size_t offset = UA_cover_coordinate(arr, coord);
    return ((float*)(UA_data_ptr(arr)))[offset];
}

void UArray_set(u_array_t* arr, ...)
{
    va_list valist;
    va_start(valist, arr);
    size_t coord[UA_axisn(arr)];
    vfloat_t value;
    for (int i = 0; i < UA_axisn(arr); ++i) {
        /* callers pass plain int indices, so read an int and widen it */
        coord[i] = (size_t)va_arg(valist, int);
    }
    /* a floating-point variadic argument always arrives as a double */
    value = va_arg(valist, double);
    va_end(valist);
    size_t offset = UA_cover_coordinate(arr, coord);
    ((float*)(UA_data_ptr(arr)))[offset] = value;
    return;
}
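The variadic style deserves a small standalone illustration, because the type given to va_arg has to match what the caller actually passes: integer literals arrive as int, and a float or double value arrives as double. The sketch below assumes indices are passed as int; offset_from_indices and set_at are hypothetical names, and the offset is accumulated in nested (Horner) form rather than with a separate product per axis.

```c
#include <stdarg.h>
#include <stddef.h>

/* Row-major offset from axis_n indices passed as variadic int arguments. */
size_t offset_from_indices(const size_t shape[], int axis_n, ...) {
    va_list ap;
    size_t offset = 0;
    va_start(ap, axis_n);
    for (int i = 0; i < axis_n; ++i) {
        int idx = va_arg(ap, int);                 /* int literals arrive as int */
        offset = offset * shape[i] + (size_t)idx;  /* nested form of the dot product */
    }
    va_end(ap);
    return offset;
}

/* Store one value: axis_n int indices followed by the value itself. */
void set_at(float data[], const size_t shape[], int axis_n, ...) {
    va_list ap;
    size_t offset = 0;
    va_start(ap, axis_n);
    for (int i = 0; i < axis_n; ++i) {
        int idx = va_arg(ap, int);
        offset = offset * shape[i] + (size_t)idx;
    }
    double value = va_arg(ap, double);             /* float arguments are promoted to double */
    va_end(ap);
    data[offset] = (float)value;
}
```

With the shape {2, 3, 4}, offset_from_indices(shape, 3, 1, 2, 3) is 23, the last element of the block, and set_at(data, shape, 3, 1, 2, 3, 5.5) writes 5.5 into data[23].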

Test

int main()
{
    // Define a 3-dimensional ultra_array with shape 2*3*4
    u_array_t arr3 = UArray3d(2, 3, 4);
    // Fill it with the numbers 0 to 23
    UA_arange(&arr3, 2*3*4);
    // Read one element
    float v = UA_get(&arr3, 1, 2, 3);
    // v == 23
    UA_set(&arr3, 1, 2, 3, 5.5);
    v = UA_get(&arr3, 1, 2, 3);
    // v == 5.5
    return 0;
}

That completes a simple C implementation of a multidimensional array. The code above comes from:

https://github.com/zuweie/boring-code/tree/main/src/ultra_array

The end!
