CANN operators: using iterators to efficiently slice and block Tensor data

Abstract: Taking the Diagonal operator as an example, this article explains in detail how to use an iterator to read large amounts of data from an n-dimensional Tensor by position coordinates.

This article is shared from the Huawei Cloud community post "CANN operators: using iterators to slice and block Tensor data", by CatherineWang.

Task scenario and objectives

When developing CANN AICPU operators, you often need to slice, dice, transpose, or shuffle (exchange the data of specified dimensions of) an n-dimensional Tensor. All of these operations essentially read data in an order given by some rule and write the data that was read to a new data address.

Taking the Diagonal operator as an example, this article explains in detail how to use an iterator to read large amounts of data from an n-dimensional Tensor by position coordinates.

The Diagonal operator extracts the diagonal elements from the data spanned by two specified dimensions and returns them as a tensor. In essence, the attributes dim1 and dim2 select a matrix; the operator returns the diagonal elements of that matrix (shifted by the attribute offset) and places them in the last dimension of the output. All dimensions other than dim1 and dim2 are treated as batch dimensions.
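
To make the roles of dim1, dim2 and offset concrete, the sketch below computes the diagonal length and the output shape. It assumes the convention used by PyTorch's torch.diagonal (offset > 0 selects a diagonal above the main one, offset < 0 one below); the article does not spell out the operator's convention. DiagonalLength and DiagonalOutputShape are hypothetical helper names, not CANN functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements on the diagonal of the matrix
// spanned by dim1 (rows) and dim2 (columns), shifted by `offset`.
// Assumed convention: offset > 0 is above the main diagonal, offset < 0 below.
int64_t DiagonalLength(const std::vector<int64_t> &shape, int64_t dim1,
                       int64_t dim2, int64_t offset) {
  const int64_t n = static_cast<int64_t>(shape.size());
  assert(dim1 >= 0 && dim1 < n && dim2 >= 0 && dim2 < n && dim1 != dim2);
  const int64_t rows = shape[dim1];
  const int64_t cols = shape[dim2];
  const int64_t len = offset >= 0 ? std::min(rows, cols - offset)
                                  : std::min(rows + offset, cols);
  return std::max<int64_t>(len, 0);  // an offset past the matrix edge gives 0
}

// Hypothetical helper: the batch dimensions keep their original order and the
// diagonal becomes the last dimension of the output.
std::vector<int64_t> DiagonalOutputShape(const std::vector<int64_t> &shape,
                                         int64_t dim1, int64_t dim2,
                                         int64_t offset) {
  std::vector<int64_t> out;
  for (int64_t d = 0; d < static_cast<int64_t>(shape.size()); ++d) {
    if (d != dim1 && d != dim2) {
      out.push_back(shape[d]);
    }
  }
  out.push_back(DiagonalLength(shape, dim1, dim2, offset));
  return out;
}
```

For an input of shape (2, 3, 4) with dim1=1 and dim2=2, each of the 2 batch entries is a 3x4 matrix, so under this convention the output shape is (2, 3) for offset 0 and (2, 2) for offset 2.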

Conventional approaches:

Approach 1: convert the input Tensor x, with shape s and numel elements, into an Eigen::Tensor eigen_x; apply a shuffle to eigen_x so that dim1 and dim2 become the second-to-last and last dimensions; reshape eigen_x into a three-dimensional Eigen::Tensor reshape_x with shape=(numel/s[dim1]/s[dim2], s[dim1], s[dim2]); take the diagonal elements of the last two dimensions and write the result to the output data address. Note: the number of dimensions of Eigen::Tensor<typename T, int NumIndices_> cannot be set dynamically, that is, NumIndices_ must be a fixed value, so an Eigen::Tensor has to be defined in advance for each number of dimensions that may occur.

Approach 2: for an n-dimensional Tensor, use n nested for loops to locate and read the data, and take the diagonal values.

Both approaches are cumbersome when the input size is dynamic: the Eigen::Tensor of the matching dimensionality, or the nested for-loop structure, has to be written in advance for each case. In other words, both are constrained in the number of dimensions they support.

Background knowledge and analysis

In AICPU, functions such as GetTensorShape and GetData give us a Tensor's shape and size, the address of its data, and similar information. They do not, however, let us fetch the value at a given position directly from its position coordinates.

1. Stride

First, the concept of stride (skip to the next part if you already know it). The stride of a dimension dim is the number of elements that must be skipped to move from one element to the next along that dimension. For example, a Tensor with shape=(2, 3, 4, 5) has stride=(60, 20, 5, 1). To read the data at position coordinates [1, 2, 1, 3] in this Tensor, take the value at element offset 108 (=60*1+20*2+5*1+1*3) from the data address.
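
The stride arithmetic can be sketched in a few lines. MakeStrides and FlatOffset are hypothetical names; they play the role that ConstructStride and MulSum play in the operator code later in this article, whose definitions the article does not show. The sketch assumes a contiguous row-major layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major (contiguous) strides: stride[i] is the product of shape[i+1..n-1].
std::vector<int64_t> MakeStrides(const std::vector<int64_t> &shape) {
  std::vector<int64_t> stride(shape.size(), 1);
  for (std::size_t i = shape.size(); i-- > 1;) {
    stride[i - 1] = stride[i] * shape[i];
  }
  return stride;
}

// Element offset of a position: the sum of coordinate * stride over all dims.
int64_t FlatOffset(const std::vector<int64_t> &pos,
                   const std::vector<int64_t> &stride) {
  assert(pos.size() == stride.size());
  int64_t offset = 0;
  for (std::size_t i = 0; i < pos.size(); ++i) {
    offset += pos[i] * stride[i];
  }
  return offset;
}
```

With shape (2, 3, 4, 5), MakeStrides yields (60, 20, 5, 1), and FlatOffset of position [1, 2, 1, 3] is 108, matching the example above.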

2. Iterator

Define an iterator PositionIterator with private members pos_ and shape_, where pos_ holds the current position (initialized to the start position) and shape_ is the reference shape. Overloading the ++ operator updates pos_ and so implements the iterator's increment. With this iterator, the positions of a given shape can be visited in order. For a given shape=(d_1,d_2,…,d_n), starting from the initial position (0,0,…,0), it visits (0,0,…,0,0), (0,0,…,0,1), …, (0,0,…,0,d_n-1), (0,0,…,1,0), (0,0,…,1,1), …, (d_1-1,d_2-1,…,d_{n-1}-1,d_n-1) in turn.

In effect, the iterator is a counter in a mixed-radix number system: for the reference shape shape_=(d_1,d_2,…,d_n), the i-th digit carries 1 into the next-higher digit whenever it reaches d_i. The End() member of PositionIterator signals the end of the iteration. The implementation is as follows:

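#include <vector>

template <typename T>
class PositionIterator {
 public:
  PositionIterator() {}
  ~PositionIterator() {}
  // stt is the start position, sh the shape to iterate over. On invalid input
  // (size mismatch, or a start coordinate outside the shape) both members stay
  // empty and End() returns true immediately.
  PositionIterator(std::vector<T> stt, std::vector<T> sh) {
    if (stt.size() != sh.size()) {
      return;
    }
    for (unsigned int i = 0; i < sh.size(); i++) {
      if (stt[i] >= sh[i]) {
        return;
      }
    }
    pos_ = stt;
    shape_ = sh;
  }
  // Advance by one position: add 1 to the last coordinate, then propagate
  // carries toward the first coordinate.
  PositionIterator &operator++() {
    if (pos_.empty()) {
      return *this;
    }
    pos_[shape_.size() - 1] += 1;
    for (unsigned int i = shape_.size() - 1; i > 0; i--) {
      if (pos_[i] / shape_[i] != 0) {
        pos_[i - 1] += pos_[i] / shape_[i];
        pos_[i] = pos_[i] % shape_[i];
      }
    }
    return *this;
  }
  // The first coordinate is never wrapped, so it reaches shape_[0] exactly
  // when every position has been visited.
  bool End() const { return pos_.empty() || pos_[0] >= shape_[0]; }
  std::vector<T> GetPos() const { return pos_; }
  std::vector<T> GetShape() const { return shape_; }

 private:
  std::vector<T> pos_;
  std::vector<T> shape_;
};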
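
The traversal order can be checked with a small self-contained sketch. So that it compiles on its own, it declares a condensed, non-template stand-in for the iterator (CoordIterator, a hypothetical name) that always starts at (0,…,0), and a hypothetical helper AllPositions that collects every position visited.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Condensed stand-in for PositionIterator: walks all coordinates of `shape`
// in row-major order, carrying into the next-higher dimension like an odometer.
class CoordIterator {
 public:
  explicit CoordIterator(std::vector<int64_t> shape)
      : pos_(shape.size(), 0), shape_(std::move(shape)) {
    for (int64_t d : shape_) {
      assert(d > 0);  // this sketch only handles non-empty dimensions
    }
  }
  CoordIterator &operator++() {
    for (std::size_t i = shape_.size(); i-- > 0;) {
      // Stop at the first coordinate that does not overflow. Coordinate 0 is
      // deliberately left un-wrapped so that End() can detect completion.
      if (++pos_[i] < shape_[i] || i == 0) {
        break;
      }
      pos_[i] = 0;
    }
    return *this;
  }
  bool End() const { return pos_.empty() || pos_[0] >= shape_[0]; }
  const std::vector<int64_t> &GetPos() const { return pos_; }

 private:
  std::vector<int64_t> pos_;
  std::vector<int64_t> shape_;
};

// Collects every position of `shape` in the order the iterator visits them.
std::vector<std::vector<int64_t>> AllPositions(
    const std::vector<int64_t> &shape) {
  std::vector<std::vector<int64_t>> out;
  for (CoordIterator it(shape); !it.End(); ++it) {
    out.push_back(it.GetPos());
  }
  return out;
}
```

For shape (2, 3) this yields (0,0), (0,1), (0,2), (1,0), (1,1), (1,2). An empty shape yields no positions at all, which is why a caller with no batch dimensions has to handle that case separately.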

Implementation of the Diagonal operator

With the iterator, two nested for loops are generally enough to implement the Diagonal operator. The outer loop determines the position coordinates in all dimensions other than dim1 and dim2; the inner loop determines the positions of the diagonal elements within the dim1 and dim2 dimensions. Together, the two loops locate every diagonal element. According to the ST test comparison in the original post, reading values this way is significantly faster than the Eigen-based approach, and it places no limit on the number of dimensions.

The specific implementation is shown in the following code:

// N2 (presumably the constant 2), OffsetSize, ConstructStride, MulSum and
// get_data are helpers from the operator's source file; this article does not
// show them.
template <typename T>
uint32_t DiagonalCpuKernel::DoComputeType(CpuKernelContext &ctx,
                                          const int64_t &offset,
                                          const int64_t &dim1,
                                          const int64_t &dim2) {
  // Get the input and output
  Tensor *input_x = ctx.Input(0);
  Tensor *y = ctx.Output(0);
  // Get the shape, number of dimensions and data addresses
  auto x_shape = input_x->GetTensorShape();
  std::vector<int64_t> x_shape_ = x_shape->GetDimSizes();
  const int64_t x_dim = x_shape->GetDims();
  auto dataptr = reinterpret_cast<T *>(input_x->GetData());
  auto y_dataptr = reinterpret_cast<T *>(y->GetData());
  // Compute
  // First, calculate the number of diagonal elements
  int64_t dsize = OffsetSize(offset, dim1, dim2, x_shape_);
  // Build the stride vector x_stride of the input Tensor
  std::vector<int64_t> x_stride = ConstructStride<int64_t>(x_shape_);
  // Two cases: more than 2 dimensions, or exactly 2 dimensions
  if (x_dim != N2) {
    // Build vx_shape and vx_stride: x_shape_ and x_stride with the entries
    // for dim1 and dim2 removed
    std::vector<int64_t> vx_shape, vx_stride;
    for (unsigned int tmp_dim = 0; tmp_dim < x_shape_.size(); tmp_dim++) {
      if (tmp_dim != dim1 && tmp_dim != dim2) {
        vx_shape.push_back(x_shape_[tmp_dim]);
        vx_stride.push_back(x_stride[tmp_dim]);
      }
    }
    // Build the shape and stride vectors of the output Tensor: y_shape, y_stride
    std::vector<int64_t> y_shape = vx_shape;
    y_shape.push_back(dsize);
    std::vector<int64_t> y_stride = ConstructStride<int64_t>(y_shape);
    // vy_stride: the output stride vector without its last dimension
    std::vector<int64_t> vy_stride = y_stride;
    vy_stride.pop_back();
    // Read the diagonal data
    std::vector<int64_t> v_start(vx_shape.size(), 0);
    for (PositionIterator<int64_t> myiter(v_start, vx_shape); !myiter.End();
         ++myiter) {
      // The iterator gives the position coordinates in all dimensions other
      // than dim1 and dim2
      auto p = myiter.GetPos();
      // Base offsets into the input and output, computed from the stride
      // vectors and the position coordinates
      int64_t base_pos1 = MulSum<int64_t>(p, vx_stride);
      int64_t outbase_pos = MulSum<int64_t>(p, vy_stride);
      for (int64_t i = 0; i < dsize; i++) {
        // Starting from the base offsets above, locate the i-th diagonal
        // element within the dim1 and dim2 dimensions and write it to the
        // output. (get_data handles whether the element comes from an upper
        // or a lower diagonal; that detail is not needed to understand the
        // role of the iterator.)
        int64_t base_pos2 = i * (x_stride[dim1] + x_stride[dim2]);
        int64_t arr[N2] = {x_stride[dim1], x_stride[dim2]};
        y_dataptr[outbase_pos + i] =
            get_data(base_pos1 + base_pos2, offset, arr, dataptr);
      }
    }
  } else {
    // 2-D input: there are no batch dimensions, so no iterator is needed
    for (int64_t i = 0; i < dsize; i++) {
      int64_t base_pos = i * (x_stride[dim1] + x_stride[dim2]);
      int64_t arr[N2] = {x_stride[dim1], x_stride[dim2]};
      y_dataptr[i] = get_data(base_pos, offset, arr, dataptr);
    }
  }
  return KERNEL_STATUS_OK;
}
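
To try the two-loop idea outside the CANN framework, the sketch below applies it to a plain row-major buffer. DiagonalRef is a hypothetical reference function, not the operator's code: it inlines the stride construction and the odometer-style position update, and it assumes that offset > 0 shifts the start along dim2 and offset < 0 shifts it along dim1, which is a guess at what the article's get_data helper does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference version of Diagonal on a contiguous row-major buffer.
std::vector<float> DiagonalRef(const std::vector<float> &x,
                               const std::vector<int64_t> &shape,
                               int64_t dim1, int64_t dim2, int64_t offset) {
  const int64_t n = static_cast<int64_t>(shape.size());
  assert(n >= 2 && dim1 != dim2);
  assert(dim1 >= 0 && dim1 < n && dim2 >= 0 && dim2 < n);
  // Row-major strides and total element count.
  std::vector<int64_t> stride(shape.size(), 1);
  for (std::size_t i = shape.size(); i-- > 1;) {
    stride[i - 1] = stride[i] * shape[i];
  }
  int64_t numel = 1;
  for (int64_t d : shape) {
    assert(d > 0);
    numel *= d;
  }
  assert(static_cast<int64_t>(x.size()) == numel);
  // Shape and strides of the batch dimensions (everything but dim1, dim2).
  std::vector<int64_t> b_shape, b_stride;
  for (int64_t d = 0; d < n; ++d) {
    if (d != dim1 && d != dim2) {
      b_shape.push_back(shape[d]);
      b_stride.push_back(stride[d]);
    }
  }
  // Diagonal length, offset of its first element, and step between elements.
  const int64_t rows = shape[dim1];
  const int64_t cols = shape[dim2];
  const int64_t len = std::max<int64_t>(
      0, offset >= 0 ? std::min(rows, cols - offset)
                     : std::min(rows + offset, cols));
  const int64_t start =
      offset >= 0 ? offset * stride[dim2] : -offset * stride[dim1];
  const int64_t step = stride[dim1] + stride[dim2];

  std::vector<float> y;
  std::vector<int64_t> pos(b_shape.size(), 0);
  bool done = (len == 0);
  while (!done) {
    // Outer loop body: base offset of the current batch position.
    int64_t base = start;
    for (std::size_t i = 0; i < pos.size(); ++i) {
      base += pos[i] * b_stride[i];
    }
    // Inner loop: walk along the diagonal.
    for (int64_t i = 0; i < len; ++i) {
      y.push_back(x[base + i * step]);
    }
    // Odometer-style advance to the next batch position.
    done = true;
    for (std::size_t i = pos.size(); i-- > 0;) {
      if (++pos[i] < b_shape[i]) {
        done = false;
        break;
      }
      pos[i] = 0;
    }
  }
  return y;
}
```

Because the position update runs after the loop body, a 2-D input (no batch dimensions) executes the body exactly once, so this sketch does not need a separate 2-D branch. The result is laid out with the batch dimensions first and the diagonal last, matching y_shape in the operator code.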

Other uses of the iterator

1. Data slicing: for example, in the Sort operator, the iterator is used to take the Tensor data along the tmp_axis dimension for the subsequent sorting.

for (position_iterator<int64_t> mit(v_start, v_shape); !mit.end(); ++mit) {
  auto p = mit.get_pos();
  int axis_len = input_shape_[tmp_axis];
  std::vector<ValueIndex<T>> data_(axis_len);
  int base_pos = mul_sum<int64_t>(p, v_stride);
  // Gather the elements along tmp_axis together with their original indices
  for (int32_t i = 0; i < axis_len; i++) {
    data_[i].value = x_dataptr[base_pos + i * input_stride[tmp_axis]];
    data_[i].index = i;
  }
  // ... sort data_ and write the results (not shown in the original excerpt)
}

2. Data blocking: blocks can be traversed with two iterators, or with one iterator plus a for loop over two coordinate positions.
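
The article gives no code for blocking, so the following is only a sketch of the two-iterator variant under stated assumptions: the tensor is contiguous and row-major, and every dimension is an exact multiple of the block size. Advance and SplitIntoBlocks are hypothetical names; Advance is a free-function form of the iterator's increment. The outer traversal walks the grid of blocks, and the inner traversal walks the coordinates inside one block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Advances `pos` to the next coordinate of `shape` in row-major order.
// Returns false once every coordinate has been visited.
bool Advance(std::vector<int64_t> &pos, const std::vector<int64_t> &shape) {
  for (std::size_t i = pos.size(); i-- > 0;) {
    if (++pos[i] < shape[i]) {
      return true;
    }
    pos[i] = 0;
  }
  return false;
}

// Splits a row-major tensor into equally sized blocks and returns the
// elements of each block, blocks and elements both in row-major order.
std::vector<std::vector<int>> SplitIntoBlocks(
    const std::vector<int> &x, const std::vector<int64_t> &shape,
    const std::vector<int64_t> &block) {
  const std::size_t n = shape.size();
  assert(n > 0 && block.size() == n);
  std::vector<int64_t> stride(n, 1), grid(n);
  for (std::size_t i = n; i-- > 1;) {
    stride[i - 1] = stride[i] * shape[i];
  }
  for (std::size_t i = 0; i < n; ++i) {
    assert(shape[i] > 0 && block[i] > 0 && shape[i] % block[i] == 0);
    grid[i] = shape[i] / block[i];  // number of blocks along dimension i
  }
  std::vector<std::vector<int>> blocks;
  std::vector<int64_t> outer(n, 0);  // which block
  do {
    // Offset of the first element of the current block.
    int64_t base = 0;
    for (std::size_t i = 0; i < n; ++i) {
      base += outer[i] * block[i] * stride[i];
    }
    std::vector<int> cur;
    std::vector<int64_t> inner(n, 0);  // position inside the block
    do {
      int64_t off = base;
      for (std::size_t i = 0; i < n; ++i) {
        off += inner[i] * stride[i];
      }
      cur.push_back(x[off]);
    } while (Advance(inner, block));
    blocks.push_back(cur);
  } while (Advance(outer, grid));
  return blocks;
}
```

For a 4x4 matrix holding 0 to 15 and 2x2 blocks, the first block is (0, 1, 4, 5) and the last is (10, 11, 14, 15).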

3. Splitting a Tensor into N sub-Tensors along a specified dimension dim: for example, in the UniqueConsecutive operator, the data of the original Tensor is first split along the dimension given by the axis attribute into input_shape[axis] sub-Tensors (a vector is used here to store each sub-Tensor's data).

std::vector<std::vector<T1>> data_;
for (int64_t i = 0; i < dim0; i++) {
  std::vector<T1> tmp_v1;
  for (PositionIterator<int64_t> mit(v_start, v_shape); !mit.End(); ++mit) {
    auto pos = mit.GetPos();
    tmp_v1.push_back(
        x_dataptr[MulSum<int64_t>(pos, v_stride) + i * input_stride[axis]]);
  }
  data_.push_back(tmp_v1);
}
