
Discussion on MLIR Technology

2022-06-09 02:31:00 wujianming_

MLIR technical talks
MLIR: redefining compiler infrastructure
MLIR (Multi-Level Intermediate Representation) is an intermediate representation (IR) system that sits between a language (such as C) or library (such as TensorFlow) and a compiler backend (such as LLVM). It allows code reuse between compiler stacks for different languages, along with other performance and usability benefits.
MLIR was developed by Google as an open-source project, mainly to improve TensorFlow's support for different backends, but in general it can be used with any language.
Buddy-MLIR in detail
0x0. Preface
My biggest impression of the whole Buddy-MLIR project is that, whatever the end result, you can actually get it to run. Although MLIR has been around for several years and there are star projects such as IREE built on it, it still lags a bit behind TVM in the richness of its use cases, especially in the Chinese community. This creates a problem: if you are interested in MLIR or do development work based on it, you have to grind through MLIR's official documentation and the Toy tutorial to get up to speed. There is no denying that the official documentation is detailed and well structured, but for a complete newcomer it is really not that friendly. Is there a way to provide components for building a real application, so that once you understand the basic concepts you can quickly enter the MLIR world?
References
https://mp.weixin.qq.com/s/AM1hTcQsgbwG3hCzK6P_gQ
https://mp.weixin.qq.com/s/uE5VhU_s3NgndPk2X6zbAA
https://github.com/buddy-compiler/buddy-mlir

Personally, I think the recently launched Buddy-MLIR relieves this pain point: you can easily run an application built on MLIR while learning how to use MLIR to build your own. Another highlight of the Buddy-MLIR project is that its organizational structure is as clear as that of the LLVM/MLIR project itself, which lowers the difficulty of grasping the whole project and reading the related code. Below, I will analyze it from two angles: how to run it, and its project structure. In fact, the IR part of the OneFlow repository has exactly the same organizational structure; but because OneFlow's computation graph and IR interact with each other, the IR part was not split out into its own repository, otherwise you would see that Buddy-MLIR and OneFlow-MLIR have identical project structures.
0x1. How to run?
How to run? This should be one of the most important questions when picking up a project. Following Buddy-MLIR's README is enough, but there are still some details to watch out for in practice. For the sake of novice users, here is a record of the complete process of building and running Buddy-MLIR on Ubuntu 20.04.
The Buddy-MLIR project is an extension of the LLVM/MLIR project; in other words, LLVM is a dependency of Buddy-MLIR, so you need to build this dependency first. The steps are as follows:
$ git clone git@github.com:buddy-compiler/buddy-mlir.git
$ cd buddy-mlir
$ git submodule update --init

$ cd buddy-mlir
$ mkdir llvm/build
$ cd llvm/build
$ cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS="mlir" \
    -DLLVM_TARGETS_TO_BUILD="host;RISCV" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=RELEASE
$ ninja
$ ninja check-mlir
The commands above build the LLVM project, with the build artifacts stored in the llvm/build directory. We can then build Buddy-MLIR itself in its project directory, against the libraries provided by that LLVM build. Buddy-MLIR is built as follows:
$ cd buddy-mlir
$ mkdir build
$ cd build
$ cmake -G Ninja .. \
    -DMLIR_DIR=$PWD/../llvm/build/lib/cmake/mlir \
    -DLLVM_DIR=$PWD/../llvm/build/lib/cmake/llvm \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=RELEASE
$ ninja check-buddy
If the following output appears after the build, i.e. FileCheck succeeds, the Buddy-MLIR build was successful:
Testing Time: 0.06s
Passed: 3
Buddy-MLIR currently open-sources three dialects: Bud Dialect, DIP Dialect, and RVV Dialect. I have not yet understood the project's introduction to RVV Dialect, so this article only covers Bud Dialect and DIP Dialect. Among them, DIP Dialect is an abstraction for digital image processing. Because Buddy-MLIR's C/C++ front end relies on OpenCV to encode and decode images, Buddy-MLIR introduces OpenCV as a third-party library. If you have not built OpenCV yet, you can build it with the following commands:
$ sudo apt-get install libgtk2.0-dev pkg-config libcanberra-gtk-module
$ git clone https://github.com/opencv/opencv.git
$ cd opencv && mkdir build && cd build
$ cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
$ make -j$(nproc)
$ sudo make install
Here /usr/local can be changed to any custom directory. Later, when building the DIP Dialect applications, we will need to pass -DBUDDY_ENABLE_OPENCV=ON to enable OpenCV.
Let's take a look at the interesting examples Buddy-MLIR provides.

  1. IR-level examples
    The IR-level examples show how to use passes in MLIR and Buddy-MLIR; some of them come from MLIR's integration tests. In most cases they can be run directly with the MLIR JIT engine, mlir-cpu-runner. The lowering pipelines and toolchains are specified in makefile targets. You can pick a dialect you are interested in and find the targets to run in the corresponding directory. All of Buddy-MLIR's examples are in the directory https://github.com/buddy-compiler/buddy-mlir/tree/main/examples:

Buddy-MLIR Example classification
Open the Makefile of any dialect example and you will find three main kinds of targets:
• *-lower: shows the lowering pipeline; produces a log.mlir file.
• *-translate: shows the LLVM IR generated from the current dialect file; produces a log.ll file.
• *-run: executes the LLVM IR with the MLIR JIT engine and produces results.
Taking the memref.dim op in MemRef Dialect as an example, the way to build and test is:
$ cd buddy-mlir/examples/MLIRMemRef
$ make memref-dim-lower
$ make memref-dim-translate
$ make memref-dim-run
The original memref.dim example looks like this:
func.func @main() {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %mem0 = memref.alloc() : memref<2x3xf32>
  %mem1 = memref.cast %mem0 : memref<2x3xf32> to memref<?x?xf32>
  %dim0 = memref.dim %mem0, %c0 : memref<2x3xf32>
  %dim1 = memref.dim %mem0, %c1 : memref<2x3xf32>
  %dim2 = memref.dim %mem1, %c0 : memref<?x?xf32>
  %dim3 = memref.dim %mem1, %c1 : memref<?x?xf32>
  vector.print %dim0 : index
  vector.print %dim1 : index
  vector.print %dim2 : index
  vector.print %dim3 : index
  memref.dealloc %mem0 : memref<2x3xf32>
  func.return
}
Executing it with the JIT engine prints the four dimensions. Since memref.cast only changes the static type and the runtime sizes stay 2 and 3, the expected output is 2, 3, 2, 3.
2. Convolution Vectorization Examples
Buddy-MLIR provides a 2D convolution vectorization pass, conv-vectorization. This pass implements the coefficients broadcasting algorithm with strip mining, and the strip-mining size is configurable. Here it is configured as 256 for the demonstration:
$ cd buddy-mlir/build/bin
$ ./buddy-opt ../../examples/ConvOpt/conv2d.mlir -conv-vectorization="strip-mining=256"
The original conv2d.mlir looks like this:
func.func @conv_2d(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
  linalg.conv_2d ins (%arg0, %arg1: memref<?x?xf32>, memref<?x?xf32>)
                outs (%arg2: memref<?x?xf32>)
  return
}
Running the command above generates the following MLIR:
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 ceildiv 256)>
module {
  func.func @conv_2d(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c256 = arith.constant 256 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.splat %cst : vector<256xf32>
    %1 = memref.dim %arg1, %c0 : memref<?x?xf32>
    %2 = memref.dim %arg1, %c1 : memref<?x?xf32>
    %3 = memref.dim %arg2, %c0 : memref<?x?xf32>
    %4 = memref.dim %arg2, %c1 : memref<?x?xf32>
    affine.for %arg3 = #map0(%c0) to #map0(%3) {
      affine.for %arg4 = #map0(%c0) to #map0(%1) {
        affine.for %arg5 = #map0(%c0) to #map0(%2) {
          affine.for %arg6 = #map0(%c0) to #map1(%4) {
            // Corresponds to step 1 below.
            %5 = affine.vector_load %arg1[%arg4, %arg5] : memref<?x?xf32>, vector<1xf32>
            %6 = vector.broadcast %5 : vector<1xf32> to vector<256xf32>
            %7 = arith.muli %arg6, %c256 : index
            %8 = arith.subi %4, %7 : index
            %9 = arith.cmpi sge, %8, %c256 : index
            scf.if %9 {
              // Corresponds to step 2 below.
              %10 = affine.vector_load %arg0[%arg3 + %arg4, %arg5 + %arg6 * 256] : memref<?x?xf32>, vector<256xf32>
              // Corresponds to step 3 below.
              %11 = affine.vector_load %arg2[%arg3, %arg6 * 256] : memref<?x?xf32>, vector<256xf32>
              // Corresponds to step 4 below.
              %12 = vector.fma %10, %6, %11 : vector<256xf32>
              // Corresponds to step 5 below.
              affine.vector_store %12, %arg2[%arg3, %arg6 * 256] : memref<?x?xf32>, vector<256xf32>
            } else {
              %10 = vector.create_mask %8 : vector<256xi1>
              %11 = arith.addi %arg3, %arg4 : index
              %12 = arith.muli %arg6, %c256 : index
              %13 = arith.addi %arg5, %12 : index
              %14 = vector.maskedload %arg0[%11, %13], %10, %0 : memref<?x?xf32>, vector<256xi1>, vector<256xf32> into vector<256xf32>
              %15 = vector.maskedload %arg2[%arg3, %12], %10, %0 : memref<?x?xf32>, vector<256xi1>, vector<256xf32> into vector<256xf32>
              %16 = vector.fma %14, %6, %15 : vector<256xf32>
              vector.maskedstore %arg2[%arg3, %12], %10, %16 : memref<?x?xf32>, vector<256xi1>, vector<256xf32>
            }
          }
        }
      }
    }
    return
  }
}
This transformation can be confusing at first sight; it is best understood together with the algorithm the pass implements.
Coefficients broadcasting (CB) is an efficient implementation of 2D convolution, and Buddy-MLIR implements this algorithm on top of the MLIR infrastructure. The implementation involves the following MLIR dialects and ops:
• affine.for: executes a loop body a specified number of times.
• affine.vector_load: returns a vector from a buffer slice (MLIR MemRef format).
• affine.vector_store: writes a vector to a buffer slice (MLIR MemRef format).
• vector.broadcast: broadcasts a scalar or vector value into an N-dimensional result vector.
• vector.fma: vectorized fused multiply-add instruction.
The flow of the CB algorithm is shown in the figure below:

CB Algorithm flow
Note that this assumes an image or feature map with a single input channel, and a kernel that also has a single channel. The algorithm roughly executes as follows:
• First, each element of the kernel is loaded from the buffer with vector_load and broadcast into vector1 with vector.broadcast.
• Next, elements of the input feature map are loaded into vector2 with vector_load.
• Third, elements of the output feature map are loaded into vector3 with vector_load.
• Then vector.fma multiplies vector1 and vector2 and accumulates the result onto vector3.
• Finally, vector_store writes the result back to the buffer.
Note that the MLIR file produced by the conv-vectorization pass has two parts. The second part uses vector.create_mask and vector.maskedstore: at the end of each row of the feature map there may be fewer elements left than the 256 lanes the fma instruction expects (this 256 is the value specified by -conv-vectorization="strip-mining=256"), so a mask is needed to pad out the strip before computing.
• Edge detection example
Buddy-MLIR also provides an edge detection example to demonstrate the optimization. The conv-vectorization pass lowers linalg.conv_2d with the algorithm above; the mlir-translate and llc tools then generate an object file; finally, a C++ program calls this MLIR convolution function (the process is described in detail in Section 2). Before running this example, make sure OpenCV is installed; the installation steps were given above.
This example also shows the "magic" of the AutoConfig mechanism, which can specify the strip-mining size, the ISA SIMD/Vector extension, and the target triple for you. You only need to enable the BUDDY_EXAMPLES option and do not have to worry about toolchain configuration. The commands are as follows:
$ cd buddy-mlir/build
$ cmake -G Ninja .. -DBUDDY_EXAMPLES=ON -DBUDDY_ENABLE_OPENCV=ON
$ ninja edge-detection
Of course, you can also supply your own configuration values with -DBUDDY_CONV_OPT_STRIP_MINING (for example 64) and -DBUDDY_OPT_ATTR (for example avx2).
The repository provides a picture at buddy-mlir/examples/ConvOpt/images/YuTu.png; this is the robotic lunar rover that formed part of China's Chang'e 3 mission. Run the following commands to perform edge detection:
$ cd bin
$ ./edge-detection ../../examples/ConvOpt/images/YuTu.png result.png

Original picture

Image after edge detection
3. Digital Image Processing Examples
Buddy-MLIR also provides demonstration examples for DIP Dialect; concretely, an image is given constant padding or replicate padding and then convolved. The steps are similar to the above, so they are not shown here; interested readers can try them out themselves. Link: https://github.com/buddy-compiler/buddy-mlir/tree/main/examples#digital-image-processing-examples.
0x2. How to understand?
The section above mainly showed how to run the applications built into Buddy-MLIR; this section walks through the project from the perspective of its structure. The overall structure of the project can be summarized as:


Buddy-MLIR Engineering structure
Focus on the include and lib folders; readers can selectively look at the other documentation, test, and tool sources.
2.1 Bud Dialect
As can be seen from the figure above, Buddy-MLIR has three main dialects: Bud Dialect, DIP Dialect, and RVV Dialect. The dialect definitions follow the same file structure and approach as the upstream LLVM dialects, so I will not go into detail here. If you want to know more, see the interpretation of the paper "MLIR: A Compiler Infrastructure for the End of Moore's Law" in the https://github.com/BBuf/tvm_mlir_learn repository.
The main focus here is which operations Bud Dialect defines. From buddy-mlir/include/Dialect/Bud/BudOps.td we can see that Bud Dialect mainly defines four operations:
• Bud_TestConstantOp: an op for testing constants.
• Bud_TestPrintOp: an op for testing printing.
• Bud_TestEnumAttrOp: tests enum attributes on an op.
• Bud_TestArrayAttrOp: tests array attributes on an op.
After building the basic operations, a lowering pipeline needs to be registered for Bud Dialect, namely the LowerBudPass implemented in lib/Conversion/LowerBud/LowerBudPass.cpp.
The lowering for bud::TestConstantOp is implemented as follows:
class BudTestConstantLowering : public OpRewritePattern<bud::TestConstantOp> {
public:
  using OpRewritePattern<bud::TestConstantOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(bud::TestConstantOp op,
                                PatternRewriter &rewriter) const override {
    auto loc = op.getLoc();
    // Get type from the origin operation.
    Type resultType = op.getResult().getType();
    // Create constant operation.
    Attribute zeroAttr = rewriter.getZeroAttr(resultType);
    Value c0 = rewriter.create<mlir::arith::ConstantOp>(loc, resultType, zeroAttr);

    rewriter.replaceOp(op, c0);
    return success();
  }
};
You can see that when a bud::TestConstantOp is matched, it is rewritten into an mlir::arith::ConstantOp. You can run make bud-constant-lower in buddy-mlir/examples/BudDialect; the result is as follows:
module {
  %i0 = bud.test_constant : i32
}

=>

module {
  %c0_i32 = arith.constant 0 : i32
}
The other operations are similar: the several operations defined in Bud Dialect are all lowered onto a few designated upstream dialects. The concrete implementation of the LowerBudPass is:
namespace {
class LowerBudPass : public PassWrapper<LowerBudPass, OperationPass<ModuleOp>> {
public:
  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(LowerBudPass)
  LowerBudPass() = default;
  LowerBudPass(const LowerBudPass &) {}

  StringRef getArgument() const final { return "lower-bud"; }
  StringRef getDescription() const final { return "Lower Bud Dialect."; }

  void runOnOperation() override;

  void getDependentDialects(DialectRegistry &registry) const override {
    // clang-format off
    registry.insert<
        buddy::bud::BudDialect,
        func::FuncDialect,
        vector::VectorDialect,
        memref::MemRefDialect>();
    // clang-format on
  }
};
} // end anonymous namespace.

void LowerBudPass::runOnOperation() {
  MLIRContext *context = &getContext();
  ModuleOp module = getOperation();

  ConversionTarget target(*context);
  // clang-format off
  target.addLegalDialect<
      arith::ArithmeticDialect,
      func::FuncDialect,
      vector::VectorDialect,
      memref::MemRefDialect>();
  // clang-format on
  target.addLegalOp<ModuleOp, func::FuncOp, func::ReturnOp>();

  RewritePatternSet patterns(context);
  populateLowerBudConversionPatterns(patterns);

  if (failed(applyPartialConversion(module, target, std::move(patterns))))
    signalPassFailure();
}
You can see that the operations of Bud Dialect are mainly lowered onto arith::ArithmeticDialect, func::FuncDialect, vector::VectorDialect, and memref::MemRefDialect.
From the introduction above it is also clear that Bud Dialect is really a demonstration, presumably meant to teach beginners how to quickly define a new dialect and hook into the MLIR ecosystem.
2.2 DIP Dialect
DIP Dialect is an abstraction for digital image processing. The definitions currently in DIP Dialect are shown below.
def DIP_ConstantPadding : I32EnumAttrCase<"ConstantPadding", 0, "CONSTANT_PADDING">;
def DIP_ReplicatePadding : I32EnumAttrCase<"ReplicatePadding", 1, "REPLICATE_PADDING">;

def DIP_BoundaryOption : I32EnumAttr<"BoundaryOption",
    "Specifies desired method of boundary extrapolation during image processing.",
    [
      DIP_ConstantPadding,
      DIP_ReplicatePadding
    ]>{
  let genSpecializedAttr = 0;
  let cppNamespace = "::buddy::dip";
}

def DIP_BoundaryOptionAttr : EnumAttr<DIP_Dialect, DIP_BoundaryOption, "boundary_option">;
DIP Dialect's Corr2DOp
DIP Dialect defines only one operation, DIP_Corr2DOp. Before doing the 2D convolution, this op pads the input so that the output feature map has the same size as the input. Many optimization techniques are involved, embodied in the document https://github.com/buddy-compiler/buddy-mlir/blob/main/docs/dip-opt.md and in the pass implementation https://github.com/buddy-compiler/buddy-mlir/blob/main/lib/Conversion/LowerDIP/LowerDIPPass.cpp. I have not fully understood the logic of this algorithm, so it is not explained here; interested readers can study it on their own.
2.3 Interface
The previous sections described the two dialects defined in the Buddy-MLIR project. This section answers the question: how can an algorithm built with Buddy-MLIR be called from a C/C++ front end to implement a complete application?
To achieve this, Buddy-MLIR implements a MemRef data structure that serves the C/C++ front end: https://github.com/buddy-compiler/buddy-mlir/blob/main/include/Interface/buddy/core/Container.h.
// MemRef descriptor.
// - T represents the type of the elements.
// - N represents the number of dimensions.
// - The storage order is NCHW.
template <typename T, size_t N> class MemRef {
public:
  // Constructor from shape.
  MemRef(intptr_t sizes[N], T init = T(0));
  // Constructor from data.
  MemRef(const T *data, intptr_t sizes[N], intptr_t offset = 0);
  // Copy constructor.
  MemRef(const MemRef<T, N> &other);
  // Copy assignment operator.
  MemRef<T, N> &operator=(const MemRef<T, N> &other);
  // Move constructor.
  MemRef(MemRef<T, N> &&other) noexcept;
  // Move assignment operator.
  MemRef<T, N> &operator=(MemRef<T, N> &&other) noexcept;
  // Destructor.
  ~MemRef();
  // Get the data pointer.
  T *getData();
  // Get the sizes (shape).
  const intptr_t *getSizes() { return sizes; }
  // Get the strides.
  const intptr_t *getStrides() { return strides; }
  // Get the rank of the memref.
  size_t getRank() const { return N; }
  // Get the size (number of elements).
  size_t getSize() const { return size; }
  // Get the element at index.
  const T &operator[](size_t index) const;
  T &operator[](size_t index);

protected:
  // Default constructor.
  // This constructor is designed for derived domain-specific constructors.
  MemRef() {};
  // Set the strides.
  // Computes the strides of the transposed tensor for transpose=true.
  void setStrides();
  // Compute the product of array elements.
  size_t product(intptr_t sizes[N]) const;

  // Data.
  // The aligned and allocated members point to the same address; the aligned
  // member is responsible for handling data, and the allocated member is
  // responsible for handling the memory space.
  T *allocated;
  T *aligned;
  // Offset.
  intptr_t offset = 0;
  // Shape.
  intptr_t sizes[N];
  // Strides.
  intptr_t strides[N];
  // Number of elements.
  size_t size;
};
The concrete implementation is at https://github.com/buddy-compiler/buddy-mlir/blob/main/lib/Interface/core/Container.cpp. Here I would like to trace how this custom MemRef class serves the C/C++ front end, taking edge detection as the example. The core code is as follows:
#include <iostream>
#include <opencv2/imgcodecs.hpp>
#include <time.h>

#include "Interface/buddy/core/ImageContainer.h"
#include "kernels.h"

using namespace cv;
using namespace std;

// Declare the conv2d C interface.
extern "C" {
void _mlir_ciface_conv_2d(Img<float, 2> *input, MemRef<float, 2> *kernel,
                          MemRef<float, 2> *output);
}

int main(int argc, char *argv[]) {
  printf("Start processing...\n");

  // Read as grayscale image.
  Mat image = imread(argv[1], IMREAD_GRAYSCALE);
  if (image.empty()) {
    cout << "Could not read the image: " << argv[1] << endl;
    return 1;
  }
  Img<float, 2> input(image);

  // Define the kernel.
  float *kernelAlign = laplacianKernelAlign;
  int kernelRows = laplacianKernelRows;
  int kernelCols = laplacianKernelCols;
  intptr_t sizesKernel[2] = {kernelRows, kernelCols};
  MemRef<float, 2> kernel(kernelAlign, sizesKernel);

  // Define the output.
  int outputRows = image.rows - kernelRows + 1;
  int outputCols = image.cols - kernelCols + 1;
  intptr_t sizesOutput[2] = {outputRows, outputCols};
  MemRef<float, 2> output(sizesOutput);

  // Run the convolution and record the time.
  clock_t start, end;
  start = clock();

  // Call the MLIR conv2d function.
  _mlir_ciface_conv_2d(&input, &kernel, &output);

  end = clock();
  cout << "Execution time: " << (double)(end - start) / CLOCKS_PER_SEC << " s"
       << endl;

  // Define a cv::Mat with the output of the conv2d.
  Mat outputImage(outputRows, outputCols, CV_32FC1, output.getData());

  // Choose a PNG compression level.
  vector<int> compression_params;
  compression_params.push_back(IMWRITE_PNG_COMPRESSION);
  compression_params.push_back(9);

  // Write output to PNG.
  bool result = false;
  try {
    result = imwrite(argv[2], outputImage, compression_params);
  } catch (const cv::Exception &ex) {
    fprintf(stderr, "Exception converting image to PNG format: %s\n",
            ex.what());
  }
  if (result)
    cout << "Saved PNG file." << endl;
  else
    cout << "ERROR: Can't save PNG file." << endl;

  return 0;
}
Notice that the base class of the Img class is also the MemRef class:
// Image container.
// - T represents the type of the elements.
// - N represents the number of dimensions.
template <typename T, size_t N> class Img : public MemRef<T, N> {
public:
  Img(cv::Mat image);
};
The application above then declares the C front-end function for the conv2d op:
// Declare the conv2d C interface.
extern "C" {
void _mlir_ciface_conv_2d(Img<float, 2> *input, MemRef<float, 2> *kernel,
                          MemRef<float, 2> *output);
}
The symbol behind this C function is produced by the buddy-opt lowering pipeline, which turns the call into an llvm.call instruction, namely this part of CMakeLists.txt:
add_custom_command(OUTPUT conv2d.o
  COMMAND ${BUDDY_BINARY_DIR}/buddy-opt ${BUDDY_EXAMPLES_DIR}/ConvOpt/conv2d.mlir -conv-vectorization="strip-mining=${SPLITING_SIZE}" -lower-affine -convert-scf-to-cf -convert-vector-to-llvm -convert-memref-to-llvm -convert-func-to-llvm='emit-c-wrappers=1' -reconcile-unrealized-casts |
          ${LLVM_MLIR_BINARY_DIR}/mlir-translate --mlir-to-llvmir |
          ${LLVM_MLIR_BINARY_DIR}/llc -mtriple=${BUDDY_TARGET_TRIPLE} -mattr=${BUDDY_OPT_ATTR} --filetype=obj -o ${BUDDY_BINARY_DIR}/../examples/ConvOpt/conv2d.o
  DEPENDS buddy-opt)
The original MLIR file for the conv2d operation is:
func.func @conv_2d(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
  linalg.conv_2d ins (%arg0, %arg1: memref<?x?xf32>, memref<?x?xf32>)
                outs (%arg2: memref<?x?xf32>)
  return
}
When the -convert-func-to-llvm='emit-c-wrappers=1' pass runs, the conv2d operation in Func Dialect above is translated into LLVM IR and wrapped as an llvm.call instruction. The detailed interaction can be seen in the document buddy-mlir/llvm/mlir/docs/TargetLLVMIR.md; in other words, MLIR provides the C/C++ front-end interface convention, and Buddy-MLIR uses it to build the end-to-end application.
With the LLVM IR in hand, you can see from the cmake command that LLVM's llc is then invoked to compile the LLVM source into an object file (or assembly) for the specified architecture; assembly output can then go through the native assembler and linker to produce a native executable. The target architecture and some optimization options can be specified here.
0x3. buddy-opt and buddy-translate
buddy-opt is the tool that registers the passes implemented in Buddy-MLIR into the upstream MLIR pass management mechanism. buddy-translate merely adds the translation from Buddy Dialect to LLVM IR.
Overall, Buddy-MLIR is a good example for getting started with MLIR, or for building your own application on top of the MLIR infrastructure. Readers and developers in need are encouraged to study it and explore more possibilities. This article does not cover any knowledge of RVV Dialect, because I do not know much about it; I hope Hongbin can talk about the motivation and details of that dialect later.
MLIR: a new kind of IR representation and compiler framework
With its continued development, deep learning technology has gradually shifted from academic research toward practical application. This places high demands not only on the accuracy of deep models, but increasingly on their inference speed as well.
At present, deep model inference engines can be divided into two types according to how they are implemented:
o Interpreted inference engines: these usually include a model parser and a model interpreter, and some may also include a model optimizer. The model parser reads and parses the model file and converts it into an in-memory format suitable for the interpreter; the model optimizer transforms the original model into an equivalent model with faster inference; the model interpreter analyzes the in-memory model, accepts the model's input data, executes the operators inside the model in order according to the model's structure, and finally produces the model's output.
o Compiled inference engines: these usually include a model parser and a model compiler. The parser plays the same role as in an interpreted engine; the model compiler compiles the model into machine code that the computing device (CPU, GPU, etc.) can process directly, and may apply various optimizations during compilation to improve the efficiency of the generated machine code. Because the machine code can be processed directly by the device without an additional interpreter, the cost of interpreter scheduling is eliminated. Moreover, compared with interpreted engines, since code generation is closer to the hardware, the compiler has more optimization opportunities and can achieve higher execution efficiency.
The industry now demands higher execution speed from inference engines, so compiled inference engines have gradually become the direction of high-speed inference engines. Typical compiled inference engines include Apache TVM, oneDNN, PlaidML, TensorFlow XLA, TensorFlow Runtime, and so on.
To make optimization easier, an inference engine generally converts the model into an intermediate representation, then optimizes and transforms that representation, and finally generates the target model (for interpreted engines) or target machine code (for compiled engines). Moreover, beyond deep learning, the programming languages field introduced intermediate representations for optimization and transformation long ago; and since new programming languages keep emerging, there are all kinds of intermediate representations:

Different inference engines and compilers each have their own intermediate representations and optimization schemes, and each may need to be implemented from scratch, which can ultimately lead to software fragmentation and duplicated development work.
A brief introduction to MLIR
MLIR (Multi-Level Intermediate Representation) is a new framework for building reusable and extensible compilers. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, reduce the cost of building domain-specific compilers, and help connect existing compilers.
MLIR is designed as a hybrid intermediate representation that supports a variety of different requirements in a unified infrastructure, for example:
o The ability to represent dataflow graphs (such as those in TensorFlow), including dynamic shapes, a user-extensible operator ecosystem, TensorFlow variables, and so on.
o Optimizations and transformations on such graphs (such as those in Grappler).
o Representing the kernels of machine learning operators in a form suitable for optimization.
o Hosting high-performance-computing-style loop optimizations across kernels (fusion, loop interchange, tiling, etc.), along with the ability to change the memory layout of data.
o Code-generation "lowering" transformations, such as DMA insertion, explicit cache management, memory tiling, and vectorization for 1D and 2D register architectures.
o The ability to represent target-specific operations, such as accelerator-specific high-level operations.
o Quantization and other graph transformations on deep learning graphs.
MLIR is a generic intermediate representation that also supports hardware-specific operations. Therefore, any investment in the infrastructure around MLIR (for example, work on compiler passes) pays off well: many targets can use the same infrastructure and benefit from it.
Although MLIR is a powerful framework, it also has non-goals. MLIR does not try to support low-level machine code generation algorithms (such as register allocation and instruction scheduling); these are better suited to low-level optimizers (for example, LLVM). Nor is MLIR intended to be a source language in which end users write operator kernels (similar to CUDA or C++). On the other hand, MLIR provides the scaffolding for representing such domain-specific languages and integrating them into the ecosystem.
MLIR benefits from the experience gained while building other intermediate representations (LLVM IR, XLA HLO, and Swift SIL). The MLIR framework encourages existing best practices, for example: writing and maintaining an IR specification, building an IR verifier, providing the ability to dump and parse MLIR files as text, writing detailed unit tests with the FileCheck tool, and building the infrastructure as a set of modular libraries that can be combined in new ways.
Other lessons have also been incorporated into the design. For example, LLVM has a subtle design mistake that prevents a multithreaded compiler from processing multiple functions in an LLVM module concurrently. MLIR solves these problems by limiting the scope of SSA values to reduce use-def chains and by using explicit symbol references instead of cross-function SSA references.
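As a sketch of this design, the hypothetical module below shows a call referring to its callee by symbol name rather than by an SSA value, so no use-def chain crosses a function boundary and the pass manager can process functions in parallel. (The syntax follows recent upstream MLIR; older releases spelled these operations func and call in the std dialect.)

```mlir
module {
  func.func @callee(%x: i32) -> i32 {
    return %x : i32
  }
  // The callee is referenced by the symbol @callee, not an SSA value,
  // so the caller can be processed without touching the callee's body.
  func.func @caller(%x: i32) -> i32 {
    %0 = func.call @callee(%x) : (i32) -> i32
    return %0 : i32
  }
}
```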
MLIR dialects (Dialect)
MLIR defines different levels of intermediate representation through "dialects", each of which has its own unique namespace. Developers can create custom dialects and define the operations, types, and attributes inside a dialect, along with their semantics. MLIR recommends using dialects to extend MLIR; having such a unified intermediate representation framework reduces the cost of developing new compilers. Besides defining dialects in C++, MLIR also provides a declarative way to define them: the user writes a TableGen-format file describing the dialect, and the TableGen tool then generates the corresponding C++ header and source files, along with documentation. MLIR recommends this declarative way of defining dialects. In addition, MLIR provides a framework for conversions between and within dialects.
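As an illustration of the declarative approach just described, here is a minimal, hypothetical TableGen (ODS) file defining a toy dialect and one operation; the names (toy, Toy_AddOp) are invented for this sketch, mlir-tblgen generates the corresponding C++ from such a file, and the exact ODS syntax varies somewhat between MLIR versions.

```tablegen
include "mlir/IR/OpBase.td"

// Declare the dialect: "toy" becomes its unique namespace,
// so its operations print as toy.*.
def Toy_Dialect : Dialect {
  let name = "toy";
  let cppNamespace = "::mlir::toy";
}

// toy.add: element-wise addition of two f64 tensors (illustrative only).
def Toy_AddOp : Op<Toy_Dialect, "add"> {
  let summary = "element-wise tensor addition";
  let arguments = (ins F64Tensor:$lhs, F64Tensor:$rhs);
  let results = (outs F64Tensor:$result);
}
```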
For convenience, MLIR also ships a number of built-in dialects that can be used directly:
o acc
o affine
o async
o avx512
o gpu
o linalg
o llvm
o nvvm
o omp
o pdl
o pdl_interp
o quant
o rocdl
o scf
o shape
o spv
o std
o vector
MLIR uses "operations" to describe different levels of abstraction and computation. Operations in MLIR are also extensible: users can create custom operations and specify their semantics, for example target-independent operations, affine operations, and target-specific operations. MLIR also supports creating custom operations declaratively (with TableGen).
Every value in MLIR has a corresponding "type". MLIR has built-in primitive types (for example, integers) and aggregate types (tensors and memory buffers). The MLIR type system is likewise user-extensible: users can create custom types and specify their semantics.
In addition, users can specify "attributes" on operations in MLIR to control their behavior. An operation can define its own attributes, for example the stride attribute of a convolution operation.
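These three concepts combine in the textual IR. The fragment below is a hypothetical example (the toy.conv operation and its strides attribute are invented for illustration): every SSA value carries a type, here tensors with static shapes, and the operation carries an attribute that statically controls its behavior.

```mlir
func.func @example(%input: tensor<1x28x28x1xf32>,
                   %filter: tensor<3x3x1x8xf32>) -> tensor<1x26x26x8xf32> {
  // "toy.conv" is written in the generic operation form; the strides
  // attribute controls its behavior, and every value is typed.
  %0 = "toy.conv"(%input, %filter) {strides = [1, 1]}
      : (tensor<1x28x28x1xf32>, tensor<3x3x1x8xf32>) -> tensor<1x26x26x8xf32>
  return %0 : tensor<1x26x26x8xf32>
}
```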
Dialect transformations
When defining an operation in MLIR, you can also define its canonicalization behavior, for example normalizing both x + 2 and 2 + x to x + 2 so that subsequent optimization passes are easier to write. MLIR applies canonicalization transformations with a greedy strategy, repeating them until the intermediate representation converges.
When converting within or between dialects, the user must first define a conversion target, which specifies which operations may appear in the generated result. The user then specifies a set of rewrite patterns that define the transformation relationships between operations. Finally, the framework performs the conversion according to the user-specified target and patterns. The framework automatically chains conversion patterns: for example, if rewrite patterns for A → B and B → C are specified, the framework can automatically complete an A → C conversion. MLIR also supports creating custom rewrite patterns declaratively (with TableGen). When the dialects involved in a conversion have different type systems, users can use type converters to convert between the types.
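A rewrite pattern of this kind can be written declaratively. The sketch below uses MLIR's TableGen-based declarative rewrite rules (DRR) to fold two consecutive transposes of a hypothetical toy.transpose operation; Toy_TransposeOp is assumed to be declared elsewhere in ODS.

```tablegen
// transpose(transpose(x)) -> x: the whole nested expression is
// replaced by the original value, eliminating both operations.
def FoldDoubleTranspose : Pat<
  (Toy_TransposeOp (Toy_TransposeOp $arg)),
  (replaceWithValue $arg)>;
```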
Users of MLIR
o ONNX-MLIR: compiles deep learning models in the ONNX format so they can be executed on different targets.
o PlaidML: an open-source tensor compiler that allows deep learning models to run on different hardware platforms.
o TensorFlow: the TensorFlow XLA project and the TensorFlow Lite model converter use MLIR.
o TensorFlow Runtime: a new TensorFlow runtime.
o Verona: a new research programming language for exploring concurrent ownership. It provides a new concurrency model that integrates seamlessly with ownership.
Conclusion
MLIR is a new compiler framework whose design draws lessons from existing compiler implementations, covering the definition, conversion, and optimization of intermediate representations, and it greatly eases the development and debugging of new compilers. MLIR also comes with many ready-to-use tools ("batteries included"). By covering the generic parts of compiler design, MLIR lets compiler developers focus on the core work of semantic analysis and the design and transformation of the intermediate representation, reducing development cost while improving development efficiency and product quality.
External links
o MLIR homepage: https://mlir.llvm.org/.
o MLIR language reference: https://mlir.llvm.org/docs/LangRef/.
Appendix: Compiling and installing MLIR
Downloading MLIR
MLIR is a subproject of the LLVM project; to compile MLIR, first obtain the LLVM source code.
The LLVM source code can be obtained from GitHub:
git clone https://github.com/llvm/llvm-project.git
Users can also download a source package directly: https://github.com/llvm/llvm-project/releases.
Assume the LLVM source directory is $LLVM_SRC.
Compiling MLIR
First, choose a path to hold the intermediate build products; assume that path is $LLVM_BUILD. Then configure LLVM with the following command (note that CMake must be pointed at the llvm subdirectory of the llvm-project checkout):
cmake -S "$LLVM_SRC/llvm" -B "$LLVM_BUILD" -DLLVM_ENABLE_PROJECTS=mlir -DCMAKE_BUILD_TYPE=Release
By default, LLVM disables exception handling and runtime type information. If your application depends on these features, set the LLVM_ENABLE_EH and LLVM_ENABLE_RTTI CMake variables to ON at configure time:
cmake -S "$LLVM_SRC/llvm" -B "$LLVM_BUILD" -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_EH=ON -DLLVM_ENABLE_RTTI=ON -DCMAKE_BUILD_TYPE=Release
See https://llvm.org/docs/CMake.html for more LLVM configuration parameters. After the configuration step, run the build with the following command:
cmake --build "$LLVM_BUILD"
Installing MLIR
Use the following command to install LLVM to the /usr/local directory:
cmake --install "$LLVM_BUILD"
To install to a different directory, for example $INSTALL_DIR, specify it with the --prefix command-line argument:
cmake --install "$LLVM_BUILD" --prefix "$INSTALL_DIR"
Using MLIR in a CMake project
Users can add the following statement to a CMake project file to look up the MLIR dependency:
find_package(MLIR REQUIRED CONFIG)
If MLIR is installed in a system directory (such as /, /usr, or /usr/local), CMake needs no extra configuration to find MLIR. If MLIR is installed in a non-system directory, its location can be specified during CMake configuration through the MLIR_DIR variable:
cmake "$MY_PROJECT_DIR" -DMLIR_DIR="$INSTALL_DIR/lib/cmake/mlir"
Once the lookup succeeds, users can link directly against MLIR libraries as dependencies of their compilation targets:
add_executable(my-executable main.cpp)
target_include_directories(my-executable SYSTEM PRIVATE ${MLIR_INCLUDE_DIRS})
target_link_libraries(my-executable PRIVATE MLIRIR)
Here MLIR_INCLUDE_DIRS is an automatically generated variable that points to the MLIR include directory. When defining an executable target in CMake, note that if LLVM was built with runtime type information disabled, every executable target that depends on LLVM must also disable runtime type information, or compilation may fail. LLVM provides a CMake helper function, llvm_update_compile_flags, that performs this configuration automatically. The function is defined in the AddLLVM.cmake file shipped with LLVM. Users can import AddLLVM.cmake with the following statements:
list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)
After importing the AddLLVM.cmake file, the compilation target can be configured:
llvm_update_compile_flags(my-executable)
A complete example of such a CMake project file looks as follows:
cmake_minimum_required(VERSION 3.15)
project(my-executable)

find_package(MLIR REQUIRED CONFIG)
list(APPEND CMAKE_MODULE_PATH "${LLVM_CMAKE_DIR}")
include(AddLLVM)

add_executable(my-executable main.cpp)
target_include_directories(my-executable SYSTEM PRIVATE ${MLIR_INCLUDE_DIRS})
target_link_libraries(my-executable PRIVATE MLIRIR)
llvm_update_compile_flags(my-executable)


Copyright notice
This article was created by wujianming; please include a link to the original when reprinting:
https://yzsam.com/2022/159/202206081254596884.html