
EC code introduction

2022-07-27 22:13:00 【Samooyou】

What is an EC code?

EC stands for erasure code. Compared with multi-replica replication, erasure codes achieve higher data reliability with less data redundancy, but the encoding is more complex and requires a lot of computation. Erasure codes can only tolerate data loss (erasure); they cannot tolerate data tampering. That is where the name comes from.

Principle of erasure codes

An EC-coded block group consists of data blocks and check blocks. Suppose the input data is the vector D1, D2, ..., D5 and the matrix B is the coding matrix. Encoding produces a result composed of D and C, where D is the data blocks and C is the check blocks. Data must be encoded before it can be stored.
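The encoding step can be sketched as a matrix-vector product. This is only an illustration: the three check rows and the plain integer arithmetic are assumptions made for readability, whereas real Reed-Solomon implementations work over a finite field.

```python
def encode(data, parity_rows):
    """Multiply the coding matrix B = [identity; parity rows] by the data vector."""
    k = len(data)
    identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
    coding_matrix = identity + parity_rows  # matrix B
    return [sum(b * d for b, d in zip(row, data)) for row in coding_matrix]


data = [1, 2, 3, 4, 5]  # D1..D5
parity_rows = [          # hypothetical check rows, chosen only for illustration
    [1, 1, 1, 1, 1],
    [1, 2, 3, 4, 5],
    [1, 4, 9, 16, 25],
]

codeword = encode(data, parity_rows)
data_blocks, check_blocks = codeword[:5], codeword[5:]  # D and C
```

Because the top of B is the identity matrix, the data blocks come out unchanged and only the check blocks are newly computed.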

Redundancy comparison

A traditional file is divided into data blocks for storage, whereas an EC-coded file is divided into block groups. A block group consists of data blocks (data block) and check blocks (parity block). When blocks in a block group are lost, they can be recovered from the remaining blocks in the group, provided the number of lost blocks does not exceed a certain limit. For example, RS(6-3) means that a block group consists of 6 data blocks and 3 check blocks. The number of lost blocks that can be tolerated equals the number of check blocks, that is, 3. As long as no more than 3 of the 9 blocks are lost, the data can be recovered by the corresponding algorithm. (RS stands for Reed-Solomon code.)

Take three-replica storage and RS(6-3) as examples and compare the data redundancy of the two.

In classic three-replica storage, a file is divided into several Blocks, and each Block has three replicas (Replicas). In other words, two of the three replicas are redundant storage, which gives a redundancy rate of 200%.

With RS(6-3), 6 of the 9 blocks in a block group are data blocks, so only the remaining 3 check blocks are redundant. The redundancy rate is 50%.

So, in principle, EC codes can greatly reduce redundancy and improve storage efficiency.
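The two redundancy rates above follow from simple arithmetic. A minimal sketch, with function names invented for this example:

```python
def replication_redundancy(replicas):
    """Extra copies stored per original copy, e.g. 3 replicas -> 2.0 (200%)."""
    return (replicas - 1) / 1


def ec_redundancy(data_blocks, check_blocks):
    """Check blocks stored per data block, e.g. RS(6-3) -> 0.5 (50%)."""
    return check_blocks / data_blocks


three_replica = replication_redundancy(3)
rs_6_3 = ec_redundancy(6, 3)
```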

Storage of EC-coded files

EC-coded files are stored in one of two main ways: contiguous storage and striped storage (stripe cell storage). Because HDFS does not yet support contiguous storage, the concepts that follow all relate to striped storage.

Stripe unit (Stripe Unit):

An EC-coded file is divided into stripe units. Storage is performed per stripe unit, and the stripe units are scattered across multiple DNs (DataNodes).
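The following sketch shows one way a file could be cut into stripe units and scattered over the data blocks of a block group. It assumes the units are dealt out round-robin; the function name and the tiny unit size are chosen only for illustration.

```python
def stripe(data: bytes, cell_size: int, num_data_blocks: int):
    """Cut data into stripe units and deal them round-robin to the data blocks."""
    blocks = [bytearray() for _ in range(num_data_blocks)]
    for cell_index, start in enumerate(range(0, len(data), cell_size)):
        # Assumption: unit i goes to block i mod num_data_blocks.
        blocks[cell_index % num_data_blocks] += data[start:start + cell_size]
    return [bytes(b) for b in blocks]


# 16 bytes, 2-byte stripe units, 3 data blocks (each block would live on its own DN).
blocks = stripe(b"abcdefghijklmnop", cell_size=2, num_data_blocks=3)
```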

EC encoding (Encoding)

Stripe units are the input of the encoder and check units are its output. The process of generating check units from stripe units is called EC encoding.

Whenever stripe units are written, they are encoded to generate several check units, and these check units are written into the check blocks.

EC decoding (Decoding)

The remaining stripe units and check units are fed to the decoder, which produces the complete data. This process of recovering data is called decoding.
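Encoding and decoding can be illustrated with the simplest possible erasure code, a single XOR check unit. This is a deliberate simplification and not Reed-Solomon: it tolerates the loss of only one unit, but the encoder and decoder play the same roles as described above.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_check_unit(stripe_units):
    """Encoder: stripe units in, one check unit out."""
    return reduce(xor_bytes, stripe_units)


def decode_missing_unit(surviving_units, check_unit):
    """Decoder: surviving stripe units plus the check unit in, missing unit out."""
    return reduce(xor_bytes, surviving_units, check_unit)


units = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
check = encode_check_unit(units)

# Pretend the second stripe unit was lost and rebuild it from the rest.
recovered = decode_missing_unit([units[0], units[2]], check)
```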

Architecture changes after introducing EC codes

NN (NameNode) extension: A block group normally contains several blocks. The developers introduced a new block naming pattern that lets the block group be derived from a block's name, which makes management at the block group level possible.
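The article does not spell out the naming pattern. As a purely hypothetical sketch of the idea, assume a numeric block ID whose low 4 bits hold the block's index inside its group; clearing those bits then yields the block group's ID. The bit width and the function names are assumptions for this example.

```python
INDEX_BITS = 4                        # assumption: low 4 bits hold the index
INDEX_MASK = (1 << INDEX_BITS) - 1    # 0b1111


def block_group_id(block_id: int) -> int:
    """Clear the index bits to obtain the ID of the owning block group."""
    return block_id & ~INDEX_MASK


def index_in_group(block_id: int) -> int:
    """Extract the block's position inside its block group."""
    return block_id & INDEX_MASK


group = -1024                                  # hypothetical block group ID
block_ids = [group + i for i in range(9)]      # 6 data + 3 check blocks
groups = {block_group_id(b) for b in block_ids}
indexes = [index_in_group(b) for b in block_ids]
```

With a scheme like this, the NN needs no lookup table to find the group of a reported block; the mapping is pure arithmetic on the name.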

Client extension: The Client can read and write in parallel. DFSStripedOutputStream manages a set of data streamers, each corresponding to the DN that stores one internal block of the block group. The data streamers work largely asynchronously, and a coordinator coordinates the whole write process, including finishing the current block group and allocating a new one. This achieves parallel writing at the block level, although writing at the block group level is still serial. For reads, DFSStripedInputStream converts the requested byte range of the file into byte ranges of the internal blocks in the block group and then reads them in parallel. HDFS uses online EC: data is encoded as it is written.
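The byte-range conversion on the read path can be sketched as follows. The sketch assumes stripe units are laid out round-robin across the data blocks and covers a single block group only; the function names are invented for this example and are not the HDFS API.

```python
def locate(offset, cell_size, num_data_blocks):
    """Map a logical file offset to (internal block index, offset in that block)."""
    cell_index = offset // cell_size
    stripe_index = cell_index // num_data_blocks
    block_index = cell_index % num_data_blocks
    return block_index, stripe_index * cell_size + offset % cell_size


def split_range(start, length, cell_size, num_data_blocks):
    """Split a logical byte range into per-block (offset, length) pieces."""
    ranges = {}
    pos, end = start, start + length
    while pos < end:
        block_index, block_offset = locate(pos, cell_size, num_data_blocks)
        cell_end = (pos // cell_size + 1) * cell_size   # stop at the unit boundary
        chunk = min(end, cell_end) - pos
        ranges.setdefault(block_index, []).append((block_offset, chunk))
        pos += chunk
    return ranges


# Read 10 bytes starting at offset 3, with 4-byte stripe units and 3 data blocks.
ranges = split_range(start=3, length=10, cell_size=4, num_data_blocks=3)
```

Each key of the result is an internal block that can be read from its DN independently, which is what makes the parallel read possible.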

DN extension: Each DN runs an ECWorker task, which mainly handles damaged EC blocks in the background. The NN detects damaged EC blocks and then chooses a DN to do the recovery work. The recovery task is delivered in the heartbeat response. This process is very similar to the recovery of failed Replicated Blocks. Reconstruction performs three key tasks:

1. Read data from the source nodes. A dedicated thread pool reads the input data in parallel. Based on the EC policy, only the minimum number of input blocks needed for recovery is read.

2. Decode the data and generate the output data. The new data and check blocks are obtained by decoding the input data. All missing data blocks and check blocks are decoded together.

3. Transfer the generated blocks to the target node. Once decoding is complete, the recovered blocks are transferred to the target DN.
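The three reconstruction steps can be sketched as a small pipeline. Everything here is a stand-in: nodes are plain dictionaries, the read is a dictionary lookup instead of a network call, and the decoder is a single-parity XOR rather than the decoder of a real EC policy.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def read_block(node):
    """Stand-in for reading one block from a source node over the network."""
    return node["data"]


def xor_decode(blocks):
    """Stand-in decoder: with one XOR check block, the missing block is the XOR of the rest."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


def reconstruct(source_nodes, min_inputs, target_node, decode=xor_decode):
    # Step 1: read only the minimum number of inputs, in parallel.
    chosen = source_nodes[:min_inputs]
    with ThreadPoolExecutor(max_workers=min_inputs) as pool:
        inputs = list(pool.map(read_block, chosen))
    # Step 2: decode the inputs into the missing block.
    recovered = decode(inputs)
    # Step 3: transfer the recovered block to the target node.
    target_node["data"] = recovered
    return recovered


d0, d1 = b"\x01\x02", b"\x10\x20"
check = bytes(x ^ y for x, y in zip(d0, d1))
sources = [{"data": d0}, {"data": check}]   # d1 is the lost block
target = {}
reconstruct(sources, min_inputs=2, target_node=target)
```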

EC policies come in different forms, each encapsulating how encoding and decoding are performed. Each policy definition contains the following information:

1. The EC schema, which includes the number of data blocks and check blocks in a block group and the coding algorithm. RS(6,3), for example, means 6 data blocks and 3 check blocks with Reed-Solomon coding.

2. The size of the stripe unit, which determines the granularity of reads and writes, including the client buffer size and the encoding work.

For example, RS-6-3-1024k denotes RS coding with 6 data blocks and 3 check blocks, where each stripe unit is 1024k in size.
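A policy name of this shape can be split into its parts with a few lines of code. This is a minimal sketch that handles only the codec-data-check-size form shown above with a "k" suffix; the function and field names are invented for the example.

```python
def parse_policy(name):
    """Split a name like 'RS-6-3-1024k' into codec, block counts and unit size."""
    codec, data_blocks, check_blocks, cell = name.split("-")
    if not cell.lower().endswith("k"):
        raise ValueError(f"unsupported stripe unit size: {cell}")
    return {
        "codec": codec,
        "data_blocks": int(data_blocks),
        "check_blocks": int(check_blocks),
        "cell_size_bytes": int(cell[:-1]) * 1024,
    }


policy = parse_policy("RS-6-3-1024k")
```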

EC policies and multi-replica policies can coexist: a directory can be set to force the use of multiple replicas instead of EC. Likewise, a specific EC policy is set on a directory. When a file is created, it uses the EC policy of its nearest ancestor path. A directory-level EC policy affects only files newly created in that directory. Once a file is created, its EC policy does not change unless the file is copied (for example with distcp), which rewrites its data. Renaming the file or moving it to another directory has no effect on its policy.
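The nearest-ancestor rule for newly created files can be sketched as a walk up the directory tree. The dictionary of directory policies, the "REPLICATION" marker and the function name are all hypothetical and exist only for this illustration.

```python
import posixpath


def resolve_policy(path, directory_policies, default="REPLICATION"):
    """Return the policy of the nearest ancestor directory that has one set."""
    current = posixpath.dirname(path)
    while True:
        if current in directory_policies:
            return directory_policies[current]
        if current in ("/", ""):          # reached the root without a match
            return default
        current = posixpath.dirname(current)


directory_policies = {
    "/warehouse": "RS-6-3-1024k",
    "/warehouse/hot": "REPLICATION",      # forces replicas under an EC directory
}

cold = resolve_policy("/warehouse/cold/part-0", directory_policies)
hot = resolve_policy("/warehouse/hot/part-0", directory_policies)
other = resolve_policy("/tmp/scratch", directory_policies)
```

The lookup applies only at creation time; as noted above, an existing file keeps the policy it was written with.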

Users can also define custom EC policies in an XML file. The details are not covered here; see the official documentation.

Deployment

EC codes place higher demands on the cluster's CPU and network.

EC encoding and decoding consume extra CPU, mainly on the Client and the DNs.

An EC policy requires a minimum number of DNs. RS(6,3), for example, needs at least 9 DNs (6 data blocks plus 3 check blocks).

To achieve rack-level fault tolerance, EC-coded files are spread across racks, so most reads and writes of stripe units cross racks. Bisection bandwidth is therefore very important.

Having enough racks is also very important for rack-level fault tolerance: no rack may hold more blocks than the number of check blocks. That requires at least (data blocks + check blocks) / check blocks racks, rounded up; otherwise rack-level fault tolerance cannot be achieved. Even when there are not enough racks, striped files are still spread over multiple nodes to preserve node-level fault tolerance as far as possible.
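The sizing rules in this section reduce to two small formulas. A minimal sketch, with function names invented for this example:

```python
import math


def min_datanodes(data_blocks, check_blocks):
    """One DN per block in a block group."""
    return data_blocks + check_blocks


def min_racks(data_blocks, check_blocks):
    """No rack may hold more blocks than there are check blocks."""
    return math.ceil((data_blocks + check_blocks) / check_blocks)


dns_for_rs_6_3 = min_datanodes(6, 3)
racks_for_rs_6_3 = min_racks(6, 3)
```

For RS(6,3) this gives 9 DNs and 3 racks: with 3 blocks per rack, losing a whole rack costs exactly 3 blocks, which is still recoverable.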