Microservice architecture | How to handle chunked uploads of large attachments?

2022-06-23 21:11:00 Code farming architecture

Reading guide: chunked upload and resumable upload should be familiar terms to anyone who has built or worked with file uploading. This article summarizes both, in the hope of helping or inspiring engineers doing related work.

When a file is very large, uploading it takes a long time. Over such a long-lived connection, what if the network fluctuates, or the connection drops partway through? Any instability during that long window means everything uploaded so far is lost and the upload must start over.

Chunked upload means splitting the entire file into blocks of a certain size (each called a Part) and uploading them separately; once all Parts are uploaded, the server gathers them and reassembles the original file. Chunked upload not only avoids having to restart from the beginning of the file whenever the network is poor, it also allows different blocks to be sent concurrently from multiple threads, improving throughput and shortening transfer time.

I. Background

After a sudden surge in users, and to better serve the customization needs of different user groups, the business gradually rolled out C-end (consumer-facing) user-defined layouts and configuration, which caused read IO on configuration data to spike.

To optimize this scenario, user-defined configuration is served statically: the generated configuration is written out as a static configuration file. Generating those static files raises a thorny problem: when a configuration file is too large, it waits a long time in the file upload server, dragging down the performance of the whole business scenario.

II. Generating the configuration file

The three elements of generating a file

File content and file storage format are easy to understand and handle. (Encryption methods commonly used in microservices are a topic I have summarized separately.)

A supplementary note: encrypting the file content is worth considering in some cases. However, the configuration in this article's scenario has low confidentiality requirements, so encryption is not expanded on here.

The naming convention for files is decided by the business scenario; the usual format is profile name + timestamp. Such a convention, however, easily produces file name conflicts and causes unnecessary trouble later.

So file naming gets special treatment here. Readers who have dealt with front-end route hashing will find this familiar: the file name can instead be a hash value generated from the file content.

Spring has shipped a digest utility for this since Spring 3.0:

DigestUtils#md5DigestAsHex

Returns a hexadecimal string representation of the MD5 digest of the given bytes.

md5DigestAsHex source:

/**
 * Calculate the MD5 digest of the given bytes.
 * @param bytes the bytes to digest
 * @return a hexadecimal string representation of the MD5 digest
 */
public static String md5DigestAsHex(byte[] bytes) {
  return digestAsHexString(MD5_ALGORITHM_NAME, bytes);
}
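
As a usage sketch, a content-derived file name can be built on top of this utility. The .json suffix and UTF-8 charset below are assumptions for the example, not requirements:

import java.nio.charset.StandardCharsets;
import org.springframework.util.DigestUtils;

class ContentNaming {
  // Identical content always yields the identical name, so the name itself
  // reveals whether the content has changed.
  static String contentFileName(String content) {
    return DigestUtils.md5DigestAsHex(content.getBytes(StandardCharsets.UTF_8)) + ".json";
  }
}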

Once the file name, content, and suffix (storage format) are confirmed, the file can be generated directly:

/**
 * Generate a file directly from its content.
 */
public static void generateFile(String destDirPath, String fileName, String content) throws FileZipException {
  File targetFile = new File(destDirPath + File.separator + fileName);
  // Make sure the parent directory exists
  if (!targetFile.getParentFile().exists()) {
    if (!targetFile.getParentFile().mkdirs()) {
      throw new FileZipException("path is not found");
    }
  }
  // Write the content using the configured charset (ENCODING, e.g. "UTF-8")
  try (PrintWriter writer = new PrintWriter(
      new BufferedWriter(new OutputStreamWriter(new FileOutputStream(targetFile), ENCODING)))) {
    writer.write(content);
  } catch (Exception e) {
    throw new FileZipException("create file error", e);
  }
}

The advantage of generating files from content is self-evident: it greatly reduces needless regeneration driven by content comparison. If the newly computed file name matches an existing one, the content has not changed, and no subsequent file update is required.
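
A minimal sketch of that skip-if-unchanged check, reusing generateFile above (generateIfChanged is a hypothetical helper, and the .json suffix is carried over from the earlier naming example):

static void generateIfChanged(String destDirPath, String content) throws FileZipException {
  String hash = DigestUtils.md5DigestAsHex(content.getBytes(StandardCharsets.UTF_8));
  File target = new File(destDirPath, hash + ".json");
  // Same name means same content: skip regeneration and the re-upload that follows.
  if (target.exists()) {
    return;
  }
  generateFile(destDirPath, hash + ".json", content);
}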

III. Chunked upload of attachments

As described in the introduction, a chunked upload splits the entire file into fixed-size blocks (Parts), uploads them separately, and lets the server reassemble them into the original file once all Parts have arrived.

Chunked upload mainly applies to the following scenarios:

  • Poor network environment: when an upload fails, the failed Part can be retried on its own; the other Parts do not need to be re-uploaded.
  • Resumable upload: after a pause, uploading can continue from the last completed Part.
  • Faster upload: when the local file to be uploaded to OSS is large, multiple Parts can be uploaded in parallel to speed things up.
  • Streaming upload: uploading can start even when the final size of the file is unknown. This is common in industry applications such as video surveillance.
  • Large files: in general, when a file is relatively large, chunked upload is the default choice.

The overall flow of a chunked upload is roughly as follows (a minimal sketch of this contract follows the list):

  • Split the file into equal-sized blocks according to some splitting rule;
  • Initialize a chunked-upload task and receive a unique identifier for this upload;
  • Send each data block according to some strategy (serially or in parallel);
  • After sending completes, the server checks whether all data has arrived; if complete, it assembles the blocks into the original file.
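
To make those steps concrete, here is a minimal sketch of the server-side contract as a Java interface. The interface name and method signatures are assumptions for illustration, not the API used later in this article:

public interface ChunkUploadService {

  /** Step 2: register an upload task and return a unique upload id. */
  String initUpload(String fileName, long fileSize, String fileMd5, long totalSlices);

  /** Step 3: receive one Part; a failed Part can be retried on its own. */
  void uploadPart(String uploadId, long sliceIndex, byte[] sliceBytes);

  /** Step 4: verify that all Parts have arrived, then assemble them into the original file. */
  String completeUpload(String uploadId);
}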

▐ Defining the chunk size rule

By default, files are forcibly split at 20 MB:

/**
 * Forced chunk size (20 MB)
 */
long FORCE_SLICE_FILE_SIZE = 20L * 1024 * 1024;

For easier debugging, the forced-chunk threshold is lowered (to 1024 KB in the example below).

▐ Defining the chunk upload object

(The original article illustrates this with a figure of the file's chunks labeled with red sequence numbers.) The basic attributes of the chunk upload object include the attachment file name, the original file size, the original file's MD5 value, the total number of chunks, the size of each chunk, the current chunk's size, and the current chunk's sequence number.

These attributes are defined to make sensible file splitting and later business extensions, such as chunk merging, straightforward; extension attributes can of course be added for your own scenario.
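
A minimal sketch of such an object, using the attributes just listed. The class and field names are assumptions, chosen to match the sliceBytesVO accessors used in the reading loop below:

public class SliceBytesVO {
  private String fileName;      // attachment file name
  private long fileSize;        // original file size in bytes
  private String fileMd5;       // MD5 of the entire original file
  private long fdTotalSlices;   // total number of chunks
  private long eachSize;        // planned size of each chunk
  private int currentSliceSize; // actual size of the current chunk (the last may be smaller)
  private int sliceIndex;       // index of the current chunk
  private byte[] sliceBytes;    // the current chunk's data
  // getters and setters omitted for brevity
}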

Some of these attributes are computed as follows.

  • Total number of chunks
long totalSlices = fileSize % forceSliceSize == 0 ?
    fileSize / forceSliceSize : fileSize / forceSliceSize + 1;
  • Size of each chunk
long eachSize = fileSize % totalSlices == 0 ?
    fileSize / totalSlices : fileSize / totalSlices + 1;
  • MD5 of the original file
MD5Util.hex(file)

For example:

The current attachment is 3382 KB, and the forced chunk size is capped at 1024 KB.

By the calculation above, the number of chunks is 4 and the size of each chunk is 846 KB (the last chunk is slightly smaller, at 844 KB).
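
A quick check of that arithmetic using the formulas above:

long fileSize = 3382;        // KB
long forceSliceSize = 1024;  // KB, the debugging threshold

long totalSlices = fileSize % forceSliceSize == 0
    ? fileSize / forceSliceSize : fileSize / forceSliceSize + 1; // 3382 / 1024 -> 3, +1 = 4
long eachSize = fileSize % totalSlices == 0
    ? fileSize / totalSlices : fileSize / totalSlices + 1;       // 3382 / 4 -> 845, +1 = 846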

▐ Reading each chunk's data bytes

Track the current byte offset and read the 4 chunks' data bytes in a loop:

try (InputStream inputStream = new FileInputStream(uploadVO.getFile())) {
  for (int i = 0; i < sliceBytesVO.getFdTotalSlices(); i++) {
    // Read the current chunk's data bytes into sliceBytesVO
    this.readSliceBytes(i, inputStream, sliceBytesVO);
    // Invoke the chunk-upload API for this chunk
    String result = sliceApiCallFunction.apply(sliceBytesVO);
    // An empty result means the chunk was accepted and the loop continues;
    // a non-empty result ends the upload (e.g. after the final chunk).
    if (StringUtils.isEmpty(result)) {
      continue;
    }
    return result;
  }
}
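
readSliceBytes itself is not shown in the article. Here is a minimal sketch of what it might do, assuming the SliceBytesVO fields defined earlier; it is an illustration, not the article's actual implementation:

private void readSliceBytes(int i, InputStream inputStream, SliceBytesVO vo) throws IOException {
  long eachSize = vo.getEachSize();
  long remaining = vo.getFileSize() - eachSize * i;
  // The last chunk may be smaller than the planned chunk size.
  int currentSize = (int) Math.min(eachSize, remaining);

  byte[] buffer = new byte[currentSize];
  int read = 0;
  // InputStream.read may return fewer bytes than requested, so loop until the chunk is full.
  while (read < currentSize) {
    int n = inputStream.read(buffer, read, currentSize - read);
    if (n < 0) {
      throw new IOException("unexpected end of file at chunk " + i);
    }
    read += n;
  }
  vo.setSliceIndex(i);
  vo.setCurrentSliceSize(currentSize);
  vo.setSliceBytes(buffer);
}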

IV. Summary

To recap: a chunked upload splits the entire file into blocks of a certain size (Parts) and uploads them separately.

When handling large files and chunking, the core is to settle three points:

  • File chunking granularity
  • How chunks are read
  • How chunks are stored

This article analyzed content comparison and chunk handling during large-file uploads: setting a reasonable chunk threshold, and how chunks are read and indexed. I hope it helps or inspires engineers doing related work. A follow-up article will cover in detail how chunks are stored, tracked, and merged back into the original file.
