
Cloud backup project

2022-07-07 07:16:00 Li Hanhan_



Understanding cloud backup

Cloud backup automatically uploads the files in a specified folder on the local computer to a server for backup. The backed-up files can be viewed and downloaded through a browser at any time, and downloads support resuming from a breakpoint. The server also performs hotspot management on uploaded files: non-hot files are compressed before storage to save disk space.

Project objectives

Server program: deployed on a Linux server
   Handles client requests: file upload and backup, plus viewing and downloading from the client's browser. It also performs hotspot management, compressing non-hot files before storage to save disk space.
Client program: deployed on a Windows client
   Automatically checks the files in the specified folder on the client host to decide whether each one needs to be backed up, and uploads those that do to the server.

Module partition

Server side:
Network communication module: handles network communication with the client and parses HTTP protocol data
Business processing module: identifies the client request and performs the corresponding processing (file upload, backup information retrieval, file download)
Data management module: manages data in one place
Hotspot management module: performs hotspot management on the files backed up on the server and compresses non-hot files for storage
   The server starts with the network communication module, which can talk to any client. We have two clients: the browser and the backup client we write ourselves. Our backup client sends file upload requests to the server, while the browser sends backup information view requests and file download requests.
   Once the server has received data from the network, a very important module takes over: the business processing module. The network communication module receives the data, and the business processing module analyzes it to determine what kind of request it is and then performs the corresponding processing for that request.
   The server also has a data management module dedicated to managing data. After the network communication module receives and parses the data, the business processing module handles the request. That handling involves data access, but the business processing module never accesses the data directly; it goes through the data management module. Storing, modifying and retrieving data are all carried out by the data management module; other modules just send it a request.
   There is also a hotspot management module that runs in parallel with the business processing module (a background function of the server). It detects which files on the server are rarely accessed and have become non-hot files, and compresses them. If the business processing module later receives a download request for such a file, the file is decompressed before the response is sent; the extra step is acceptable because the file is rarely requested.
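The hot/non-hot decision described above can be sketched as a comparison between a file's last access time and the current time. This is a minimal sketch under an assumption: a file counts as non-hot once it has not been accessed for a configurable number of seconds. The name `IsHotFile` and its parameters are hypothetical and do not come from the project.

```cpp
#include <ctime>

// Hypothetical helper: a file is "hot" if it was accessed within the last
// hot_seconds seconds; otherwise it is a candidate for compression.
bool IsHotFile(time_t last_access, time_t now, time_t hot_seconds) {
    return (now - last_access) < hot_seconds;
}
```

A background loop could call this for every backed-up file, passing the file's last access time and `time(nullptr)`, and compress the files for which it returns false.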
Client side (backup client):
Directory detection module: traverses the specified folder on the client host and gets the information of every file in it
Data management module: manages the information of all files the client has backed up
    (A file needs to be backed up if: 1. no historical backup information exists for it, or 2. historical backup information exists but does not match the current file)
Network communication module: builds a network communication client and uploads the files that need backup to the server
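The two backup conditions listed above can be sketched as a lookup in a table of historical backup records. This is only an illustration: the table type `BackupTable`, the functions `MakeIdentifier` and `NeedsBackup`, and the "size-mtime" identifier format are assumptions made for the example, not the project's actual design.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical record table: file path -> identifier saved at the last backup.
using BackupTable = std::unordered_map<std::string, std::string>;

// Assumed identifier format: "<size>-<mtime>".
std::string MakeIdentifier(std::size_t size, long mtime) {
    return std::to_string(size) + "-" + std::to_string(mtime);
}

bool NeedsBackup(const BackupTable &table, const std::string &path,
                 const std::string &current_id) {
    auto it = table.find(path);
    if (it == table.end()) {
        return true;                  // 1. no historical backup information
    }
    return it->second != current_id;  // 2. information exists but differs
}
```

After a successful upload the client would store the new identifier in the table, so the same unchanged file is not uploaded again on the next scan.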

Getting to know the third-party libraries

Getting to know JSON

JSON is a data exchange format that stores and represents data as text, completely independent of any programming language.
For example, Xiao Ming's student information:

const char *name = "Xiao Ming";
int age = 18;
float score[3] = {88.5, 99, 58};
When this data has to be sent over the network or stored persistently, it must be organized in an agreed data format so that it can be parsed again when it is used.
JSON organizes these data objects into a single string:
[
   {
        "name" : "Xiao Ming",
        "age" : 18,
        "score" : [88.5, 99, 58]
   },
   {
        "name" : "Xiao Hei",
        "age" : 18,
        "score" : [88.5, 99, 58]
   }
]

JSON data types: object, array, string, number
Object: enclosed in curly braces {}.
Array: enclosed in square brackets [].
String: enclosed in regular double quotes "".
Number: integers and floating point values, written directly.
An object is made up of key/value pairs.
Put simply, JSON formats multiple pieces of data into one string with a specified format.
jsoncpp library: provides a set of interfaces for JSON serialization and deserialization.

// Json data object class (excerpt of the commonly used interfaces)
class Json::Value{
    Value &operator=(const Value &other); // Value overloads [] and =, so assignment and data access can be done in a simple way
    Value& operator[](const std::string& key); // e.g. val["name"] = "Xiao Ming";
    Value& operator[](const char* key);
    Value removeMember(const char* key); // remove an element
    const Value& operator[](ArrayIndex index) const; // val["score"][0]
    Value& append(const Value& value); // add an array element: val["score"].append(88);
    ArrayIndex size() const; // number of array elements: val["score"].size();
    std::string asString() const; // to string: string name = val["name"].asString();
    const char* asCString() const; // to const char*: const char *name = val["name"].asCString();
    Int asInt() const; // to int: int age = val["age"].asInt();
    float asFloat() const; // to float
    bool asBool() const; // to bool
};

Json::Value class: the intermediate data class used to exchange data between the jsoncpp library and the outside world
To serialize several data objects, instantiate a Json::Value object and add the data to it
Json::Writer class: a serialization class of the jsoncpp library
   Member interface: write() serializes all the data in a Json::Value object into a string.
Json::Reader class: a deserialization class of the jsoncpp library
   Member interface: parse() parses a JSON format string into a Json::Value object.
The examples below use the Json::StreamWriter and Json::CharReader classes, created through their builder classes, for the same purpose.
Example:
Serialization first

#include <iostream>
#include <sstream>
#include <string>
#include <jsoncpp/json/json.h>
using namespace std;

void Serialize() {
    const char *name = "Xiao Ming";
    int age = 18;
    float score[] = {77.5, 88, 99};
    // Serialization
    // 1. Define a Json::Value object and fill in the data
    Json::Value val;
    val["name"] = name;
    val["age"] = age;
    val["score"].append(score[0]);
    val["score"].append(score[1]);
    val["score"].append(score[2]);
    // 2. Serialize with a StreamWriter object
    Json::StreamWriterBuilder swb;
    Json::StreamWriter *sw = swb.newStreamWriter(); // new a StreamWriter object
    stringstream ss;
    sw->write(val, &ss); // serialize into the string stream
    cout << ss.str() << endl;
    delete sw;
}

int main()
{
    Serialize();
    return 0;
}

Next, let's look at deserialization

void UnSerialize(const string &str) {
    Json::CharReaderBuilder crb;
    Json::CharReader *cr = crb.newCharReader();
    Json::Value val;
    string err;
    // parse(start of the string, end of the string, Json::Value pointer, error message output)
    bool ret = cr->parse(str.c_str(), str.c_str() + str.size(), &val, &err);
    if (ret == false) {
        cout << "UnSerialize failed:" << err << endl;
        delete cr;
        return;
    }
    cout << val["name"].asString() << endl;
    cout << val["gender"].asString() << endl;
    cout << val["age"].asInt() << endl;
    int sz = val["score"].size();
    for (int i = 0; i < sz; ++i) {
        cout << val["score"][i].asFloat() << endl;
    }
    delete cr;
}


bundle file compression library

Bundle is an embeddable compression library that supports 23 compression algorithms and 2 archive formats. To use it, you only need to add two files, bundle.h and bundle.cpp, to your project.
Let's create a 100 MB file

dd if=/dev/zero of=./hello.txt bs=100M count=1


#include <iostream>
#include <fstream>
#include <string>
#include "bundle.h"
using namespace std;

// Read all the data in the file into body
bool Read(const string &name, string *body) {
    ifstream ifs;
    ifs.open(name, ios::binary); // open the file in binary mode
    if (ifs.is_open() == false) {
        cout << "open failed!\n";
        return false;
    }
    ifs.seekg(0, ios::end); // like fseek(fp, 0, SEEK_END);
    size_t fsize = ifs.tellg(); // offset of the current position from the start of the file, i.e. the file size
    ifs.seekg(0, ios::beg); // like fseek(fp, 0, SEEK_SET);

    body->resize(fsize);
    // string::c_str() returns const char*, so take the address of the first element instead
    ifs.read(&(*body)[0], fsize);
    if (ifs.good() == false) {
        cout << "read file failed!\n";
        ifs.close();
        return false;
    }
    ifs.close();
    return true;
}
 
// Write the data in body to the file
bool Write(const string &name, const string &body) {
    ofstream ofs;
    ofs.open(name, ios::binary); // open the file in binary mode
    if (ofs.is_open() == false) {
        cout << "open failed!\n";
        return false;
    }
    ofs.write(body.c_str(), body.size());
    if (ofs.good() == false) {
        cout << "write file failed!\n";
        ofs.close();
        return false;
    }
    ofs.close();
    return true;
}
 
void Compress(const string &filename, const string &packname) {
    string body;
    if (Read(filename, &body) == false) { // read the data from filename into body
        return;
    }
    string packed = bundle::pack(bundle::LZIP, body); // compress the data in body in LZIP format and return the compressed data
    Write(packname, packed); // write the compressed data into the specified compressed package
}

void UnCompress(const string &filename, const string &packname) {
    string packed;
    if (Read(packname, &packed) == false) { // read the compressed data from the compressed package
        return;
    }
    string body = bundle::unpack(packed); // decompress the compressed data
    Write(filename, body); // write the decompressed data to a new file
}

int main()
{
    Compress("./hello.txt", "./hello.zip");
    UnCompress("./hi.txt", "./hello.zip");
    return 0;
}

To verify that the contents of the two files are the same, compute the MD5 value of each file and compare them
  MD5 is a hash algorithm: it runs a large amount of computation over the data and produces a final result (a string). If the contents of two files differ even slightly, the resulting MD5 values are completely different
The two MD5 values match, so the two files are identical
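The same check can also be done from C++ without MD5. The sketch below swaps in a different technique, a direct byte-by-byte comparison of the two files, which answers the same question (are the contents identical?) but does not compute any hash. The function name `SameContent` is hypothetical.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Returns true if both files can be opened and hold exactly the same bytes.
bool SameContent(const std::string &path_a, const std::string &path_b) {
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a.is_open() || !b.is_open()) {
        return false;
    }
    std::istreambuf_iterator<char> it_a(a), it_b(b), end;
    while (it_a != end && it_b != end) {
        if (*it_a != *it_b) {
            return false;  // first differing byte
        }
        ++it_a;
        ++it_b;
    }
    return it_a == end && it_b == end;  // both files must end together
}
```

For the example above, `SameContent("./hello.txt", "./hi.txt")` would be expected to return true when compression and decompression round-trip correctly.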

httplib library

httplib is a C++11 single-header, cross-platform HTTP/HTTPS library. It is very easy to install: just include httplib.h in your code.
The httplib library is used to build a simple HTTP server or client. Using this third-party network library saves us the time of writing the server or client ourselves, so we can put more effort into the actual business processing and improve development efficiency.
HTTP is an application layer protocol that relies on TCP at the transport layer for transmission, so the HTTP protocol is essentially a data format at the application layer.
Format of the HTTP protocol:
   Request:
     Request line: request method, URL, protocol version, then \r\n
     Header fields: key: val\r\n key/value pairs
     Blank line: \r\n, separates the headers from the body
     Body: the data submitted to the server
   Response:
     Status line: protocol version, response status code, status code description, then \r\n
     Header fields: key: val\r\n key/value pairs
     Blank line: \r\n, separates the headers from the body
     Body: the data returned to the client
httplib has two data structures for storing request and response information: struct Request and struct Response
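To make the request line format concrete, here is a minimal sketch that splits a request line into its three fields. It is only an illustration of the format above, not how httplib parses requests; the `RequestLine` structure and `ParseRequestLine` function are hypothetical names.

```cpp
#include <sstream>
#include <string>

// The three space-separated fields of an HTTP request line.
struct RequestLine {
    std::string method;
    std::string url;
    std::string version;
};

// Parses "METHOD URL VERSION\r\n"; returns false if any field is missing.
// Stream extraction skips whitespace, so the trailing \r\n is ignored.
bool ParseRequestLine(const std::string &line, RequestLine *out) {
    std::istringstream iss(line);
    if (!(iss >> out->method >> out->url >> out->version)) {
        return false;
    }
    return true;
}
```

For the line `GET /numbers/5678 HTTP/1.1\r\n` this yields the method `GET`, the URL `/numbers/5678` and the version `HTTP/1.1`.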
A thread in the thread pool takes a connection and processes it as follows:
  1. Receive the request data, parse it, and put the parsed data into a Request structure variable req;
  2. Using the request information (request method and resource path), look in the mapping table for a corresponding handler function; if there is none, return 404;
  3. If a mapping exists, call the business handler function, passing in the parsed req together with an empty Response structure variable rsp;
  4. The handler performs the business processing according to the request information in req and fills the rsp structure with response information as it goes;
  5. When the handler returns, httplib has a Response variable rsp filled with response information;
  6. Based on the information in rsp, build a response in HTTP format and send it to the client;
  7. For a short connection, close the connection and move on to the next one; for a long (keep-alive) connection, wait for the next request, and if the wait times out, close the connection and move on to the next one;
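Steps 2 to 5 above can be sketched with a small routing table. This is a simplified illustration of the idea, not httplib's implementation: the `Req` and `Rsp` structures, the `RouteTable` type and the `Dispatch` function are hypothetical stand-ins, and only exact path matches are handled here.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplified stand-ins for the request and response structures (hypothetical).
struct Req {
    std::string method;
    std::string path;
};
struct Rsp {
    int status = 0;
    std::string body;
};

using Handler = std::function<void(const Req &, Rsp &)>;
// Mapping table: (method, path) -> business handler function.
using RouteTable = std::map<std::pair<std::string, std::string>, Handler>;

// Look up the handler for the request; answer 404 if none is registered.
void Dispatch(const RouteTable &routes, const Req &req, Rsp &rsp) {
    auto it = routes.find(std::make_pair(req.method, req.path));
    if (it == routes.end()) {
        rsp.status = 404;
        return;
    }
    it->second(req, rsp);  // the handler fills rsp with response information
}
```

Once `Dispatch` returns, the caller would serialize `rsp` into an HTTP response and send it back, which corresponds to step 6.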

First, look at a simple file upload page

<!-- HTML is a hypertext markup language; a tag can be understood as an element or a control -->
<html>
    <body>
        <!-- form is a form field control; when the submit button in the form is clicked, the data of all controls in the form is organized and submitted -->
        <!-- action is the resource path of the request; method is the request method; enctype is the encoding type, i.e. how the data is organized -->
        <form action="http://192.168.122.000:9090/upload" method="post" enctype="multipart/form-data">
            <div>
                <input type="file" name="file">    <!-- This is a file upload selection box -->
            </div>    
            <div>
                <input type="submit" name="submit" value=" Upload ">   <!-- This is a submit Submit button -->
            </div>  
        </form>
    </body>
</html>

The request data it submits is organized like this

POST /upload HTTP/1.1
Host: 192.168.122.130:9090
Connection: keep-alive
Content-Length: xxxxx
Content-Type: multipart/form-data; boundary=--xxxxxxxxxxxxxxxxxxxx

----xxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="file"; filename=""

 File data of the selected file
----xxxxxxxxxxxxxxxxxxxx
Content-Disposition: form-data; name="submit"

 Upload (the value of the upload button)
----xxxxxxxxxxxxxxxxxxxx--

Building a server with httplib

With such a page in place, we can build the HTTP server

#include "httplib.h"
using namespace std;

void Numbers(const httplib::Request &req, httplib::Response &rsp)
{
    // This is the business handler function
    rsp.body = req.path;
    rsp.body += "-------------";
    rsp.body += req.matches[1]; // the data captured by the first group of the regular expression
    rsp.status = 200;
    rsp.set_header("Content-Type", "text/plain");
}

void Upload(const httplib::Request &req, httplib::Response &rsp)
{
    // req.files is a MultipartFormDataMap; each value v is a MultipartFormData{name, filename, content, content_type}
    for (auto &v : req.files) {
        cout << v.second.name << endl;     // name of the form field
        cout << v.second.filename << endl; // for a file upload, the file name
        cout << v.second.content << endl;  // body data of the field; for a file upload, the file content
    }
}
 
int main()
{
    // Instantiate the server object
    httplib::Server server;
    // Add mappings: tell the server which function handles which request
    // The number in the path is not fixed, so a regular expression is used to match any data that follows the rule
    // In the regular expression, \d is a digit character; + matches the preceding character one or more times; () captures the data matched inside the parentheses
    server.Get("/numbers/(\\d+)", Numbers);
    server.Post("/upload", Upload);

    // 0.0.0.0 means any address of the server; on a virtual machine the firewall must be turned off; on a cloud server the port must be opened in the security group
    server.listen("0.0.0.0", 9090);
    return 0;
}
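The regular expression used for the /numbers route can be tried on its own with the standard library. The sketch below assumes the pattern has to match the whole path, and the helper name `CaptureNumber` is hypothetical; it only illustrates what `\d+` and the capture group do.

```cpp
#include <regex>
#include <string>

// Returns the digits captured from a path such as "/numbers/5678",
// or an empty string if the path does not match the pattern.
std::string CaptureNumber(const std::string &path) {
    static const std::regex pattern("/numbers/(\\d+)");
    std::smatch matches;
    if (!std::regex_match(path, matches, pattern)) {
        return "";
    }
    // matches[0] is the whole path, matches[1] is the first capture group
    return matches[1].str();
}
```

`CaptureNumber("/numbers/5678")` returns `"5678"`, while a path such as `/numbers/abc` does not match and gives an empty string.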

We can change the resource path in action to the address of our own virtual machine or cloud server. Then click the file selection box, choose the HTML file we just wrote, and click Upload to see the result

Building a client with httplib


#include "httplib.h"
using namespace std;

int main()
{
    httplib::Client client("192.168.19.xxx", 9090);

    // Result Get(const char *path, const Headers &headers);
    httplib::Headers header = {
        {"connection", "close"}
    };
    // res can be used like a Response *
    auto res = client.Get("/numbers/5678", header);
    if (res && res->status == 200) {
        cout << res->body << endl;
    }

    // Result Post(const char *path, const MultipartFormDataItems &items);
    httplib::MultipartFormDataItems items;
    httplib::MultipartFormData item;
    item.name = "file";
    item.filename = "hi.txt";
    item.content = "hello nihao";
    item.content_type = "application/octet-stream";
    items.push_back(item);

    res = client.Post("/upload", items);
    return 0;
}


Project implementation

Cloud backup server implementation

Network communication module: uses httplib to build the HTTP client and server that carry the network communication
Business processing module: handles client requests (upload, download, display page)
Data management module: manages data in one place
Hotspot management module: finds the non-hot files among the backed-up files and compresses them for storage

If a file is a non-hot file, it is stored compressed. Whether or not a file is compressed, the display page must still list the information of every uploaded file. When a client wants to download a compressed file, the server finds the corresponding compressed package, decompresses it, and returns the original file data (not the compressed package)

Data management module

Data management module: manages data in one place
   What data needs to be managed: original file name, original file size, original file time attributes, corresponding compressed package name, compression flag
     Uploaded files must be viewable and downloadable in the browser, so the browser page has to show the files the client has backed up: original file name, file size, file backup time
     If a non-hot file has been compressed, the size and time read from disk are those of the compressed package, but the page should show the attributes of the original file, not of the compressed package
     Once a file is compressed, the compressed package is stored under the compressed package path and the original file is deleted
     When a compressed package is decompressed, the decompressed file is placed under the backup path and the compressed package is deleted
Now that the data to be managed is settled, the question is how to manage it.
   Data management has two parts:
     In-memory management, for more efficient queries: unordered_map (hash table)
     Persistent storage on disk, to prevent data loss on power failure: file storage (JSON serialization and deserialization); a database could also be used
   Data is kept in memory for query efficiency and on disk for safety.
Before completing the data management module:
   First finish some utility interfaces
     File operation utility class:
       File operations: get file attributes (time, size, ...), write data to a file, read data from a file
       Directory operations: create a directory, traverse a directory (get the information of all files in it)
     JSON operation utility class:
       Implements JSON serialization and deserialization
     Third utility: file compression and decompression (used by the hotspot management module)
       This is essentially a file operation, so to keep things simple it is implemented directly in the file utility class

File operation tool class

util.hpp

#ifndef __MY_UTIL__
#define __MY_UTIL__

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <time.h>
#include <sys/stat.h>
#include <experimental/filesystem>
#include <jsoncpp/json/json.h> // needed by the JSON utility class shown later
#include "bundle.h"            // needed by the compression functions shown later
namespace fs = std::experimental::filesystem;
using namespace std;
namespace cloud{
class FileUtil{
private:
    string _name;
public:
    FileUtil(const string &name): _name(name) {}
    // Whether the file exists
    bool Exists() {
        return fs::exists(_name);
    }
    // Get the file size
    size_t Size() {
        if (this->Exists() == false) {
            return 0;
        }
        return fs::file_size(_name);
    }
    // Last modification time
    time_t MTime() {
        if (this->Exists() == false) {
            return 0;
        }
        auto ftime = fs::last_write_time(_name);
        time_t cftime = decltype(ftime)::clock::to_time_t(ftime);
        return cftime;
    }
    // Last access time
    time_t ATime() {
        if (this->Exists() == false) {
            return 0;
        }
        struct stat st;
        if (stat(_name.c_str(), &st) != 0) {
            return 0;
        }
        return st.st_atime;
    }
    // Read all the data in the file into body
    bool Read(string *body) {
        if (this->Exists() == false) {
            return false;
        }
        ifstream ifs;
        ifs.open(_name, ios::binary); // open the file in binary mode
        if (ifs.is_open() == false) {
            cout << "open failed!\n";
            return false;
        }

        size_t fsize = this->Size();
        body->resize(fsize);
        // string::c_str() returns const char*, so take the address of the first element instead
        ifs.read(&(*body)[0], fsize);
        if (ifs.good() == false) {
            cout << "read file failed!\n";
            ifs.close();
            return false;
        }
        ifs.close();
        return true;
    }
    // Write the data in body to the file
    bool Writer(const string &body) {
        ofstream ofs;
        ofs.open(_name, ios::binary); // open the file in binary mode
        if (ofs.is_open() == false) {
            cout << "open failed!\n";
            return false;
        }
        ofs.write(body.c_str(), body.size());
        if (ofs.good() == false) {
            cout << "write file failed!\n";
            ofs.close();
            return false;
        }
        ofs.close();
        return true;
    }
    // Create the directory
    bool CreateDirectory() {
        if (this->Exists()) {
            return true;
        }
        fs::create_directories(_name);
        return true;
    }
    // Traverse the directory and get the path names of all files in it
    bool ScanDirectory(vector<string> *array) {
        if (this->Exists() == false) {
            return false;
        }
        // By default the directory iterator only traverses one level deep
        for (auto &a : fs::directory_iterator(_name)) {
            if (fs::is_directory(a) == true) {
                continue; // the current entry is a folder: skip it and move on to the next
            }
            // Only regular file information is collected; subdirectories are not descended into
            //string pathname = fs::path(a).filename().string(); // file name only
            string pathname = fs::path(a).relative_path().string(); // file name with path
            array->push_back(pathname);
        }
        return true;
    }
};
}

#endif

cloud.cpp

#include "util.hpp"

void FileUtilTest()
{
    //cloud::FileUtil("./testdir/adir").CreateDirectory();
    //cloud::FileUtil("./testdir/a.txt").Writer("hello bit\n");
    //string body;
    //cloud::FileUtil("./testdir/a.txt").Read(&body);
    //cout << body << endl;
    //cout << cloud::FileUtil("./testdir/a.txt").Size() << endl;
    //cout << cloud::FileUtil("./testdir/a.txt").MTime() << endl;
    //cout << cloud::FileUtil("./testdir/a.txt").ATime() << endl;
    vector<string> array;
    cloud::FileUtil("./testdir").ScanDirectory(&array);
    for (auto &a : array) {
        cout << a << endl;
    }
}

Creating a file, writing data to it, and reading the file size, modification time and access time all work without problems

JSON operation utility class

class JsonUtil{
public:
    // Serialization
    static bool Serialize(Json::Value &val, string *body) {
        Json::StreamWriterBuilder swb;
        Json::StreamWriter *sw = swb.newStreamWriter(); // new a StreamWriter object
        stringstream ss;
        int ret = sw->write(val, &ss); // perform the serialization
        if (ret != 0) {
            cout << "serialize failed!\n";
            delete sw;
            return false;
        }
        *body = ss.str();
        delete sw;
        return true;
    }
    // Deserialization
    static bool UnSerialize(const string &body, Json::Value *val) {
        Json::CharReaderBuilder crb;
        Json::CharReader *cr = crb.newCharReader();
        string err;
        // parse(start of the string, end of the string, Json::Value pointer, error message output)
        bool ret = cr->parse(body.c_str(), body.c_str() + body.size(), val, &err);
        if (ret == false) {
            cout << "UnSerialize failed:" << err << endl;
            delete cr;
            return false;
        }
        delete cr;
        return true;
    }
};
void JsonTest()
{
    Json::Value val;
    val["name"] = "Xiao Ming";
    val["gender"] = "male";
    val["age"] = 18;
    val["score"].append(77.5);
    val["score"].append(78.5);
    val["score"].append(79.5);

    string body;
    cloud::JsonUtil::Serialize(val, &body);
    cout << body << endl;

    Json::Value root;
    cloud::JsonUtil::UnSerialize(body, &root);
    cout << root["name"].asString() << endl;
    cout << root["gender"].asString() << endl;
    cout << root["age"].asInt() << endl;
    cout << root["score"][0].asFloat() << endl;
    cout << root["score"][1].asFloat() << endl;
    cout << root["score"][2].asFloat() << endl;
}

Serialization and deserialization both work without problems

File compression and decompression

    // Compress the file into packname
    bool Compress(const string &packname) {
        if (this->Exists() == false) {
            return false;
        }
        string body;
        if (this->Read(&body) == false) {
            cout << "Compress read failed!\n";
            return false;
        }
        string packed = bundle::pack(bundle::LZIP, body);
        if (FileUtil(packname).Writer(packed) == false) {
            cout << "Compress write pack data failed!\n";
            return false;
        }
        fs::remove_all(_name); // delete the original file after compression
        return true;
    }
    // Decompress the compressed package into filename
    bool UnCompress(const string &filename) {
        if (this->Exists() == false) {
            return false;
        }
        string body;
        if (this->Read(&body) == false) {
            cout << "UnCompress read pack data failed!\n";
            return false;
        }
        string unpack_data = bundle::unpack(body);
        if (FileUtil(filename).Writer(unpack_data) == false) {
            cout << "UnCompress write file data failed!\n";
            return false;
        }
        fs::remove_all(_name); // delete the compressed package after decompression
        return true;
    }

void CompressTest()
{
    cloud::FileUtil("./hello.txt").Compress("hello.zip");
    cloud::FileUtil("./hello.zip").UnCompress("bit.txt");
}



Implementation of the data management module:
   Approach:
    1. File information is kept in a structure
    2. The information of multiple files is organized and managed in a hash table
    3. Persistent storage is done with a file, whose storage format is JSON serialization


Data operations

Adding, deleting, modifying and querying data:
   Add: given a file name, the interface gathers the file's information, generates the compressed package name, fills in the structure, and inserts it into the hash table
   Modify: when a file is compressed for storage, the compression flag must be updated; after a download decompresses it, the compression flag must be updated again
   Query: query all backup information (needed by the front-end display page: file name, download link, size, backup time), or query the information of a single file (for file download: get the file's actual backup path)
   Delete: backup files are basically never deleted (this can be left as a later extension)
Header file and structure
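The add, modify and query operations above can be sketched on top of a hash table keyed by the download URL. This is a simplified illustration, not the project's class: the record type `Info` keeps only two fields, the class name `InfoTable` is hypothetical, and persistence to disk is left out.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Simplified record (hypothetical); the project's structure has more fields.
struct Info {
    std::string url_path;
    bool pack_flag = false;
};

class InfoTable {
public:
    // Add: insert a new record keyed by its download URL.
    void Insert(const Info &info) { table_[info.url_path] = info; }
    // Modify: overwrite the record, e.g. after the compression flag changes.
    void Update(const Info &info) { table_[info.url_path] = info; }
    // Query one record by URL; returns false if it is absent.
    bool GetOneByURL(const std::string &url, Info *out) const {
        auto it = table_.find(url);
        if (it == table_.end()) {
            return false;
        }
        *out = it->second;
        return true;
    }
    // Query all records.
    void SelectAll(std::vector<Info> *out) const {
        out->clear();
        for (const auto &kv : table_) {
            out->push_back(kv.second);
        }
    }
private:
    std::unordered_map<std::string, Info> table_;
};
```

Keying the table by URL fits the download path: the request carries the URL, so the record can be found with a single lookup. If the hotspot thread and the request handlers touch the table at the same time, some form of locking would also be needed, which this sketch omits.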

#ifndef __MY_DATA__
#define __MY_DATA__
#include "util.hpp"
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <jsoncpp/json/json.h>
using namespace std;

namespace cloud {
    typedef struct _FileInfo {
        string filename;  // file name
        string url_path;  // download link path
        string real_path; // actual storage path
        size_t file_size; // file size
        time_t back_time; // backup time
        bool pack_flag;   // compression flag
        string pack_path; // compressed package path name
    } FileInfo;

Data management class :
Private data members

class DataManager {
    
	private:
		string _back_dir = "./backup_dir/";// Actual storage path of the backup files
		string _pack_dir = "./pack_dir/";// Storage path of the compressed packages
		string _download_prefix = "/download/";// Prefix of the download link path
		string _pack_subfix = ".zip";// Suffix of the compressed package name
		string _back_info_file = "./backup.dat";// File that stores the backup information
		// The URL is used as the key because it is what a download request sends
		unordered_map<string, FileInfo> _back_info;//<url, fileinfo> backup information

Constructors

	public:
		DataManager() {
    
			FileUtil(_back_dir).CreateDirectory();
			FileUtil(_pack_dir).CreateDirectory();
			if (FileUtil(_back_info_file).Exists()) {
    
				InitLoad();
			}
		}

data storage

		bool Storage() {
    
			Json::Value infos;
			vector<FileInfo> arry;
			this->SelectAll(&arry);
			for (auto& a : arry) {
    
				Json::Value info;
				info["filename"] = a.filename;
				info["url_path"] = a.url_path;
				info["real_path"] = a.real_path;
				info["file_size"] = (Json::UInt64)a.file_size;
				info["back_time"] = (Json::UInt64)a.back_time;
				info["pack_flag"] = a.pack_flag;
				info["pack_path"] = a.pack_path;
				infos.append(info);
			}
			string body;
			JsonUtil::Serialize(infos, &body);
			if (FileUtil(_back_info_file).Writer(body) == false) {
				cout << "storage write backup info failed!\n";
				return false;
			}
			return true;
		}

Load the historical information into the hash table when the module starts

		bool InitLoad() {
			//1. Read the historical backup information from the file
			string body;
			bool ret = FileUtil(_back_info_file).Read(&body);
			if (ret == false) {
				cout << "load history failed!\n";
				return false;
			}
			//2. Deserialize the data that was read
			Json::Value infos;
			ret = JsonUtil::UnSerialize(body, &infos);
			if (ret == false) {
				cout << "initload parse history failed!\n";
				return false;
			}
			//3. Store the deserialized records in the hash table
			int sz = infos.size();
			for (int i = 0; i < sz; ++i) {
				FileInfo info;
				info.filename = infos[i]["filename"].asString();
				info.url_path = infos[i]["url_path"].asString();
				info.real_path = infos[i]["real_path"].asString();
				info.file_size = infos[i]["file_size"].asUInt64();// Written as UInt64 in Storage()
				info.back_time = infos[i]["back_time"].asInt64();
				info.pack_flag = infos[i]["pack_flag"].asBool();
				info.pack_path = infos[i]["pack_path"].asString();
				_back_info[info.url_path] = info;
			}
			return true;
		}

Create, read, update, and delete
Create: given a file name, collect the file's attributes inside the interface, generate the compressed package name, fill in the struct, and insert it into the hash table

		bool Insert(const string& pathname) {
    
			if (cloud::FileUtil(pathname).Exists() == false) {
    
				cout << "insert file is not exists!\n";
				return false;
			}
			// pathname = ./backup_dir/a.txt
			FileInfo info;
			info.filename = cloud::FileUtil(pathname).Name();//a.txt
			info.url_path = _download_prefix + info.filename;// /download/a.txt, resource path of the download link
			info.real_path = pathname; // Actual storage path ./backup_dir/a.txt
			info.file_size = cloud::FileUtil(pathname).Size(); // File size
			info.back_time = cloud::FileUtil(pathname).MTime();// The last modification time is used as the backup time
			info.pack_flag = false;// A file that has just been uploaded is uncompressed
			info.pack_path = _pack_dir + info.filename + _pack_subfix;// Path name of the compressed package ./pack_dir/a.txt.zip
			_back_info[info.url_path] = info;// Add to the map with url_path as key and info as value
			Storage();
			return true;
		}

Update: when a file is compressed for storage, update its compression flag; after it is decompressed for download, update the flag again

		bool UpdateStatus(const string& pathname, bool status) {
    
			string url_path = _download_prefix + FileUtil(pathname).Name();
			auto it = _back_info.find(url_path);
			if (it == _back_info.end()) {
    
				cout << "file info is not exists!\n";
				return false;
			}
			it->second.pack_flag = status;
			Storage();// Persist the change, as Insert and DeleteOne do, so the flag survives a restart
			return true;
		}

Read: query all backup information (the front-end page needs the file name, download link, size, and backup time),
or query a single file's information (for a download, to get the file's actual backup path)

		bool SelectAll(vector<FileInfo>* infos) {
    
			for (auto it = _back_info.begin(); it != _back_info.end(); ++it) {
    
				infos->push_back(it->second);
			}
			return true;
		}
		bool SelectOne(const string& url_path, FileInfo* info) {
    
			auto it = _back_info.find(url_path);
			if (it == _back_info.end()) {
    
				cout << "file info is not exists!\n";
				return false;
			}
			*info = it->second;
			return true;
		}

Delete: backup files are generally not deleted (provided only for future extension)

		bool DeleteOne(const string& url_path) {
    
			auto it = _back_info.find(url_path);
			if (it == _back_info.end()) {
    
				cout << "file info is not exists!\n";
				return false;
			}
			_back_info.erase(it);
			Storage();
			return true;
		}
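
The hotspot management module calls `_data->SelectOneByRealpath(file, &info)`, but no such method appears in the DataManager listing above. Because the hash table is keyed by URL path, a lookup by real path most likely has to scan every record. The following is only a minimal sketch of that idea, not the author's implementation: the trimmed-down `FileInfo`, the `BackupTable` alias, and the free-function form are all assumptions made to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Trimmed-down stand-in for cloud::FileInfo (only the fields this lookup needs).
struct FileInfo {
    std::string url_path;   // key in the table, e.g. /download/a.txt
    std::string real_path;  // e.g. ./backup_dir/a.txt
    bool pack_flag;
};

// Hypothetical alias for the <url, fileinfo> table held by DataManager.
using BackupTable = std::unordered_map<std::string, FileInfo>;

// Hypothetical counterpart of DataManager::SelectOneByRealpath.
// The table is keyed by url_path, so finding a real_path needs a linear scan.
bool SelectOneByRealpath(const BackupTable& table,
                         const std::string& real_path,
                         FileInfo* info) {
    assert(info != nullptr);
    for (const auto& kv : table) {
        if (kv.second.real_path == real_path) {
            *info = kv.second;
            return true;
        }
    }
    return false;  // no backup record for this path
}
```

A linear scan is acceptable for a small number of backups. With many files, a second map keyed by real path would avoid scanning, at the cost of keeping two indexes in sync.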

Hotspot management module

Non-hot files are compressed for storage to save disk space on the server.
   Function: check the last access time of every file in the backup directory and compare it with the current system time. If the gap exceeds the hot-spot judgment time (30 seconds without access), the file is non-hot and needs to be compressed for storage. The original file is deleted after compression, and once compression succeeds, the backup information is updated through the data management object to mark the file as compressed.
   Implementation:
    1. Traverse the specified directory (the backup directory, where the original files are stored) and get the actual path names of all files in it
    2. For each path name, get the file's time attribute (the last access time)
    3. Get the current system time, subtract the file's last access time, and compare the difference with the hot-spot judgment time
    4. If the difference exceeds the hot-spot judgment time, the file is non-hot and is compressed for storage
    5. After compression, update the backup information
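
Steps 2 to 4 can be sketched as two small functions. This is an illustration under assumptions rather than the project's code: `IsHotAt` and `LastAccessTime` are hypothetical names, and reading the access time through POSIX `stat()` is an assumption about what a helper such as `FileUtil::ATime` does.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <sys/stat.h>

// Pure decision: a file stays hot while the time since its last access
// does not exceed hot_time seconds.
bool IsHotAt(time_t last_access, time_t now, time_t hot_time) {
    return (now - last_access) <= hot_time;
}

// Read the last access time with POSIX stat().
// Returns false if the file cannot be examined.
bool LastAccessTime(const std::string& path, time_t* atime) {
    assert(atime != nullptr);
    struct stat st;
    if (stat(path.c_str(), &st) != 0) {
        return false;
    }
    *atime = st.st_atime;
    return true;
}
```

For example, with a 30-second threshold, a file last accessed at second 100 is still hot at second 130 and becomes non-hot at second 131. One caveat worth checking on the target server: file systems mounted with `relatime` or `noatime` do not refresh the access time on every read, so a check based on access time can misjudge a file that is still being read.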

#ifndef __MY_HOT__ 
#define __MY_HOT__ 
#include "data.hpp" 
#include <unistd.h> 

extern cloud::DataManager* _data;// Global data  
using namespace std;
namespace cloud {
    
	class HotManager {
    
	private:
		time_t _hot_time = 30; // Hot-spot judgment duration; should be configurable, but defaults to 30s for simplicity
		string _backup_dir = "./backup_dir/"; // Storage path of the original files to check; must match the directory used by DataManager and Server
	public:
		HotManager() {
    
			FileUtil(_backup_dir).CreateDirectory();
		}
		bool IsHot(const string& filename) {
    
			time_t atime = FileUtil(filename).ATime();
			time_t ctime = time(NULL);
			if ((ctime - atime) > _hot_time) {
    
				return false;
			}
			return true;
		}
		bool RunModule() {
    
			while (1) {
    
				//1. Scan the backup directory
				vector<string> arry;
				FileUtil(_backup_dir).ScanDirectory(&arry);
				//2. Go through every file that was found
				for (auto& file : arry) {
					//3. Get the file's time attribute and compare it with the current system time to judge whether it is hot
					if (IsHot(file) == true) {
						continue;// Hot files are left alone for now
					}
					// Look up the backup record of the current file
					FileInfo info;
					bool ret = _data->SelectOneByRealpath(file, &info);
					if (ret == false) {
						// No backup record exists for this file; it was probably left behind by an abnormal upload, so delete it
						cout << "An abnormal file was detected. Delete it!\n";
						FileUtil(file).Remove();
						continue;// Move on to the next file
						// Alternative: add a record for the file, then compress and store it
						//_data->Insert(file);
						//_data->SelectOneByRealpath(file, &info);
					}
					//4. Compress and store the non-hot file
					FileUtil(file).Compress(info.pack_path);
					//5. Update the backup record after compression
					info.pack_flag = true;
					_data->UpdateStatus(file, true);
				}
				usleep(1000);// Pause between scans so that scanning an empty directory does not burn CPU
			}
			return true;
		}
	};
}

#endif

Network communication module & Business processing module

   The network communication module builds the HTTP server with the httplib library, so our attention goes mainly to business processing.
   Business processing: file upload, the listing page request, and file download.

#ifndef __MY_SERVER__ 
#define __MY_SERVER__
#include "data.hpp"
#include "httplib.h"
#include <sstream>
using namespace std;

extern cloud::DataManager* _data;
namespace cloud {
    
	class Server {
    
	private:
		int _srv_port = 9090;// The binding listening port of the server 
		string _url_prefix = "/download/";
		string _backup_dir = "./backup_dir/";// Backup storage path of uploaded files 
		httplib::Server _srv;
	private:
		static void Upload(const httplib::Request& req, httplib::Response& rsp) {
    
			string _backup_dir = "./backup_dir/";// Backup storage path of uploaded files
			// Check that the request contains a form field named "file"
			if (req.has_file("file") == false) {
				cout << "Upload file data format error!\n";
				rsp.status = 400;
				return;
			}
			// Get the parsed form field data
			httplib::MultipartFormData data = req.get_file_value("file");
			//cout << data.filename << endl;// For a file upload, the name of the uploaded file
			//cout << data.content << endl;// Field body; for a file upload, the content of the file
			// Build the actual storage path name of the file
			string  realpath = _backup_dir + data.filename;
			// Write the data to the file, which is the actual backup step
			if (FileUtil(realpath).Writer(data.content) == false) {
    
				cout << "back file failed!\n";
				rsp.status = 500;
				return;
			}
			// Add backup information 
			if (_data->Insert(realpath) == false) {
    
				cout << "insert back info failed!\n";
				rsp.status = 500;
				return;
			}
			rsp.status = 200;
			return;
		}
		static string StrTime(time_t t) {
    
			return asctime(localtime(&t));
		}
		static void List(const httplib::Request& req, httplib::Response& rsp) {
    
			// Get all historical backup information , And according to this information, organize a html page , As a response body 
			vector<FileInfo> arry;
			if (_data->SelectAll(&arry) == false) {
    
				cout << "select all back info failed!\n";
				rsp.status = 500;
				return;
			}
			stringstream ss;
			ss << "<html>";
			ss << "<head>";
			ss << "<meta http-equiv='Content-Type' content='text/html;charset=utf-8'>";
			ss << "<title>Download</title>";
			ss << "</head>";
			ss << "<body>";
			ss << "<h1>Download</h1>";
			ss << "<table>";
			for (auto& a : arry) {
    
				// Organize page tags for each row 
				ss << "<tr>";
				//<td><a href="/download/test.txt">test.txt</a></td>
				ss << "<td><a href='" << a.url_path << "'>" << a.filename << "</a></td>";
				//<td align="right"> 2021-12-29 10:10:10 </td>
				ss << "<td align='right'>" << StrTime(a.back_time) << "</td>";
				ss << "<td align='right'>" << a.file_size / 1024 << " KB </td>";
				ss << "</tr>";
			}
			ss << "</table>";
			ss << "</body>";
			ss << "</html>";
			rsp.set_content(ss.str(), "text/html");
			rsp.status = 200;
			return;
		}
		static string StrETag(const string& filename) {
    
			// The ETag is a unique identifier of a file that changes whenever the file is modified
			// Here the ETag is not computed from the content; it is built as: file size - last modification time
			time_t mtime = FileUtil(filename).MTime();
			size_t fsize = FileUtil(filename).Size();
			stringstream ss;
			ss << fsize << "-" << mtime;
			return ss.str();
		}
		static void Download(const httplib::Request& req, httplib::Response& rsp) {
    
			FileInfo info;
			if (_data->SelectOne(req.path, &info) == false) {
    
				cout << "select one back info failed!\n";
				rsp.status = 404;
				return;
			}
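			// If the file has been compressed, decompress it first, then read the data from the original file
			if (info.pack_flag == true) {
				FileUtil(info.pack_path).UnCompress(info.real_path);
				_data->UpdateStatus(info.real_path, false);// UnCompress deletes the package, so clear the compression flag
			}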
			if (req.has_header("If-Range")) {
    
				//using Range = std::pair<ssize_t, ssize_t>;
				//using Ranges = std::vector<Range>;
				string old_etag = req.get_header_value("If-Range");
				string cur_etag = StrETag(info.real_path);
				if (old_etag == cur_etag) {
					// The file has not changed, so the download can continue from the breakpoint.
					// If we handled this ourselves, we would parse the Range header to get the start and end positions,
					// but httplib has already parsed it for us:
					//size_t start = req.ranges[0].first;
					//size_t end = req.ranges[0].second;// If the request gives no end value, httplib sets second to -1, meaning the end of the file
					// The data to send starts at offset start and is end - start + 1 bytes long; if end is -1, the length is file size - start.
					// For a 1000-byte file, a request for 900-999 returns the 100 bytes from 900 through 999 inclusive;
					// a request for 900- ends at byte 999, so the length is simply 1000 - 900.
					//
					// httplib already implements breakpoint continuation: we only need to put the whole file into the body
					// and set the status code to 206. When httplib sees status 206, it extracts the requested range from the body for the response.
					FileUtil(info.real_path).Read(&rsp.body);
					rsp.set_header("Content-Type", "application/octet-stream");// Set the body type to binary stream 
					rsp.set_header("Accept-Ranges", "bytes");// Tell the client that I support breakpoint resuming 
					//rsp.set_header("Content-Range", "bytes start-end/fsize");//httplib It's automatically set 
					rsp.set_header("ETag", cur_etag);
					rsp.status = 206;
					return;
				}
			}
			FileUtil(info.real_path).Read(&rsp.body);
			rsp.set_header("Content-Type", "application/octet-stream");// Set the body type to binary stream 
			rsp.set_header("Accept-Ranges", "bytes");// Tell the client that I support breakpoint resuming 
			rsp.set_header("ETag", StrETag(info.real_path));
			rsp.status = 200;
			return;
		}
	public:
		Server() {
    
			FileUtil(_backup_dir).CreateDirectory();

		}
		bool RunModule() {
    
			// Build the HTTP server
			// Map each request to its handler function
			//Post(requested resource path, business-processing callback);
			_srv.Post("/upload", Upload);
			_srv.Get("/list", List);// Request for the listing page
			string regex_download_path = _url_prefix + "(.*)";
			_srv.Get(regex_download_path, Download);
			// Start the server
			_srv.listen("0.0.0.0", _srv_port);
			return true;
		}
	};
}
#endif

Breakpoint continuation

If a download is interrupted halfway because of a network problem or some other reason, downloading all the file data again is inefficient, because the data already transferred does not need to be sent again. If the next download simply continues from where the last one stopped, efficiency improves.
Idea: one side must record how far the transfer got, that is, the position the download reached. Recording it on the server is not appropriate, because every request is initiated by the client. It should therefore be recorded by the client: whoever needs the data keeps the record.
   During a download, the client records the length and position of the data it has received. If the download is interrupted, the next request tells the server which range of the file is still needed, and the server returns only the data in that range.
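
To make the range arithmetic concrete, here is a minimal sketch of turning a `Range` header value into an offset and a length. It is not how httplib does it internally; `ByteRange`, `ParseNumber`, and `ResolveRange` are hypothetical names. The sketch handles only a single range of the form `bytes=start-end` or `bytes=start-` and ignores overflow on absurdly long numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ByteRange {
    std::size_t offset;  // first byte to send
    std::size_t length;  // number of bytes to send
};

// Parse a non-empty string of decimal digits; false on anything else.
bool ParseNumber(const std::string& s, std::size_t* value) {
    if (s.empty()) return false;
    std::size_t v = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        v = v * 10 + static_cast<std::size_t>(c - '0');
    }
    *value = v;
    return true;
}

// Resolve "bytes=900-999" or "bytes=900-" against a file of file_size bytes.
// Returns false for suffix ranges, multiple ranges, or a start past the end.
bool ResolveRange(const std::string& header, std::size_t file_size, ByteRange* out) {
    assert(out != nullptr);
    const std::string unit = "bytes=";
    if (header.compare(0, unit.size(), unit) != 0) return false;
    std::size_t dash = header.find('-', unit.size());
    if (dash == std::string::npos || dash == unit.size()) return false;
    std::string first = header.substr(unit.size(), dash - unit.size());
    std::string last = header.substr(dash + 1);

    std::size_t start = 0;
    if (!ParseNumber(first, &start) || start >= file_size) return false;
    std::size_t end = file_size - 1;  // an open-ended range runs to the last byte
    if (!last.empty()) {
        if (!ParseNumber(last, &end) || end < start) return false;
        if (end >= file_size) end = file_size - 1;  // clamp to the file
    }
    out->offset = start;
    out->length = end - start + 1;  // both ends are inclusive
    return true;
}
```

For a 1000-byte file, both `bytes=900-999` and `bytes=900-` resolve to offset 900 and length 100, because the end position is inclusive and the last byte of the file is byte 999.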
   One problem remains: how can we be sure that the file downloaded next time is the same as the file whose download is being continued? If the file on the server has changed in the meantime, continuing the download is meaningless. So there must be an identifier that tells whether the file on the server has been modified since the last download. If it has not been modified, the download can continue from the breakpoint; if it has, the file must be downloaded again from the beginning.
Breakpoint continuation is therefore tied to the previous download and needs information from it, such as the file's unique identifier.
Whether a server supports breakpoint continuation should be announced to the client on the first download.
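
The decision described above can be sketched as a pure function. This is an illustration under assumptions, not the project's code: `MakeETag` mirrors the "size-mtime" identifier used by the server's `StrETag` helper, and `DownloadStatus` is a hypothetical name. The `has_range` parameter reflects that, under the HTTP range rules, `If-Range` only matters when the request also carries a `Range` header.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <sstream>
#include <string>

// Build an identifier of the form "<size>-<mtime>", which changes when the file does.
std::string MakeETag(std::size_t size, time_t mtime) {
    std::ostringstream ss;
    ss << size << "-" << mtime;
    return ss.str();
}

// Choose the response status for a download request.
// if_range: value of the If-Range header, or empty if the client sent none.
// has_range: whether the request carries a Range header.
int DownloadStatus(const std::string& if_range, bool has_range,
                   const std::string& current_etag) {
    assert(!current_etag.empty());
    if (has_range && !if_range.empty() && if_range == current_etag) {
        return 206;  // file unchanged: send only the requested range
    }
    return 200;      // first download, or file changed: send the whole file
}
```

On the first download the server answers 200 and includes `Accept-Ranges: bytes` and the `ETag`. The client stores the identifier and sends it back in `If-Range` together with `Range` when it resumes.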

Client implementation

Environment: Windows 10 / VS2017 or later
   Our file utility class uses the C++17 filesystem library, which older versions of VS do not support well.
Function: detect which files in the specified folder on the client host need to be backed up, then upload them to the server for backup.
Approach:
  1. Directory scan: traverse the specified folder and get the path names of all files.
  2. Get each file's attributes and use them to decide whether the file needs to be backed up.
     A file needs to be backed up if: 1. it is a new file; 2. it has been modified since the last upload.
     Each backup records the file's path name and its unique identifier, which changes whenever the file is modified. If no historical backup information is found for a path name, the file is new and needs to be backed up. If historical backup information is found but the recorded identifier differs from the current one, the file has been modified since the upload and needs to be backed up.
     A special case: a file that is being modified continuously would be backed up on every scan. The thorough solution is to back up a file only when no other process has it open; the simple solution used here is to back up a file only when it has not been modified for a specified interval.
  3. If a file needs to be backed up, create an HTTP client and upload the file.
  4. After the upload, add the historical backup information for the file.
Module partition:
  1. Data management module: manages the client's historical backup records
  2. Directory retrieval module: gets all files in the specified folder
  3. Network communication module: uploads the files that need backup to the server
#pragma once
#include "data.hpp"
#include "httplib.h"

namespace cloud {
    
	class Client {
    
	private:
		string _backup_dir = "./backup_dir";
		DataManager* _data = NULL;
		string _srv_ip;
		int _srv_port;
	private:
		string FileEtag(const string& pathname) {
    
			size_t fsize = FileUtil(pathname).Size();
			time_t mtime = FileUtil(pathname).MTime();
			stringstream ss;
			ss << fsize << "-" << mtime;
			return ss.str();
		}
		bool IsNeedBackup(const string& filename) {
    
			// A backup is needed when: 1. there is no historical information; 2. there is historical information, but the file has been modified (the identifiers differ)
			string old_etag;
			if (_data->SelectOne(filename, &old_etag) == false) {
    
				return true;// No historical backup information 
			}
			string new_etag = FileEtag(filename);
			time_t mtime = FileUtil(filename).MTime();
			time_t ctime = time(NULL);
			// A file may be under continuous modification, so also require that the last modification is more than 3 seconds old
			if (new_etag != old_etag && (ctime - mtime) > 3) {
				return true;// The current identifier differs from the recorded one, so the file has been modified
			}
			return false;
		}
		bool Upload(const string& filename) {
    
			httplib::Client client(_srv_ip, _srv_port);
			httplib::MultipartFormDataItems items;
			httplib::MultipartFormData item;
			item.name = "file";// Name of the form field
			item.filename = FileUtil(filename).Name();// File name
			FileUtil(filename).Read(&item.content); // File data
			item.content_type = "application/octet-stream";// Data format of the file: binary stream
			items.push_back(item);
			auto res = client.Post("/upload", items);
			if (!res || res->status != 200) {
				// A null result means the request itself failed, for example because the server is unreachable
				return false;
			}
			return true;
		}
	public:
		Client(const string& srv_ip, int srv_port) :_srv_ip(srv_ip), _srv_port(srv_port) {
    }
		bool RunModule() {
    
			//1. Initialization: create the monitored directory and the data management object
			FileUtil(_backup_dir).MCreateDirectory();
			_data = new DataManager();
			while (1) {
				//2. Scan the directory and get all the files in it
				vector<string> arry;
				FileUtil(_backup_dir).ScanDirectory(&arry);
				//3. Use the historical backup information to decide whether each file needs to be backed up
				for (auto& a : arry) {
					if (IsNeedBackup(a) == false) {
						continue;
					}
					cout << a << " need backup!\n";
					//4. Upload the file for backup
					bool ret = Upload(a);
					if (ret == false) {
						cout << a << " backup failed!\n";
						continue;// No record is added, so the upload is retried on the next scan
					}
					//5. Add the backup information
					_data->Insert(a);
					cout << a << " backup success!\n";
				}
				Sleep(10);
			}
			return true;
		}
	};
}
Copyright notice
This article was written by [Li Hanhan_]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/188/202207070321034512.html