
Project training (XVII) -- personal work summary

2022-06-13 01:25:00 A May*

Personal workload record:

Algorithms:

Face recognition algorithm: after key-frame processing, I implemented recognition of the target people in a video (a single person or several), counting how many times each target appears, and recording the times and positions at which the target person appears in the original video.

Scene matching algorithm: users can select clips of a person, or clips of a person within a particular scene. For the scene features I use a CNN (convolutional neural network) to extract image features, then use cosine similarity to measure how close the mean scene feature vector is to the picture's feature vector.
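As a minimal sketch of that matching step (the 4-D vectors and variable names below are made up for illustration; the real features come from a CNN and are much higher-dimensional):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical CNN features for several frames of one scene, plus one query photo
frame_features = np.array([[1.0, 0.0, 2.0, 0.0],
                           [0.8, 0.1, 1.9, 0.0],
                           [1.2, 0.0, 2.1, 0.1]])
scene_mean = frame_features.mean(axis=0)         # the "scene feature average vector"
photo_feature = np.array([1.0, 0.0, 2.0, 0.05])  # feature vector of the picture

similarity = cosine_similarity(scene_mean, photo_feature)
```

The scene whose mean vector has the highest similarity to the query photo is taken as the match.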

Face/scene matching: given a specific scene and a specific person, retrieve the matching video and return it to the front end.

Engineering:

All of the back-end code (except the part where Java calls the SSH interface).

This included joint debugging and testing of the whole project and fixing bugs until it finally ran smoothly (tested together with Liwenwen).


Personal summary

I really gained a lot from this project training. I independently wrote the back-end code using the Spring Boot framework, successfully connected it with the front end, and brought the various features to life.

I also worked on the algorithms. Face recognition itself is not hard to do; there are many mature techniques today. The difficulty lies in organizing, with appropriate data structures, the relationship between recognized faces and single-person or multi-person targets. I spent roughly equal time on algorithm development and on the back end. The face recognition algorithm went through many iterations: from the original design, to single-person clips plus a multi-person timeline preview, and finally to single-person and multi-person clips with a scene preview. The output structure of the face recognition algorithm also went through many iterations; in the end we produced a data structure that is convenient for the front end to process. The front end only needs to AND together the three flags to get the target video. This data structure is very important; it is almost the core foundation of the whole project.

Near the end, the teacher gave us a new idea: add a scene constraint on top of faces, that is, find video clips of a person within a target scene. With the practice from the earlier work, Liwenwen and I worked together and finished it quickly. First extract the mean feature vector over the frames of each short clip, then extract the feature vector of the scene photo, and find the matching clips via cosine similarity. Trained by a whole semester of this project, we finished the entire pipeline in about one day, which would have been unthinkable before. Project training really helped us grow.

I feel my programming ability has improved greatly, and I am very happy about it. I also met three reliable teammates; we solved problems and debugged bugs together, and when the final project worked I almost cried with joy. Looking at this code feels like looking at my own child: it really was written by hand, bit by bit. Many thanks to Sunyifan, Liu Kai, and Liwenwen for their help in finishing the project successfully.

Because the code involves deep-learning feature extraction, every run takes a lot of time and effort. (Remembering the late-night debugging sessions, one after another...)


Teamwork

Throughout the project training, apart from the initial stage (around April), when we sent over a face recognition code to be adapted into something the project could use, one teammate was not involved in any later development. Throughout the project, Sunyifan, Wang Yuequn, and Li Wenwen put in a great deal of effort and helped each other; they were often still doing joint debugging at two or three in the morning. Overcoming the bugs and finally producing a good finished product is very gratifying. Liu Kai also cooperated actively and always solved code problems proactively and efficiently. In short, I feel I made great progress in this project training and gained a lot. All three teammates are excellent, and working with them made me happy.


The end result of the project

 

 

Back end

Project process

Running through the whole project flow

Uploading large video files

When uploading large files we usually upload in chunks, so if the upload is interrupted, the next attempt does not need to re-upload everything; it only continues with the chunks that have not been uploaded yet. This also makes instant re-upload of already-uploaded files possible.


The principle is to split the file into multiple small chunks on the client side, upload these chunks to the server one by one, and have the server merge all the chunks back into the original file once it has received them all. How does the server know that the merged file is exactly the same as the file the client uploaded? This is where the file's MD5 value comes in. A file's MD5 value is like its "digital fingerprint": only when the contents of two files are exactly identical will their MD5 values be the same. So before uploading, the client first computes the file's MD5 value and passes it to the server, and the server uses the MD5 to uniquely identify the file.
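A minimal sketch of the whole idea (in Python for brevity, even though the backend here is Java; the chunk size and names are invented for the example):

```python
import hashlib

CHUNK_SIZE = 5  # bytes; tiny for the demo, real uploads use megabyte-sized chunks

def split_into_chunks(data: bytes):
    """Client side: split the file content into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def merge_and_verify(chunks, expected_md5: str) -> bytes:
    """Server side: merge the received chunks, then compare MD5 'fingerprints'."""
    merged = b"".join(chunks)
    if hashlib.md5(merged).hexdigest() != expected_md5:
        raise ValueError("MD5 mismatch: a chunk was lost or corrupted")
    return merged

data = b"some large video file content"
md5 = hashlib.md5(data).hexdigest()  # client computes the MD5 before uploading
restored = merge_and_verify(split_into_chunks(data), md5)
assert restored == data  # the merged file is byte-identical to the original
```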

The key points:

MD5 cache

An entry is added when the first chunk is received and removed after all chunks have arrived.

private static final ConcurrentMap<String, File> MD5_CACHE
    = new ConcurrentHashMap<String, File>();



Has the target file been created yet? If an entry exists, the video for this MD5 is still being transferred; otherwise this is the first upload.

File targetFile = MD5_CACHE.get(md5);

if (targetFile == null) {
    // first upload for this MD5: create a new target file
    targetFile = new File(DATA_DIR, id + "_" + name);
    targetFile.getParentFile().mkdirs();
    MD5_CACHE.put(md5, targetFile);
}
Remove the entry from MD5_CACHE once the transfer is complete:

if (finished) {
    System.out.println("success.");
    MD5_CACHE.remove(md5);
}

Each chunk is written with RandomAccessFile, which can read and write at any position in the file:

RandomAccessFile accessFile = new RandomAccessFile(targetFile, "rw");

seek(long a) positions the file pointer within the file; the parameter a is the number of bytes from the start of the file to the read/write position.
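The same offset-based chunk writing can be sketched in Python (the real backend uses Java's RandomAccessFile; the file name and chunk size below are invented for illustration):

```python
import os
import tempfile

CHUNK_SIZE = 4  # bytes; tiny for illustration

def write_chunk(path, index, chunk):
    """Write one received chunk at its offset, like seek() + write on a RandomAccessFile."""
    if not os.path.exists(path):
        open(path, "wb").close()  # "r+b" mode requires an existing file
    with open(path, "r+b") as f:
        f.seek(index * CHUNK_SIZE)  # bytes from the start of the file, like seek(long a)
        f.write(chunk)

path = os.path.join(tempfile.mkdtemp(), "upload.bin")
# chunks can arrive out of order; seek puts each one in the right place
write_chunk(path, 1, b"WXYZ")
write_chunk(path, 0, b"abcd")
with open(path, "rb") as f:
    merged = f.read()
```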

After the server merges the chunks, it computes the MD5 of the merged file and compares it with the value passed by the client. If they match, the upload succeeded; if not, packets were probably lost during uploading and the upload failed.

Smart clipping:

Upload video: chunked upload + build the corresponding data structure on the server

Upload face photos: chunked upload + save on the server

Upload scene photos: chunked upload + save on the server

Call the intelligent clipping algorithm:

      Execution flow:

            Upload the video to the server + call the algorithm + generate the Tag flag afterwards

            Face recognition test run:

(base) [email protected]:/opt/data/private/xuyunyang/2022419/SceneSeg/lgss# python app_Test.py

            Scene constraint run:

(base) [email protected]:/opt/data/private/xuyunyang/2022419/SceneFeatureExtract# python scene_gennerate.py --ID 10 --ID_VideoName  10_demoChangan2

Smart preview

        Mainly scene clustering of the video after the key frames have been generated.

3. File organization

4. Result generation flag

5. Asynchronous notification that the algorithm has completed

6. Organization of the smart preview results

7. Login and registration

The database uses an auto-increment id; the main purpose is to return the id after login, which is needed to create the per-user staging files.

8. Face recognition

Face feature extraction

The features are extracted with a ResNet (deep residual network) trained with a triplet loss function; 128 features are just enough to represent a face.

In essence, face recognition maps an image into a low-dimensional space in which different faces are well separated; nowadays this mapping function is usually obtained by training a deep network. The 128-D feature vector lives in that mapped space: 128 is its dimensionality.
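The triplet loss mentioned above can be written down in a few lines (a toy NumPy version with 2-D points for readability, not the actual training code used in the project):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push d(anchor, positive) below d(anchor, negative) by at least `margin`."""
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return max(0.0, float(d_pos - d_neg + margin))

a = np.array([0.0, 0.0])  # anchor embedding
p = np.array([0.1, 0.0])  # same identity: close to the anchor
n = np.array([1.0, 0.0])  # different identity: far from the anchor
loss = triplet_loss(a, p, n)  # 0.0 -- this triplet already satisfies the margin
```

Training minimizes this loss over many such triplets, which is what makes faces of the same person cluster together in the 128-D space.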

Calculate the Euclidean distance of facial features

# compute the Euclidean distance between two 128D descriptors
import numpy as np

def compute_dst(feature_1, feature_2):
    feature_1 = np.array(feature_1)
    feature_2 = np.array(feature_2)
    dist = np.linalg.norm(feature_1 - feature_2)
    return dist
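To decide whether two descriptors belong to the same person, this distance is compared against a threshold. The 0.6 below is a value commonly used with dlib-style 128D descriptors; it is an assumption here, since the write-up does not state the threshold actually used:

```python
import numpy as np

MATCH_THRESHOLD = 0.6  # common choice for dlib 128D descriptors; an assumption here

def same_person(feature_1, feature_2, threshold=MATCH_THRESHOLD):
    """Same face if the Euclidean distance between descriptors is below the threshold."""
    dist = np.linalg.norm(np.array(feature_1) - np.array(feature_2))
    return bool(dist < threshold)

# toy 128D descriptors: identical vectors match, a distant one does not
a = np.zeros(128)
b = np.zeros(128)
c = np.ones(128)  # distance to `a` is sqrt(128), about 11.3
```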

Face detection

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # convert to grayscale for detection
dets = detector(gray, 1)  # detect the faces in the frame image
for det in dets:
    # get the facial landmarks for this detection
    shape = predictor(gray, det)
    # draw the bounding box around the face
    cv2.rectangle(frame, (det.left(), det.top()), (det.right(), det.bottom()), (0, 255, 0), 2)
    # 128D feature vector of this person in the video
    face_descriptor = facerec.compute_face_descriptor(frame, shape)
    v = np.array(face_descriptor)
# save to csv
l = [name]

# result is the output directory path
with open(result + name2 + '_' + 'gif_count.csv', 'a+') as f:
    f_csv = csv.writer(f)
    for i in dict:
        # 1 if person i appears in this clip, 0 otherwise
        if len(dict[i]) > 0:
            l.append(1)
        else:
            l.append(0)
    f_csv.writerow(l)

Character relationships :

dict = {}  # person index -> list of appearances
for i in range(Flen):  # initialize an empty list for each person
    dict[i] = []

9. Joint debugging and testing, bug fixing (the most painful process)

After finishing my own part of the code, I started writing the interfaces, the call logic, and the files that mark progress through the pipeline. This is also where most of the time went; there was always some inexplicable bug...

Typical bugs:

Frequent abnormal server connections

The loop waiting for results started before the algorithm was actually called

The Tag file was generated when the process exited, but the algorithm had not actually finished correctly, so the dictionary did not exist

Insufficient server resources: the key-frame algorithm ran out of memory

........


Copyright notice
This article was written by [A May*]; when reposting, please include a link to the original. Thanks.
https://yzsam.com/2022/164/202206130119192662.html