Decoding the AI behind sports: figure skating action recognition, multimodal video classification, and highlight clip editing

2022-07-02 02:04:00 【Paddlepaddle】

Recently, all the major video platforms have been posting real-time highlights of the Winter Olympic Games. Gu Ailing, Wu Dajing, Su Yiming, and other athletes have achieved outstanding results. Congratulations to them! While being moved and delighted by the strength of Chinese sports, we have also been looking at some of the industrial AI applications behind sports. For example, action recognition technology can assist athletes in daily training and support competition scoring, and AI-based intelligent classification and automatic editing can greatly reduce the labor and time cost of processing sports video content.

To help everyone learn more about how these AI technologies are applied in industry, and to lower the barrier to deploying AI, Baidu PaddlePaddle, Baidu AI Cloud, and Associate Professor Liu Shenglan of Dalian University of Technology have jointly released a set of industrial practice examples. They cover three classic scenarios: figure skating action recognition, multimodal sports video classification, and football video clipping. Each comes with a full tutorial that runs from data preparation and solution design to model optimization and deployment, explains the industrial deployment plan in plain terms, and walks users through the code step by step.

Project links

https://github.com/PaddlePaddle/awesome-DeepLearning

All source code and tutorials are open source. You are welcome to use them, and a star is appreciated.

Deep learning technology empowers sports events

Three typical examples

1. Figure skating action recognition

Figure skating movements follow complex trajectories, happen quickly, and fall into many categories, which makes recognition very challenging. This example applies ST-GCN (spatial-temporal graph convolutional network), a human action recognition algorithm based on human skeleton keypoints, to figure skating action recognition for the first time. It can recognize the technical actions of figure skaters in a video in real time and label them by category, supporting auxiliary scoring and movement quality evaluation during competition and training.

Scenario challenges

In figure skating, it is hard to determine the action type from the skater's pose in one frame or a few frames;
Two actions in the same category but different subcategories differ only slightly, and only in a few frames, so telling them apart is extremely difficult. At the same time, the features of the other frames must be preserved so that they can be used for category identification and for handling cases such as "ambiguous frames" (frames that could belong to more than one action).

For example, figure skating includes jumps, spins, lifts, steps and twists, and spirals. Jumps are among the most important action elements. Skaters use many different blade edges at take-off and landing and complete different numbers of rotations in the air, so many combinations are possible, which makes classification harder.

How was the technical solution chosen to address these problems? This example selects ST-GCN and improves the network structure described in the published paper. It offers a novel approach to human action recognition based on skeleton keypoints and achieves a large performance improvement. The following figure shows the ST-GCN network structure built in this project.

Finally, by adjusting the batch_size and num_classes parameters, the model reaches 91% accuracy.
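To make the idea of spatial-temporal graph convolution more concrete, here is a minimal sketch in pure Python. It is not the project's implementation: the 5-joint skeleton, the `EDGES` list, and all function names are hypothetical, and uniform averaging stands in for the learned weights that a real ST-GCN layer would train. It only shows the two steps the name refers to: mixing each joint with its skeleton neighbours within a frame, then mixing each joint across neighbouring frames.

```python
# Toy skeleton with 5 joints (hypothetical layout, not the dataset's keypoint set).
# 0 = head, 1 = torso, 2 = left hand, 3 = right hand, 4 = feet
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return the row-normalized adjacency matrix with self-loops."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def spatial_graph_conv(frame, adj):
    """Average each joint's (x, y) feature with its skeleton neighbours.

    Assumption: uniform averaging replaces the learned weight matrix.
    """
    n = len(adj)
    return [
        (sum(adj[i][j] * frame[j][0] for j in range(n)),
         sum(adj[i][j] * frame[j][1] for j in range(n)))
        for i in range(n)
    ]


def temporal_conv(frames, kernel=3):
    """Average each joint over a sliding window of frames (no padding)."""
    out = []
    for t in range(len(frames) - kernel + 1):
        window = frames[t:t + kernel]
        out.append([
            (sum(f[j][0] for f in window) / kernel,
             sum(f[j][1] for f in window) / kernel)
            for j in range(len(frames[0]))
        ])
    return out


def st_block(sequence, adj, kernel=3):
    """One simplified block: graph conv per frame, then temporal conv."""
    spatial = [spatial_graph_conv(frame, adj) for frame in sequence]
    return temporal_conv(spatial, kernel)


# Usage: 4 frames of 5 joints in, 2 frames of 5 joints out (kernel of 3, no padding).
adj = normalized_adjacency(NUM_JOINTS, EDGES)
sequence = [[(float(j), float(t)) for j in range(NUM_JOINTS)] for t in range(4)]
features = st_block(sequence, adj)
```

In a trained network, stacked blocks of this kind would be followed by pooling and a classifier over the action categories, which is where a parameter such as num_classes comes in.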

2. Multimodal sports video classification

Recently, all kinds of ice and snow sports videos have attracted wide attention. To extract users' real points of interest and high-level semantic information, enterprises need to understand a video from multiple angles using its text, audio, and image data. PaddlePaddle and Baidu AI Cloud present a multimodal classification task that assigns each video multiple labels describing its content. The labels can be used in recommendation-system scenarios such as content selection and delivery, which is good news for people working in culture, entertainment, and media.

Scenario challenges

Video labels carry high-level semantics that single-modality features struggle to express. High-quality video classification data is limited, and it is hard to extract high-level semantic features from the corresponding images, audio, and text;
There is a semantic gap between modalities, interaction between modalities is challenging, and different modalities may interfere with each other;
Videos mix several themes and long videos are hard to process. A single modality may be noisy or missing, which places high demands on model robustness.

To address these challenges, the practice example extracts features from three modalities (text, video images, and audio), fuses the features, and then performs multi-label classification. Compared with using video image features alone, this significantly improves results on high-level semantic labels.

This example also summarizes several optimization lessons. It uses ERNIE, a powerful pretrained model based on entity information, to improve text representation. It freezes ERNIE's parameters and places a TextCNN after it to learn domain knowledge, which speeds up model training. It uses multimodal cross-attention to improve interaction between modalities. The final model accuracy reaches 85.59%.
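The "extract features per modality, fuse, then multi-label classify" pipeline can be sketched as follows. This is only an illustration under stated assumptions: plain concatenation stands in for the cross-attention fusion mentioned above, the feature vectors and weights are made-up toy numbers, and every name (`fuse_features`, `multi_label_predict`, the label strings) is hypothetical rather than taken from the project.

```python
import math


def fuse_features(text_vec, image_vec, audio_vec):
    """Fuse per-modality features by concatenation.

    Assumption: simple concatenation, not the cross-attention used in the example.
    """
    return list(text_vec) + list(image_vec) + list(audio_vec)


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_predict(fused, weights, biases, labels, threshold=0.5):
    """Score every label independently and keep those at or above the threshold.

    Unlike softmax classification, several labels can be selected at once.
    """
    selected = []
    for label, w, b in zip(labels, weights, biases):
        logit = sum(wi * xi for wi, xi in zip(w, fused)) + b
        if sigmoid(logit) >= threshold:
            selected.append(label)
    return selected


# Usage with toy numbers (hypothetical features and weights).
labels = ["figure skating", "highlight", "interview"]
fused = fuse_features(text_vec=[1.0, 0.0], image_vec=[0.5], audio_vec=[0.5])
weights = [
    [2.0, 0.0, 0.0, 0.0],   # driven by the text feature
    [0.0, 0.0, 1.0, 1.0],   # driven by image and audio features
    [-3.0, 0.0, 0.0, 0.0],  # suppressed by the text feature
]
biases = [0.0, 0.0, 0.0]
predicted_labels = multi_label_predict(fused, weights, biases, labels)
```

The independent sigmoid per label is what allows one video to carry several tags at the same time; the threshold of 0.5 is an arbitrary choice for this sketch.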

3. Football video clipping

Producing sports highlight videos calls for automatic editing tools that are both fast and high quality. Professional sports training needs data support: teams review match and daily training footage to understand themselves and their opponents and to rehearse tactics. The media industry also needs tools that extract the required video content to produce timely news material.

Scenario challenges

Action detection is a complex task: the key to video clipping is accurately locating the start and end points of each action. Sports videos often contain a lot of redundant background, and the actions are varied and relatively short, so accurately determining both the start point and the category of an action is difficult;
Videos contain many kinds of information, and the question is how to use these features effectively.

To address these problems, we chose TSN+BMN+LSTM as the baseline solution to ensure accurate clip extraction. Optimization strategies include the models used to extract video image features (PaddlePaddle's PP-TSM, along with TSN and TSM), data expansion, and extended temporal action proposals. The final accuracy is 91%, and the F1-score reaches 76.2%.
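Locating the start and end of an action, and scoring the result with an F1-score, both rest on comparing time segments. The sketch below shows one common way to do that with temporal IoU, greedy non-maximum suppression over proposals, and a segment-level F1. It is an assumption-laden illustration: the article does not say how its accuracy or F1-score were computed, and the function names, dictionary keys, and 0.5 thresholds here are hypothetical.

```python
def temporal_iou(seg_a, seg_b):
    """Intersection over union of two (start, end) segments in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring proposal, drop those overlapping it."""
    kept = []
    for prop in sorted(proposals, key=lambda p: p["score"], reverse=True):
        span = (prop["start"], prop["end"])
        if all(temporal_iou(span, (k["start"], k["end"])) < iou_threshold
               for k in kept):
            kept.append(prop)
    return kept


def f1_score(predicted, ground_truth, iou_threshold=0.5):
    """Segment-level F1: a prediction counts as correct if it overlaps an
    unmatched ground-truth segment with the same label."""
    matched = set()
    true_positives = 0
    for p in predicted:
        for i, g in enumerate(ground_truth):
            if i in matched or g["label"] != p["label"]:
                continue
            iou = temporal_iou((p["start"], p["end"]), (g["start"], g["end"]))
            if iou >= iou_threshold:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Usage with toy proposals (hypothetical times, scores, and labels).
proposals = [
    {"start": 10.0, "end": 20.0, "score": 0.9, "label": "goal"},
    {"start": 11.0, "end": 21.0, "score": 0.6, "label": "goal"},
    {"start": 40.0, "end": 45.0, "score": 0.8, "label": "corner"},
]
ground_truth = [
    {"start": 9.0, "end": 19.0, "label": "goal"},
    {"start": 60.0, "end": 66.0, "label": "corner"},
]
clips = temporal_nms(proposals)
score = f1_score(clips, ground_truth)
```

In this toy case the second "goal" proposal is suppressed because it overlaps the first, one of the two remaining clips matches the ground truth, and precision and recall are both one half.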

Industrial practice example tutorials

Helping enterprises cross the AI deployment gap

The PaddlePaddle industrial practice examples aim to speed up AI deployment in industry and narrow the gap between theory and industrial application. The examples come from real business scenarios and, through complete code, walk through the solution from data preparation to model deployment. They can be thought of as an "autopilot" for industrial deployment.

Real industrial scenarios: built together with enterprises that deploy AI in practice, focusing on the AI application scenarios enterprises need most often, such as helmet detection for smart cities and meter reading for intelligent manufacturing;

Complete code implementation: code that runs with one click, using the free computing power and one-click Notebook feature of the AI Studio one-stop development platform;

Detailed process analysis: an in-depth walkthrough of the whole AI deployment process, from data preparation and processing to model selection, model optimization, and deployment, sharing reusable model tuning and optimization experience;

Direct path to project deployment: senior Baidu engineers teach the full code workflow hands-on, helping projects reach the POC stage easily.

Course preview

The three sports scenarios above have been built into industrial practice examples that everyone can quickly try out and apply. We have also prepared accompanying lectures. On February 17, from 20:00 to 21:30, Professor Liu of Dalian University of Technology and a senior Baidu engineer will give an in-depth walkthrough of the whole development process, from data preparation and solution design to model optimization and deployment, and teach the code step by step.

Scan the QR code to join the group and get free links to the live class and the replay videos, along with a chance to receive the PaddlePaddle industrial practice example handbook covering smart cities, industrial manufacturing, finance, the Internet, and other industries. Interested enterprises and developers are also welcome to contact us to exchange ideas on technology and discuss cooperation.