Decoding the Cutting-Edge AI Behind Sports: Figure Skating Action Recognition, Multimodal Video Classification, and Highlight Clip Editing
2022-07-02 02:04:00 [PaddlePaddle]
Lately, all the major video platforms have been updating highlight reels from the Winter Olympic Games in real time. Gu Ailing, Wu Dajing, Su Yiming and other athletes have achieved great results, and we warmly congratulate them! While moved by and celebrating the strength of Chinese sports, we have also been following the industrial applications of AI behind the scenes: for example, action recognition that assists athletes in daily training and in competition scoring, and AI-based intelligent classification and automatic editing that greatly reduce the labor and time needed to process sports video content.
To help everyone learn more about how these AI technologies are applied in industry, and to lower the barrier to deploying AI, Baidu PaddlePaddle, Baidu AI Cloud, and Associate Professor Liu Shenglan of Dalian University of Technology have jointly released industrial practice examples for three classic scenarios: figure skating action recognition, multimodal sports video classification, and football video clip editing. Each example provides an end-to-end tutorial covering data preparation, solution design, and model optimization and deployment, explains the industrial deployment plan in accessible terms, and walks users through the code step by step.

Project link:
https://github.com/PaddlePaddle/awesome-DeepLearning
All source code and tutorials are open source. You are welcome to use them, and a star is always appreciated!
Deep learning technology empowers sports events
Three typical examples
1. Figure skating action recognition
Figure skating movements are complex and fast, and there are many action categories, which makes recognition very challenging. This example is the first to apply ST-GCN (Spatial Temporal Graph Convolutional Network), a human action recognition algorithm based on skeleton keypoints, to figure skating. It can recognize figure skaters' technical moves in video in real time and label them by category, supporting auxiliary scoring and movement quality assessment during competition and training.

Challenges in this scenario
In figure skating, it is hard to determine the action type from the skater's posture in just one or a few frames;
Two actions that belong to the same category but to different subcategories differ only slightly, and only in a handful of frames, which makes them extremely hard to tell apart. At the same time, the features of the other frames must be preserved so they can be used for category identification and for handling cases such as "ambiguous frames".
For example, figure skating elements include jumps, spins, lifts, steps and twists, and spirals. Jumps are among the most important elements: skaters use many different blade edges at take-off and landing and complete different numbers of rotations in the air, which produces a wide variety of combinations and makes classification harder.
How did these problems shape the choice of technical solution? This example adopts ST-GCN and improves its network structure on the basis of the published paper. ST-GCN offers a novel approach to human action recognition from skeleton keypoints, and it also delivered a substantial performance gain here. The following figure shows the ST-GCN network structure built in this project.

Finally, by adjusting the batch_size and num_classes parameters, the model reaches 91% accuracy.
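To make the idea concrete, here is a minimal NumPy sketch of one simplified ST-GCN block: a spatial graph convolution over the skeleton's joints (aggregating neighbour features through a symmetrically normalized adjacency matrix, then mixing channels) followed by a temporal convolution along the frame axis, with global pooling and a linear classifier at the end. It is not the PaddlePaddle implementation used in the example: the toy 5-joint skeleton, the helper names `normalized_adjacency` and `st_gcn_block`, the shared temporal kernel, and `num_classes = 4` are illustrative assumptions, and the original ST-GCN's partitioning strategies, learnable edge weights, and batch normalization are omitted.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return d_inv_sqrt @ a @ d_inv_sqrt

def st_gcn_block(x, a_hat, w_spatial, w_temporal):
    """Simplified spatial-temporal block (hypothetical helper).

    x:          (C_in, T, V) features: channels x frames x joints
    a_hat:      (V, V) normalized adjacency
    w_spatial:  (C_out, C_in) channel-mixing weights
    w_temporal: (K,) odd-length kernel along time, shared by all channels
    """
    # Spatial graph convolution: aggregate neighbouring joints, then mix channels.
    agg = np.einsum("ctv,vw->ctw", x, a_hat)
    h = np.maximum(np.einsum("oc,ctv->otv", w_spatial, agg), 0.0)  # ReLU
    # Temporal convolution over frames with zero "same" padding.
    k = len(w_temporal)
    hp = np.pad(h, ((0, 0), (k // 2, k // 2), (0, 0)))
    out = np.empty_like(h)
    for t in range(h.shape[1]):
        out[:, t, :] = np.tensordot(w_temporal, hp[:, t:t + k, :], axes=([0], [1]))
    return np.maximum(out, 0.0)

# Toy usage: hypothetical 5-joint skeleton, 3 input channels (x, y, confidence), 16 frames.
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
a_hat = normalized_adjacency(edges, num_joints=5)
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 16, 5))
out = st_gcn_block(x, a_hat, rng.standard_normal((8, 3)), np.array([0.25, 0.5, 0.25]))

# Global average pooling over frames and joints, then a linear classifier.
num_classes = 4  # hypothetical; set to the number of skating action classes
w_fc = rng.standard_normal((num_classes, 8))
logits = w_fc @ out.mean(axis=(1, 2))
print(out.shape, logits.shape)  # (8, 16, 5) (4,)
```

Stacking several such blocks (with growing channel counts) is what lets the network relate distant joints and longer time spans, which matters when two jump types differ in only a few frames.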
2. Multimodal sports video classification
Recently, all kinds of ice and snow sports videos have attracted widespread attention. To extract users' real points of interest and high-level semantic information, companies need to understand each video from multiple angles using its multimodal data: text, audio, and images. PaddlePaddle and Baidu AI Cloud bring a multimodal classification task that assigns each video multiple labels describing its content, for use in recommendation-system scenarios such as content selection and distribution. It is welcome news for anyone working in culture, entertainment, and media.

Challenges in this scenario
Video tags carry high-level semantics that are hard to express with single-modality features; high-quality video classification data is limited; and it is difficult to extract high-level semantic features from the corresponding images, audio, and text;
There is a semantic gap between modalities, which makes cross-modal interaction challenging, and different modalities may interfere with each other;
Videos often mix several topics and long videos are hard to process; a single modality may be very noisy or missing altogether, so the model must be highly robust.
To address these difficulties, the practice example extracts features from three modalities (text, video frames, and audio), fuses them, and finally performs multi-label classification. Compared with using video image features alone, this significantly improves performance on high-level semantic labels.
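Concatenating the per-modality features and scoring each label with its own sigmoid is the simplest way to picture "feature fusion, then multi-label classification". The sketch below assumes plain concatenation (the example's actual fusion design may differ), and the tag names, feature sizes, and hand-picked weights are made up purely to show the mechanics. In training, a head like this is usually optimized with a per-label binary cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_tag(text_feat, image_feat, audio_feat, w, b, threshold=0.5):
    """Concatenate per-modality features and score every label independently.

    Multi-label: each label gets its own sigmoid probability (no softmax),
    so one video can receive several tags at once.
    """
    fused = np.concatenate([text_feat, image_feat, audio_feat])
    probs = sigmoid(w @ fused + b)
    return probs, probs >= threshold

# Hypothetical tag set, features, and weights, chosen by hand for illustration.
labels = ["winter_sports", "figure_skating", "interview", "ceremony"]
text_feat = np.array([1.0, 0.0])
image_feat = np.array([0.0, 2.0])
audio_feat = np.array([0.5])
w = np.array([
    [2.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [-3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, -2.0],
])
b = np.zeros(4)
probs, tags = fuse_and_tag(text_feat, image_feat, audio_feat, w, b)
print([name for name, on in zip(labels, tags) if on])  # ['winter_sports', 'figure_skating']
```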

This example incorporates several optimizations: it uses ERNIE, a powerful pre-trained model built on entity knowledge, to strengthen text representation; it freezes ERNIE's parameters and then learns domain knowledge with a TextCNN, which speeds up training; and it uses multimodal cross-attention to improve interaction between modalities. The final model reaches 85.59% accuracy.
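Cross-attention lets one modality query another: for instance, text token features (in this example produced by the frozen ERNIE encoder) can attend over per-frame video features. Below is a minimal single-head NumPy sketch of scaled dot-product cross-attention. The dimensions, the text-to-video direction, and the random projection weights are assumptions; the example's real module may use multiple heads, attend in both directions, and be built from PaddlePaddle layers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    query_seq:   (Lq, d_model), e.g. text token features
    context_seq: (Lc, d_model), e.g. per-frame video features
    Returns the attended output (Lq, d_v) and the weights (Lq, Lc).
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each query's weights over the context sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, d_k = 16, 8
text_tokens = rng.standard_normal((6, d_model))    # hypothetical: 6 text tokens
video_frames = rng.standard_normal((10, d_model))  # hypothetical: 10 sampled frames
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
attended, weights = cross_attention(text_tokens, video_frames, w_q, w_k, w_v)
print(attended.shape, weights.shape)  # (6, 8) (6, 10)
```

Each row of `weights` shows which frames a given text token draws on, which is how the text and visual signals end up interacting instead of being processed in isolation.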
3. Football video clip editing
Sports highlight videos call for fast, high-quality automatic editing tools. Professional sports training relies on large amounts of data: by replaying match or daily training footage, teams get to know themselves and their opponents and run tactical drills. The media industry likewise needs tools that extract the required video content to produce timely news material.

Challenges in this scenario
Action detection is complex: the key to clip editing is locating exactly where each action starts and ends. Sports videos, however, usually contain a lot of redundant background, the action types are diverse, and each action is relatively short, so the model must pinpoint both the start and end points and the category of every action, which is a difficult task;
The information in the video is diverse, and it is hard to make effective use of all these features.

To solve these problems, we ultimately chose TSN+BMN+LSTM as the base model pipeline to ensure accurate clip extraction. The optimization strategies include extracting video image features with PaddlePaddle's video feature models PP-TSM, TSN, and TSM, data augmentation, and extending the temporal action proposals. The final accuracy is 91%, with an F1-score of 76.2%.
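A BMN-style proposal model scores many overlapping candidate segments (start, end, confidence), so redundant candidates have to be suppressed before clips are cut. The sketch below shows temporal IoU and plain greedy non-maximum suppression in pure Python. It is a generic illustration, not the example's post-processing code: BMN is typically paired with Soft-NMS, which decays the scores of overlapping segments instead of discarding them, and the 0.5 threshold and sample proposals here are made up.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy NMS over (start, end, score) tuples; returns survivors, best first."""
    remaining = sorted(proposals, key=lambda p: p[2], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop candidates that overlap the chosen segment too much.
        remaining = [p for p in remaining
                     if temporal_iou(best[:2], p[:2]) < iou_threshold]
    return kept

# Hypothetical proposals: two near-duplicates of one action, one separate action.
proposals = [(10.0, 14.0, 0.9), (10.5, 14.5, 0.8), (30.0, 33.0, 0.7), (11.0, 13.0, 0.3)]
print(temporal_nms(proposals))  # [(10.0, 14.0, 0.9), (30.0, 33.0, 0.7)]
```

The surviving segments are the candidates that a downstream classifier (the LSTM stage in this pipeline) would then label and that the editor would cut into highlight clips.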
Industrial practice example courses
Helping enterprises cross the AI deployment gap
PaddlePaddle's industrial practice examples aim to speed up AI's path to industrial deployment and narrow the gap between theory and industrial application. The examples come from real business scenarios and include complete code, walking through the solution process from data preparation to model deployment; you could call them an "autopilot" for industrial deployment.
Real industrial scenarios: built together with companies that actually apply AI, the examples target high-frequency enterprise AI scenarios such as helmet detection for smart cities and meter reading for intelligent manufacturing;
Complete code implementation: the code runs with one click using the free compute of the Notebook feature on AI Studio, a one-stop development platform;
Detailed process walkthrough: an in-depth look at the entire AI deployment process, from data preparation and processing through model selection to model optimization and deployment, sharing reusable tuning and optimization experience;
Straight to deployment: senior Baidu engineers walk users through the full code practice, making it easy to reach the project's POC (proof of concept) stage.
Upcoming course preview
The three sports scenarios above have been packaged as industrial practice examples that anyone can quickly try out and apply. In addition, we have prepared accompanying course lectures. On February 17 from 20:00 to 21:30, Professor Liu of Dalian University of Technology and a senior Baidu engineer will analyze the entire development process in depth, from data preparation and solution design to model optimization and deployment, and teach you the code step by step.
Scan the QR code to join the group and get free links to the live classes and replay videos, plus the PaddlePaddle industrial practice example handbook covering smart cities, industrial manufacturing, finance, the Internet, and other industries! Interested companies and developers are also welcome to contact us to exchange technology and discuss cooperation.
