Decoding the AI technology behind sports: figure skating action recognition, multimodal video classification, and highlight clip editing
2022-06-26 05:05:00 【Paddlepaddle】
Recently, major video platforms have been posting real-time highlights of the Winter Olympic Games, and athletes such as Gu Ailing, Wu Dajing, and Su Yiming have delivered outstanding results. Congratulations! While sharing the excitement and pride in the strength of Chinese sports, we have also been looking at the industrial AI applications behind them: action recognition that assists athletes in daily training and supports competition scoring, and intelligent classification and automatic editing that greatly reduce the labor and time cost of processing sports video content.
To help everyone learn more about how these AI technologies are applied in industry and to lower the barrier to deploying AI, PaddlePaddle, Baidu AI Cloud, and Associate Professor Liu Shenglan of Dalian University of Technology have jointly released industrial practice examples. For three classic scenarios (figure skating action recognition, multimodal sports video classification, and football video clipping), they provide end-to-end tutorials covering data preparation, solution design, and model optimization and deployment, explain the industrial solutions in plain terms, and walk users through the code step by step.

Project links
https://github.com/PaddlePaddle/awesome-DeepLearning
All source code and tutorials are open source. You are welcome to use them and to star the repository as encouragement.
Deep learning technology empowers sports events
Three typical examples
1. Figure skating action recognition
Movement trajectories in figure skating are complex, fast, and fall into many categories, which makes recognition very challenging. In this example, ST-GCN (spatial-temporal graph convolutional network), a human action recognition algorithm based on skeleton keypoints, is applied to figure skating action recognition for the first time. It can recognize the technical actions of figure skaters in video in real time and label them by category, supporting auxiliary scoring and movement quality evaluation during competition and training.

Scenario challenges
In figure skating, it is difficult to determine the action type from the body posture in a single frame or a few frames;
Two figure skating actions in the same category but different subcategories differ only slightly, and only in a few frames, so they are extremely hard to distinguish. The features of the other frames must nevertheless be preserved for category recognition and for handling cases such as "polysemous frames".
For example, figure skating elements include jumps, spins, lifts, steps and turns, and spirals. Jumps are among the most important elements: skaters use various blade edges on takeoff and landing and various numbers of rotations in the air, producing many combinations and making classification harder.
How was the technical solution chosen to address these problems? This example selects ST-GCN and improves the network structure on the basis of the published paper. It offers a novel approach to human action recognition based on skeleton keypoints and achieves a substantial performance improvement. The following figure shows the ST-GCN network structure built in this project.

Finally, by adjusting the batch_size and num_classes parameters, the model reaches 91% accuracy.
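To make the idea of graph convolution over skeleton keypoints more concrete, here is a minimal sketch in plain Python. It is not the project's ST-GCN implementation: the 5-joint skeleton, the keypoint values, and the function names are hypothetical, there are no learnable weights, and a simple moving average stands in for the temporal convolution. It only illustrates how each joint aggregates features from its neighbours in space and then across adjacent frames.

```python
# Hypothetical 5-joint skeleton: 0 head, 1 neck, 2 left hand, 3 right hand, 4 hip.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, edges):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_graph_conv(frame, a_hat):
    """Aggregate features over neighbouring joints within one frame: X' = A_hat X."""
    num_joints, channels = len(frame), len(frame[0])
    return [[sum(a_hat[i][k] * frame[k][c] for k in range(num_joints))
             for c in range(channels)]
            for i in range(num_joints)]


def temporal_smooth(sequence, window=3):
    """Average each joint over nearby frames (a stand-in for temporal convolution)."""
    half = window // 2
    num_frames = len(sequence)
    num_joints, channels = len(sequence[0]), len(sequence[0][0])
    out = []
    for t in range(num_frames):
        span = sequence[max(0, t - half):min(num_frames, t + half + 1)]
        out.append([[sum(f[j][c] for f in span) / len(span) for c in range(channels)]
                    for j in range(num_joints)])
    return out


# Four frames of made-up (x, y) keypoints for the five joints.
sequence = [[[float(t + j), float(j)] for j in range(5)] for t in range(4)]
a_hat = normalized_adjacency(5, EDGES)
features = temporal_smooth([spatial_graph_conv(f, a_hat) for f in sequence])
print(len(features), len(features[0]), len(features[0][0]))
```

In a real model, each spatial step would also multiply by a learned weight matrix, and a classifier on top of the pooled features would predict the action category.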
2. Multimodal sports video classification
Recently, ice and snow sports videos of all kinds have attracted wide attention. To extract users' real points of interest and high-level semantic information, enterprises need to understand videos from multiple angles using multimodal data: text, audio, and images. PaddlePaddle and Baidu Cloud jointly present a multimodal classification task that assigns each video multiple labels describing its content, for use in recommendation-system scenarios such as content selection and distribution. It is good news for people working in culture, entertainment, and media.

Scenario challenges
Video labels carry high-level semantics that single-modality features struggle to express; high-quality video classification data is limited; and high-level semantic features are hard to extract from the corresponding images, audio, and text;
There is a semantic gap between modalities, cross-modal interaction is challenging, and different modalities may interfere with each other;
Videos often mix topics, and long videos are hard to process; a single modality may be noisy or missing, so the model must be highly robust.
To address these difficulties, the practice example extracts features from three modalities (text, video images, and audio), fuses them, and then performs multi-label classification. Compared with using video image features alone, this significantly improves results on high-level semantic labels.
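The "extract, fuse, then classify with multiple labels" flow can be sketched as follows. This is only an illustration under stated assumptions: the feature vectors, label names, weights, and the choice of simple concatenation as the fusion step are all hypothetical, and the actual example may fuse features differently. The point is that each label gets its own independent sigmoid score, so a video can receive several labels at once.

```python
import math


def fuse(text_feat, image_feat, audio_feat):
    """Late fusion by concatenating the per-modality feature vectors."""
    return text_feat + image_feat + audio_feat


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_predict(fused, weights, biases, labels, threshold=0.5):
    """Score every label independently and keep those at or above the threshold."""
    picked = []
    for label, w, b in zip(labels, weights, biases):
        score = sigmoid(sum(wi * xi for wi, xi in zip(w, fused)) + b)
        if score >= threshold:
            picked.append((label, score))
    return picked


# Made-up features for one video: 2-d text, 1-d image, 1-d audio.
fused = fuse([1.0, 0.0], [0.5], [0.2])

# Hypothetical labels and hand-picked linear weights (a trained model would learn these).
labels = ["skiing", "interview", "highlights"]
weights = [[2.0, 0.0, 1.0, 0.0],
           [-1.0, 0.0, 0.0, 1.0],
           [0.5, 0.0, 1.0, 1.0]]
biases = [-1.0, 0.0, 0.0]

predicted = multi_label_predict(fused, weights, biases, labels)
print([label for label, _ in predicted])
```

Unlike softmax classification, the scores here do not compete with each other, which is what allows one video to be tagged with both "skiing" and "highlights".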

This example summarizes several optimizations: ERNIE, a strong pretrained model that incorporates entity information, improves text representation; ERNIE's parameters are frozen and a TextCNN is added after it to learn domain knowledge, which speeds up training; and multimodal cross-attention improves interaction between modalities. The final model reaches 85.59% accuracy.
3. Football video clipping
Sports highlight videos need fast, high-quality automatic editing tools. Professional sports training relies on data: teams review match and daily training footage to understand themselves and their opponents and to rehearse tactics. The media industry likewise needs tools to extract the video content it requires and to produce timely news material.
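As a rough illustration of cross-attention between modalities, the sketch below lets text token features act as queries over video frame features. It is a bare scaled dot-product attention in plain Python with no learned projections; the feature values and the function names are hypothetical, and the project's actual attention module is likely more elaborate.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query attends over all keys and returns a weighted sum of the values."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Made-up features: two text tokens (queries) and three video frames (keys and values).
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
frame_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

attended, attn = cross_attention(text_tokens, frame_feats, frame_feats)
print(len(attended), len(attended[0]))
```

Each text token ends up with a frame-aware representation weighted toward the frames it matches best, which is one way modalities can exchange information before fusion.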

Scenario challenges
Action detection is a complex task: the key to video clipping is accurately locating the start and end points of each action. Sports videos, however, often contain a lot of redundant background, and actions are diverse and relatively short, so both the start point and the category of each action must be determined precisely, which makes the task difficult;
The information in the video is diverse, and the question is how to use these features effectively.

To address these problems, we chose TSN+BMN+LSTM as the base model scheme to ensure accurate clip extraction. The optimization strategies include extracting video image features with the PaddlePaddle feature models PP-TSM, TSN, and TSM, data augmentation, and extending the temporal action proposals. The final accuracy is 91%, and the F1-score reaches 76.2%.
Industrial practice example courses
Helping enterprises cross the AI deployment gap
PaddlePaddle's industrial practice examples aim to speed AI's path to industrial deployment and narrow the gap between theory and industrial application. The examples come from real industry business scenarios and, with complete code, walk through the solution from data preparation to model deployment. They can be called an "autopilot" for industrial deployment.
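Locating the start and end of an action usually ends with a post-processing step over candidate segments. The sketch below shows one common form of it, assuming proposals are (start, end, score) tuples in seconds: temporal IoU, greedy non-maximum suppression, and an IoU-matched F1-score. The proposal values, ground-truth segments, and thresholds are made up, and this is not necessarily how the 76.2% F1-score in this example was computed.

```python
def temporal_iou(a, b):
    """Intersection over union of two time segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(proposals, iou_threshold=0.5):
    """Keep the highest-scoring proposals and drop ones that overlap them heavily."""
    kept = []
    for p in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou(p, k) < iou_threshold for k in kept):
            kept.append(p)
    return kept


def f1_score(predicted, ground_truth, iou_threshold=0.5):
    """Match each prediction to at most one unmatched ground-truth segment by IoU."""
    matched, true_pos = set(), 0
    for p in predicted:
        for idx, g in enumerate(ground_truth):
            if idx not in matched and temporal_iou(p, g) >= iou_threshold:
                matched.add(idx)
                true_pos += 1
                break
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up proposals (start, end, score) and ground-truth action segments (start, end).
proposals = [(10.0, 20.0, 0.9), (11.0, 21.0, 0.8), (40.0, 48.0, 0.7), (70.0, 75.0, 0.3)]
ground_truth = [(10.0, 19.0), (41.0, 49.0)]

kept = temporal_nms(proposals)
score = f1_score(kept, ground_truth)
print(len(kept), round(score, 3))
```

Here the second proposal is suppressed because it overlaps the first almost entirely, and the low-scoring proposal at 70 to 75 seconds counts as a false positive. In practice a score threshold would typically be applied as well.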
Real industrial scenarios: built together with enterprises that actually apply AI, selecting AI application scenarios in high demand, such as helmet detection for smart cities and meter reading for smart manufacturing;
Complete code implementation: code that runs with one click in a Notebook, using free compute on the "AI Studio one-stop development platform";
Detailed process analysis: an in-depth walkthrough of the whole AI deployment process, from data preparation and processing through model selection to model optimization and deployment, sharing reusable tuning and optimization experience;
Direct project deployment: senior Baidu engineers teach the full code workflow, making it easy to take a project to the POC stage.
Course preview
The three sports scenarios above have been built into industrial practice examples so that everyone can get started and apply them quickly. We have also prepared accompanying courses: on February 17, 20:00-21:30, Professor Liu of Dalian University of Technology and senior Baidu engineers will give an in-depth walkthrough of the full development process, from data preparation and solution design to model optimization and deployment, and teach the code hands-on.
Welcome to scan the QR code to join the group and get free links to the live courses and replay videos, plus a chance to receive the PaddlePaddle industrial practice example handbook covering smart cities, industrial manufacturing, finance, the Internet, and other industries. Interested enterprises and developers are also welcome to contact us to exchange ideas on technology and discuss cooperation.
This article was originally shared on the blog "PaddlePaddle" (CSDN).