Decoding the AI behind sports: figure skating action recognition, multimodal video classification, and highlight clip editing
2022-07-02 02:04:00 【Paddlepaddle】
Recently, the major video platforms have been updating highlight reels of the Winter Olympic Games in real time, and athletes such as Gu Ailing, Wu Dajing, and Su Yiming have achieved outstanding results. Congratulations! While we are moved and delighted by the strength of Chinese sports, we are also paying attention to the industrial AI applications behind sports: for example, action recognition technology that assists athletes in daily training and competition scoring, and intelligent classification and automatic editing that greatly reduce the labor and time cost of processing sports video content.
To help everyone learn more about how these AI technologies are applied in industry and to lower the barrier to deploying AI, Baidu PaddlePaddle, Baidu Intelligent Cloud, and Associate Professor Liu Shenglan of Dalian University of Technology have jointly launched a set of industrial practice examples. For three classic scenarios (figure skating action recognition, multimodal sports video classification, and football video clipping), they provide a full-process tutorial from data preparation and solution design through model optimization and deployment, explain the industrial deployment approach in plain terms, and walk users through the code step by step.
Project links
https://github.com/PaddlePaddle/awesome-DeepLearning
All source code and tutorials are open source. You are welcome to use them and to star the repository ~
Deep learning technology empowers sports events
Three typical examples
1. Figure skating action recognition
The movements in figure skating are complex, fast, and fall into many categories, which makes the recognition task very challenging. This example applies ST-GCN (Spatial-Temporal Graph Convolutional Network), a human action recognition algorithm based on human skeleton keypoints, to figure skating action recognition for the first time. It can recognize the technical moves of figure skaters in video in real time and label them by category, supporting auxiliary scoring and movement quality assessment during competition and training.
Scenario challenges
In figure skating, it is difficult to judge the action type from the skater's pose in only one or a few frames;
Two actions that belong to the same category but different subcategories differ only slightly, and only in a few frames, so telling them apart is extremely difficult. At the same time, the features of the other frames must be preserved for category-level recognition and for handling cases such as "polysemous frames".
For example, figure skating includes jumps, spins, lifts, step sequences and turns, and spirals. Jumps are among the most important elements. Skaters use many different blade edges for take-off and landing and different numbers of rotations in the air, which produces many combinations and increases the difficulty of classification.
How was the technical solution chosen to address these problems? This example selects ST-GCN and improves the network structure relative to the published paper. It offers a novel approach to human action recognition based on skeleton keypoints and achieves a substantial performance improvement. The following figure shows the ST-GCN network structure built in this project.
Finally, by adjusting the batch_size and num_classes parameters, the model reaches 91% accuracy.
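To make the idea of convolving over skeleton keypoints more concrete, here is a minimal sketch of the two aggregation steps that a spatial-temporal graph convolution combines: mixing each joint with its skeleton neighbours within a frame, then mixing each joint with itself across neighbouring frames. It is not the project's ST-GCN implementation. The 5-joint skeleton, the function names, and the plain averaging (in place of learned weights) are all assumptions made for illustration.

```python
# Hypothetical 5-joint skeleton (not the keypoint layout used in the project):
# 0 = head, 1 = neck, 2 = left hand, 3 = right hand, 4 = hip.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Build D^-1 (A + I): each row averages a joint with its neighbours."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for a, b in edges:
        adj[a][b] = 1.0
        adj[b][a] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def spatial_aggregate(frame, adj):
    """Mix each joint's feature vector with those of its skeleton neighbours."""
    dims = len(frame[0])
    return [[sum(adj[i][j] * frame[j][d] for j in range(len(frame)))
             for d in range(dims)]
            for i in range(len(frame))]


def temporal_aggregate(frames, window=3):
    """Average each joint over a sliding window of frames (clipped at the ends)."""
    half = window // 2
    joints, dims = len(frames[0]), len(frames[0][0])
    out = []
    for t in range(len(frames)):
        span = frames[max(0, t - half):t + half + 1]
        out.append([[sum(f[j][d] for f in span) / len(span)
                     for d in range(dims)]
                    for j in range(joints)])
    return out


def st_block(frames, adj, window=3):
    """One spatial step per frame followed by one temporal step."""
    return temporal_aggregate([spatial_aggregate(f, adj) for f in frames], window)


# Three frames of (x, y) keypoints; only the left hand (joint 2) moves.
ADJ = normalized_adjacency(NUM_JOINTS, EDGES)
clip = [
    [[0.0, 2.0], [0.0, 1.5], [-1.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 1.5], [-1.0, 1.5], [1.0, 1.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 1.5], [-1.0, 2.0], [1.0, 1.0], [0.0, 0.0]],
]
features = st_block(clip, ADJ)
print(len(features), len(features[0]), len(features[0][0]))  # frames, joints, dims
```

In a real network each aggregation would be followed by learned weight matrices and non-linearities, and several such blocks would be stacked before a classifier whose output size corresponds to num_classes.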
2. Multimodal sports video classification
Recently, all kinds of ice and snow sports videos have attracted wide attention. To extract users' real points of interest and high-level semantic information, enterprises need to understand a video's text, audio, and image data from multiple angles. PaddlePaddle and Baidu Intelligent Cloud present a multimodal classification task that assigns each video multiple labels describing its content, for use in recommendation-system scenarios such as content selection and delivery. It is good news for people working in culture, entertainment, and media.
Scenario challenges
Video labels carry high-level semantics that are hard to express with single-modality features; high-quality video classification data is limited; and extracting high-level semantic features from the corresponding images, audio, and text is difficult;
There is a semantic gap between modalities, cross-modal interaction is challenging, and different modalities may interfere with one another;
Videos with mixed themes and long videos are hard to process, and a single modality may be noisy or missing, which places high demands on model robustness.
Given these difficulties, the practice example extracts features from three modalities (text, video frames, and audio), fuses the features, and then performs multi-label classification. Compared with using video image features alone, this significantly improves results on high-level semantic labels.
This example summarizes several optimization lessons: using ERNIE, a powerful model pre-trained with entity information, to improve text representation; freezing ERNIE's parameters and adding a TextCNN after it to learn in-domain knowledge, which speeds up model training; and using multimodal cross-attention to improve interaction between modalities. The model finally reaches 85.59% accuracy.
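The fusion and multi-label steps described above can be sketched roughly as follows. This is an illustrative toy, not the project's model: a single text vector attends over per-frame image vectors with scaled dot-product cross-attention, the result is concatenated with the text vector, and a sigmoid per label decides which tags apply. All vectors, weights, label names, and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """One modality's vector (query) attends over another modality's vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dims = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]


def linear(vector, weights, biases):
    """Plain fully connected layer: one row of weights per output label."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


def multi_label_predict(logits, labels, threshold=0.5):
    """Independent sigmoid per label, so a video can receive several tags."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(labels, probs) if p >= threshold]


# Hypothetical features: one text embedding and two frame embeddings.
text_vec = [1.0, 0.0]
frame_vecs = [[1.0, 0.0], [0.0, 1.0]]

attended = cross_attention(text_vec, frame_vecs, frame_vecs)
fused = text_vec + attended  # list concatenation: text features + attended image features

# Made-up classifier weights for three made-up labels.
LABELS = ["skiing", "skating", "interview"]
WEIGHTS = [[2.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [-2.0, 0.0, -1.0, 0.0]]
BIASES = [0.0, 0.0, 0.0]
print(multi_label_predict(linear(fused, WEIGHTS, BIASES), LABELS))
```

The design point is that the labels are scored independently with a sigmoid rather than competing through a single softmax, which is what allows one video to carry multiple labels.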
3. Football video clipping
Sports highlight videos need fast, high-quality automatic editing tools. Professional sports training needs data support: teams study themselves and their opponents by reviewing match or daily training videos and run tactical drills. The media industry also needs tools to extract the required video content and produce timely news material.
Scenario challenges
Action detection is highly complex: the key to video clipping is accurately locating the start and end points of each action. But sports videos often contain a lot of redundant background information, the actions are varied and relatively short, and both the start point and the category of each action must be determined accurately, so the task is difficult;
Videos contain diverse information, and the question is how to use these features effectively.
To solve these problems, we chose TSN+BMN+LSTM as the base model scheme to ensure the accuracy of clip extraction. The optimization strategies include using the PaddlePaddle feature models PP-TSM, TSN, and TSM to extract video image features, data augmentation, and extending the temporal action proposals. The final accuracy is 91%, and the F1-score reaches 76.2%.
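Because the task hinges on finding the start and end of each action, clip-level results are often scored by how well predicted segments overlap the annotated ones. The sketch below shows one common way to do this: temporal IoU between segments, then an F1-score over matched segments. The article does not state how its F1-score was computed, so the matching rule, the 0.5 threshold, and the example segments here are assumptions.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) time spans in seconds."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds, truths, iou_threshold=0.5):
    """F1 over (start, end, label) segments.

    A prediction counts as correct when it has the same label as a
    not-yet-matched ground-truth segment and overlaps it enough.
    """
    matched = set()
    true_positives = 0
    for pred in preds:
        for index, truth in enumerate(truths):
            if index in matched or pred[2] != truth[2]:
                continue
            if temporal_iou(pred, truth) >= iou_threshold:
                matched.add(index)
                true_positives += 1
                break
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical segments: the goal is localized well, the corner kick is not.
predicted = [(0.0, 10.0, "goal"), (20.0, 30.0, "corner")]
annotated = [(1.0, 11.0, "goal"), (40.0, 50.0, "corner")]
print(f1_at_iou(predicted, annotated))  # 0.5
```

Raising iou_threshold makes the metric stricter about boundary accuracy, which is the property that matters most when the output is used to cut clips automatically.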
Industrial practice example tutorials
Helping enterprises cross the AI deployment gap
PaddlePaddle's industrial practice examples aim to accelerate AI's path to industrial deployment and narrow the gap between theory and industrial application. The examples come from real business scenarios and, through complete code, analyze the solution process from data preparation to model deployment. They can be called an "autopilot" for industrial deployment.
Real industrial scenarios: built together with enterprises that have real AI applications, selecting AI scenarios in high demand, such as helmet detection for smart cities and meter reading for intelligent manufacturing;
Complete code implementation: code that runs with one click, using free computing power through the Notebook feature of the "AI Studio one-stop development platform";
Detailed process analysis: an in-depth walk through the whole AI deployment process, from data preparation and processing through model selection, model optimization, and deployment, sharing reusable model tuning and optimization experience;
Straight to project deployment: senior Baidu engineers teach the whole code workflow hands-on, so users can move easily into the project POC stage.
Course preview
The three sports scenarios above have been built into industrial practice examples that everyone can quickly try and apply. In addition, we have prepared corresponding course sessions. On February 17, 20:00-21:30, Professor Liu of Dalian University of Technology and senior Baidu engineers will give an in-depth analysis of the whole development process, from data preparation and solution design to model optimization and deployment, and teach the code hands-on.
You are welcome to scan the QR code to join the group and get free links to the live classes and replay videos, plus a chance to receive the PaddlePaddle industrial practice example handbook covering smart cities, industrial manufacturing, finance, the Internet, and other industries! Interested enterprises and developers are also welcome to contact us to exchange technology and discuss cooperation.