Deep learning: a survey of behavior recognition
2022-07-27 18:22:00 【sky_ Zhe】
A survey of behavior recognition
Depending on the recognition technology used, mainstream human behavior recognition currently falls into three categories:
behavior recognition based on computer vision,
behavior recognition based on sensor systems,
behavior recognition based on multimodal data.
(1) Behavior recognition based on computer vision has been studied for many years, and researchers in China and abroad have built a variety of frameworks for human detection in computer vision.
These methods are mainly divided into video-based methods and image-based methods.
The key technologies involved include object detection, object tracking, temporal action classification, human keypoint detection, gesture recognition, optical flow analysis, human segmentation, attribute analysis, and gait recognition. With the rapid development of deep learning, these key technologies have made breakthroughs, and vision-based behavior recognition algorithms are now widely used across industries.
(2) Behavior recognition based on sensor systems has been strongly promoted by progress in artificial intelligence, and using sensors to recognize human behavior has become an important branch of the field. This approach mainly uses sensors and sensor networks to capture user behavior. Compared with vision-based human behavior recognition, it requires less upfront investment and less complex equipment, and it offers greater spatial freedom.
(3) Behavior recognition based on multimodal data: with the rise of various new sensors in recent years, multimodal human behavior recognition has gradually become a new research hotspot in the field.
In summary, the basic multimodal human behavior recognition process is: multimodal dataset acquisition, data preprocessing, feature extraction and selection, and the human behavior recognition algorithm. Integrating this approach with the frameworks used in computer vision methods is feasible, and multimodal fusion analysis can improve recognition accuracy and bring a better user experience.
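The four stages listed above can be sketched roughly as follows. This is only a minimal illustration, not a method from any particular paper: the two toy signals, the two features (energy and mean absolute first difference), the linear classifier weights `CLASS_WEIGHTS`, and the choice of late fusion by weighted averaging are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence


def preprocess(signal: Sequence[float]) -> List[float]:
    """Data preprocessing: remove the constant offset of a 1-D signal."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def extract_features(signal: Sequence[float]) -> List[float]:
    """Feature extraction: signal energy and mean absolute first difference."""
    energy = sum(x * x for x in signal) / len(signal)
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    roughness = sum(diffs) / len(diffs) if diffs else 0.0
    return [energy, roughness]


def class_probabilities(features: Sequence[float],
                        weights: Sequence[Sequence[float]]) -> List[float]:
    """Recognition: linear score per class followed by a softmax."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(per_modality_probs: Dict[str, List[float]],
         modality_weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted average of the per-modality class probabilities."""
    total = sum(modality_weights[name] for name in per_modality_probs)
    num_classes = len(next(iter(per_modality_probs.values())))
    fused = [0.0] * num_classes
    for name, probs in per_modality_probs.items():
        share = modality_weights[name] / total
        for c, p in enumerate(probs):
            fused[c] += share * p
    return fused


# Toy signals standing in for two modalities (made-up values).
accelerometer = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
joint_angle = [0.5, 0.5, 0.6, 0.6, 0.5, 0.5]

# Hypothetical linear classifier: one weight row per class (moving, still).
CLASS_WEIGHTS = [[1.0, 1.0], [-1.0, -1.0]]

per_modality = {
    name: class_probabilities(extract_features(preprocess(signal)), CLASS_WEIGHTS)
    for name, signal in (("accelerometer", accelerometer),
                         ("joint_angle", joint_angle))
}
fused = fuse(per_modality, {"accelerometer": 0.5, "joint_angle": 0.5})
```

Late fusion is used here only because it keeps each modality's pipeline independent; fusing at the feature level is an equally valid design and would replace `fuse` with a concatenation step before the classifier.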
Research directions
Chinese Academy of Sciences
Website of the Center for Research on Intelligent Perception and Computing (CRIPAC): http://www.cripac.ia.ac.cn/CN/column/column147.shtml
Bioinspired Intelligent Computing
The biologically inspired intelligent computing direction aims to fully simulate and learn from the neural structure, cognitive mechanisms, autonomous learning, and intelligent evolution of humans and primates. It studies new artificial intelligence theories and methods that are highly robust, adaptive, and interpretable, and validates them on application problems such as multi-source information fusion and understanding, multi-sensor vision measurement, scene perception and behavior understanding in open environments, and human-machine symbiosis and hybrid intelligence.
The research focus of this direction includes three parts:
1) Biologically inspired autonomous learning theory: pursue collaborative research on developing artificial intelligence and understanding human intelligence, and study new theories and methods of human-like autonomous learning from two directions, heuristic modeling and mechanism understanding;
2) Open-environment perception and understanding: targeting characteristics of open environments such as “few samples, weak annotation, broad categories, and shifting distributions”, study technologies such as multi-degree-of-freedom information fusion and understanding, visual measurement of multi-source heterogeneous information, and robust scene perception and behavior understanding;
3) Intelligent self-evolution: simulate the evolutionary trajectory of biological intelligence and, based on a three-way “human-machine-object” game, study autonomous intelligent evolution technology to overcome the small-data, unsupervised, and unexplainable problems that hamper the current development of artificial intelligence.
In recent years this direction has published more than 100 papers in authoritative international journals and high-level conferences, including more than 20 in CCF-A journals and conferences; has won many influential awards in China and abroad (including the BICS2016 best paper award); and has undertaken many research projects, including key national R&D programs, key NSFC projects, and military equipment pre-research projects. In the future it will continue in-depth research on transformative AI technology and provide advanced AI solutions for national defense, industrial manufacturing, and daily life.
Multimodal Intelligent Computing
The multimodal intelligent computing direction targets large-scale multimodal data such as text, images, and video, and carries out theoretical and applied research in pattern recognition, visual computing, machine learning, data mining, and related areas.
The research in this direction mainly includes:
(1) Intelligent analysis of multimodal data based on deep learning. Study deep-learning-based methods and applications for multimodal fusion of images, text, and speech, cross-modal data retrieval, and cross-modal data generation.
(2) Large-scale visual computing methods and applications based on deep learning. Study how to effectively integrate a top-down feedback mechanism into feedforward deep networks, and how to integrate an active vision mechanism into feedforward and feedback deep networks, in order to solve a series of visual tasks in large-scale visual data analysis, such as object recognition, object detection, video segmentation, and video understanding.
(3) Intelligent visual surveillance technology for public safety. To meet the need for intelligent analysis of massive surveillance video in big data environments, study key technologies such as object detection in large-scale complex surveillance scenes, motion tracking, attribute recognition, cross-scene object recognition, and action, behavior, and event recognition; build a video big data analysis platform; and solve problems urgently needed in national public security, such as large-scale target retrieval and abnormal behavior detection.
(4) Intelligent processing of network big data for public security and business intelligence. Driven by the practical needs of public security and enterprise applications, study core problems such as time series prediction on big data, situational modeling, and user profiling; break through key technologies for intelligent analysis and processing of large-scale network data; and serve the needs of national public security and enterprise business intelligence.
Directions for improvement
In behavior recognition based on graph convolution and similar work, research focuses on the following aspects:
1. How to design the input of the GCN, using more discriminative features in place of raw spatial coordinates as the network input.
2. How to define the convolution operation, which is a very hard-core problem.
3. How to design the adjacency matrix.
4. How to determine the weight allocation strategy.
The adjacency matrix and the weight matrix are both very important in a GCN. The weight matrix usually does not change with the structure of the graph; that is, it is shared not only among different nodes but also across different graph structures, so a GCN can be trained and tested on graphs with different structures.
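To make the roles of the two matrices concrete, here is a minimal sketch of one spatial graph convolution layer in plain Python. It assumes the widely used symmetric normalization with self-loops, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), which the text above does not specify; the toy chain graphs, `SHARED_WEIGHT`, and the helper names are made up for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """Aggregate neighbor features, project with the shared weight, apply ReLU."""
    aggregated = matmul(normalize_adjacency(adj), features)
    projected = matmul(aggregated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# One weight matrix (2 input channels -> 3 output channels) reused on
# two graphs with different numbers of nodes.
SHARED_WEIGHT = [[1.0, 0.0, -1.0],
                 [0.0, 1.0, 1.0]]

chain3 = [[0.0, 1.0, 0.0],
          [1.0, 0.0, 1.0],
          [0.0, 1.0, 0.0]]
chain4 = [[0.0, 1.0, 0.0, 0.0],
          [1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0],
          [0.0, 0.0, 1.0, 0.0]]

out3 = gcn_layer(chain3, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], SHARED_WEIGHT)
out4 = gcn_layer(chain4, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],
                 SHARED_WEIGHT)
```

Because `SHARED_WEIGHT` only depends on the channel dimensions, the same layer runs on the 3-node and the 4-node graph; only the adjacency matrix changes.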
Behavior recognition is a special case, however, because the human skeleton usually does not change and the skeleton provided by a given dataset is fixed. In this case we do not need to consider the generality of the GCN across different structures; instead, we can assign weights directly to each joint. In other words, each node now has its own weight rather than sharing weights with other nodes according to a labeling strategy. This lets the network treat each joint differently and pay more attention to the joints with stronger discriminative power.
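One possible reading of "each node has its own weight" is sketched below: after neighbor aggregation, every joint is projected with its own weight matrix instead of a shared one. This is an illustrative assumption rather than the formulation of a specific paper; the function name `per_joint_gcn_layer`, the already-normalized adjacency `norm_adj`, and the toy values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def per_joint_gcn_layer(norm_adj: Matrix,
                        features: Matrix,
                        joint_weights: List[Matrix]) -> Matrix:
    """Graph convolution where joint v uses its own matrix joint_weights[v].

    norm_adj      : (V x V) normalized adjacency of the fixed skeleton
    features      : (V x C_in) input features, one row per joint
    joint_weights : V matrices, each (C_in x C_out)
    """
    num_joints = len(features)
    in_channels = len(features[0])
    outputs = []
    for v in range(num_joints):
        # Aggregate features from the neighbors of joint v.
        aggregated = [sum(norm_adj[v][u] * features[u][c]
                          for u in range(num_joints))
                      for c in range(in_channels)]
        # Project with the weight matrix that belongs to joint v only.
        w = joint_weights[v]
        outputs.append([sum(aggregated[c] * w[c][k] for c in range(in_channels))
                        for k in range(len(w[0]))])
    return outputs


# Two joints, identity adjacency so the effect of each weight is easy to see.
toy_output = per_joint_gcn_layer(
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[[1.0], [0.0]],   # joint 0 keeps only its first channel
     [[0.0], [2.0]]],  # joint 1 doubles its second channel
)
```

The cost of this design is that the parameter count grows with the number of joints and the layer no longer transfers to a skeleton with a different joint layout, which is acceptable only because the skeleton is fixed.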
In addition, learning the adjacency matrix automatically is also a good idea, although it is harder to implement in code.
Spatial-domain graph convolutional networks have reached unprecedented accuracy on the NTU RGB+D dataset [7], and further improvement there will probably be difficult. However, the ROSE Lab at Nanyang Technological University has released the new NTU RGB+D 120 dataset [8], more and more work focuses on pose recognition from 2D skeletons, and the corresponding Kinetics dataset is more challenging, so this field remains valuable and promising. In addition, spectral convolution has received great attention in recent years, but so far the author has found only one pose-recognition paper that uses it. The author believes the main reason is that spectral convolution is much more complex than spatial convolution, which deters many people; in the author's view, though, more complex methods tend to perform better, so the next article will analyze the principle of spectral convolution in detail, along with related behavior recognition work.
Judging from the current trend of top-conference papers, the work is getting more and more complicated. If you are aiming for a top conference, it is necessary to focus on ideas 1 and 2; for a second-tier venue, you can start from ideas 3 and 4. In addition, as far as possible, follow work that has been published at top conferences, has been peer reviewed, and has released source code; this effectively reduces the difficulty of the work.
Other research directions
Other research directions of behavior recognition
Data augmentation: some papers report that color jitter and random flipping help somewhat; other augmentations have not been verified.
Domain adaptation (a form of transfer learning)
Neural architecture search (NAS): a game for those with abundant compute, hard for everyone else to take part in
Efficient model deployment (models are hard to deploy to real scenarios, presumably surveillance scenarios):
The main problems:
Most models are designed and trained for the offline setting; that is, they receive one complete video at a time rather than an online video stream.
Most models cannot run in real time.
3D convolutions and other non-standard ops are hard to deploy.
Many 2D techniques can be applied to behavior recognition, such as model compression, quantization, and pruning.
Better datasets and more appropriate performance metrics may be needed.
Compressed video could be used directly, since most videos are already compressed.
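The gap between the offline and the online setting mentioned above can be illustrated with a small sketch: instead of waiting for a complete video, a fixed-length buffer is filled from the stream and a clip is emitted every few frames. The function name `sliding_clips` and the `window` and `stride` parameters are assumptions for this example; a real system would pass each clip to its recognition model.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sliding_clips(frames: Iterable[T], window: int, stride: int) -> Iterator[List[T]]:
    """Yield the latest `window` frames every `stride` frames of a stream."""
    buffer: deque = deque(maxlen=window)  # old frames drop out automatically
    since_last = 0
    for frame in frames:
        buffer.append(frame)
        since_last += 1
        # Emit only when the buffer is full and enough new frames have arrived.
        if len(buffer) == window and since_last >= stride:
            since_last = 0
            yield list(buffer)


# Integers stand in for video frames.
clips = list(sliding_clips(range(6), window=4, stride=2))
```

A smaller `stride` lowers latency but makes the model run more often, which ties directly to the real-time constraint listed above.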
New datasets:
Most existing datasets are biased toward spatial information; that is, the behavior category can be judged from a single frame, without any motion information.
Downloading a large amount of data from YouTube with a single ID is painful.
Adversarial attacks on video
Zero-shot learning
Weakly supervised learning
Fine-grained classification
First-person (egocentric) behavior recognition
Multimodal
Self-supervised learning
Person re-identification (Person Re-Identification)
Multimodal
Skeleton-based action recognition
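Returning to the data augmentation item in the list above, the two augmentations it names can be sketched for a video clip as follows. This is a minimal sketch under stated assumptions: a clip is a nested list indexed as frame, row, pixel, channel with values in [0, 1]; color jitter is reduced to a single brightness factor; and the function names and default parameters are made up. The one video-specific point is that the same random decision is applied to every frame of a clip, so the motion stays consistent.

```python
import random
from typing import List

Clip = List[List[List[List[float]]]]  # frames x height x width x channels


def horizontal_flip(clip: Clip) -> Clip:
    """Mirror every frame left-to-right by reversing each row of pixels."""
    return [[row[::-1] for row in frame] for frame in clip]


def brightness_jitter(clip: Clip, factor: float) -> Clip:
    """Scale all pixel values by one factor, clamped to the range [0, 1]."""
    return [[[[min(1.0, max(0.0, value * factor)) for value in pixel]
              for pixel in row]
             for row in frame]
            for frame in clip]


def augment_clip(clip: Clip, rng: random.Random,
                 flip_prob: float = 0.5, max_jitter: float = 0.2) -> Clip:
    """Draw one flip decision and one brightness factor for the whole clip."""
    if rng.random() < flip_prob:
        clip = horizontal_flip(clip)
    factor = 1.0 + rng.uniform(-max_jitter, max_jitter)
    return brightness_jitter(clip, factor)


# One frame, one row, two single-channel pixels.
toy_clip = [[[[0.1], [0.9]]]]
augmented = augment_clip(toy_clip, random.Random(0))
```

Flipping may be unsuitable for classes whose label depends on left or right direction, so it is worth checking the label set before enabling it.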