The Sad History of Image Processing Technology
2022-07-31 02:11:00 【IT Geek Gang】
Back in the 1980s, you were an algorithm engineer, and your boss asked you to write a program to recognize the shapes in the picture below.
After thinking hard, you finally found the Harris corner detection algorithm, which lets you tell shapes apart by counting their corners.
As the figure above shows, detection works by sliding a detection operator (also known as a filter kernel, which is essentially a small matrix) across the image. At each position, the operator and the image patch underneath it are combined with an inner product.
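The sliding inner product can be sketched in a few lines of plain Python. This is only an illustration: the function name `correlate2d`, the toy 4x4 image, and the choice of a Sobel kernel are assumptions made for the example, not something taken from the original figure.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and take the inner product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            # Inner product between the kernel and the patch under it
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Toy image: dark on the left, bright on the right (a vertical edge)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-designed kernel (Sobel) that responds to horizontal brightness changes
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

response = correlate2d(image, sobel_x)
print(response)  # [[4, 4], [4, 4]]
```

Every window in this tiny image straddles the edge, so every output value is large. On a flat region the same kernel would return 0.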
Although the principle is simple, designing a good filter kernel is not. It takes experienced scientists and long-term research and experimentation.
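To make the corner idea concrete, here is a rough sketch of a Harris-style corner response, R = det(M) - k * trace(M)^2, where M sums products of image gradients over a small window. It is simplified on purpose: the gradients are plain central differences, the window is an unweighted 3x3 box rather than a Gaussian, and the function name and test image are made up for illustration.

```python
def harris_response(img, k=0.04):
    """Return a Harris-style corner response for each pixel of a 2D list `img`."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Replicate border pixels when an index falls outside the image
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference gradients in x and y
    ix = [[(px(y, x + 1) - px(y, x - 1)) / 2 for x in range(w)] for y in range(h)]
    iy = [[(px(y + 1, x) - px(y - 1, x)) / 2 for x in range(w)] for y in range(h)]

    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sxx = syy = sxy = 0.0
            # Sum gradient products over an unweighted 3x3 window
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        sxx += ix[yy][xx] ** 2
                        syy += iy[yy][xx] ** 2
                        sxy += ix[yy][xx] * iy[yy][xx]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


# 12x12 black image with a white square covering rows and columns 3..8
img = [[1.0 if 3 <= y <= 8 and 3 <= x <= 8 else 0.0 for x in range(12)]
       for y in range(12)]

r = harris_response(img)
corner, edge, flat = r[3][3], r[3][5], r[5][5]
print(corner > 0, edge < 0, flat == 0)  # True True True
```

The response is positive at a corner, negative along an edge, and zero in a flat region. Actually counting corners would also need a threshold and non-maximum suppression, which this sketch leaves out.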
In the 1990s, your boss heard that machine learning was all the rage and, wanting to keep up with the trend, asked you to use machine learning for this shape classification task.
After thinking hard, you came up with an implementation plan:
1 Collect a large number of shape pictures, 80% for training and 20% for validation
2 Design multiple binary classifiers (logistic regression), each used to recognize only one kind of shape
3 Use the Harris corner detector to extract corner features first
4 Flatten the feature matrix into a one-dimensional vector and feed it to the machine learning algorithm
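The plan above can be sketched roughly as follows. The 2x2 "corner maps" are hypothetical stand-ins for real corner-detector output, and the train/validation split is skipped because the toy dataset has only three samples. All function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(samples, targets, lr=0.5, epochs=300):
    """Batch gradient descent for a single logistic-regression classifier."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(samples, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(models, x):
    """Pick the class whose binary classifier is most confident."""
    scores = {name: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for name, (w, b) in models.items()}
    return max(scores, key=scores.get)


# Hypothetical corner maps: 1 means "a corner was detected in this quadrant"
corner_maps = {
    "triangle": [[1, 0], [1, 1]],
    "square":   [[1, 1], [1, 1]],
    "circle":   [[0, 0], [0, 0]],
}

names = list(corner_maps)
# Flatten each feature matrix into a one-dimensional vector
X = [[float(v) for row in corner_maps[n] for v in row] for n in names]

# One binary classifier per shape (one-vs-rest)
models = {
    name: train_binary(X, [1.0 if n == name else 0.0 for n in names])
    for name in names
}

predictions = [predict(models, x) for x in X]
print(predictions)  # ['triangle', 'square', 'circle']
```

Each shape needs its own classifier, so adding a new shape means training one more model.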
Although the task is completed, the solution still relies on the corner detector and requires training several classification models.
The millennium is fast approaching, the term "neural network" is on everyone's lips, and your wide-eyed boss now hopes you can use neural networks for the shape classification task.
You ponder for a while and soon come up with a plan: still extract features with the corner detector, then feed those features into a neural network.
But you have been worn down by your boss and have grown more and more fed up with the corner detector. You worry that one day, when the shapes are replaced with different ones, the corner detector will stop working, so you make up your mind to drop it.
After much thinking, you finally have an idea:
You keep the previous design, but the input layer no longer takes the corner features extracted by the corner detector. It takes the image matrix directly. According to the universal approximation theorem, a neural network with even a single hidden layer can approximate arbitrarily complex functions (given enough hidden units), so you trust the network to extract features automatically.
In addition, the output layer is no longer a single logistic regression but a Softmax multi-class classifier.
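As a minimal sketch of that design, the forward pass below flattens a toy image, sends it through one hidden layer, and ends in a Softmax output. The layer sizes, the `tanh` activation, and the random untrained weights are all assumptions chosen to keep the example small. A real model would learn the weights from data.

```python
import math
import random


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights in [-1, 1] and zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)

# Toy 3x3 image fed in directly, with no corner detector in front
image = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
x = [float(v) for row in image for v in row]  # flatten to a 9-element vector

w1, b1 = random_layer(4, len(x), rng)  # hidden layer with 4 units
w2, b2 = random_layer(3, 4, rng)       # output layer with 3 classes

hidden = [math.tanh(h) for h in dense(x, w1, b1)]
probs = softmax(dense(hidden, w2, b2))
print(probs)  # three class probabilities
```

A single Softmax output replaces the separate binary classifiers: the three probabilities sum to 1 and the largest one gives the predicted class.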
Although the task is completed, you find that this method depends on where the shape sits within the image. It lacks so-called translation invariance. For example, if the shapes in the training dataset are all in the center of the image and you validate with a picture whose shape is in the upper left corner, the model classifies it inaccurately.
Time flies, and soon it is the 21st century. Thanks to advances in hardware, convolutional neural networks have become popular in the CV field. Although your boss is getting on in years, he is still ambitious and hopes you can use convolutional neural networks for the image classification task.
You stroke your few remaining hairs and finally come up with a plan.
Compared with a fully connected neural network, a convolutional neural network has fewer parameters and runs faster. Because the convolution kernel slides over the image, the whole image shares the same kernel parameters, which gives the model a degree of translation invariance. More importantly, the parameters of the convolution kernel no longer require the prior knowledge of experts. The model learns them itself during training.
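The effect of weight sharing can be illustrated with a toy comparison. Strictly speaking, convolution alone is translation-equivariant (the response moves with the shape), and it is the pooling step that makes the final score invariant. Global max pooling is used here as an assumption to show that. The kernel is hand-set to match the pattern rather than learned, and all names are illustrative.

```python
def conv_valid(image, kernel):
    """Slide one shared kernel over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(image[y + i][x + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for x in range(len(image[0]) - kw + 1)]
        for y in range(len(image) - kh + 1)
    ]


def global_max(feature_map):
    """Keep only the strongest response, wherever it occurred."""
    return max(max(row) for row in feature_map)


def place(pattern, top, left, size=6):
    """Draw `pattern` on an empty size x size canvas at (top, left)."""
    canvas = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, v in enumerate(row):
            canvas[top + i][left + j] = v
    return canvas


pattern = [[1, 1], [1, 0]]  # a small corner-like shape
kernel = pattern            # stand-in for a kernel the model would learn

centered = place(pattern, 2, 2)
shifted = place(pattern, 0, 0)  # same shape, moved to the upper left

# Convolution + pooling: the score does not depend on position
conv_centered = global_max(conv_valid(centered, kernel))
conv_shifted = global_max(conv_valid(shifted, kernel))

# Fully connected unit: one weight per pixel position, here tuned to the centered shape
dense_weights = [v for row in centered for v in row]
dense_centered = sum(w * v for w, v in zip(dense_weights, (p for r in centered for p in r)))
dense_shifted = sum(w * v for w, v in zip(dense_weights, (p for r in shifted for p in r)))

print(conv_centered, conv_shifted)    # 3 3
print(dense_centered, dense_shifted)  # 3 0
```

The shared 2x2 kernel has 4 parameters regardless of image size, while the fully connected unit needs one weight per pixel (36 here) and stops responding once the shape moves.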
Now it is 2022, and you will soon reach retirement age. After years of hard work, your image classification software works very well, but no matter how you change the model, tune the parameters, or add data, the accuracy improves only slightly. After a lot of thought, you finally come up with a solution, and you want to give your boss one last surprise.
Your solution is to apply the Transformer, originally developed for natural language processing, to computer vision. This is known as the Vision Transformer (ViT). The core idea is to use the self-attention mechanism to focus on the parts of the image that matter for the task and pay less attention to the parts that matter less.
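A heavily simplified sketch of self-attention over image patches is shown below. It assumes Q = K = V = the raw patch vectors, whereas a real ViT adds learned linear projections, position embeddings, multiple heads, and many stacked layers. The function names and the toy image are made up for illustration.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_patches(image, p):
    """Cut the image into non-overlapping p x p patches and flatten each one."""
    patches = []
    for top in range(0, len(image), p):
        for left in range(0, len(image[0]), p):
            patches.append([image[top + i][left + j]
                            for i in range(p) for j in range(p)])
    return patches


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens (no learned projections)."""
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted average of all tokens
    outputs = [[sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
               for row in weights]
    return weights, outputs


# 4x4 image: a bright foreground block in the upper left, dark background elsewhere
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

tokens = to_patches(image, 2)  # four patches of four pixels each
weights, outputs = self_attention(tokens)

# How much the foreground patch attends to each patch (itself first)
print([round(w, 3) for w in weights[0]])  # [0.711, 0.096, 0.096, 0.096]
```

The foreground patch puts most of its attention on itself and little on the empty background patches. In a trained model, the learned projections decide which patches count as relevant to the task.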
Looking at the feature maps during training, you find that the model pays more attention to the task-relevant foreground and ignores the task-irrelevant background.
You have finally retired. The sad history of being squeezed by your boss is also the history of the development of image processing technology.