Digit recognition system based on OpenCV
2022-07-04 15:44:00 【Xiaobai learns vision】
Background
Back in 2012, an iOS gas-tracking app called FuelMate was released. Users can track their gas mileage with it, and it has some fun extras such as an Apple Watch app, vin.li integration, and trend-based mpg visualizations.
FuelMate
We had a new idea: what if we added a feature that scans the display on the pump and enters the fuel information into the app? Let's dig into how to achieve this.
Technology
For this project, we first write a simple Python application that takes an image of a gas pump and tries to read the digits from it. OpenCV is a popular cross-platform library for computer vision applications. It includes a variety of image processing utilities and some machine learning functionality. The plan is to prototype in Python, then convert the processing code to C++ so it can run in the iOS app.
Goals
We first need to answer two questions:
1. Can we separate numbers from images ?
2. Can we determine which number the image represents ?
Digit segmentation
There are many ways to locate the digits in an image, but I propose using simple image thresholding to try to find them.
The basic idea of image thresholding is to convert the image to grayscale, then decide that any pixel whose gray value is below some chosen constant takes one value, and every other pixel takes another. The result is a binary image with only two colors, in most cases simply black and white.
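To make the thresholding rule concrete, here is a minimal sketch in plain NumPy. The function name `threshold_gray` and the cutoff of 128 are illustrative assumptions, not values from the project.

```python
import numpy as np

def threshold_gray(gray, cutoff=128):
    """Return a binary image: 0 where gray < cutoff, 255 everywhere else.

    `cutoff` is an arbitrary illustrative constant, not a tuned value.
    """
    return np.where(gray < cutoff, 0, 255).astype(np.uint8)

# Tiny 2x2 grayscale "image" to show the rule pixel by pixel.
gray = np.array([[10, 200],
                 [127, 128]], dtype=np.uint8)
binary = threshold_gray(gray)
```

Every pixel ends up as either 0 or 255, which is the two-color image described above.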
This concept works very well in OCR applications, but the main problem is deciding what threshold to use. We can pick a constant, or use one of the other options OpenCV offers. Instead of a constant we can use adaptive thresholding, which looks at small regions of the image and determines a separate threshold for each. This is particularly useful under varying lighting conditions, which is exactly the situation when scanning gas pumps.
After thresholding the image, we can use OpenCV's findContours method to find regions of connected white pixels. Once the contours are found, we can crop out those regions and determine whether each could be a digit, and which digit it is.
Basic image processing flow
This is the original image I used to test the image processing. It has some glare spots, but the image is fairly clean. Let's walk through taking this source image and breaking it down into individual digits.
Original picture
Image preparation
Before starting the image processing flow, we decided to adjust some image properties first. This took some trial and error, but we noticed that we got better results when we adjusted the exposure of the image. Here is the image adjusted in Python. The exposure (alpha) adjustment is equivalent to cv::Mat::convertTo, which is just a multiplication on the image Mat: cv2.multiply(some_img, np.array([some_alpha]),
Adjust the exposure
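As a rough illustration of what an exposure (alpha) adjustment does, the sketch below scales pixel values and clips them to the valid range. It deliberately uses plain NumPy as a stand-in rather than the cv2.multiply call mentioned above, and the helper name `adjust_exposure` and the alpha of 1.5 are assumptions for the example.

```python
import numpy as np

def adjust_exposure(img, alpha):
    """Scale every pixel by `alpha` and saturate to the 0..255 range.

    NumPy stand-in for a multiply-style exposure adjustment; not the
    article's exact call.
    """
    scaled = img.astype(np.float32) * alpha
    return np.clip(scaled, 0, 255).astype(np.uint8)

img = np.array([[100, 200]], dtype=np.uint8)
brighter = adjust_exposure(img, 1.5)  # 100 -> 150, 200 saturates at 255
```

Values that would overflow are clamped at 255 instead of wrapping around, which is the behavior you want for exposure changes.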
Grayscale
Convert the image to grayscale .
Convert to grayscale
Blur
Blur the image to reduce noise. We tried many different blur options, but got the best results with only a slight blur.
Slightly blurred
Threshold the image to black and white
The illustration below uses cv2.adaptiveThreshold with the cv2.ADAPTIVE_THRESH_GAUSSIAN_C option. This method takes two parameters, a block size and a constant used to adjust the threshold. Finding good values for both takes some trial and error; more on that in the optimization section.
Thresholded to black and white
Fill in the gaps
Because most fuel pumps use some kind of 7-segment LCD display, there are small gaps within each digit, and contour detection would not treat a digit as one shape, so we need to make the segments look connected. In this case, we erode the image to close those gaps. That may seem backwards, since you might expect to use dilate, but these operations normally act on the white parts of the image. In our case, we are "eroding" the white background so that the black digits appear larger.
Eroded numbers
Invert image
Before trying to find contours in the image, we need to invert the colors, because findContours looks for connected white regions and the digits are currently black.
Color reversal
Find the contours in the image
The figure below shows our original image with a bounding box drawn around each contour found in the image above. You can see that it found the digits, but it also found a bunch of things that are not digits, so we need to filter those out.
The red box shows all found contours
Contour filtering
Now that we have many contours, we need to figure out which ones we care about. After looking through a pile of pump displays and scenarios, we came up with a quick set of rules for the contours:
1. Collect all square contours, which we classify as potential decimal points.
2. Throw away anything that is neither square nor a tall rectangle.
3. Match the contour against certain aspect ratios. Nine of the ten digits on an LCD display have an aspect ratio similar to the ones highlighted in the blue boxes below. The exception is the digit "1", whose aspect ratio is slightly different. Using some sample contours, I determined the aspect ratio of the digits 0-9 other than 1 to be 0.6, and that of 1 to be 0.3. We use these ratios with a +/- buffer to decide whether a contour is one we want, and collect those contours.
4. Apply an additional set of rules to the potential digits: determine whether a contour's bounds deviate from the average height or vertical position of all the other potential digits. Since the digits should all be the same size and aligned at the same Y position, we can discard any contour that was taken for a digit but is not aligned and sized like the others.
Blue rectangles mark our digits and decimal points; red ones are ignored
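The filtering rules can be sketched in plain Python over bounding boxes. The 0.6 and 0.3 aspect ratios come from the article; the buffer, the size tolerance, and the use of the median instead of the average (so one stray contour does not skew the reference) are assumptions made for this sketch.

```python
from statistics import median

DIGIT_ASPECT = 0.6     # from the article: digits other than "1"
ONE_ASPECT = 0.3       # from the article: the digit "1"
ASPECT_BUFFER = 0.15   # assumed +/- tolerance
SIZE_TOLERANCE = 0.2   # assumed allowed deviation, as a fraction of digit height

def classify_boxes(boxes):
    """Split (x, y, w, h) boxes into potential digits and potential decimal points."""
    digits, decimals = [], []
    for box in boxes:
        x, y, w, h = box
        aspect = w / h
        if abs(aspect - 1.0) <= ASPECT_BUFFER:
            decimals.append(box)            # roughly square: decimal candidate
        elif (abs(aspect - DIGIT_ASPECT) <= ASPECT_BUFFER
              or abs(aspect - ONE_ASPECT) <= ASPECT_BUFFER):
            digits.append(box)              # tall rectangle with a digit-like ratio
        # anything else is discarded
    if digits:
        ref_h = median(b[3] for b in digits)
        ref_y = median(b[1] for b in digits)
        limit = SIZE_TOLERANCE * ref_h
        # Keep only candidates sized and aligned like the others.
        digits = [b for b in digits
                  if abs(b[3] - ref_h) <= limit and abs(b[1] - ref_y) <= limit]
    return digits, decimals

boxes = [
    (10, 20, 30, 50),   # digit, aspect 0.6
    (50, 20, 15, 50),   # digit "1", aspect 0.3
    (80, 20, 30, 50),   # digit, aspect 0.6
    (75, 62, 8, 8),     # small square: decimal candidate
    (0, 0, 100, 20),    # wide rectangle: discarded
    (120, 70, 6, 10),   # digit-like ratio but wrong size and position: discarded
]
digits, decimals = classify_boxes(boxes)
```

The last box shows why the second pass matters: its aspect ratio looks like a digit, but it is far smaller than the other candidates and sits at a different height.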
Prediction
We now have two sets of contours, one of potential digits and one of potential decimal points. We can use the contour bounds to crop the image and feed each crop into the trained system to predict its value. For more about this process, see the digit training section below.
Finding the decimal point
Finding the decimal point in the image is another problem to solve. Because it is small and sometimes connected to the digit next to it, the method we use for digits does not work well for it. When filtering contours, we collected square contours that might be decimal points. After obtaining the validated digit contours from the previous step, we find the leftmost and rightmost x positions of the digits to determine the region where we expect a decimal point. We then go through the potential decimal points, check whether each lies within that region and in its lower half, and if so classify it as the decimal point. Once the decimal point is found, we can insert it into the digit string predicted above.
Find decimals only in the yellow part
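The decimal search described above could be sketched like this. The function name `insert_decimal`, the use of box centers, and the rule for choosing the insertion index are assumptions made for illustration.

```python
def insert_decimal(digit_boxes, digit_string, decimal_boxes):
    """Insert "." into digit_string if a decimal candidate sits in the expected region.

    digit_boxes are (x, y, w, h) tuples ordered left to right and aligned
    with the characters of digit_string.
    """
    left = min(x for x, _, _, _ in digit_boxes)
    right = max(x + w for x, _, w, _ in digit_boxes)
    top = min(y for _, y, _, _ in digit_boxes)
    bottom = max(y + h for _, y, _, h in digit_boxes)
    mid_y = (top + bottom) / 2

    for x, y, w, h in decimal_boxes:
        cx, cy = x + w / 2, y + h / 2
        # Must lie between the outermost digits and in the lower half.
        if left < cx < right and cy > mid_y:
            # Count the digits whose center is to the left of the decimal point.
            index = sum(1 for dx, _, dw, _ in digit_boxes if dx + dw / 2 < cx)
            return digit_string[:index] + "." + digit_string[index:]
    return digit_string

digit_boxes = [(10, 20, 30, 50), (50, 20, 30, 50), (95, 20, 30, 50)]
candidates = [(200, 10, 8, 8), (84, 62, 8, 8)]  # first one is outside the region
reading = insert_decimal(digit_boxes, "123", candidates)
```

A candidate in the upper half of the display, or outside the span of the digits, is ignored, so stray square contours do not end up in the reading.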
Digit training
In the world of machine learning, OCR is a classification problem. We build a training set, for example processed digit images, label each item as something, and then use that data to match any new image. Once the basic image isolation was working, I created a script that walks a folder of images, runs the digit isolation code, and saves the cropped digits to a new folder for me to review. After it ran, I had a set of unlabelled digit crops that could then be used to train the system.
Because OpenCV already includes a k-nearest neighbors (k-NN) implementation, there was no need to bring in any other library. For training, we went through the folder of digit crops and sorted them into new folders labelled 0-9, so each folder holds a collection of different versions of one digit. We don't have a lot of these images, but there are enough to show that this is feasible. Because these digits are fairly standardized, I don't think we need many training images to be quite accurate.
The basic working principle of k-NN here is that we load each image in black and white, store it in an array in which each pixel is on or off, and associate those on/off pixels with a particular digit. Then, when we want to predict a new image, it finds which training image best matches those pixels and returns the closest value.
After sorting the digits, I created a new script that walks these folders, reads each image, and associates it with its digit. Up to this point, the general image processing concepts in most of the code apply equally in Python and C++, but here there are subtle differences.
In most Python examples of this kind, the classification is written to two files: one contains the classes, the other the image content for those classes. This is usually done with NumPy and standard text files. However, because I wanted to reuse the system in the iOS app, I needed a way to have a cross-platform classification file. At the time I couldn't find anything, so I ended up writing a quick utility that takes the classification data from Python and serializes it into a JSON file that I can use on the C++ side with OpenCV's FileStorage system. It's not pretty, but I wrote a simple Mat serialization method in Python that creates a structure OpenCV can read on the iOS side. Now, when I train the digits, I get NumPy files for my Python testing and a JSON file that I can drag into my iOS application. The code for this is in the project repository.
Optimization
Once both goals, digit isolation and prediction, were met, we needed to tune the algorithm to predict the digits on new pump images.
For the first stage of optimization, I created a simple playground application using some of OpenCV's simple UI components. With these you can create simple trackbars, slide them left and right to change different values, and reprocess the image. I also created a small wrapper around cv2.imshow that tiles the displayed windows, because I hate constantly repositioning them.
Try different variables
We can load different images, try different variations of the image processing variables, and determine the best combination.
Automation
Testing different variables on each image is a good way to get started, but we wanted a better way to verify whether changing a variable for one image would affect any of the others. So we came up with an automated testing system for these images.
I took every test image and put them in a folder. Then I named each file with the number expected in the image, using "A" to represent the decimal point. The application loads each image in the directory, predicts the number, and compares it with the number in the file name to determine whether they match. This lets us quickly try a change against all the different images.
Automatic test output
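The file-name convention makes the comparison easy to script. In this sketch the helper names and the example paths are hypothetical, and the predictions are passed in as a dictionary rather than produced by the real pipeline.

```python
import os

def expected_from_filename(path):
    """Recover the expected reading from a file name, where "A" stands for "."."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.replace("A", ".")

def score(predictions):
    """Compare predicted strings with file names; return per-file results and accuracy.

    `predictions` maps an image path to the string predicted for that image.
    """
    results = {path: predicted == expected_from_filename(path)
               for path, predicted in predictions.items()}
    accuracy = sum(results.values()) / len(results) if results else 0.0
    return results, accuracy

# Hypothetical predictions for two test images.
results, accuracy = score({
    "tests/12A345.jpg": "12.345",   # matches the file name
    "tests/9A99.png": "9.90",       # last digit misread
})
```

Running this over the whole folder after every change gives a single accuracy number to compare against the previous run.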
Going further, I created a different version of this script that tries almost every combination of variables such as blur and threshold on this set of images and finds the set of variables that gives the best accuracy. The script took quite a long time to run on my computer, about 7 hours, but it eventually came up with a different set of variables that we had not found in our manual testing.
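An exhaustive search over variable combinations can be sketched with itertools.product. The parameter names, the candidate values, and the toy scoring function are all assumptions; in the real script the score would be the accuracy over the folder of test images.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best parameters and score.

    `grid` maps a parameter name to a list of candidate values;
    `evaluate` takes a dict of parameters and returns a score (higher is better).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = evaluate(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

# Hypothetical parameter grid; these are not the values used in the project.
grid = {"blur": [1, 3, 5], "block_size": [11, 31, 51], "c": [5, 10]}

# Toy stand-in for "accuracy over all test images": peaks at one combination.
def toy_score(params):
    return -(abs(params["blur"] - 3)
             + abs(params["block_size"] - 31)
             + abs(params["c"] - 10))

best_params, best_score = grid_search(toy_score, grid)
```

The number of combinations is the product of the list lengths, which is why a search like this over real images can take hours.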
Conclusion
Whether this is a feature anyone will actually use remains to be seen, but it was an interesting exercise in applying some machine learning concepts and using OpenCV. So far in our tests, the biggest problem for the application is glare on the pump display. Depending on the lighting at the pump and the angle of the phone, some scans may fail.
Code: https://github.com/kazmiekr/GasPumpOCR