Product Recognition for Intelligent Retail Cabinets Based on PaddleX

2022-07-04 14:07:00 【Short section senior】

Abstract

In traditional retail cabinets, automatic product recognition is mainly achieved through hardware separation, weight sensing, customer-behavior recognition, or RFID tags. Building on the strong performance of deep learning in image classification, this article studies PaddleX-based product recognition for intelligent retail cabinets. This approach is more efficient than having staff scan barcodes, having customers scan items at a self-service machine, or using the automatic methods listed above. The experiment follows the original project it reproduces: the model is trained with PaddleX, using PPYOLO or YOLOv3 as the detection model and ResNet50 as the backbone network. Training was cursory, so the results are not yet strong.
Keywords: image classification; PaddleX; convolutional neural networks; feature learning; image recognition

Introduction

With the rapid development of China's economy, disposable income has risen, and shoppers' attention has gradually shifted from price to the experience of the purchase itself. Yet in both large supermarkets and small convenience stores, checkout queues are common in densely populated areas and during peak periods such as weekends, which undoubtedly degrades the shopping experience. Relieving the queues by adding cashiers and checkout lanes raises labor costs too much. Self-service barcode scanning avoids that extra labor cost, but in essence it only transfers the scanning from the cashier to the customer. Along the way, customers may run into failed scans or be unable to complete payment, which makes checkout take even longer. The technology therefore remains cumbersome and inefficient. Applying advanced techniques such as big data and artificial intelligence to the sales process, replacing barcode-based checkout and building "new retail", has become an inevitable trend.
In recent years, automated retail stores and unmanned convenience stores have appeared in China and abroad, such as JD convenience stores and Alibaba's unmanned supermarket. This shows that using artificial intelligence to transform the sales process, and to make retail scenarios automated and unmanned, has become one of the hot topics in artificial intelligence. Compared with new retail, traditional retail involves many manual steps and little automation, which leads to high costs, low service efficiency, and a less comfortable experience. Advances in computer vision have made product recognition increasingly mature, and image-based product recognition can raise the level of automation, cut costs, and improve efficiency. Designing and developing a computer-vision-based system that automatically recognizes and checks out batches of products therefore has significant research and practical value.
Object detection is an important branch of image processing and computer vision and is widely used in many fields. The intelligent retail cabinet, a typical example of an intelligent retail system, can sell products automatically without a salesperson. In traditional retail cabinets, automatic recognition is mainly achieved through hardware separation, weight sensing, customer-behavior recognition, or RFID tags. These traditional methods are expensive, reduce the usable space in the cabinet, and limit the kinds of products it can hold. Product recognition is more efficient than having staff scan barcodes, having customers scan items at a self-service machine, or using the automatic methods above. It can raise store profits, and for customers it shortens the time spent waiting in line and makes purchasing more convenient.
Here, product detection means obtaining the position coordinates of each product in the image. In a product recognition system, recognition builds on detection: when an image contains several products, the recognition algorithm can determine a product's category only after its position in the image is known. If detection is accurate and the detected regions are precise, the image crops passed to the recognition algorithm contain far less background interference, which improves recognition accuracy. An object detection algorithm finds all objects of interest in an image, that is, it determines each object's position, usually as a rectangular bounding box, and classifies the object inside the box. This is one of the core problems in computer vision. Object detection algorithms can be used for product detection, but products vary widely in appearance and shape, customers place them on the checkout counter in all sorts of ways, and lighting and occlusion add further difficulty. All of this makes product detection a very challenging problem.
Method used in this project: following the original project, the model is trained with PaddleX, using PPYOLO or YOLOv3 as the detection model and ResNet50 as the backbone network. The dataset contains 5,422 images covering 113 product categories, so this is a multi-class problem.

1. Introduction to PaddleX

PaddleX integrates PaddlePaddle's intelligent vision capabilities for image classification, object detection, semantic segmentation, and instance segmentation. It connects the whole deep learning development workflow end to end, from data preparation through model training and optimization to multi-platform deployment, and provides a unified task API and a graphical development interface demo. Developers do not need to install separate packages and can complete the full PaddlePaddle development workflow quickly with little code.
PaddleX has been validated in real application scenarios across more than a dozen industries, including quality inspection, security, on-site inspection, remote sensing, retail, and healthcare. It distills this industry experience and provides a rich set of practical case tutorials to help developers put the whole workflow into practice.
Overall, PaddleX has three advantages:
First, an end-to-end workflow. It connects the whole deep learning development process, from data ingestion through model training, parameter tuning, and model evaluation to prediction and deployment, removing the glue code and script calls otherwise needed between stages and greatly improving development efficiency.
Second, an open-source technical core. It integrates PaddleCV's leading vision algorithms and task-oriented development kits, the pre-trained model tool PaddleHub, the visual analysis tool VisualDL, and the model compression tool PaddleSlim, and exposes them through a concise Python API. It is fully open source, which makes integration and secondary development easy.
Third, a close fit with industry. It is compatible with Windows, Mac, and Linux and supports NVIDIA GPU acceleration for deep learning training. Development runs locally, which keeps data secure and matches the practical needs of industrial applications.

2. Introduction to YOLO

YOLO is short for "You Only Look Once". It is not the most accurate algorithm, but it offers a good trade-off between accuracy and speed. YOLOv3 builds on YOLOv1 and YOLOv2. It introduces few new ideas, but it keeps the speed advantage of the YOLO family while improving detection accuracy, especially for small objects. YOLOv3 applies a single neural network to the whole image, divides the image into regions, and predicts bounding boxes and probabilities for each region.
YOLOv3 uses only convolutional layers, which makes it a fully convolutional network (FCN). The YOLOv3 paper proposes a new feature extraction network, Darknet-53. As the name suggests, it contains 53 convolutional layers, each followed by a batch normalization layer and a leaky ReLU layer. There are no pooling layers; convolutional layers with a stride of 2 downsample the feature maps instead, which helps prevent the loss of low-level features that pooling can cause.
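As a rough check on the downsampling described above, the sketch below computes the feature-map sizes after successive stride-2 convolutions and the number of output channels per grid cell in a YOLOv3-style prediction head. The 416x416 input size, the 3x3 kernel with padding 1, and the three anchors per grid cell are assumptions taken from common YOLOv3 configurations, not from this article; the 113 classes come from the dataset used here.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_sizes(input_size, num_stride2_convs=5):
    """Feature-map sizes after each stride-2 convolution in the backbone."""
    sizes = []
    size = input_size
    for _ in range(num_stride2_convs):
        size = conv_output_size(size)
        sizes.append(size)
    return sizes


def head_channels(num_classes, anchors_per_cell=3):
    """Channels per grid cell: every anchor predicts 4 box offsets,
    1 objectness score, and one score per class."""
    return anchors_per_cell * (4 + 1 + num_classes)


print(downsample_sizes(416))  # assumed input size; prints [208, 104, 52, 26, 13]
print(head_channels(113))     # 113 product classes; prints 354
```

Under these assumptions, five stride-2 convolutions reduce the input by a factor of 32, so a 416x416 image yields a 13x13 grid at the coarsest scale.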

3. Data preprocessing

The data was collected under a stable light source, so the images are well lit and of excellent quality. In real applications, however, lighting is very complex, and a laboratory setup cannot fully reproduce the light conditions of an actual deployment, so the input images need to be preprocessed. In addition to normalization, data augmentation such as image flipping is applied. Most importantly, the brightness and saturation of the images are adjusted to simulate complex external lighting and make the model more robust to illumination changes.
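The brightness, saturation, and flip augmentations described above can be illustrated with a minimal standard-library sketch. It is not the pipeline used in the project, which would normally rely on the training framework's own transforms; the function names, the jitter ranges, and the representation of an image as nested lists of RGB tuples are all assumptions made for illustration.

```python
import colorsys
import random


def adjust_pixel(rgb, brightness=1.0, saturation=1.0):
    """Scale the brightness (V) and saturation (S) of one 8-bit RGB pixel."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * saturation)
    v = min(1.0, v * brightness)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(c * 255) for c in (r, g, b))


def augment(image, rng, brightness_range=0.5, saturation_range=0.5,
            flip_prob=0.5):
    """Randomly jitter brightness and saturation, then maybe flip horizontally.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    brightness = 1.0 + rng.uniform(-brightness_range, brightness_range)
    saturation = 1.0 + rng.uniform(-saturation_range, saturation_range)
    out = [[adjust_pixel(p, brightness, saturation) for p in row]
           for row in image]
    if rng.random() < flip_prob:
        out = [row[::-1] for row in out]  # mirror each row left to right
    return out


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tiny_image = [[(200, 30, 30), (30, 200, 30)],
              [(30, 30, 200), (128, 128, 128)]]
augmented = augment(tiny_image, rng)
```

For a detection dataset, a horizontal flip must also mirror the x coordinates of every annotated bounding box; that step is omitted here.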
The data is divided into a training dataset and a training product library. The training dataset contains image data and annotations. The images show densely placed products, measure 960x720 pixels, and are stored in JPG format. The dataset uses the VOC format, which satisfies the dataset format requirements of most deep learning development kits, including PaddleX and PaddleDetection. It contains 5,422 images, all annotated, covering 113 product categories. The dataset is split into a training set of 3,796 images, a validation set of 1,084 images, and a test set of 542 images. Some samples are shown in the figure below:
[Figure: sample images from the dataset]
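The reported split sizes correspond to roughly 70% / 20% / 10% of the 5,422 images. The sketch below shows one way such a split could be produced. The ratios are inferred from the counts rather than stated in the source, and the file names, seed, and function name are hypothetical.

```python
import random


def split_dataset(items, val_ratio=0.2, test_ratio=0.1, seed=0):
    """Shuffle items and split them into train / validation / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    val_num = int(len(items) * val_ratio)
    test_num = int(len(items) * test_ratio)
    val = items[:val_num]
    test = items[val_num:val_num + test_num]
    train = items[val_num + test_num:]  # everything left over
    return train, val, test


# Hypothetical file names standing in for the 5,422 annotated images.
names = [f"img_{i:05d}.jpg" for i in range(5422)]
train, val, test = split_dataset(names)
print(len(train), len(val), len(test))  # 3796 1084 542
```

Truncating the validation and test counts and assigning the remainder to training reproduces the 3,796 / 1,084 / 542 figures given above.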

4. Model training

When training the model with PaddleX, we load the VOC-format data with VOCDetection.
PaddleDetection (https://github.com/PaddlePaddle/PaddleDetection) has more than 30 built-in model algorithms and more than 250 pre-trained models, covering object detection, instance segmentation, tracking, keypoint detection, and other directions. These include high-accuracy and lightweight industrial-grade SOTA models for both server and mobile, competition-winning solutions, and cutting-edge academic algorithms. It also provides configurable network module components, more than ten data augmentation strategies, loss functions, and other advanced optimization support, together with multiple deployment options. By connecting data processing, model development, training, compression, and deployment into one workflow, and by providing rich examples and tutorials, it speeds up the industrial adoption of these algorithms. PaddleDetection provides object detection, instance segmentation, multi-object tracking, and keypoint detection capabilities, and its application scenarios cover more than ten industries, including manufacturing, smart cities, security, transportation, retail, and healthcare.
The detection model used here is relatively large but predicts faster than YOLOv3-DarkNet53, which makes it suitable for server-side deployment. Other models can of course be substituted. After several epochs of training, the final mAP exceeds 65%. The training process is visualized below:
[Figure: loss curve during training]

The loss curve above shows that the model's loss decreases as training proceeds. Training gradually meets expectations and the loss flattens out. The training strategy was then adjusted in light of the results, after which the loss dropped again. The final training behavior looks normal, and model training is essentially complete.
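The source does not say which part of the training strategy was adjusted. One common adjustment that produces a plateau followed by a further drop in loss is a stepwise learning-rate decay, sketched below with a linear warmup. The base rate, warmup length, decay boundaries, and decay factor are all illustrative assumptions, not values from the project.

```python
def learning_rate(step, base_lr=0.001, warmup_steps=1000,
                  decay_steps=(30000, 40000), gamma=0.1):
    """Linear warmup followed by piecewise-constant (step) decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Multiply by gamma once for every decay boundary already passed.
    num_decays = sum(1 for boundary in decay_steps if step >= boundary)
    return base_lr * gamma ** num_decays


for step in (0, 500, 1000, 30000, 45000):
    print(step, learning_rate(step))
```

With such a schedule, each drop in the learning rate lets the optimizer settle into a lower-loss region, which would be consistent with the second decline visible in the curve.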

5. Evaluation metrics

Commonly used evaluation metrics for object detection models include IoU, recall, precision, mAP, and accuracy.
(1) IoU: Intersection over Union (IoU) measures localization accuracy. As the name suggests, it is the ratio of the intersection to the union of two bounding boxes. In object detection tasks, a predicted bounding box is considered correct if IoU ≥ 0.5, where 0.5 is a threshold set from experience. If stricter results are required, the threshold can be raised, for example to 0.6 or 0.7. The threshold must be greater than 0 and less than 1, because IoU = 1 only when the predicted and ground-truth bounding boxes coincide exactly. It is rarely set below 0.5.
(2) Recall and precision: Recall measures how completely a class is retrieved. It is the fraction of examples labeled positive that are predicted correctly.
$$\text{Recall} = \frac{TP}{TP + FN}$$

Precision measures how accurate the classifier's positive predictions are. It is the fraction of examples predicted positive that are actually positive.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Here, a true positive (TP) is a positive sample predicted as positive, a false negative (FN) is a positive sample predicted as negative, a false positive (FP) is a negative sample predicted as positive, and a true negative (TN) is a negative sample predicted as negative. In detection, with the IoU threshold set to 0.5, TP is the number of detection boxes with IoU greater than 0.5, FP is the number of detection boxes with IoU less than or equal to 0.5, and FN is the number of ground-truth objects for which no prediction box was produced.
(3) mAP: mean average precision (mAP) is the mean of the per-class AP values. To compute it, first draw the precision-recall (PR) curve for each class and compute its AP, which is the area under the PR curve, that is, the mean of the precision values over all recall values between 0 and 1. Once every class has an AP, average them to obtain mAP.
(4) Accuracy: each image in the test set (all) is fed through the network, and a forward pass produces the predictions (bounding boxes and categories). If the predictions match the annotation file, including the number and positions of the detection boxes, the count of correct images (true) is incremented by one. Accuracy is the ratio of true to all.
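The metrics above can be sketched in a few lines of plain Python. This is an illustration, not the evaluation code used in the project: boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, and AP is computed as the area under the precision envelope of the PR curve (the all-point interpolation used by later PASCAL VOC evaluations), which is one of several conventions.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (score, is_true_positive) pairs.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Replace each precision with the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the resulting step curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the plain mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
print(precision_recall(tp=8, fp=2, fn=2))
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2))
```

In a full evaluation, each detection would first be marked as a true or false positive by matching it to an unmatched ground-truth box of the same class using the IoU threshold (0.5 in this article); that matching step is left out to keep the sketch short.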

6. Test results

The model was tested on a self-built test set. The experimental platform is a GPU-equipped machine, and the development environment is listed as MATLAB 2019b. The computing device is a personal computer with an Intel Core i7-9750H CPU at 2.60 GHz, 8 GB of video memory, and 16 GB of RAM. Only one GPU was used during testing. The detection module only needs to locate the products, so all test results are independent of product category. The IoU threshold was set to 0.5 during testing. The test results and some result images are shown in the figure below.
[Figure: test results and sample detection images]
The test results show that the detection model is robust: lighting and similar factors (darker or brighter images) have little effect on the results. The model performs best on single-object detection, with accuracy close to 80%. Multi-object detection is slightly worse but still satisfactory. The model performs poorly in the following cases: (1) when many objects are placed close together on the counter, detection boxes are prone to errors (false detections); (2) when a product is too small or occluded in the image, the target product is often not detected (missed detections).
In one case of severe occlusion, the model failed to detect the product; even a human observer might overlook the occluded product at first glance.

7. Summary and outlook

This article studies a product recognition system for intelligent retail. We consulted a range of literature, summarized the background and significance of product recognition, and reviewed the current state of product recognition research in China and abroad. Many schemes for intelligent product detection already exist, each with its strengths and weaknesses, and they are fairly mature. This experiment merely reproduces earlier projects. Because training time was short, the results are still not ideal, and we plan to improve them in future work.
Compared with the experimental environment, real checkout environments are more complex and variable, so many aspects still need improvement and optimization:
(1) Add more cameras to the checkout counter to capture products from more viewpoints. The product recognition system designed here uses two cameras, so products easily occlude one another in the captured images, which causes recognition errors. Capturing images from more angles can address this and further improve recognition accuracy.
(2) Train a better model from fewer images. This article uses a small number of product images for training and testing, and they cover only some product categories. Real-world product categories are far more numerous, and the number of images needed would be enormous. Finding a method that achieves good model performance with only a small number of training images is therefore very important.

References

[1] PaddleX-based commodity recognition demo for unmanned cabinets. https://aistudio.baidu.com/aistudio/projectdetail/3474742?channelType=0&channel=0&qq-pf-to=pcqq.group
[2] Commodity recognition for intelligent retail cabinets. https://aistudio.baidu.com/aistudio/projectdetail/2250826?channelType=0&channel=0
[3] Chi Haitao. Research on the business structure in the new retail era from the perspective of artificial intelligence [J]. Business Economics Research, 2019(09): 51-53.
[4] Chen Jingwen. Design and implementation of an automatic commodity recognition system for intelligent retail [J]. Hangzhou University of Electronic Science and Technology, 2021(02).
[5] With jade Teng. Research on commodity recognition based on deep learning [D]. Qingdao University of Science and Technology, 2019.
[6] Hu Zhengwei. Research on supermarket commodity image recognition methods based on deep learning [D]. University of Science and Technology of China, 2018.