当前位置:网站首页>Delivery lead time lightweight estimation practice - Notes
Delivery lead time lightweight estimation practice - Notes
2022-06-12 18:40:00 【Tyan】
The authors :Tyan Blog :noahsnail.com | CSDN | Simple books
This article is a collection of notes for meituan's article study .
1. background
In the process of ordering takeout , Delivery time for riders , That is, how long can the rider reach the user's hands after getting off the vehicle near the user , It's a very important link . The following figure shows the time structure of an order in the whole distribution link , The rightmost part of the timeline describes the location of the delivery link in the whole delivery link . Delivery time measures the difficulty of delivery when the rider delivers the meal , Including from the rider to the user's building , The whole time to deliver the food to the user .
The measurement of delivery time is a very challenging thing , Because riders will encounter different problems when delivering meals to users , for example : Riders deliver meals to multiple users in the building at one time , It's particularly difficult for riders to address specific buildings , Riders can only walk around the delivery building , There is no elevator in the old community , Office buildings can't go upstairs , Or it's hard to wait for the elevator and so on . Delivery time estimation needs the ability to describe the difficulty of delivery , In pricing 、 Scheduling is widely used in many scenarios . For example, according to the delivery difficulty to determine whether to adjust the rider postage , Determine whether to adjust the order of the delivery waybill according to the delivery difficulty , So as to avoid overtime and so on . in general , Delivery time estimation is an important part of basic service of distribution business .
Note: Estimated order delivery difficulty , Adjust the rider's salary according to the difficulty , There are high requirements for the accuracy of the delivery difficulty prediction model , Because it directly involves the income of riders , It also involves the operating costs of the company .
The following difficulties exist in the delivery time estimation :
- Less input , And most of them are non numerical data , At present, only the following dimensional features can be used for prediction : Delivery address 、 Longitude and latitude of delivery point 、 Area 、 City , Adapting to conventional machine learning model needs to be reorganized and information is easily lost .
- Computing performance is very demanding . Because it's basic services , Will be called by a large number of services , High performance required , It includes data processing and RPC Time for . And the standard is CPU Performance requirements in the environment , Instead of GPU Performance requirements under .
Sum up , Delivery time estimation problem , It's about using lightweight solutions to handle non numeric data in many forms , And extract the effective information , Get relatively accurate results . Under the same effect , Prefer better performance solutions .
Note: Time prediction requires high performance , Prefer better performance solutions .
The delivery time estimate iterates over three versions , They are tree models based on address structure 、 Vector recall scheme and lightweight End-to-End Deep learning network . At the same time, it introduces how to choose between performance and index , And the intermediate process of model strategy iteration .
2. Technology iteration path
2.1 Tree model
Technology selection The first and easiest thing to consider is the use of rules , The core idea is to use tree structure to measure address similarity , Accumulate structured data on similar delivery addresses as much as possible , Then use the local regression strategy , To get a relatively abundant regression logic , And those who fail to meet the requirements of the return strategy go for the bottom .
Observe the hierarchy of the address filled in by the user and the actual address , It's not hard to find out , An address can be made up of four levels of structure : Address backbone (addr)、 Building number (building)、 Unit no. (unit)、 floor (floor). In practice, the address trunk word may correspond to the name of a residential area or a school . Through analysis , The actual delivery time is positively correlated with the floor height , And the delivery time of different delivery addresses varies with the increase of floors , So we can use linear regression model to fit the relationship between floor information and delivery time , And the address trunk 、 Building number 、 The unit number serves as its hierarchical index . However, the address filled in by the user does not necessarily contain the complete four level structure , There will be a certain percentage of missing , So use this hierarchy to build a tree , Then make full use of the information known in the previous layer to estimate . When predicting , Just find the corresponding model according to the branch of the node , If missing , Use the previous structure to predict . For the address that does not meet the data amount required by the training model , Use the average delivery time of its region as the estimated result of the delivery time , This part can also be regarded as regional information , As the root node of the tree structure .
Note: Estimate the delivery time from the user's point of view , Cluster users with similar addresses .
Iterative path The whole idea is based on discrete feature training tree model , On the nodes of the tree, the linear regression model is trained based on the floor . Tree node training splitting rules :(1) The amount of data is greater than the threshold ;(2) After the split MAE( Mean absolute error ) The sum of is less than before the split . Considering the timeliness of data , Using weighted linear regression to increase the weight of recent data .
2.2 Tree model + Vector recall scheme
Technology selection Vector recall is one of the mainstream recall schemes , Widely used in the industry , In the use of LSH、PQ Product quantization and other common open source tools , High dimensional vector recall performance is usually in milliseconds .
And algorithmically , Tree model NLP The only way that the address resolution results can meet the requirements of the model is 70%+, The remaining 20%+ The address of can't be trained by the model and can only go through the degradation strategy . Using high-dimensional vector to express semantic similarity , That is, using vectors to express address similarity , Therefore, the model corresponding to similar data is used to replace similar but not recalled data , Carry on the address trunk Embedding after , Get rid of the low robustness of complete matching of main words .
Iterative path The whole technology path is clear and simple , It is using Word2Vec take charLevel The characters go on Embedding, Get the vector representation of the address , And blend in GPS Location information , Design the corresponding strategy .
Final effect
The recall rate of the overall strategy has been greatly improved , Promoted 12.20pp, For addresses that are not recalled by the previous version of the tree model , There has been a significant improvement in indicators , among ME falling 87.14s,MAE falling 38.13s,1min The absolute deviation rate decreases 14.01pp,2min The absolute deviation rate decreases 18.45pp,3min The absolute deviation rate decreases 15.90pp.
2.3 End-to-End Lightweight deep learning program
Technology selection On the basis of the tree model , Iterate to the vector recall scheme , The recall rate of the whole model has increased greatly , But still not 100%. Analysis found , The obstacle to a higher recall rate is NLP For address resolution coverage .
The starting point of the whole plan : Considering the complexity of the model , Also just using address information , In the ascension model VC On the basis of Wei , Using other model schemes can at least balance the effect of tree model , If other information can be integrated on this basis , So for the baseline of the original model , There will be further improvements .
Consider not only the need to use address data , At the same time, we need to use GPS data 、 A lot of ID Class Embedding, For all kinds of non numerical types of processing flexibility considerations , Adopt a deep learning program , To ensure that multi-source and multi-type features can be optimized in the same optimization system .
Points to be considered in Engineering : The delivery model serves as the base model , It is widely used in path construction 、 pricing 、ETA And so on , In the tree model version , The requirement for performance is the average response time 5ms, This scheme needs to consider the performance of the original business , It can not significantly increase the calculation time .
The difficulty of delivery model is that there are many non numerical features , The diversity of information acquisition forms , The current bottleneck is not the low complexity of the model . If information and fusion can be obtained lightly , There's no need to be right Fusion After the information to do heavier processing .
So the overall design idea is : Using deep learning to fuse non numerical features , In a simple Fusion On the basis of , Get the output structure directly , Choice of components , Choose... As much as possible Flops Lower design . The intention behind the design is , Make full use of the original input information , On the basis of avoiding information loss as much as possible , Incorporate non numerical information into . And fully integrate information , Direct docking of required targets . The fusion component structure is selected to ensure high performance as much as possible , And have high learning efficiency . Here is the address for the selection of a more Robust Of LSTM, in the light of GPS Choose the custom bilinear Embedding, Both performance and effect .
Iterative path Start using end-to-end deep learning model , The first thing we need to solve here is coverage , Direct adoption LSTM Read charLevel Address data for , Direct output delivery time through full connection layer . As the first version of the data , This version of data is basically the same as the tree model , But for tree models not recalled 20% data , A great improvement .
When using charLevel When the address works , Start adding user addresses GPS Information about , because GPS For latitude and longitude information , Non numerical data , We use a bilinear interpolation method based on the geographical location grid Embedding. The scheme has certain expansibility , For different GPS All can reasonably get Embedding vector , At the same time, it has smooth characteristics , For many pairs with small offset GPS Point can support well .
The final solution will address Embedding after , as well as GPS Dot Embedding After transformation , Join the order time 、 City ID、 Area ID Wait for the feature , Then feature fusion and transformation , Get the time estimate output of the delivery model . The whole model is an end-to-end training , All parameters are Trainable.
Note: This scheme is relatively complicated , Suitable for large-scale use , Not suitable at the initial stage .
Extension components
Is confirming End-to-End When the path works , We started to build the extended components , Include custom loss functions 、 Data sampling correction 、 National Model unification and other operations , Get a series of positive effects , And develop online .
Feature importance analysis
For deep learning models , We have a series of feature importance assessment programs , This is done in sequence Feature Permutation The way , As a way to assess the importance of model features .
The characteristics and importance of this scheme : Address of the user >GPS Longitude and latitude > Other features .
notes : In other cases of distribution , Merchant GPS The importance of latitude and longitude > Importance of user address > user GPS The importance of latitude and longitude , There may be obvious differences under different learning objectives .
Final effect
End-to-End The final effect of deep learning model is more significant : The most painful point for tree model and vector recall scheme , Coverage is completely addressed , Coverage increased to 100%.ME falling 4.96s,MAE falling 8.17s,1min The absolute deviation rate decreases 2.38pp,2min The absolute deviation rate decreases 5.08pp,3min The absolute deviation rate decreases 3.46pp. meanwhile , For the waybill that cannot be covered by previous tree model and vector recall scheme , Ascension is more obvious .
3. Model correlation analysis
It mainly analyzes the performance of vector recall scheme and deep learning scheme , In order to confirm the performance indicators online , Finally, it can ensure that the performance can meet the requirements .
3.1 Vector recall performance
Nearest neighbor search (Nearest Neighbor Search) It refers to finding the problem closest to the query point in a high-dimensional space . When the data sample is small , Through linear search can meet the demand , But as the amount of data increases , If it reaches millions 、 Hundreds of millions of hours , Tend to structure data to express vector information more accurately .
At this time, the nearest neighbor search is approximate ANN(Approximate Nearest Neighbor) It's a reference technology , It's able to do something similar to recall , And linear search , Balance efficiency and accuracy . At present, there are basically the following 3 Class mainstream method : Tree based approach , Such as K-D Trees, etc ; Hash based method , for example LSH; Based on vector quantization method , for example PQ Product quantization . In the industrial retrieval system , Product quantization is an index method used more often .
Tools for vector recall , There are a lot of open source implementations , In the process of technology selection , You can refer to ANN-Benchmarks as well as Erikbern/ANN-Benchmarks Performance evaluation results in .
3.2 Sequence module performance
In order to evaluate the performance bottleneck of deep learning delivery model , First, the whole model is Profile, The whole calculation is mostly consumed in the processing part of the sequence module . The computing performance of the sequence module needs to be evaluated OP Acceleration of operators .
3.3 Vector effect analysis
Compare vector recall with deep learning model , In the middle of the two processes, high-dimensional vectors are generated . Compared to vector recall , Where is the improvement brought about by deep learning model ? Supervised lstm Learned Embedding Vector and self supervised Word2Vec How different are the resulting vectors in the calculation of address similarity , Which is better or worse? ?
First , Let's analyze the first question ,End-to-End Where does the model upgrade come from ? After testing, it is known that , The improvement of deep learning model mainly comes from redundant information in address ( Compared to vector recall ) The use of , Next is the addition of many new features . in addition , Compare two End-to-End Effect of model , Address affixes also contain information useful for matching addresses .
For the second question , Supervised End-to-End Learned Embedding vector , With self supervision Word2Vec How different are the resulting vectors in the calculation of address similarity , Which is better or worse? ? Use address trunk instead of full address , As End-to-End Model input for training , Other information remains the same . Use the address trunk to train Embedding vector , Apply to the vector recall scheme .
Judging from the evaluation results , For different thresholds ,End-to-End The difference in performance is relative Word2Vec smaller . At the same threshold ,End-to-End Higher recall rate , But the effect is not as good as Word2Vec.
From the result of similar calculation ,End-to-End The model will put some semantically unrelated but delivery time similar addresses , Map to the same vector space , and Word2Vec It is to learn a more general text vector representation .
If you want to target more complex goals and introduce more information , have access to End-to-End frame ; Just calculating text similarity , From the results of the experiment ,Word2Vec Better . meanwhile , By looking at Case It can also be found ,End-to-End Pay more attention to the similarity of results , So we recall some vectors that are not semantically relevant . The difference between the two model objectives , This leads to differences in results .
Reference
边栏推荐
- torch.where的新用法(很老但是大家忽略的用法)
- Vue —— 进阶 vue-router 路由(二)(replace属性、编程式路由导航、缓存路由组件、路由的专属钩子)
- tarfile解压嵌套tar
- MYSQL:Expression #4 of SELECT list is not in GROUP BY clause and contains nonaggregated column
- 被八股文害惨了。。。
- 309. the best time to buy and sell stocks includes the freezing period
- A story on the cloud of the Centennial Olympic Games belonging to Alibaba cloud video cloud
- Quickly copy the request in browser F12 to postman/ or generate the corresponding code of the relevant language
- MYSQL:Expression #4 of SELECT list is not in GROUP BY clause and contains nonaggregated column
- wireshark基本使用命令
猜你喜欢

The difference between user status and system status in CRM

Difference between rxjs of() and of ({})

从应无所住说起

Typescript common types (I)

Gospel of audio and video developers, rapid integration of AI dubbing capability

基于Halcon的螺栓螺丝部分划痕、腐蚀缺陷检测

Problems that the sap Spartacus e-commerce cloud UI shipping method does not display in the unit test environment
![[blockbuster release] ant dynamic card, enabling the app home page to realize agile update](/img/65/5ed80090f4d0ee92b01888eb496528.jpg)
[blockbuster release] ant dynamic card, enabling the app home page to realize agile update

Machine learning series (3): logistic regression

Shenzhen has been shut down for 7 days since March 14. Home office experience | community essay solicitation
随机推荐
yoloe 目标检测使用笔记
从应无所住说起
Solution to the problem that the anaconda navigator card logo cannot be opened and the card will flash back - replace the alicloud image source
dumi 搭建文档型博客
C language learning -- data storage in memory
PHP:Fatal error: Allowed memory size of 262144 bytes exhausted (tried to allocat
Review of MySQL (10): three paradigms of database
Review of MySQL (VI): usage of Union and limit
JS quick sort
2022.6.12-----leetcode.890
CEPH deploy offline deployment of CEPH cluster and error reporting FAQ
[blockbuster release] ant dynamic card, enabling the app home page to realize agile update
Mise en œuvre de l'ACL réflexe dans le simulateur Cisco Cisco Packet Tracer
Research Report on the overall scale, major manufacturers, major regions, products and applications of Electric Screwdrivers in the global market in 2022
Double non grind one, three side byte, cool. Next time
SCI Writing - Methodology
OpenGL shadow implementation (hard shadow)
Typescript advanced type (2)
Summary of static memory allocation and dynamic memory allocation
bilibili视频列表名字太长显示不全的解决方法