Challenges of Machine Learning Systems in Production
2022-06-27 11:04:00 【Steven Devin】
Machine learning and deep learning have become very popular in the past few years, but most material on the Internet and most classroom teaching focuses on building and tuning models. In actual production, however, machine learning engineers are responsible for more than building and maintaining models: they also need to master software engineering skills.
Most companies have only started using machine learning, or developing related systems, in the past few years, and few companies develop and run machine learning systems at scale. Running these systems in production brings its own challenges, and this article discusses some of them in depth.
1. Organizing the machine learning experiment process
Machine learning development is an iterative process. You experiment with different combinations of data, learning algorithms, and model parameters, and track how each change affects predictive performance. Over time, this iterative experimentation can produce thousands of training runs and model versions, which makes it hard to keep track of the best-performing model and to reproduce its configuration.
As in traditional software engineering, a model is rarely developed by a single person over its whole lifetime: team turnover, changing objectives, and new datasets and features are common. We should therefore expect experimentation to continue long after the model is first built. As results accumulate, comparing current experiments with past ones to identify opportunities for further improvement becomes increasingly difficult. This calls for a system that tracks experiment metadata and the impact of different parameters on predictive performance.
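To make the idea of tracking experiment metadata concrete, here is a minimal in-memory sketch. The `ExperimentTracker` and `Run` names, the metric name `val_accuracy`, and the logged values are all hypothetical; a real team would more likely use a dedicated tracking tool with persistent storage. The sketch records each run's parameters and metrics, picks the best run, and averages a metric per parameter value to show how that parameter affects performance.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Run:
    """One training run: the configuration used and the metrics it produced."""
    run_id: int
    params: dict[str, Any]
    metrics: dict[str, float]


@dataclass
class ExperimentTracker:
    """Hypothetical in-memory tracker; a real system would persist runs to durable storage."""
    runs: list[Run] = field(default_factory=list)

    def log_run(self, params: dict[str, Any], metrics: dict[str, float]) -> Run:
        # Copy the inputs so later mutation by the caller cannot alter the record.
        run = Run(run_id=len(self.runs) + 1, params=dict(params), metrics=dict(metrics))
        self.runs.append(run)
        return run

    def best_run(self, metric: str, higher_is_better: bool = True) -> Run:
        # Only runs that recorded this metric are comparable.
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run recorded metric {metric!r}")

        def score(r: Run) -> float:
            return r.metrics[metric]

        return max(candidates, key=score) if higher_is_better else min(candidates, key=score)

    def metric_by_param(self, param: str, metric: str) -> dict[Any, float]:
        """Mean of `metric` for each value of `param`, to see how that parameter affects performance."""
        groups: dict[Any, list[float]] = {}
        for r in self.runs:
            if param in r.params and metric in r.metrics:
                groups.setdefault(r.params[param], []).append(r.metrics[metric])
        return {value: sum(vals) / len(vals) for value, vals in groups.items()}

    def to_json(self) -> str:
        # Serialize all runs so the best configuration can be reproduced later.
        return json.dumps([asdict(r) for r in self.runs], indent=2)


tracker = ExperimentTracker()
tracker.log_run({"lr": 0.1, "depth": 4}, {"val_accuracy": 0.81})
tracker.log_run({"lr": 0.01, "depth": 4}, {"val_accuracy": 0.86})
tracker.log_run({"lr": 0.01, "depth": 8}, {"val_accuracy": 0.84})

best = tracker.best_run("val_accuracy")
print(best.run_id, best.params)  # 2 {'lr': 0.01, 'depth': 4}
print(tracker.metric_by_param("depth", "val_accuracy"))
```

Exporting every run with `to_json` is what lets a new team member, months later, recover the exact configuration of the best model instead of re-entering it by hand.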
2. Debugging and training models
When you train models in an interactive environment such as a Jupyter notebook, debugging the training job is easy. You run the code manually; if training fails, the notebook displays the exception and stack trace. If training succeeds, you can plot learning curves and other metrics to diagnose whether the model is overfitting or suffering from vanishing gradients.
Debugging becomes much harder when training runs as an automated batch job on a fixed schedule. A scheduler can rerun failed training jobs, but unless you write a custom solution, it cannot easily check for overfitting or vanishing gradients. And as the data science team aims to deploy more and more models, this problem only gets worse as more models go through the process.
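A "custom solution" for unattended training can be as simple as running heuristic checks on the recorded training history after each scheduled job. The sketch below is one such assumption-laden example: the `TrainingHistory` structure, the `patience` window, and the `min_grad_norm` threshold are hypothetical choices, and both checks are heuristics rather than definitive diagnoses.

```python
from dataclasses import dataclass


@dataclass
class TrainingHistory:
    train_loss: list[float]       # one value per epoch
    val_loss: list[float]         # one value per epoch
    grad_norms: dict[str, float]  # gradient L2 norm per layer, from the last epoch


def diagnose(history: TrainingHistory, patience: int = 3, min_grad_norm: float = 1e-6) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was detected."""
    warnings: list[str] = []

    # Overfitting heuristic: validation loss rose for `patience` consecutive epochs
    # while training loss kept falling over the same window.
    if len(history.val_loss) > patience:
        recent_val = history.val_loss[-(patience + 1):]
        recent_train = history.train_loss[-(patience + 1):]
        val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
        train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
        if val_rising and train_falling:
            warnings.append(f"possible overfitting: val loss rose for {patience} epochs while train loss fell")

    # Vanishing-gradient heuristic: any layer whose gradient norm is (near) zero.
    for layer, norm in history.grad_norms.items():
        if norm < min_grad_norm:
            warnings.append(f"possible vanishing gradient in layer {layer!r} (norm={norm:.2e})")

    return warnings


history = TrainingHistory(
    train_loss=[1.0, 0.8, 0.6, 0.5, 0.4],
    val_loss=[1.1, 0.9, 0.95, 1.0, 1.05],
    grad_norms={"dense_1": 0.3, "dense_2": 1e-9},
)
for w in diagnose(history):
    print(w)
```

In a scheduled pipeline, a non-empty result could fail the job or send an alert, so problems that would be obvious in a notebook plot are not silently shipped.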
3. Deploying the model to production
A machine learning model only starts adding value to the company once it is available to its users.
The process of taking a trained ML model and making its predictions available to users or other systems is called "deployment".
Deployment is quite different from routine machine learning tasks such as feature engineering, model selection, or model evaluation.
As a result, data scientists and ML engineers without a software engineering or DevOps background may be unfamiliar with it.
When deciding how to deploy a machine learning model, there are many factors to consider:
- How often predictions need to be generated.
- Whether predictions should be generated for a single instance or for a batch of instances at a time.
- The number of applications that access the model.
- The latency requirements of those applications.
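The four factors above can be turned into a rough decision rule. The sketch below is only an illustration under stated assumptions: the `DeploymentMode` options, the `Requirements` fields, the `many_clients` threshold, and the two example workloads are all hypothetical, and real deployment decisions weigh cost, infrastructure, and team skills too.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeploymentMode(Enum):
    BATCH = "scheduled batch job that precomputes predictions"
    ONLINE = "real-time prediction endpoint"
    ONLINE_AUTOSCALED = "real-time prediction endpoint behind an autoscaler"


@dataclass
class Requirements:
    realtime: bool                   # must predictions be produced as soon as new input arrives?
    single_instance: bool            # are predictions requested one instance at a time?
    num_client_apps: int             # how many applications call the model
    max_latency_ms: Optional[float]  # None when no caller is waiting on the result


def choose_mode(req: Requirements, many_clients: int = 5) -> DeploymentMode:
    """Rough heuristic; the `many_clients` threshold is an illustrative assumption."""
    interactive = req.realtime or req.single_instance or req.max_latency_ms is not None
    if not interactive:
        # Nobody waits on individual results, so predictions can be computed
        # periodically for a whole batch of instances.
        return DeploymentMode.BATCH
    if req.num_client_apps >= many_clients:
        # Many callers make load harder to predict, so plan for autoscaling.
        return DeploymentMode.ONLINE_AUTOSCALED
    return DeploymentMode.ONLINE


nightly_scoring = Requirements(realtime=False, single_instance=False, num_client_apps=1, max_latency_ms=None)
checkout_fraud = Requirements(realtime=True, single_instance=True, num_client_apps=8, max_latency_ms=50.0)
print(choose_mode(nightly_scoring).name)  # BATCH
print(choose_mode(checkout_fraud).name)   # ONLINE_AUTOSCALED
```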
4. Scaling machine learning services
Once a model has been deployed to an endpoint, it can begin providing value to users. However, the endpoint may have to handle a heavier workload later on. For example, if the company starts serving more users, the increased demand may degrade the quality of your machine learning service.
ML models served as API endpoints therefore often need to adapt to such changes in demand. When requests increase, the number of compute instances serving the model should increase; when the workload decreases, instances should be removed so that you don't pay for unused capacity.
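The scale-out and scale-in behaviour described above is often implemented as a target-tracking rule: run enough instances that each one stays at or below a target load. The sketch assumes a hypothetical per-instance capacity of 100 requests per second and made-up load samples; the choice to scale in one instance at a time is one illustrative design for avoiding thrashing on short dips, not a rule from the original text.

```python
import math


def desired_instances(requests_per_second: float,
                      target_rps_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Target-tracking rule: enough instances that none exceeds its target load."""
    if target_rps_per_instance <= 0:
        raise ValueError("target_rps_per_instance must be positive")
    needed = math.ceil(requests_per_second / target_rps_per_instance)
    # Keep a floor for availability and a ceiling to cap cost.
    return max(min_instances, min(max_instances, needed))


def next_instance_count(current: int, desired: int, max_scale_in_step: int = 1) -> int:
    """Scale out immediately, but scale in gradually so short dips don't cause thrashing."""
    if desired >= current:
        return desired
    return max(desired, current - max_scale_in_step)


load = [50, 480, 1200, 90, 90, 90]  # hypothetical requests per second, one sample per interval
count = 1
counts = []
for rps in load:
    count = next_instance_count(count, desired_instances(rps, target_rps_per_instance=100))
    counts.append(count)
print(counts)  # [1, 5, 12, 11, 10, 9]
```

The `max_instances` ceiling bounds cost during traffic spikes, while the gradual scale-in trades a little extra spend for stability when load drops briefly.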
5. Model monitoring
This stage is the real beginning .
When the model completes the platform deployment , Another important thing is to monitor the predictions and exceptions of the model .
This phase must continuously monitor the model , Detect and eliminate the deviation of model quality , For example, data drift .
Proactively detect these deviations as early as possible , Able to take corrective actions , For example, retraining the model 、 Review upstream systems or fix data quality problems , Without manually monitoring the model or building additional tools .
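The article does not name a drift metric, so the sketch below swaps in one common choice, the Population Stability Index (PSI), which compares how a feature's values are distributed across bins at training time versus in production. The equal-width binning, the `eps` smoothing, the example data, and the 0.25 alert threshold (a widely quoted rule of thumb, not a standard) are all assumptions to tune for your own data.

```python
import bisect
import math
from typing import Sequence


def _bin_fractions(values: Sequence[float], edges: list[float], eps: float) -> list[float]:
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        # Locate the bin; values outside the reference range fall into the first or last bin.
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference: Sequence[float], current: Sequence[float],
                               n_bins: int = 10, eps: float = 1e-4) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref), with equal-width bins from the reference range."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference values are constant; cannot build bins")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    ref = _bin_fractions(reference, edges, eps)
    cur = _bin_fractions(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(1000)]  # feature values seen at training time
current = [x + 3 for x in reference]        # hypothetical production values, shifted upward
psi = population_stability_index(reference, current)
if psi > 0.25:  # rule-of-thumb threshold; tune for your data
    print(f"data drift detected: PSI={psi:.2f}")
```

Running a check like this per feature on a schedule, and alerting when it crosses the threshold, is one way to surface drift before it shows up as degraded predictions.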
Conclusion:
Sometimes it is a good choice to avoid building your own machine learning infrastructure. Instead, you can leverage the wide range of open-source tools and platforms available and focus on building models that provide differentiated value.
