Run faster with Go: using Golang to serve machine learning
2022-07-05 14:37:00 【fifteen billion two hundred and thirty-one million one hundred 】
So our requirement is to serve 3 million predictions per second with as few resources as possible. Thankfully, this is a relatively simple recommendation model: a multi-armed bandit (MAB). Multi-armed bandits usually involve sampling from a distribution such as the Beta distribution, and that is also where most of the time goes. If we can draw as many samples as possible concurrently, we make good use of the hardware. Maximizing resource utilization is the key to reducing the overall resources the model needs.
Our current prediction services are microservices written in Python, and they follow this general structure:
request -> feature fetch -> predict -> post-processing -> return
A single request may ask us to score thousands of (user, content) pairs. To get decent performance out of Python with its GIL and multiprocessing model, we implemented batch sampling in Cython and C++ to get around the GIL, and we run many workers, sized to the number of cores, to handle requests concurrently.
Currently a single-node Python service handles 192 RPS, with about 400 pairs per request. Average CPU utilization is only about 20%. The limiting factors are now the language, the serving framework, and the network calls to the feature store.
Why Golang?
Golang is a statically typed language with great tooling. This means errors are caught early and code is easy to refactor. Concurrency is native in Golang, which matters both for machine learning algorithms that can run in parallel and for concurrent network calls to the Featurestore. It is one of the fastest serving languages in benchmarks. It is also a compiled language, so it can be optimized at compile time.
Porting the existing MAB to Golang
The basic idea is to split the system into 3 parts:
1. A basic REST API for prediction and health, with stubs
2. Featurestore fetch, implemented as a module
3. Lift and shift the C++ sampling code using cgo
The first part was easy. I chose the Fiber framework for the REST API. It seems to be the most popular, it is well documented, and it has an Express.js-like API. It also performs quite well in benchmarks.
Early code :
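func main() {
	// setup fiber
	app := fiber.New()
	// catch all panics
	app.Use(recover.New())
	// load model struct
	ctx := context.Background()
	md, err := model.NewModel(ctx)
	if err != nil {
		fmt.Println(err)
	} else {
		// only close a model that actually loaded
		defer md.Close()
	}
	// health API: reports 503 if the model failed to load
	app.Get("/health", func(c *fiber.Ctx) error {
		if err != nil {
			return fiber.NewError(
				fiber.StatusServiceUnavailable,
				fmt.Sprintf("Model couldn't load: %v", err))
		}
		return c.JSON(&fiber.Map{
			"status": "ok",
		})
	})
	// predict API
	app.Post("/predict", func(c *fiber.Ctx) error {
		var request map[string]interface{}
		// separate name so the model-loading err above is not shadowed
		if parseErr := json.Unmarshal(c.Body(), &request); parseErr != nil {
			return parseErr
		}
		return c.JSON(md.Predict(request))
	})
	// start the server; the original listing is cut off here,
	// so this call and the port are assumptions
	if listenErr := app.Listen(":3000"); listenErr != nil {
		fmt.Println(listenErr)
	}
}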
That's it, the first task is done. It took less than an hour.
For the second part, I needed to learn a bit about writing structs with methods and goroutines. One of the main differences from C++ and Python is that Golang does not support full object-oriented programming; mainly, it has no inheritance. The way methods are defined on structs is also quite different from other languages I have come across.
The Featurestore we use has a Golang client; all I had to do was write a wrapper around it to read a large number of entities concurrently.
The basic structure I want is :
type VertexFeatureStoreClient struct {
	// client: reference to GCP's client
}

func NewVertexFeatureStoreClient(ctx context.Context) (*VertexFeatureStoreClient, error) {
	// client creation code
}

func (vfs *VertexFeatureStoreClient) GetFeaturesByIdsChunk(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
	// fetch code for 100 items
}

func (vfs *VertexFeatureStoreClient) GetFeaturesByIds(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
	const chunkSize = 100 // limit from GCP
	// One goroutine per chunk. The channels are buffered so that no
	// goroutine blocks forever if we return early on the first error.
	numChunks := (len(entityIds) + chunkSize - 1) / chunkSize
	featureChannel := make(chan map[string]map[string]interface{}, numChunks)
	errorChannel := make(chan error, numChunks)
	for i := 0; i < len(entityIds); i += chunkSize {
		end := i + chunkSize
		if end > len(entityIds) {
			end = len(entityIds)
		}
		go func(ents []string) {
			features, err := vfs.GetFeaturesByIdsChunk(ctx, featurestore, entityName, ents, featureList)
			if err != nil {
				errorChannel <- err
				return
			}
			featureChannel <- features
		}(entityIds[i:end])
	}
	results := make(map[string]map[string]interface{}, len(entityIds))
	// Collect exactly one message per chunk; with zero chunks we do not wait.
	for count := numChunks; count > 0; count-- {
		select {
		case err := <-errorChannel:
			return nil, err
		case res := <-featureChannel:
			for k, v := range res {
				results[k] = v
			}
		}
	}
	return results, nil
}

func (vfs *VertexFeatureStoreClient) Close() error {
	// close code
}
A note on goroutines
Use channels as much as possible. Many tutorials pair goroutines with sync.WaitGroup. That is a lower-level API which you do not need in most cases. Channels are an elegant way to run goroutines: even if you do not need to pass data, you can send flags over a channel and collect them. Goroutines are cheap virtual threads, so you do not have to worry about creating too many of them or about running on multiple cores. Recent versions of Go schedule them across cores for you.
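As a small illustration of sending flags over a channel instead of using a WaitGroup, the sketch below waits for a set of workers with nothing but a channel of empty structs. The helper name `runAll` is hypothetical.

```go
package main

import "fmt"

// runAll starts n workers and waits for all of them, using a channel
// of empty structs as completion flags. It returns how many finished.
func runAll(n int, work func(id int)) int {
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go func(id int) {
			work(id)
			done <- struct{}{} // flag only, no payload
		}(i)
	}
	finished := 0
	for ; finished < n; finished++ {
		<-done
	}
	return finished
}

func main() {
	squares := make([]int, 8)
	// Each worker writes to its own slot, so no locking is needed.
	n := runAll(len(squares), func(id int) { squares[id] = id * id })
	fmt.Println(n, squares) // prints: 8 [0 1 4 9 16 25 36 49]
}
```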
The third part was the hardest. It took about a day to debug. So if your use case does not need complex sampling and C++, I suggest using Gonum directly; you will save yourself a lot of time.
Coming from Cython, I did not realize that I would have to compile the C++ file manually and load it through the cgo include flags.
The header file :
#ifndef BETA_DIST_H
#define BETA_DIST_H
#ifdef __cplusplus
extern "C"
{
#endif
double beta_sample(double, double, long);
#ifdef __cplusplus
}
#endif
#endif
Note the extern "C": it is required for C++ code to be usable from Go because of name mangling; plain C does not need it. Another gotcha: I could not put any #include statements in the header file, because cgo linking failed in that case (reason unknown). So I moved those statements to the .cpp file.
Compile it :
g++ -fPIC -I/usr/local/include -L/usr/local/lib betadist.cpp -shared -o libbetadist.so
Once it is compiled, you can use it from cgo.
The cgo wrapper file:
/*
#cgo CPPFLAGS: -I${SRCDIR}/cbetadist
#cgo CPPFLAGS: -I/usr/local/include
#cgo LDFLAGS: -Wl,-rpath,${SRCDIR}/cbetadist
#cgo LDFLAGS: -L${SRCDIR}/cbetadist
#cgo LDFLAGS: -L/usr/local/lib
#cgo LDFLAGS: -lstdc++
#cgo LDFLAGS: -lbetadist
#include <betadist.hpp>
*/
import "C"
func Betasample(alpha, beta float64, random int) float64 {
	return float64(C.beta_sample(C.double(alpha), C.double(beta), C.long(random)))
}
Note that -lbetadist in LDFLAGS is what links libbetadist.so. You must also run export DYLD_LIBRARY_PATH=/fullpath_to/folder_containing_so_file/ (DYLD_LIBRARY_PATH is the macOS variable; on Linux the equivalent is LD_LIBRARY_PATH). After that I could run go run . and it worked like any other Go project.
Integrating these pieces with a simple model struct and a predict method was straightforward and took comparatively little time.
Results

| Metric | Python | Go |
|---|---|---|
| Max RPS | 192 | 819 |
| Max latency | 78ms | 110ms |
| Max CPU util. | ~20% | ~55% |
That is a 4.3x improvement in RPS, which brings our minimum node count down from 80 to 19, a huge cost advantage. The maximum latency is slightly higher, but that is acceptable, because the Python service is already saturated at 192 RPS and degrades significantly if traffic exceeds that figure.
Should I convert all my models to Golang?
Short answer: no.
Long answer: Go has great advantages for serving, but Python is still the king of experimentation. I only recommend Go for base models that are simple and long-running, not for experiments. Go is still not very mature for complex ML use cases.
So, the elephant in the room: why not Rust?
Well, Schiff did it. Have a look. It is even faster than Go.
