Run Faster with Go: Using Golang to Serve Machine Learning
2022-07-05 14:37:00 【fifteen billion two hundred and thirty-one million one hundred 】
Our requirement, therefore, was to serve 3 million predictions per second with as few resources as possible. Fortunately, the recommender model is relatively simple: a multi-armed bandit (MAB). Multi-armed bandits typically involve sampling from distributions such as the Beta distribution, and that sampling is also where most of the time goes. If we can draw as many samples as possible at the same time, we make good use of our resources, and maximizing resource utilization is the key to reducing the total resources the model needs.
Our current prediction service is a set of microservices written in Python that follow this general structure:
request -> feature fetching -> prediction -> post-processing -> return
A single request may require us to score thousands of user-content pairs. Python handles this reasonably well despite the GIL. We implemented batched sampling in Cython and C++ to get around the GIL, and we run many workers, based on the number of cores, to handle requests concurrently.
Currently a single-node Python service handles 192 RPS, with about 400 pairs per request. Average CPU utilization is only about 20%. The limiting factors now are the language, the serving framework, and the network calls to the feature store.
Why Golang?
Golang is a statically typed language with excellent tooling. Errors are caught early, and code is easy to refactor. Concurrency is native to Go. That matters both for machine learning algorithms that can run in parallel and for concurrent network calls to the Featurestore. Go is one of the fastest serving languages in benchmarks. It is also a compiled language, so it can be optimized at compile time.
Porting the existing MAB to Golang
The basic idea was to split the system into 3 parts:
1. A basic REST API for prediction and health checks, with stubbed Featurestore fetching
2. A module that implements the Featurestore fetching
3. Porting the C++ sampling code and calling it through cgo
The first part was easy. I chose the Fiber framework for the REST API. It seems to be the most popular, it is well documented, and its API is similar to Express.js. It also performs quite well in benchmarks.
The early code:
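```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/gofiber/fiber/v2"
	"github.com/gofiber/fiber/v2/middleware/recover"

	"example.com/mabservice/model" // hypothetical import path for the model package
)

func main() {
	// set up Fiber
	app := fiber.New()
	// recover from panics in handlers instead of crashing the server
	app.Use(recover.New())

	// load the model struct; keep err so /health can report it
	ctx := context.Background()
	md, err := model.NewModel(ctx)
	if err != nil {
		log.Println(err)
	} else {
		defer md.Close()
	}

	// health API
	app.Get("/health", func(c *fiber.Ctx) error {
		if err != nil {
			return fiber.NewError(
				fiber.StatusServiceUnavailable,
				fmt.Sprintf("Model couldn't load: %v", err))
		}
		return c.JSON(&fiber.Map{
			"status": "ok",
		})
	})

	// predict API
	app.Post("/predict", func(c *fiber.Ctx) error {
		if err != nil {
			return fiber.NewError(fiber.StatusServiceUnavailable, "model not loaded")
		}
		var request map[string]interface{}
		if jsonErr := json.Unmarshal(c.Body(), &request); jsonErr != nil {
			return fiber.NewError(fiber.StatusBadRequest, jsonErr.Error())
		}
		return c.JSON(md.Predict(request))
	})

	// start serving (the port is an assumption)
	if err := app.Listen(":3000"); err != nil {
		log.Println(err)
	}
}
```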
That's it. The first part was done in less than an hour.
For the second part, I had to learn a bit about writing structs with methods, and about goroutines. One of the main differences from C++ and Python is that Go does not support full object-oriented programming; most notably, it has no inheritance. Its methods on structs also work quite differently from any other language I had used.
The Featurestore we use has a Golang client. All I had to do was write a wrapper around it that reads large numbers of entities concurrently.
The basic structure I wanted is:
```go
type VertexFeatureStoreClient struct {
	// client: reference to GCP's Featurestore client
}

func NewVertexFeatureStoreClient(ctx context.Context) (*VertexFeatureStoreClient, error) {
	// client creation code
}

// GetFeaturesByIdsChunk reads features for at most 100 entities in a single call.
func (vfs *VertexFeatureStoreClient) GetFeaturesByIdsChunk(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
	// fetch code for up to 100 items
}

// GetFeaturesByIds splits entityIds into chunks and fetches all chunks concurrently.
func (vfs *VertexFeatureStoreClient) GetFeaturesByIds(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
	const chunkSize = 100 // per-request limit from GCP
	// cancel in-flight fetches if we return early on an error
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	numChunks := (len(entityIds) + chunkSize - 1) / chunkSize
	// buffered channels: goroutines can always send, so none leak on early return
	featureChannel := make(chan map[string]map[string]interface{}, numChunks)
	errorChannel := make(chan error, numChunks)
	for i := 0; i < len(entityIds); i += chunkSize {
		end := i + chunkSize
		if end > len(entityIds) {
			end = len(entityIds)
		}
		go func(ents []string) {
			features, err := vfs.GetFeaturesByIdsChunk(ctx, featurestore, entityName, ents, featureList)
			if err != nil {
				errorChannel <- err
				return
			}
			featureChannel <- features
		}(entityIds[i:end])
	}
	results := make(map[string]map[string]interface{}, len(entityIds))
	// receive exactly one message (features or error) per chunk; with no
	// entity IDs the loop body never runs instead of blocking forever
	for n := 0; n < numChunks; n++ {
		select {
		case err := <-errorChannel:
			return nil, err
		case res := <-featureChannel:
			for k, v := range res {
				results[k] = v
			}
		}
	}
	return results, nil
}

func (vfs *VertexFeatureStoreClient) Close() error {
	// close code
}
```
A tip about goroutines
Use channels as much as possible. Many tutorials use sync.WaitGroup with goroutines, but that is a lower-level API you usually don't need. Channels are an elegant way to coordinate running goroutines. Even when you don't need to pass data, you can send a flag on a channel and collect the flags. Goroutines are cheap virtual threads, so you don't have to worry about creating too many threads or spreading them over multiple cores. Modern Go schedules them across cores for you.
The third part was the hardest; it took about a day to debug. If your use case doesn't require complex sampling or C++, I suggest using Gonum directly. You'll save yourself a lot of time.
What I hadn't realized, coming from Cython, is that I had to compile the C++ file manually and then reference it in the cgo flags.
The header file (betadist.hpp):
```cpp
#ifndef BETA_DIST_H
#define BETA_DIST_H
#ifdef __cplusplus
extern "C"
{
#endif
double beta_sample(double, double, long);
#ifdef __cplusplus
}
#endif
#endif
```
Note the extern "C". It is required because this C++ code is called from Go, and C++ mangles symbol names; plain C doesn't need it. I also couldn't put any #include statements in the header file, or the cgo build failed to link. I never found the cause; one plausible explanation is that cgo processes its preamble, and every header it pulls in, as C, so C++-only headers break it. So I moved those statements into the .cpp file.
Compile it:
```shell
g++ -fPIC -I/usr/local/include -L/usr/local/lib betadist.cpp -shared -o libbetadist.so
```
Once it is compiled, you can use it from cgo.
The cgo wrapper file:
```go
package betadist // hypothetical package name

/*
#cgo CPPFLAGS: -I${SRCDIR}/cbetadist
#cgo CPPFLAGS: -I/usr/local/include
#cgo LDFLAGS: -Wl,-rpath,${SRCDIR}/cbetadist
#cgo LDFLAGS: -L${SRCDIR}/cbetadist
#cgo LDFLAGS: -L/usr/local/lib
#cgo LDFLAGS: -lstdc++
#cgo LDFLAGS: -lbetadist
#include <betadist.hpp>
*/
import "C"

// Betasample draws one Beta(alpha, beta) sample via the C++ beta_sample function.
// random is passed through to beta_sample as a C long.
func Betasample(alpha, beta float64, random int) float64 {
	return float64(C.beta_sample(C.double(alpha), C.double(beta), C.long(random)))
}
```
Note that -lbetadist in LDFLAGS is what links libbetadist.so. You also have to run export DYLD_LIBRARY_PATH=/fullpath_to/folder_containing_so_file/ (DYLD_LIBRARY_PATH is the macOS variable; on Linux the equivalent is LD_LIBRARY_PATH). After that, go run . works as in any other Go project.
Integrating these pieces with a simple model struct and predict method was straightforward and took much less time.
Results

| Metric | Python | Go |
|---|---|---|
| Max RPS | 192 | 819 |
| Max latency | 78ms | 110ms |
| Max CPU util. | ~20% | ~55% |
That's a 4.3x improvement in RPS. It reduced our minimum number of nodes from 80 to 19, a huge cost advantage. Maximum latency is slightly higher, but that's acceptable. The Python service is already saturated at 192 RPS, and its latency degrades significantly if traffic exceeds that.
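A quick check of these figures, assuming the required node count scales inversely with per-node throughput:

$$
\frac{819}{192} \approx 4.27,
\qquad
\left\lceil \frac{80}{819/192} \right\rceil = \lceil 18.75 \rceil = 19
$$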
Should I convert all my models to Golang?
Short answer: no.
Long answer: Go has great advantages for serving, but Python is still king for experimentation. I'd only recommend Go for simple, long-running baseline models, not for experiments. Go is still not very mature for complex ML use cases.
So, the elephant in the room: why not Rust?
Well, Schiff did. Have a look; it's even faster than Go.
