当前位置:网站首页>Run faster with go: use golang to serve machine learning
Run faster with go: use golang to serve machine learning
2022-07-05 14:37:00 【fifteen billion two hundred and thirty-one million one hundred 】
use Go Run faster : Use Golang Serve machine learning
therefore , Our requirement is to complete every second with as few resources as possible 300 Ten thousand predictions . thankfully , This is a relatively simple recommendation system model , That is, dobby slot machine (MAB). Dobby slot machines usually involve from Beta Distribution Sampling in equal distribution . This is also where it takes the most time . If we can do as many samples as possible at the same time , We can make good use of resources . Maximizing resource utilization is the key to reducing the overall resources required by the model .
Our current forecasting service uses Python Written microservices , They follow the following general structure :
request -> Function acquisition -> forecast -> Post processing -> return
A request may require us to respond to thousands of users 、 Score the content . with GIL And multi process Python The handling performance is very good , We have achieved the goal based on cython and C++ Batch sampling method , Around GIL, We use many based on the number of cores workers To handle requests concurrently .
Currently, single node Python Service can be done 192 individual RPS , Each about 400 Yes . Average CPU Utilization is only 20% about . The limiting factor now is language 、 Service framework and network call to storage function .
Why Golang?
Golang It's a statically typed language , It is very instrumental . This means that errors will be detected early , And it's easy to refactor code .Golang Concurrency of is native , This is for machine learning algorithms that can run in parallel and for Featurestore Concurrent network calls are very important . It is here One of the fastest service languages in the benchmark . It is also a compilation language , So it can be optimized at compile time .
Transplant the existing MAB To Golang On
The basic idea , Divide the system into 3 Parts of :
Basic for prediction and health REST API And stub Featurestore Acquisition , To do this, implement a module Use cgo Ascension and transfer c++ Sampling code
The first part is easy , I chose Fiber Framework for REST API. It seems to be the most popular , Well documented , similar Expressjs Of API. And it performs quite well in the benchmark .
Early code :
func main() {
// setup fiber
app := fiber.New()
// catch all exception
app.Use(recover.New())
// load model struct
ctx := context.Background()
md, err := model.NewModel(ctx)
if err != nil {
fmt.Println(err)
}
defer md.Close()
// health API
app.Get("/health", func(c *fiber.Ctx) error {
if err != nil {
return fiber.NewError(
fiber.StatusServiceUnavailable,
fmt.Sprintf("Model couldn't load: %v", err))
}
return c.JSON(&fiber.Map{
"status": "ok",
})
})
// predict API
app.Post("/predict", func(c *fiber.Ctx) error {
var request map[string]interface{}
err := json.Unmarshal(c.Body(), &request)
if err != nil {
return err
}
return c.JSON(md.Predict(request))
})
That's it , Once the task is completed . It took less than an hour .
In the second part , You need to learn a little about how to write Structure with method and goroutines . And C++ and Python One of the main differences is ,Golang Full object-oriented programming is not supported , Mainly, inheritance is not supported . Its method on the structure is also completely different from other languages I have encountered .
What we use Featurestore Yes Golang client , All I have to do is write a wrapper around it to read a large number of concurrent entities .
The basic structure I want is :
type VertexFeatureStoreClient struct {
//client reference to gcp's client
}
func NewVertexFeatureStoreClient(ctx context.Context,) (*VertexFeatureStoreClient, error) {
// client creation code
}
func (vfs *VertexFeatureStoreClient) GetFeaturesByIdsChunk(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
// fetch code for 100 items
}
func (vfs *VertexFeatureStoreClient) GetFeaturesByIds(ctx context.Context, featurestore, entityName string, entityIds []string, featureList []string) (map[string]map[string]interface{}, error) {
const chunkSize = 100 // limit from GCP
// code to run each fetch concurrently
featureChannel := make(chan map[string]map[string]interface{})
errorChannel := make(chan error)
var count = 0
for i := 0; i < len(entityIds); i += chunkSize {
end := i + chunkSize
if end > len(entityIds) {
end = len(entityIds)
}
go func(ents []string) {
features, err := vfs.GetFeaturesByIdsChunk(ctx, featurestore, entityName, ents, featureList)
if err != nil {
errorChannel <- err
return
}
featureChannel <- features
}(entityIds[i:end])
count++
}
results := make(map[string]map[string]interface{}, len(entityIds))
for {
select {
case err := <-errorChannel:
return nil, err
case res := <-featureChannel:
for k, v := range res {
results[k] = v
}
}
count--
if count < 1 {
break
}
}
return results, nil
}
func (vfs *VertexFeatureStoreClient) Close() error {
//close code
}
About Goroutine A hint of
Use as many channels as possible , There are many tutorials to use Goroutine Of sync workgroups. Those are lower level API, In most cases, you don't need . The channel is running Goroutine In an elegant way , Even if you don't need to pass data , You can send flags in the channel to collect .goroutines Is a cheap virtual thread , You don't have to worry about making too many threads and running on multiple cores . Abreast of the times golang It can run across cores for you .
About the third part , This is the hardest part . It took about a day to debug it . therefore , If your use case does not require complex sampling and C++, I suggest using it directly Gonum , You'll save yourself a lot of time .
I didn't realize , from cython when , I have to compile it manually C++ file , And load it into cgo include flags in .
The header file :
#ifndef BETA_DIST_H
#define BETA_DIST_H
#ifdef __cplusplus
extern "C"
{
#endif
double beta_sample(double, double, long);
#ifdef __cplusplus
}
#endif
#endif
Be careful extern C , This is a C++ Code in go Need to be used in , because mangling ,C Unwanted . Another problem is , I can't do anything in the header file #include sentence , under these circumstances cgo link failure ( Unknown cause ). So I moved these statements to .cpp In file .
Compile it :
g++ -fPIC -I/usr/local/include -L/usr/local/lib betadist.cpp -shared -o libbetadist.so
Once the compilation is complete , You can use it cgo.
cgo Packaging documents :
/*
#cgo CPPFLAGS: -I${SRCDIR}/cbetadist
#cgo CPPFLAGS: -I/usr/local/include
#cgo LDFLAGS: -Wl,-rpath,${SRCDIR}/cbetadist
#cgo LDFLAGS: -L${SRCDIR}/cbetadist
#cgo LDFLAGS: -L/usr/local/lib
#cgo LDFLAGS: -lstdc++
#cgo LDFLAGS: -lbetadist
#include <betadist.hpp>
*/
import "C"
func Betasample(alpha, beta float64, random int) float64 {
return float64(C.beta_sample(C.double(alpha), C.double(beta), C.long(random)))
}
Be careful LDFLAGS Medium -lbetadist Is used to link libbetadist.so Of . You must also run export DYLD_LIBRARY_PATH=/fullpath_to/folder_containing_so_file/ . Then I can run go run . , It can be like go Work like a project .
It is very simple to integrate them with simple model structure and prediction methods , And it takes less time .
result

| Metric | Python | Go |
|---|---|---|
| Max RPS | 192 | 819 |
| Max latency | 78ms | 110ms |
| Max CPU util. | ~20% | ~55% |
That's right RPS Of 4.3 times The promotion of , This makes our minimum number of nodes from 80 Reduce to 19 individual , This is a huge cost advantage . The maximum delay is slightly higher , But it's acceptable , because python The service is 192 It is already saturated by o'clock , If the flow exceeds this figure , It will decrease significantly .
I should convert all my models into Golang Do you ?
A short answer : no need .
Long answer .Go It has great advantages in service , but Python It is still the king of experiments . I only recommend using it in the basic model with simple model and long-term operation Go, Not experiments .Go For complex ML For use cases still Not very mature .
So the elephant in the room , Why not Rust ?
Um. , Schiff did it . Have a look . It's even better than Go faster .

边栏推荐
- 裁员下的上海
- 一网打尽异步神器CompletableFuture
- 选择排序和冒泡排序
- webRTC SDP mslabel lable
- Is the securities account given by the head teacher of qiniu school safe? Can I open an account?
- useMemo,memo,useRef等相关hooks详解
- 面试突击62:group by 有哪些注意事项?
- 【数组和进阶指针经典笔试题12道】这些题,满足你对数组和指针的所有幻想,come on !
- Section - left closed right open
- 美国费城发生“安全事故” 2名警察遭枪杀
猜你喜欢

LeetCode_ 2 (add two numbers)

How does redis implement multiple zones?

循环不变式

FR练习题目---简单题

SaaS multi tenant solution for FMCG industry to build digital marketing competitiveness of the whole industry chain

面试突击62:group by 有哪些注意事项?

Under the crisis of enterprise development, is digital transformation the future savior of enterprises

freesurfer运行完recon-all怎么快速查看有没有报错?——核心命令tail重定向

Security analysis of Web Architecture

FR练习题目---综合题
随机推荐
Thymeleaf th:classappend属性追加 th:styleappend样式追加 th:data-自定义属性
APR protocol and defense
R language uses the polR function of mass package to build an ordered multi classification logistic regression model, and uses the coef function to obtain the log odds ratio corresponding to each vari
家用电器行业商业供应链协同平台解决方案:供应链系统管理精益化,助推企业智造升级
The forked VM terminated without saying properly goodbye
分享 12 个最常用的正则表达式,能解决你大部分问题
Thymeleaf 常用函数
PMP考试20天能通过吗?
浅谈Dataset和Dataloader在加载数据时如何调用到__getitem__()函数
乌卡时代下,企业供应链管理体系的应对策略
[learning notes] connectivity and circuit of graph
CPU设计实战-第四章实践任务二用阻塞技术解决相关引发的冲突
分享 20 个稀奇古怪的 JS 表达式,看看你能答对多少
Microframe technology won the "cloud tripod Award" at the global Cloud Computing Conference!
R language dplyr package select function, group_ By function, mutate function and cumsum function calculate the cumulative value of the specified numerical variable in the dataframe grouping data and
anaconda使用中科大源
Redis如何实现多可用区?
FR练习题目---简单题
黑马程序员-软件测试-10阶段2-linux和数据库-44-57为什么学习数据库,数据库分类关系型数据库的说明Navicat操作数据的说明,Navicat操作数据库连接说明,Navicat的基本使用,
通过npm 或者 yarn安装依赖时 报错 出现乱码解决方式