[ML] PMML for machine learning models: an overview
2022-07-29 06:00:00 【Machines don't learn I learn】
Putting a machine learning model into production generally involves two main stages:
- 1. Offline development
- 2. Online deployment
The offline stage handles model training and model export; the online stage imports the model and serves predictions.

(Figure omitted; original image source: https://zhuanlan.zhihu.com/p/30378213)
1. A brief introduction to PMML
PMML (Predictive Model Markup Language) is an XML-based standard: a model representation language that is independent of platform and environment. It uses an XML schema to define and store the core elements of an algorithm model:
- Data dictionary: describes the input data
- Data transformations: define how the raw data is preprocessed, for example standardization, missing-value handling, and dummy-variable generation
- Model definition: the model type and its parameters, for example the split nodes of a tree model
- Model output: the output of the model
As this shows, a PMML file that defines all of these core elements captures the whole path from raw input to prediction. When backend developers deploy the model, they only need to read the data and evaluate it against the PMML file to get the output; they do not have to deal with data transformations or model parameters, which speeds up model deployment.
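To make the four core elements concrete, here is a minimal hand-written sketch of what a PMML document can look like, parsed with Python's standard library. The field names (`age`, `label`, `age_scaled`) and the tiny tree are invented for illustration, and the document has not been validated against the official schema, so treat it as a rough picture of the layout, not a production file.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration only: field names and tree are hypothetical.
PMML_SKELETON = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="hand-written illustration"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="label" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="age_scaled" optype="continuous" dataType="double">
      <NormContinuous field="age">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="label" usageType="target"/>
    </MiningSchema>
    <Output>
      <OutputField name="predicted_label" feature="predictedValue"/>
    </Output>
    <Node score="0">
      <True/>
      <Node score="1">
        <SimplePredicate field="age_scaled" operator="greaterThan" value="0.5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


root = ET.fromstring(PMML_SKELETON)

# Top-level sections: data dictionary, transformations, and the model itself.
sections = [local_name(child.tag) for child in root]

# Input fields declared in the data dictionary.
data_fields = [f.get("name") for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)]

# Output fields declared inside the model definition.
output_fields = [f.get("name") for f in root.findall("pmml:TreeModel/pmml:Output/pmml:OutputField", NS)]

print(sections)
print(data_fields)
print(output_fields)
```

The data dictionary, the transformations, the model definition and the model output all live in one file, which is why the deployment side only needs a PMML evaluator and the raw input.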
2. Libraries for generating and loading PMML models
Which library generates the PMML file depends on the library used for offline training. If the model is trained with sklearn, the sklearn2pmml Python library can generate the model file; if it is trained with xgboost, xgboost_to_pmml from the nyoka library can export the PMML model file.
Loading a PMML model requires a PMML-capable library in the target environment. In Java, JPMML can load the PMML model file; in Python, pypmml can load it.
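As a rough sketch of the sklearn route, the function below trains a small decision tree and exports it with sklearn2pmml. The dataset (iris), the tree settings and the output file name are assumptions chosen for illustration; the example also assumes that scikit-learn, sklearn2pmml and a Java runtime are installed, since sklearn2pmml delegates the conversion to a Java backend.

```python
def export_tree_to_pmml(pmml_path: str = "iris_tree.pmml") -> str:
    """Train a small decision tree on iris and export it as a PMML file.

    The imports are local so that this file can be loaded even when the
    optional dependencies (scikit-learn, sklearn2pmml) are not installed.
    """
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml.pipeline import PMMLPipeline

    # Load the data as a DataFrame so the column names end up in the PMML file.
    iris = load_iris(as_frame=True)
    X, y = iris.data, iris.target

    # sklearn2pmml expects the estimator to be wrapped in a PMMLPipeline.
    pipeline = PMMLPipeline([
        ("classifier", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ])
    pipeline.fit(X, y)

    # Write the fitted pipeline to disk as PMML (requires Java on the PATH).
    sklearn2pmml(pipeline, pmml_path)
    return pmml_path
```

Calling `export_tree_to_pmml()` in an environment that meets those assumptions should write `iris_tree.pmml` to the current directory.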
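On the Python side, loading and scoring with pypmml can be sketched as follows. The file path and the input record are hypothetical and must match the fields declared in the PMML file being loaded; pypmml also needs a Java runtime, which is assumed to be available.

```python
def predict_with_pmml(pmml_path: str, record: dict):
    """Load a PMML file with pypmml and score a single record.

    `record` maps the input field names declared in the PMML file to values.
    The import is local so this file loads even without pypmml installed.
    """
    from pypmml import Model

    # Parse the PMML file into an evaluable model object.
    model = Model.fromFile(pmml_path)

    # predict() accepts a dict (among other input types) and returns the
    # output fields defined by the model.
    return model.predict(record)
```

A call might look like `predict_with_pmml("iris_tree.pmml", {"sepal length (cm)": 5.1, ...})`, with one entry for each input field the model expects.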
3. Summary and reflections on PMML
- To be cross-platform, PMML gives up many platform-specific optimizations. As a result, the model file saved through an algorithm library's own model-saving API is often much smaller than the equivalent PMML file, and a PMML file also loads much more slowly than a model file in the library's native format.
- Predictions from a model loaded through PMML can deviate slightly from those of the library's native model, although the deviation is small. For example, a sklearn decision tree may predict category 1 for a given sample, but after the tree is exported to a PMML file and loaded in Java, there is a small probability that the same sample is not predicted as category 1.
- For very large models, such as large ensemble models (xgboost, random forests) or tensorflow models, the generated PMML file can easily reach several GB or even the TB range (in practice, the larger the training set, the larger the PMML file). Loading such a file and predicting with it is very slow. In that case it is better to build a dedicated environment for the model and not aim for cross-platform support.
References:
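One way to quantify the prediction deviation mentioned above is to score the same samples with the native model and with the PMML-loaded model, then compare the two label lists. The helper below is a minimal sketch of that comparison; `mismatch_report` is a hypothetical name and the two lists are made-up stand-ins for real predictions.

```python
from typing import Hashable, List, Sequence, Tuple


def mismatch_report(
    native_preds: Sequence[Hashable],
    pmml_preds: Sequence[Hashable],
) -> Tuple[float, List[int]]:
    """Return the fraction of disagreeing predictions and their indices."""
    if len(native_preds) != len(pmml_preds):
        raise ValueError("both prediction lists must have the same length")

    # Indices of samples where the two models disagree.
    mismatched = [
        i for i, (a, b) in enumerate(zip(native_preds, pmml_preds)) if a != b
    ]
    rate = len(mismatched) / len(native_preds) if native_preds else 0.0
    return rate, mismatched


# Made-up labels: in practice these would come from the native model and
# from the PMML-loaded model evaluated on the same samples.
native = [1, 0, 1, 1, 0]
from_pmml = [1, 0, 1, 0, 0]

rate, indices = mismatch_report(native, from_pmml)
print(f"mismatch rate: {rate:.2f}, mismatched samples: {indices}")
```

Labels returned by a PMML evaluator may have a different type from the native ones (for example strings versus integers), so it is worth converting both lists to the same type before comparing them.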
https://cloud.tencent.com/developer/article/1596754
https://zhuanlan.zhihu.com/p/30378213
https://zhuanlan.zhihu.com/p/458117655