[ML] PMML for machine learning models: an overview
2022-07-29 06:00:00 【Machines don't learn I learn】
Putting a machine learning model into use generally involves two main stages:
- 1. Offline development
- 2. Online deployment
The offline stage handles model training and model export; the online stage imports the model and makes predictions.

Figure source: https://zhuanlan.zhihu.com/p/30378213
1. A brief introduction to PMML
PMML (Predictive Model Markup Language) is an XML-based standard: a model representation language that is independent of platform and environment. It uses an XML schema to define and store the core elements of an algorithm model:
- Data dictionary: describes the input data
- Data transformations: define how the raw data is preprocessed, for example standardization, missing-value handling, and dummy-variable generation
- Model definition: the model type and parameters, for example the split nodes of a tree model
- Model output: describes the output of the model
Because a PMML file defines all of these core elements, it can capture the whole data mining workflow. When back-end developers deploy a model, they only need to read the data and call the PMML file to get the output, without dealing with data transformations or model parameters, which speeds up model deployment.
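To make the four core elements concrete, here is a minimal sketch that parses a small PMML document with Python's standard library and lists each part. The document itself is a hand-written toy for illustration only: the field names (`age`, `label`, `age_scaled`, `predicted_label`) and the tree thresholds are assumptions, not something produced by a real training run.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document used only for illustration.
PMML_TEXT = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="toy decision tree"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="label" optype="categorical" dataType="string">
      <Value value="0"/>
      <Value value="1"/>
    </DataField>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="age_scaled" optype="continuous" dataType="double">
      <NormContinuous field="age">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="label" usageType="target"/>
    </MiningSchema>
    <Output>
      <OutputField name="predicted_label" feature="predictedValue"/>
    </Output>
    <Node id="1">
      <True/>
      <Node id="2" score="0">
        <SimplePredicate field="age_scaled" operator="lessOrEqual" value="0.3"/>
      </Node>
      <Node id="3" score="1">
        <SimplePredicate field="age_scaled" operator="greaterThan" value="0.3"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def summarize(pmml_text):
    """Return the names found in each of the four core parts of the document."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:TreeModel", NS)
    return {
        # Data dictionary: the raw fields the model knows about.
        "data_fields": [
            f.get("name")
            for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
        ],
        # Data transformations: fields derived by preprocessing.
        "derived_fields": [
            f.get("name")
            for f in root.findall(
                "pmml:TransformationDictionary/pmml:DerivedField", NS
            )
        ],
        # Model definition: the element name is the model type.
        "model_type": model.tag.split("}")[1],
        # Model output: the fields returned to the caller.
        "output_fields": [
            f.get("name")
            for f in model.findall("pmml:Output/pmml:OutputField", NS)
        ],
    }


summary = summarize(PMML_TEXT)
for part, names in summary.items():
    print(part, names)
```

Real PMML files are generated by export tools rather than written by hand, and they are usually far larger, but they follow the same overall layout.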
2. Libraries for generating and loading PMML models
Which library generates the PMML model depends on the library used for offline training. If we train with sklearn, the Python library sklearn2pmml can generate the model file; if the model is trained with xgboost, xgboost_to_pmml from the nyoka library can export the PMML model file.
Loading a PMML model requires a library with PMML support in the target environment. In Java, JPMML can load the PMML model file; in Python, pypmml can load it.
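The export and load steps can be sketched as follows for the sklearn case. This is a rough sketch under several assumptions: scikit-learn, sklearn2pmml, and pypmml are installed, a Java runtime is available (both PMML libraries rely on one), and the iris dataset, the tree depth, and the function names `export_tree_to_pmml` and `predict_with_pmml` are chosen purely for illustration.

```python
def export_tree_to_pmml(pmml_path):
    """Train a small decision tree on iris and write it to pmml_path."""
    # Imported inside the function so this file can be loaded even when
    # the optional dependencies are not installed.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from sklearn2pmml import sklearn2pmml
    from sklearn2pmml.pipeline import PMMLPipeline

    iris = load_iris(as_frame=True)
    # sklearn2pmml converts a fitted PMMLPipeline, so the estimator is
    # wrapped in one before training.
    pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier(max_depth=3))])
    pipeline.fit(iris.data, iris.target)
    sklearn2pmml(pipeline, pmml_path)
    return pipeline


def predict_with_pmml(pmml_path, record):
    """Load a PMML file with pypmml and predict a single record (a dict)."""
    from pypmml import Model

    model = Model.fromFile(pmml_path)
    return model.predict(record)
```

A typical call sequence would be `export_tree_to_pmml("tree.pmml")` in the offline environment, then `predict_with_pmml("tree.pmml", record)` in the serving environment, where `record` maps the training column names to values. A Java service would load the same file with JPMML instead.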
3. PMML summary and reflections
- To be cross-platform, PMML gives up many platform-specific optimizations. As a result, a model file saved with an algorithm library's own model-saving API is often much smaller than the generated PMML model file. PMML files also load much more slowly than model files in the library's own format.
- Predictions from a model loaded from PMML can deviate slightly from those of the library's native model, although the deviation is small. For example, suppose an sklearn decision tree predicts class 1 for a certain sample. If we export this tree to a PMML file, load it in Java, and predict the same sample again, there is a small probability that the result is not class 1.
- For very large models, such as large ensemble models (xgboost, random forests) or tensorflow models, the generated PMML file can easily reach several GB or even TB (in practical experience, the larger the training dataset, the larger the PMML model file). Loading such a file and predicting with it is very slow. In that case it is better to build a dedicated environment for the model and not to worry about cross-platform support.
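Given the possible prediction deviation noted above, one reasonable precaution is to compare the native model's predictions with the PMML-loaded model's predictions on the same samples before deployment. The helpers below are a minimal sketch of such a parity check; the function names and the sample values are hypothetical, and in practice the two lists would come from the training library and from the PMML runtime.

```python
def mismatch_rate(native_preds, pmml_preds):
    """Fraction of samples where the two models predict different labels."""
    if len(native_preds) != len(pmml_preds):
        raise ValueError("prediction lists must have the same length")
    if not native_preds:
        return 0.0
    mismatches = sum(1 for a, b in zip(native_preds, pmml_preds) if a != b)
    return mismatches / len(native_preds)


def mismatched_indices(native_preds, pmml_preds):
    """Indices of the samples worth inspecting by hand."""
    return [
        i
        for i, (a, b) in enumerate(zip(native_preds, pmml_preds))
        if a != b
    ]


# Hypothetical predictions for four samples.
native = [1, 0, 1, 1]
from_pmml = [1, 0, 0, 1]
print(mismatch_rate(native, from_pmml))
print(mismatched_indices(native, from_pmml))
```

What mismatch rate is acceptable depends on the application; the check only makes the deviation visible so that it can be judged before the PMML model goes online.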
References:
https://cloud.tencent.com/developer/article/1596754
https://zhuanlan.zhihu.com/p/30378213
https://zhuanlan.zhihu.com/p/458117655