Introduction to and use of automated machine learning frameworks (flaml, H2O)
2022-07-06 11:06:00 【zkkkkkkkkkkkkk】
1、 Introduction
Automated machine learning (AutoML) automates the traditional machine learning workflow so that it runs fully automatically from start to end. Popular AutoML frameworks currently include FLAML and H2O. This chapter records how to use these two frameworks.
2、 Data introduction
The data is a self-made set of bank transaction records with 23 columns: 18 feature columns, 1 label column and 4 columns of user information.
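To see how these 23 columns map onto the pandas slicing used in 3.2.3, here is a plain-Python sketch on a hypothetical row. It assumes (the post does not say this) that the first of the 4 user-information columns is the index consumed by index_col=0, which would explain why the code keeps only 3 customer-information columns with iloc[:, :3]. The function split_row is a hypothetical helper.

```python
# Hypothetical layout matching the description: 4 user-information columns,
# 18 feature columns and 1 label column, 23 columns in total.
N_INFO, N_FEATURES, N_LABEL = 4, 18, 1


def split_row(row):
    """Split one raw CSV row the way the pandas code in 3.2.3 slices it.

    Assumption: the first user-information column is the one consumed by
    index_col=0, so 3 user-information columns remain for iloc[:, :3].
    """
    expected = N_INFO + N_FEATURES + N_LABEL
    if len(row) != expected:
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    index_value = row[0]      # becomes the DataFrame index (index_col=0)
    rest = row[1:]            # the columns of input_data_all
    customer_info = rest[:3]  # iloc[:, :3]
    features = rest[3:-1]     # iloc[:, 3:-1]
    label = rest[-1]          # iloc[:, -1]
    return index_value, customer_info, features, label


row = [f"c{i}" for i in range(23)]
idx, info, feats, label = split_row(row)
print(idx, len(info), len(feats), label)  # c0 3 18 c22
```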
3、 The flaml framework
3.1、 Introduction to flaml
FLAML is an automated machine learning framework released by Microsoft. It supports custom learners and hyperparameters and also provides a fast automatic tuning tool. Among the chosen learners, FLAML can find an accurate ML model at low computational cost, freeing users from picking learners and hyperparameters by hand. It is very convenient to use.
3.2、 Using flaml
3.2.1、 Install flaml (and imbalanced-learn, which provides RandomOverSampler)
pip install "flaml[automl]"
pip install imbalanced-learn
3.2.2、 Import related libraries
from flaml import AutoML
import pandas as pd
from sklearn.metrics import confusion_matrix,classification_report,recall_score,accuracy_score,f1_score,precision_score
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import RandomOverSampler
import time
3.2.3、 Data processing
# Data path
data_path = r"source/data_jianhang.csv"
input_data_all = pd.read_csv(data_path,encoding="gbk",index_col=0)
# Extract the customer information for the forecast [customer name, customer number, customer account number]
customer_info = input_data_all.iloc[:,:3]
print(customer_info)
# Label (last column) and feature columns
input_data_target = input_data_all.iloc[:, -1]
input_data_feature = input_data_all.iloc[:, 3:-1]
# Select all rows of input_data_all from column index 3 to the last column (features + label),
# filling missing values with 0 (fillna returns a new frame instead of modifying a slice in place)
input_data = input_data_all.iloc[:, 3:].fillna(0)
# Print the first five rows
print(input_data.head())
# Random oversampling to balance the positive and negative samples
f = RandomOverSampler(random_state=0)
data, target = f.fit_resample(input_data.iloc[:,:-1], input_data.iloc[:,-1])
# Min-max normalization, fitted on the oversampled data
scaler = MinMaxScaler()
data = scaler.fit_transform(data)
# Print the number of samples per class
print(target.value_counts())
# Original (not oversampled) data, X: feature set, y: label
X,y = input_data.iloc[:,:-1],input_data.iloc[:,-1]
# Oversampling only duplicates rows, so column min/max are unchanged and the same scaler applies
X = scaler.transform(X)
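The two preprocessing steps above can be sketched in plain Python (no pandas or imblearn), assuming small in-memory lists. Random oversampling duplicates randomly chosen rows of each smaller class until every class matches the largest one, which is the idea behind RandomOverSampler's default behavior; min-max scaling maps each column to (x - min) / (max - min). The helpers random_oversample and min_max_scale are hypothetical names. Because oversampling only adds duplicates, column minima and maxima do not change, so a scaler fitted on the oversampled data scales the original rows identically.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    has as many rows as the largest one (original rows are kept first)."""
    rng = random.Random(seed)
    counts = Counter(y)
    largest = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for cls, n in counts.items():
        rows = [row for row, label in zip(X, y) if label == cls]
        for _ in range(largest - n):
            X_out.append(list(rng.choice(rows)))  # copy of a random row of this class
            y_out.append(cls)
    return X_out, y_out


def min_max_scale(X):
    """Scale each column to [0, 1] with (x - min) / (max - min); constant columns become 0."""
    columns = list(zip(*X))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
# Duplicates do not change column min/max, so the original rows scale identically
print(min_max_scale(X_res)[:5] == min_max_scale(X))  # True
```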
3.2.4、 Call flaml
The log printed by the program shows that, through the estimator_list parameter, flaml automatically compares the lgbm, rf and xgboost classifiers for us, then prints the best classifier with its parameters and retrains the model with them. The whole process is automated and needs no manual comparison, which is a defining feature of automated machine learning.
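Conceptually, the comparison described above boils down to: evaluate each candidate learner on validation data, turn the metric into a loss, and keep the candidate with the lowest loss within the time budget. The toy sketch below shows only that idea, with hypothetical threshold rules standing in for lgbm, rf and xgboost; real FLAML also tunes each learner's hyperparameters with a cost-aware search and retrains the winner, which this sketch does not do. Using loss = 1 - F1 for a maximized metric is an assumption about how such a metric maps onto a best_loss value.

```python
import time


def f1_binary(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def select_best(candidates, X_val, y_val, time_budget):
    """Evaluate candidates until time_budget (seconds) is used up and return
    (name, loss) of the one with the lowest loss = 1 - F1."""
    start = time.monotonic()
    best_name, best_loss = None, float("inf")
    for name, predict in candidates.items():
        if time.monotonic() - start > time_budget:
            break  # budget exhausted: keep the best found so far
        loss = 1 - f1_binary(y_val, [predict(x) for x in X_val])
        if loss < best_loss:
            best_name, best_loss = name, loss
    return best_name, best_loss


# Hypothetical stand-ins for learners such as lgbm / rf / xgboost:
# each "model" is a threshold rule on a single feature value.
candidates = {
    "always_1": lambda x: 1,
    "threshold_0.5": lambda x: int(x > 0.5),
    "threshold_0.3": lambda x: int(x > 0.3),
}
X_val = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
y_val = [0, 0, 1, 1, 1, 0]
print(select_best(candidates, X_val, y_val, time_budget=5.0))  # ('threshold_0.3', 0.25)
```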
t1 = time.time()
# initialization flaml Automated modeling framework
flaml_automl = AutoML()
# Pass in training data x and y Conduct fit Training
flaml_automl.fit(data,target,task='classification',log_file_name="xxx.log",metric="f1",estimator_list = ['lgbm', 'rf', 'xgboost'])
# Common fit parameters
'''
# X_train=None: feature matrix of the training data
# y_train=None: labels of the training data
# metric: metric to optimize, e.g. 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1', 'micro_f1', 'macro_f1', 'log_loss', 'mae', 'mse', 'r2', 'mape'
# n_jobs: integer, number of parallel threads
# n_splits: integer, number of folds for cross-validation (used when eval_method='cv')
# log_file_name: log file path; pass an empty string '' to disable logging
# estimator_list: learners to search, e.g. ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'lrl1', 'catboost']; the best one is reported at the end
# time_budget: time limit in seconds, e.g. 10 returns the best model found within 10 s; pass -1 for no limit
# sample: boolean, whether to sample the training data during the search
# early_stop: boolean, default False; stop early once the search has converged
'''
# Log printed by flaml, showing the best model and its parameters
'''
[flaml.automl: 03-09 14:52:24] {2694} INFO - retrain lgbm for 1.3s
[flaml.automl: 03-09 14:52:24] {2699} INFO - retrained model: LGBMClassifier(colsample_bytree=0.5716563773446997,
global_max_steps=9223372036854775807,
learning_rate=0.7886932330930241, max_bin=511,
min_child_samples=7, n_estimators=181, num_leaves=1006,
reg_alpha=0.007095760722363662, reg_lambda=0.3005614400342159,
verbose=-1)
[flaml.automl: 03-09 14:52:24] {2077} INFO - fit succeeded
[flaml.automl: 03-09 14:52:24] {2079} INFO - Time taken to find the best model: 23.60042953491211
'''
# Print some results
print("Elapsed time: ",time.time()-t1)
print(flaml_automl.estimator_list)
print("Best model:",flaml_automl.model)
print("Best config:",flaml_automl.best_config)
print("Training time of the best config:",flaml_automl.best_config_train_time)
print("Best estimator:",flaml_automl.best_estimator)
print("Best loss:",flaml_automl.best_loss)
# Predict X with the best model. X holds the same rows the final model was trained on
# (oversampling only adds duplicates), so the metrics below are optimistic
y_pred = flaml_automl.predict(X)
# Output the predictions
print(y_pred)
# Print metrics and the classification results
print("Confusion matrix:\n",confusion_matrix(y,y_pred))
print("Classification report:\n",classification_report(y,y_pred))
print("Recall:",recall_score(y,y_pred))
print("Accuracy:",accuracy_score(y,y_pred))
print("F1 score:",f1_score(y,y_pred))
print("Precision:",precision_score(y,y_pred))
print("Total time: ",time.time()-t1)
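All of the metrics printed above come from the 2x2 confusion matrix; in particular, accuracy ((TP + TN) / total) and precision (TP / (TP + FP)) are different quantities. A plain-Python sketch for binary labels, assuming 1 is the positive class (the default pos_label of the sklearn scores used above); binary_metrics is a hypothetical helper, not an sklearn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the 2x2 confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        # With labels 0 and 1 this matches sklearn's confusion_matrix layout:
        # rows are true classes, columns are predicted classes.
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here TP = 2, FN = 1, FP = 1 and TN = 4, so accuracy is 0.75 while precision, recall and F1 are all 2/3.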
4、 The h2o framework
4.1、 Introduction to h2o
H2O is an open-source, distributed machine learning framework based on Java, developed and released by the company H2O.ai (company website: H2O.ai | AI Cloud Platform). H2O also supports visual analysis of user tasks through its web interface.
4.2、 Using h2o
4.2.1、 Install h2o
pip install h2o
4.2.2、 Import related libraries
import h2o
from h2o.automl import H2OAutoML
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from h2o.grid.grid_search import H2OGridSearch
import numpy as np
import pandas as pd
import time
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix,classification_report,recall_score,accuracy_score,f1_score,precision_score
from imblearn.over_sampling import RandomOverSampler
4.2.3、 Data processing
Exactly the same as section 3.2.3, so it is not repeated here; copy the data processing code from 3.2.3 directly.
4.2.4、 Start the h2o jar package
1) Download the free resource "Automatic machine learning h2o start-up jar package" (machine learning documents category)
2) Open cmd and run java -jar h2o.jar to start it
3) Visit the web page: http://localhost:54321
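Step 3 can also be checked from Python before connecting with h2o.init(). A minimal sketch using only the standard library; it assumes the default address http://localhost:54321 and queries /3/Cloud, an H2O REST endpoint that reports cluster status. h2o_is_up is a hypothetical helper name.

```python
import urllib.request


def h2o_is_up(base_url="http://localhost:54321", timeout=2.0):
    """Return True if an H2O server answers on base_url.

    Assumption: uses the /3/Cloud REST endpoint (cluster status)."""
    try:
        with urllib.request.urlopen(f"{base_url}/3/Cloud", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, connection refused and timeouts are all OSError
        return False


print(h2o_is_up())
```

If this prints False, the jar from step 2 is probably not running (or listens on another port).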
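4.2.5、 Call h2o
# Algorithms H2OAutoML can train (names usable in include_algos / exclude_algos): "DRF", "GLM", "XGBoost", "GBM", "DeepLearning", "StackedEnsemble"
t1 = time.time()
# Connect to the H2O server started in 4.2.4 (h2o.init() starts a local one if none is running)
h2o.init()
# Assumption (not shown in the post): train on the oversampled data from 3.2.3 and test on the original X, y, as in the flaml example
feature_names = [f"f{i}" for i in range(X.shape[1])]
train_df = pd.DataFrame(data, columns=feature_names)
train_df["target"] = np.asarray(target)
test_df = pd.DataFrame(X, columns=feature_names)
test_df["target"] = np.asarray(y)
train_data_h2o = h2o.H2OFrame(train_df)
test_data_h2o = h2o.H2OFrame(test_df)
# Make the label categorical so that H2O treats the task as classification
train_data_h2o["target"] = train_data_h2o["target"].asfactor()
test_data_h2o["target"] = test_data_h2o["target"].asfactor()
# Initialize AutoML
automl_estimator = H2OAutoML(max_runtime_secs=50,balance_classes=True,exclude_algos=["DeepLearning"],stopping_metric="auc",sort_metric="auc")
# Train on every column except the label
automl_estimator.train(x=train_data_h2o.names[0:-1],y="target",training_frame=train_data_h2o)
print("Time 2:",time.time()-t1)
# Predict with the leader model and keep the "predict" column; the target column is ignored at prediction time
h2o_result = automl_estimator.predict(test_data_h2o)["predict"]
print(h2o_result)
# Convert the true and predicted labels to 1-D arrays for sklearn, then print metrics
y_true = test_data_h2o["target"].as_data_frame().values.ravel()
y_pred = h2o_result.as_data_frame().values.ravel()
print("Confusion matrix:\n",confusion_matrix(y_true,y_pred))
print("Classification report:\n",classification_report(y_true,y_pred))
print("Recall:",recall_score(y_true,y_pred))
print("Accuracy:",accuracy_score(y_true,y_pred))
print("F1 score:",f1_score(y_true,y_pred))
print("Precision:",precision_score(y_true,y_pred))
print("Time 3:",time.time()-t1)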
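With sort_metric="auc", the AutoML leaderboard (automl_estimator.leaderboard) is ranked by AUC, and predict uses the top model, the leader. The ranking idea can be sketched in plain Python: AUC is the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counting one half. The model names and scores below are hypothetical, and this is not H2O's implementation.

```python
def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_val = [0, 0, 1, 1]
# Hypothetical predicted probabilities of class 1 from three AutoML models
model_scores = {
    "GBM_1": [0.1, 0.4, 0.35, 0.8],
    "DRF_1": [0.2, 0.3, 0.6, 0.9],
    "GLM_1": [0.5, 0.5, 0.5, 0.5],
}
# Rank models by AUC, highest first; the first row plays the role of the leader
leaderboard = sorted(
    ((name, roc_auc(y_val, s)) for name, s in model_scores.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, auc in leaderboard:
    print(f"{name}: {auc:.2f}")
```

This prints DRF_1: 1.00, GBM_1: 0.75 and GLM_1: 0.50, so DRF_1 would be the leader used for prediction.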
5、 Summary
Because I know relatively little about h2o, I have not explained the h2o code in much detail. Personally, I prefer flaml for automated machine learning. Its results are quite good, though no screenshots are posted here; interested readers can try it out themselves.