Can sklearnex make your sklearn machine learning model training fly?
2022-06-25 11:47:00 【叶庭云】
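1. Introduction
scikit-learn is a classic machine learning framework that has been under development for more than a decade. Its simple, easy-to-use API (fit(), predict(), transform(), and so on) is well liked by users and has been borrowed, to a greater or lesser extent, by other machine learning frameworks. Its computation speed, however, has long been a common complaint. Those familiar with scikit-learn will know that its built-in acceleration, based on libraries such as joblib, has limited effect and does not make full use of the available compute.
Today I'd like to share a trick that, without changing our existing code, can in some cases improve scikit-learn's computational efficiency by tens of times, or even thousands of times.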
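2. Accelerating scikit-learn with sklearnex
To get this acceleration, we only need to install one extra library, sklearnex, which delivers a substantial speedup on devices with Intel processors.
For a relatively new library like this, it is best to experiment in a clean conda virtual environment, which avoids dependency version conflicts with the base environment and the trouble that comes with them. The full set of commands is below; we also install Jupyter to use as the IDE: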
conda create -n sklearnex python=3.8
conda activate sklearnex
conda install jupyter
conda install nb_conda
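pip install scikit-learn scikit-learn-intelex -i http://pypi.douban.com/simple --trusted-host pypi.douban.com
With the environment ready, write some test code in a Jupyter notebook to see how much acceleration we get. Usage is simple: before importing any scikit-learn modules in your code, run the following: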
from sklearnex import patch_sklearn, unpatch_sklearn
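patch_sklearn()
Once acceleration is enabled successfully, a message is printed confirming that Intel(R) Extension for Scikit-learn is enabled.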
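The import order described above can be wrapped in a small helper. This is a minimal sketch, assuming you want the same script to keep working with stock scikit-learn on machines where scikit-learn-intelex is not installed. The helper name enable_acceleration is hypothetical; patch_sklearn is the call shown above.

```python
def enable_acceleration():
    """Patch scikit-learn if scikit-learn-intelex is available.

    Returns True when the patch was applied, and False when the package
    is missing, in which case stock scikit-learn is used instead.
    (Hypothetical helper, not part of sklearnex.)
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


# Call the helper BEFORE importing any estimator from sklearn.
accelerated = enable_acceleration()
print("sklearnex acceleration enabled:", accelerated)

# Estimators imported after this point pick up the patched versions, e.g.:
# from sklearn.cluster import KMeans
```

The fallback is a design choice for portability; if you would rather fail loudly when the package is missing, call patch_sklearn() directly as in the original snippet.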
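After that, all you need to do is run your original scikit-learn code as usual. I ran a quick test on the older ASUS laptop I use for everyday study and coding.
Taking K-Means clustering as an example, on a sample dataset with on the order of 100,000 samples, training on the training set took only 46.84 seconds with acceleration enabled. After forcing acceleration off with unpatch_sklearn() (note that the relevant scikit-learn modules need to be re-imported), training time rose to 100.52 seconds. In other words, sklearnex gave us a speedup of a little over 2x (100.52 / 46.84 ≈ 2.15).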
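The K-Means comparison described above can be sketched roughly as follows. The function names, dataset shape, and KMeans parameters are illustrative assumptions, not the settings behind the timings quoted in this article; only patch_sklearn() and unpatch_sklearn() come from the text.

```python
import time


def speedup(baseline_seconds, accelerated_seconds):
    """Return how many times faster the accelerated run was."""
    return baseline_seconds / accelerated_seconds


def time_kmeans_fit(accelerated, n_samples=100_000, n_features=20, n_clusters=8):
    """Time one KMeans.fit call with the sklearnex patch on or off.

    Dataset shape and cluster count are illustrative assumptions.
    """
    from sklearnex import patch_sklearn, unpatch_sklearn

    # Switch the patch state first, then (re)import the estimator so the
    # name below is bound to the patched or the stock implementation.
    if accelerated:
        patch_sklearn()
    else:
        unpatch_sklearn()
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=n_samples, n_features=n_features,
                      centers=n_clusters, random_state=0)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

    start = time.perf_counter()
    model.fit(X)
    return time.perf_counter() - start


def compare_kmeans():
    """Run both variants and report the measured speedup."""
    fast = time_kmeans_fit(accelerated=True)
    slow = time_kmeans_fit(accelerated=False)
    print(f"patched: {fast:.2f}s, stock: {slow:.2f}s, "
          f"speedup: {speedup(slow, fast):.2f}x")


# Sanity check using the two timings reported in this article.
print(f"{speedup(100.52, 46.84):.2f}x")  # prints 2.15x
```

Calling compare_kmeans() in an environment with scikit-learn-intelex installed runs both variants. Absolute timings depend heavily on the CPU and the dataset, so treat a single run as a rough indication only.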
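According to the official documentation, the more powerful the CPU, the larger the relative performance gain (judging from running the examples locally, older Intel CPUs do not get as much of a boost). Intel's official benchmarks, run on an Intel Xeon Platinum 8275CL processor across a range of algorithms, show that it speeds up not only training but also model inference, with gains reaching into the thousands of times in some scenarios.
The project also provides some ipynb examples: https://github.com/intel/scikit-learn-intelex/tree/master/examples/notebooks
They demonstrate acceleration for a variety of commonly used models, including K-means, DBSCAN, random forest, logistic regression, and ridge regression. Interested readers can explore them on their own.
In addition, it can accelerate sklearn on GPUs, and the usage is similar: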
import numpy as np
import dpctl
from sklearnex import patch_sklearn, config_context
# Patch before importing the estimator from sklearn
patch_sklearn()
from sklearn.cluster import DBSCAN
X = np.array([[1., 2.], [2., 2.], [2., 3.],
              [8., 7.], [8., 8.], [25., 80.]], dtype=np.float32)
# Offload the computation to the first GPU device
with config_context(target_offload="gpu:0"):
    clustering = DBSCAN(eps=3, min_samples=2).fit(X)