With the rising trend toward deep learning techniques in AI, there has been a lot of investment in accelerating neural network models using GPUs and other specialized hardware. However, many models used in production are still based on traditional machine learning libraries, or sometimes on a combination of traditional machine learning (ML) and DNNs. We've previously shared the performance gains that ONNX Runtime provides for popular DNN models such as BERT, quantized GPT-2, and other Huggingface Transformer models. Now, by using Hummingbird with ONNX Runtime, you can also capture the benefits of GPU acceleration for traditional ML models.

This capability is enabled through the recently added integration of Hummingbird with the LightGBM converter in ONNXMLTools, an open source library that can convert models to the interoperable ONNX format. LightGBM is a gradient boosting framework that uses tree-based learning algorithms and is designed for fast training speed and low memory usage. By simply setting a flag, you can feed a LightGBM model to the converter to produce an ONNX model that uses neural network operators rather than traditional ML operators. This Hummingbird integration allows LightGBM users to take advantage of the GPU acceleration typically only available for neural networks.
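To see why a boosted tree model can be re-expressed with tensor operators at all, it helps to recall what it computes. A gradient boosted ensemble for binary classification scores a row by summing the leaf value each tree selects into a raw margin and, assuming LightGBM's default binary objective, mapping that margin through a sigmoid. The two trees below are hypothetical, hand-written stand-ins for trained ones, and details such as initial scores are ignored; this is a simplified sketch, not LightGBM's implementation.

```python
import math

# Two hypothetical depth-1 "trees", each a function from a feature row to a
# leaf value (its raw-score contribution). Real trees come from training.
def tree_0(x):
    return 0.4 if x[0] < 0.5 else -0.3

def tree_1(x):
    return -0.2 if x[1] < 0.7 else 0.6

TREES = [tree_0, tree_1]

def predict_proba_binary(x, trees=TREES):
    """Sum all trees' leaf values, then apply a sigmoid (assumed binary objective)."""
    raw_score = sum(tree(x) for tree in trees)
    p1 = 1.0 / (1.0 + math.exp(-raw_score))
    return [1.0 - p1, p1]

# Row [0.2, 0.9]: raw score 0.4 + 0.6 = 1.0, so p1 = sigmoid(1.0), about 0.731
print(predict_proba_binary([0.2, 0.9]))
```

Everything after the leaf lookup is plain arithmetic, so the remaining question is how to make the lookup itself tensor-friendly.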

What’s Hummingbird?

Hummingbird is a library for converting traditional ML operators to tensors, with the goal of accelerating inference (scoring/prediction) for traditional machine learning models. You can learn more about Hummingbird in our introductory blog post, but we'll present a short summary here.

  • Traditional ML libraries and toolkits are usually developed to run in CPU environments. For example, LightGBM does not support using the GPU for inference, only for training. Traditional ML models (such as DecisionTrees and LinearRegressors) also do not support hardware acceleration.
  • Hummingbird addresses this gap and allows users to seamlessly leverage hardware acceleration without having to re-engineer their models. It does this by reconfiguring the algorithmic operators in traditional ML pipelines so that the computations are amenable to GPU execution.
  • Hummingbird is competitive with, and even outperforms, hand-crafted kernels on micro-benchmarks, while enabling seamless end-to-end acceleration of ML pipelines. We'll show an example of this speedup below.
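As a minimal illustration of that reconfiguration, the NumPy sketch below evaluates one hand-built decision tree (hypothetical, not taken from any trained model) using only matrix multiplications and comparisons, loosely following the "GEMM" strategy described in the Hummingbird paper. The tree is encoded as matrices A through E, so a whole batch is scored without per-row branching. Hummingbird's actual implementation chooses among several strategies and differs in detail.

```python
import numpy as np

# Hypothetical tree:  node0: x[0] < 0.5 ? go to node1 : leaf2
#                     node1: x[1] < 0.3 ? leaf0 : leaf1
A = np.array([[1, 0],        # A[f, n] = 1 if internal node n tests feature f
              [0, 1]], dtype=np.float32)
B = np.array([0.5, 0.3], dtype=np.float32)   # threshold of each internal node
C = np.array([[1, 1, -1],    # C[n, l] = +1 if leaf l is under n's left child,
              [1, -1, 0]],   #           -1 if under its right child, else 0
             dtype=np.float32)
D = np.array([2, 1, 0], dtype=np.float32)    # left turns on the path to each leaf
E = np.array([[1.0], [2.0], [3.0]], dtype=np.float32)  # value stored in each leaf

def tree_gemm(X):
    """Score a batch with matmuls and comparisons only (no data-dependent branches)."""
    T = (X @ A < B).astype(np.float32)              # 1 where a node's condition holds
    leaf_onehot = (T @ C == D).astype(np.float32)   # exactly one matching leaf per row
    return leaf_onehot @ E                          # pick up that leaf's value

def tree_branching(x):
    """Reference: ordinary if/else traversal of the same tree."""
    if x[0] < 0.5:
        return 1.0 if x[1] < 0.3 else 2.0
    return 3.0

X = np.array([[0.2, 0.1], [0.2, 0.9], [0.8, 0.1], [0.8, 0.9]], dtype=np.float32)
print(tree_gemm(X).ravel())  # [1. 2. 3. 3.]
```

A row reaches leaf l exactly when every left-ancestor condition is true and every right-ancestor condition is false, which is the only case where the row of `T @ C` equals `D[l]`. The trade-off is redundant work (every node is evaluated for every row), which dense hardware such as GPUs handles well.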

Why use ONNX Runtime?

The integration of Hummingbird with ONNXMLTools allows users to take advantage of the flexibility and performance benefits of ONNX Runtime. ONNX Runtime provides a consistent API across platforms and architectures, with APIs in Python, C++, C#, Java, and more. This allows models trained in Python to be used in a variety of production environments. ONNX Runtime also provides an abstraction layer for hardware accelerators, such as Nvidia CUDA and TensorRT, Intel OpenVINO, Windows DirectML, and others. This gives users the flexibility to deploy on their hardware of choice with minimal changes to the runtime integration and no changes to the converted model.
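In ONNX Runtime, each accelerator is exposed as an "execution provider" that is selected when an `InferenceSession` is created, which is why the model file itself does not change. The helper below, `pick_providers`, is a hypothetical sketch (not part of ONNX Runtime) that filters a preference list against what the installed build reports; the preference order is an assumption you would tune for your deployment.

```python
# Provider names as reported by onnxruntime.get_available_providers();
# the preference order below is an illustrative assumption.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "DmlExecutionProvider",       # DirectML on Windows
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred accelerators that are available, in preference order,
    and always end with the CPU provider as a fallback."""
    chosen = [p for p in preferred
              if p in available and p != "CPUExecutionProvider"]
    chosen.append("CPUExecutionProvider")
    return chosen

print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

With ONNX Runtime installed, the result would be passed as `ort.InferenceSession(model_bytes, providers=pick_providers(ort.get_available_providers()))`, where `model_bytes` stands for a serialized model such as `onnx_model.SerializeToString()`.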

While ONNX Runtime natively supports both DNNs and traditional ML models, the Hummingbird integration provides performance improvements by using the neural network form of LightGBM models for inference. This may be particularly useful for those already using GPUs to accelerate other DNNs. Let's take a look at this in action.

Code and performance


```python
import numpy as np
import lightgbm as lgb
import timeit
import onnxruntime as ort
from onnxmltools.convert import convert_lightgbm
from onnxconverter_common.data_types import FloatTensorType

# Create some random data for binary classification
max_depth = 8
num_classes = 2
n_estimators = 1000
n_features = 30
n_fit = 1000
n_pred = 10000
X = np.random.rand(n_fit, n_features)
X = np.array(X, dtype=np.float32)
y = np.random.randint(num_classes, size=n_fit)
test_data = np.random.rand(n_pred, n_features).astype('float32')

# Create and train a LightGBM model
model = lgb.LGBMClassifier(n_estimators=n_estimators, max_depth=max_depth, pred_early_stop=False)
model.fit(X, y)

# Use ONNXMLTools to convert the model to ONNX-ML
input_types = [("input", FloatTensorType([n_pred, n_features]))]  # Define the inputs for the ONNX model
onnx_ml_model = convert_lightgbm(model, initial_types=input_types)

# Predict with LightGBM (timeit returns the total seconds for `number` runs)
lgbm_time = timeit.timeit("model.predict_proba(test_data)", number=7,
                          setup="from __main__ import model, test_data")
print("LightGBM (CPU): {}".format(lgbm_time))

# Predict with the ONNX-ML model on CPU; output [1] holds the class probabilities
sessionml = ort.InferenceSession(onnx_ml_model.SerializeToString(),
                                 providers=["CPUExecutionProvider"])
onnxml_time = timeit.timeit(
    "sessionml.run([sessionml.get_outputs()[1].name], {sessionml.get_inputs()[0].name: test_data})",
    number=7, setup="from __main__ import sessionml, test_data")
print("LGBM->ONNXML (CPU): {}".format(onnxml_time))
```

The result is the following:

```
LightGBM (CPU): 1.1157575770048425
LGBM->ONNXML (CPU): 1.0180995319969952
```

Not bad! Now let's see Hummingbird in action. The only change to the conversion code above is the addition of without_onnx_ml=True:

```python
# Use ONNXMLTools to generate an ONNX model (without any ML operators) using Hummingbird.
# This path relies on the Hummingbird package (pip install hummingbird-ml).
input_types = [("input", FloatTensorType([n_pred, n_features]))]  # Define the inputs for the ONNX model
onnx_model = convert_lightgbm(model, initial_types=input_types, without_onnx_ml=True)
```
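One way to confirm what the flag changed is to look at the operator domains in the converted graph: traditional ML operators such as `TreeEnsembleClassifier` and `ZipMap` live in the ONNX-ML domain `ai.onnx.ml`, while standard tensor operators use the default domain. The helpers below are a hypothetical sketch; they rely only on the ONNX `ModelProto` fields `graph.node`, `domain`, and `op_type`, and they inspect only the top-level graph, not subgraphs.

```python
ONNX_ML_DOMAIN = "ai.onnx.ml"

def ml_operators(model):
    """(domain, op_type) of top-level graph nodes that belong to the ONNX-ML domain.
    Subgraphs (e.g. inside If/Loop nodes) are not inspected in this sketch."""
    return [(node.domain, node.op_type)
            for node in model.graph.node
            if node.domain == ONNX_ML_DOMAIN]

def uses_onnx_ml(model):
    """True if any top-level node comes from the ONNX-ML domain."""
    return bool(ml_operators(model))
```

Applied to the two models above, `uses_onnx_ml(onnx_ml_model)` would be expected to return True and `uses_onnx_ml(onnx_model)` False, since the point of `without_onnx_ml=True` is to avoid ML operators.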

We can now pip install onnxruntime-gpu and run the prediction over the onnx_model:

```python
# Predict with the ONNX model on GPU (requires onnxruntime-gpu and a CUDA-capable device);
# the CPU provider is listed as a fallback
sess_options = ort.SessionOptions()
session = ort.InferenceSession(onnx_model.SerializeToString(), sess_options,
                               providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
onnx_time = timeit.timeit(
    "session.run([session.get_outputs()[1].name], {session.get_inputs()[0].name: test_data})",
    number=7, setup="from __main__ import session, test_data")
print("LGBM->ONNXML->ONNX (GPU): {}".format(onnx_time))
```

And we get:

```
LGBM->ONNXML->ONNX (GPU): 0.2364534509833902
```

That is close to a 5x improvement over the CPU implementations: about 4.7x faster than LightGBM and about 4.3x faster than the ONNX-ML model on CPU. In addition, the ONNX model can take advantage of any additional optimizations available in future releases of ORT, and it can run on any hardware accelerator supported by ORT.
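The speedup figures follow directly from the reported timings. Since `timeit.timeit(..., number=7)` returns the total wall-clock seconds for 7 runs over the 10,000 test rows, the sketch below also converts each total to a per-call latency. The numbers are copied from the outputs above; the helper names are illustrative only.

```python
RUNS = 7  # the `number` argument passed to timeit.timeit above

# Total seconds for RUNS calls, as reported above
totals = {
    "LightGBM (CPU)": 1.1157575770048425,
    "LGBM->ONNXML (CPU)": 1.0180995319969952,
    "LGBM->ONNXML->ONNX (GPU)": 0.2364534509833902,
}

def per_call_ms(total_seconds, runs=RUNS):
    """Average latency of one predict call over the whole test batch, in ms."""
    return 1000.0 * total_seconds / runs

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

gpu = totals["LGBM->ONNXML->ONNX (GPU)"]
for name, total in totals.items():
    print(f"{name:26s} {per_call_ms(total):6.1f} ms/call   GPU speedup: {speedup(total, gpu):.2f}x")
```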

Going forward

Hummingbird currently supports converters for ONNX, scikit-learn, XGBoost, and LightGBM. In the future, we plan to provide similar features for other converters in the ONNXMLTools family, such as XGBoost and scikit-learn. If there are additional operators or integrations you would like to see, please file an issue. We would love to hear how Hummingbird can help speed up your workloads, and we look forward to adding more features!
