# Intel® Extension for Scikit-learn ElasticNet for Airlines DepDelay dataset

In [1]:
from timeit import default_timer as timer
from sklearn import metrics
from sklearn.model_selection import train_test_split
import warnings
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelEncoder
from IPython.display import HTML
warnings.filterwarnings('ignore')

### Download the data

In [2]:
x, y = fetch_openml(name='Airlines_DepDelay_10M', return_X_y=True)

### Preprocessing
Let's encode the categorical features with LabelEncoder.

In [3]:
for col in ['UniqueCarrier', 'Origin', 'Dest']:
    le = LabelEncoder().fit(x[col])
    x[col] = le.transform(x[col])
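
To make the encoding step concrete, here is a minimal pure-Python sketch of what LabelEncoder does: it sorts the distinct values and replaces each one with its position in that sorted list. The carrier codes below are made-up sample values, and `fit_label_codes` and `transform_labels` are hypothetical helper names, not part of Scikit-learn.

```python
def fit_label_codes(values):
    """Map each distinct value to an integer, following the sorted order of the values."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


def transform_labels(values, codes):
    """Replace each value with its integer code."""
    return [codes[v] for v in values]


# Made-up sample of a categorical column (not taken from the dataset).
carriers = ["UA", "AA", "DL", "AA", "UA"]
codes = fit_label_codes(carriers)
encoded = transform_labels(carriers, codes)
print(codes)    # {'AA': 0, 'DL': 1, 'UA': 2}
print(encoded)  # [2, 0, 1, 0, 2]
```

Note that these integer codes carry an arbitrary ordering, which a linear model such as ElasticNet treats as a numeric magnitude. This is a simplification worth keeping in mind when reading the results.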

Split the data into train and test sets

In [4]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((9000000, 9), (1000000, 9), (9000000,), (1000000,))

Normalize the target values

In [5]:
from sklearn.preprocessing import StandardScaler
scaler_y = StandardScaler()

In [6]:
y_train = y_train.to_numpy().reshape(-1, 1)
y_test = y_test.to_numpy().reshape(-1, 1)

scaler_y.fit(y_train)
y_train = scaler_y.transform(y_train).ravel()
y_test = scaler_y.transform(y_test).ravel()
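
The scaling step fits the mean and standard deviation on the training targets only and then applies the same statistics to the test targets. A minimal pure-Python sketch of that idea follows. The delay values are made up, and `fit_scaler` and `apply_scaler` are hypothetical helper names; StandardScaler itself uses the population standard deviation (no Bessel correction), which `statistics.pstdev` mirrors here.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Return the mean and population standard deviation of the training values."""
    return fmean(values), pstdev(values)


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Made-up delays in minutes (not taken from the dataset).
train_delays = [0.0, 10.0, 20.0, 30.0]
test_delays = [15.0, 45.0]

mean, std = fit_scaler(train_delays)                  # fitted on train only
scaled_train = apply_scaler(train_delays, mean, std)
scaled_test = apply_scaler(test_delays, mean, std)    # reuses the train statistics
print(scaled_test)
```

Reusing the training statistics for the test set keeps information about the test targets out of the fitted scaler.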

### Patch original Scikit-learn with Intel® Extension for Scikit-learn
Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

In [7]:
from sklearnex import patch_sklearn
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


Intel® Extension for Scikit-learn patching affects the performance of specific Scikit-learn functionality. Refer to the [list of supported algorithms and parameters](https://intel.github.io/scikit-learn-intelex/algorithms.html) for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, [submit an issue on GitHub](https://github.com/intel/scikit-learn-intelex/issues).
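
If the same code has to run on machines where the extension may not be installed, the patch call can be guarded. The sketch below is one possible way to do that; `try_patch_sklearn` is a hypothetical helper name introduced for illustration, and only the `patch_sklearn` call comes from the package itself.

```python
def try_patch_sklearn():
    """Enable the Intel optimizations if sklearnex is installed; return True on success."""
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Extension not installed: keep using stock Scikit-learn.
        return False
    patch_sklearn()
    return True


patched = try_patch_sklearn()
# Import estimators only after this point so the patched classes are picked up.
print("Intel optimizations enabled" if patched else "Using stock Scikit-learn")
```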

Train the ElasticNet algorithm with Intel® Extension for Scikit-learn on the Airlines DepDelay dataset

In [8]:
from sklearn.linear_model import ElasticNet

params = {
    "alpha": 0.3,
    "fit_intercept": False,
    "l1_ratio": 0.7,
    "random_state": 0,
    "copy_X": False,
}
start = timer()
model = ElasticNet(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® Extension for Scikit-learn time: {train_patched:.2f} s"

'Intel® Extension for Scikit-learn time: 0.28 s'
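
For reference, the objective that Scikit-learn documents for ElasticNet is shown below, where $n$ is the number of samples, $\alpha$ corresponds to `alpha`, and $\rho$ corresponds to `l1_ratio`. The symbols $w$, $X$, and $y$ are notation introduced here for the coefficients, features, and target.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
```

With `alpha = 0.3` and `l1_ratio = 0.7`, the L1 term is weighted by $0.3 \times 0.7 = 0.21$ and the squared L2 term by $0.3 \times 0.3 / 2 = 0.045$. Because `fit_intercept` is `False`, no intercept term is estimated.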

Predict with the ElasticNet model trained with Intel® Extension for Scikit-learn and compute the MSE

In [9]:
y_predict = model.predict(x_test)
mse_metric_opt = metrics.mean_squared_error(y_test, y_predict)
f'Patched Scikit-learn MSE: {mse_metric_opt}'

'Patched Scikit-learn MSE: 1.0109113399224974'
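
As a reminder of what the metric measures, here is a minimal pure-Python sketch of mean squared error on made-up values. It is written for illustration only and is not how Scikit-learn implements `mean_squared_error`.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up standardized targets and predictions (not taken from the dataset).
y_true = [0.5, -1.0, 2.0]
y_pred = [0.0, -1.0, 1.0]
mse = mean_squared_error(y_true, y_pred)
print(f"MSE: {mse:.4f}")  # MSE: 0.4167
```

Because the target was standardized with the training statistics, the MSE values in this notebook are expressed in units of the target's training variance.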

### Train the same algorithm with original Scikit-learn
To disable the optimizations, call *unpatch_sklearn* and re-import the ElasticNet class.

In [10]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

Train the ElasticNet algorithm with the original Scikit-learn library on the Airlines DepDelay dataset

In [11]:
from sklearn.linear_model import ElasticNet

start = timer()
model = ElasticNet(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 3.96 s'
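
Both timings in this notebook come from a single run each. For a steadier comparison, one option is to repeat the measurement and keep the shortest duration. The sketch below assumes a hypothetical helper named `best_of` and uses a stand-in workload rather than the actual model fit.

```python
from timeit import default_timer as timer


def best_of(fn, repeats=3):
    """Run fn several times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = timer()
        fn()
        durations.append(timer() - start)
    return min(durations)


# Stand-in workload; in the notebook this would be the call that fits the model.
elapsed = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Taking the minimum reduces the influence of one-time costs and background activity on the reported time.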

Predict with the ElasticNet model trained with the original Scikit-learn and compute the MSE

In [12]:
y_predict = model.predict(x_test)
mse_metric_original = metrics.mean_squared_error(y_test, y_predict)
f'Original Scikit-learn MSE: {mse_metric_original}'

'Original Scikit-learn MSE: 1.0109113399545733'

In [13]:
HTML(f"<h3>Compare MSE metric of patched Scikit-learn and original</h3>"
     f"MSE metric of patched Scikit-learn: {mse_metric_opt} <br>"
     f"MSE metric of unpatched Scikit-learn: {mse_metric_original} <br>"
     f"Metrics ratio: {mse_metric_opt/mse_metric_original} <br>"
     f"<h3>With Scikit-learn-intelex patching you can:</h3>"
     f"<ul>"
     f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
     f"<li>Speed up training and prediction of Scikit-learn models;</li>"
     f"<li>Get similar quality;</li>"
     f"<li>Get a speedup of <strong>{(train_unpatched/train_patched):.1f}</strong> times.</li>"
     f"</ul>")
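
The comparison can be checked by hand from the outputs printed earlier in this notebook. The sketch below plugs in those printed values; the timings are the rounded figures from the cell outputs, so the result is approximate and will differ from run to run and from machine to machine.

```python
# Values copied from the cell outputs above.
train_patched = 0.28      # seconds, patched run
train_unpatched = 3.96    # seconds, original Scikit-learn run
mse_patched = 1.0109113399224974
mse_original = 1.0109113399545733

speedup = train_unpatched / train_patched
mse_ratio = mse_patched / mse_original

print(f"speedup: {speedup:.1f}x")  # speedup: 14.1x
print(f"relative MSE difference: {abs(mse_ratio - 1):.1e}")
```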