Test UNIQUE with a Random Forest regressor on the California Housing dataset

In this notebook, we showcase how to use UNIQUE to assess uncertainty quantification (UQ) methods for a random forest (RF) regressor and a multilayer perceptron (MLP), both trained on the California Housing dataset provided by the scikit-learn package.

As model-based UQ methods, we derive the variance of the predictions across the 8 trees of the RF and use Monte Carlo dropout for the MLP.
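
A minimal sketch of how these two model-based UQ estimates can be computed is shown below. It assumes a fitted scikit-learn `RandomForestRegressor` and a PyTorch model containing `nn.Dropout` layers; the actual code used in this example lives in the accompanying `preparation.py` module, and the helper names here are illustrative.

```python
# Illustrative sketch (the notebook's actual logic is in `preparation.py`):
# per-sample variance across the individual trees of a fitted RandomForestRegressor,
# and Monte Carlo dropout variance for a torch model with nn.Dropout layers.
import numpy as np
import torch


def rf_prediction_variance(rf, X):
    # Stack each tree's predictions -> shape (n_trees, n_samples), then take the per-sample variance.
    per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
    return per_tree.var(axis=0)


def mc_dropout_variance(model, X, n_samples=100):
    # Keep dropout active at inference time by leaving the model in train() mode,
    # then take the variance over repeated stochastic forward passes.
    model.train()
    with torch.no_grad():
        draws = torch.stack([model(X).squeeze() for _ in range(n_samples)])
    return draws.var(dim=0)
```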

The UNIQUE pipeline has been configured with the following parameters:

  • Regression Task

  • UQ metrics:

    • Ensemble Variance (from the property model output)

    • ManhattanDistance

    • EuclideanDistance

  • Error models (a conceptual sketch follows this list):

    • UniqueRandomForestRegressor

    • UniqueLASSO
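
The error models are auxiliary models that UNIQUE trains to predict the primary model's error from combinations of the input features, the computed UQ metrics, and the predictions. The snippet below is only a conceptual illustration with plain scikit-learn (it is not UNIQUE's `UniqueRandomForestRegressor`/`UniqueLASSO` implementation, and `fit_error_model` is a hypothetical helper):

```python
# Conceptual illustration of an "error model" (not UNIQUE's implementation):
# an auxiliary regressor trained to predict the primary model's absolute (L1) error
# from the input features, the UQ metrics, and the predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_error_model(features, uq_metrics, predictions, labels):
    X_err = np.column_stack([features, uq_metrics, predictions])
    y_err = np.abs(labels - predictions)  # L1 error target
    return RandomForestRegressor(random_state=0).fit(X_err, y_err)
```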

UNIQUE Input data generation

import json
import os
import yaml
from pathlib import Path

import numpy as np
import pandas as pd

# Install torch to prepare the California Housing data
try:
    import torch
except ImportError:
    # Replace with conda if mamba not available
    # %mamba install pytorch::pytorch -y
    %pip install torch --index-url https://download.pytorch.org/whl/cpu

from unique import Pipeline

# Set the project's directory
PROJECT_PATH = os.environ.get("PROJECT_PATH", os.path.abspath("")) # ALTERNATIVELY, REPLACE `os.path.abspath("")` WITH YOUR PATH TO THE SYNTHETIC EXAMPLE FOLDER
%cd $PROJECT_PATH

from preparation import SyntheticDataExamplePreparation
Looking in indexes: https://download.pytorch.org/whl/cpu
Collecting torch
  Downloading https://download.pytorch.org/whl/cpu/torch-2.4.1%2Bcpu-cp312-cp312-linux_x86_64.whl (194.8 MB)
Requirement already satisfied: filelock in /home/runner/work/UNIQUE/UNIQUE/.conda/unique/lib/python3.12/site-packages (from torch) (3.16.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /home/runner/work/UNIQUE/UNIQUE/.conda/unique/lib/python3.12/site-packages (from torch) (4.12.2)
Collecting sympy (from torch)
  Downloading https://download.pytorch.org/whl/sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting networkx (from torch)
  Downloading https://download.pytorch.org/whl/networkx-3.2.1-py3-none-any.whl (1.6 MB)
Requirement already satisfied: jinja2 in /home/runner/work/UNIQUE/UNIQUE/.conda/unique/lib/python3.12/site-packages (from torch) (3.1.3)
Collecting fsspec (from torch)
  Downloading https://download.pytorch.org/whl/fsspec-2024.2.0-py3-none-any.whl (170 kB)
Requirement already satisfied: setuptools in /home/runner/work/UNIQUE/UNIQUE/.conda/unique/lib/python3.12/site-packages (from torch) (75.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/runner/work/UNIQUE/UNIQUE/.conda/unique/lib/python3.12/site-packages (from jinja2->torch) (2.1.5)
Collecting mpmath>=0.19 (from sympy->torch)
  Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, sympy, networkx, fsspec, torch
Successfully installed fsspec-2024.2.0 mpmath-1.3.0 networkx-3.2.1 sympy-1.12 torch-2.4.1+cpu
Note: you may need to restart the kernel to use updated packages.
/home/runner/work/UNIQUE/UNIQUE/notebooks/california_housing
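
For reference, `SyntheticDataExamplePreparation` works with the scikit-learn California Housing data; a minimal sketch of that kind of loading and splitting step is shown below. The split ratio is an assumption for illustration only, and the actual preprocessing lives in `preparation.py`.

```python
# Illustrative sketch of loading and splitting the California Housing data
# (the real preparation logic lives in `preparation.py`; the 70/15/15 split is assumed).
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

data = fetch_california_housing()
X, y = data.data, data.target  # 20640 samples, 8 numeric features; target = median house value

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
```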
unique_input_data_path = f'{PROJECT_PATH}/unique_input_data.json'

s = SyntheticDataExamplePreparation()

if Path(unique_input_data_path).is_file():
    print('UNIQUE input data already generated.')
else:
    train_indices, val_indices, test_indices, \
    X_train, X_val, X_test, y_train, y_val, y_test, \
    rf_all_predictions, rf_variances, \
    mlp_all_predictions, mlp_variances = s.run()
    
    unique_dict = {
        'ID': np.concatenate([train_indices, val_indices, test_indices]).tolist(),
        'feature': [X_train[i, :].tolist() for i in range(X_train.shape[0])] +
                    [X_val[i, :].tolist() for i in range(X_val.shape[0])] +
                    [X_test[i, :].tolist() for i in range(X_test.shape[0])],
        'label': np.concatenate([y_train, y_val, y_test]).tolist(),
        'which_set': ['TRAIN'] * len(train_indices) +
                    ['CALIBRATION'] * len(val_indices) +
                    ['TEST'] * len(test_indices),
        'rf_predictions': rf_all_predictions.tolist(),
        'rf_variances': rf_variances,
        'mlp_predictions': mlp_all_predictions.detach().squeeze().numpy().tolist(),
        'mlp_variances': mlp_variances.tolist()
    }


    with open(unique_input_data_path, 'w') as f:
        json.dump(unique_dict, f)
RF training...
Random Forest Test performance:
MAE: 0.3653
RMSE: 0.5515
R2: 0.7729

MLP training...
Epoch: 0 | train_loss: 1.219 | val_loss: 0.450
Epoch: 1 | train_loss: 0.674 | val_loss: 0.413
Epoch: 2 | train_loss: 0.611 | val_loss: 0.378
Epoch: 3 | train_loss: 0.582 | val_loss: 0.391
Epoch: 4 | train_loss: 0.539 | val_loss: 0.357
Epoch: 5 | train_loss: 0.518 | val_loss: 0.367
Epoch: 6 | train_loss: 0.494 | val_loss: 0.364
Epoch: 7 | train_loss: 0.477 | val_loss: 0.338
Epoch: 8 | train_loss: 0.456 | val_loss: 0.330
Epoch: 9 | train_loss: 0.449 | val_loss: 0.325
MLP Test performance:
MAE: 0.3879
RMSE: 0.5503
R2: 0.7739

Collecting RF variance...
Dropout Monte Carlo...: 100%|██████████| 100/100 [00:01<00:00, 85.99it/s]
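
Optionally, the generated input file can be reloaded to sanity-check the columns and split sizes before running the pipeline (a quick check, not part of UNIQUE itself):

```python
# Optional sanity check (not part of UNIQUE): reload the generated input file
# and inspect the expected columns and split sizes.
with open(unique_input_data_path) as f:
    check_df = pd.DataFrame(json.load(f))

print(check_df.columns.tolist())             # ID, feature, label, which_set, rf_*/mlp_* columns
print(check_df["which_set"].value_counts())  # TRAIN / CALIBRATION / TEST counts
```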

UNIQUE Pipeline

To evaluate the UQ methods of interest, together with the additional UQ methods generated by the UNIQUE pipeline itself, run the pipeline's fit() method. This assesses their performance using three main UQ evaluation types: Ranking, Proper scoring rules, and Calibration curves.

The summary tables report, for each UQ method, scores on a set of UQ evaluation metrics representative of each evaluation type. The UQ method with the highest score is highlighted in green as the best-performing method.

Following the summary tables, you will find individual plots showcasing the performance of the best UQ methods.

Additionally, you can explore the summary plots generated for all the evaluated UQ methods, providing a comprehensive overview of their performance.
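
As a rough intuition for the ranking-based evaluation (a toy illustration, not UNIQUE's implementation), an informative UQ estimate should rank samples similarly to the true absolute error:

```python
# Toy illustration of ranking-based evaluation (not UNIQUE's implementation):
# an informative UQ estimate ranks samples similarly to the true absolute error.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
true_error = rng.gamma(shape=2.0, scale=0.5, size=1000)  # pretend per-sample absolute errors
good_uq = true_error + rng.normal(0.0, 0.2, size=1000)   # correlated with the error
bad_uq = rng.random(1000)                                 # uninformative

rho_good, _ = spearmanr(good_uq, true_error)
rho_bad, _ = spearmanr(bad_uq, true_error)
print(f"Spearman (good UQ): {rho_good:.2f} | Spearman (bad UQ): {rho_bad:.2f}")
```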

def overwrite_paths(yaml_file: str, project_path: str, input_data_file: str = "unique_input_data.json"):
	"""Given a yaml UNIQUE config file, overwrite the `data_path` and `output_path` fields."""
	# Use ruamel.yaml to preserve comments
	from ruamel.yaml import YAML
	yaml = YAML()

	# Read
	with open(yaml_file, "r") as f:
		# If you want the equivalent of yaml.safe_load use `typ="safe"`
		config = yaml.load(f) # defaults to `typ="rt"` (round-trip) argument. 

	# Overwrite
	config["data_path"] = os.path.join(project_path, input_data_file)
	config["output_path"] = os.path.join(project_path, "output")

	# Save
	with open(yaml_file, "w") as f:
		yaml.dump(config, f)
config_file = f'{PROJECT_PATH}/config_mlp.yaml'

# Replace `data_path` and `output_path` to be able to run the notebook automatically
overwrite_paths(config_file, PROJECT_PATH) # COMMENT TO DISABLE OVERWRITING

pipeline = Pipeline.from_config(config_file)

# Compute UQ metrics, train error models (if any), evaluate UQ metrics
output, eval_results = pipeline.fit()
[2024-10-04 18:15:00] | [UNIQUE - INFO]: ************************ UNIQUE - INITIALIZING PIPELINE ************************
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Loaded Pipeline configuration from: 'config_mlp.yaml'
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Loading data from 'unique_input_data.json'...
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Dataset with 20640 entries correctly loaded.
[2024-10-04 18:15:00] | [UNIQUE - INFO]: UQ inputs initialization...
[2024-10-04 18:15:00] | [UNIQUE - INFO]: UQ inputs summary: 
                                             1. [Data-Based Feature] Column: 'feature' | UQ methods to compute: Manhattan Distance, Euclidean Distance
                                             2. [Model-Based Feature] Column: 'mlp_variances' | UQ methods to compute: Ensemble Variance
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Selected error model(s): 
                                             1. UniqueRandomForestRegressor
                                             2. UniqueLASSO
[2024-10-04 18:15:00] | [UNIQUE - INFO]: ************************ UNIQUE - COMPUTING UQ METHODS *************************
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Computing UQ methods for provided inputs...
[2024-10-04 18:15:00] | [UNIQUE - INFO]: Combining 'base' UQ methods and computing 'transformed' UQ methods...
[2024-10-04 18:15:01] | [UNIQUE - INFO]: Initializing error models...
[2024-10-04 18:15:01] | [UNIQUE - INFO]: Preparing error models inputs...
[2024-10-04 18:15:01] | [UNIQUE - INFO]: Training error models...
[2024-10-04 18:15:06] | [UNIQUE - INFO]: Collected and computed 16 UQ methods.
[2024-10-04 18:15:06] | [UNIQUE - INFO]: Note: UQ method 'SumOfVariancesAndDistances' summed the input variance(s) and the following distances (converted to variances):
                                              1. Dist2Var[EuclideanDistance[feature]]
[2024-10-04 18:15:06] | [UNIQUE - INFO]: ************************ UNIQUE - EVALUATING UQ METHODS ************************
[2024-10-04 18:15:06] | [UNIQUE - INFO]: Evaluating and benchmarking 16 UQ methods by bootstrapping (n=500) on the test set...
[2024-10-04 18:18:52] | [UNIQUE - INFO]: Evaluated 5 UQ methods out of 16...
[2024-10-04 18:23:35] | [UNIQUE - INFO]: Evaluated 10 UQ methods out of 16...
[2024-10-04 18:29:21] | [UNIQUE - INFO]: Evaluated 15 UQ methods out of 16...
[2024-10-04 18:30:30] | [UNIQUE - INFO]: Evaluated 16 UQ methods out of 16.
[2024-10-04 18:30:31] | [UNIQUE - INFO]: Generating summary tables...
[2024-10-04 18:30:31] | [UNIQUE - INFO]: Summary evaluation tables saved to: .../california_housing/output/summary.
[2024-10-04 18:30:42] | [UNIQUE - INFO]: Generating summary plots...
[2024-10-04 18:30:56] | [UNIQUE - INFO]: Summary plots saved to: .../california_housing/output/summary.
[2024-10-04 18:30:56] | [UNIQUE - INFO]: Summary of best UQ method for each UQ evaluation type:
                                             1. [TEST] RankingBasedEvaluation: UniqueRandomForestRegressor[feature+UQmetrics+predictions](l1)
                                             2. [TEST] CalibrationBasedEvaluation: Dist2Var[EuclideanDistance[feature]]
                                             3. [TEST] ProperScoringRulesEvaluation: Dist2Var[ManhattanDistance[feature]]
[2024-10-04 18:30:56] | [UNIQUE - INFO]: ********************************* UNIQUE - END *********************************
[2024-10-04 18:30:56] | [UNIQUE - INFO]: Time elapsed: 00h:15m:56s
| UQ Method | Subset | AUC Difference: UQ vs. True Error | Spearman Correlation | Decreasing Coefficient | Performance Drop: High UQ vs. Low UQ (3-Bins) | Increasing Coefficient | Performance Drop: All vs. Low UQ (10-Bins) | Performance Drop: High UQ vs. Low UQ (10-Bins) | Performance Drop: All vs. Low UQ (3-Bins) |
|---|---|---|---|---|---|---|---|---|---|
| ManhattanDistance[feature] | TEST | 4.342 | -0.147 | 2.616 | 1.011 | 2.706 | 0.986 | 1.044 | 0.997 |
| EuclideanDistance[feature] | TEST | 4.221 | -0.122 | 3.304 | 1.017 | 2.273 | 0.991 | 1.029 | 1.002 |
| EnsembleVariance[mlp_variances] | TEST | 7.595 | -0.593 | 0.272 | 0.859 | 4.468 | 0.832 | 0.809 | 0.925 |
| Diff5NN[ManhattanDistance[feature], EnsembleVariance[mlp_variances]] | TEST | 6.194 | -0.375 | 0.110 | 0.902 | 4.455 | 0.936 | 0.898 | 0.952 |
| Diff5NN[ManhattanDistance[feature], predictions] | TEST | 3.778 | 0.075 | 4.432 | 1.034 | 0.825 | 1.015 | 1.092 | 1.012 |
| Diff5NN[EuclideanDistance[feature], EnsembleVariance[mlp_variances]] | TEST | 6.149 | -0.369 | 0.102 | 0.904 | 4.432 | 0.940 | 0.899 | 0.953 |
| Diff5NN[EuclideanDistance[feature], predictions] | TEST | 3.776 | 0.080 | 4.460 | 1.038 | 0.674 | 1.014 | 1.092 | 1.015 |
| Dist2Var[ManhattanDistance[feature]] | TEST | 4.342 | -0.147 | 2.616 | 1.011 | 2.706 | 0.986 | 1.044 | 0.997 |
| Dist2Var[EuclideanDistance[feature]] | TEST | 4.221 | -0.122 | 3.304 | 1.017 | 2.273 | 0.991 | 1.029 | 1.002 |
| SumOfVariances[Dist2Var[EuclideanDistance[feature]]] | TEST | 7.557 | -0.597 | 0.272 | 0.857 | 4.468 | 0.851 | 0.828 | 0.924 |
| UniqueRandomForestRegressor[feature+UQmetrics+predictions](l1) | TEST | 0.023 | 0.992 | 4.468 | 1.268 | 0.000 | 1.195 | 1.539 | 1.126 |
| UniqueRandomForestRegressor[UQmetrics+predictions](l1) | TEST | 0.052 | 0.984 | 4.468 | 1.266 | 0.000 | 1.194 | 1.536 | 1.125 |
| UniqueRandomForestRegressor[transformedUQmetrics+predictions](l1) | TEST | 0.051 | 0.984 | 4.468 | 1.266 | 0.000 | 1.194 | 1.536 | 1.125 |
| UniqueLASSO[feature+UQmetrics+predictions](l1) | TEST | 0.031 | 0.990 | 4.468 | 1.268 | 0.000 | 1.195 | 1.538 | 1.126 |
| UniqueLASSO[UQmetrics+predictions](l1) | TEST | 0.061 | 0.981 | 4.468 | 1.266 | 0.000 | 1.193 | 1.534 | 1.125 |
| UniqueLASSO[transformedUQmetrics+predictions](l1) | TEST | 0.081 | 0.976 | 4.468 | 1.264 | 0.000 | 1.192 | 1.531 | 1.124 |
| UQ Method | Subset | MACE | RMSCE |
|---|---|---|---|
| EnsembleVariance[mlp_variances] | TEST | 0.381 | 0.476 |
| Dist2Var[ManhattanDistance[feature]] | TEST | 0.314 | 0.388 |
| Dist2Var[EuclideanDistance[feature]] | TEST | 0.310 | 0.383 |
| SumOfVariances[Dist2Var[EuclideanDistance[feature]]] | TEST | 0.379 | 0.474 |
| UniqueRandomForestRegressor[feature+UQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UniqueRandomForestRegressor[UQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UniqueRandomForestRegressor[transformedUQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UniqueLASSO[feature+UQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UniqueLASSO[UQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UniqueLASSO[transformedUQmetrics+predictions](l1) | TEST | 0.400 | 0.502 |
| UQ Method | Subset | NLL | CheckScore | CRPS | IntervalScore |
|---|---|---|---|---|---|
| EnsembleVariance[mlp_variances] | TEST | 5.965 | 20.093 | 39.794 | 187.592 |
| Dist2Var[ManhattanDistance[feature]] | TEST | 5.517 | 18.278 | 36.196 | 157.577 |
| Dist2Var[EuclideanDistance[feature]] | TEST | 5.517 | 18.278 | 36.197 | 157.589 |
| SumOfVariances[Dist2Var[EuclideanDistance[feature]]] | TEST | 5.604 | 18.497 | 36.633 | 173.948 |
| UniqueRandomForestRegressor[feature+UQmetrics+predictions](l1) | TEST | 32.847 | 27.800 | 55.416 | 503.113 |
| UniqueRandomForestRegressor[UQmetrics+predictions](l1) | TEST | 32.887 | 27.802 | 55.419 | 503.188 |
| UniqueRandomForestRegressor[transformedUQmetrics+predictions](l1) | TEST | 32.886 | 27.802 | 55.419 | 503.187 |
| UniqueLASSO[feature+UQmetrics+predictions](l1) | TEST | 32.846 | 27.800 | 55.416 | 503.101 |
| UniqueLASSO[UQmetrics+predictions](l1) | TEST | 32.844 | 27.800 | 55.415 | 503.067 |
| UniqueLASSO[transformedUQmetrics+predictions](l1) | TEST | 32.804 | 27.798 | 55.411 | 502.971 |
[Summary plots for the evaluated UQ methods and for the best UQ method per evaluation type, saved under output/summary; 25 figures not shown here.]
# Optionally save the computed UQ metrics
pd.DataFrame.from_dict(output).to_csv(pipeline.output_path / "uq_metrics_values.csv", index=False)

# `eval_results` is a dict containing the evaluation data used to generate the plots
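
The returned objects can also be inspected directly; a quick look is sketched below (no specific keys of `eval_results` are assumed here, as its exact structure depends on the UNIQUE version):

```python
# Quick inspection of the objects returned by `pipeline.fit()`.
uq_values = pd.DataFrame.from_dict(output)
print(uq_values.shape)            # values of each computed UQ method
print(list(eval_results.keys()))  # groups of evaluation data used for the summary plots
```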