Commit: A new beginning

File: ACENET_HPC_Guide.md (new, 135 lines)

# Using Compute Canada / ACENET HPC

This guide explains how to connect to the Digital Research Alliance of Canada (Compute Canada) or ACENET clusters, create a working directory in scratch, transfer files with Globus, and submit jobs using SLURM.

## 1. Connect to the HPC via SSH

1. Determine which cluster to use (examples):

   - Graham: `graham.computecanada.ca`
   - Cedar: `cedar.computecanada.ca`
   - Beluga: `beluga.computecanada.ca`
   - Niagara: `niagara.scinet.utoronto.ca`
   - ACENET: `login1.acenet.ca`

2. Open a terminal and connect via SSH:

   ```bash
   ssh username@graham.computecanada.ca
   ```

3. When prompted, confirm the host fingerprint and enter your password.

---
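
Typing the full hostname and username on every connection can be avoided with a client-side alias. A minimal `~/.ssh/config` fragment might look like this (the alias name and username below are placeholders):

```
# ~/.ssh/config
Host graham
    HostName graham.computecanada.ca
    User your_username
```

With this in place, `ssh graham` is equivalent to the full command above.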

## 2. Create a Folder in Scratch

Your `$SCRATCH` directory is a temporary workspace for large data and computations. It is purged after 60 days of inactivity.

After logging in:

```bash
cd $SCRATCH
mkdir my_project
cd my_project
```

Confirm your path:

```bash
pwd
# Example output: /scratch/username/my_project
```

---

## 3. Install and Use Globus for File Transfers

Globus is a fast, reliable tool for large file transfers. It requires a small local agent called **Globus Connect Personal**.

### Install Globus Connect Personal

- **Linux:**

  ```bash
  wget https://downloads.globus.org/globus-connect-personal/linux/stable/globusconnectpersonal-latest.tgz
  tar xzf globusconnectpersonal-latest.tgz
  cd globusconnectpersonal*
  ./globusconnectpersonal -setup
  ```

- **macOS:**

  Download and install from: [https://www.globus.org/globus-connect-personal](https://www.globus.org/globus-connect-personal)

- **Windows:**

  Download the installer from the same link and follow the setup wizard.

After installation, your local computer will appear as a **Globus endpoint**.

### Transfer Files

1. Visit [https://app.globus.org](https://app.globus.org) and log in using **Compute Canada credentials**.
2. In the web app, choose two endpoints:
   - **Source:** Your local computer or institutional storage.
   - **Destination:** Your HPC endpoint (for example, *Compute Canada Graham Scratch*).
3. Navigate to your target scratch folder (`/scratch/username/my_project`).
4. Select files and click **Start Transfer**.

Globus handles transfers asynchronously and resumes interrupted transfers automatically.

---

## 4. Submit Jobs to ACENET with SLURM

Job submissions use the SLURM scheduler. Create a batch file describing your job's resources and commands.

### Example job script (`job.slurm`)

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --account=def-yourprof
#SBATCH --time=2:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --output=output_%j.log

module load python/3.11
source ~/myenv/bin/activate

python my_script.py
```

### Submit and Monitor Jobs

```bash
sbatch job.slurm      # Submit job
squeue -u username    # Check status
scancel job_id        # Cancel job
```

### View Results

After completion, check output logs:

```bash
less output_<jobid>.log
```

---
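
For parameter sweeps, the same batch-script pattern extends to a SLURM job array. The sketch below follows standard SLURM conventions (`--array`, `$SLURM_ARRAY_TASK_ID`, `%A`/`%a` log patterns); the account, module, and script names are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --account=def-yourprof
#SBATCH --time=1:00:00
#SBATCH --array=0-9            # ten tasks, IDs 0..9
#SBATCH --mem=4G
#SBATCH --output=sweep_%A_%a.log

module load python/3.11
python my_script.py --task-id "$SLURM_ARRAY_TASK_ID"
```

Submit it once with `sbatch`; SLURM fans it out into ten jobs, each seeing its own `$SLURM_ARRAY_TASK_ID`.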

## 5. Useful Commands

```bash
module avail              # List available software modules
module load python/3.11   # Load a module
df -h $SCRATCH            # Check scratch usage
quota -s                  # Check your disk quota
```

---

## 6. References

- Alliance Docs: [https://docs.alliancecan.ca/wiki/Technical_documentation](https://docs.alliancecan.ca/wiki/Technical_documentation)
- ACENET Training: [https://www.ace-net.ca/training/](https://www.ace-net.ca/training/)
- Globus Setup: [https://www.globus.org/globus-connect-personal](https://www.globus.org/globus-connect-personal)

File: README.md (new, 62 lines)

# Explanation-Aware Optimization and AutoML (DEAP + SHAP Stability)

This project implements an **AutoML framework** that uses **DEAP's NSGA-II** for multi-objective optimization, balancing **model accuracy** and **SHAP-based stability**. It supports both **classification** and **regression** datasets via OpenML and sklearn. All results are tracked with **MLflow**.

---

## 1. Environment Setup (macOS / Linux)

### Create and activate a virtual environment

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip wheel setuptools

pip install \
  numpy==1.26.4 \
  pandas==1.5.3 \
  scikit-learn==1.3.2 \
  shap==0.45.0 \
  deap==1.4.1 \
  openml==0.14.2 \
  mlflow==2.11.3 \
  matplotlib==3.7.5
```

## 2. Running Experiments

Classification: Adult dataset

```bash
python run_deap.py \
  --dataset adult \
  --generations 5 \
  --pop-size 24 \
  --cv-folds 3
```

Regression: California Housing dataset

```bash
python run_deap.py \
  --dataset cal_housing \
  --generations 5 \
  --pop-size 24 \
  --cv-folds 3
```

Results are saved under:

```
runs/<dataset>/pareto_front.csv
```
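
The Pareto CSV is plain tabular data, so it can be inspected directly with pandas. A minimal sketch (it builds a stand-in CSV in memory so it runs anywhere; real runs would read `runs/<dataset>/pareto_front.csv`, and the column names match what `run_deap.py` writes):

```python
import io

import pandas as pd

# stand-in for runs/<dataset>/pareto_front.csv
csv = io.StringIO(
    "algo,mse_like,stability\n"
    "rf,0.12,0.85\n"
    "gbt,0.10,0.80\n"
)
pf = pd.read_csv(csv)

# lowest loss first; higher stability breaks ties
pf = pf.sort_values(["mse_like", "stability"], ascending=[True, False])
print(pf.iloc[0]["algo"])
```

Sorting by both objectives is only one way to browse the front; each row is a non-dominated trade-off, not a single winner.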

## 3. Viewing Results in MLflow

```bash
mlflow ui --backend-store-uri ./mlruns --host 0.0.0.0 --port 5000
```

Then open: http://localhost:5000

You can visualize:

- MSE-like score (lower is better)
- SHAP stability (higher is better)

File: requirements.txt (new, 8 lines)

scikit-learn==1.3.2
openml==0.14.2
deap==1.4.1
mlflow==2.11.3
shap==0.45.0
numpy==1.26.4
pandas==1.5.3
matplotlib==3.7.5

File: run_deap.py (new, 170 lines)

import argparse
import pickle
import random
from pathlib import Path

import mlflow
import numpy as np
import pandas as pd
from deap import algorithms
from deap.tools.emo import sortNondominated

from src.data_openml import load_dataset
from src.models import make_model
from src.preprocessing import build_preprocessor
from src.search.nsga_deap import build_toolbox, decode
from src.stability import compute_shap_matrix


def save_checkpoint(path, gen, pop, seed):
    """Pickle the population plus both RNG states so a run can resume exactly."""
    state = {
        "gen": gen,
        "pop": pop,
        "py_random_state": random.getstate(),
        "np_random_state": np.random.get_state(),
        "seed": seed,
    }
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(path):
    """Restore both RNG states and return (gen, pop, seed) from a checkpoint."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    random.setstate(state["py_random_state"])
    np.random.set_state(state["np_random_state"])
    return state["gen"], state["pop"], state["seed"]


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--dataset", required=True, choices=["adult", "cal_housing"])
    ap.add_argument("--generations", type=int, default=10)
    ap.add_argument("--pop-size", type=int, default=24)
    ap.add_argument("--seed", type=int, default=42)
    ap.add_argument("--cv-folds", type=int, default=3)
    ap.add_argument("--experiment", default="deap_nsga_shap")
    ap.add_argument("--checkpoint-every", type=int, default=5)
    ap.add_argument(
        "--shap-pf-eval-rows",
        type=int,
        default=512,
        help="Number of rows from the dataset to use when saving SHAP for Pareto models",
    )
    args = ap.parse_args()

    # data and experiment
    X, y, task = load_dataset(args.dataset, random_state=args.seed)
    mlflow.set_experiment(args.experiment)

    outdir = Path("runs") / args.dataset
    outdir.mkdir(parents=True, exist_ok=True)
    ckpt_path = outdir / "checkpoint.pkl"

    # seed RNGs
    random.seed(args.seed)
    np.random.seed(args.seed)

    # toolbox for this run
    toolbox = build_toolbox(
        X,
        y,
        task,
        seed=args.seed,
        cv_folds=args.cv_folds,
        mlflow_experiment=args.experiment,
    )

    # initial population or resume from checkpoint
    if ckpt_path.exists():
        start_gen, pop, loaded_seed = load_checkpoint(ckpt_path)
        if loaded_seed != args.seed:
            print(
                f"Warning: checkpoint seed {loaded_seed} differs from current seed {args.seed}"
            )
        print(f"Resuming from checkpoint at generation {start_gen}")
    else:
        pop = toolbox.population(n=args.pop_size)
        fits = list(map(toolbox.evaluate, pop))
        for ind, fit in zip(pop, fits):
            ind.fitness.values = fit
        start_gen = 0
        save_checkpoint(ckpt_path, start_gen, pop, args.seed)
        print(f"Initial checkpoint saved at generation {start_gen}")

    # GA loop
    for gen in range(start_gen, args.generations):
        offspring = algorithms.varAnd(pop, toolbox, cxpb=0.7, mutpb=0.2)
        fits = list(map(toolbox.evaluate, offspring))
        for ind, fit in zip(offspring, fits):
            ind.fitness.values = fit
        pop = toolbox.select(pop + offspring, k=args.pop_size)

        if (gen + 1) % args.checkpoint_every == 0:
            save_checkpoint(ckpt_path, gen + 1, pop, args.seed)
            print(f"Checkpoint saved at generation {gen + 1}")

    # final Pareto front
    pf = sortNondominated(pop, len(pop), first_front_only=True)[0]
    rows = []
    for ind in pf:
        algo, model_params, pre_cfg = decode(ind)
        rows.append(
            {
                "algo": algo,
                "mse_like": ind.fitness.values[0],
                "stability": ind.fitness.values[1],
                **{f"m_{k}": v for k, v in model_params.items()},
                **{f"p_{k}": v for k, v in pre_cfg.items()},
            }
        )

    pareto_path = outdir / "pareto_front.csv"
    pd.DataFrame(rows).to_csv(pareto_path, index=False)
    print(f"Saved Pareto front to {pareto_path}")

    shap_dir = outdir / "shap"
    shap_dir.mkdir(exist_ok=True)

    eval_rows = min(args.shap_pf_eval_rows, len(X))
    rng = np.random.RandomState(args.seed)
    eval_idx = rng.choice(len(X), size=eval_rows, replace=False)
    X_eval_shap = X.iloc[eval_idx]
    y_full = y

    from sklearn.pipeline import Pipeline as SkPipeline

    for i, ind in enumerate(pf):
        algo, model_params, pre_cfg = decode(ind)

        fixed_poly_degree = pre_cfg.get("poly_degree", 1)
        fixed_k = pre_cfg.get("select_k", None)

        preproc = build_preprocessor(
            X,
            task,
            pre_cfg,
            fixed_k=fixed_k,
            fixed_poly_degree=fixed_poly_degree,
        )
        model = make_model(task, algo, model_params, random_state=args.seed)
        pipe = SkPipeline([("pre", preproc), ("model", model)])

        shap_vals, t_fit, t_shap, feat_names = compute_shap_matrix(
            pipe,
            X_fit=X,
            y_fit=y_full,
            X_eval=X_eval_shap,
            task_type=task,
            bg_size=128,
            max_eval_rows=eval_rows,
            rng_seed=args.seed,
        )

        np.save(shap_dir / f"pf_{i}_shap_vals.npy", shap_vals)
        np.save(shap_dir / f"pf_{i}_feat_names.npy", np.asarray(feat_names))

    print(f"Saved SHAP arrays for {len(pf)} Pareto models under {shap_dir}")


if __name__ == "__main__":
    main()
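
The checkpoint round trip used above (capture RNG state, restore it later, resume identically) can be exercised in isolation. This standalone sketch mirrors `save_checkpoint`/`load_checkpoint` with a throwaway state dict and a temporary file:

```python
import pickle
import random
import tempfile
from pathlib import Path

# capture state, as save_checkpoint does
state = {"gen": 3, "py_random_state": random.getstate()}
path = Path(tempfile.mkdtemp()) / "checkpoint.pkl"
with open(path, "wb") as f:
    pickle.dump(state, f)

expected = random.random()  # the next draw the restored RNG should reproduce

# restore, as load_checkpoint does
with open(path, "rb") as f:
    loaded = pickle.load(f)
random.setstate(loaded["py_random_state"])
replayed = random.random()

print(loaded["gen"], replayed == expected)
```

Because `setstate` rewinds the generator, the replayed draw matches the one taken after the checkpoint was written, which is what makes resumed runs deterministic.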

File: src/data_openml.py (new, 16 lines)

from sklearn.datasets import fetch_california_housing, fetch_openml


def load_dataset(name: str, random_state: int = 42):
    # random_state is kept for API symmetry; both loaders are deterministic
    name = name.lower()
    if name == "cal_housing":
        ds = fetch_california_housing(as_frame=True)
        return ds.data, ds.target, "regression"
    elif name == "adult":
        ds = fetch_openml(data_id=1590, as_frame=True)  # Adult census income
        X = ds.data
        y = (ds.target == ">50K").astype(int)  # binarize: 1 if income > 50K
        return X, y, "classification"
    else:
        raise ValueError("dataset must be 'adult' or 'cal_housing'")
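
The Adult target arrives as string labels, so the loader binarizes it with a vectorized comparison. The same step in isolation, on toy labels rather than the real OpenML data:

```python
import pandas as pd

# toy stand-in for the Adult income labels
target = pd.Series(["<=50K", ">50K", "<=50K", ">50K"])

# elementwise comparison gives booleans; astype(int) maps them to 0/1
y = (target == ">50K").astype(int)
print(y.tolist())
```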

File: src/models.py (new, 55 lines)

from sklearn.ensemble import (
    GradientBoostingClassifier,
    GradientBoostingRegressor,
    RandomForestClassifier,
    RandomForestRegressor,
)
from sklearn.neural_network import MLPClassifier, MLPRegressor


def make_model(task, algo, params, random_state=0):
    """Instantiate an sklearn estimator from decoded genome parameters.

    Genome values arrive loosely typed, so numeric hyperparameters are
    coerced explicitly with int()/float().
    """
    if task == "regression":
        if algo == "rf":
            return RandomForestRegressor(
                n_estimators=int(params["n_estimators"]),
                max_depth=int(params["max_depth"]),
                max_features=params["max_features"],
                random_state=random_state,
                n_jobs=1,
            )
        elif algo == "gbt":
            return GradientBoostingRegressor(
                n_estimators=int(params["n_estimators"]),
                learning_rate=float(params["learning_rate"]),
                max_depth=int(params["max_depth"]),
                random_state=random_state,
            )
        elif algo == "mlp":
            return MLPRegressor(
                hidden_layer_sizes=tuple(params["hidden_layers"]),
                activation=params["activation"],
                alpha=float(params["alpha"]),
                learning_rate_init=float(params["lr_init"]),
                max_iter=int(params.get("max_iter", 200)),
                random_state=random_state,
            )
    else:
        if algo == "rf":
            return RandomForestClassifier(
                n_estimators=int(params["n_estimators"]),
                max_depth=int(params["max_depth"]),
                max_features=params["max_features"],
                random_state=random_state,
                n_jobs=1,
            )
        elif algo == "gbt":
            return GradientBoostingClassifier(
                n_estimators=int(params["n_estimators"]),
                learning_rate=float(params["learning_rate"]),
                max_depth=int(params["max_depth"]),
                random_state=random_state,
            )
        elif algo == "mlp":
            return MLPClassifier(
                hidden_layer_sizes=tuple(params["hidden_layers"]),
                activation=params["activation"],
                alpha=float(params["alpha"]),
                learning_rate_init=float(params["lr_init"]),
                max_iter=int(params.get("max_iter", 200)),
                random_state=random_state,
            )
    raise ValueError(f"Unknown algo: {algo!r}")
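
A quick standalone check of the coercion pattern used by the factory above. The parameter values here are deliberately mistyped (a float and a string, as genome-derived values can be) to show why the explicit `int()` casts matter:

```python
from sklearn.ensemble import RandomForestClassifier

# genome-decoded params may arrive loosely typed
params = {"n_estimators": 100.0, "max_depth": "6", "max_features": "sqrt"}

clf = RandomForestClassifier(
    n_estimators=int(params["n_estimators"]),  # 100.0 -> 100
    max_depth=int(params["max_depth"]),        # "6" -> 6
    max_features=params["max_features"],
    random_state=0,
    n_jobs=1,
)
print(clf.n_estimators, clf.max_depth)
```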

File: src/objectives.py (new, 66 lines)

import numpy as np
from sklearn.metrics import brier_score_loss, mean_squared_error
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline

from .models import make_model
from .preprocessing import build_preprocessor
from .stability import compute_shap_matrix, shap_stability_from_matrices


def evaluate_config(X, y, task, algo, model_params, preproc_cfg, cv_folds=3, seed=42):
    kf = KFold(n_splits=cv_folds, shuffle=True, random_state=seed)
    losses = []

    # full SHAP matrices and feature names, collected per fold for stability
    shap_mats_with_names = []

    # fixed evaluation pool for SHAP: same rows and order for every fold
    rng = np.random.RandomState(seed)
    max_eval_rows = 1024
    eval_size = min(max_eval_rows, len(X))
    eval_idx = rng.choice(len(X), size=eval_size, replace=False)
    X_eval_fixed = X.iloc[eval_idx]

    # probe the preprocessor once to compute a safe cap for select_k
    fixed_poly_degree = preproc_cfg.get("fixed_poly_degree", preproc_cfg.get("poly_degree", 1))
    probe_pre = build_preprocessor(X, task, preproc_cfg, fixed_k=None, fixed_poly_degree=fixed_poly_degree)
    Xp = probe_pre.fit_transform(X, y)
    n_after_prep = Xp.shape[1]
    desired_k = preproc_cfg.get("select_k", None)
    k_cap = None if desired_k is None else int(min(max(1, desired_k), n_after_prep))

    for fold_idx, (tr, te) in enumerate(kf.split(X)):
        preproc = build_preprocessor(X, task, preproc_cfg, fixed_k=k_cap, fixed_poly_degree=fixed_poly_degree)
        model = make_model(task, algo, model_params, random_state=seed + fold_idx)
        pipe = Pipeline([("pre", preproc), ("model", model)])

        # 1) SHAP stability: always explain the same X_eval_fixed rows
        shap_vals, t_fit, t_shap, feat_names = compute_shap_matrix(
            pipe,
            X_fit=X.iloc[tr],
            y_fit=y.iloc[tr],
            X_eval=X_eval_fixed,
            task_type=task,
        )
        shap_mats_with_names.append((shap_vals, feat_names))

        # 2) Loss: standard held-out CV split for generalization
        if task == "regression":
            y_pred = pipe.predict(X.iloc[te])
            loss = float(mean_squared_error(y.iloc[te], y_pred))
        else:
            if hasattr(pipe.named_steps["model"], "predict_proba"):
                y_prob = pipe.predict_proba(X.iloc[te])[:, 1]
            else:
                # fall back to min-max scaled decision scores as pseudo-probabilities
                scores = pipe.decision_function(X.iloc[te])
                y_prob = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
            loss = float(brier_score_loss(y.iloc[te], y_prob))
        losses.append(loss)

    # instance-level SHAP stability across folds
    agg_std, stability, per_feat_std, per_inst_std = shap_stability_from_matrices(shap_mats_with_names)

    mse_like = float(np.mean(losses))
    return mse_like, float(stability), per_feat_std
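
The classifier fallback above maps raw decision scores into [0, 1] before computing the Brier loss. A standalone sketch of that min-max step (note these are monotone pseudo-probabilities for scoring purposes, not calibrated probabilities):

```python
import numpy as np

scores = np.array([-2.0, 0.0, 1.0, 2.0])  # raw decision_function outputs

# shift/scale into [0, 1]; the 1e-8 guards against a zero range
y_prob = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
print(y_prob)
```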

File: src/preprocessing.py (new, 138 lines)

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif, f_regression
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    MinMaxScaler,
    OneHotEncoder,
    PolynomialFeatures,
    PowerTransformer,
    RobustScaler,
    StandardScaler,
)


class SafeSelectK(BaseEstimator, TransformerMixin):
    """SelectKBest wrapper that caps k at the actual feature count, or passes through when k is None."""

    def __init__(self, task: str, k=None):
        self.task = task
        self.k = k
        self.selector_ = None
        self.k_effective_ = None
        self.support_mask_ = None
        self.feature_names_in_ = None
        self.feature_names_out_ = None

    def fit(self, X, y=None):
        if self.k is None:
            self.selector_ = "passthrough"
            self.feature_names_out_ = self.feature_names_in_
            return self
        n_feats = X.shape[1]
        k_eff = int(min(max(1, self.k), n_feats))
        score_func = f_classif if self.task == "classification" else f_regression
        sel = SelectKBest(score_func=score_func, k=k_eff).fit(X, y)
        self.selector_ = sel
        self.k_effective_ = k_eff
        mask = np.zeros(n_feats, dtype=bool)
        mask[sel.get_support(indices=True)] = True
        self.support_mask_ = mask
        if self.feature_names_in_ is not None:
            self.feature_names_out_ = self.feature_names_in_[mask]
        return self

    def set_feature_names_in(self, names):
        self.feature_names_in_ = np.asarray(names)

    def transform(self, X):
        if self.selector_ == "passthrough":
            return X
        return self.selector_.transform(X)

    def get_feature_names_out(self, input_features=None):
        if getattr(self, "feature_names_out_", None) is not None:
            return self.feature_names_out_
        if getattr(self, "support_mask_", None) is not None and input_features is not None:
            input_features = np.asarray(input_features)
            return input_features[self.support_mask_]
        return None


class ConstantFilter(BaseEstimator, TransformerMixin):
    """Drop features whose variance is at or below eps (constant columns)."""

    def __init__(self, eps=0.0):
        self.eps = eps
        self.mask_ = None
        self.feature_names_in_ = None
        self.feature_names_out_ = None

    def fit(self, X, y=None):
        X = np.asarray(X)
        var = X.var(axis=0)
        self.mask_ = var > self.eps
        if self.feature_names_in_ is not None:
            self.feature_names_out_ = np.asarray(self.feature_names_in_)[self.mask_]
        return self

    def set_feature_names_in(self, names):
        self.feature_names_in_ = np.asarray(names)

    def get_feature_names_out(self):
        if self.feature_names_out_ is not None:
            return self.feature_names_out_
        # fallback when names were not set
        return np.array([f"f{i}" for i, keep in enumerate(self.mask_) if keep])

    def transform(self, X):
        X = np.asarray(X)
        return X[:, self.mask_]


def build_preprocessor(X_full, task, cfg, fixed_k=None, fixed_poly_degree=None):
    cat_cols = X_full.select_dtypes(include=["object", "category", "bool"]).columns.tolist()
    num_cols = [c for c in X_full.columns if c not in cat_cols]

    num_imputer = SimpleImputer(strategy=cfg.get("num_impute_strategy", "median"))
    cat_imputer = SimpleImputer(strategy=cfg.get("cat_impute_strategy", "most_frequent"))

    scaler_name = cfg.get("scaler", "standard")
    if scaler_name == "standard":
        num_scaler = StandardScaler(with_mean=True, with_std=True)
    elif scaler_name == "robust":
        num_scaler = RobustScaler()
    elif scaler_name == "minmax":
        num_scaler = MinMaxScaler()
    elif scaler_name == "power":
        num_scaler = PowerTransformer(method="yeo-johnson")
    else:
        num_scaler = "passthrough"

    poly_degree = fixed_poly_degree if fixed_poly_degree is not None else cfg.get("poly_degree", 1)
    poly = PolynomialFeatures(degree=poly_degree, include_bias=False) if poly_degree > 1 else "passthrough"

    # always fix categories from the full dataset so fold encoders agree
    ohe_kwargs = dict(handle_unknown="ignore")
    if len(cat_cols) > 0:
        fixed_categories = {c: sorted(X_full[c].dropna().astype(str).unique()) for c in cat_cols}
        ohe_kwargs["categories"] = [fixed_categories[c] for c in cat_cols]

    # sklearn >= 1.2 renamed `sparse` to `sparse_output`; try the new name first.
    # (Assigning into a kwargs dict never raises, so the fallback has to wrap
    # the constructor call itself.)
    try:
        cat_encoder = OneHotEncoder(sparse_output=False, **ohe_kwargs)
    except TypeError:
        cat_encoder = OneHotEncoder(sparse=False, **ohe_kwargs)

    num_steps = [("impute", num_imputer), ("scale", num_scaler), ("poly", poly)]
    if int(cfg.get("use_vt", 0)):
        num_steps.append(("vt", VarianceThreshold(threshold=float(cfg.get("vt_thr", 0.0)))))

    ct = ColumnTransformer([
        ("num", Pipeline(steps=num_steps), num_cols),
        ("cat", Pipeline(steps=[("impute", cat_imputer), ("oh", cat_encoder)]), cat_cols),
    ])

    select_k = fixed_k if fixed_k is not None else cfg.get("select_k", None)
    selector = SafeSelectK(task=task, k=select_k)

    pre = Pipeline([
        ("prep", ct),
        ("drop_const", ConstantFilter(eps=0.0)),
        ("select", selector),
    ])
    return pre
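
The version-compatible encoder construction can be verified on a tiny frame. This sketch uses the same try/except-on-TypeError pattern, so it builds a dense encoder on either side of the sklearn 1.2 parameter rename:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# try the sklearn >= 1.2 name first, fall back to the old one
try:
    enc = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
except TypeError:  # older sklearn
    enc = OneHotEncoder(handle_unknown="ignore", sparse=False)

df = pd.DataFrame({"color": ["red", "blue", "red"]})
out = enc.fit_transform(df)  # dense ndarray: one column per category
print(out.shape)
```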

File: src/search/nsga_deap.py (new, 147 lines)

import mlflow
from deap import base, creator, tools
from sklearn.utils import check_random_state

from src.objectives import evaluate_config

SCALERS = ["standard", "robust", "minmax", "power", "none"]
NUM_IMPUTE = ["median", "mean"]
CAT_IMPUTE = ["most_frequent"]
ALGOS = ["rf", "gbt", "mlp"]


def decode(ind):
    """Map a genome of raw integers onto (algo, model params, preprocessing config).

    Each gene indexes into its option list via modulo, so any integer is valid.
    """
    i = 0
    algo = ALGOS[int(ind[i]) % len(ALGOS)]
    i += 1
    scaler = SCALERS[int(ind[i]) % len(SCALERS)]
    i += 1
    num_imp = NUM_IMPUTE[int(ind[i]) % len(NUM_IMPUTE)]
    i += 1
    cat_imp = CAT_IMPUTE[int(ind[i]) % len(CAT_IMPUTE)]
    i += 1
    poly_degree = 1 + int(ind[i]) % 2
    i += 1
    use_selectk = int(ind[i]) % 2
    i += 1
    select_k = [None, 16, 32, 64, 128][int(ind[i]) % 5]
    i += 1
    if not use_selectk:
        select_k = None

    pre_cfg = {
        "num_impute_strategy": num_imp,
        "cat_impute_strategy": cat_imp,
        "scaler": scaler,
        "poly_degree": poly_degree,
        "select_k": select_k,
    }

    if algo == "rf":
        n_estimators = [100, 200, 300, 400, 500][int(ind[i]) % 5]
        i += 1
        max_depth = [2, 4, 6, 8, 10, 12][int(ind[i]) % 6]
        i += 1
        max_features = ["sqrt", "log2", None][int(ind[i]) % 3]
        i += 1
        params = {
            "n_estimators": n_estimators,
            "max_depth": max_depth,
            "max_features": max_features,
        }
    elif algo == "gbt":
        n_estimators = [100, 200, 300, 400, 500][int(ind[i]) % 5]
        i += 1
        max_depth = [2, 3, 4, 5][int(ind[i]) % 4]
        i += 1
        lr = [0.01, 0.02, 0.05, 0.1, 0.2][int(ind[i]) % 5]
        i += 1
        params = {
            "n_estimators": n_estimators,
            "max_depth": max_depth,
            "learning_rate": lr,
        }
    else:
        n_layers = [1, 2, 3][int(ind[i]) % 3]
        i += 1
        h = []
        for _ in range(n_layers):
            h.append([16, 32, 64, 128, 256][int(ind[i]) % 5])
            i += 1
        # skip unused gene slots if fewer than 3 layers
        i += max(0, 3 - n_layers)
        alpha = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2][int(ind[i]) % 5]
        i += 1
        lr_init = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2][int(ind[i]) % 5]
        i += 1
        params = {
            "hidden_layers": tuple(h),
            "activation": "relu",
            "alpha": alpha,
            "lr_init": lr_init,
            "max_iter": 200,
        }

    return algo, params, pre_cfg


def build_toolbox(X, y, task, seed, cv_folds, mlflow_experiment):
    rng = check_random_state(seed)

    # guard against duplicate class creation in the same process
    if not hasattr(creator, "FitnessMSEStab"):
        creator.create("FitnessMSEStab", base.Fitness, weights=(-1.0, 1.0))
    if not hasattr(creator, "Individual"):
        creator.create("Individual", list, fitness=creator.FitnessMSEStab)

    toolbox = base.Toolbox()
    toolbox.register("gene", rng.randint, 0, 1000000)
    toolbox.register(
        "individual",
        tools.initRepeat,
        creator.Individual,
        toolbox.gene,
        n=16,
    )
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def eval_ind(individual):
        algo, model_params, pre_cfg = decode(individual)

        # one run per individual; the outer script sets the experiment
        with mlflow.start_run(run_name=f"{algo}", nested=True):
            for gi, g in enumerate(individual):
                mlflow.log_param(f"g{gi}", int(g))
            mlflow.log_param("algo", algo)
            for k, v in model_params.items():
                mlflow.log_param(f"m_{k}", v)
            for k, v in pre_cfg.items():
                mlflow.log_param(f"p_{k}", v)

            mse_like, stability, _ = evaluate_config(
                X,
                y,
                task,
                algo,
                model_params,
                pre_cfg,
                cv_folds=cv_folds,
                seed=seed,
            )
            mlflow.log_metric("mse_like", mse_like)
            mlflow.log_metric("stability", stability)

        return mse_like, stability

    toolbox.register("evaluate", eval_ind)
    toolbox.register("mate", tools.cxTwoPoint)
    toolbox.register(
        "mutate",
        tools.mutUniformInt,
        low=0,
        up=1000000,
        indpb=0.2,
    )
    toolbox.register("select", tools.selNSGA2)

    return toolbox
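
The modulo-indexing trick that `decode` relies on is easy to see in miniature: any raw integer gene maps onto a valid option, so crossover and mutation can never produce an invalid configuration. A standalone sketch with arbitrary gene values:

```python
ALGOS = ["rf", "gbt", "mlp"]
SCALERS = ["standard", "robust", "minmax", "power", "none"]

genome = [7, 12]  # raw integer genes; the values are arbitrary

algo = ALGOS[genome[0] % len(ALGOS)]        # 7 % 3 == 1
scaler = SCALERS[genome[1] % len(SCALERS)]  # 12 % 5 == 2
print(algo, scaler)
```

One design consequence: multiple gene values decode to the same option, which makes the genotype-to-phenotype map many-to-one but keeps the whole integer search space feasible.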

File: src/stability.py (new, 273 lines)

import numpy as np
|
||||
import pandas as pd
|
||||
import shap
|
||||
import time
|
||||
|
||||
|
||||
def compute_shap_matrix(
|
||||
pipe,
|
||||
X_fit,
|
||||
y_fit,
|
||||
X_eval,
|
||||
task_type,
|
||||
bg_size=128,
|
||||
max_eval_rows=1024,
|
||||
rng_seed=0,
|
||||
):
|
||||
"""
|
||||
Fit the pipeline on (X_fit, y_fit), then compute SHAP values on X_eval.
|
||||
|
||||
Important: for stability, X_eval should be the same rows and order
|
||||
across all folds or retrains that you want to compare.
|
||||
"""
|
||||
t0 = time.time()
|
||||
pipe.fit(X_fit, y_fit)
|
||||
t_fit = time.time() - t0
|
||||
|
||||
pre = pipe.named_steps["pre"]
|
||||
model = pipe.named_steps["model"]
|
||||
prep = pre.named_steps["prep"]
|
||||
|
||||
# derive names after prep using TRAIN data, not eval
|
||||
X_probe = prep.transform(X_fit[:1])
|
||||
names_after_prep = getattr(
|
||||
prep,
|
||||
"get_feature_names_out",
|
||||
lambda: np.array([f"f{i}" for i in range(X_probe.shape[1])]),
|
||||
)()
|
||||
|
||||
# thread names through constant-dropper
|
||||
if "drop_const" in pre.named_steps:
|
||||
dropper = pre.named_steps["drop_const"]
|
||||
if hasattr(dropper, "set_feature_names_in"):
|
||||
dropper.set_feature_names_in(names_after_prep)
|
||||
names_into_select = dropper.get_feature_names_out()
|
||||
else:
|
||||
names_into_select = names_after_prep
|
||||
|
||||
# thread names into selector
|
||||
selector = pre.named_steps["select"]
|
||||
if hasattr(selector, "set_feature_names_in"):
|
||||
selector.set_feature_names_in(names_into_select)
|
||||
|
||||
# preprocess eval and train splits
|
||||
X_eval_proc = pre.transform(X_eval)
|
||||
X_train_proc = pre.transform(X_fit)
|
||||
|
||||
# cap eval rows for speed, keep deterministic subsample
|
||||
n_eval = X_eval_proc.shape[0]
|
||||
if n_eval > max_eval_rows:
|
||||
rng = np.random.RandomState(rng_seed)
|
||||
idx = rng.choice(n_eval, size=max_eval_rows, replace=False)
|
||||
X_eval_proc = X_eval_proc[idx]
|
||||
|
||||
n_cols = X_eval_proc.shape[1]
|
||||
|
||||
# resolve final feature names after selection
|
||||
feat_names = None
|
||||
if hasattr(selector, "get_feature_names_out"):
|
||||
feat_names = selector.get_feature_names_out(input_features=names_into_select)
|
||||
|
||||
if feat_names is None:
|
||||
supp = getattr(selector, "support_mask_", None)
|
||||
if (
|
||||
supp is not None
|
||||
and len(names_into_select) == supp.shape[0]
|
||||
and supp.sum() == n_cols
|
||||
):
|
||||
feat_names = np.asarray(names_into_select)[supp]
|
||||
else:
|
||||
feat_names = np.array([f"f{i}" for i in range(n_cols)])
|
||||
else:
|
||||
feat_names = np.asarray(feat_names)
|
||||
if len(feat_names) != n_cols:
|
||||
feat_names = np.array([f"f{i}" for i in range(n_cols)])
|
||||
|
||||
    # build background from TRAIN split to avoid leakage
    rng = np.random.RandomState(rng_seed)
    n_bg_pool = X_train_proc.shape[0]
    bg_n = min(bg_size, n_bg_pool)
    bg_idx = rng.choice(n_bg_pool, size=bg_n, replace=False)
    background = X_train_proc[bg_idx]

    # choose explainer by model type
    def _is_tree_model(m):
        # sklearn trees and ensembles
        if hasattr(m, "tree_") or hasattr(m, "estimators_"):
            return True
        # xgboost and lightgbm wrappers
        try:
            import xgboost as _xgb

            if isinstance(m, _xgb.XGBModel):
                return True
        except Exception:
            pass
        try:
            import lightgbm as _lgb

            if isinstance(getattr(m, "booster_", None), _lgb.basic.Booster):
                return True
        except Exception:
            pass
        return False

    # compute SHAP values
    t1 = time.time()
    vals = None

    if _is_tree_model(model):
        # tree-specific explainer, interventional mode with background data
        explainer = shap.TreeExplainer(
            model,
            data=background,
            feature_perturbation="interventional",
        )
        # disable the strict additivity check to avoid ExplainerError
        sv = explainer.shap_values(X_eval_proc, check_additivity=False)
        # the shap API can return a list for classification; pick the positive class if so
        if isinstance(sv, list):
            cls_idx = 1 if len(sv) > 1 else 0
            vals = np.asarray(sv[cls_idx])
        else:
            vals = np.asarray(sv)

    elif hasattr(model, "coef_"):
        # linear models
        explainer = shap.LinearExplainer(model, background)
        vals = np.asarray(explainer.shap_values(X_eval_proc))

    else:
        # generic fallback with a training-background masker
        masker = shap.maskers.Independent(background)
        if task_type == "classification" and hasattr(model, "predict_proba"):
            f = lambda M: model.predict_proba(M)[:, 1]
        else:
            f = lambda M: model.predict(M)
        explainer = shap.Explainer(f, masker)
        out = explainer(X_eval_proc)
        vals = np.asarray(getattr(out, "values", out))

    t_shap = time.time() - t1

    # normalize shapes to (n_rows, n_features)
    vals = np.asarray(vals)
    vals = np.squeeze(vals)

    if vals.ndim == 3 and vals.shape[2] == 2 and vals.shape[1] == len(feat_names):
        vals = vals[..., -1]
    if vals.ndim == 3 and vals.shape[-1] == len(feat_names):
        vals = vals.reshape(-1, vals.shape[-1])
    if vals.ndim == 2 and vals.shape[0] == len(feat_names):
        vals = vals.T

    return vals, t_fit, t_shap, feat_names


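The last normalization step above (transposing a matrix that arrives feature-major) can be illustrated with a numpy-only toy, using hypothetical feature names and no shap dependency:

```python
import numpy as np

feat_names = ["a", "b", "c"]

# some explainers return (n_features, n_rows); the check flips it to (n_rows, n_features)
vals = np.arange(6.0).reshape(3, 2)  # 3 features x 2 rows
if vals.ndim == 2 and vals.shape[0] == len(feat_names):
    vals = vals.T
```

Note the check is a heuristic: it misfires only in the rare case where the number of evaluation rows equals the number of features.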
def mean_abs_shap(shap_matrix, feature_names):
    """
    Global mean absolute SHAP per feature.

    Still useful for descriptive plots, but not used for instance-level stability.
    """
    return pd.Series(
        np.abs(shap_matrix).mean(axis=0),
        index=np.asarray(feature_names),
    )


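A quick sanity check of the aggregation used by `mean_abs_shap`, inlined here with toy numbers (the matrix and names are hypothetical):

```python
import numpy as np
import pandas as pd

# toy SHAP matrix: 2 instances x 2 features
shap_matrix = np.array([[1.0, -2.0],
                        [-3.0, 4.0]])
feature_names = ["x1", "x2"]

# same aggregation as mean_abs_shap: mean of |SHAP| over instances, per feature
s = pd.Series(np.abs(shap_matrix).mean(axis=0), index=np.asarray(feature_names))
```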
def _align_matrices_to_union(mats_with_names):
    """
    mats_with_names: list of (shap_matrix, feature_names)

    Each shap_matrix has shape (n_instances, n_features_k).
    feature_names is array-like of length n_features_k.

    Returns:
        all_feats: list[str]
        T: np.ndarray of shape (n_models, n_instances, n_all_feats)
    """
    if not mats_with_names:
        raise ValueError("No SHAP matrices provided")

    # check that all matrices have the same number of instances
    n_instances = mats_with_names[0][0].shape[0]
    for M, names in mats_with_names:
        if M.shape[0] != n_instances:
            raise ValueError(
                f"All SHAP matrices must have the same number of rows. "
                f"Expected {n_instances}, got {M.shape[0]}"
            )

    # union of all feature names
    all_feats = sorted(
        set().union(*[set(np.asarray(names)) for _, names in mats_with_names])
    )
    n_models = len(mats_with_names)
    n_feats = len(all_feats)

    T = np.zeros((n_models, n_instances, n_feats), dtype=float)
    feat_index = {f: j for j, f in enumerate(all_feats)}

    for m_idx, (M, names) in enumerate(mats_with_names):
        names = np.asarray(names)
        col_map = {name: c for c, name in enumerate(names)}
        for fname, j_global in feat_index.items():
            if fname in col_map:
                j_local = col_map[fname]
                T[m_idx, :, j_global] = M[:, j_local]
            # if fname is absent from this model, its column stays zero


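The union alignment can be sketched in a few self-contained lines, using two hypothetical single-instance matrices over overlapping feature sets:

```python
import numpy as np

# two toy SHAP matrices with overlapping feature names
M1, names1 = np.array([[0.5, 1.0]]), ["a", "b"]
M2, names2 = np.array([[2.0, 3.0]]), ["b", "c"]

all_feats = sorted(set(names1) | set(names2))  # union: ['a', 'b', 'c']
T = np.zeros((2, 1, len(all_feats)))
for k, (M, names) in enumerate([(M1, names1), (M2, names2)]):
    for c, name in enumerate(names):
        T[k, :, all_feats.index(name)] = M[:, c]
# features a model never produced stay at zero, as in the function above
```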
def shap_stability_from_matrices(mats_with_names):
    """
    mats_with_names: list of tuples (shap_matrix, feature_names)
        shap_matrix: np.ndarray of shape (n_instances, n_features_k)
        feature_names: list or array of length n_features_k

    This measures instance-level stability:
    for each instance i and feature j, we look at the SHAP values
    across models and compute their standard deviation.

    Steps:
        1) Align all matrices on the union of feature names.
        2) Build tensor T of shape (n_models, n_instances, n_features_union).
        3) Compute std across models: per_inst_feat_std = T.std(axis=0).
        4) Aggregate:
            agg_std = mean of per_inst_feat_std over instances and features.
            stability_score = 1 / (1 + agg_std).

    Returns:
        agg_std: float
        stability_score: float
        per_feat_std: pd.Series with mean std per feature over instances
        per_inst_std: np.ndarray with mean std per instance over features
    """
    if not mats_with_names:
        raise ValueError("No SHAP matrices provided")

    if len(mats_with_names) < 2:
        raise ValueError(
            f"Need at least 2 models to estimate stability, got {len(mats_with_names)}"
        )

    feat_names_union, T = _align_matrices_to_union(mats_with_names)

    # std across models for each instance and feature
    per_inst_feat_std = T.std(axis=0)  # shape (n_instances, n_features)

    # aggregate
    agg_std = float(per_inst_feat_std.mean())
    stability_score = 1.0 / (1.0 + agg_std)

    # per feature: average std over instances
    per_feat_std = per_inst_feat_std.mean(axis=0)
    per_feat_std_series = pd.Series(per_feat_std, index=feat_names_union)

    # per instance: average std over features
    per_inst_std = per_inst_feat_std.mean(axis=1)

    return agg_std, stability_score, per_feat_std_series, per_inst_std
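Steps 3 and 4 of the docstring can be checked by hand on a tiny tensor. Below, two hypothetical models agree on every SHAP value except one cell, so exactly one std entry is nonzero:

```python
import numpy as np

# toy aligned tensor: 2 models x 3 instances x 2 union features
T = np.array([
    [[1.0, 0.0], [2.0, 1.0], [0.0, 0.0]],  # model A
    [[1.0, 0.0], [2.0, 1.0], [0.0, 2.0]],  # model B
])

per_inst_feat_std = T.std(axis=0)          # disagreement per instance and feature
agg_std = float(per_inst_feat_std.mean())  # only one of six cells disagrees
stability_score = 1.0 / (1.0 + agg_std)    # close to 1 means stable
```

The single disagreeing cell has values 0.0 and 2.0, giving std 1.0 (numpy's default population std), so agg_std is 1/6 and the score is 6/7.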