commit fd24c2a2aa3bcb159c138c3b5ea8c09b68d65627
Author: git
Date:   Tue Nov 11 02:17:59 2025 +0000

    Upload files to "Figures"

diff --git a/Figures/Framework.png b/Figures/Framework.png
new file mode 100644
index 0000000..7318489
Binary files /dev/null and b/Figures/Framework.png differ
diff --git a/Figures/Model_complex_Opt.png b/Figures/Model_complex_Opt.png
new file mode 100644
index 0000000..443b1c1
Binary files /dev/null and b/Figures/Model_complex_Opt.png differ
diff --git a/Figures/README.md b/Figures/README.md
new file mode 100644
index 0000000..a330354
--- /dev/null
+++ b/Figures/README.md
@@ -0,0 +1,23 @@
+# Explanation-Aware Automated Machine Learning
+
+This repository accompanies the research paper:
+
+**“Multi-Objective Automated Machine Learning for Explainable Artificial Intelligence: Optimizing Predictive Accuracy and Shapley-Based Feature Stability.”**
+
+In high-stakes domains such as agriculture, machine learning models must be not only accurate but also transparent and aligned with domain knowledge. This project presents a novel **multi-objective optimization framework** that jointly maximizes predictive performance and explanation stability. Specifically, we introduce a formal metric based on the **variance of Shapley Additive Explanations across cross-validation folds**, embedding it directly into the model selection process.
+
+Our approach leverages the **Non-dominated Sorting Genetic Algorithm II** to evolve models that balance predictive accuracy with robust, semantically consistent explanations. When applied to potato yield prediction, the framework outperforms both **H2O.ai's Automatic Machine Learning platform** and traditional grid search, producing models that are both high-performing and interpretable.
+
+---
+
+## 🔍 Key Features
+
+- Multi-objective optimization for predictive accuracy and explanation stability
+- Shapley-based metric embedded into the model selection loop
+- Implementation using NSGA-II for evolutionary search
+- Reproducible case study in potato yield forecasting
+- Baseline comparisons with grid search and H2O.ai’s platform
+
+---
+
+## 📂 Repository Structure
diff --git a/Figures/background.txt b/Figures/background.txt
new file mode 100644
index 0000000..8ba4a4a
--- /dev/null
+++ b/Figures/background.txt
@@ -0,0 +1,42 @@
+https://gitlab.com/university-of-prince-edward-isalnd/explanation-aware-optimization-and-automl/-/tree/main/src?ref_type=heads
+
+
+
+############################################################################################################################################################
+Code File Structure
+
+Shell scripts
+
+    h20_batch.sh ->
+    nsga_batch.sh ->
+    grid_search_batch.sh ->
+
+
+
+
+############################################################################################################################################################
+Code Changes:
+
+- SHAP KernelExplainer
+  Use shap.TreeExplainer on tree-based models instead
+
+- AutoML search size
+  Reduce max_models or max_runtime_secs per fold, or pre-select algorithms
+
+- Data transformations
+  Cache intermediate NumPy arrays to skip repeated fit_transform calls in each fold
+
+- Parallel folds
+  If the CPU has many cores, parallelize the K-fold loop with joblib.Parallel to make full use of them
+
+############################################################################################################################################################
+Notes
+- The Slurm headers indicate that the programs should be run on a system with 4 cores per task and 10 GB of RAM.
+  This is quite conservative, so the jobs would not need a cloud-computing environment to run
+
+- The three jobs run with a run time limit of 11 hours. Considering average Compute Canada / AceNet servers (approx. 2.5 GHz CPUs),
+  allocate a time limit of at least 5 hours to run on a 13600KF system (assuming no hyperthreading and E-core processing)
+
+- H2O AutoML supports GPU compute using CUDA libraries. A CUDA-capable GPU may speed up this computation
+
+-
\ No newline at end of file
diff --git a/Figures/features_heatmap.png b/Figures/features_heatmap.png
new file mode 100644
index 0000000..3dfb2b3
Binary files /dev/null and b/Figures/features_heatmap.png differ
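
The README added in this commit describes the stability metric only as the variance of Shapley Additive Explanations across cross-validation folds. A minimal sketch of one way such a score might be computed follows, assuming each fold has already been reduced to one mean absolute SHAP value per feature. The function name `shap_stability`, the per-fold normalisation, and the toy numbers are illustrative assumptions, not the repository's implementation.

```python
from statistics import fmean, pvariance


def shap_stability(fold_importances):
    """Mean per-feature variance of normalised SHAP importances across folds.

    fold_importances: list of per-fold lists, each holding one mean |SHAP|
    value per feature (same feature order in every fold). Lower is more stable.
    Hypothetical helper: the paper's exact metric may differ.
    """
    if len(fold_importances) < 2:
        raise ValueError("need at least two folds to measure variance")
    n_features = len(fold_importances[0])
    if any(len(fold) != n_features for fold in fold_importances):
        raise ValueError("every fold must report the same features")

    # Normalise each fold so its importances sum to 1; this keeps folds with
    # larger overall SHAP magnitudes from dominating (an assumption).
    normalised = []
    for fold in fold_importances:
        total = sum(fold)
        if total <= 0:
            raise ValueError("fold importances must have a positive sum")
        normalised.append([value / total for value in fold])

    # Population variance of each feature's share across folds, then average.
    per_feature = [pvariance(column) for column in zip(*normalised)]
    return fmean(per_feature)


# Toy example with three folds and three features (made-up numbers).
stable = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
unstable = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(shap_stability(stable) < shap_stability(unstable))
```

In a multi-objective search, a score of this kind could serve as the second objective to minimise alongside prediction error.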