Upload files to "Figures"

2025-11-11 02:17:59 +00:00
commit fd24c2a2aa
5 changed files with 65 additions and 0 deletions
--- a/Figures/Framework.png
+++ b/Figures/Framework.png
--- a/Figures/Model_complex_Opt.png
+++ b/Figures/Model_complex_Opt.png
--- a/Figures/README.md
+++ b/Figures/README.md
@@ -0,0 +1,23 @@
+# Explanation-Aware Automated Machine Learning
+
+This repository accompanies the research paper:
+
+**“Multi-Objective Automated Machine Learning for Explainable Artificial Intelligence: Optimizing Predictive Accuracy and Shapley-Based Feature Stability.”**
+
+In high-stakes domains such as agriculture, machine learning models must be not only accurate but also transparent and aligned with domain knowledge. This project presents a novel **multi-objective optimization framework** that jointly maximizes predictive performance and explanation stability. Specifically, we introduce a formal metric based on the **variance of Shapley Additive Explanations across cross-validation folds**, embedding it directly into the model selection process.
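A minimal sketch of how a fold-wise stability metric of this kind could be computed, assuming the per-fold feature importances (for example mean absolute SHAP values) have already been extracted. The function name `shap_instability` and the choice to average the per-feature variances into a single scalar are illustrative assumptions, not necessarily the paper's exact definition.

```python
from statistics import mean, pvariance


def shap_instability(fold_importances):
    """Mean over features of the across-fold variance of importance.

    fold_importances: one list per cross-validation fold, each holding
    one importance value per feature (e.g. mean |SHAP|).
    Lower values mean the explanation is more stable across folds.
    """
    if len(fold_importances) < 2:
        raise ValueError("need at least two folds")
    n_features = len(fold_importances[0])
    if any(len(fold) != n_features for fold in fold_importances):
        raise ValueError("all folds must cover the same features")
    # Variance of each feature's importance across the folds.
    per_feature = [
        pvariance([fold[j] for fold in fold_importances])
        for j in range(n_features)
    ]
    return mean(per_feature)


# Hypothetical importances for 3 folds x 2 features.
stable = [[0.5, 0.2], [0.5, 0.2], [0.5, 0.2]]
unstable = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(shap_instability(stable), shap_instability(unstable))
```

A model whose feature ranking flips between folds scores higher (worse) than one whose importances stay constant, which is the property an optimizer would minimize alongside prediction error.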
+
+Our approach leverages the **Non-dominated Sorting Genetic Algorithm II** to evolve models that balance predictive accuracy with robust, semantically consistent explanations. When applied to potato yield prediction, the framework outperforms both **H2O.ai's Automatic Machine Learning platform** and traditional grid search, producing models that are both high-performing and interpretable.
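The trade-off described above rests on Pareto dominance, the comparison that NSGA-II uses when ranking candidates. The sketch below shows only that non-dominated filtering step, not the full algorithm (no crowding distance, crossover, or mutation). The `(accuracy, instability)` tuples and their values are hypothetical.

```python
def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b.

    Candidates are (accuracy, instability) pairs: accuracy is
    maximized, instability is minimized.
    """
    acc_a, inst_a = a
    acc_b, inst_b = b
    no_worse = acc_a >= acc_b and inst_a <= inst_b
    strictly_better = acc_a > acc_b or inst_a < inst_b
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (accuracy, instability) scores for four models.
models = [(0.90, 0.30), (0.85, 0.10), (0.80, 0.20), (0.90, 0.40)]
front = pareto_front(models)
print(front)
```

Here the third model is dropped because the second is both more accurate and more stable, and the fourth is dropped because the first matches its accuracy with lower instability. The two survivors represent different accuracy/stability trade-offs, and choosing between them is left to the practitioner.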
+
+---
+
+## 🔍 Key Features
+
+- Multi-objective optimization for predictive accuracy and explanation stability
+- Shapley-based metric embedded into the model selection loop
+- Implementation using NSGA-II for evolutionary search
+- Reproducible case study in potato yield forecasting
+- Baseline comparisons with grid search and H2O.ai’s platform
+
+---
+
+## 📂 Repository Structure
--- a/Figures/background.txt
+++ b/Figures/background.txt
@@ -0,0 +1,42 @@
+https://gitlab.com/university-of-prince-edward-isalnd/explanation-aware-optimization-and-automl/-/tree/main/src?ref_type=heads
+
+
+
+############################################################################################################################################################
+Code File Structure
+
+Shell scripts
+
+            h20_batch.sh ->   
+            nsga_batch.sh ->
+            grid_search_batch.sh ->
+
+
+
+
+############################################################################################################################################################
+Code Changes:
+
+- SHAP KernelExplainer
+        Use shap.TreeExplainer on tree-based models instead
+
+- AutoML search size
+        Reduce max_models or max_runtime_secs per fold, or pre-select algorithms (e.g. via include_algos)
+
+- Data transformations
+        Cache intermediate NumPy arrays to skip repeated fit_transform calls in each fold
+
+- Parallel folds
+        If the CPU has many cores, parallelize the K-fold loop with joblib.Parallel to make full use of them
+
+############################################################################################################################################################
+Notes
+- The Slurm headers request 4 cores per task and 10 GB of RAM. 
+  This is a modest requirement, so the jobs do not need a cloud-computing environment to run
+
+- The three jobs run with a run-time limit of 11 hours. Given typical Compute Canada / AceNet servers (approx. 2.5 GHz CPUs), 
+  allocate a time limit of at least 5 hours when running on a 13600KF system (assuming no hyperthreading and E-core processing)
+
+- H2O AutoML supports GPU compute using CUDA libraries. A CUDA-accelerated GPU may yield performance gains for this computation
+
+- 
--- a/Figures/features_heatmap.png
+++ b/Figures/features_heatmap.png