
# Explanation-Aware Automated Machine Learning

This repository accompanies the research paper:

**“Multi-Objective Automated Machine Learning for Explainable Artificial Intelligence: Optimizing Predictive Accuracy and Shapley-Based Feature Stability.”**

In high-stakes domains such as agriculture, machine learning models must be not only accurate but also transparent and aligned with domain knowledge. This project presents a novel **multi-objective optimization framework** that jointly maximizes predictive performance and explanation stability. Specifically, we introduce a formal stability metric based on the **variance of SHapley Additive exPlanations (SHAP) values across cross-validation folds** and embed it directly into the model selection process.
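
To make the idea of a fold-wise stability metric concrete, the following is a minimal sketch, not the paper's exact definition. It assumes that each cross-validation fold has already been summarized as a vector of mean absolute SHAP values per feature (for example, computed with the `shap` library); the function name `shap_instability` and the per-fold normalization step are illustrative assumptions.

```python
import numpy as np


def shap_instability(fold_importances):
    """Average, over features, of the variance across folds of normalized
    mean |SHAP| importances. Lower values mean more stable explanations.

    fold_importances: array-like of shape (n_folds, n_features), where each
    row holds the mean absolute SHAP value per feature for one fold.
    NOTE: this is an illustrative formulation, not the paper's formal metric.
    """
    imp = np.asarray(fold_importances, dtype=float)
    if imp.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_folds, n_features)")

    # Normalize each fold so importances sum to 1 (assumption: this keeps
    # the score comparable across models with different output scales).
    totals = imp.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for all-zero folds
    normalized = imp / totals

    # Variance across folds for each feature, then averaged over features.
    return float(normalized.var(axis=0).mean())


# Hypothetical importances for three folds and three features.
stable = [[0.5, 0.3, 0.2]] * 3
unstable = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
print(shap_instability(stable))
print(shap_instability(unstable))
```

A model whose feature ranking shifts from fold to fold receives a higher score than one whose importances stay the same, so a search procedure can treat this value as an objective to minimize.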

Our approach uses the **Non-dominated Sorting Genetic Algorithm II (NSGA-II)** to evolve models that balance predictive accuracy with robust, semantically consistent explanations. When applied to potato yield prediction, the framework outperforms both **H2O.ai's Automatic Machine Learning (AutoML) platform** and traditional grid search, producing models that are both high-performing and interpretable.
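
The core notion behind NSGA-II is Pareto dominance: a candidate model is kept when no other candidate is at least as good on every objective and strictly better on one. The sketch below illustrates only that dominance check, not the full algorithm (it omits crowding distance, selection, crossover, and mutation). The objective pairs `(prediction_error, explanation_instability)` are made-up values for illustration, and both objectives are treated as quantities to minimize.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the non-dominated points."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical candidates: (prediction_error, explanation_instability).
candidates = [
    (0.10, 0.05),  # most accurate, less stable
    (0.12, 0.02),  # balanced trade-off
    (0.11, 0.06),  # dominated by the first candidate
    (0.15, 0.01),  # most stable, less accurate
    (0.15, 0.03),  # dominated by the second candidate
]
print(pareto_front(candidates))  # [0, 1, 3]
```

The surviving candidates form the trade-off curve between accuracy and explanation stability; choosing a single model from that curve remains a domain decision.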

---

## 🔍 Key Features

- Multi-objective optimization for predictive accuracy and explanation stability
- Shapley-based metric embedded into the model selection loop
- Implementation using NSGA-II for evolutionary search
- Reproducible case study in potato yield forecasting
- Baseline comparisons with grid search and H2O.ai’s AutoML platform

---

## 📂 Repository Structure