automl_datasets/background.txt

https://gitlab.com/university-of-prince-edward-isalnd/explanation-aware-optimization-and-automl/-/tree/main/src?ref_type=heads
Operation:
Specify the working directory (local repo location), the cache directory (dataset download location), and
$WORK_DIR=
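A minimal sketch of how the Python side might pick these up, assuming they are exported as environment variables before the batch scripts run; WORK_DIR comes from the line above, while CACHE_DIR is a hypothetical name for the dataset download location:

    import os
    from pathlib import Path

    # WORK_DIR appears in these notes; CACHE_DIR is an assumed variable name and
    # defaults to a cache/ folder under the working directory
    work_dir = Path(os.environ.get("WORK_DIR", "."))
    cache_dir = Path(os.environ.get("CACHE_DIR", work_dir / "cache"))
    cache_dir.mkdir(parents=True, exist_ok=True)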
############################################################################################################################################################
Code File Structure
Shell scripts
h20_batch.sh ->
nsga_batch.sh ->
grid_search_batch.sh ->
############################################################################################################################################################
Code Changes (illustrative sketches for each change follow this list):
- SHAP KernelExplainer
Use shap.TreeExplainer on tree-based models instead; it is exact and far faster than the sampling-based KernelExplainer
- AutoML search size
Reduce max_models or max_runtime_secs per fold, or pre-select algorithms
- Data transformations
Cache intermediate NumPy arrays to skip repeated fit_transform calls in each fold
- Parallel folds
If the CPU has many cores, parallelize the K-fold loop with joblib.Parallel to make full use of a higher core count
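A minimal sketch of the TreeExplainer swap; the synthetic data and RandomForestRegressor are stand-ins for the project's actual datasets and models:

    import shap
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic data and a generic tree ensemble stand in for the project's own
    X, y = make_regression(n_samples=200, n_features=10, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # TreeExplainer exploits the tree structure directly, so it is exact and far
    # faster than the model-agnostic, sampling-based KernelExplainer
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)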
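A minimal sketch of capping the H2O AutoML search per fold; the file path, target column name, and algorithm list are placeholders rather than values from the repo:

    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    train = h2o.import_file("train.csv")  # placeholder path

    aml = H2OAutoML(
        max_models=10,                     # cap the number of models tried per fold
        max_runtime_secs=600,              # hard wall-clock budget per fold
        include_algos=["GBM", "XGBoost"],  # pre-select algorithms to search
        seed=1,
    )
    aml.train(y="target", training_frame=train)  # "target" is a placeholder column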
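A minimal sketch of caching a transformed array to disk so later folds or reruns skip the repeated fit_transform; the cache path and StandardScaler are assumptions about the pipeline:

    import numpy as np
    from pathlib import Path
    from sklearn.preprocessing import StandardScaler

    def transform_cached(X, cache_path):
        # Reuse a previously saved array if present, otherwise fit, transform,
        # and save the result for the next fold or run
        cache_path = Path(cache_path)
        if cache_path.exists():
            return np.load(cache_path)
        X_t = StandardScaler().fit_transform(X)
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        np.save(cache_path, X_t)
        return X_t

    # e.g. X_scaled = transform_cached(X, "cache/X_scaled.npy")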
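A minimal sketch of parallelizing the K-fold loop with joblib; the classifier and synthetic data are placeholders for the project's own pipeline:

    from joblib import Parallel, delayed
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    def run_fold(train_idx, test_idx):
        # One fold of the existing loop, unchanged apart from being wrapped in a function
        model = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        return model.score(X[test_idx], y[test_idx])

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    # n_jobs=-1 uses every available core; on Slurm, set it to the cpus-per-task allocation
    scores = Parallel(n_jobs=-1)(
        delayed(run_fold)(tr, te) for tr, te in kf.split(X)
    )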
############################################################################################################################################################
Notes
- The Slurm headers indicate that the programs should be run on a system with 4 cores per task and 10 GB of RAM.
This is quite conservative, so the jobs do not need to be directed to a cloud-computing environment to run
- The three jobs run with a time limit of 11 hours. Considering average Compute Canada / ACENET servers (approx. 2.5 GHz CPUs),
allocate a time limit of at least 5 hours to run on an i5-13600KF system (assuming no hyperthreading and no E-core processing)
- H2O AutoML supports GPU compute using CUDA libraries. A CUDA-accelerated GPU may see performance gains for this computation