Beyond Accuracy in AI: A Multi-Objective Benchmark of Inductive Bias, Robustness, Computational Efficiency, and Pareto-Optimal Trade-Off

OKUTAN, HÜSEYİN; Baykara, Muhammet

doi:10.3390/app16104637

OKUTAN H. E., Baykara M.

Applied Sciences (Switzerland), vol. 16, no. 10, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 16, Issue: 10
Publication Date: 2026
DOI: 10.3390/app16104637
Journal Name: Applied Sciences (Switzerland)
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Applied Science & Technology Source, Compendex, INSPEC, Directory of Open Access Journals
Keywords: AI benchmarking, computational efficiency, inductive bias, model complexity trade-offs, multi-objective analysis, nonlinear classification, Pareto analysis, resource-aware machine learning, resource-constrained AI
İnönü University Affiliation: Yes

Abstract

Nonlinear classification problems such as XOR are widely used to evaluate machine learning models on data that are not linearly separable. In this study, a comprehensive benchmark is proposed to analyze eight classifiers (Logistic Regression, Linear SVM, RBF SVM, Decision Tree, Random Forest, KNN, MLP_small, MLP_deep) across four XOR variants (clean, noisy, rotated, high-dimensional). A total of 640 controlled experiments are conducted using multiple sample sizes and random seeds. Models are evaluated with a multi-objective framework covering accuracy, training and inference time, memory usage, energy consumption, and model size. Results show that MLP_deep achieves the highest overall accuracy, while MLP_small delivers competitive performance at a significantly lower computational cost. Decision Tree offers a strong balance between efficiency and accuracy, whereas Random Forest reaches competitive accuracy at the cost of higher resource usage. High-dimensional XOR is the most challenging scenario, significantly reducing performance across models. Pareto frontier analysis further identifies the optimal trade-offs between predictive performance and resource efficiency. The study demonstrates that no single model is universally optimal and emphasizes the importance of resource-aware model selection in nonlinear classification tasks.
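The Pareto frontier analysis mentioned above can be illustrated with a minimal sketch. It assumes each benchmark run is summarised by three of the reported objectives (accuracy, to be maximised; training time and memory, to be minimised). The `Result` record, its field names, the model labels, and all numbers are hypothetical placeholders, not the authors' code or data. A run stays on the frontier if no other run is at least as good on every objective and strictly better on at least one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Result:
    """One benchmark run summarised by its objectives (hypothetical schema)."""
    model: str
    accuracy: float      # higher is better
    train_time_s: float  # lower is better
    memory_mb: float     # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.train_time_s <= b.train_time_s
        and a.memory_mb <= b.memory_mb
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.train_time_s < b.train_time_s
        or a.memory_mb < b.memory_mb
    )
    return no_worse and strictly_better


def pareto_front(results: Sequence[Result]) -> list[Result]:
    """Keep the non-dominated runs via an O(n^2) pairwise check.

    A run never dominates itself (the strictness test fails), so no
    self-exclusion is needed in the inner loop.
    """
    return [r for r in results if not any(dominates(o, r) for o in results)]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT results from the study.
    runs = [
        Result("model_a", 0.95, 2.0, 50.0),
        Result("model_b", 0.93, 0.5, 10.0),
        Result("model_c", 0.90, 1.0, 20.0),  # dominated by model_b
        Result("model_d", 0.80, 0.1, 5.0),
    ]
    for r in pareto_front(runs):
        print(r.model)  # prints model_a, model_b, model_d (one per line)
```

The pairwise check is quadratic in the number of runs, which is adequate for a handful of classifiers. Other objectives reported in the study, such as inference time, energy consumption, or model size, could be added as further fields following the same no-worse/strictly-better pattern.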