EXAGREE: Mitigating Explanation Disagreement with Stakeholder-Aligned Models

Research output: Contribution to journal › Article

Abstract

Conflicting explanations, arising from different attribution methods or model internals, limit the adoption of machine learning models in safety-critical domains. We turn this disagreement into an advantage and introduce EXplanation AGREEment (EXAGREE), a two-stage framework that selects a Stakeholder-Aligned Explanation Model (SAEM) from a set of similarly performing models. The selection maximizes Stakeholder-Machine Agreement (SMA), a single metric that unifies faithfulness and plausibility. EXAGREE couples a differentiable mask-based attribution network (DMAN) with monotone differentiable sorting, enabling gradient-based search inside the constrained model space. Experiments on six real-world datasets demonstrate simultaneous gains in faithfulness, plausibility, and fairness over baselines, while preserving task accuracy. Extensive ablation studies, significance tests, and case studies confirm the robustness and practical feasibility of the method.
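The abstract's core ingredient, monotone differentiable sorting, makes a rank-based agreement score amenable to gradient-based search. A minimal sketch of the idea, using a sigmoid-based soft rank and a Spearman-style correlation between two attribution vectors; the function names (`soft_rank`, `rank_agreement`) and the specific soft-rank construction are illustrative assumptions, not the paper's actual DMAN or SMA implementation:

```python
import numpy as np

def soft_rank(x, tau=0.1):
    """Smooth approximation of ranks: rank_i ~= 1 + sum_j sigmoid((x_i - x_j)/tau).
    As tau -> 0 this approaches the hard ranks; for tau > 0 it is differentiable."""
    diff = (x[:, None] - x[None, :]) / tau          # pairwise differences
    sig = 1.0 / (1.0 + np.exp(-diff))               # pairwise "i beats j" probabilities
    return 1.0 + sig.sum(axis=1) - 0.5              # subtract self term (sigmoid(0) = 0.5)

def rank_agreement(model_attr, stakeholder_attr, tau=0.1):
    """Spearman-like correlation between soft ranks of two attribution vectors.
    Differentiable in model_attr, so it can serve as a training objective."""
    r1, r2 = soft_rank(model_attr, tau), soft_rank(stakeholder_attr, tau)
    r1c, r2c = r1 - r1.mean(), r2 - r2.mean()
    return float(r1c @ r2c / (np.linalg.norm(r1c) * np.linalg.norm(r2c) + 1e-12))
```

Identically ordered attributions score near +1 and reversed orderings near -1, so maximizing this quantity over a set of similarly performing models would push their explanations toward the stakeholder's ranking.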
Original language: English
Article number: abs/2411.01956
Number of pages: 24
Journal: CoRR
Issue number: November 2024
Publication status: Published - 2024
