TY - JOUR
T1 - A Scoping Study of Evaluation Practices for Responsible AI Tools: Steps Towards Effectiveness Evaluations
AU - Berman, Glen
AU - Goyal, Nitesh
AU - Madaio, Michael
N1 - Publisher Copyright: © 2024 Copyright held by the owner/author(s)
PY - 2024/5/11
Y1 - 2024/5/11
AB - Responsible design of AI systems is a shared goal across HCI and AI communities. Responsible AI (RAI) tools have been developed to support practitioners to identify, assess, and mitigate ethical issues during AI development. These tools take many forms (e.g., design playbooks, software toolkits, documentation protocols). However, research suggests that use of RAI tools is shaped by organizational contexts, raising questions about how effective such tools are in practice. To better understand how RAI tools are—and might be—evaluated, we conducted a qualitative analysis of 37 publications that discuss evaluations of RAI tools. We find that most evaluations focus on usability, while questions of tools’ effectiveness in changing AI development are sidelined. While usability evaluations are an important approach to evaluate RAI tools, we draw on evaluation approaches from other fields to highlight developer- and community-level steps to support evaluations of RAI tools’ effectiveness in shaping AI development practices and outcomes.
KW - AI
KW - effectiveness
KW - ethics
KW - evaluation
KW - fairness
KW - responsibility
KW - toolkits
UR - http://www.scopus.com/inward/record.url?scp=85194881498&partnerID=8YFLogxK
U2 - 10.1145/3613904.3642398
DO - 10.1145/3613904.3642398
M3 - Conference article
JO - CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems
JF - CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems
ER -