Abstract
Voice-enabled technologies, such as virtual assistants, are becoming increasingly ubiquitous. Their functionality depends on machine learning (ML) models that perform tasks like automatic speech recognition (ASR). Currently, these models exhibit reduced accuracy for certain speaker cohorts, influenced by factors such as age, gender, and accent, indicating the presence of bias. ML models are trained on large datasets, and ML practitioners (MLPs) seek to address this bias throughout the ML lifecycle. Dataset documentation plays a crucial role in understanding dataset characteristics, yet there is a notable lack of research focused on the documentation of voice (spoken language) datasets. Our work contributes empirically to addressing this gap by identifying deficiencies in voice dataset documents (VDDs) and advocating for improvements. We conducted 13 interviews with MLPs working with voice data to explore their use of VDDs, focusing on their roles and the trade-offs they encounter. Based on the literature and interview data, we developed a rubric and used it to analyze the VDDs of nine voice datasets.
| Original language | English |
| --- | --- |
| Pages (from-to) | 51-66 |
| Number of pages | 16 |
| Journal | Proceedings of the Australasian Language Technology Workshop |
| Volume | 2 |
| Publication status | Published - 2023 |
| Event | 21st Annual Workshop of the Australasian Language Technology Association (ALTA 2023), Melbourne, Australia, 29 Nov 2023 – 1 Dec 2023 |