Tuesday, September 8, 2015, 16:30–17:30, Large Hall
Michael Wagner, National Centre for Biometric Studies (NCBS), Australia
Modern speech technology relies on massive databases of thousands of hours of speech recordings by tens of thousands of speakers. Automatic speech recognition requires the meticulous modelling of even the quirkiest turn of phrase and its every acoustic realisation by just about anyone anywhere. Speaker recognition requires, in both its telephone banking and forensic identification guises, universal background models with speech data representative of the entire populations of countries, of the speakers of particular languages, dialects or accents. Text-to-speech synthesis, speech coding, noise reduction and most other areas of speech technology similarly require research using large speech databases. Accordingly, the recording of speech data has been undertaken on a huge scale, funded both commercially and publicly, for at least a quarter of a century.
In earlier years, the anonymity of the data and the privacy of the speakers were tacitly assumed to be preserved by not revealing speakers’ names or other identifying information, and metadata was restricted “reasonably” for research use. However, with current speaker recognition technology, it is no longer clear whether the anonymity of speech data and the privacy of the speakers are still protected adequately; even less so where other biometric information, such as facial images or finger ridge patterns, is stored together with the speech data.
Research which relates the manner of speaking to medical conditions such as depression, to physical and behavioural conditions such as drug use and mood, or to personality traits has been reported at recent Interspeech conferences. Such research is usually motivated by, and aimed at, potentially beneficial applications, but it also raises the possibility of medical, physical and behavioural profiling for purposes that could be considered abusive of the individual.
As researchers in speech science and speech technology, we should conduct our research responsibly, handle our data with care and observe the legitimate privacy concerns of our speakers. Interspeech 2015 is therefore an appropriate occasion to begin a public discussion on potential problems in large-scale speech data collection and on the adequacy of current safeguards regarding ethical protocols for the recording and usage of spoken language databases.
A panel of experts will kick-start a lively discussion on this topic, and audience participation by the Interspeech community is invited. At print time, the following panellists have confirmed their participation:
- Moderator: Michael Wagner, Managing Director NCBS, Em Prof U Canberra, Hon Prof TU Berlin
- Khalid Choukri, Secretary General, European Language Resources Association
- Phil Hall, Senior Vice President Speech and Data Collection, Appen
- Kate Knill, Senior Research Associate, BABEL Project, Cambridge University, Member ISCA Board
- David Sündermann-Oeft, Director of Research, Educational Testing Service
- David van der Vloed, Forensic Speech Researcher, Netherlands Forensic Institute