Track 2 – 14:00-17:30
Duration: 3 hours, plus a 30-minute coffee break
Location: Conference 1
Presenters: Nicholas Cummins (1,2), Stefan Scherer (3), Jarek Krajewski (4,5), Sebastian Schnieder (4,5), Julien Epps (1,2), Thomas F. Quatieri (6)
1) School of Elec. Eng. and Telecomm., The University of New South Wales, Sydney, Australia
2) ATP Research Laboratory, National ICT Australia (NICTA), Australia
3) University of Southern California Institute for Creative Technologies, Playa Vista, CA 90094, USA
4) Experimental Industrial Psychology, University of Wuppertal, Wuppertal, Germany
5) Industrial Psychology, Rhenish University of Applied Sciences Cologne, Germany
6) MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA 02421, USA
Target Audience: The tutorial is intended for speech processing researchers and practitioners with a basic knowledge of the techniques used to analyse speech for classification applications. Some of the topics will also be of interest to speech scientists. The tutorial is aimed at both established and new researchers in depressed or suicidal speech analysis, as well as researchers in vocal biomarkers of any neurological disorder. By attending the tutorial and receiving a copy of the review paper, participants will gain key insights and, given the newly available AVEC databases, be able to produce publishable systems and results in this field.
Many topics discussed in the final section will be of great benefit to any speech researcher who is planning to set up a new study in either depressed or suicidal speech analysis. Further, large parts of the tutorial will be highly relevant to conference delegates from other areas of computational paralinguistics, such as entrants in the Interspeech 2015 Computational Paralinguistics Challenge (ComParE), and from social signal processing. A range of conditions and illnesses, such as emotion, fatigue, sleepiness, intoxication, bipolar disorder, anxiety, post-traumatic stress disorder, Parkinson’s disease (a 2015 ComParE Sub-Challenge), dysarthria and apraxia, have similar effects on speech production, so many of the features, systems and problems discussed will be relevant to researchers in these fields.
As well as many established speech features, the tutorial will cover newer speech features such as Reduced Vowel Space, Vocal Tract Correlation Features and Reduced Acoustic Variability. It will also include an introduction to regression methods in paralinguistic speech processing, and a discussion of the difficulties of regression against ordinal and ranked labels and of fusing predictor outputs.
In the absence of a single characterization or measurable biological trait, the linguistic content of speech is used by clinicians as one measure of a variety of psychiatric conditions. However, due in part to a patient’s impaired outlook and motivation, this information can be time-consuming to gather and requires a large degree of skill and training to assess objectively. In recent years, the problem of automatically detecting and monitoring depression and suicidality using speech, and more specifically non-verbal paralinguistic cues, has gained popularity. This is evident from the growing number of papers published in this field over the last five years and from the recent Audio/Visual Emotion Challenge Depression Score Prediction Sub-challenges (http://sspnet.eu/avec2013/, http://sspnet.eu/avec2014/).
This tutorial will review the current state of research and raise key research issues relating to the automatic analysis of speech for use as an objective predictor of depression and suicidality. Many of the issues covered are also applicable to other cognitive and neurological disorders. To our knowledge, this will be the first tutorial in this topic area at any conference. It is based largely on a 52-page review paper of the same name, authored by the presenters, which has been accepted for publication in Speech Communication. Both depression and suicide are major public health concerns: depression has long been recognized as a prominent cause of disability and burden worldwide, whilst suicide is a misunderstood and complex cause of death that strongly impacts the quality of life and mental health of the families and communities left behind. Currently, no objective measure of depression or suicidality with clinical utility exists. This compromises optimal patient care, compounding an already high burden on health, social, and economic services.
Speech is an attractive candidate for use in an automated detection and monitoring system: it can be measured cheaply, remotely, non-invasively and non-intrusively. Depression and suicidality produce a range of cognitive and physiological changes that influence the process of speech production, affecting the acoustic quality of the speech produced in a way that is measurable and can be objectively assessed. However, automated detection and monitoring of either condition via paralinguistic assessment is a very challenging task. Key challenges include the ordinal nature of clinical assessment scores, small databases, and the mitigation of unwanted sources of variability.
To increase the likelihood of finding a set of speech-based markers with clinical utility for depression and suicidality, it is important for the research community to conduct more focused, hypothesis-driven research, in which specific aspects of the effects of depression and suicidality on speech are considered in system design. This tutorial is intended to introduce attendees to these specific aspects. It will be divided into four topics: (i) an introductory section, which discusses current diagnostic and assessment methods for depression and suicidality and highlights key cognitive, neural and physiological changes associated with both conditions; (ii) a review of both old and new speech features in relation to their suitability as a marker of either condition; (iii) a review of classification and prediction techniques that have been used in the automatic analysis of speech as a predictor of either condition; and (iv) a discussion of the key research problems and challenges associated with this fascinating and rapidly growing field of speech processing research.
Each topic will be split into 10-20% introductory material (for audience members new to the topic) and 80-90% discussion of the topic at hand. The times given beside each topic make some allowance for short questions during or after each topic. The presentation will be split between all presenters.
Topic 1: Speech, a key objective marker of either condition (40 mins)
1.1 Current Diagnostic Methods
In order to investigate how speech might be used to index or classify depression or suicidality, it is necessary to first understand how current diagnostic methods are used and what aspects of these may be relevant to speech analysis.
1.2 Objective Markers for Depression and Suicidality
As speech potentially represents just one diagnostic aid modality, it is important to highlight current research into associated biological, physiological and behavioural markers, so as to gain an understanding of how speech could be used to augment systems and analysis methods based on these markers.
1.3 Speech as an Objective Marker
The aim of this section will be to highlight the expected cognitive, neural and physiological changes associated with both conditions which affect speech production.
1.4 Differences and Similarities with other Paralinguistic States and Traits
The aim of this section will be to highlight the similarities and differences between depressed and suicidal speech analysis and other forms of paralinguistic speech analysis.
1.5 Depression and Suicidality Corpora
It is instructive to review the characteristics of depressed and suicidal speech databases, to understand what kinds of data collection protocols and objectives already exist and are well suited for research in this area. Accessibility of corpora will also be discussed.
Topic 2: Review of the paralinguistic changes in speech affected by either condition (50 mins)
2.1 The Ideal Speech Feature
Before starting the feature review, it is worthwhile considering the properties of an ideal speech feature (or set of features) for detecting either depression or suicidality.
2.2 Prosodic Features
A literature review on the use of prosodic features as a clinical marker of either condition.
2.3 Source Features
A literature review on the use of source features as a clinical marker of either condition.
2.4 Formant Analysis
A literature review followed by an introduction to modelling Reduced Vowel Space and Vocal Tract Correlation Features.
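As a rough illustration of the vowel space idea, the following is a minimal sketch. It assumes the vowel space is summarised as the area of the triangle spanned by a speaker's mean (F1, F2) values for the corner vowels /i/, /a/ and /u/, expressed as a ratio to a reference triangle. This is a common simplification and not necessarily the modelling approach presented in the tutorial; the function names and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Formants = Tuple[float, float]  # mean (F1, F2) of one vowel, in Hz

CORNER_VOWELS = ("i", "a", "u")  # assumed corner vowels of the triangle


def triangle_area(a: Formants, b: Formants, c: Formants) -> float:
    """Area of a triangle in the F1-F2 plane (shoelace formula)."""
    return abs(
        a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1])
    ) / 2.0


def vowel_space_ratio(speaker: Dict[str, Formants],
                      reference: Dict[str, Formants]) -> float:
    """Speaker's vowel triangle area divided by a reference triangle area.

    Values below 1.0 indicate a smaller (reduced) vowel space than the
    reference under this simplified measure.
    """
    spk_area = triangle_area(*(speaker[v] for v in CORNER_VOWELS))
    ref_area = triangle_area(*(reference[v] for v in CORNER_VOWELS))
    if ref_area == 0.0:
        raise ValueError("reference vowels are collinear; area is zero")
    return spk_area / ref_area
```

Normalising by a reference triangle, rather than reporting the raw area, is one way to make values comparable across speakers whose formant ranges differ; how the reference is chosen is left open here.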
2.5 Spectral Features
A literature review followed by an introduction to modelling Reduced Acoustic Variability.
Topic 3: Classification and Score Prediction (45 mins)
This section is a review of the investigations that have been carried out into the automatic analysis of speech as a predictor of suicidality and depression. During this section we will be highlighting the differences in classification and regression methodologies for depression/suicidality.
3.1 Presence and Severity of Depression
Review of automated systems that classify either the presence of depression (categorical assignment of a voice recording to a presence or absence class) or its severity (categorical assignment of an unknown voice sample to one of two or more distinct classes relating to a clinical assessment scale).
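The difference between the two classification targets can be illustrated with a minimal sketch, assuming the class labels are derived by binning a clinical scale score. The cut-points, threshold and class names below are hypothetical placeholders chosen for illustration; they are not taken from any assessment scale or from the reviewed systems.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical cut-points and labels, for illustration only.
SEVERITY_CUTS = (10, 20, 30)
SEVERITY_LABELS = ("none", "mild", "moderate", "severe")


def severity_class(score: float,
                   cuts: Sequence[float] = SEVERITY_CUTS,
                   labels: Sequence[str] = SEVERITY_LABELS) -> str:
    """Multi-class target: bin a scale score into an ordered severity class."""
    # bisect_right counts how many cut-points are <= score.
    return labels[bisect_right(cuts, score)]


def presence_class(score: float, threshold: float = 10) -> str:
    """Two-class target: 'present' if the score reaches the threshold."""
    return "present" if score >= threshold else "absent"
```

Both targets discard information: two recordings whose scores fall in the same bin receive the same label, however far apart the scores are.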
3.2 Depression Score Prediction
Review of automated systems that perform depression score prediction, i.e. the assignment of an unknown voice sample to a continuous-valued score on a mental state assessment scale. This section will also introduce regression methods in paralinguistic speech processing and will include discussions of predictor fusion techniques and the difficulties of predicting against ordinal scores.
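To make the score prediction setting concrete, the following is a minimal sketch. It assumes several regressors have each already produced one predicted score per recording, fuses them by a weighted average (one simple form of late fusion), and evaluates the result with root-mean-square error and mean absolute error. The function names and weights are illustrative assumptions, not the methods of any reviewed system.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between true and predicted scores."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fuse_predictions(predictions: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Late fusion: weighted average of per-recording predictions.

    predictions[k][i] is predictor k's score for recording i.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    total = sum(weights)
    n_recordings = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_recordings)
    ]
```

Both error measures treat the scale as an interval scale, so an error of five points counts the same anywhere on the scale. This is one reason why predicting against ordinal scores is difficult.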
3.3 Automatic Classification of Suicidal Speech
Review of automated systems that classify the presence or absence of suicidality in the voice.
Topic 4: Key research problems and challenges (30 mins)
The tutorial concludes by raising some of the major challenges and potential future research directions associated with this field of speech processing research.
4.1 Standardisation of Data Collection
A discussion on the issues relating to the standardisation of data collection, including researcher health effects related to the collection and annotation of data.
4.2 Clinical Dissemination
A discussion on developing technology to support clinicians and primary health care providers. This will also include a short section on context of use, i.e. how systems might actually be used in clinical practice, and how far away we are from practical systems.
4.3 Sources of Nuisance Variability and the need for Mitigation
A discussion on the need to investigate nuisance mitigation approaches, such as factor analysis and i-vectors, for depressed and suicidal speech analysis.
Questions and General Discussion (15 mins)
- N. Cummins, S. Scherer, J. Krajewski, S. Schnieder, J. Epps, and T. F. Quatieri, “A Review of Depression and Suicide Risk Assessment using Speech Analysis,” Speech Communication (in press), 2015.
- N. Cummins, V. Sethu, J. Epps, S. Schnieder, and J. Krajewski, “Analysis of Acoustic Space Variability in Speech Affected by Depression,” Speech Communication (under review).
- S. Scherer, G. Lucas, J. Gratch, A. Rizzo, and L.-P. Morency, “Reduced Vowel Space in Conversational Speech Indicates Depression,” IEEE Transactions on Affective Computing (under review).
- S. Scherer, G. Stratou, G. Lucas, M. Mahmoud, J. Boberg, J. Gratch, A. Rizzo, and L.-P. Morency, “Automatic Audio-visual Behaviour Descriptors for Psychological Disorder Analysis,” Image and Vision Computing Journal, Special Issue on Best of Face and Gesture 2013, vol. 32, no. 10, 2014, pp. 648–658.
- G. Stratou, S. Scherer, J. Gratch, and L.-P. Morency, “Automatic nonverbal behaviour indicators of depression and PTSD: the effect of gender,” Journal on Multimodal User Interfaces, 2014, pp. 1–13.
- J. Joshi, R. Goecke, S. Alghowinem, A. Dhall, M. Wagner, J. Epps, G. Parker, and M. Breakspear, “Multimodal Assistive Technologies for Depression Diagnosis and Monitoring,” Journal on Multimodal User Interfaces, Special Issue on Multimodal Interfaces for Pervasive Assistance, vol. 7, no. 3, 2013, pp. 217–228.
- A. Trevino, T. Quatieri, and N. Malyska, “Phonologically-based biomarkers for major depressive disorder,” EURASIP J. Adv. Signal Process., vol. 2011, no. 1, 2011, pp. 1–18.
- N. Cummins, J. Epps, V. Sethu, and J. Krajewski, “Weighted Pairwise Gaussian Likelihood Regression for Depression Score Prediction,” accepted for publication in Proceedings of ICASSP, 2015.
- S. Scherer, L.-P. Morency, J. Gratch, and J. P. Pestian, “Reduced Vowel Space is a Robust Indicator of Psychological Distress: A Cross-Corpus Analysis,” accepted for publication in Proceedings of ICASSP, 2015.
- N. Cummins, V. Sethu, J. Epps, and J. Krajewski, “Probabilistic Acoustic Volume Analysis for Speech Affected by Depression,” in Proceedings of Interspeech, 2014, pp. 1238–1242.
- F. Hönig, A. Batliner, E. Nöth, S. Schnieder, and J. Krajewski, “Automatic Modelling of Depressed Speech: Relevant Features and Relevance of Gender,” in Proceedings of Interspeech, 2014, pp. 1248–1252.
- M. Valstar, B. Schuller, K. Smith, T. Almaev, F. Eyben, J. Krajewski, R. Cowie, and M. Pantic, “AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge,” in Proceedings of the 4th ACM International Workshop on Audio/Visual Emotion Challenge (AVEC ’14), 2014, pp. 3–10.
- J. R. Williamson, T. Quatieri, B. Helfer, G. Ciccarelli, and D. D. Mehta, “Vocal and Facial Biomarkers of Depression based on Motor Incoordination and Timing,” in Proceedings of the 4th ACM International Workshop on Audio/Visual Emotion Challenge (AVEC ’14), 2014, pp. 65–72.
- N. Cummins, J. Epps, V. Sethu, and J. Krajewski, “Variability Compensation in Small Data: Oversampled Extraction of i-vectors for the Classification of Depressed Speech,” in Proceedings of ICASSP, 2014, pp. 970–974.
- N. Cummins, J. Epps, V. Sethu, M. Breakspear, and R. Goecke, “Modeling Spectral Variability for the Classification of Depressed Speech,” in Proceedings of Interspeech, 2013, pp. 857–861.
- B. S. Helfer, T. F. Quatieri, J. R. Williamson, D. D. Mehta, R. Horwitz, and B. Yu, “Classification of depression state based on articulatory precision,” in Proceedings of Interspeech, 2013, pp. 2172–2176.
- S. Scherer, G. Stratou, J. Gratch, and L.-P. Morency, “Investigating Voice Quality as a Speaker-Independent Indicator of Depression and PTSD,” in Proceedings of Interspeech, 2013, pp. 847–851.
- N. Cummins, J. Joshi, A. Dhall, V. Sethu, R. Goecke, and J. Epps, “Diagnosis of Depression by Behavioural Signals: A Multimodal Approach,” in Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, 2013, pp. 11–20.
- M. Valstar, B. Schuller, K. Smith, F. Eyben, B. Jiang, S. Bilakhia, S. Schnieder, R. Cowie, and M. Pantic, “AVEC 2013: The Continuous Audio/Visual Emotion and Depression Recognition Challenge,” in Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, 2013, pp. 3–10.
- J. R. Williamson, T. F. Quatieri, B. S. Helfer, R. Horwitz, B. Yu, and D. D. Mehta, “Vocal Biomarkers of Depression Based on Motor Incoordination,” in Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, 2013, pp. 41–48.
- N. Cummins, J. Epps, and E. Ambikairajah, “Spectro-Temporal Analysis of Speech Affected by Depression and Psychomotor Retardation,” in Proceedings of ICASSP, 2013, pp. 7542–7546.
- S. Scherer, J. Pestian, and L.-P. Morency, “Investigating the Speech Characteristics of Suicidal Adolescents,” in Proceedings of ICASSP, 2013, pp. 709–713.
- S. Scherer, G. Stratou, M. Mahmoud, J. Boberg, and J. Gratch, “Automatic Behavior Descriptors for Psychological Disorder Analysis,” in Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, 2013, pp. 1–8.
- T. F. Quatieri and N. Malyska, “Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity,” in Proceedings of Interspeech, 2012, pp. 1059–1062.
- N. Cummins, J. Epps, M. Breakspear, and R. Goecke, “An Investigation of Depressed Speech Detection: Features and Normalization,” in Proceedings of Interspeech, 2011, pp. 2997–3000.
- D. Sturim, P. A. Torres-Carrasquillo, T. F. Quatieri, N. Malyska, and A. McCree, “Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis,” in Proceedings of Interspeech, 2011, pp. 2983–2986.