Track 3 – 09:30-13:00
Duration: 3 hours, 30min coffee break
Location: Conference 4+5
Presenters: Pejman Mowlaee1
1) Signal Processing and Speech Communication Lab, Graz University of Technology, Austria
Target Audience: We expect two groups of researchers to be interested in this tutorial: 1) new researchers starting in the field of phase-aware signal processing for different speech applications, and 2) specialists in adjacent fields of speech processing, e.g., robust automatic speech/speaker recognition, speech coding, and speech analysis/synthesis. This expectation is supported by the interest shown by participants in, and the several contributions made to, the special session at INTERSPEECH 2014 .
In addition to the description provided here, the organizers of this tutorial will provide a dedicated web page with more detailed information and discussion (link).
On the Importance of Phase Information
In the speech signal processing literature, spectral phase information has received less attention than the spectral amplitude [4–6]. This reduced emphasis is mainly due to two reasons: i) the spectral amplitude is known to contribute more to human perception [7–16], speech intelligibility  and signal modeling [18,19], and ii) the wrapping problem makes it difficult to obtain a reliable phase estimator, in particular from a noisy observation. In recent years, however, several publications have reported improvements due to phase-based signal processing in different speech applications. Last year, at INTERSPEECH 2014, we organized the special session “On Phase Importance in Speech Processing” , collecting contributions from researchers investigating the inclusion of phase in different speech processing applications, including signal enhancement, speech analysis/synthesis, and speech/speaker recognition. The importance of the topic is evidenced by the visibility of that special session in terms of submissions (9 accepted papers out of 19 submissions) and attendance at INTERSPEECH 2014, as well as by the positive feedback from participants (45 participants have since joined a phase mailing list). Furthermore, we are currently organizing a special issue of Speech Communication (Elsevier) entitled “Phase-Aware Signal Processing in Speech Communication”, attracting the active researchers in the field to submit their work on phase-based signal processing . Finally, we organized two show & tell sessions at INTERSPEECH 2013  and 2014 , with webpages and video links available online [22, 23].
Overview of Phase Estimation Methods
The existing phase estimation methods proposed in the literature can be categorized into eight groups, each of which is briefly described below.
1) GL-based Methods : From a chronological viewpoint, the problem of phase estimation dates back to the 1980s, when researchers focused on recovering a time-domain signal from spectral amplitude information only . Griffin and Lim proposed an iterative solution to this problem. Several extensions of the iterative GL method were proposed for source separation [25, 26], the Wiener filter  and the sinusoidal model . An overview of iterative signal reconstruction methods and the consistent Wiener filter approach is given in .
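To make the GL principle concrete, the following minimal sketch (in Python with NumPy/SciPy; the STFT parameters and iteration count are illustrative assumptions, not the settings of the cited works) alternates between imposing the known spectral amplitude and enforcing STFT consistency:

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(target_mag, n_iter=50, fs=16000, nperseg=512, noverlap=384):
    """Minimal Griffin-Lim sketch: estimate a time signal (and hence a phase)
    from an STFT magnitude by alternating between imposing the magnitude and
    enforcing STFT consistency."""
    rng = np.random.default_rng(0)
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, size=target_mag.shape))
    for _ in range(n_iter):
        # back to the time domain with the current phase estimate ...
        _, x = istft(target_mag * phase, fs=fs, nperseg=nperseg, noverlap=noverlap)
        # ... and forward again; keep only the (consistent) phase
        _, _, S = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
        S = S[:, :target_mag.shape[1]]
        if S.shape[1] < target_mag.shape[1]:
            S = np.pad(S, ((0, 0), (0, target_mag.shape[1] - S.shape[1])))
        phase = np.exp(1j * np.angle(S))
    _, x = istft(target_mag * phase, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return x
```

In a speech enhancement context, target_mag would be an enhanced amplitude spectrogram.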
2) Model-based Phase Estimation : Model-based phase estimation was proposed in , where a baseband transformation is applied, followed by STFT phase improvement (STFTPI), which reconstructs the phase across time and frequency along the harmonics of voiced speech.
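A simplified sketch of the core idea, propagating the phase along time using the expected phase advance of the nearest harmonic of the fundamental frequency (a toy illustration, not the full STFTPI algorithm of the cited work):

```python
import numpy as np

def propagate_harmonic_phase(phase_prev, f0, fs, hop, n_fft):
    """Sketch of model-based phase propagation across time for a voiced frame:
    each STFT bin inherits the previous frame's phase advanced by the phase
    increment of its nearest harmonic of f0 (simplified illustration)."""
    k = np.arange(n_fft // 2 + 1)                       # one-sided STFT bin indices
    bin_freq = k * fs / n_fft                           # bin centre frequencies in Hz
    harmonic = np.maximum(1, np.round(bin_freq / f0))   # nearest harmonic index
    f_h = harmonic * f0                                 # frequency of that harmonic
    return phase_prev + 2 * np.pi * f_h * hop / fs      # expected phase advance
```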
3) Phase Randomization : The idea was to randomize the noisy phase, which was reported to improve noise reduction in an auto-focusing application. Phase randomization targets the noise-dominated spectral bins, where the noisy phase is too corrupted (destructive) to be used for reconstructing the enhanced speech signal.
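A minimal sketch of the idea, assuming an estimated SNR per time-frequency bin; the threshold and selection rule are illustrative assumptions rather than the scheme of the cited paper:

```python
import numpy as np

def randomize_noise_dominated_phase(noisy_stft, snr_estimate_db, threshold_db=0.0, seed=0):
    """Replace the phase of noise-dominated bins (estimated SNR below a
    threshold) with random values; speech-dominated bins keep the noisy phase."""
    rng = np.random.default_rng(seed)
    mag = np.abs(noisy_stft)
    phase = np.angle(noisy_stft)
    random_phase = rng.uniform(-np.pi, np.pi, size=phase.shape)
    noise_dominated = snr_estimate_db < threshold_db
    phase = np.where(noise_dominated, random_phase, phase)
    return mag * np.exp(1j * phase)
```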
4) Mismatched Window : The idea is to apply different windows at the analysis and synthesis stages, differing in their dynamic range. Using a Hamming window at the analysis stage and a Chebyshev window at the synthesis stage, improvements in perceived quality were reported.
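For illustration, a pair of analysis/synthesis windows with strongly different dynamic ranges can be generated as follows (the window length and the Chebyshev sidelobe attenuation are arbitrary assumptions):

```python
from scipy.signal import get_window
from scipy.signal.windows import chebwin

nperseg = 512
analysis_win = get_window("hamming", nperseg)   # moderate dynamic range, applied before the FFT
synthesis_win = chebwin(nperseg, at=100)        # high sidelobe attenuation (100 dB), used in overlap-add
```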
5) Geometry-based : The ambiguity problem in phase estimation for single-channel source separation was first addressed in . Relying on the geometry imposed by the single-channel configuration, the authors resolved the ambiguity between the two underlying phase candidates by incorporating an additional constraint on the group delay deviation , and later by taking the time- and frequency-derivatives of the phase into account .
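The underlying geometry can be stated compactly: with Y = S1 + S2 and known (or estimated) magnitudes, the law of cosines yields two candidate phases for S1 relative to the mixture phase. The sketch below shows only this basic ambiguity; the group delay and derivative constraints of the cited works, which select between the two candidates, are not included:

```python
import numpy as np

def phase_candidates(mix, mag1, mag2):
    """Two candidate phases for source 1 given the mixture Y = S1 + S2 and the
    (estimated) source magnitudes |S1|, |S2|, from the law of cosines."""
    cos_alpha = (np.abs(mix) ** 2 + mag1 ** 2 - mag2 ** 2) / (2 * np.abs(mix) * mag1 + 1e-12)
    alpha = np.arccos(np.clip(cos_alpha, -1.0, 1.0))    # angle between S1 and Y
    return np.angle(mix) + alpha, np.angle(mix) - alpha
```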
6) Unwrapping Phase Smoothing : The idea is to decompose the STFT phase into its harmonic components to obtain an unwrapped harmonic phase. Temporal smoothing filters are then applied to reduce the variance of the noisy phase [30, 31]. The method provided joint improvements in perceived quality and speech intelligibility in two scenarios: phase-only enhancement and combination with amplitude enhancement.
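A minimal sketch of the smoothing step, assuming the harmonic phases have already been extracted: the expected linear phase of each harmonic is removed, the residual is smoothed by a circular (complex-domain) moving average across frames, and the linear part is restored. Parameters are illustrative:

```python
import numpy as np

def smooth_harmonic_phase(harmonic_phase, f0, fs, hop, n_frames_avg=3):
    """harmonic_phase: array (num_harmonics, num_frames) of measured (noisy)
    instantaneous phases at the harmonics of f0. Remove the expected linear
    phase, smooth the residual circularly across frames, then restore it."""
    num_h, num_frames = harmonic_phase.shape
    h = np.arange(1, num_h + 1)[:, None]                 # harmonic indices
    frames = np.arange(num_frames)[None, :]
    linear = 2 * np.pi * h * f0 * hop / fs * frames      # expected linear phase track
    residual = harmonic_phase - linear                   # unwrapped residual phase
    z = np.exp(1j * residual)
    kernel = np.ones(n_frames_avg) / n_frames_avg
    z_smooth = np.stack([np.convolve(row, kernel, mode="same") for row in z])
    return np.angle(z_smooth) + linear
```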
7) Maximum a Posteriori (MAP) : Assuming a uniform phase prior and independence of the spectral amplitude and phase, the noisy phase has been shown to be both the MAP  and the minimum mean square error (MMSE)  estimator of the clean spectral phase. In , a MAP estimator of the harmonic phase was proposed assuming a non-uniform phase prior, captured by a von Mises distribution. The proposed MAP estimator outperformed the benchmarks in the sense of getting closer to the clean-phase upper bound.
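To give a flavor of such an estimator, the toy sketch below combines a noisy phase observation with a von Mises prior, using the standard result that the product of two von Mises densities in the same variable is again von Mises, whose mode is the direction of the weighted resultant vector; it is an illustration only, not the harmonic-phase MAP estimator of the cited work:

```python
import numpy as np

def von_mises_map(noisy_phase, prior_mean, kappa_obs, kappa_prior):
    """MAP phase estimate given a von Mises observation model (concentration
    kappa_obs around the noisy phase) and a von Mises prior (concentration
    kappa_prior around prior_mean): the posterior mode is the direction of
    the weighted resultant vector."""
    resultant = kappa_obs * np.exp(1j * noisy_phase) + kappa_prior * np.exp(1j * prior_mean)
    return np.angle(resultant)
```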
8) Least Squares (LS) : We proposed a least squares solution as the optimal estimator of the clean phase when no prior distributions are assumed for amplitude or phase. With the ambiguity problem resolved as in  or , the method was shown in Monte Carlo simulations  to reach the Cramér-Rao lower bound  on the phase estimation error variance.
The proposed tutorial is organized as follows. It is scheduled as a three-hour session in which participants are presented with three different aspects of the phase estimation topic, itemized below:
- Full literature review
- Matlab Toolbox “PhaseLab”
- Performance Evaluation
In the following, we explain the aim of each item in detail.
1. Full Literature Review
We present a complete overview of the literature, encompassing all previous and existing methods for estimating the spectral phase from a noisy observation (see the summary provided in Section 2.1). To present the potential and limits of the phase estimation solutions, we exemplify the effectiveness of the studied methods for phase-aware signal processing in the single-channel speech enhancement application. A detailed presentation of each method is given, focusing on the methodology and its pros and cons. The importance of reliable phase estimation is elaborated through a detailed analysis and performance comparison between the methods. Future directions for phase-aware signal processing will also be given.
2. Matlab Toolbox “PhaseLab”
The implementation of each method will be provided in a Matlab toolbox called “PhaseLab”. A detailed comparison of the existing methods will be provided for two scenarios: when used for signal reconstruction only (which we call “phase-only enhancement”), and when combined with an amplitude enhancement method. The toolbox will be a unique reference implementation of the phase estimation contributions proposed in the literature and reviewed in Section 2.1. The estimated phase will be integrated into a phase-aware amplitude enhancement method  to demonstrate the effectiveness of the phase estimation methods studied in the tutorial session.
Although the toolbox initially targets the single-channel speech enhancement application, the central idea of estimating clean spectral phase information can be extended to other adjacent speech processing applications, including robust automatic speech recognition, speaker recognition, speech coding, speech synthesis, and artificial bandwidth extension of audio signals. These further aspects will be highlighted in the introductory presentation of the tutorial session. We hope that the presentation, together with the provided Matlab toolbox, offers a good starting point for new researchers interested in phase-aware signal processing for speech applications. The webpage  contains audio examples demonstrating the effectiveness of several recently proposed phase estimation methods. Video links presented as show & tell contributions can be found in [22, 23].
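The two comparison scenarios can be summarized in a few lines; estimate_phase and enhance_amplitude below are placeholders for any of the reviewed phase estimators and any amplitude enhancement method, not actual PhaseLab functions:

```python
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs, estimate_phase, enhance_amplitude=None, nperseg=512):
    """Scenario 1 (phase-only enhancement): noisy amplitude + estimated phase.
    Scenario 2 (combined): enhanced amplitude + estimated phase."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    phase_hat = estimate_phase(Y)                                    # estimated clean phase
    mag = np.abs(Y) if enhance_amplitude is None else enhance_amplitude(np.abs(Y))
    _, x_hat = istft(mag * np.exp(1j * phase_hat), fs=fs, nperseg=nperseg)
    return x_hat
```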
3. Performance Evaluation
We highlight the need for a reliable instrumental metric to predict subjective listening results in phase-aware speech enhancement. Recently, in , we reported a disagreement between performance evaluations obtained via subjective listening tests and those predicted by existing instrumental metrics. This will be demonstrated through a counterexample in which the quality metrics improve while the enhanced signal suffers from artifacts and buzziness. Our aims here are twofold:
1) to study how reliably the existing instrumental metrics predict the improvement in quality and intelligibility brought by phase-aware processing of noisy speech signals , and 2) to propose new phase-aware candidate metrics that show a higher correlation with subjective listening results and hence predict the perceived quality of phase-aware speech enhancement more reliably.
 INTERSPEECH 2015 – 16th Annual Conference of the International Speech Communication Association, September 6–10, Dresden, Germany, Proceedings, 2015.
 P. Mowlaee, R. Saeidi, and Y. Stylianou, “INTERSPEECH 2014 special session on phase importance in speech processing,” in Annual Conference of the International Speech Communication Association, 2014, pp. 1623–1627.
 ——. (2015) Special issue on phase-aware signal processing in speech communication [Online]. Available: http://si.eurasip.org/issues/46/phase-aware-signal-processingin-speech/.
 P. Vary and R. Martin, Digital Speech Transmission: Enhancement, Coding And Error Concealment. John Wiley & Sons, 2006.
 R. McAulay and T. Quatieri, “Phase modelling and its application to sinusoidal transform coding,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’86), vol. 11, Apr. 1986, pp. 1713–1716.
 A. Oppenheim and J. Lim, “The importance of phase in signals,” Proceedings of the IEEE, vol. 69, no. 5, pp. 529–541, May 1981.
 T. Gerkmann, “Bayesian estimation of clean speech spectral coefficients given a priori knowledge of the phase,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4199–4208, Aug 2014.
 M. Krawczyk and T. Gerkmann, “STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1931–1940, Dec. 2014.
 T. Gerkmann, M. Krawczyk-Becker, and J. Le Roux, “Phase processing for single-channel speech enhancement: History and recent advances,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 55–66, March 2015.
 P. Mowlaee, R. Saeidi, and R. Martin, “Phase estimation for signal reconstruction in single-channel speech separation,” in Annual Conference of the International Speech Communication Association, 2012.
 K. K. Paliwal, K. K. Wojcicki, and B. J. Shannon, “The importance of phase in speech enhancement,” Speech Communication, vol. 53, no. 4, pp. 465–494, 2011.
 P. Mowlaee and R. Saeidi, “Iterative closed-loop phase-aware single-channel speech enhancement,” IEEE Signal Processing Letters, vol. 20, no. 12, pp. 1235–1239, Dec. 2013.
 T. Gerkmann and M. Krawczyk, “MMSE-optimal spectral amplitude estimation given the STFT-phase,” IEEE Signal Processing Letters, vol. 20, no. 2, pp. 129 –132, Feb 2013.
 P. Mowlaee and R. Martin, “On phase importance in parameter estimation for single-channel source separation,” in The International Workshop on Acoustic Signal Enhancement (IWAENC), 2012, pp. 1–4.
 P. Mowlaee and R. Saeidi, “On phase importance in parameter estimation in single-channel speech enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7462–7466.
 ——, “Time-frequency constraint for phase estimation in single-channel speech enhancement,” The International Workshop on Acoustic Signal Enhancement, pp. 338–342, 2014.
 L. D. Alsteris and K. K. Paliwal, “Further intelligibility results from human listening tests using the short-time phase spectrum,” Speech Communication, vol. 48, no. 6, pp. 727 – 736, 2006.
 S. Guangji, M. Shanechi, and P. Aarabi, “On the importance of phase in human speech recognition,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 5, pp. 1867 –1874, Sept. 2006.
 K. S. R. Murty and B. Yegnanarayana, “Combining evidence from residual phase and MFCC features for speaker recognition,” IEEE Signal Processing Letters, vol. 13, no. 1, pp. 52–55, 2006.
 P. Mowlaee, M. Watanabe, and R. Saeidi, “Show & tell: Phase-aware single-channel speech enhancement,” in 14th Annual Conference of the International Speech Communication Association, 2013.
 P. Mowlaee, R. Saeidi, and M. Watanabe, “Show & tell: Iterative refinement of amplitude and phase in single-channel speech enhancement,” in Annual Conference of the International Speech Communication Association, 2014, pp. 2134–2135.
 P. Mowlaee. (2015) On phase estimation in single-channel speech enhancement and separation [Online]. Available: https://www.spsc.tugraz.at/PhaseLabSPSC.
 P. Mowlaee, R. Saeidi, and M. Watanabe. (2014) Show & tell: Iterative refinement of amplitude and phase in single-channel speech enhancement [Online]. Available: http://www2.spsc.tugraz.at/people/pmowlaee/Video.wmv.
 D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 8, Apr. 1983, pp. 804–807.
 C. Chacon and P. Mowlaee, “Least squares phase estimation of mixed signals,” in Proceedings of the 15th International Conference on Spoken Language Processing, pp. 2705–2709, 2014.
 J. Le Roux and E. Vincent, “Consistent Wiener filtering for audio source separation,” IEEE Signal Processing Letters, vol. 20, no. 3, pp. 217–220, 2013.
 P. Mowlaee and M. Watanabe, “Iterative sinusoidal-based partial phase reconstruction in single-channel source separation,” in 14th Annual Conference of the International Speech Communication Association, 2013, pp. 832–836.
 A. Sugiyama and R. Miyahara, “Phase randomization – a new paradigm for single-channel signal enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7487–7491.
 K. K. Wojcicki and K. K. Paliwal, “Importance of the dynamic range of an analysis window function for phase-only and magnitude-only reconstruction of speech,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, Apr. 2007, pp. 729–732.
 J. Kulmer and P. Mowlaee, “Phase estimation in single channel speech enhancement using phase decomposition,” IEEE Signal Processing Letters, vol. 22, no. 5, pp. 598–602, May 2014.
 J. Kulmer, P. Mowlaee, and M. Watanabe, “A probabilistic approach for phase estimation in single-channel speech enhancement using von Mises phase priors,” in IEEE Workshop on Machine Learning for Signal Processing, Sept. 2014.
 J. Kulmer and P. Mowlaee, “Harmonic phase estimation in single-channel speech enhancement using von Mises distribution and prior SNR,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
 T. Lotter and P. Vary, “Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 7, pp. 1110–1126, 2005.
 Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 6, pp. 1109–1121, Dec 1984.
 S.M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Prentice Hall, 1993.
 R. C. Hendriks, T. Gerkmann, and J. Jensen, DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement, ser. Synthesis Lectures on Speech and Audio Processing. Morgan & Claypool Publishers, 2013.
 P. Mowlaee. (2015) Special issue on phase-aware signal processing in speech communication [Online]. Available: http://www2.spsc.tugraz.at/people/pmowlaee/PhaseLab.html.
 A. Gaich and P. Mowlaee, “On speech quality estimation of phase-aware single-channel speech enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2015.
Tutorial Title: Dialog Models and Dialog Phenomena
Duration: 3 hours, 30min coffee break
Presenters: Nigel G. Ward1, Gabriel Skantze2
1) University of Texas at El Paso, USA
2) KTH, Sweden
Target Audience: All Interspeech participants wanting an overview of current models, techniques, trends, and issues in dialog modeling; and especially those wanting to use dialog knowledge to improve the performance of their systems.
Models of dialog have traditionally been designed mostly to support dialog systems, but dialog knowledge is now being used more broadly. Newer applications are leading to new models that are more accurate and more robust than traditional finite-state models.
This tutorial will cover “what every speech researcher should know about dialog”. Participants will learn
- formalisms and models for representing dialog knowledge
- ways in which dialog knowledge can be used, for a variety of applications
- ways to analyze dialog and ways to discover dialog knowledge
- tools, resources, opportunities and open issues
In relation to the conference theme of “Speech beyond Speech”, we will consider not only the lexical and prosodic aspects of dialog but also gesture, gaze, and motion.
A) Basic Notions in their Historical Contexts [45 minutes]
A1 Dialog Modeling for Telecommunications
- key notions: turns, talkspurts, envelope information
- key issues: voice activity detection, delay, audio-visual alignment, attention variation over time
A2 Engendering Rapport through Dialog
- key notions: contingency, turn-taking, action coordination, adaptation
- key issues: incremental processing, individual differences, modeling frameworks, training/ learning methods
A3 Dialog Behavior Analysis
- key tasks: dialog outcomes prediction, dominance detection, personality inference, speaker state detection, clinical diagnosis, training
- key issues: language differences, normal and abnormal variation, feature engineering
A4 Dialog Modeling for Information Retrieval
- key notions: dialog acts, dialog activities, dialog genres, vector-space models
B) Empirical Interlude: Small-Group Exercise on Dialog States [40 minutes]
Given the first 15 seconds of three small-talk dialogs:
- identify the states at the end of each turn
- identify what the state predicts about the upcoming behavior of each speaker
- identify similar states across the dialogs, name them, discuss how they relate, and how they differ
- now do the same for a sampling of 10 within-turn timepoints
- discuss findings with respect to applications already discussed, and with respect to dialog systems
Learning outcome: see how classic notions abstract from and oversimplify reality
C) Dialog Knowledge and Speech Recognition and Synthesis [15 minutes]
C1 Dialog Modeling for Speech Recognition (including recognition of dialog recordings and recognition in live interaction)
- language model conditioning based on prompt type, dialog act, expected dialog act, slot type, dialog state
C2 Dialog Modeling for Speech Synthesis
- improving quality by considering the dialog act, emotional state, local context
D) Empirical Interlude: Behavior Patterns [20 minutes]
working on the same dialogs as before
- identify one or two key behavior patterns
- attempt to represent them formally
E) Dialog Models, Dialog Management, and Dialog Systems [20 minutes]
The commercial state-of-the-art: VoiceXML 2.1.
- Key notions: states, transitions, error handling, timeouts
- Key issues: natural language understanding, backend integration, realistic and unrealistic expectations of user behavior
- Mini-exercise: author a small dialog model and test it with an untrained “user”
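As a concrete starting point for the mini-exercise, a VoiceXML-style dialog model can be approximated by a small hand-written finite-state machine. The sketch below (a toy in Python, not VoiceXML itself; all state names and prompts are made up) illustrates the key notions of states, transitions, error handling, and timeouts:

```python
from dataclasses import dataclass, field

@dataclass
class State:
    prompt: str
    transitions: dict = field(default_factory=dict)  # recognized input -> next state
    noinput_next: str = "apologize"                  # on timeout (no user input)
    nomatch_next: str = "apologize"                  # on recognition failure

# A tiny flight-booking fragment: states, transitions, error handling, timeouts.
MODEL = {
    "ask_city": State("Which city are you flying to?",
                      {"london": "ask_date", "paris": "ask_date"}),
    "ask_date": State("What day would you like to travel?",
                      {"tomorrow": "confirm", "friday": "confirm"}),
    "confirm": State("Shall I book that?", {"yes": "done", "no": "ask_city"}),
    "apologize": State("Sorry, I didn't catch that.", {}, "ask_city", "ask_city"),
    "done": State("Your booking is complete.", {}),
}

def step(state_name, user_input):
    """One dialog turn: map the (possibly missing) user input to the next state."""
    state = MODEL[state_name]
    if user_input is None:                            # timeout
        return state.noinput_next
    return state.transitions.get(user_input.lower(), state.nomatch_next)

# Example: step("ask_city", "London") -> "ask_date"; step("ask_city", None) -> "apologize"
```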
Active Research Directions:
- Towards more modular models of dialog
- Towards tighter semantic integration
- Towards automatic tuning to replace hand tuning
- Towards multimodal dialogs, and situated dialogs
F) Other Dialog Applications [10 minutes]
Tutorial systems, chat systems, speech-to-speech translation, summarization and engineering of human-human dialogs, tools to support human dialog
G) Dialog Phenomena [15 minutes]
Topic structures, grounding, turn-taking, confidence and control, rhetorical structures, conversational routines, digressions
- Memory processes and attention, self-monitoring, error correction, evidentiality and information status
- Interpersonal dynamics, attitude, multiparty phenomena
H) Prospects [5 minutes]