Track 4 – 09:30-13:00
Duration: 3 hours, plus a 30-minute coffee break
Location: Conference 6
Presenters: Nigel G. Ward (1), Gabriel Skantze (2)
1) University of Texas at El Paso, USA
2) KTH, Sweden
Target Audience: Interspeech participants wanting an overview of current models, techniques, trends, and issues in dialog modeling; and especially those wanting to use dialog knowledge to improve the performance of their systems.
Models of dialog have traditionally been designed mostly to support dialog systems, but dialog knowledge is increasingly being used more broadly. Newer applications are leading to new models, more accurate and more robust than traditional finite-state models.
This tutorial will cover “what every speech researcher should know about dialog”. Participants will learn about:
- models for representing and applying dialog knowledge
- ways dialog knowledge is used in a wide variety of applications
- tools, resources, opportunities and open issues
In relation to the conference theme of “Speech beyond Speech”, we will consider not only the lexical and prosodic aspects of dialog but also gesture, gaze, action,
Participants are encouraged to bring a laptop if convenient, as one of the exercises will involve small-group analysis of provided audio.
A) Basic Notions in their Historical Contexts [45 minutes]
A1 Dialog Modeling for Telecommunications
- concepts: turns, talkspurts, envelope information
- issues: voice activity detection, effects of delay, turn-taking signals, talk and near-talk, joint modeling versus coupled models
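As a rough illustration of the talkspurt concept listed above, the following Python sketch groups per-frame voice-activity decisions into talkspurts, bridging short pauses. The function name `find_talkspurts`, the 10 ms frame size, and the 200 ms pause threshold are illustrative assumptions, not values taken from the tutorial materials.

```python
from typing import List, Optional, Tuple


def find_talkspurts(vad: List[bool], frame_ms: int = 10,
                    max_pause_ms: int = 200) -> List[Tuple[int, int]]:
    """Group per-frame voice-activity flags into talkspurts.

    Returns (start_ms, end_ms) pairs. Silences no longer than
    max_pause_ms are bridged, so they do not split a talkspurt.
    """
    spurts: List[Tuple[int, int]] = []
    start: Optional[int] = None   # frame where the current talkspurt began
    last_voiced = 0               # index of the most recent voiced frame
    max_pause_frames = max_pause_ms // frame_ms
    for i, voiced in enumerate(vad):
        if voiced:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > max_pause_frames:
            # The pause has exceeded the threshold: close the talkspurt.
            spurts.append((start * frame_ms, (last_voiced + 1) * frame_ms))
            start = None
    if start is not None:
        # Input ended while a talkspurt was still open.
        spurts.append((start * frame_ms, (last_voiced + 1) * frame_ms))
    return spurts
```

The choice of pause threshold is itself a modeling decision: a larger value merges more speech into one talkspurt, a smaller value splits it at shorter hesitations.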
A2 Engendering Rapport through Dialog
- concepts: contingency, adaptation
- issues: incremental processing, multifunctional utterances, action selection and coordination, integrating authored and learned behaviors
A3 Dialog Modeling for Information Retrieval
- concepts: dialog activities, dialog genres, continuous vector-space modeling
A4 Information Delivery
- concepts: human cognitive capacity, attention variation over time, pacing, feedback
B) Philosophical Interlude [10 minutes]
- why is dialog necessary? the limited bandwidth of speech, the ephemerality of sound, uncertainty and vagueness, grounding issues, human working-memory size limitations, meta-communication, interpersonal relations
- when is dialog not necessary? voice-command systems, mobile personal assistants
- roles of dialog models: mediating signal-to-interpretation mappings, supporting predictions, enabling action selection, coordinating behavior streams
C) Traditional Models of Dialog [10 minutes]
- concepts: states, turns, dialog acts; lexical and prosodic features
- issues: dilemmas in discretizing time, state and acts
- alternate conceptions: plan-based, rhetorical-structure, and information-state models
D) Empirical Interlude [15 minutes]
Given fifteen seconds each from three casual dialogs:
- identify the state at the end of each turn
- identify what the state predicts about the upcoming behavior of each speaker
- identify similar states across the dialogs, name them, discuss similarities and differences
- now do the same for a sampling of ten within-turn timepoints
- discuss with respect to previously-examined applications, models and issues
E) Dialog Systems: Basics [10 minutes]
Illustrations with IrisTK and VoiceXML 2.1
- basic functions: call flow, choices, input fields, forms
- other functionality: error handling, turn-taking, timeouts, universals, confidence
- issues: separating dialog management from domain knowledge and task knowledge, separating interface and backend, initiative
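To make the form, timeout, and confidence notions above concrete, here is a minimal Python sketch of a form-filling loop. It is not VoiceXML or IrisTK code and does not use either toolkit's API; `run_form`, the `listen` callback, and the two confidence thresholds are hypothetical names and values chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A recognizer result: (value, confidence), or None on timeout (no input).
Result = Optional[Tuple[str, float]]


def run_form(fields: List[str],
             listen: Callable[[str], Result],
             confirm_below: float = 0.8,
             reject_below: float = 0.4,
             max_tries: int = 3) -> Dict[str, Optional[str]]:
    """Fill each field in order, handling timeouts and low confidence."""
    form: Dict[str, Optional[str]] = {}
    for field in fields:
        form[field] = None  # stays None if every attempt fails
        for _ in range(max_tries):
            result = listen(f"Please say your {field}.")
            if result is None:            # timeout: reprompt
                continue
            value, conf = result
            if conf < reject_below:       # too unreliable: reprompt
                continue
            if conf < confirm_below:      # uncertain: ask for confirmation
                answer = listen(f"Did you say {value}?")
                if answer is None or answer[0] != "yes":
                    continue
            form[field] = value
            break
    return form
```

Because the recognizer is passed in as a callback, the dialog logic stays separate from the speech interface, which also makes it easy to drive the form from a scripted "user" when testing.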
F) Dialog and Speech Recognition, Understanding and Synthesis [15 minutes]
F1 Speech Recognition
- language model conditioning: on prompt, previous dialog act, expected dialog act, slot type, dialog state, topic
- speech recognition for dialog systems: authored and learned grammars
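One simple way to picture language model conditioning on dialog state is to interpolate a state-specific word distribution with a global one. The Python sketch below does this for unigrams only; the class name `StateConditionedUnigram`, the state labels, and the interpolation weight of 0.7 are assumptions made for illustration, and a practical recognizer would use richer models and smoothing.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


class StateConditionedUnigram:
    """Unigram model interpolating a per-state model with a global model."""

    def __init__(self, weight: float = 0.7) -> None:
        self.weight = weight  # interpolation weight of the state-specific model
        self.global_counts: Counter = Counter()
        self.state_counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, data: Iterable[Tuple[str, List[str]]]) -> None:
        """Accumulate word counts from (dialog_state, words) pairs."""
        for state, words in data:
            self.global_counts.update(words)
            self.state_counts[state].update(words)

    def prob(self, word: str, state: str) -> float:
        """Return P(word | state), backing off to the global model."""
        g_total = sum(self.global_counts.values())
        p_global = self.global_counts[word] / g_total if g_total else 0.0
        s_counts = self.state_counts.get(state)
        if not s_counts:
            return p_global  # unseen state: use the global model alone
        p_state = s_counts[word] / sum(s_counts.values())
        return self.weight * p_state + (1 - self.weight) * p_global
```

With such a model, a word like "yes" receives more probability mass after a confirmation prompt than after an open question, which is the effect that conditioning on prompt or expected dialog act aims for.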
F2 Language Understanding in Dialog
- issues: ambiguity and ellipsis; noise, fillers and disfluencies
F3 Speech Synthesis for Dialog Applications
- conditioning on: dialog acts, emotional state, local context, timing
- output-timing monitoring and uptake monitoring
G) Exercise: Dialog Design [15 minutes]
- Author a small dialog model
- Test it with an untrained “user”
- issues: representational convenience versus power, realistic and unrealistic expectations of user behavior, persona design
H) Dialog Systems: The Research Forefront [40 minutes]
- genres: autonomous robots, collaborative agents, tutorial systems
- issues: joint action, multimodal perception, multimodal synthesis, situated dialog, incremental processing, tighter semantic integration, multiparty interaction, integrating reactive and
I) Opportunities and Challenges [20 minutes]
- other applications: predicting dialog outcomes, role inference, personality detection, speaker state detection, analytics, clinical
- unsupervised learning of: policies, dialog acts, dialog moves, strategies, structure, turn-taking
- other challenges: modeling variation in interaction style, composable behaviors
- resources for research: organizations, software, shared tasks