ISCA Medalist: The emergence of compositional structure in language evolution and development
Monday, September 7, 2015, 09:30–10:30, Large Hall
Mary E. Beckman
Spoken language has a complex, multi-dimensional compositional structure that enables the rich productivity so characteristic of this most important human biosignal. Even the utterance of a single vowel sound, such as [o] or [i], requires the talker to coordinate gestures of the lips and tongue with gestures of the respiratory-laryngeal system, to combine the timbre properties of the vowel with a specific voice quality and a specific melodic pattern. The latter can be chosen from some set of morphemes that differentiate utterances, as in the English sentences Oh? versus Oh!, or from some set of tonemes that differentiate words, as in the Mandarin Chinese words yī ‘one’ (一) versus yǐ ‘ant’ (蟻). In either case, the utterance will have a kind of “simultaneous compositionality” that can be found also in the vocal communication systems of many other primates. Its importance for the human biosignal is evident in the fact that human infants begin to develop the capacity for it as soon as the larynx disengages from the nasopharynx at about 2 months. Of course, most of the words of all spoken languages have an even richer internal compositional structure that depends on the coordination of gestural complexes for contrasting timbre properties, to create a series of alternating consonant and vowel sounds, as in the “canonical” CV syllable type that infants with normal hearing begin to produce between 6 and 8 months of age, or the even more complex rhythmic structures that are used to make words in many languages. The development of writing systems was the first technical innovation for modeling this property of “serial compositionality” that is shared by all human languages, as well as by the vocal communication systems of some other primates (such as those of many species of gibbon). The history of the Interspeech conference series is closely intertwined with many other important technical developments that have greatly increased our ability to model both types of compositionality. Now is a fruitful time to apply these more recent technical developments in a concerted way to achieve a better understanding of the emergence of compositionality in phylogeny and ontogeny.
Biography: Mary E. Beckman is a Humanities Distinguished Professor of Linguistics at the Ohio State University, where she has supervised 25 doctoral dissertations on a broad range of topics in phonetics and related areas of speech science and linguistics. Her own doctoral research was a cross-linguistic comparison of prosody and intonation, and this was the focus of much of her earliest postdoctoral research. Her research over the last 20 years has focused more on first-language phonetic acquisition, and recently she has begun to explore the relationship between phonetic development across the life span and diachronic sound change. She is a Fellow of the Acoustical Society of America and of the Linguistic Society of America, as well as a 2014 recipient of the Anneliese Maier Forschungspreis from the Alexander von Humboldt Foundation.
The technology powering personal digital assistants
Tuesday, September 8, 2015, 11:30–12:30, Large Hall
Ruhi Sarikaya
We have long envisioned that one day computers will understand natural language and anticipate what we need, and when we need it, in order to proactively complete tasks on our behalf. As computers get smaller and more pervasive, how humans interact with them is becoming a crucial issue. Despite numerous attempts over the past 40 years to make language understanding an effective and robust natural user interface for computer interaction, success has been limited and scoped to applications that are not particularly central to everyday use. However, advances in speech recognition and machine learning, coupled with the emergence of structured data served by content providers and increased computational power, have broadened the application of natural language understanding to a wide spectrum of everyday tasks that are central to the user’s productivity. We believe that as computers become smaller and more ubiquitous (e.g., wearable computers) and as the number of applications increases, both system-initiated and user-initiated task completion across various applications and services will become indispensable for personal life management and work productivity. There has already been tremendous investment in the industry (particularly at Microsoft, Google, Apple, Amazon and Nuance) in digital personal assistants over the last couple of years. Each of the major companies in the speech and language technology space has a version of its personal assistant (Cortana, Google Now, Siri, Echo and Dragon, respectively) deployed in production. Yet these technologies and products are seldom discussed at speech and language technology conferences. In this talk, we give an overview of personal digital assistants and describe the system design, architecture and key components behind them. We will highlight challenges, describe best practices in bringing personal assistants from the laboratory to the real world, and discuss their potential to fully redefine human-computer interaction going forward.
Biography: Dr. Ruhi Sarikaya is a principal science manager of the language understanding and dialog systems group at Microsoft. His group has been building the language understanding and dialog management capabilities of both Cortana and Xbox One. Before Microsoft, he was a research staff member and team lead in the HLT Group at the IBM T.J. Watson Research Center for ten years. Prior to joining IBM in 2001, he was a researcher at the Center for Spoken Language Research at the University of Colorado at Boulder for two years. He received his Ph.D. in electrical and computer engineering from Duke University, NC, in 2001. He has published 90 technical papers and is the inventor on 40 issued or pending patents. He has received a number of prestigious awards for his work, including two Outstanding Technical Achievement Awards (2005 and 2008), two Research Division Awards (2005 and 2007), and a best paper award (ASRU 2013). Dr. Sarikaya is currently serving on the IEEE SLTC. He served as general co-chair of IEEE SLT’12, as publicity chair of IEEE ASRU’05, and as associate editor of IEEE Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters. He gave a tutorial on “Processing Morphologically Rich Languages” at Interspeech’07 and was also the lead guest editor of the special issue on “Processing Morphologically-Rich Languages” for IEEE Transactions on Audio, Speech, and Language Processing.
The HBP-Atlas – concept, perspectives and application for language and speech research
Wednesday, September 9, 2015, 11:30–12:30, Large Hall
Katrin Amunts
Studying the human brain remains one of the greatest scientific challenges. A comprehensive understanding of the structural and functional organization of the brain is not only of great importance for basic science, but also for the development of new approaches that improve diagnosis and the treatment of neurological and psychiatric diseases. With this mindset, the Human Brain Project (HBP) started its work in October 2013 with the aim of creating a European ICT infrastructure for neuroscience. The immense complexity of the brain, with its approximately 86 billion nerve cells, makes it essential to include modeling and simulation approaches, combined with methods of high performance computing (HPC), in order to analyze the organizational principles of the brain.
One of the central elements of the HBP is the Human Brain Atlas. It includes data on different aspects of brain organization, e.g., cytoarchitectonics, fibre architecture, molecular architecture, and results from fMRI studies revealing the functional segregation of the brain. Such a multi-level atlas makes it possible to analyze the neural underpinnings of language processes in unprecedented detail and to study structural-functional relationships at the level of cortical areas.
Conversely, the understanding of neural mechanisms might inspire new advances in HPC. Such insights into the brain inform simulation and give computer scientists the opportunity to develop a new generation of computers and software inspired by the functional principles of the brain. HPC, in turn, opens up new avenues for neuroscientists to develop virtual brain models, such as the BigBrain model, which for the first time connects the macroscopic and microscopic levels of organization within a single reference system. In such models, data from the genetic, molecular, and cellular levels up to cognitive systems can be combined for subsequent analysis at different scales.
Biography: Katrin Amunts did postdoctoral work at the C. & O. Vogt Institute for Brain Research at Duesseldorf University, Germany. In 1999, she moved to the Research Centre Juelich and set up a new research unit for Brain Mapping. In 2004, she became professor of Structural-Functional Brain Mapping at RWTH Aachen University, and in 2008 a full professor at the Department of Psychiatry, Psychotherapy and Psychosomatics at RWTH Aachen University, as well as director of the Institute of Neuroscience and Medicine (INM-1) at the Research Centre Juelich. In 2013, she became a full professor of Brain Research at Heinrich-Heine University Duesseldorf, director of the C. & O. Vogt Institute for Brain Research there, and director of the Institute of Neuroscience and Medicine (INM-1) at the Research Centre Juelich.
Since 2007, Katrin Amunts has been a member of the editorial board of Brain Structure and Function, and since 2012 she has been a member of the German Ethics Council. She is the speaker for the programme “Decoding the Human Brain” of the Helmholtz Association, Germany. Since 2013, Katrin Amunts has been leading Subproject 2, “Strategic Human Brain Data”, and has been a member of the Board of Directors of the European FET Flagship “The Human Brain Project”.
Voices of power, passion, and personality
Thursday, September 10, 2015, 11:30–12:30, Large Hall
Klaus R. Scherer
Many species of animals use vocal communication in mating rituals, in warning conspecifics, in conveying the location of food sources, and in social learning. Not surprisingly, the human species has perfected this system of communication by developing first spoken and then written language. I will argue that the expression of emotion has been an important motor for this evolutionary advancement. In many species we find multimodal “affect bursts” which communicate reactions to environmental events and behavioral intentions to conspecifics through synchronized vocal, facial, and bodily expression. It is becoming increasingly plausible that both speech and music evolved from such affect bursts. In this talk, I will highlight the major strengths of vocal communication, especially voice quality, as compared to facial expression. While the face is a relatively discrete signaling system for specific reactions and messages, in large part restricted to human communication, the voice is a phylogenetically old and continuous carrier of information about the vocalizer’s physique, enduring dispositions, strategic intentions, and current emotional state. The dynamic nature of voice delivery, including changes in voice quality, rhythm, intonation, and timing, is a major asset in communicating the unfolding of emotional reactions continuously in real time, allowing for instantaneous adaptation. In addition to theoretical considerations, including the suggestion of a path model for vocal communication, I will present recent empirical research from our laboratory on both the speaking and the singing voice. Specifically, the signaling of speaker power, passion, and personality will be addressed. In addition, a variety of potential applications in different domains will be discussed.
Biography: Klaus R. Scherer is an emeritus professor at the University of Geneva and an honorary professor at the Ludwig-Maximilians University Munich. In 2005 he founded the Swiss Center for Affective Sciences at the University of Geneva, a Swiss National Center of Competence in Research, which he directed until 2013. Scherer obtained a Ph.D. from Harvard University in 1970 with a thesis on voice and personality. After teaching at the University of Pennsylvania, Philadelphia, and the Universities of Kiel and Giessen, Germany, he was appointed in 1984 to the chair in emotion psychology at the University of Geneva. Scherer’s research activities focus on different aspects of emotion and other affective states, in particular emotional expression in the voice and the induction of emotion by music. Scherer has reported this work in numerous publications in the form of monographs, contributed chapters, and papers in international journals. He co-edits the “Affective Science Series” for Oxford University Press and was the founding co-editor of the journal Emotion. He is a member of the Academia Europaea and of the American Academy of Arts and Sciences. He has been awarded honorary doctorates by the University of Bologna and the University of Bonn. He held an ERC Advanced Grant and participated in several European Networks of Excellence.