Track 4 – 14:00-17:30
Duration: 3 hours, including a 30-minute coffee break
Location: Conference 4+5
Presenters: Brigitte Bigi (1), Daniel Hirst (1), Dafydd Gibbon (2)
1) Laboratoire Parole et Langage, Aix-Marseille Université, CNRS, Aix-en-Provence, France
2) Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Germany
Target Audience: This tutorial is intended for linguists and phoneticians who wish to acquire a working knowledge of automatic annotation software for their research.
This tutorial presents methodology for the manual and/or automatic annotation and analysis of a recorded speech corpus. We illustrate the steps to take in order to obtain rich, broad-coverage speech annotation and an initial analysis of such a corpus, with levels of annotation ranging from phonetics to prosody. The tutorial will cover the following areas:
a. Finding and evaluating appropriate software
Participants will be encouraged to actively seek out appropriate software for their research. Criteria for the evaluation of tools will be discussed, including the following:
- prefer free and open source software: even if you can personally afford to pay for a software licence, you may wish to share your methodology with other students or researchers who cannot.
- prefer multiplatform software: Different scientific communities tend to use Mac OS, Windows or Unix platforms. Multiplatform software makes sharing between such communities much easier.
- prefer usable software: If the software requires the help of an engineer each time you need to use it, this will be a serious limitation on your usage.
- investigate whether the software has been found to be reliable and whether it is likely to improve workflow efficiency, either by accelerating your work or by enabling you to deal with more extensive data, or both.
b. Corpus development
Current technology gives linguists the means of confronting theories and models with large quantities of language data. In this section we review the availability of data for several typologically different languages and discuss the practical aspects of building a linguistic corpus from scratch. We will also present good practices for creating time-aligned annotations at the utterance level.
c. Aligning acoustic data with textual annotation
An essential requirement for a speech database is an alignment between textual annotation at different levels of representation, from speech sounds to discourse, and the acoustic signal. In this section we will review current solutions for obtaining such alignment automatically or semi-automatically. In particular, we will present a demo of SPPAS [Bigi and Hirst 2012]. SPPAS is a free audio annotation tool that allows users to create, visualize and search annotations of audio data. The software is distributed under the terms of the GNU General Public License. It can automatically produce speech segmentation annotations from a recorded speech sound and its transcription. SPPAS is compatible with Praat, Elan, Transcriber and other tools, and runs on Windows, Macintosh and Unix platforms. We will therefore show how SPPAS can be integrated into an annotation protocol that uses such tools. Its functionalities include:
- Automatic annotations: melody modelling, utterance-level segmentation, text normalization, grapheme-to-phoneme conversion, phonetic segmentation, syllabification, and detection of self-repetitions and other-repetitions.
- Components: manual orthographic transcription, sound playback and information display, estimation of descriptive statistics on annotated files, manipulation of annotated files, extraction of data from annotated files, and display of wav and annotated files (currently under development).
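Because SPPAS exchanges annotations with Praat, the time-aligned annotations discussed above can also be inspected programmatically. The following is a minimal sketch, not part of SPPAS, of a reader for Praat's short-format TextGrid files, restricted to interval tiers; real projects should prefer SPPAS's own file readers or a maintained TextGrid library.

```python
# Minimal sketch: read interval tiers from a Praat *short* TextGrid.
# For illustration only -- not part of SPPAS.

def parse_short_textgrid(text):
    """Return {tier_name: [(xmin, xmax, label), ...]} for interval tiers."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    assert lines[0].startswith("File type"), "not a TextGrid file"
    pos = 5  # skip the two header lines, global xmin/xmax and <exists>
    n_tiers = int(lines[pos]); pos += 1
    tiers = {}
    for _ in range(n_tiers):
        kind = lines[pos].strip('"'); pos += 1
        name = lines[pos].strip('"'); pos += 1
        pos += 2  # tier-level xmin and xmax
        n_intervals = int(lines[pos]); pos += 1
        intervals = []
        for _ in range(n_intervals):
            xmin, xmax = float(lines[pos]), float(lines[pos + 1])
            intervals.append((xmin, xmax, lines[pos + 2].strip('"')))
            pos += 3
        if kind == "IntervalTier":
            tiers[name] = intervals
    return tiers

# A tiny hand-written example: one tier with two phone intervals
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"
0
1.0
<exists>
1
"IntervalTier"
"phones"
0
1.0
2
0
0.4
"a"
0.4
1.0
"b"
'''
tiers = parse_short_textgrid(SAMPLE)
# tiers["phones"] -> [(0.0, 0.4, "a"), (0.4, 1.0, "b")]
```

The same (xmin, xmax, label) triples underlie the interval tiers produced by SPPAS's phonetic segmentation, which is what makes round-tripping between the tools straightforward.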
d. Modelling speech melody
The specific task of modelling speech prosody will be given special attention, and software for modelling the domain of speech melody [Hirst and Auran 2005; Hirst 2007] will be demonstrated. In particular, participants will be given the opportunity to install Praat plugins which make it possible to execute the functions of Momel (f0 modelling) and INTSINT (symbolic f0 coding), as well as the more recent ProZed analysis-by-synthesis paradigm, directly from within Praat. The ProZed paradigm makes it possible to test prosodic models directly by deriving a synthetic output automatically from a symbolic representation of rhythm and melody, which can then be directly compared to the original utterance.
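Momel's output is a sparse sequence of f0 target points (time, Hz) from which the macroprosodic curve is reconstructed; the actual model uses a quadratic spline through the targets. As a rough illustration only, with invented target values and plain linear interpolation standing in for the spline, resampling targets to a continuous contour might look like:

```python
# Rough illustration: resample sparse (time_s, hz) pitch targets to a
# continuous contour. Momel proper uses a quadratic spline; linear
# interpolation is a simplification, and the targets below are invented.

def interpolate_targets(targets, step=0.01):
    """Sample a piecewise-linear curve through (time_s, hz) targets."""
    times, hz = [], []
    for (t0, f0), (t1, f1) in zip(targets, targets[1:]):
        t = t0
        while t < t1 - 1e-9:
            times.append(t)
            hz.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
            t += step
    times.append(targets[-1][0])
    hz.append(targets[-1][1])
    return times, hz

# Hypothetical targets for a short rise-fall contour
targets = [(0.10, 120.0), (0.45, 180.0), (0.90, 95.0)]
times, hz = interpolate_targets(targets)
```

In the analysis-by-synthesis setting, a contour resampled this way can be imposed on the original recording (e.g. via PSOLA in Praat) and compared by ear with the natural utterance.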
e. TGA – Time Group Analyzer
TGA, the Time Group Analyzer [Gibbon 2013], is an online tool for descriptive statistical analysis of the timeline of annotated units in speech annotations, and for the automatic parsing of these units, usually syllable sequences, into Time Groups (TGs). Time Groups are, in the simplest case, inter-pausal groups, but may also be detected based on deceleration models (consistent slowing down) or acceleration models (consistent speeding up). The TGA tool captures not only the global timing patterns of the input speech annotation, but also analyzes local patterns based on adjustable duration difference thresholds (the minimal duration difference between adjacent syllables). One novelty of TGA is its use of duration difference slope (representing acceleration and deceleration) and intercept values from linear regression.
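The simplest case described above, pause-based Time Group parsing followed by a per-group duration regression, can be sketched as follows. The syllable labels and durations are invented for illustration, and the real TGA web tool offers considerably more (thresholds, deceleration/acceleration-based grouping, global statistics).

```python
# Sketch of inter-pausal Time Group parsing plus per-group duration
# regression, in the spirit of TGA. Input data are invented.

def time_groups(syllables, pause_label="_"):
    """Split (label, duration_s) pairs into inter-pausal Time Groups."""
    groups, current = [], []
    for label, dur in syllables:
        if label == pause_label:
            if current:
                groups.append(current)
            current = []
        else:
            current.append((label, dur))
    if current:
        groups.append(current)
    return groups

def duration_regression(group):
    """Least-squares (slope, intercept) of duration vs. syllable index.
    A positive slope indicates deceleration within the group."""
    n = len(group)
    ys = [dur for _, dur in group]
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    if sxx == 0:  # single-syllable group: no slope to estimate
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(range(n), ys)) / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical annotation; "_" marks silent pauses
syllables = [("ba", 0.12), ("na", 0.15), ("na", 0.21),
             ("_", 0.30), ("ta", 0.25), ("da", 0.18)]
groups = time_groups(syllables)
slopes = [duration_regression(g)[0] for g in groups]
# first group decelerates (slope > 0), second accelerates (slope < 0)
```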
References
Bigi, Brigitte (2012). The SPPAS participation to the Forced-Alignment task of Evalita 2011. In B. Magnini et al. (Eds.), EVALITA 2012, LNAI 7689, pp. 312-321. Springer, Heidelberg.
Bigi, Brigitte and Daniel Hirst (2012). SPeech Phonetization Alignment and Syllabification (SPPAS): a tool for the automatic analysis of speech prosody. In Proceedings of Speech Prosody, pp. 19-22. Tongji University Press, Shanghai, China. ISBN 978-7-5608-4869-3.
Bigi, Brigitte (2013). A phonetization approach for the forced-alignment task. In 3rd Less-Resourced Languages Workshop, 6th Language & Technology Conference, Poznan, Poland.
Bigi, Brigitte (2014). A Multilingual Text Normalization Approach. In Human Language Technologies: Challenges for Computer Science and Linguistics, LNAI 8387, pp. 515-526. Springer, Heidelberg. ISBN 978-3-319-14120-6.
Gibbon, Dafydd, Inge Mertins and Roger Moore, eds. (2000). Handbook of Multimodal and Spoken Dialogue Systems: Resources, Terminology and Product Evaluation. Dordrecht: Kluwer Academic Publishers.
Gibbon, Dafydd (2013). TGA: a web tool for Time Group Analysis. In D.J. Hirst & B. Bigi (Eds.), Proceedings of the Tools and Resources for the Analysis of Speech Prosody (TRASP) Workshop, Aix-en-Provence, 2013, pp. 66-69.
Hirst, Daniel and Albert Di Cristo (eds.) (1998). Intonation Systems: A Survey of Twenty Languages. Cambridge: Cambridge University Press. ISBN 0-521-39513-5 (hardback); 0-521-39550-X (paperback).
Hirst, Daniel (2006). Review of John Coleman 2005. Journal of the International Phonetic Association, pp. 198-200.
Hirst, Daniel (2007). A Praat plugin for Momel and INTSINT with improved algorithms for modelling and coding intonation. In Proceedings of the XVIth International Congress of Phonetic Sciences (paper 1443), pp. 1233-1236. Saarbrücken, August 2007.
Hirst, Daniel (2015). ProZed: A Speech Prosody Editor for Linguists, Using Analysis-by-Synthesis. In Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis, pp. 3-17. Springer, Berlin Heidelberg.
Klessa, Katarzyna and Dafydd Gibbon (2014). Annotation Pro + TGA: automation of speech timing analysis. Proceedings of LREC 2014, Reykjavik. Paris: ELDA.
Yu, Jue, Dafydd Gibbon and Katarzyna Klessa (2014). Computational annotation-mining of syllable durations in speech varieties. In Proceedings of the 7th Speech Prosody Conference, 20-23 May 2014, Dublin.