Modelling automatic detection of prosodic boundaries for Brazilian Portuguese spontaneous speech
Keywords: Prosodic boundaries, Automatic detection, Spontaneous speech
Speech is segmented into intonational units marked by prosodic boundaries. This segmentation is claimed to have important consequences for syntax, information structure and cognition. This work aims both to investigate the phonetic-acoustic parameters that guide the production and perception of prosodic boundaries, and to develop models for the automatic detection of prosodic boundaries in male monological spontaneous speech in Brazilian Portuguese. Two samples were segmented into intonational units by two groups of trained annotators. The boundaries perceived by the annotators were tagged as either terminal or non-terminal. A script was used to extract 111 phonetic-acoustic parameters from the speech signal in left and right windows around the boundary of each phonological word. The extracted parameters comprise measures of (1) speech rate and rhythm; (2) standardized segment duration; (3) fundamental frequency; (4) intensity; and (5) silent pause. The script treats as prosodic boundaries those positions at which at least 50% of the annotators indicated a boundary of the same type. Models built from the extracted parameters were trained and then improved heuristically. Models were developed for each of the two samples and for the pooled data, in each case using both non-balanced and balanced data. The Linear Discriminant Analysis algorithm was used to produce the models. The models for terminal boundaries perform much better than those for non-terminal ones. In this paper we: (i) present the methodological procedures; (ii) analyze the different models; (iii) discuss some strategies that could improve our results.
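The agreement criterion stated in the abstract (a position counts as a prosodic boundary when at least 50% of the annotators marked a boundary of the same type) can be illustrated with a minimal sketch. This is not the authors' script: the function name `consensus_label`, the labels `"T"` (terminal) and `"NT"` (non-terminal), the use of `None` for "no boundary marked", and the handling of an exact 50/50 split between the two types are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[Optional[str]],
                    threshold: float = 0.5) -> Optional[str]:
    """Return "T", "NT", or None for one word-boundary position.

    votes holds one entry per annotator: "T", "NT", or None when that
    annotator marked no boundary at this position (hypothetical encoding).
    """
    if not votes:
        return None
    # Count only the annotators who marked a boundary of some type.
    counts = Counter(v for v in votes if v is not None)
    # Keep the boundary types whose share of ALL annotators reaches the threshold.
    winners = [label for label, n in counts.items()
               if n / len(votes) >= threshold]
    # Assumption: an exact split between two types is treated as no consensus,
    # since the abstract does not say how such ties are resolved.
    return winners[0] if len(winners) == 1 else None


# Four annotators: two mark a terminal boundary, one non-terminal, one none.
label = consensus_label(["T", "T", "NT", None])
```

In this example two of the four annotators agree on a terminal boundary, which meets the 50% criterion, so the position is labelled `"T"`.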
Copyright (c) 2020 Tommaso Raso, Bárbara Teixeira, Plínio Barbosa
This work is licensed under a Creative Commons Attribution 4.0 International License.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.