Abstract
In this paper we present several resources for the study of spoken Brazilian Portuguese developed within the C-ORAL-BRASIL project. C-ORAL-BRASIL stems from the European C-ORAL-ROM project (Cresti & Moneglia, 2005), which compiled spoken corpora of Italian, French, Spanish, and European Portuguese. The corpora of the C-ORAL family are well suited to the analysis of spoken language, since they provide not only the transcripts of the recorded sessions, annotated for prosodic breaks, but also the corresponding audio files and the text-to-speech alignment. To date, the project has published C-ORAL-BRASIL I (Informal corpus: Raso & Mello, 2012), while C-ORAL-BRASIL II (to be published by 2019) comprises a Formal corpus (Natural context), a Media corpus, and a Telephonic corpus. In addition, a set of informationally tagged comparable minicorpora (representative samples of the corpora above) is already available or in preparation, enabling studies of information structure, including cross-linguistic ones.
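To illustrate how transcripts annotated for prosodic breaks can be processed programmatically, the sketch below splits one transcript turn into utterances and tone units. It assumes that `//` marks a terminal break (closing an utterance) and `/` a non-terminal break (separating tone units within an utterance), as in the C-ORAL-ROM conventions. The `*ABC:` speaker prefix, the `split_turn` function, the `Utterance` class, and the sample turn are hypothetical. Other marks used in the actual transcripts, such as interrogative or interrupted terminal breaks and retractings, are not handled.

```python
import re
from dataclasses import dataclass

# Hypothetical turn format: "*ABC: text", with a three-letter speaker code.
# Check the corpus documentation for the exact transcription scheme.
TURN_RE = re.compile(r"^\*([A-Z]{3}):\s*(.*)$")


@dataclass
class Utterance:
    speaker: str
    tone_units: list  # list of str, one entry per tone unit


def split_turn(line: str) -> list:
    """Split one turn into utterances ('//') made of tone units ('/')."""
    match = TURN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a transcript turn: {line!r}")
    speaker, text = match.groups()
    utterances = []
    # Assumed convention: a terminal break '//' closes an utterance.
    for chunk in text.split("//"):
        # Assumed convention: a non-terminal break '/' separates tone units.
        units = [unit.strip() for unit in chunk.split("/") if unit.strip()]
        if units:  # skip the empty remainder after a final '//'
            utterances.append(Utterance(speaker, units))
    return utterances


if __name__ == "__main__":
    # Hypothetical sample turn, not taken from the corpus.
    turn = "*ALO: eu fui lá / mas não tinha ninguém // aí voltei //"
    for utt in split_turn(turn):
        print(utt.speaker, utt.tone_units)
```

Splitting on `//` before `/` ensures that a terminal break is never misread as two consecutive non-terminal breaks. A real reader for the corpora would also need the alignment files to link each unit to its stretch of audio.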
References
Austin JL. How to do things with words. The William James Lectures. 1962 [1978].
Barbosa PA, Raso T. Spontaneous Speech Segmentation: Functional and Prosodic Aspects with Applications for Automatic Segmentation/A segmentação da fala espontânea: aspectos prosódicos, funcionais e aplicações para a tecnologia. Revista de Estudos da Linguagem. 2018 Oct 1;26(4):1361-96.
Bick E. The parsing system Palavras: automatic grammatical analysis of Portuguese in a constraint grammar framework. 2000.
Boersma P, Weenink D. Praat: doing phonetics by computer (version 5.0.21) [computer program]. 2016. Available from: http://www.fon.hum.uva.nl/praat
Bossaglia G, Raso T. The C-ORAL-BRASIL minicorpus of spoken Italian. Forthcoming.
Cavalcante FA, Ramos AC. The American English spontaneous speech minicorpus. Architecture and comparability. CHIMERA: Romance Corpora and Linguistic Studies. 2016;3(2):99-124.
Cresti E. Corpus di italiano parlato. Vol. 1: Introduzione. Accademia della Crusca; 2000.
Cresti E, Moneglia M, editors. C-ORAL-ROM: integrated reference corpora for spoken Romance languages. John Benjamins Publishing; 2005 May 9.
Du Bois JW, Chafe WL, Meyer C, Thompson SA, Martey N. Santa Barbara Corpus of Spoken American English. CD-ROM. Philadelphia: Linguistic Data Consortium. 2000.
The C-ORAL-BRASIL project: varied resources for the study of spoken Brazilian Portuguese
Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin. 1971 Nov;76(5):378-82.
Gobbo O. Marcadores discursivos como unidades informacionais prosodicamente marcadas [MA dissertation]. UFMG; 2019.
't Hart J, Collier R, Cohen A. A perceptual study of intonation: an experimental-phonetic approach to speech melody. Cambridge University Press; 2006 Nov 23.
Martin P. WinPitch. Pitch Instruments Inc. 2003.
Mello H. Methodological issues for spontaneous speech corpora compilation: the case of C-ORAL-BRASIL. Spoken Corpora and Linguistic Studies. 2014 Nov 14:27-68.
Mello H, Raso T, Mittmann M, Vale H, Côrtes P. Transcrição e segmentação prosódica do corpus C-ORAL-BRASIL: critérios de implementação e validação. In: Raso T, Mello H, editors. C-ORAL-BRASIL I: corpus de referência do português brasileiro falado informal. Belo Horizonte: Editora UFMG; 2012. p. 125-76.
Moneglia M, Martin P. The C-ORAL-ROM resource. In: Cresti E, Moneglia M, editors. C-ORAL-ROM: integrated reference corpora for spoken Romance languages. John Benjamins Publishing; 2005 May 9:1-70.
Moneglia M, Raso T. Notes on Language into Act Theory. Spoken Corpora and Linguistic Studies. Amsterdam/Philadelphia: John Benjamins; 2014:468-89.
Nicolas Martinez C, Lombán M. Mini-Corpus del español para DB-IPIC. CHIMERA: Romance Corpora and Linguistic Studies. In press.
Panunzi A, Gregori L. DB-IPIC: an XML database for the representation of information structure in spoken language. In: Pragmatics and Prosody. Firenze University Press; 2011. p. 133-50.
Panunzi A, Mittmann MM. The IPIC resource and a cross-linguistic analysis of information structure in Italian and Brazilian Portuguese. Spoken Corpora and Linguistic Studies. 2014 Nov 14:189-227.
Raso T. O corpus C-ORAL-BRASIL. In: Raso T, Mello H, editors. C-ORAL-BRASIL I: corpus de referência do português brasileiro falado informal. Belo Horizonte: Editora UFMG; 2012. p. 55-90.
Raso T, Mello H, editors. C-ORAL-BRASIL I: corpus de referência do português brasileiro falado informal. Belo Horizonte: Editora UFMG; 2012.
Raso T, Mello H. The C-ORAL-BRASIL I: reference corpus for Informal spoken Brazilian Portuguese. In International Conference on Computational Processing of the Portuguese Language 2012 Apr 17 (pp. 362-367). Springer, Berlin, Heidelberg.
Raso T, Vieira MA. A description of Dialogic Units/Discourse Markers in spontaneous speech corpora based on phonetic parameters. CHIMERA: Romance Corpora and Linguistic Studies. 2016;3(2):221-49.
Rocha B, Mello H, Raso T. Para a compilação do C-ORAL-ANGOLA: um corpus de fala espontânea informal do português angolano. Filologia e Linguística Portuguesa. 2018 Dec 30;20(Especial):139-57.
Santos SM, Raso T. Manual validation of transcription criteria of the C-Oral-Brazil II language resource: assessed criteria, methodology, and results. Forthcoming.
Schiel F. The validation of speech corpora. 2004.
Teixeira B, Barbosa P, Raso T. Automatic Detection of Prosodic Boundaries in Brazilian Portuguese Spontaneous Speech. In International Conference on Computational Processing of the Portuguese Language 2018 Sep 24 (pp. 429-437). Springer, Cham.
van den Heuvel H, Iskra D, Sanders E, de Vriend F. Validation of spoken language resources: an overview of basic aspects. Language Resources and Evaluation. 2008 Mar 1;42(1):41-73.
Vieira MA, Raso T, Oliveira E. Métodos automáticos de avaliação da qualidade acústica. Forthcoming.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2019 Giulia Bossaglia, Lúcia De Almeida Ferrari