

Automatic labeling
Pitch range

How to Cite

Mertens P. Polytonia: a system for the automatic transcription of tonal aspects in speech corpora. J. of Speech Sci. [Internet]. 2021 Feb. 5 [cited 2023 Nov. 28];4(2):17-5. Available from:


This paper first proposes a labeling scheme for tonal aspects of speech and then describes an automatic annotation system that uses this transcription. The fine-grained transcription provides labels indicating the pitch level and pitch movement of individual syllables. Of the five pitch levels, three (low, mid, high) are defined on the basis of pitch changes in the local context and two (bottom, top) are defined relative to the boundaries of the speaker’s global pitch range. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size, using size categories (pitch intervals) adjusted relative to the speaker’s pitch range. The automatic tonal annotation system combines several processing steps: segmentation into syllable peaks, pause detection, pitch stylization, pitch range estimation, classification of the intra-syllabic pitch contour, and pitch level assignment. It uses a dedicated rule-based procedure which, unlike commonly used supervised learning techniques, does not require a labeled corpus for training a model. The paper also includes a preliminary evaluation of the annotation system on a reference corpus of nearly 14 minutes of spontaneous speech in French and Dutch, in order to quantify the annotation errors. The results, expressed in terms of the standard measures of precision, recall, accuracy and F-measure, are encouraging. For the pitch levels low, mid and high, the F-measure ranges from 0.815 to 0.946; for pitch movements, it ranges from 0.708 to 1. Given additional modules for the detection of prominence and prosodic boundaries, the resulting annotation may serve as input for a phonological annotation.
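The evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that the reference and automatic annotations are aligned syllable by syllable and that each syllable carries exactly one label. The function names, the one-letter labels (`L`, `M`, `H`) and the toy data are hypothetical and are not taken from the paper or from the annotation system itself.

```python
from collections import Counter


def per_label_scores(reference, predicted):
    """Return {label: (precision, recall, f_measure)} for two aligned label sequences."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must be aligned syllable by syllable")
    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, hyp in zip(reference, predicted):
        if ref == hyp:
            tp[ref] += 1   # label correctly assigned
        else:
            fp[hyp] += 1   # hyp was assigned where it does not belong
            fn[ref] += 1   # ref was missed
    scores = {}
    for label in sorted(set(reference) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        # F-measure: harmonic mean of precision and recall
        f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


def accuracy(reference, predicted):
    """Proportion of syllables whose automatic label matches the reference label."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must be aligned syllable by syllable")
    if not reference:
        return 0.0
    matches = sum(ref == hyp for ref, hyp in zip(reference, predicted))
    return matches / len(reference)


if __name__ == "__main__":
    # Hypothetical toy data: one pitch-level label per syllable.
    ref = ["L", "M", "H", "M", "L", "H"]
    hyp = ["L", "M", "M", "M", "L", "H"]
    for label, (p, r, f) in per_label_scores(ref, hyp).items():
        print(f"{label}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
    print(f"accuracy={accuracy(ref, hyp):.3f}")
```

Computing the scores per label, rather than as a single overall figure, matches the way the abstract reports a range of F-measure values across pitch levels and pitch movements.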



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2014 Piet Mertens

