Abstract
This paper describes the application of the analysis-by-synthesis paradigm to the melody of speech. A
complete chain of processes is described from the acoustic analysis of fundamental frequency (f0), via the
phonetic modelling of f0 using the Momel algorithm, to the surface phonological representation of the
curves using the INTSINT alphabet. Each step of the chain is designed as a reversible process which can be
used to generate an acoustic output allowing an objective evaluation of the analysis. Finally, the current
implementation of ProZed, a prosody editor for linguists, is described. It is argued that an explicit set of
modelling tools such as this will allow linguists to test different models of phonological structure, which, it is
hoped, will result in the availability of more and better data on a wide variety of languages.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2011 Daniel Hirst