The analysis by synthesis of speech melody: from data to models

Daniel Hirst

doi:10.20396/joss.v1i1.15011

Vol. 1 No. 1 (2011), Reviews

Published 2011-07-01

Daniel Hirst

National Centre for Scientific Research

Keywords

Speech prosody
Melody
Intonation
Analysis by synthesis

How to Cite

Hirst D. The analysis by synthesis of speech melody: from data to models. J. of Speech Sci. [Internet]. 2011 Jul. 1 [cited 2024 Jul. 22];1(1):55-83. Available from: https://econtents.bc.unicamp.br/inpec/index.php/joss/article/view/15011

Abstract

This paper describes the application of the analysis by synthesis paradigm to the melody of speech. A
complete chain of processes is described from the acoustic analysis of fundamental frequency (f0), via the
phonetic modelling of f0 using the Momel algorithm, to the surface phonological representation of the
curves using the INTSINT alphabet. Each step of the chain is designed as a reversible process which can be
used to generate an acoustic output allowing an objective evaluation of the analysis. Finally, the current
implementation of ProZed, a prosody editor for linguists, is described. It is argued that an explicit set of
modelling tools like this will allow linguists to test different models of phonological structure which, it is
hoped, will result in the availability of more and better data on a wide variety of languages.
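
As an illustration of what a reversible modelling step can look like, here is a minimal sketch in Python. It assumes that a melody is stored as a short list of (time, f0) target points and that the curve between two successive targets is made of two parabolas meeting at the temporal midpoint, with zero slope at each target, which is the usual description of a quadratic spline of the kind Momel relies on. The function names `quadratic_spline` and `rms_error` and the example values are hypothetical; this is not the code of the Momel plugin or of ProZed.

```python
import math


def quadratic_spline(targets, times):
    """Rebuild an f0 curve from (time, f0) targets.

    Assumption: each transition between two targets is two parabolas
    joined at the temporal midpoint, with zero slope at both targets.
    """
    curve = []
    for t in times:
        # Hold the first/last target value outside the target range.
        if t <= targets[0][0]:
            curve.append(targets[0][1])
            continue
        if t >= targets[-1][0]:
            curve.append(targets[-1][1])
            continue
        for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
            if t1 <= t <= t2:
                x = (t - t1) / (t2 - t1)  # normalised position in [0, 1]
                if x < 0.5:
                    y = h1 + 2 * (h2 - h1) * x ** 2
                else:
                    y = h2 - 2 * (h2 - h1) * (1 - x) ** 2
                curve.append(y)
                break
    return curve


def rms_error(observed, modelled):
    """Root-mean-square difference between two equally long f0 sequences."""
    total = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    return math.sqrt(total / len(observed))


# Hypothetical targets (seconds, Hz) and sampling times.
targets = [(0.0, 100.0), (1.0, 200.0)]
times = [0.0, 0.25, 0.5, 1.0]
modelled = quadratic_spline(targets, times)
```

Comparing the regenerated curve with the measured f0 values, for example with `rms_error`, gives a simple numerical counterpart to the objective evaluation mentioned above; a listening test on resynthesised speech would be the perceptual counterpart.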

This work is licensed under a Creative Commons Attribution 4.0 International License.
