Voice synthesis using power-balanced simulation of a quasi-1D model of the vocal apparatus
Authors
Thomas Risse, Thomas Hélie, Fabrice Silva
Abstract
The vocal apparatus is a biophysical dynamic system capable of self-oscillation, which involves fluid–structure interactions and human control. This study on the synthesis of voiced sounds presents a physical quasi-1D model of the vocal apparatus in the port-Hamiltonian framework and its validation through numerical experiments. The modelling ensures balanced power exchanges between fluid, tissues, and human control. The fluid in the larynx and in the vocal tract is represented by a unified 1D PDE that handles transverse geometry variations. A regularisation procedure is introduced to mitigate the numerically stiff behaviour of the model observed at channel closure. The vocal folds and vocal tract walls are represented by lumped-element models, as is the radiation load at the lips, which consists of a first-order high-pass filter. Spatial discretisation of the fluid model and temporal discretisation of the full system are performed using structure-preserving methods to ensure energy consistency (passivity). The second part of this paper focuses on numerical experiments that progressively characterise the model and assess its validity. These experiments begin with a frequency response analysis of a static vocal tract under quasi-linear conditions, followed by simulations of vowel transitions (diphthongs) under forced excitation. Next, self-oscillation studies are conducted on an isolated larynx in which contact parameters are adjusted. Lastly, full simulations of the self-oscillating vocal apparatus with co-articulation, representing a voice synthesiser capable of articulating vowels, are presented. The dynamics are also analysed in terms of energy transfer and passivity. Finally, these results are discussed to establish a basis for future model refinements and to identify directions for enhancing the accuracy and realism of vocal synthesis.
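To make the notion of a power-balanced (passive) time discretisation concrete, here is a minimal Python sketch. It is not the authors' model or scheme: it assumes a toy linear port-Hamiltonian system (a forced mass-spring-damper with state `x = [q, p]` and quadratic energy `H(x) = x^T Q x / 2`) and uses the implicit midpoint rule, which for a quadratic energy acts as a discrete gradient method. The function name `simulate_phs` and all parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np


def simulate_phs(x0, u, dt, J, R, Q, G):
    """Midpoint steps for dx/dt = (J - R) Q x + G u with H(x) = x^T Q x / 2.

    Returns the state trajectory and, for each step, the residual of the
    discrete power balance: stored + dissipated - supplied energy.
    """
    n = len(x0)
    A = (J - R) @ Q
    lhs = np.eye(n) - 0.5 * dt * A
    rhs = np.eye(n) + 0.5 * dt * A
    x = np.asarray(x0, dtype=float)
    states, residuals = [x], []
    for u_n in u:
        # Implicit midpoint update, solved as a linear system
        x_next = np.linalg.solve(lhs, rhs @ x + dt * G * u_n)
        grad = Q @ (x + x_next) / 2.0          # discrete gradient of H
        stored = 0.5 * (x_next @ Q @ x_next - x @ Q @ x)
        dissipated = dt * (grad @ R @ grad)    # non-negative when R is PSD
        supplied = dt * (G @ grad) * u_n       # output y = G^T grad, times input
        residuals.append(stored + dissipated - supplied)
        states.append(x_next)
        x = x_next
    return np.array(states), np.array(residuals)


# Toy parameters (assumptions for illustration, not values from the paper)
mass, stiffness, damping = 1.0, (2 * np.pi * 100.0) ** 2, 2.0
J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric interconnection
R = np.diag([0.0, damping])               # positive semi-definite dissipation
Q = np.diag([stiffness, 1.0 / mass])      # energy Hessian
G = np.array([0.0, 1.0])                  # force input acting on momentum
dt = 1.0 / 44100.0
t = dt * np.arange(2000)
force = np.sin(2 * np.pi * 100.0 * t)

states, residuals = simulate_phs([0.0, 0.0], force, dt, J, R, Q, G)
print("max |power-balance residual|:", np.max(np.abs(residuals)))
```

In this sketch the residual stays at round-off level because the energy increment over a step equals the supplied energy minus the dissipated energy by construction, so with zero input the stored energy cannot grow. A nonlinear model such as the one described in the abstract would require a genuine discrete gradient or a comparable structure-preserving scheme rather than the plain midpoint rule.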
Citation
- Journal: Frontiers in Signal Processing
- Year: 2025
- Volume: 5
- Publisher: Frontiers Media SA
- DOI: 10.3389/frsip.2025.1525198
BibTeX
@article{Risse_2025,
title={{Voice synthesis using power-balanced simulation of a quasi-1D model of the vocal apparatus}},
volume={5},
ISSN={2673-8198},
DOI={10.3389/frsip.2025.1525198},
journal={Frontiers in Signal Processing},
publisher={Frontiers Media SA},
author={Risse, Thomas and Hélie, Thomas and Silva, Fabrice},
year={2025}
}
References
- Alipour, F., Berry, D. A. & Titze, I. R. A finite-element model of vocal-fold vibration. The Journal of the Acoustical Society of America 108, 3003–3012 (2000) – 10.1121/1.1324678
- Alipour, F. & Scherer, R. C. On pressure-frequency relations in the excised larynx. The Journal of the Acoustical Society of America 122, 2296–2305 (2007) – 10.1121/1.2772230
- Bilbao, Real-time gong synthesis. Proceedings of the 26th Conference of Digital audio effects (DAFx-23) (2023)
- Birkholz, Acoustic comparison of physical vocal tract models with hard and soft walls. IEEE international conference on acoustics, speech and signal processing (ICASSP 2022) (2022)
- Brugnoli, A., Cardoso-Ribeiro, F. L., Haine, G. & Kotyczka, P. Partitioned finite element method for structured discretization with mixed boundary conditions. IFAC-PapersOnLine 53, 7557–7562 (2020) – 10.1016/j.ifacol.2020.12.1351
- Cardoso-Ribeiro, F. L., Matignon, D. & Lefèvre, L. A structure-preserving Partitioned Finite Element Method for the 2D wave equation. IFAC-PapersOnLine 51, 119–124 (2018) – 10.1016/j.ifacol.2018.06.033
- Castera, Numerical analysis of quadratized schemes. Application to the simulation of the nonlinear piano string. Tech. Rep. (2023)
- Doval, The spectrum of glottal flow models. Acta Acustica united with Acustica (2006)
- Ducceschi, Simulation of the snare-membrane collision in modal form using the scalar auxiliary variable (SAV) method. Proceedings of the forum acusticum 2023 (2023)
- Encina, M., Yuz, J., Zañartu, M. & Galindo, G. Vocal fold modeling through the port-Hamiltonian systems approach. 2015 IEEE Conference on Control Applications (CCA) 1558–1563 (2015) – 10.1109/cca.2015.7320832
- Erath, B. D. et al. A review of lumped-element models of voiced speech. Speech Communication 55, 667–690 (2013) – 10.1016/j.specom.2013.02.002
- Flanagan, J. L. Speech Analysis Synthesis and Perception. (Springer Berlin Heidelberg, 1965) – 10.1007/978-3-662-00849-2
- Fleischer, M., Pinkert, S., Mattheus, W., Mainka, A. & Mürbe, D. Formant frequencies and bandwidths of the vocal tract transfer function are affected by the mechanical impedance of the vocal tract wall. Biomech Model Mechanobiol 14, 719–733 (2014) – 10.1007/s10237-014-0632-2
- Guasch, Unified numerical simulation of the physics of voice: The EUNISON project. The International Society for Computers and Their Applications (ISCA) (2013)
- Gunter, H. E. A mechanical model of vocal-fold collision with high spatial and temporal resolution. The Journal of the Acoustical Society of America 113, 994–1000 (2003) – 10.1121/1.1534100
- Hélie, T. & Silva, F. Self-oscillations of a Vocal Apparatus: A Port-Hamiltonian Formulation. Lecture Notes in Computer Science 375–383 (2017) – 10.1007/978-3-319-68445-1_44
- Ishizaka, K. & Flanagan, J. L. Synthesis of Voiced Sounds From a Two-Mass Model of the Vocal Cords. Bell System Technical Journal 51, 1233–1268 (1972) – 10.1002/j.1538-7305.1972.tb02651.x
- Kelly, Speech synthesis. Proc. Speech communication seminar (1962)
- Maeda, S. A digital simulation method of the vocal-tract system. Speech Communication 1, 199–229 (1982) – 10.1016/0167-6393(82)90017-6
- Maschke, B. M. & van der Schaft, A. J. Port-Controlled Hamiltonian Systems: Modelling Origins and Systemtheoretic Properties. IFAC Proceedings Volumes 25, 359–365 (1992) – 10.1016/s1474-6670(17)52308-3
- Mora, L. A., Le Gorrec, Y., Matignon, D., Ramirez, H. & Yuz, J. I. On port-Hamiltonian formulations of 3-dimensional compressible Newtonian fluids. Physics of Fluids 33, (2021) – 10.1063/5.0067784
- Mora, L. A., Ramirez, H., Yuz, J. I., Le Gorrec, Y. & Zañartu, M. Energy-based fluid–structure model of the vocal folds. IMA Journal of Mathematical Control and Information 38, 466–492 (2020) – 10.1093/imamci/dnaa031
- Mora, L. A., Yuz, J. I., Ramirez, H. & Le Gorrec, Y. A port-Hamiltonian Fluid-Structure Interaction Model for the Vocal folds. IFAC-PapersOnLine 51, 62–67 (2018) – 10.1016/j.ifacol.2018.06.016
- Müller, Time-continuous power-balanced simulation of nonlinear audio circuits: realtime processing framework and aliasing rejection (2021)
- Olver, P. J. Applications of Lie Groups to Differential Equations. Graduate Texts in Mathematics (Springer New York, 1986) – 10.1007/978-1-4684-0274-2
- Pelorson, X., Hirschberg, A., van Hassel, R. R., Wijnands, A. P. J. & Auregan, Y. Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model. The Journal of the Acoustical Society of America 96, 3416–3431 (1994) – 10.1121/1.411449
- Risse, T., Hélie, T., Silva, F. & Falaize, A. Minimal port-Hamiltonian modeling of voice production: choices of fluid flow hypotheses, resulting structure and comparison. IFAC-PapersOnLine 58, 238–243 (2024) – 10.1016/j.ifacol.2024.08.287
- Russo, R., Bilbao, S. & Ducceschi, M. Scalar Auxiliary Variable Techniques for Nonlinear Transverse String Vibration. IFAC-PapersOnLine 58, 160–165 (2024) – 10.1016/j.ifacol.2024.08.274
- Ruty, N., Pelorson, X., Van Hirtum, A., Lopez-Arteaga, I. & Hirschberg, A. An in vitro setup to test the relevance and the accuracy of low-order vocal folds models. The Journal of the Acoustical Society of America 121, 479–490 (2007) – 10.1121/1.2384846
- Shen, J., Xu, J. & Yang, J. The scalar auxiliary variable (SAV) approach for gradient flows. Journal of Computational Physics 353, 407–416 (2018) – 10.1016/j.jcp.2017.10.021
- Stevens, K. N. Airflow and Turbulence Noise for Fricative and Stop Consonants: Static Considerations. The Journal of the Acoustical Society of America 50, 1180–1192 (1971) – 10.1121/1.1912751
- Story, B. H. & Titze, I. R. Voice simulation with a body-cover model of the vocal folds. The Journal of the Acoustical Society of America 97, 1249–1260 (1995) – 10.1121/1.412234
- Story, B. H., Titze, I. R. & Hoffman, E. A. Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America 100, 537–554 (1996) – 10.1121/1.415960
- Titze, I. R. & Story, B. H. Rules for controlling low-dimensional vocal fold models with muscle activation. The Journal of the Acoustical Society of America 112, 1064–1076 (2002) – 10.1121/1.1496080
- Trenchant, V., Ramirez, H., Le Gorrec, Y. & Kotyczka, P. Finite differences on staggered grids preserving the port-Hamiltonian structure with application to an acoustic duct. Journal of Computational Physics 373, 673–697 (2018) – 10.1016/j.jcp.2018.06.051
- Valášek, Numerical simulation of fluid-structure-acoustic interaction in human phonation (2021)
- Wetzel, Lumped power-balanced modelling and simulation of the vocal apparatus: a fluid-structure interaction approach (2021)
- Xue, Q., Mittal, R., Zheng, X. & Bielamowicz, S. A computational study of the effect of vocal-fold asymmetry on phonation. The Journal of the Acoustical Society of America 128, 818–827 (2010) – 10.1121/1.3458839
- Yokota, K., Ishikawa, S., Takezaki, K., Koba, Y. & Kijimoto, S. Numerical analysis and physical consideration of vocal fold vibration by modal analysis. Journal of Sound and Vibration 514, 116442 (2021) – 10.1016/j.jsv.2021.116442
- Zhang, Z. Regulation of glottal closure and airflow in a three-dimensional phonation model: Implications for vocal intensity control. The Journal of the Acoustical Society of America 137, 898–910 (2015) – 10.1121/1.4906272