Voice synthesis using power-balanced simulation of a quasi-1D model of the vocal apparatus

Authors

Thomas Risse, Thomas Hélie, Fabrice Silva

Abstract

The vocal apparatus is a biophysical dynamic system capable of self-oscillation, which involves fluid–structure interactions and human control. This study on the synthesis of voiced sounds presents a physical quasi-1D model of the vocal apparatus in the port-Hamiltonian framework and its validation through numerical experiments. The modelling ensures balanced power exchanges between fluid, tissues, and human control. Fluid is represented in the larynx and in the vocal tract using a unified 1D PDE handling transverse geometry variations. A regularisation procedure is introduced to mitigate the numerically stiff behaviour of the model observed at channel closure. Vocal folds, vocal tract walls, and the radiation load at the lips are represented by lumped-element models; the radiation load consists of a first-order high-pass filter. Spatial discretisation of the fluid model and temporal discretisation of the full system are made using structure-preserving methods to ensure energy consistency (passivity). The second part of this paper focuses on numerical experiments to progressively characterise the model and assess its validity. These experiments begin with frequency response analysis of a static vocal tract under quasi-linear conditions, followed by simulations of vowel transitions (diphthongs) under forced excitation. Next, self-oscillation studies are conducted on an isolated larynx where contact parameters are adjusted. Lastly, full simulations of the self-oscillating vocal apparatus with co-articulation, representing a voice synthesiser capable of articulating vowels, are presented. The dynamics are also analysed in terms of energy transfer and passivity. Finally, these results are discussed to establish a basis for future model refinements and to identify directions for enhancing the accuracy and realism of vocal synthesis.
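
As a rough illustration of what "power-balanced" time stepping means, the sketch below applies a discrete-gradient (midpoint) scheme to a toy linear port-Hamiltonian system, a forced mass-spring-damper. It is not the model or the code of the paper: the function name `simulate`, the matrices `J`, `R`, `G`, `Q`, and all parameter values are assumptions chosen for illustration. At every step, the change in stored energy equals the supplied energy minus the dissipated energy, up to rounding error.

```python
import numpy as np

def simulate(steps=2000, dt=1e-3, m=1.0, k=100.0, r=0.5, force=1.0):
    """Toy port-Hamiltonian system x' = (J - R) grad H(x) + G u.

    State x = (spring elongation, momentum), H(x) = 0.5 * x^T Q x.
    All names and values are illustrative assumptions.
    """
    Q = np.diag([k, 1.0 / m])                 # quadratic energy weights
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric interconnection
    R = np.diag([0.0, r])                     # positive semi-definite damping
    G = np.array([0.0, 1.0])                  # the input force acts on momentum
    A = (J - R) @ Q
    eye = np.eye(2)
    lhs = eye - 0.5 * dt * A                  # implicit part of the midpoint rule
    x = np.zeros(2)
    worst = 0.0                               # largest power-balance residual seen
    for _ in range(steps):
        rhs = (eye + 0.5 * dt * A) @ x + dt * G * force
        x_next = np.linalg.solve(lhs, rhs)
        # Discrete gradient of a quadratic energy: gradient at the midpoint.
        grad = Q @ (0.5 * (x + x_next))
        stored = 0.5 * x_next @ Q @ x_next - 0.5 * x @ Q @ x
        dissipated = dt * grad @ R @ grad     # always >= 0
        supplied = dt * (G @ grad) * force    # output y = G^T grad, times input u
        worst = max(worst, abs(stored + dissipated - supplied))
        x = x_next
    return x, worst

if __name__ == "__main__":
    state, residual = simulate()
    print("final state:", state)
    print("largest power-balance residual:", residual)
```

With zero input force the same scheme can only lose energy through `R`, which is the discrete counterpart of passivity. For non-quadratic energies the midpoint gradient would have to be replaced by a proper discrete gradient, which this sketch does not cover.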

Citation

Journal: Frontiers in Signal Processing
Year: 2025
Volume: 5
Publisher: Frontiers Media SA
DOI: 10.3389/frsip.2025.1525198

BibTeX

@article{Risse_2025,
  title={{Voice synthesis using power-balanced simulation of a quasi-1D model of the vocal apparatus}},
  volume={5},
  ISSN={2673-8198},
  DOI={10.3389/frsip.2025.1525198},
  journal={Frontiers in Signal Processing},
  publisher={Frontiers Media SA},
  author={Risse, Thomas and Hélie, Thomas and Silva, Fabrice},
  year={2025}
}


References

Alipour, F., Berry, D. A. & Titze, I. R. A finite-element model of vocal-fold vibration. The Journal of the Acoustical Society of America 108, 3003–3012 (2000) – 10.1121/1.1324678
Alipour, F. & Scherer, R. C. On pressure-frequency relations in the excised larynx. The Journal of the Acoustical Society of America 122, 2296–2305 (2007) – 10.1121/1.2772230
Bilbao, Real-time gong synthesis. Proceedings of the 26th Conference of Digital audio effects (DAFx-23) (2023)
Birkholz, Acoustic comparison of physical vocal tract models with hard and soft walls. IEEE international conference on acoustics, speech and signal processing (ICASSP 2022) (2022)
Brugnoli, A., Cardoso-Ribeiro, F. L., Haine, G. & Kotyczka, P. Partitioned finite element method for structured discretization with mixed boundary conditions. IFAC-PapersOnLine 53, 7557–7562 (2020) – 10.1016/j.ifacol.2020.12.1351
Cardoso-Ribeiro, F. L., Matignon, D. & Lefèvre, L. A structure-preserving Partitioned Finite Element Method for the 2D wave equation. IFAC-PapersOnLine 51, 119–124 (2018) – 10.1016/j.ifacol.2018.06.033
Castera, Numerical analysis of quadratized schemes. Application to the simulation of the nonlinear piano string. Tech. Rep. (2023)
Doval, The spectrum of glottal flow models. Acta Acustica united with Acustica (2006)
Ducceschi, Simulation of the snare-membrane collision in modal form using the scalar auxiliary variable (SAV) method. Proceedings of the forum acusticum 2023 (2023)
Encina, M., Yuz, J., Zanartu, M. & Galindo, G. Vocal fold modeling through the port-Hamiltonian systems approach. 2015 IEEE Conference on Control Applications (CCA) 1558–1563 (2015) – 10.1109/cca.2015.7320832
Erath, B. D. et al. A review of lumped-element models of voiced speech. Speech Communication 55, 667–690 (2013) – 10.1016/j.specom.2013.02.002
Flanagan, J. L. Speech Analysis Synthesis and Perception. (Springer Berlin Heidelberg, 1965) – 10.1007/978-3-662-00849-2
Fleischer, M., Pinkert, S., Mattheus, W., Mainka, A. & Mürbe, D. Formant frequencies and bandwidths of the vocal tract transfer function are affected by the mechanical impedance of the vocal tract wall. Biomech Model Mechanobiol 14, 719–733 (2014) – 10.1007/s10237-014-0632-2
Guasch, Unified numerical simulation of the physics of voice: The EUNISON project. The International Society for Computers and Their Applications (ISCA) (2013)
Gunter, H. E. A mechanical model of vocal-fold collision with high spatial and temporal resolution. The Journal of the Acoustical Society of America 113, 994–1000 (2003) – 10.1121/1.1534100
Hélie, T. & Silva, F. Self-oscillations of a Vocal Apparatus: A Port-Hamiltonian Formulation. Lecture Notes in Computer Science 375–383 (2017) – 10.1007/978-3-319-68445-1_44
Ishizaka, K. & Flanagan, J. L. Synthesis of Voiced Sounds From a Two-Mass Model of the Vocal Cords. Bell System Technical Journal 51, 1233–1268 (1972) – 10.1002/j.1538-7305.1972.tb02651.x
Kelly, Speech synthesis. Proc. Speech communication seminar (1962)
Maeda, S. A digital simulation method of the vocal-tract system. Speech Communication 1, 199–229 (1982) – 10.1016/0167-6393(82)90017-6
Maschke, B. M. & van der Schaft, A. J. Port-Controlled Hamiltonian Systems: Modelling Origins and Systemtheoretic Properties. IFAC Proceedings Volumes 25, 359–365 (1992) – 10.1016/s1474-6670(17)52308-3
Mora, L. A., Le Gorrec, Y., Matignon, D., Ramirez, H. & Yuz, J. I. On port-Hamiltonian formulations of 3-dimensional compressible Newtonian fluids. Physics of Fluids 33, (2021) – 10.1063/5.0067784
Mora, L. A., Ramirez, H., Yuz, J. I., Le Gorrec, Y. & Zañartu, M. Energy-based fluid–structure model of the vocal folds. IMA Journal of Mathematical Control and Information 38, 466–492 (2020) – 10.1093/imamci/dnaa031
Mora, L. A., Yuz, J. I., Ramirez, H. & Le Gorrec, Y. A port-Hamiltonian Fluid-Structure Interaction Model for the Vocal folds. IFAC-PapersOnLine 51, 62–67 (2018) – 10.1016/j.ifacol.2018.06.016
Müller, Time-continuous power-balanced simulation of nonlinear audio circuits: realtime processing framework and aliasing rejection (2021)
Olver, P. J. Applications of Lie Groups to Differential Equations. Graduate Texts in Mathematics (Springer New York, 1986) – 10.1007/978-1-4684-0274-2
Pelorson, X., Hirschberg, A., van Hassel, R. R., Wijnands, A. P. J. & Auregan, Y. Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model. The Journal of the Acoustical Society of America 96, 3416–3431 (1994) – 10.1121/1.411449
Risse, T., Hélie, T., Silva, F. & Falaize, A. Minimal port-Hamiltonian modeling of voice production: choices of fluid flow hypotheses, resulting structure and comparison. IFAC-PapersOnLine 58, 238–243 (2024) – 10.1016/j.ifacol.2024.08.287
Russo, R., Bilbao, S. & Ducceschi, M. Scalar Auxiliary Variable Techniques for Nonlinear Transverse String Vibration. IFAC-PapersOnLine 58, 160–165 (2024) – 10.1016/j.ifacol.2024.08.274
Ruty, N., Pelorson, X., Van Hirtum, A., Lopez-Arteaga, I. & Hirschberg, A. An in vitro setup to test the relevance and the accuracy of low-order vocal folds models. The Journal of the Acoustical Society of America 121, 479–490 (2007) – 10.1121/1.2384846
Shen, J., Xu, J. & Yang, J. The scalar auxiliary variable (SAV) approach for gradient flows. Journal of Computational Physics 353, 407–416 (2018) – 10.1016/j.jcp.2017.10.021
Stevens, K. N. Airflow and Turbulence Noise for Fricative and Stop Consonants: Static Considerations. The Journal of the Acoustical Society of America 50, 1180–1192 (1971) – 10.1121/1.1912751
Story, B. H. & Titze, I. R. Voice simulation with a body-cover model of the vocal folds. The Journal of the Acoustical Society of America 97, 1249–1260 (1995) – 10.1121/1.412234
Story, B. H., Titze, I. R. & Hoffman, E. A. Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America 100, 537–554 (1996) – 10.1121/1.415960
Titze, I. R. & Story, B. H. Rules for controlling low-dimensional vocal fold models with muscle activation. The Journal of the Acoustical Society of America 112, 1064–1076 (2002) – 10.1121/1.1496080
Trenchant, V., Ramirez, H., Le Gorrec, Y. & Kotyczka, P. Finite differences on staggered grids preserving the port-Hamiltonian structure with application to an acoustic duct. Journal of Computational Physics 373, 673–697 (2018) – 10.1016/j.jcp.2018.06.051
Valášek, Numerical simulation of fluid-structure-acoustic interaction in human phonation (2021)
Wetzel, Lumped power-balanced modelling and simulation of the vocal apparatus: a fluid-structure interaction approach (2021)
Xue, Q., Mittal, R., Zheng, X. & Bielamowicz, S. A computational study of the effect of vocal-fold asymmetry on phonation. The Journal of the Acoustical Society of America 128, 818–827 (2010) – 10.1121/1.3458839
Yokota, K., Ishikawa, S., Takezaki, K., Koba, Y. & Kijimoto, S. Numerical analysis and physical consideration of vocal fold vibration by modal analysis. Journal of Sound and Vibration 514, 116442 (2021) – 10.1016/j.jsv.2021.116442
Zhang, Z. Regulation of glottal closure and airflow in a three-dimensional phonation model: Implications for vocal intensity control. The Journal of the Acoustical Society of America 137, 898–910 (2015) – 10.1121/1.4906272