Port–Hamiltonian Approach to Neural Network Training
Authors
Stefano Massaroli, Michael Poli, Federico Califano, Angela Faragasso, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
Abstract
Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations. Recent work proposes networks with layer outputs which are no longer quantized but are solutions of an ordinary differential equation (ODE); however, these networks are still optimized via discrete methods (e.g., gradient descent). In this paper, we explore a different direction: namely, we propose a novel framework for learning in which the parameters themselves are solutions of ODEs. By viewing the optimization process as the evolution of a port-Hamiltonian system, we can ensure convergence to a minimum of the objective function. Numerical experiments demonstrate the validity and effectiveness of the proposed methods.
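To make the abstract's idea concrete, below is a minimal sketch (in Python with NumPy/SciPy; not the authors' implementation) of training as the flow of a dissipative port-Hamiltonian system. The weights q are paired with momenta p, the Hamiltonian is chosen as H(q, p) = L(q) + 0.5 * ||p||^2, and the state follows x' = (J - R) grad H(x) with J = [[0, I], [-I, 0]] skew-symmetric and R = blkdiag(0, beta * I) positive semi-definite; along trajectories dH/dt = -grad H^T R grad H = -beta * ||p||^2 <= 0, so the energy (and with it the loss) can only decrease and the flow settles where p = 0 and grad L(q) = 0. The quadratic toy loss, this specific choice of H, and the value of beta are assumptions made purely for illustration.

# Minimal sketch: network training as a dissipative port-Hamiltonian flow.
# State x = (q, p); H(q, p) = L(q) + 0.5 * ||p||^2; x' = (J - R) grad H(x),
# which unfolds to q' = p and p' = -grad L(q) - beta * p.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Toy least-squares loss L(q) = 0.5 * ||A q - b||^2, standing in for a network loss.
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def grad_loss(q):
    return A.T @ (A @ q - b)

beta = 1.0  # dissipation (the R block); beta > 0 makes H strictly decrease off equilibria

def ph_field(t, x):
    q, p = np.split(x, 2)
    return np.concatenate([p, -grad_loss(q) - beta * p])

x0 = np.concatenate([rng.normal(size=5), np.zeros(5)])  # random weights, zero momenta
sol = solve_ivp(ph_field, (0.0, 50.0), x0, rtol=1e-8, atol=1e-8)

q_star = sol.y[:5, -1]
print(np.linalg.norm(grad_loss(q_star)))  # ~0: the flow has reached a stationary point

The point of the sketch is that the training loop is replaced by an ODE solver: convergence follows from the energy argument above rather than from a hand-tuned step-size schedule.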
Citation
- Venue: 2019 IEEE 58th Conference on Decision and Control (CDC)
- Year: 2019
- Pages: 6799–6806
- Publisher: IEEE
- DOI: 10.1109/cdc40024.2019.9030017
BibTeX
@inproceedings{Massaroli_2019,
title={{Port–Hamiltonian Approach to Neural Network Training}},
DOI={10.1109/cdc40024.2019.9030017},
booktitle={{2019 IEEE 58th Conference on Decision and Control (CDC)}},
publisher={IEEE},
author={Massaroli, Stefano and Poli, Michael and Califano, Federico and Faragasso, Angela and Park, Jinkyoo and Yamashita, Atsushi and Asama, Hajime},
year={2019},
pages={6799--6806}
}
References
- Eldan, R. & Shamir, O. The power of depth for feedforward neural networks. Conference on Learning Theory (2016)
- Neural Information Processing. Lecture Notes in Computer Science (Springer International Publishing, 2017). doi:10.1007/978-3-319-70139-4
- Kovacic, I. & Brennan, M. J. (eds) The Duffing Equation: Nonlinear Oscillators and their Behaviour (Wiley, 2011). doi:10.1002/9780470977859
- Goebel, R., Sanfelice, R. G. & Teel, A. R. Hybrid dynamical systems. IEEE Control Syst. 29, 28–93 (2009). doi:10.1109/mcs.2008.931718
- Li, H., Xu, Z., Taylor, G., Studer, C. & Goldstein, T. Visualizing the loss landscape of neural nets. Advances in Neural Information Processing Systems (2018)
- Blum, A. & Rivest, R. L. Training a 3-node neural network is NP-complete. Advances in Neural Information Processing Systems (1989)
- Maschke, B. M. & van der Schaft, A. J. Port-Controlled Hamiltonian Systems: Modelling Origins and System-Theoretic Properties. IFAC Proceedings Volumes 25, 359–365 (1992). doi:10.1016/s1474-6670(17)52308-3
- Duindam, V., Macchelli, A., Stramigioli, S. & Bruyninckx, H. (eds) Modeling and Control of Complex Physical Systems: The Port-Hamiltonian Approach (Springer, 2009)
- van der Schaft, A. & Jeltsema, D. Port-Hamiltonian Systems Theory: An Introductory Overview. Foundations and Trends in Systems and Control 1, 173–378 (2014). doi:10.1561/2600000002
- Ortega, R., van der Schaft, A. J., Mareels, I. & Maschke, B. Putting energy back in control. IEEE Control Syst. 21, 18–33 (2001). doi:10.1109/37.915398
- Ortega, R., van der Schaft, A., Maschke, B. & Escobar, G. Interconnection and damping assignment passivity-based control of port-controlled Hamiltonian systems. Automatica 38, 585–596 (2002). doi:10.1016/s0005-1098(01)00278-3
- Ortega, R., van der Schaft, A., Castanos, F. & Astolfi, A. Control by Interconnection and Standard Passivity-Based Control of Port-Hamiltonian Systems. IEEE Trans. Automat. Contr. 53, 2527–2542 (2008). doi:10.1109/tac.2008.2006930
- Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinary differential equations. Advances in Neural Information Processing Systems (2018)
- Ruthotto, L. & Haber, E. Deep neural networks motivated by partial differential equations. (2018)
- Krogh, A. & Hertz, J. A. A simple weight decay can improve generalization. Advances in Neural Information Processing Systems (1992)
- Brock, A., Donahue, J. & Simonyan, K. Large scale GAN training for high fidelity natural image synthesis. (2018)
- Golub, G. H., Hansen, P. C. & O’Leary, D. P. Tikhonov Regularization and Total Least Squares. SIAM J. Matrix Anal. Appl. 21, 185–194 (1999). doi:10.1137/s0895479897326432
- He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. 2017 IEEE International Conference on Computer Vision (ICCV) (2017). doi:10.1109/iccv.2017.322
- Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. (2018)
- van der Schaft, A. & Schumacher, H. An Introduction to Hybrid Dynamical Systems. Lecture Notes in Control and Information Sciences (Springer London, 2000). doi:10.1007/bfb0109998
- He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016). doi:10.1109/cvpr.2016.90
- Tieleman, T. & Hinton, G. Lecture 6.5 – RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)
- Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. International Conference on Learning Representations (ICLR) (2015)
- Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning Internal Representations by Error Propagation. (1985). doi:10.21236/ada164453
- Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J. & Han, J. On the variance of the adaptive learning rate and beyond. (2019)
- Hornik, K., Stinchcombe, M. & White, H. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366 (1989). doi:10.1016/0893-6080(89)90020-8
- Greydanus, S., Dzamba, M. & Yosinski, J. Hamiltonian neural networks. (2019)
- Howse, J. W., Abdallah, C. T. & Heileman, G. L. Gradient and Hamiltonian dynamics applied to learning in neural networks. Advances in Neural Information Processing Systems (1996)
- Chaudhari, P., Oberman, A., Osher, S., Soatto, S. & Carlier, G. Deep relaxation: partial differential equations for optimizing deep neural networks. Res. Math. Sci. 5 (2018). doi:10.1007/s40687-018-0148-y
- Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A Learning Algorithm for Boltzmann Machines. Cognitive Science 9, 147–169 (1985). doi:10.1207/s15516709cog0901_7
- Sienko, W., Citko, W. & Jakóbczak, D. Learning and System Modeling via Hamiltonian Neural Networks. Lecture Notes in Computer Science 266–271 (2004). doi:10.1007/978-3-540-24844-6_36
- Moon, P. & Spencer, D. E. Theory of Holors: A Generalization of Tensors (Cambridge University Press, 2005)
- Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A. 79, 2554–2558 (1982). doi:10.1073/pnas.79.8.2554