Physics-Informed Multiagent Reinforcement Learning for Distributed Multirobot Problems

Authors

Eduardo Sebastián, Thai Duong, Nikolay Atanasov, Eduardo Montijano, Carlos Sagüés

Abstract

The networked nature of multirobot systems presents challenges in the context of multiagent reinforcement learning. Centralized control policies do not scale with increasing numbers of robots, whereas independent control policies do not exploit the information provided by other robots, exhibiting poor performance in cooperative-competitive tasks. In this work, we propose a physics-informed reinforcement learning approach able to learn distributed multirobot control policies that are both scalable and make use of all the available information to each robot. Our approach has three key characteristics. First, it imposes a port-Hamiltonian structure on the policy representation, respecting energy conservation properties of physical robot systems and the networked nature of robot team interactions. Second, it uses self-attention to ensure a sparse policy representation able to handle time-varying information at each robot from the interaction graph. Third, we present a soft actor–critic reinforcement learning algorithm parameterized by our self-attention port-Hamiltonian control policy, which accounts for the correlation among robots during training while overcoming the need of value function factorization. Extensive simulations in different multirobot scenarios demonstrate the success of the proposed approach, surpassing previous multirobot reinforcement learning solutions in scalability, while achieving similar or superior performance (with averaged cumulative reward up to \( \times {\text{2}[[:space:]]} \) greater than the state-of-the-art with robot teams \( \times {\text{6}[[:space:]]} \) larger than the number of robots at training time). We also validate our approach on multiple real robots in the Georgia Tech Robotarium under imperfect communication, demonstrating zero-shot sim-to-real transfer and scalability across number of robots.

Citation

Journal: IEEE Transactions on Robotics
Year: 2025
Volume: 41
Issue:
Pages: 4499–4517
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/tro.2025.3582836

BibTeX

@article{Sebasti_n_2025,
  title={{Physics-Informed Multiagent Reinforcement Learning for Distributed Multirobot Problems}},
  volume={41},
  ISSN={1941-0468},
  DOI={10.1109/tro.2025.3582836},
  journal={IEEE Transactions on Robotics},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)},
  author={Sebastián, Eduardo and Duong, Thai and Atanasov, Nikolay and Montijano, Eduardo and Sagüés, Carlos},
  year={2025},
  pages={4499--4517}
}

Download the bib file

References

Pickem D, Glotfelter P, Wang L, Mote M, Ames A, Feron E, Egerstedt M (2017) The Robotarium: A remotely accessible swarm robotics research testbed. 2017 IEEE International Conference on Robotics and Automation (ICRA) 1699–170 – 10.1109/icra.2017.7989200
Peng, FACMAC: Factored multi-agent centralised policy gradients. Adv. Neural Inf. Process. Syst. (2021)
Atanasov N, Le Ny J, Daniilidis K, Pappas GJ (2015) Decentralized active information acquisition: Theory and application to multi-robot SLAM. 2015 IEEE International Conference on Robotics and Automation (ICRA) 4775–478 – 10.1109/icra.2015.7139863
Tian Y, Chang Y, Herrera Arias F, Nieto-Granda C, How JP, Carlone L (2022) Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems. IEEE Trans Robot 38(4):2022–2038. https://doi.org/10.1109/tro.2021.313775 – 10.1109/tro.2021.3137751
Kan X, Thayer TC, Carpin S, Karydis K (2021) Task Planning on Stochastic Aisle Graphs for Precision Agriculture. IEEE Robot Autom Lett 6(2):3287–3294. https://doi.org/10.1109/lra.2021.306233 – 10.1109/lra.2021.3062337
Pierson A, Schwager M (2015) Bio-inspired non-cooperative multi-robot herding. 2015 IEEE International Conference on Robotics and Automation (ICRA) 1843–184 – 10.1109/icra.2015.7139438
Sebastian E, Montijano E (2021) Multi-robot Implicit Control of Herds. 2021 IEEE International Conference on Robotics and Automation (ICRA) 1601–160 – 10.1109/icra48506.2021.9561231
Sebastian E, Montijano E, Sagues C (2022) Adaptive Multirobot Implicit Control of Heterogeneous Herds. IEEE Trans Robot 38(6):3622–3635. https://doi.org/10.1109/tro.2022.318353 – 10.1109/tro.2022.3183537
Heintzman L, Hashimoto A, Abaid N, Williams RK (2021) Anticipatory Planning and Dynamic Lost Person Models for Human-Robot Search and Rescue. 2021 IEEE International Conference on Robotics and Automation (ICRA) 8252–825 – 10.1109/icra48506.2021.9562070
Matarić MJ (1997) Reinforcement Learning in the Multi-Robot Domain. Robot Colonies 73–8 – 10.1007/978-1-4757-6451-2_4
Matignon L, Laurent GJ, Le Fort-Piat N (2007) Hysteretic Q-learning : an algorithm for Decentralized Reinforcement Learning in Cooperative Multi-Agent Teams. 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems 64–6 – 10.1109/iros.2007.4399095
Matignon L, Jeanpierre L, Mouaddib A-I (2021) Coordinated Multi-Robot Exploration Under Communication Constraints Using Decentralized Markov Decision Processes. AAAI 26(1):2017–2023. https://doi.org/10.1609/aaai.v26i1.838 – 10.1609/aaai.v26i1.8380
Munikoti S, Agarwal D, Das L, Halappanavar M, Natarajan B (2024) Challenges and Opportunities in Deep Reinforcement Learning With Graph Neural Networks: A Comprehensive Review of Algorithms and Applications. IEEE Trans Neural Netw Learning Syst 35(11):15051–15071. https://doi.org/10.1109/tnnls.2023.328352 – 10.1109/tnnls.2023.3283523
Serra-Gómez Á, Zhu H, Brito B, Böhmer W, Alonso-Mora J (2023) Learning scalable and efficient communication policies for multi-robot collision avoidance. Auton Robot 47(8):1275–1297. https://doi.org/10.1007/s10514-023-10127- – 10.1007/s10514-023-10127-3
Lo, Cheap talk discovery and utilization in multi-agent reinforcement learning. Proc. Int. Conf. Learn. Representations (2023)
Qu G, Wierman A, Li N (2022) Scalable Reinforcement Learning for Multiagent Networked Systems. Operations Research 70(6):3601–3628. https://doi.org/10.1287/opre.2021.222 – 10.1287/opre.2021.2226
Beckers T, Jiahao TZ, Pappas GJ (2023) Learning Switching Port-Hamiltonian Systems with Uncertainty Quantification. IFAC-PapersOnLine 56(2):525–532. https://doi.org/10.1016/j.ifacol.2023.10.162 – 10.1016/j.ifacol.2023.10.1621
Neary, Compositional learning of dynamical system models using port-Hamiltonian neural networks. Proc. Learn. Dyn. Control Conf. (2023)
Sebastián E, Duong T, Atanasov N, Montijano E, Sagüés C (2023) LEMURS: Learning Distributed Multi-Robot Interactions. 2023 IEEE International Conference on Robotics and Automation (ICRA) 7713–771 – 10.1109/icra48891.2023.10161328
Nghiem TX, Drgoňa J, Jones C, Nagy Z, Schwan R, Dey B, Chakrabarty A, Di Cairano S, Paulson JA, Carron A, Zeilinger MN, Shaw Cortez W, Vrabie DL (2023) Physics-Informed Machine Learning for Modeling and Control of Dynamical Systems. 2023 American Control Conference (ACC) 3735–375 – 10.23919/acc55779.2023.10155901
Sanyal S, Roy K (2023) RAMP-Net: A Robust Adaptive MPC for Quadrotors via Physics-informed Neural Network. 2023 IEEE International Conference on Robotics and Automation (ICRA) 1019–102 – 10.1109/icra48891.2023.10161410
Rodwell C, Tallapragada P (2023) Physics-informed reinforcement learning for motion control of a fish-like swimming robot. Sci Rep 13(1). https://doi.org/10.1038/s41598-023-36399- – 10.1038/s41598-023-36399-4
Cuomo S, Di Cola VS, Giampaolo F, Rozza G, Raissi M, Piccialli F (2022) Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next. J Sci Comput 92(3). https://doi.org/10.1007/s10915-022-01939- – 10.1007/s10915-022-01939-z
Xu Y, Kohtz S, Boakye J, Gardoni P, Wang P (2023) Physics-informed machine learning for reliability and systems safety applications: State of the art and challenges. Reliability Engineering & System Safety 230:108900. https://doi.org/10.1016/j.ress.2022.10890 – 10.1016/j.ress.2022.108900
Bloembergen D, Tuyls K, Hennes D, Kaisers M (2015) Evolutionary Dynamics of Multi-Agent Learning: A Survey. jair 53:659–697. https://doi.org/10.1613/jair.481 – 10.1613/jair.4818
Long P, Fanl T, Liao X, Liu W, Zhang H, Pan J (2018) Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. 2018 IEEE International Conference on Robotics and Automation (ICRA) 6252–625 – 10.1109/icra.2018.8461113
Semnani SH, Liu H, Everett M, de Ruiter A, How JP (2020) Multi-Agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning. IEEE Robot Autom Lett 5(2):3221–3226. https://doi.org/10.1109/lra.2020.297469 – 10.1109/lra.2020.2974695
Ng, Algorithms for inverse reinforcement learning. Proc. Int. Conf. Mach. Learn. (2000)
Dasari, RoboNet: Large-scale multi-robot learning. Proc. Conf. Robot Learn. (2020)
Bogert K, Doshi P (2018) Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions. Artificial Intelligence 263:46–73. https://doi.org/10.1016/j.artint.2018.07.00 – 10.1016/j.artint.2018.07.002
Han R, Chen S, Hao Q (2020) Cooperative Multi-Robot Navigation in Dynamic Environment with Deep Reinforcement Learning. 2020 IEEE International Conference on Robotics and Automation (ICRA) 448–45 – 10.1109/icra40945.2020.9197209
Gharbi I, Kuckling J, Ramos DG, Birattari M (2023) Show me What you want: Inverse Reinforcement Learning to Automatically Design Robot Swarms by Demonstration. 2023 IEEE International Conference on Robotics and Automation (ICRA) 5063–507 – 10.1109/icra48891.2023.10160947
Zhu H, Claramunt FM, Brito B, Alonso-Mora J (2021) Learning Interaction-Aware Trajectory Predictions for Decentralized Multi-Robot Motion Planning in Dynamic Environments. IEEE Robot Autom Lett 6(2):2256–2263. https://doi.org/10.1109/lra.2021.306107 – 10.1109/lra.2021.3061073
Zhou S, Phielipp MJ, Sefair JA, Walker SI, Amor HB (2019) Clone Swarms: Learning to Predict and Control Multi-Robot Systems by Imitation. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 4092–409 – 10.1109/iros40897.2019.8967824
Qu, Scalable reinforcement learning of localized policies for multi-agent networked systems. Proc. Learn. Dyn. Control (2020)
Zambaldi, Deep reinforcement learning with relational inductive biases. Proc. Int. Conf. Learn. Representations (2018)
Iqbal, Actor-attention-critic for multi-agent reinforcement learning. Proc. Int. Conf. Mach. Learn. (2019)
Li G, Jiang B, Zhu H, Che Z, Liu Y (2020) Generative Attention Networks for Multi-Agent Behavioral Modeling. AAAI 34(05):7195–7202. https://doi.org/10.1609/aaai.v34i05.620 – 10.1609/aaai.v34i05.6209
Parnika, Attention actor-critic algorithm for multi-agent constrained co-operative reinforcement learning. Proc. Int. Conf. Auton. Agents Multiagent Syst. (2021)
Marino A, Pacchierotti C, Giordano PR (2024) Input State Stability of Gated Graph Neural Networks. IEEE Trans Control Netw Syst 11(4):2052–2063. https://doi.org/10.1109/tcns.2024.337271 – 10.1109/tcns.2024.3372710
Li Q, Lin W, Liu Z, Prorok A (2021) Message-Aware Graph Attention Networks for Large-Scale Multi-Robot Path Planning. IEEE Robot Autom Lett 6(3):5533–5540. https://doi.org/10.1109/lra.2021.307786 – 10.1109/lra.2021.3077863
Khan, Graph policy gradients for large scale robot control. Proc. Conf. Robot Learn. (2020)
Tolstaya, Learning decentralized controllers for robot Swarms with graph neural networks. Proc. Conf. Robot Learn. (2020)
Tolstaya E, Paulos J, Kumar V, Ribeiro A (2021) Multi-Robot Coverage and Exploration using Spatial Graph Neural Networks. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 8944–895 – 10.1109/iros51168.2021.9636675
Yang F, Matni N (2021) Communication Topology Co-Design in Graph Recurrent Neural Network based Distributed Control. 2021 60th IEEE Conference on Decision and Control (CDC) 3619–362 – 10.1109/cdc45484.2021.9683779
Gama F, Li Q, Tolstaya E, Prorok A, Ribeiro A (2022) Synthesizing Decentralized Controllers With Graph Neural Networks and Imitation Learning. IEEE Trans Signal Process 70:1932–1946. https://doi.org/10.1109/tsp.2022.316640 – 10.1109/tsp.2022.3166401
Kuyer L, Whiteson S, Bakker B, Vlassis N Multiagent Reinforcement Learning for Urban Traffic Control Using Coordination Graphs. Lecture Notes in Computer Science 656–67 – 10.1007/978-3-540-87479-9_61
Buşoniu L, Babuška R, De Schutter B (2010) Multi-agent Reinforcement Learning: An Overview. Studies in Computational Intelligence 183–22 – 10.1007/978-3-642-14435-6_7
Vinyals, Starcraft II: A new challenge for reinforcement learning. (2017)
Ellis, SMACv2: An improved benchmark for cooperative multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. (2024)
Gronauer S, Diepold K (2021) Multi-agent deep reinforcement learning: a survey. Artif Intell Rev 55(2):895–943. https://doi.org/10.1007/s10462-021-09996- – 10.1007/s10462-021-09996-w
Oroojlooy A, Hajinezhad D (2022) A review of cooperative multi-agent deep reinforcement learning. Appl Intell 53(11):13677–13722. https://doi.org/10.1007/s10489-022-04105- – 10.1007/s10489-022-04105-y
Matignon L, Laurent GJ, Le Fort-Piat N (2012) Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review 27(1):1–31. https://doi.org/10.1017/s026988891200005 – 10.1017/s0269888912000057
Papoudakis, Dealing with non-stationarity in multi-agent deep reinforcement learning. (2019)
Bhmer, Deep coordination graphs. Proc. Int. Conf. Mach. Learn. (2020)
Haarnoja, Soft actor-critic algorithms and applications. (2018)
Zhang S Continuous control for robot based on deep reinforcement learnin – 10.32657/10356/90191
Liu Y Proximal Policy Optimization in StarCraf – 10.12794/metadc1505267
Lowe, Multi-agent actor-critic for mixed cooperative-competitive environments. Adv. Neural Inf. Process. Syst. (2017)
Yu, The surprising effectiveness of PPO in cooperative multi-agent games. Adv. Neural Inf. Process. Syst. (2022)
Bettini, BenchMARL: Benchmarking multi-agent reinforcement learning. J. Mach. Learn. Res. (2024)
Kuba, Trust region policy optimisation in multi-agent reinforcement learning. Proc. Int. Conf. Learn. Representations (2021)
Bloom J, Paliwal P, Mukherjee A, Pinciroli C (2023) Decentralized Multi-Agent Reinforcement Learning with Global State Prediction. 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 8854–886 – 10.1109/iros55552.2023.10341563
Yang, Mean field multi-agent reinforcement learning. Proc. Int. Conf. Mach. Learn. (2018)
Wang B, Xie J, Atanasov N (2022) DARL1N: Distributed multi-Agent Reinforcement Learning with One-hop Neighbors. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 9003–901 – 10.1109/iros47612.2022.9981441
Witt, Is independent learning all you need in the starcraft multi-agent challenge. (2020)
Motokawa Y, Sugawara T (2023) Interpretability for Conditional Coordinated Behavior in Multi-Agent Reinforcement Learning. 2023 International Joint Conference on Neural Networks (IJCNN) 1– – 10.1109/ijcnn54540.2023.10191825
Kortvelesy, QGNN: Value function factorisation with graph neural networks. (2022)
Hu Y, Fu J, Wen G (2025) Graph Soft Actor–Critic Reinforcement Learning for Large-Scale Distributed Multirobot Coordination. IEEE Trans Neural Netw Learning Syst 36(1):665–676. https://doi.org/10.1109/tnnls.2023.332953 – 10.1109/tnnls.2023.3329530
Huang Z, Yang Z, Krupani R, Şenbaşlar B, Batra S, Sukhatme GS (2024) Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning. 2024 IEEE International Conference on Robotics and Automation (ICRA) 300–30 – 10.1109/icra57147.2024.10611499
Zhao P, Liu Y (2022) Physics Informed Deep Reinforcement Learning for Aircraft Conflict Resolution. IEEE Trans Intell Transport Syst 23(7):8288–8301. https://doi.org/10.1109/tits.2021.307757 – 10.1109/tits.2021.3077572
Sartoretti G, Wu Y, Paivine W, Kumar TKS, Koenig S, Choset H (2019) Distributed Reinforcement Learning for Multi-robot Decentralized Collective Construction. Springer Proceedings in Advanced Robotics 35–4 – 10.1007/978-3-030-05816-6_3
van der Schaft A, Jeltsema D (2014) Port-Hamiltonian Systems Theory: An Introductory Overview. Foundations and Trends® in Systems and Control 1(2–3):173–378. https://doi.org/10.1561/260000000 – 10.1561/2600000002
Furieri, Distributed neural network control with dependability guarantees: A compositional port-Hamiltonian approach. Proc. Learn. Dyn. Control Conf. (2022)
Galimberti CL, Furieri L, Xu L, Ferrari-Trecate G (2023) Hamiltonian Deep Neural Networks Guaranteeing Nonvanishing Gradients by Design. IEEE Trans Automat Contr 68(5):3155–3162. https://doi.org/10.1109/tac.2023.323943 – 10.1109/tac.2023.3239430
Shi G, Honig W, Yue Y, Chung S-J (2020) Neural-Swarm: Decentralized Close-Proximity Multirotor Control Using Learned Interactions. 2020 IEEE International Conference on Robotics and Automation (ICRA) 3241–324 – 10.1109/icra40945.2020.9196800
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention Is All You Nee – 10.48550/arxiv.1706.03762
Schaft AJ (2004) Port-Hamiltonian Systems: Network Modeling and Control of Nonlinear Physical Systems. Advanced Dynamics and Control of Structures and Machines 127–16 – 10.1007/978-3-7091-2774-2_9
Blankenstein G, Ortega R, Van Der Schaft AJ (2002) The matching conditions of controlled Lagrangians and IDA-passivity based control. International Journal of Control 75(9):645–665. https://doi.org/10.1080/0020717021013593 – 10.1080/00207170210135939
Haarnoja, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. Int. Conf. Mach. Learn. (2018)
Bettini M, Kortvelesy R, Blumenkamp J, Prorok A (2024) VMAS: A Vectorized Multi-agent Simulator for Collective Robot Learning. Springer Proceedings in Advanced Robotics 42–5 – 10.1007/978-3-031-51497-5_4
Haarnoja T, Ha S, Zhou A, Tan J, Tucker G, Levine S (2019) Learning to Walk Via Deep Reinforcement Learning. Robotics: Science and Systems X – 10.15607/rss.2019.xv.011
Mordatch I, Abbeel P (2018) Emergence of Grounded Compositional Language in Multi-Agent Populations. AAAI 32(1). https://doi.org/10.1609/aaai.v32i1.1149 – 10.1609/aaai.v32i1.11492
Long, Evolutionary population curriculum for scaling multi-agent reinforcement learning. Proc. Int. Conf. Learn. Representations (2019)
Baydin, Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res. (2018)
Todorov E, Erez T, Tassa Y (2012) MuJoCo: A physics engine for model-based control. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems 5026–503 – 10.1109/iros.2012.6386109
Schulman, Trust region policy optimization. Proc. Int. Conf. Mach. Learn. (2015)
Ramachandran, Searching for activation functions. (2017)
Kingma, Adam: A method for stochastic optimization. Proc. Int. Conf. Learn. Representations (2015)