Artificial intelligence, particularly reinforcement learning (RL), offers significant potential for adaptability, optimisation, and autonomous operation. RL methods learn from interaction with the environment, but they face challenges in asymmetric settings involving heterogeneous agents. Agent diversity increases complexity, which makes effective strategies harder to learn and leads to scalability issues. Additionally, RL performance depends heavily on training data, which can be difficult, expensive, and dangerous to collect in real-world scenarios. Simulations are therefore typically used for training, but differences between simulated and real environments can cause performance drops after deployment. Agents must consequently adapt efficiently to new settings, which is challenging in complex and unpredictable situations.
Operations research methods, such as mathematical optimisation and heuristics, do not require agent training and need less data. These models can be constructed to be robust to uncertainties and variations in the environment, and they can readily be designed to handle a varying number of heterogeneous agents. However, optimisation-based methods are typically not suited to highly dynamic and uncertain environments. While optimisation provides interpretability, data-driven black-box methods are easier to implement because they do not require an understanding of the underlying internal mechanics.
Combining RL and optimisation methods can pair RL's adaptability in dynamic environments with the robustness and interpretability of optimisation, creating more effective and scalable solutions for complex missions.