Federation Playback and Restart

Description:

OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Space Technology; Integrated Network Systems-of-Systems; Trusted AI and Autonomy

OBJECTIVE: Develop a minimally invasive simulation playback and restart capability for simulation federations.

DESCRIPTION: Playback and restart are extremely useful capabilities available within some simulations. In both cases, a simulation saves critical state data during execution. For playback, these states are used to re-create a simulation execution exactly as previously executed without recalculating all intermediate model results and without needing to store all intermediate data. For restart, the saved simulation states are used to restart a simulation execution at a save-point, either saving re-run time after a failed execution or allowing execution variations from that save-point. Simulation developers must specifically design this capability into the simulation code-base, and the capability introduces significant bookkeeping overhead on both the models and the simulation (although some simulation engines facilitate this, e.g., optimistic simulation engines). Due to these complexities, this capability exists almost exclusively within integrated simulations. Federated simulations (simulations-of-simulations in which independently developed simulations are connected and executed together by a simulation framework) almost never have this capability, because most federate simulations neither save the required state data nor pass it to the federation framework. All of the federate simulations and the framework would require a common means of implementing the playback/restart capability. Federate simulations are developed independently, and as a result federates are essentially “black boxes” to the developers of the federation framework. Therefore, any solution should be minimally invasive, meaning the requirements federate developers need to meet must be the minimum necessary.
Changes to federate code should be minimized, simple to implement, and clear-cut regardless of the nature of the federate simulation. Performance of the federates or the federation as a whole should not be noticeably compromised. The solution should work with distributed architectures. Ideally, the solution should support parallelization.

Technical Objectives include:
1) Identify minimum requirements for Playback/Restart Capabilities in federation.
2) Define minimally invasive changes for federates.
3) Define changes to Framework.
4) Demonstrate collection of simulation state data from federates.
5) Demonstrate re-initialization of federates.
6) Demonstrate playback in federation.
7) Demonstrate restart in federation.
8) Benchmark federate and federation performance.

PHASE I: Phase I should focus on proving a solution concept, including:
1) Identify minimum requirements for Playback/Restart Capabilities in federation via analysis.
2) Define minimally invasive changes for federates via analysis.
3) Define changes to Framework via analysis.
4) Show collection of simulation state data from federates via demonstration in a contractor test simulation environment representing a federation.
5) Show re-initialization of federates via demonstration in a contractor test simulation environment representing a federation.
6) Show playback in federation via demonstration in a contractor test simulation environment representing a federation.
7) Show restart in federation via demonstration in a contractor test simulation environment representing a federation.

PHASE II: Phase II should focus on demonstrating a prototype capability in a relevant simulation federation and developing specific software required to integrate with operational federations.
1) Show collection of simulation state data from federates via demonstration in a simulation federation.
2) Show re-initialization of federates via demonstration in a simulation federation.
3) Show playback in federation via demonstration in a simulation federation.
4) Show restart in federation via demonstration in a simulation federation.
5) Benchmark federate and federation performance while collecting simulation state data against normal operation via test in a simulation federation.

PHASE III DUAL USE APPLICATIONS: Phase III should focus on implementing the capability in a missile defense system and other DoD simulation federations.

REFERENCES:
1) G. Zheng, X. Ni and L. V. Kalé, "A scalable double in-memory checkpoint and restart scheme towards exascale," IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012), 2012, pp. 1-6, doi: 10.1109/DSNW.2012.6264677.
2) K. Dichev, D. De Sensi, D. S. Nikolopoulos, K. W. Cameron and I. Spence, "Power Log’n’Roll: Power-Efficient Localized Rollback for MPI Applications Using Message Logging Protocols," IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 6, pp. 1276-1288, 1 June 2022, doi: 10.1109/TPDS.2021.3107745.

KEYWORDS: Model; Simulation; M&S Frameworks; M&S Federations; Simulation Restart; Simulation Playback; State Saves; Checkpoint
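
ILLUSTRATIVE SKETCH (not part of the topic requirements): The save-point mechanism in the DESCRIPTION can be pictured with a small example. The code below is a minimal sketch under stated assumptions, not a proposed solution. The Federate contract (step, save_state, restore_state), the Counter toy federate, and the Federation class are all hypothetical names introduced here. The only change asked of a federate is two small methods, which is one possible reading of "minimally invasive". A real federation would also need to capture in-flight messages and framework state, and would run distributed rather than in one process.

```python
import copy
from typing import Any, Dict, Iterator, Protocol, Tuple


class Federate(Protocol):
    """Hypothetical minimal contract a federate would expose to the framework."""

    def step(self, t: int) -> None: ...
    def save_state(self) -> Any: ...
    def restore_state(self, state: Any) -> None: ...


class Counter:
    """Toy federate (assumption): accumulates gain * t at every time step."""

    def __init__(self, gain: int) -> None:
        self.gain = gain
        self.total = 0

    def step(self, t: int) -> None:
        self.total += self.gain * t

    def save_state(self) -> Any:
        # Only the data needed to resume is exposed; internals stay a black box.
        return {"total": self.total}

    def restore_state(self, state: Any) -> None:
        self.total = state["total"]


class Federation:
    """Toy framework (assumption): steps federates and records save-points."""

    def __init__(self, federates: Dict[str, Federate], save_every: int) -> None:
        self.federates = federates
        self.save_every = save_every
        self.time = 0
        self.save_points: Dict[int, Dict[str, Any]] = {}

    def _checkpoint(self) -> None:
        # Snapshot every federate's state as it is *before* step self.time runs.
        self.save_points[self.time] = {
            name: copy.deepcopy(f.save_state()) for name, f in self.federates.items()
        }

    def run(self, until: int) -> None:
        while self.time < until:
            if self.time % self.save_every == 0:
                self._checkpoint()
            for f in self.federates.values():
                f.step(self.time)
            self.time += 1

    def restart(self, save_time: int) -> None:
        # Re-initialize every federate from the chosen save-point.
        snapshot = self.save_points[save_time]
        for name, f in self.federates.items():
            f.restore_state(copy.deepcopy(snapshot[name]))
        self.time = save_time
        # Later save-points belong to the abandoned timeline, so drop them.
        self.save_points = {t: s for t, s in self.save_points.items() if t <= save_time}

    def playback(self) -> Iterator[Tuple[int, Dict[str, Any]]]:
        # Replay the recorded save-points in time order without stepping any model.
        for t in sorted(self.save_points):
            yield t, self.save_points[t]


# Usage: run, rewind to the save-point at t=5, and re-run to the same end time.
demo = Federation({"a": Counter(1), "b": Counter(2)}, save_every=5)
demo.run(10)
demo.restart(5)
demo.run(10)
print(demo.federates["a"].total)  # 45, the same value as the uninterrupted run
```

In this sketch the framework never inspects federate internals; it only stores and returns whatever save_state produces. Playback here replays only the recorded save-points, so its time resolution is limited by how often states are saved.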