Architecting Operational Resilience in Cloud-Native and Cyber-Physical Systems: Integrating Chaos Engineering, Redundancy Design, Bayesian Modeling, and Reactive Execution Frameworks

Authors

  • Dr. Mateo Laurent Dubois, Department of Computer Science and Systems Engineering, University of Montreal, Canada

Keywords:

Operational resilience, chaos engineering, cloud-native systems

Abstract

Operational resilience has emerged as a defining requirement for cloud-native infrastructures, microservices ecosystems, distributed artificial intelligence platforms, and cyber-physical systems. As critical infrastructure processes migrate toward highly virtualized, containerized, and decentralized architectures, traditional reliability engineering approaches no longer capture dynamic interdependencies, adversarial threats, and cascading systemic failures. This research develops an integrative theoretical framework for operational resilience grounded in the established literature on chaos engineering, resilience engineering in cloud offerings, Bayesian network modeling of infrastructure resilience, redundancy allocation strategies, process resilience analysis, reactive execution models, and adversarial resilience in communication networks. The study synthesizes principles from chaos experimentation in digital twins and cyber-physical systems, microservices fault tolerance design, redundancy theory in mechanical and communication systems, and resilience-based supplier and infrastructure modeling. Methodologically, the research constructs a layered conceptual model that links four elements: proactive disruption testing, redundancy structuring, probabilistic dependency mapping, and reactive operational orchestration. The findings indicate that operational resilience in cloud-native and cyber-physical environments emerges from the interplay between controlled fault injection, probabilistic awareness of interdependencies, strategic redundancy allocation, and adaptive execution models. The discussion elaborates the theoretical implications for distributed AI resilience, microservices orchestration, and critical infrastructure protection, and it highlights limitations related to model complexity, adversarial unpredictability, and cost constraints. The study concludes that integrating chaos engineering with Bayesian resilience modeling and redundancy optimization offers a comprehensive paradigm for resilient digital and cyber-physical infrastructures in high-volume operational environments.
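
As a purely illustrative aside, the interplay between controlled fault injection and redundancy allocation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's model: replicas are assumed to fail independently, the per-replica availability of 0.9 is an arbitrary placeholder, and the function names (analytic_availability, chaos_trial, estimate_availability) are hypothetical. It compares the closed-form availability of a service with n redundant replicas, 1 - (1 - p)^n, against an estimate obtained by injecting random faults.

```python
import random


def analytic_availability(replica_up_prob: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent replicas is up."""
    return 1.0 - (1.0 - replica_up_prob) ** replicas


def chaos_trial(replica_up_prob: float, replicas: int, rng: random.Random) -> bool:
    """One fault-injection trial: each replica independently stays up with
    probability `replica_up_prob`; the service survives if any replica is up."""
    return any(rng.random() < replica_up_prob for _ in range(replicas))


def estimate_availability(replica_up_prob: float, replicas: int,
                          trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of service availability under random fault injection."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    survived = sum(chaos_trial(replica_up_prob, replicas, rng) for _ in range(trials))
    return survived / trials


if __name__ == "__main__":
    p = 0.9  # assumed per-replica availability (illustrative value only)
    for n in (1, 2, 3):
        print(n, analytic_availability(p, n), estimate_availability(p, n))
```

Under the independence assumption the simulated estimate should track the closed-form value closely; the gap between the two in a real system is one rough indicator of the correlated or cascading failures that probabilistic dependency mapping is meant to expose.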

References

Aziz FM, Li L, Shamma JS, Stüber GL (2020) Resilience of LTE eNode B against smart jammer in infinite-horizon asymmetric repeated zero-sum game. Physical Communication 39:100989.

Chinamanagonda S (2023) Focus on resilience engineering in cloud offerings. Academia Nexus Journal 2(1).

Dedousis P, Stergiopoulos G, Arampatzis G, Gritzalis D (2023) Enhancing operational resilience of critical infrastructure processes through chaos engineering. IEEE Access 11:106172–106189.

Fogli M, Giannelli C, Poltronieri F, Stefanelli C, Tortonesi M (2023) Chaos engineering for resilience assessment of digital twins. IEEE Transactions on Industrial Informatics 20(2):1134–1143.

Gholinezhad H, Zeinal Hamadani A (2017) A new model for the redundancy allocation problem with component mixing and mixed redundancy strategy. Reliability Engineering & System Safety 164:66–73.

Gibson DV, Mendleson BE (1984) Redundancy. Journal of Business Communication 21:43–61.

Gosselin C, Schreiber L-T (2018) Redundancy in parallel mechanisms: a review. Applied Mechanics Reviews 70.

Hebbar KS (2024) Evolving high-volume systems: reactive execution models for resilient operations. Computer Fraud and Security 2024(4):49–58. https://computerfraudsecurity.com/index.php/journal/article/view/906/638

Hosseini S, Barker K (2016a) A Bayesian network model for resilience-based supplier selection. International Journal of Production Economics 180:68–87.

Hosseini S, Barker K (2016b) Modeling infrastructure resilience using Bayesian networks: a case study of inland waterway ports. Computers & Industrial Engineering 93:252–266.

Jain P, Pasman HJ, Waldram S, Pistikopoulos EN, Mannan MS (2018) Process resilience analysis framework (PRAF): a systems approach for improved risk and safety management. Journal of Loss Prevention in the Process Industries 53:61–73.

Konstantinou C, Stergiopoulos G, Parvania M, Esteves-Verissimo P (2021) Chaos engineering for enhanced resilience of cyber-physical systems. Resilience Week (RWS), IEEE.

Rehak D, Senovsky P, Slivkova S (2018) Resilience of critical infrastructure elements and its main factors. Systems 6.

Published

2026-02-25

How to Cite

Dr. Mateo Laurent Dubois. (2026). Architecting Operational Resilience in Cloud-Native and Cyber-Physical Systems: Integrating Chaos Engineering, Redundancy Design, Bayesian Modeling, and Reactive Execution Frameworks. Ethiopian International Journal of Multidisciplinary Research, 13(2), 1343–1348. Retrieved from https://www.eijmr.org/index.php/eijmr/article/view/5309