A Case Study On The Application of Software Health Management Techniques

Abstract

Ever increasing complexity of software used in large-scale, safety critical cyber-physical systems makes it increasingly difficult to expose and thence correct all potential bugs. There is a need to augment the existing fault tolerance methodologies with new approaches that address latent software bugs exposed at runtime. This paper describes an approach that borrows and adapts traditional textquoteleftSystems Health Managementtextquoteright techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis and mitigation strategies. The two-level approach of Health Management at Component and System level is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). That subsystem was categorized as the primary failure source for the in-flight upset caused in the Malaysian Air flight 124 over Perth, Australia in August 2005.

Type