Tampere University of Technology

TUTCRIS Research Portal

Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach

Research output: Book/ReportDoctoral thesisCollection of Articles

Standard

Service-based Fault Tolerance for Cyber-Physical Systems : A Systems Engineering Approach. / Alho, Pekka .

Tampere University of Technology, 2015. 104 p. (Tampere University of Technology. Publication; Vol. 1353).

Research output: Book/ReportDoctoral thesisCollection of Articles

Harvard

Alho, P 2015, Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach. Tampere University of Technology. Publication, vol. 1353, Tampere University of Technology.

APA

Alho, P. (2015). Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach. (Tampere University of Technology. Publication; Vol. 1353). Tampere University of Technology.

Vancouver

Alho P. Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach. Tampere University of Technology, 2015. 104 p. (Tampere University of Technology. Publication).

Author

Alho, Pekka . / Service-based Fault Tolerance for Cyber-Physical Systems : A Systems Engineering Approach. Tampere University of Technology, 2015. 104 p. (Tampere University of Technology. Publication).

Bibtex - Download

@book{a9a07df70a8347959248e315eeae1e60,
title = "Service-based Fault Tolerance for Cyber-Physical Systems: A Systems Engineering Approach",
abstract = "Cyber-physical systems (CPSs) comprise networked computing units that monitor and control physical processes in feedback loops. CPSs have potential to change the ways people and computers interact with the physical world by enabling new ways to control and optimize systems through improved connectivity and computing capabilities. Compared to classical control theory, these systems involve greater unpredictability which may affect the stability and dynamics of the physical subsystems. Further uncertainty is introduced by the dynamic and open computing environments with rapidly changing connections and system configurations. However, due to interactions with the physical world, the dependable operation and tolerance of failures in both cyber and physical components are essential requirements for these systems.The problem of achieving dependable operations for open and networked control systems is approached using a systems engineering process to gain an understanding of the problem domain, since fault tolerance cannot be solved only as a software problem due to the nature of CPSs, which includes close coordination among hardware, software and physical objects. The research methodology consists of developing a concept design, implementing prototypes, and empirically testing the prototypes. Even though modularity has been acknowledged as a key element of fault tolerance, the fault tolerance of highly modular service-oriented architectures (SOAs) has been sparsely researched, especially in distributed real-time systems. This thesis proposes and implements an approach based on using loosely coupled real-time SOA to implement fault tolerance for a teleoperation system.Based on empirical experiments, modularity on a service level can be used to support fault tolerance (i.e., the isolation and recovery of faults). Fault recovery can be achieved for certain categories of faults (i.e., non-deterministic and aging-related) based on loose coupling and diverse operation modes. The proposed architecture also supports the straightforward integration of fault tolerance patterns, such as FAIL-SAFE, HEARTBEAT, ESCALATION and SERVICE MANAGER, which are used in the prototype systems to support dependability requirements. For service failures, systems rely on fail-safe behaviours, diverse modes of operation and fault escalation to backup services. Instead of using time-bounded reconfiguration, services operate in best-effort capabilities, providing resilience for the system. This enables, for example, on-the-fly service changes, smooth recoveries from service failures and adaptations to new computing environments, which are essential requirements for CPSs.The results are combined into a systems engineering approach to dependability, which includes an analysis of the role of safety-critical requirements for control system software architecture design, architectural design, a dependability-case development approach for CPSs and domain-specific fault taxonomies, which support dependability case development and system reliability analyses. Other contributions of this work include three new patterns for fault tolerance in CPSs: DATA-CENTRIC ARCHITECTURE, LET IT CRASH and SERVICE MANAGER. These are presented together with a pattern language that shows how they relate to other patterns available for the domain.",
keywords = "Dependability, Patterns, Distributed systems, Reliability, Real-time systems, Service-oriented architecture, Software architecture, Systems engineering",
author = "Pekka Alho",
note = "Awarding institution:Tampere University of Technology",
year = "2015",
month = "12",
day = "11",
language = "English",
isbn = "978-952-15-3642-7",
series = "Tampere University of Technology. Publication",
publisher = "Tampere University of Technology",

}

RIS (suitable for import to EndNote) - Download

TY - BOOK

T1 - Service-based Fault Tolerance for Cyber-Physical Systems

T2 - A Systems Engineering Approach

AU - Alho, Pekka

N1 - Awarding institution:Tampere University of Technology

PY - 2015/12/11

Y1 - 2015/12/11

N2 - Cyber-physical systems (CPSs) comprise networked computing units that monitor and control physical processes in feedback loops. CPSs have potential to change the ways people and computers interact with the physical world by enabling new ways to control and optimize systems through improved connectivity and computing capabilities. Compared to classical control theory, these systems involve greater unpredictability which may affect the stability and dynamics of the physical subsystems. Further uncertainty is introduced by the dynamic and open computing environments with rapidly changing connections and system configurations. However, due to interactions with the physical world, the dependable operation and tolerance of failures in both cyber and physical components are essential requirements for these systems.The problem of achieving dependable operations for open and networked control systems is approached using a systems engineering process to gain an understanding of the problem domain, since fault tolerance cannot be solved only as a software problem due to the nature of CPSs, which includes close coordination among hardware, software and physical objects. The research methodology consists of developing a concept design, implementing prototypes, and empirically testing the prototypes. Even though modularity has been acknowledged as a key element of fault tolerance, the fault tolerance of highly modular service-oriented architectures (SOAs) has been sparsely researched, especially in distributed real-time systems. This thesis proposes and implements an approach based on using loosely coupled real-time SOA to implement fault tolerance for a teleoperation system.Based on empirical experiments, modularity on a service level can be used to support fault tolerance (i.e., the isolation and recovery of faults). Fault recovery can be achieved for certain categories of faults (i.e., non-deterministic and aging-related) based on loose coupling and diverse operation modes. The proposed architecture also supports the straightforward integration of fault tolerance patterns, such as FAIL-SAFE, HEARTBEAT, ESCALATION and SERVICE MANAGER, which are used in the prototype systems to support dependability requirements. For service failures, systems rely on fail-safe behaviours, diverse modes of operation and fault escalation to backup services. Instead of using time-bounded reconfiguration, services operate in best-effort capabilities, providing resilience for the system. This enables, for example, on-the-fly service changes, smooth recoveries from service failures and adaptations to new computing environments, which are essential requirements for CPSs.The results are combined into a systems engineering approach to dependability, which includes an analysis of the role of safety-critical requirements for control system software architecture design, architectural design, a dependability-case development approach for CPSs and domain-specific fault taxonomies, which support dependability case development and system reliability analyses. Other contributions of this work include three new patterns for fault tolerance in CPSs: DATA-CENTRIC ARCHITECTURE, LET IT CRASH and SERVICE MANAGER. These are presented together with a pattern language that shows how they relate to other patterns available for the domain.

AB - Cyber-physical systems (CPSs) comprise networked computing units that monitor and control physical processes in feedback loops. CPSs have potential to change the ways people and computers interact with the physical world by enabling new ways to control and optimize systems through improved connectivity and computing capabilities. Compared to classical control theory, these systems involve greater unpredictability which may affect the stability and dynamics of the physical subsystems. Further uncertainty is introduced by the dynamic and open computing environments with rapidly changing connections and system configurations. However, due to interactions with the physical world, the dependable operation and tolerance of failures in both cyber and physical components are essential requirements for these systems.The problem of achieving dependable operations for open and networked control systems is approached using a systems engineering process to gain an understanding of the problem domain, since fault tolerance cannot be solved only as a software problem due to the nature of CPSs, which includes close coordination among hardware, software and physical objects. The research methodology consists of developing a concept design, implementing prototypes, and empirically testing the prototypes. Even though modularity has been acknowledged as a key element of fault tolerance, the fault tolerance of highly modular service-oriented architectures (SOAs) has been sparsely researched, especially in distributed real-time systems. This thesis proposes and implements an approach based on using loosely coupled real-time SOA to implement fault tolerance for a teleoperation system.Based on empirical experiments, modularity on a service level can be used to support fault tolerance (i.e., the isolation and recovery of faults). Fault recovery can be achieved for certain categories of faults (i.e., non-deterministic and aging-related) based on loose coupling and diverse operation modes. The proposed architecture also supports the straightforward integration of fault tolerance patterns, such as FAIL-SAFE, HEARTBEAT, ESCALATION and SERVICE MANAGER, which are used in the prototype systems to support dependability requirements. For service failures, systems rely on fail-safe behaviours, diverse modes of operation and fault escalation to backup services. Instead of using time-bounded reconfiguration, services operate in best-effort capabilities, providing resilience for the system. This enables, for example, on-the-fly service changes, smooth recoveries from service failures and adaptations to new computing environments, which are essential requirements for CPSs.The results are combined into a systems engineering approach to dependability, which includes an analysis of the role of safety-critical requirements for control system software architecture design, architectural design, a dependability-case development approach for CPSs and domain-specific fault taxonomies, which support dependability case development and system reliability analyses. Other contributions of this work include three new patterns for fault tolerance in CPSs: DATA-CENTRIC ARCHITECTURE, LET IT CRASH and SERVICE MANAGER. These are presented together with a pattern language that shows how they relate to other patterns available for the domain.

KW - Dependability

KW - Patterns

KW - Distributed systems

KW - Reliability

KW - Real-time systems

KW - Service-oriented architecture

KW - Software architecture

KW - Systems engineering

M3 - Doctoral thesis

SN - 978-952-15-3642-7

T3 - Tampere University of Technology. Publication

BT - Service-based Fault Tolerance for Cyber-Physical Systems

PB - Tampere University of Technology

ER -