What is Information System Resilience?


The resilience of an information system is its capacity to function under adverse conditions or stress while preserving its critical operational capabilities. In general, a system is resilient if it continues to fulfill its objective despite hardship (i.e., if it continues to provide its required capabilities despite stresses severe enough to cause disruptions).

Resilience is vital because, no matter how well-engineered a system is, reality will conspire to disrupt it sooner or later. Residual software or hardware faults will eventually cause the system to fail to perform a required function or to fail to meet one or more of its quality requirements (e.g., availability, capacity, interoperability, performance, reliability, robustness, safety, security, and usability). Accidents can occur when a safeguard is missing or fails. An attacker can breach a system by exploiting an undiscovered or unpatched security flaw. Service can be disrupted by problems in the external environment (for example, a power outage or excessive temperature).

To be resilient, a system must have controls that detect undesirable events and situations, respond correctly to the resulting disruptions, and recover quickly afterwards. Controls that prevent adversity are outside the scope of resilience, because resilience presumes that adverse events and situations will occur. Because such disruptions are unavoidable, availability and dependability alone are insufficient; a system must also be robust. Despite disruptions caused by adverse events and situations, it must withstand adversity and maintain continuity of service, possibly in a degraded mode of operation. It must also recover quickly from any damage those disruptions cause.

Implementing resilience in a system requires both analytic and holistic methods. Architecting, together with its associated heuristics, is essential. The inputs are the intended level of resilience and the characteristics of the threat or disruption. The outputs are the properties of the system, especially its architectural characteristics and the nature of its elements (e.g., hardware, software, or humans).

The artifacts produced depend on the system's domain: for technological systems they are specifications and architectural descriptions, while for enterprise systems they are enterprise strategies. Analytic methods are used to determine the required level of robustness, whereas holistic methods are used to determine the required levels of adaptability, tolerance, and integrity. A common mistake is to rely on a single method alone.

Roles of Information System Resilience

The following are the roles of a resilient information system:

Resistance

Resistance is the system's capacity to passively prevent or limit harm during an adverse event or situation. Examples of passive resistance techniques include a modular architecture that prevents failures from propagating between modules, the elimination of single points of failure, and the shielding of electrical equipment, computers, and networks against electromagnetic pulses.

Detection

Detection is the system's capacity to actively detect (through detection techniques) adverse events and situations. For example, a resilient information system can detect the degradation or loss of critical capabilities, as well as adverse events and situations that might jeopardize essential capabilities or assets.
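One common detection technique for the loss of a capability is a heartbeat (timeout-based) failure detector: each component periodically reports that it is alive, and a monitor suspects any component that stays silent for too long. The sketch below is a minimal illustration under stated assumptions; the `HeartbeatMonitor` class, its `timeout` parameter, and the component names are hypothetical, and timestamps are passed in explicitly (rather than read from a clock) so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat-based detector: a component is suspected
    of having failed if it has not reported within `timeout` seconds."""
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, component: str, now: float) -> None:
        # Record the time of the component's most recent "I am alive" signal.
        self.last_seen[component] = now

    def suspected(self, now: float) -> list[str]:
        # Components whose last heartbeat is older than the timeout.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=5.0)
monitor.heartbeat("database", now=0.0)
monitor.heartbeat("web", now=0.0)
monitor.heartbeat("web", now=8.0)   # "web" keeps reporting; "database" goes silent
print(monitor.suspected(now=10.0))  # ['database']
```

A timeout only yields a suspicion: a slow but healthy component can be misjudged as failed, so the choice of timeout trades detection speed against false alarms.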

Response

Response is the system's capacity to actively react to an ongoing adverse event or to an existing undesirable situation (the reaction is implemented by response techniques).

Once it detects adversity, the system can halt, avoid, or eliminate it, thereby preventing or reducing further harm. Examples of response techniques include exception handling, degraded modes of operation, and redundancy with voting.
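The three response techniques named above can work together. The following is a minimal sketch, not a prescribed design: a computation is run on three redundant replicas and the result is accepted only if a strict majority agrees (redundancy with voting); a replica that raises is contained and simply casts no vote (exception handling); and if no majority exists, the caller switches to a fallback value (degraded mode). All function names, the replicas themselves, and the `fallback` value are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Replica = Callable[[int], int]


class NoMajorityError(Exception):
    """Raised when the redundant replicas fail to reach a strict majority."""


def majority_vote(replicas: Sequence[Replica], x: int) -> int:
    """Redundancy with voting: run every replica and accept the majority result."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(x))
        except Exception:
            # A crashed replica casts no vote; its failure is contained here.
            continue
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(replicas) // 2:  # strict majority of ALL replicas
            return value
    raise NoMajorityError(f"no majority among results {results!r}")


def compute_with_fallback(x: int, replicas: Sequence[Replica], fallback: int) -> tuple[int, str]:
    """Exception handling plus a degraded mode: if voting fails, return a
    fallback value (e.g. a last-known-good result) instead of crashing."""
    try:
        return majority_vote(replicas, x), "normal"
    except NoMajorityError:
        return fallback, "degraded"


# Simulated replicas (hypothetical): one correct, one with a fault, one that crashes.
def replica_ok(x: int) -> int:
    return x * x


def replica_faulty(x: int) -> int:
    return x * x + 1  # simulated residual fault


def replica_crashed(x: int) -> int:
    raise RuntimeError("replica unavailable")


print(compute_with_fallback(3, [replica_ok, replica_ok, replica_faulty], fallback=0))
# (9, 'normal'): two of three replicas agree, so the faulty result is outvoted
print(compute_with_fallback(3, [replica_ok, replica_faulty, replica_crashed], fallback=0))
# (0, 'degraded'): no value has a majority, so the system degrades
```

Requiring a majority of all replicas, rather than of those that answered, is a deliberately conservative choice in this sketch: when too many replicas are down, the system degrades instead of trusting a single surviving answer.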

Recovery

Recovery is the system's capacity to actively recover from harm after an adverse event has occurred (recovery is implemented by recovery techniques). Recovery is complete when all damaged or destroyed assets have been repaired or replaced, restoring the system to full operational capability.

Recovery may also be partial (e.g., full service is restored using redundant resources, without repairing or replacing the damaged ones) or minimal (e.g., operating in a degraded mode that provides only limited services). Recovery might also involve the system changing or adapting (for example, by redesigning itself) to prevent the adverse events or situations from recurring.
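The three levels of recovery described above can be illustrated with a simple failover model. This is a minimal sketch under assumptions not stated in the text: a service needs `required` healthy servers for full service, failed servers can be bypassed by promoting healthy standby servers, and the `Server` class, the `recover` function, and the extra "failed" outcome (nothing left in service) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


def recover(active: list[Server], spares: list[Server], required: int) -> tuple[str, list[str]]:
    """Classify the recovery level reachable with the given resources.

    complete: every active server is healthy (repaired or replaced)
    partial:  full service restored by promoting healthy spares,
              without repairing the failed servers
    minimal:  too little capacity, so run in a degraded mode
    """
    in_service = [s for s in active if s.healthy]
    if len(in_service) == len(active) and len(in_service) >= required:
        return "complete", [s.name for s in in_service]
    # Failover: promote healthy standby servers until capacity is restored.
    for spare in spares:
        if len(in_service) >= required:
            break
        if spare.healthy:
            in_service.append(spare)
    names = [s.name for s in in_service]
    if len(in_service) >= required:
        return "partial", names
    if in_service:
        return "minimal", names
    return "failed", names


a, b, spare = Server("a"), Server("b"), Server("spare")
print(recover([a, b], [spare], required=2))  # ('complete', ['a', 'b'])
b.healthy = False
print(recover([a, b], [spare], required=2))  # ('partial', ['a', 'spare'])
spare.healthy = False
print(recover([a, b], [spare], required=2))  # ('minimal', ['a'])
```

In this model, complete recovery is reached only once the failed server itself is repaired or replaced (setting `b.healthy = True` again); promoting the spare restores full service but leaves the system with less redundancy than before.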

Updated on: 04-May-2022
