Recovery testing is a type of system testing that verifies whether a system can recover from a failure. It involves exercising the recovery process by deliberately causing the system to fail. Besides testing the system's fault-tolerance capabilities, the test engineer also checks whether the system can resume operation within a pre-defined time. Recovery testing is an essential part of system testing and a top requirement for mission-critical systems such as defense systems, medical devices, and banking applications.
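The check that a system resumes within a pre-defined time can be sketched in Python as follows. This is only an illustration under stated assumptions: `measure_recovery_time`, `FakeService`, and the deadline values are hypothetical names invented here, and the fake service stands in for whatever real failure injection and health check a project would use.

```python
import time
from typing import Callable


def measure_recovery_time(
    inject_failure: Callable[[], None],
    is_healthy: Callable[[], bool],
    deadline_s: float,
    poll_interval_s: float = 0.01,
) -> float:
    """Cause a failure, then poll until the system reports healthy.

    Returns the elapsed seconds, or raises TimeoutError if the system
    has not recovered within `deadline_s`.
    """
    inject_failure()
    start = time.monotonic()
    while True:
        if is_healthy():
            return time.monotonic() - start
        if time.monotonic() - start > deadline_s:
            raise TimeoutError(f"no recovery within {deadline_s} s")
        time.sleep(poll_interval_s)


class FakeService:
    """Hypothetical stand-in for the system under test.

    After crash(), it reports unhealthy for a fixed number of health checks.
    """

    def __init__(self, checks_until_recovered: int) -> None:
        self.checks_until_recovered = checks_until_recovered
        self.remaining = 0

    def crash(self) -> None:
        self.remaining = self.checks_until_recovered

    def healthy(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


service = FakeService(checks_until_recovered=3)
elapsed = measure_recovery_time(service.crash, service.healthy, deadline_s=1.0)
print(f"recovered in {elapsed:.3f} s")
```

In a real test, `inject_failure` might stop a process or cut a network link, and `is_healthy` might call a status endpoint; both are left abstract here on purpose.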
A software system is expected to recover from failures when −
A tester must follow these six crucial steps while performing a recovery test −
Step 1 − Recovery analysis – It is important that the system under test can allocate extra resources such as multiple CPUs and servers. This helps the team understand how each recovery-related change affects the working structure of the system. Besides reporting possible failures, the testers must also analyze the impact and severity of each failure.
Step 2 − Preparing Test Plan – Once recovery analysis results are recorded, the testing team prepares the test plan.
Step 3 − Preparing Test Environment – This step involves evaluating the results from the recovery analysis process and designing the test environment.
Step 4 − Back-up Maintenance – To recover from any data loss during the test, the testing team backs up all information related to the system, software, and database. Essential data should be backed up and stored in multiple locations for safety.
Step 5 − Allocating recovery personnel – Since the process is divided into steps, dedicated recovery personnel are allocated to each step.
Step 6 − Documentation – Each step performed before and during the test is documented so that it can be analyzed carefully when a failure is encountered.
Make sure to follow these steps while performing a recovery test −
You must create a testbed that resembles the real deployment conditions. Ensure that the interface, firmware, protocol, software, and hardware are as close as possible to the real conditions.
Test as exhaustively as possible, even when it is costly. An identical configuration and a complete check are necessary.
Try to run the test on the actual hardware to which you intend to restore the program after the test.
For the backup system, try to get a drive of similar size to the one from which the backup was taken.
Try to discard obsolete technology and use the latest hardware/firmware to avoid compatibility issues. If such issues arise, you can create a virtual machine that emulates the hardware with the same disk sizes and configuration.
Creating an online backup system is a good way to avoid exposure to media problems. Most online backup systems are reliable. However, you should still check their restore capability, retrieval functionality, encryption level, and overall security.
System failures are quite common, so it is crucial to ensure that the system can recover with ease, causing minimal or no damage to the user’s essential data.
Here are some examples of how recovery testing contributes to system performance −
Let’s say you are running multiple sessions in your browser and the network goes down or the system shuts off due to a power failure or some other condition. In recovery testing, testers put the browser through such a scenario and verify that it recovers from the failure and that all previous sessions are restored completely after a system restart.
Suppose you are streaming a movie over a network and the software suddenly crashes or you click the close button by mistake. Will you be able to resume the movie from where you left off? In recovery testing, testers simulate a failure by unplugging the system power and plugging it back in to ensure that the app recovers and resumes receiving data from the point of failure.
Assume you are sitting in a café and downloading a movie over the café’s Wi-Fi. Suddenly you move to a non-Wi-Fi zone and the download stops midway. If you move back into the Wi-Fi zone, will the download resume from the same point, or will you have to download the file again? Testers simulate the whole process to check the recovery rate of the software.
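The interrupted-download scenario above could be simulated along these lines. This is a minimal sketch, not a real network client: the `download` function, the `fail_after` fault-injection parameter, and the `ConnectionLost` exception are all hypothetical names made up for this illustration, and the "network" is just an in-memory byte string.

```python
import os
import tempfile
from typing import Optional


class ConnectionLost(Exception):
    """Raised by the simulated network when the link drops."""


def download(source: bytes, dest_path: str, chunk_size: int = 4,
             fail_after: Optional[int] = None) -> int:
    """Append the missing part of `source` to `dest_path`.

    The transfer resumes from the number of bytes already on disk.
    `fail_after` is a fault-injection knob: once that many chunks have been
    written in this call, ConnectionLost is raised.
    Returns the number of chunks written by this call.
    """
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    chunks_written = 0
    with open(dest_path, "ab") as out:
        while offset < len(source):
            if fail_after is not None and chunks_written >= fail_after:
                raise ConnectionLost(f"link dropped at byte {offset}")
            chunk = source[offset:offset + chunk_size]
            out.write(chunk)
            offset += len(chunk)
            chunks_written += 1
    return chunks_written


def run_recovery_test() -> dict:
    """Interrupt a download, resume it, and report what happened."""
    payload = bytes(range(40))  # stands in for the movie file
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "movie.part")
        try:
            download(payload, dest, fail_after=3)  # leave the Wi-Fi zone
        except ConnectionLost:
            pass
        bytes_before_resume = os.path.getsize(dest)
        chunks_after_resume = download(payload, dest)  # back in range
        with open(dest, "rb") as f:
            intact = f.read() == payload
    return {
        "bytes_before_resume": bytes_before_resume,
        "chunks_after_resume": chunks_after_resume,
        "intact": intact,
    }


print(run_recovery_test())
```

The test passes only if the second call transfers just the missing chunks and the final file matches the original, which is the "resume from the same point" behaviour the example describes.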
A restoration strategy is planned by the recovery team to retrieve important code and data and bring all operations back to their normal state. The strategy may differ based on the criticality of the system.
A mockup restoration strategy −
Take at least one backup, or several if the stakes are high
If you take multiple backups, store them in different places
Choose an online backup, an offline backup, or both if necessary
Check the policy to determine whether backups can run automatically or must be done manually
It would help to have an independent restoration team or development team by your side
Note − The cost of the restoration strategy will increase if you opt for multiple backups or hire an independent team.
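As a rough illustration of keeping multiple backups in different places, the sketch below copies one folder to several destinations using Python's standard `shutil.copytree`. The helper names `back_up_to` and `demo_backup`, the folder names, and the two "sites" are assumptions for the example; in practice the destinations would be separate disks, machines, or an online service.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_to(source_dir: Path, destinations: list) -> list:
    """Copy `source_dir` into every destination root and return the copies."""
    created = []
    for dest_root in destinations:
        target = Path(dest_root) / source_dir.name
        # dirs_exist_ok (Python 3.8+) lets a repeated backup overwrite in place.
        shutil.copytree(source_dir, target, dirs_exist_ok=True)
        created.append(target)
    return created


def demo_backup() -> list:
    """Back up a tiny folder to two hypothetical locations."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "app_data")
        source.mkdir()
        (source / "settings.ini").write_text("mode=production")
        # Both locations live in one temp folder only to keep the demo self-contained.
        locations = [Path(tmp, "site_a"), Path(tmp, "site_b")]
        copies = back_up_to(source, locations)
        return [(c.parent.name, (c / "settings.ini").read_text()) for c in copies]


print(demo_backup())
```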
Once all files and folders are restored, make sure to do the following activities −
Count the files in the restored folders and make sure the count matches that of the backup folder.
Rename the corrupted document folder.
Check whether the restored files are working. Open them with applications that you use normally. Also, check if you can browse, update, and modify the data.
Open all types of files, such as documents, music, pictures, and videos, including both small and large ones.
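Part of this checklist can be automated. The sketch below counts the files in a backup and in its restored copy and compares their SHA-256 checksums, so missing or corrupted files show up without opening each one by hand. The function names and the sample files are hypothetical, and checksums do not replace opening files in their usual applications; they only confirm the bytes match.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map every file under `root` (by relative path) to its SHA-256 digest."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            # read_bytes() loads the whole file; very large files would be
            # better hashed in chunks.
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(backup_dir: Path, restored_dir: Path) -> dict:
    """Compare file counts and contents of a backup and its restored copy."""
    backup = file_digests(backup_dir)
    restored = file_digests(restored_dir)
    common = backup.keys() & restored.keys()
    return {
        "backup_count": len(backup),
        "restored_count": len(restored),
        "missing": sorted(backup.keys() - restored.keys()),
        "unexpected": sorted(restored.keys() - backup.keys()),
        "corrupted": sorted(n for n in common if backup[n] != restored[n]),
    }


def demo_compare() -> dict:
    """Build a tiny backup and a deliberately damaged restore, then compare."""
    with tempfile.TemporaryDirectory() as tmp:
        backup, restored = Path(tmp, "backup"), Path(tmp, "restored")
        for root in (backup, restored):
            (root / "docs").mkdir(parents=True)
            (root / "docs" / "report.txt").write_text("quarterly report")
            (root / "song.mp3").write_bytes(b"\x00\x01\x02")
        (restored / "song.mp3").write_bytes(b"\x00\x01\xff")  # simulate corruption
        (restored / "docs" / "report.txt").unlink()           # simulate a lost file
        return compare_restore(backup, restored)


print(demo_compare())
```

A clean restore would report equal counts and empty `missing`, `unexpected`, and `corrupted` lists.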
System or application failure is inevitable regardless of how much you spend on development. Recovery testing helps eliminate critical bugs and makes your system ready to recover from future failures. It is a continuous and time-consuming process: the system is tested for failure repeatedly until it is free of all critical failures.