How to perform a Disaster Recovery test
In this article, you’ll find a step-by-step guide on how to conduct a Disaster Recovery (DR) test by executing a manual failover.
To successfully conduct this test, the Primary member must be abruptly shut down.
Requirements
- Two senhasegura instances must be available.
- Instances must be in the same cluster and operating correctly. For more information on cluster settings, refer to the article How to create a cluster.
- Take a snapshot of the instances.
This test is intended for two instances as cluster members. Before starting, check the following tags at the bottom of each member:
Member A (Primary)
Application: Production and Enabled.
Replication: Primary.
Member B (Secondary - contingency)
Application: Contingency and Disabled.
Replication: Non-primary.
Step 1: Take snapshots
Before conducting the test, it’s crucial to take a snapshot of the instances as a precaution since abrupt shutdowns can cause damage.
Always take snapshots in reverse order of the cluster. In this case, first take a snapshot of Member B and then of Member A.
To take a snapshot, follow these steps:
- Access the instance.
- Run the following command to shut it down:
sudo orbit shutdown
- When the instance is completely shut down, take the snapshot in the hypervisor.
- Then, start the instance again and verify that it is operating normally.
Step 2: Validate the cluster
- Access Orbit Config Manager > Replication > Elasticsearch.
- In the Data search cluster and the Cluster members tables, check that the cluster size is 2.
Step 3: Configure the Recovery
- On Member B, access Orbit Config Manager > Settings > Recovery.
- Enter the allowed origin IPs to perform system recovery.
Make sure not to use Wildcards (*).
This list will make the Assume as Primary button visible to users.
When using subnet masks, adopt the CIDR notation, for example, 192.168.1.0/24.
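The two rules above (no wildcards, CIDR notation for subnets) can be pre-checked before typing the list into the Recovery settings. The following is a minimal POSIX shell sketch, not a senhasegura tool: the function names `is_octet` and `check_origin` are hypothetical, and the check covers only plain IPv4 addresses and IPv4 CIDR blocks. The Recovery field itself may accept other forms.

```shell
# Hypothetical helpers, not part of senhasegura.

# Succeeds if the argument is a decimal number in the range 0..255.
is_octet() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
  esac
  [ "${#1}" -le 3 ] && [ "$1" -le 255 ]
}

# Succeeds only if the entry is an IPv4 address or an IPv4 CIDR block
# and contains no wildcard character.
check_origin() {
  _entry=$1

  # Wildcards are not allowed in the Recovery origin list.
  case $_entry in
    *'*'*) return 1 ;;
  esac

  # Split off an optional /prefix and require it to be 0..32.
  case $_entry in
    */*)
      _addr=${_entry%%/*}
      _prefix=${_entry#*/}
      case $_prefix in
        ''|*[!0-9]*) return 1 ;;
      esac
      [ "${#_prefix}" -le 2 ] && [ "$_prefix" -le 32 ] || return 1
      ;;
    *) _addr=$_entry ;;
  esac

  # Require exactly four dot-separated parts.
  case $_addr in
    *.*.*.*.*) return 1 ;;
    *.*.*.*) ;;
    *) return 1 ;;
  esac

  # Peel the four octets off one at a time and validate each.
  _o1=${_addr%%.*}; _rest=${_addr#*.}
  _o2=${_rest%%.*}; _rest=${_rest#*.}
  _o3=${_rest%%.*}; _o4=${_rest#*.}

  is_octet "$_o1" && is_octet "$_o2" && is_octet "$_o3" && is_octet "$_o4"
}
```

For example, `check_origin "192.168.1.0/24"` succeeds, while `check_origin "192.168.*.*"` fails because of the wildcard.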
Step 4: Execute the Disaster Recovery Test
- Force an abrupt shutdown on Member A.
Ensure it’s an abrupt shutdown; otherwise, the cluster will detect the deactivation, and Member B will not display the Recovery page.
- Once Member A becomes unexpectedly unavailable, Member B enters a split-brain state and blocks all database changes until it receives manual instructions.
- Then, the Recovery page will be displayed on the web application.
- Click Assume as Primary.
- Confirm by clicking Yes. This will set Member B as the new Primary member. This process may take a few minutes.
Ensure the button appears; otherwise, refer to How to enable Recovery to ensure IPs are configured correctly.
- Once the Orbit Web interface is available on Member B, check if the tag indicates that this instance is now the Primary member.
- To access other senhasegura modules, you need to enable the application. Go to Orbit Config Manager > Settings > Application, and toggle the Enable application button to the active position.
- Click Save.
If the green color is displayed, then the application is activated.
- Log out and log in again to access other modules.
After these steps, all senhasegura functionalities will be available and operational on the DR Member B.
Step 5: Recover the Primary Member
- Activate Member A and wait for synchronization with the other cluster database. This may take a few minutes.
Member A will identify the issue, and Member B, currently Primary, will automatically synchronize new information between members.
- After synchronization, the login page will be displayed on the main web application interface.
- Log in to Member A's web application and click Assume as Primary to restore it as the Primary member.
- On Member B, go to Orbit Config Manager > Settings > Application, and toggle the Enable application button to the inactive position.
- Click Save.
Make sure the green color is not displayed.
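Waiting for Member A to finish synchronizing, as described in the first item of this step, can be scripted with a simple polling loop instead of refreshing the browser by hand. The sketch below is a generic retry helper in POSIX shell. The name `wait_until` is hypothetical, and the probe command you pass to it is your own choice, not something defined by senhasegura.

```shell
# Hypothetical helper, not part of senhasegura: retries a probe command until
# it succeeds or the attempt limit is reached.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    # Run the probe; stop as soon as it exits successfully.
    if "$@"; then
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  # The probe never succeeded within the allowed attempts.
  return 1
}
```

One possible probe, assuming Member A's web interface is reachable at the placeholder address `MEMBER_A_HOST` and uses a certificate your machine may not trust, is `wait_until 60 10 curl -k -s -f -o /dev/null "https://MEMBER_A_HOST/"`. A successful response only shows that the web server is answering, so still confirm in the browser that the login page is the one displayed.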
Step 5.1 (alternative): Recover the Primary Member via SSH
- Initiate an SSH session on Member A using port 59022 with the user mt4adm.
- Run the command sudo orbit application status to check the following information:
sudo orbit application status
Application: Active
Replication: Active
Instance: Cluster
Primary: memberB
Main: No
- Then, execute the command sudo orbit application primary to set Member A as Primary:
sudo orbit application primary
Application: Active
Replication: Active
Instance: Cluster
Primary: memberA
Main: Yes
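To confirm the result without reading the output by eye, the status text can be parsed for individual fields. This is a minimal sketch that assumes the command prints `Key: value` lines exactly as shown above. The function names `status_field` and `is_main` are hypothetical and not part of the orbit CLI.

```shell
# Hypothetical helpers, not part of the orbit CLI.

# Prints the value of one "Key: value" field from status text on standard input.
status_field() {
  awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Succeeds if the status text on standard input reports "Main: Yes".
is_main() {
  [ "$(status_field Main)" = "Yes" ]
}
```

Inside the SSH session on Member A, `sudo orbit application status | status_field Primary` would print the member currently reported as Primary, and `sudo orbit application status | is_main` would succeed only once Member A reports `Main: Yes`. If the real output carries extra whitespace or different labels, the field matching needs adjusting.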