How to perform a Disaster Recovery test
  • 3 minutes to read
  • Dark
    Light
  • PDF

How to perform a Disaster Recovery test

  • Dark
    Light
  • PDF

Article Summary

In this article, you’ll find a step-by-step guide on how to conduct a Disaster Recovery (DR) test by executing a manual failover.

Important

To successfully conduct this test, the Primary member must be abruptly shut down.

Requirements

  • Two senhasegura instances must be available.
  • Instances must be in the same cluster and operating correctly. For more information on cluster settings, refer to the article How to create a cluster.
  • Take a snapshot of the instances.

This test is intended for two instances as cluster members. Before starting, check the following tags at the bottom of each member:

Member A (Primary)

  • Application: Production and Enabled.

  • Replication: Primary.

Member B (Secondary - contingency)

  • Application: Contingency and Disabled.

  • Replication: Non-primary.

Step 1: Take snapshots

Before conducting the test, it’s crucial to take a snapshot of the instances as a precaution since abrupt shutdowns can cause damage.

Important

Always take snapshots in reverse order of the cluster. In this case, first take a snapshot of Member B and then of Member A.

To take a snapshot, follow these steps:

  1. Access the instance.
  2. Run the following command to shut it down:
sudo orbit shutdown

  1. When the instance is completely shut down, take the snapshot in the hypervisor.
  2. Then, restart the instance and verify if the operation has been restored.

Step 2: Validate the cluster

  1. Access Orbit Config Manager > Replication > Elasticsearch.
  2. In the Data search cluster and the Cluster members tables, check if the cluster size corresponds to 2.

Step 3: Configure the Recovery

  1. On Member B, access Orbit Config Manager > Settings > Recovery.
  2. Enter the allowed origin IPs to perform system recovery.
Caution

Make sure not to use Wildcards (*).

Info

This list will make the Assume as Primary button visible to users.

Caution

When using subnet masks, adopt the CIDR notation, for example, 192.168.1.0/24.

Step 4: Execute the Disaster Recovery Test

  1. Force an abrupt shutdown on Member A.
Caution

Ensure it’s an abrupt shutdown; otherwise, the cluster will detect the deactivation, and Member B will not display the Recovery page.

  1. Once Member A is inactive due to unexpected behavior, Member B will enter a split brain, blocking any database changes until manual instructions.
  2. Then, the Recovery page will be displayed on the web application.
  3. Click Assume as Primary.
  4. Confirm by clicking Yes. This will set Member B as the new Primary member. This process may take a few minutes.
Info

Ensure the button appears; otherwise, refer to How to enable Recovery to ensure IPs are configured correctly.

  1. Once the Orbit Web interface is available on Member B, check if the tag indicates that this instance is now the Primary member.
  2. To access other senhasegura modules, you need to enable the application. Go to Orbit > Settings > Application, and toggle the Enable application button to the active position.
  3. Click Save.

If the green color is displayed, then the application is activated.

  1. Log out and log in again to access other modules.

After these steps, all senhasegura functionalities will be available and operational on the DR Member B.

Step 5: Recover the Primary Member

  1. Activate Member A and wait for synchronization with the other cluster database. This may take a few minutes.
Info

Member A will identify the issue, and Member B, currently Primary, will automatically synchronize new information between members.

  1. After synchronization, the login page will be displayed on the main web application interface.
  2. Log in to Member A's web application and click Assume as Primary to restore it as the Primary member.
  3. On Member B, go to Orbit Config Manager > Settings > Application, and toggle the Enable application button to the inactive position.
  4. Click Save.
Info

Make sure the green color is not displayed.

Step 5.1 (alternative): Recover the Primary Member via SSH

  1. Initiate an SSH session on Member A using port 59022 with the user mt4adm.
  2. Run the command sudo orbit application status to check the following information:
 sudo orbit application status

Application: Active
Replication: Active
Instance:    Cluster
Primary:     memberB
Main:        No

  1. Then, execute the command sudo orbit application primary to set Member A as Primary:
sudo orbit application primary

Application: Active
Replication: Active
Instance:    Cluster
Primary:     memberA
Main:        Yes


Was this article helpful?