Replica set behaviour when members can't communicate properly

Hi!

I have a setup that includes three machines: m1, m2 and m3. These machines talk to each other using the hostnames host-m1, host-m2 and host-m3. The IP-to-hostname mapping is maintained in /etc/hosts on each machine. For various reasons, I cannot use a DNS server here.

So the /etc/hosts on each machine looks something like this:

================== 10.46.51.1 =================
10.46.51.3 host-m3
10.46.51.2 host-m2
10.46.51.1 host-m1
================== 10.46.51.2 =================
10.46.51.3 host-m3
10.46.51.2 host-m2
10.46.51.1 host-m1
================== 10.46.51.3 =================
10.46.51.3 host-m3
10.46.51.2 host-m2
10.46.51.1 host-m1

Now, for various reasons, the hostnames sometimes have to be changed. For example, this,

10.46.51.3 host-m3
10.46.51.2 host-m2
10.46.51.1 host-m1

might need to be changed to,
10.46.51.3 host-m3
10.46.51.2 host-m1
10.46.51.1 host-m2
on all the machines.

While this change is in progress, the mappings might not be the same on all machines. For example, on m1 “host-m3” might map to m3, while on m2 “host-m3” might map to m1.
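To illustrate the kind of mismatch I mean, here is a minimal Python sketch that compares /etc/hosts-style text collected from several machines and reports the hostnames that do not map to the same IP everywhere. The function names and the sample data are assumptions for illustration; the m2 contents are a hypothetical mid-change state, not copied from a real machine.

```python
from collections import defaultdict


def parse_hosts(text):
    """Return {hostname: ip} parsed from /etc/hosts-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Assumption: the first entry for a name wins, as with a
            # typical files-based resolver.
            mapping.setdefault(name, ip)
    return mapping


def find_disagreements(hosts_by_machine):
    """Return {hostname: {machine: ip}} for names whose IP differs between machines."""
    seen = defaultdict(dict)
    for machine, text in hosts_by_machine.items():
        for name, ip in parse_hosts(text).items():
            seen[name][machine] = ip
    return {
        name: per_machine
        for name, per_machine in seen.items()
        if len(set(per_machine.values())) > 1
    }


# Hypothetical snapshot taken while the change is half applied.
SAMPLE = {
    "m1": "10.46.51.3 host-m3\n10.46.51.2 host-m2\n10.46.51.1 host-m1\n",
    "m2": "10.46.51.1 host-m3\n10.46.51.2 host-m2\n10.46.51.3 host-m1\n",
}

for name, per_machine in sorted(find_disagreements(SAMPLE).items()):
    print(name, per_machine)
```

In this sample, host-m2 agrees on both machines, while host-m1 and host-m3 are reported because m1 and m2 disagree about them.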

While this change is happening, I sometimes initiate a replica set. When I do, I observe the following:

  1. The init command goes through without any errors.
  2. The instance on which the replica set init was performed reports that it is in the “SECONDARY” state, while the other two members are in “STARTUP”.
  3. The member in the “SECONDARY” state continues to receive heartbeats from the other two members.
  4. Connecting to members in the “STARTUP” state and fetching the replica set status gives “NotYetInitialized”.
  5. Reads and writes fail.
  6. The setup stayed in this state for well over 30 minutes. It does not correct itself when the mappings get back in sync.
  7. The setup corrects itself and reaches Primary-Secondary status if one of the instances in the “STARTUP” state is restarted.

Can someone help me understand MongoDB’s recommended approach for initiating a replica set in a scenario where the hostname-to-IP mappings on the members might not be in sync?

Additional details:

  1. The actual hostname-to-IP mappings on the machines were:
    ================== 10.46.51.5 =================
    10.46.51.5 cvm-5
    10.46.51.2 cvm-1
    10.46.51.1 cvm-2
    ================== 10.46.51.1 =================
    10.46.51.5 cvm-2
    10.46.51.2 cvm-5
    10.46.51.1 cvm-1
    ================== 10.46.51.2 =================
    10.46.51.5 cvm-1
    10.46.51.2 cvm-2
    10.46.51.1 cvm-5

  2. Mongodb Version 2.4.6 was running on centos 7

Hi @Mohammad_Ghazanfar

If I understand the question correctly, you are changing the name to IP address mapping while the MongoDB process is running, and the replica set behaves strangely. Is this correct?

If yes, then unfortunately there’s not much the server can do to fix itself, since the situation is not under its control. A replica set node tries to connect to the other nodes in the replica set, but if it asks the OS for a node’s address and the address it gets back is wrong, there is nothing it can do about it.

I would recommend shutting down MongoDB while the IP remapping is being done, and restarting it once all the correct IP mappings are in place. Having a very confused replica set is generally not a good thing, especially if you’re doing writes while this is going on.
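A minimal sketch of a pre-flight check along these lines, assuming Python is available on each node: before restarting MongoDB, confirm that every replica set hostname resolves locally to the IP you expect. The EXPECTED table and the check_resolution helper are hypothetical names for illustration; only socket.gethostbyname is a real standard-library call.

```python
import socket

# Assumption: the final, intended hostname-to-IP mapping for the replica set.
EXPECTED = {
    "host-m1": "10.46.51.1",
    "host-m2": "10.46.51.2",
    "host-m3": "10.46.51.3",
}


def check_resolution(expected, resolve=socket.gethostbyname):
    """Return {hostname: (expected_ip, resolved_ip_or_None)} for every mismatch.

    `resolve` is injectable so the check can be exercised without touching
    the real resolver; by default it asks the OS, which consults /etc/hosts.
    """
    wrong = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except (OSError, KeyError):
            # OSError covers socket.gaierror; KeyError covers dict-based stubs.
            got = None
        if got != want:
            wrong[name] = (want, got)
    return wrong
```

Calling check_resolution(EXPECTED) on each machine returns an empty dict when the local mapping matches the intended one. The idea is to start MongoDB only once every machine reports no mismatches.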

Mongodb Version 2.4.6 was running on centos 7

Please note that version 2.4.6 is seriously outdated now. The 2.4 series was released in March 2013 (7 years ago) and was out of support in March 2016 (4 years ago). Please consider using a supported version (see Support Policy for a list of supported versions).

Best regards,
Kevin

Thanks @kevinadi for your response.

This is correct: “changing the name to IP address mapping while the MongoDB process is running, and the replica set behaves strangely”

Mongodb Version 2.4.6 was running on centos 7

Oh man! Sorry about that, I seem to have mistyped. I meant 4.2.6.

I will go ahead with your recommendation to restart mongo when the remapping happens.
Thanks again for your response.
