MongoDB across multiple sites

Hi,

We’re running through a large upgrade process and thinking about our replica set and how many members we need for optimal uptime.

We have two main data centres and our office network as a 3rd location. We should really only be serving data out of our Primary or Secondary data centre.

Would you choose a 5- or 7-member replica set, with 2 or 3 data-bearing nodes in each data centre and an arbiter in our office to vote in elections? Is there any argument for making the arbiter a data member? The only one I can think of would be if we lost both data centres for a significant period of time.

In either case, if we lose the primary or secondary site, we still have a majority available, assuming the other 3 or 4 members can communicate.

Having 3 members per site allows a little tolerance for an initial sync, should we require one. We do not currently shard our data, and our data size is therefore significantly larger than the recommended 2 TB limit (something we’re looking to change, but unlikely until next year). But a 3rd member per site comes with a storage cost.

Thoughts?

Clive

Hi @clivestrong,

It’s hard to give truly enlightened advice without the full picture, but I will try to at least add a few considerations to the mix.

Running with an arbiter is never optimal, but in your case, as you only have 2 data centers and you don’t want to host data in the 3rd location, it definitely makes sense.

My first question would be: which write concern are you using?

Let’s suppose you are running a 5-node replica set laid out like this (the same issue would apply with 7 nodes); a config sketch follows the list:

  • DC1: P+S
  • DC2: S+S
  • DC3: A
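
For reference, here is a minimal sketch of that topology as a replica set configuration, assuming made-up hostnames (illustrative only, not a drop-in config):

```js
// Hypothetical 5-member layout matching the list above (hostnames are placeholders).
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "dc1-node1.example.net:27017" },                   // DC1: P or S
    { _id: 1, host: "dc1-node2.example.net:27017" },                   // DC1: S
    { _id: 2, host: "dc2-node1.example.net:27017" },                   // DC2: S
    { _id: 3, host: "dc2-node2.example.net:27017" },                   // DC2: S
    { _id: 4, host: "dc3-arb1.example.net:27017", arbiterOnly: true }  // DC3: A
  ]
});
```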

If you are using w="majority", you won’t be able to write any more once a network partition isolates one of the data centers (1 or 2). You would still have a P, because the 2 remaining data-bearing nodes + the Arbiter can still elect one, but in this case majority = 3, and a write is validated by the cluster only if 3 nodes acknowledge it, which is impossible with the A in the mix (an arbiter holds no data, so it can never acknowledge a write).
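
To make that concrete, here is what such a stuck write would look like from mongosh (the collection name is made up; any w="majority" write behaves the same):

```js
// With DC2 partitioned away, only P + S (in DC1) + A are reachable.
// majority = 3, but the A holds no data, so at most 2 members can
// acknowledge the write: it blocks and fails once wtimeout expires.
db.orders.insertOne(
  { status: "pending" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
);
```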

Replace the A with an S with priority = 0 and this issue disappears: the priority-0 S can still vote, holds data and can therefore still acknowledge write operations, but it can never be elected primary.
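
The swap could look something like this, reusing the placeholder hostnames from the sketch above (priority: 0 prevents election; votes: 1 keeps its vote):

```js
// Hypothetical reconfig: drop the arbiter, add a priority-0 secondary in DC3.
rs.remove("dc3-arb1.example.net:27017");
rs.add({ host: "dc3-node1.example.net:27017", priority: 0, votes: 1 });
```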

That being said, w=1 (the default) and w=2 operations would still be fine in this configuration; only w="majority" (= 3 here) would fail.

Adding 2 nodes (one each in DC1 and DC2) would not solve the issue, as it would raise the majority to 4. But at least w=3 operations would then work fine.

Also note that write operations are acknowledged sooner by nodes close to the primary, because of inter-node latency: in a 5-node RS, w=2 write operations might be much faster than w=3, depending on how far apart your 2 DCs are.

Let’s imagine that DC3 (hosting the A) sits halfway between the 2 other DCs:

  • latency between DC1 and DC2: 100ms
  • latency between DC1 and DC3: 50ms
  • latency between DC2 and DC3: 50ms

Again, in this scenario, changing the A into an S with priority = 0 and votes = 1 would cut the time of a w=3 write operation in half, as the DC3 node in the middle would acknowledge the write twice as fast as the remote DC.
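
A rough back-of-the-envelope with those numbers, assuming the primary is in DC1:

```js
// w=3 with an A in DC3: the A cannot ack, so the 3rd ack must come from DC2
//   → bounded by the DC1↔DC2 latency ≈ 100 ms
// w=3 with a priority-0 S in DC3 instead: the 3rd ack comes from DC3
//   → bounded by the DC1↔DC3 latency ≈ 50 ms (half the time)
```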

In general, I would not recommend w="majority" write operations in an RS with an A, because the cluster is then not truly Highly Available (HA): in my example, those writes would fail during a network partition. And if w=1 or w=2 write operations are used, they could be rolled back if the Arbiter switches sides and helps elect a new P in DC2 before DC1 and DC2 have synced.

I hope this helps a little :smiley:!

Cheers,
Maxime.

Hi Maxime,

I was comfortable with the election side of things, but the notes around the S in the 3rd DC and the write latency are very useful information for me. Food for thought!

Thank you for the insight.

Clive
