Replication lag for more than 10 hours

This is just a question I wanted to ask.

Replica set: 1 primary, 1 secondary, and 1 arbiter server.

Every now and then the secondary server falls out of sync, and we have tried doing a complete resync.

We have increased the oplog size to a very large value, but the oplog still gets consumed quickly and the secondary again falls more than 10 hours behind the primary.

In the log I can see the message below, so I wanted to know why I am seeing this issue and how to get the secondary back in sync with the primary.

I checked with the application team and they are using the majority write concern (w: "majority") in the connection string.
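For reference, a connection string requesting the majority write concern looks roughly like this (hosts, database, and replica set name below are placeholders, not our actual values):

mongodb://host1:27017,host2:27017/mydb?replicaSet=rs0&w=majority&wtimeoutMS=15000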

Is the issue below happening due to the majority write concern?

Wed Jan 23 07:22:31.654 I CONTROL [LogicalSessionCacheRefresh] Failed to refresh session cache: WriteConcernFailed: waiting for replication timed out
Wed Jan 23 07:22:31.658 I COMMAND [LogicalSessionCacheReap] command config.$cmd command: delete { delete: "orders", ordered: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:352 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_msg 15002ms

Hi Kevin,

I also have a PSA architecture, and in the logs I am seeing the same error message. It takes almost 10 hours for my secondary server to sync, and once it syncs back, it starts lagging behind the primary again after a few hours. I have also resized the oplog to almost 2 TB, but the issue is still not resolved. Can you please let us know how to fix this issue?
How else can we handle this issue? We are on bare-metal servers and the load average is not more than 5.

Your insert operation is asking for WriteConcern: “majority”, and it has a timeout of 15 seconds, meaning if the system can’t write to at least 2 (two) nodes in the replica set in a time window of 15 seconds, the operation fails.

With the secondary ~10 hours behind the primary - or in some cases, as you mentioned, out of sync - any operation that requires w: “majority” with wtimeout: 15000 will fail.

That is one of the reasons I'm not a fan of PSA configs. You only have 2 data-bearing nodes to establish a majority, so if anything happens to either data-bearing member, your application will have trouble completing requests with w: "majority".
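As a rough illustration (a sketch against a hypothetical test collection, not your application code), this is what such an operation looks like in the mongo shell and how it behaves while the secondary is lagging:

// Write requesting acknowledgement from a majority of data-bearing members
db.test.insert(
    { x: 1 },
    { writeConcern: { w: "majority", wtimeout: 15000 } }
)
// With the only other data-bearing member hours behind, the write is applied on the
// primary, but the command returns after ~15 seconds with a write concern error like:
//   writeConcernError: { code: 64, codeName: "WriteConcernFailed",
//                        errmsg: "waiting for replication timed out" }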

(a) What is your MongoDB version?

(b) Can you share the output for rs.config() and rs.status()?

(c) Anything in the mongod.log or operating system log that could give you a hint about why the secondary can't keep up with replication?

All the best,

– Rodrigo


I am just curious to know: how did we determine that the problem is because of writeConcern: majority?
The log information relates to the replication status, right? And how do we know that the insert was triggered by the application team with writeConcern: majority?

Apart from increasing the oplog size, what other way is there to solve the replication lag? I am seeing 19 hours of delay.

MON_P1:SECONDARY> db.getReplicationInfo()
{
        "logSizeMB" : 4000000,
        "usedMB" : 2789244.36,
        "timeDiff" : 75363,
        "timeDiffHours" : 20.93,
        "tFirst" : "Wed Jan 27 2021 05:24:46 GMT+0000 (GMT)",
        "tLast" : "Thu Jan 28 2021 02:20:49 GMT+0000 (GMT)",
        "now" : "Thu Jan 28 2021 21:34:14 GMT+0000 (GMT)"
}
MON_P1:SECONDARY> rs.printSlaveReplicationInfo()
source: yyy.com:23011
        syncedTo: Thu Jan 28 2021 02:20:52 GMT+0000 (GMT)
        69216 secs (19.23 hrs) behind the primary
MON_P1:SECONDARY>

We are on version 3.6.18.

MON_P1:SECONDARY> rs.config()
{
        "_id" : "MON_P1",
        "version" : 8,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "xxx.com:23011",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {
 },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 1,
                        "host" : "yyy.com:23011",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 0.9,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 2,
                        "host" : "zzz.com:23011",
                        "arbiterOnly" : true,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 0,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("7y1f63996l8693a4196l7aq1")
        }
}
MON_P1:SECONDARY> rs.status()
{
        "set" : "MON_P1",
        "date" : ISODate("2021-01-28T21:28:03.245Z"),
        "myState" : 2,
        "term" : NumberLong(30),
        "syncingTo" : "xxx.com:23011",
        "syncSourceHost" : "xxx.com:23011",
        "syncSourceId" : 0,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1611800325, 36),
                        "t" : NumberLong(29)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1611800325, 36),
                        "t" : NumberLong(29)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "xxx.com:23011",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 5710,
                        "optime" : {
                                "ts" : Timestamp(1611869283, 4),
                                "t" : NumberLong(30)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1611869282, 4),
                                "t" : NumberLong(30)
                        },
                        "optimeDate" : ISODate("2021-01-28T21:28:03Z"),
                        "optimeDurableDate" : ISODate("2021-01-28T21:28:02Z"),
                        "lastHeartbeat" : ISODate("2021-01-28T21:28:03.235Z"),
                        "lastHeartbeatRecv" : ISODate("2021-01-28T21:28:02.348Z"),
                        "pingMs" : NumberLong(39),
                        "lastHeartbeatMessage" : "",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1611863472, 1),
                        "electionDate" : ISODate("2021-01-28T19:51:12Z"),
                        "configVersion" : 8
                },
                {
                        "_id" : 1,
                        "name" : "yyy.com:23011",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 5762,
                        "optime" : {
                                "ts" : Timestamp(1611800325, 36),
                                "t" : NumberLong(29)
                        },
                        "optimeDate" : ISODate("2021-01-28T02:18:45Z"),
                        "syncingTo" : "xxx.com:23011",
                        "syncSourceHost" : "xxx.com:23011",
                        "syncSourceId" : 0,
                        "infoMessage" : "",
                        "configVersion" : 8,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                },
                {
                        "_id" : 2,
                        "name" : "zzz.com:23011",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 5710,
                        "lastHeartbeat" : ISODate("2021-01-28T21:28:02.711Z"),
                        "lastHeartbeatRecv" : ISODate("2021-01-28T21:28:02.909Z"),
                        "pingMs" : NumberLong(29),
                        "lastHeartbeatMessage" : "",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : 8
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1611800325, 36),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1611869283, 4),
                "signature" : {
                        "hash" : BinData(0,"mKHbJ/3jBavaMoNjluLlbQFQOqc="),
                        "keyId" : NumberLong("6883968148895170561")
                }
        }
}

I am just curious to know: how did we determine that the problem is because of writeConcern: majority?

Let's take a step back… Write concern "majority" isn't causing the replication lag. The cluster isn't satisfying w: "majority" because you're in a PSA deployment and your only other data-bearing member (the secondary) is falling behind the primary by ~19 hours. In this case, operations with w: "majority" are suffering the consequences (by failing) because of the replication lag.

The log information relates to the replication status, right? And how do we know that the insert was triggered by the application team with writeConcern: majority?

According to the rs.config() output, your replica set has a default write concern of w: 1. Therefore, any CRUD operations logged in the mongod.log file with w: "majority" must have been triggered by the application, or by someone using the mongo shell/Compass and requesting w: "majority" while running a CRUD operation.
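You can confirm the replica set default from the shell; the second statement below is a hypothetical client-side write, just to show where an explicit w: "majority" would normally come from (a sketch, not your application's actual code):

// Replica set default write concern (matches the rs.config() output above)
rs.conf().settings.getLastErrorDefaults
// -> { "w" : 1, "wtimeout" : 0 }

// A client explicitly asking for majority on a single operation (hypothetical collection)
db.orders.insert({ item: "demo" }, { writeConcern: { w: "majority", wtimeout: 15000 } })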

Apart from increasing the oplog size, what other way is there to solve the replication lag? I am seeing 19 hours of delay.

Increasing the oplog size isn't doing any good here. You have to find the root cause of your replication lag. There is something happening on that secondary that isn't letting it keep up with your primary. (A few server-side counters you could check are sketched after the questions below.)

(a) What is the utilization of essential resources (CPU, memory, disk, and network) on the secondary?

(b) Is the network interface configured properly?

(c) Is the secondary in the same data center as the primary? (Thinking about network latency.)
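For (a), besides the usual OS tools, a few replication-related counters from db.serverStatus() on the lagging secondary can be informative. A rough sketch (field names are taken from db.serverStatus() and may vary slightly between versions):

// Run in the mongo shell on the lagging secondary
var s = db.serverStatus();
printjson({
    bufferedOplogEntries: s.metrics.repl.buffer.count,             // entries fetched but not yet applied
    bufferBytes:          s.metrics.repl.buffer.sizeBytes,
    opsApplied:           s.metrics.repl.apply.ops,                // total ops applied since startup
    applyTimeMillis:      s.metrics.repl.apply.batches.totalMillis,
    oplogBytesFetched:    s.metrics.repl.network.bytes,            // oplog data read from the sync source
    dirtyCacheBytes:      s.wiredTiger.cache["tracked dirty bytes in the cache"]
});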

I hope this helps.

All the best,

– Rodrigo

Hello,

Thanks for your mail.

To move away from the PSA architecture, we removed the arbiter server and added one more node to the replica set. The new secondary is placed in the same data center where the primary is hosted.

Now the setup is:

Node 1 : primary (DC1)
Node 2 : secondary (DC1)
Node 3 : secondary (DC2 - always has the replication lag issue)

Until yesterday node 2 was in sync with the primary; now I see both node 2 and node 3 are out of sync with the primary, even though we have increased the oplog to 4 TB and 3.9 TB of it is already occupied.

I have no idea how to resolve this issue, as the network team says there is no latency.

Node 1 and 2 utilization:

              total        used        free      shared  buff/cache   available
Mem:     1583647880  1024135316    12289240     4027720   547223324   553727932
Swap:       4194300      260352     3933948

07:06:49 up 198 days, 8:20, 3 users, load average: 7.56, 7.38, 7.48

Node 3 utilization:

              total        used        free      shared  buff/cache   available
Mem:     1583647876   782497192     4066068     3692760   797084616   795701856
Swap:       4194300      621944     3572356

07:07:12 up 198 days, 9:41, 2 users, load average: 1.19, 1.61, 1.55

I have also tried an initial sync twice, but the moment the node changes its state from STARTUP2 to SECONDARY, it is already about 1.2 hours behind, and the lag keeps growing from there and never comes down at all.
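A rough way to watch the lag over time is a small loop in the mongo shell on the secondary (just a sketch, using the rs.status() fields shown in the output above):

// Print seconds behind the primary once a minute
while (true) {
    var st = rs.status();
    var primary = st.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
    var self = st.members.filter(function (m) { return m.self; })[0];
    print(new Date().toISOString() + "  lag (s): " + (primary.optimeDate - self.optimeDate) / 1000);
    sleep(60 * 1000);
}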

Can you please give me more insight into why this issue keeps happening?

Hi Mamatha,

Is the issue resolved?

We are seeing a replication lag issue with MongoDB 3.6.17 Community Edition in a CentOS 8 environment, and we wanted to check with you on the current status of the issue you reported here.

Thanks,
Kiran

Hi @Mamatha_M,

Can you please check the following outputs to see if the secondary is running into any hang issue? (A quick way to capture the first two in one go is sketched after the list.)

  • db.serverStatus().connections
  • db.currentOp().inprog.length
  • lsof | wc -l
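A quick sketch for capturing the first two from the mongo shell:

// Run on the secondary
printjson({
    connections:   db.serverStatus().connections,   // current / available / totalCreated
    inProgressOps: db.currentOp().inprog.length     // number of operations currently in progress
});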

Also please share the OS information.
Thanks,
Venkataraman

Hi,

Is this issue resolved?

Yes, we solved it by compiling glibc ourselves with the hotfix proposed in this thread: 25847 – pthread_cond_signal failed to wake up pthread_cond_wait due to a bug in undoing stealing

Also, please refer to https://jira.mongodb.org/browse/SERVER-54805; MongoDB added a new configuration option in https://jira.mongodb.org/browse/SERVER-56054.
