Replication Oplog is ahead of time

Aayushi_Mangal · June 20, 2020, 5:38am

Hi,

I would like to ask, why we are seeing different durable and optime entries.
Tt is ahead of time. For where this entries are been taken. How can I find that it is coming.

I also ried with overiting from config file using timeStampFormat: iso8601-local, but it is same

                    **"optimeDate" : ISODate("2020-10-03T12:32:22Z"),**

** “optimeDurableDate” : ISODate(“2020-10-03T12:32:22Z”),**
“lastHeartbeat” : ISODate(“2020-06-20T00:56:36.126Z”),
“lastHeartbeatRecv” : ISODate(“2020-06-20T00:56:36.302Z”),

@Doug_Duncan if you have any insights to this, it is very helpful.

Doug_Duncan · June 22, 2020, 3:35am

Hi @Aayushi_Mangal I can’t say that I’ve ever seen this before. Have you checked the system time on all of your replica set servers? It would seem that one of them has the wrong system date. If it does, I would recommend making sure your system has a NTP client running to keep times in sync.

Aayushi_Mangal · June 22, 2020, 5:21am

Hi @Doug_Duncan,

Thank you for your reply.

Yes, i have checked all the servers date with “date” command, and it is all same in IST, can you please recommend any other command to check for in particular.

Aayushi_Mangal · June 22, 2020, 9:00am

Also, we are using IST instead of UTC, do we need any other changes also at server side, or mongoside

Doug_Duncan · June 23, 2020, 2:29am

Hi @Aayushi_Mangal unfortunately I’m not sure how the optimeDate could be 3.5 months ahead if the system dates are correct. Are the log files all showing the correct timestamps?

I wonder if the date was wrong at some point in time and then got corrected. Is the optimeDate value increasing or is it staying at 2020-10-03T12:32:22Z?

@kevinadi can you think of anything that would cause an issue like this? I’m sure you’ve seen a lot more interesting MongoDB issues than I have.

kevinadi · June 23, 2020, 12:18pm

Hi Aayushi,

I believe @Doug_Duncan was correct in that at some point the server time was accidentally moved forward, then was fixed.

Just to double check, could you post the output of rs.status() and rs.printReplicationInfo() from the mongo shell? Please also post your MongoDB version.

Best regards,
Kevin

Aayushi_Mangal · June 24, 2020, 5:49am

Hello @kevinadi,

Yes that seems issue related to NTP or cluster time. Can you help me to confirm from where it takes that value from the server. Like any particular existing file from where it refers or map that value.

Does it anyway belongs to keyfilebeing used in cluster, since if keyfile is different even at mongo, the cluster will not communicate at all, but here cluster communicated, but found that date.

Here is the log, where I found first that is the issue, where it is taking 3rd oct 2020 date:

020-06-20T05:55:38.364+0530 D NETWORK  [ReplicaSetMonitor-TaskExecutor] Updating host hostname based on ismaster reply: { hosts: [ "hostname", "hostname", "hostname" ], setName: "test", setVersion: 5, ismaster: false, secondary: true, primary: "hostname", me: "hostname", lastWrite: { opTime: { ts: Timestamp(1601728342, 18363), t: 5 }, lastWriteDate: new Date(1601728342000), majorityOpTime: { ts: Timestamp(1601728342, 18363), t: 5 }, majorityWriteDate: new Date(1601728342000) }, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 100000, localTime: new Date(1592612738364), logicalSessionTimeoutMinutes: 30, minWireVersion: 7, maxWireVersion: 7, readOnly: false, compression: [ "snappy" ], ok: 1.0, operationTime: Timestamp(1601728342, 18363), $gleStats: { lastOpTime: Timestamp(0, 0), electionId: ObjectId('000000000000000000000000') }, lastCommittedOpTime: Timestamp(1601728342, 18363), $configServerState: { opTime: { ts: Timestamp(1592609481, 2), t: 31 } }, $clusterTime: { clusterTime: Timestamp(1601728342, 18371), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }

Mongodb version: 4.0

Other details I will not be able as it is in sync now, and I am unable to simulate the case.

kevinadi · June 24, 2020, 6:20am

Hi Aayushi,

Can you help me to confirm from where it takes that value from the server.

The optimeDate & optimeDurableDate are from the oplog. It is the timestamp of the latest recorded operation in the oplog, and used to sort events in the oplog in chronological order even though your server time is inaccurate. In reality, it is an implementation of Lamport clock to provide oplog event ordering.

Thus, it is not a problem if optimeDate is further ahead from the real date.

The log line you posted is not the actual event where the server time drifted 4 months ahead as can be seen in this part of the log:

opTime: { ts: Timestamp(1601728342, 18363) ...

The first part of the Timestamp(..., ...) number is a date:

> new Date(1601728342000)
ISODate("2020-10-03T12:32:22Z")

which is exactly the date in your earlier post.

The second number signifies that there are already 18363 write operations already happened at that point (see Timestamp specification), so the clock drifted way earlier.

Does it anyway belongs to keyfilebeing used in cluster

No it has nothing to do with keyfiles.

You can simulate this behaviour by starting a 3-node replica set, change the system clock of the primary forward, do some writes on the primary, then set the system clock back to the correct time. You will see that you arrive at exactly this situation.

Unfortunately this situation is not reversible, e.g. you cannot set optimeDate back to the current date as that would render the oplog unable to determine the order of operations anymore. Having said that, this situation will fix itself in 4 months time, assuming no more clock jumps happen in the meantime

Best regards,
Kevin

Aayushi_Mangal · June 24, 2020, 3:08pm

Hi @kevinadi,

Thank you for your detailed responce, that is helpful.
I was able to simulate the case, but yes it was not causing any problem. Lat time when I faced this issue, i was not able to insert data from mongos to shard, and my logs are full of with this error messges:

D NETWORK [ShardRegistry] Timer received error: CallbackCanceled: Callback was canceled

But now, when i changed system time and tried to simulate the case, i found that optime is changed, but i was able to insert into the data.

system · July 1, 2020, 5:58am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.