How to debug problem with mongodb replica

Lec_Kozol · September 22, 2020, 11:03am

Hello,

I am new to this forum.
I mostly act as the admin of a Graylog app running on 3 nodes, which uses a MongoDB replica set.
The MongoDB replica set was running fine for more than a year. Then we did a minor upgrade of the Graylog software, and something else was probably done with mongo (I am not sure), but now I have a problem and I suspect it is MongoDB related.

All users and app config are stored in MongoDB. If I run the Graylog app on NODE1 it starts,
but it shows no users and no config; everything is missing.

If I run Graylog on the other node, NODE3, it still shows users and config fine.

I suspect something is wrong at the MongoDB level, but I cannot really debug it properly.

Some details: here is how the nodes connect to the replica set:

mongodb_uri = mongodb://192.158.20.100/graylog,192.158.20.101/graylog,192.158.20.102/graylog?replicaSet=reproduk

Now if I log in to NODE1 (192.158.20.100) and run mongo, it shows me:

reproduk:PRIMARY> show dbs
graylog      0.028GB
graylog,192  0.002GB
local        0.313GB

I have never seen this strange db “graylog,192” before.
192 is actually the first part of the IP.

If I run the same command on the only node still running OK, NODE3, I get an ERROR:

reproduk:SECONDARY> show dbs
2020-09-21T14:27:51.126+0200 E QUERY    [thread1] Error: listDatabases failed:{ "ok" : 0, "errmsg" : "not master and slaveOk=false", "code" : 13435 } :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
Mongo.prototype.getDBs@src/mongo/shell/mongo.js:62:1
shellHelper.show@src/mongo/shell/utils.js:769:19
shellHelper@src/mongo/shell/utils.js:659:15
@(shellhelp2):1:1

But if I run commands like rs.conf() or rs.status(), I get practically the same working result on both node1 and node3:

reproduk:SECONDARY> rs.status()
{
        "set" : "reproduk",
        "date" : ISODate("2020-09-21T12:37:04.748Z"),
        "myState" : 2,
        "term" : NumberLong(65),
        "syncingTo" : "192.158.20.100:27017",
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.158.20.100:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 2625358,
                        "optime" : {
                                "ts" : Timestamp(1600691823, 20),
                                "t" : NumberLong(65)
                        },
                        "optimeDate" : ISODate("2020-09-21T12:37:03Z"),
                        "lastHeartbeat" : ISODate("2020-09-21T12:37:03.658Z"),
                        "lastHeartbeatRecv" : ISODate("2020-09-21T12:37:03.128Z"),
                        "pingMs" : NumberLong(0),
      ...

Any pointers on how I could continue debugging?

Maybe by deleting this database graylog,192?

MaBeuLux88 · September 22, 2020, 1:16pm

Hi @Lec_Kozol and welcome to the MongoDB Community!

I see a few problems here.

First:

mongodb_uri = mongodb://192.158.20.100/graylog,192.158.20.101/graylog,192.158.20.102/graylog?replicaSet=reproduk

This is not a valid URI, and it is probably what is causing the weird graylog,192 database in your replica set. You can read more about URIs in the MongoDB documentation.

From what I see here, your URI should look like this:

mongodb_uri = mongodb://192.158.20.100:27017,192.158.20.101:27017,192.158.20.102:27017/graylog?replicaSet=reproduk&retryWrites=true&w=majority

The last 2 options are not mandatory but they are good practice. I recommend you have a look at our documentation about retryable writes and write concerns.
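To illustrate the difference between the two URIs, here is a minimal JavaScript sketch. `splitMongoUri` is a hypothetical helper written for this post, not the real driver parser: it treats everything between `mongodb://` and the first `/` as the host list, and everything after that up to `?` as the database name. Assuming the driver read the old URI in a similar way (this is not verified), the database name would have started with `graylog,192`, because a `.` separates the database from the collection in a MongoDB namespace.

```javascript
// Hypothetical helper for illustration only; real drivers do much more validation.
function splitMongoUri(uri) {
  const prefix = "mongodb://";
  if (!uri.startsWith(prefix)) {
    throw new Error("expected a mongodb:// URI");
  }
  const rest = uri.slice(prefix.length);

  // Everything before the first "/" is the comma-separated host list.
  const slash = rest.indexOf("/");
  const hostPart = slash === -1 ? rest : rest.slice(0, slash);
  const tail = slash === -1 ? "" : rest.slice(slash + 1);

  // Everything after the first "/" and before "?" is the database name.
  const question = tail.indexOf("?");
  const database = question === -1 ? tail : tail.slice(0, question);
  const options = question === -1 ? "" : tail.slice(question + 1);

  return { hosts: hostPart.split(","), database, options };
}

const oldUri = "mongodb://192.158.20.100/graylog,192.158.20.101/graylog,192.158.20.102/graylog?replicaSet=reproduk";
const newUri = "mongodb://192.158.20.100:27017,192.158.20.101:27017,192.158.20.102:27017/graylog?replicaSet=reproduk&retryWrites=true&w=majority";

const oldParts = splitMongoUri(oldUri);
const newParts = splitMongoUri(newUri);

console.log(oldParts.hosts.length);           // 1 (only the first host is seen)
console.log(oldParts.database.split(".")[0]); // graylog,192
console.log(newParts.hosts.length);           // 3
console.log(newParts.database);               // graylog
```

With the corrected form, all three hosts end up in the seed list and the database name is just graylog.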

Second:

The error you got when running show dbs on a secondary node is normal. By default the Mongo Shell blocks reads from a secondary node, as these reads are considered “eventually consistent” with the primary (because of the asynchronous replication process).

In order to read data on a secondary node, you need to tell the Mongo Shell it’s OK to read eventually consistent data using the command rs.slaveOk(). If your primary and secondary are replicating correctly, you should not see any difference here, modulo the replication lag.

Third:

Because you didn’t use the correct URI, I don’t know what the MongoDB driver actually understood when it connected to MongoDB. I’m not sure the replicaSet=reproduk option actually went through, so I don’t know how the replica set behaved at this point.

Can you please share the entire rs.status() output? Are the 3 nodes configured correctly? Did you confirm that replication is working properly and that your main database and collections are present on all 3 nodes?
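One way to check replication from an rs.status() document is to compare each member's optimeDate with the primary's. The sketch below is plain JavaScript; `memberLagSeconds` is a hypothetical helper, and the sample status object is made up for illustration (it only includes the fields the helper reads, with the dates as JavaScript Date objects).

```javascript
// Hypothetical helper: seconds each member is behind the primary,
// based on the "optimeDate" field of an rs.status()-like document.
function memberLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY in status document");
  }
  return status.members.map((m) => ({
    name: m.name,
    stateStr: m.stateStr,
    lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000,
  }));
}

// Made-up sample data, shaped like rs.status() output.
const sampleStatus = {
  set: "example",
  members: [
    { name: "host-a:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-01-01T12:00:10Z") },
    { name: "host-b:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-01-01T12:00:08Z") },
    { name: "host-c:27017", stateStr: "RECOVERING", optimeDate: new Date("2019-12-31T12:00:10Z") },
  ],
};

const lagReport = memberLagSeconds(sampleStatus);
for (const m of lagReport) {
  console.log(`${m.name} ${m.stateStr} lag=${m.lagSeconds}s`);
}
```

A secondary that is a few seconds behind is usually nothing to worry about; a member that is hours or days behind, or that stays in RECOVERING, is worth a closer look.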

I hope this helps!

Cheers,
Maxime.

Lec_Kozol · September 22, 2020, 7:26pm

But just in case, I will also post the rs.status() output here.
I know that the node with _id 1 (192.158.20.101) is in the RECOVERING state. I will need to fix it sometime. Once I tried to fix it by deleting its mongodata dir and starting mongod again, but it didn't work.

reproduk:PRIMARY> use graylog
switched to db graylog
reproduk:PRIMARY> rs.status()
{
        "set" : "reproduk",
        "date" : ISODate("2020-09-22T14:12:24.499Z"),
        "myState" : 1,
        "term" : NumberLong(65),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.158.20.100:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 2776362,
                        "optime" : {
                                "ts" : Timestamp(1600783944, 2),
                                "t" : NumberLong(65)
                        },
                        "optimeDate" : ISODate("2020-09-22T14:12:24Z"),
                        "electionTime" : Timestamp(1598007616, 3),
                        "electionDate" : ISODate("2020-08-21T11:00:16Z"),
                        "configVersion" : 70477,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.158.20.101:27017",
                        "health" : 1,
                        "state" : 3,
                        "stateStr" : "RECOVERING",
                        "uptime" : 2338682,
                        "optime" : {
                                "ts" : Timestamp(1573169381, 10),
                                "t" : NumberLong(36)
                        },
                        "optimeDate" : ISODate("2019-11-07T23:29:41Z"),
                        "lastHeartbeat" : ISODate("2020-09-22T14:12:23.922Z"),
                        "lastHeartbeatRecv" : ISODate("2020-09-22T14:12:21.911Z"),
                        "pingMs" : NumberLong(0),
                        "configVersion" : 70477
                },
                {
                        "_id" : 2,
                        "name" : "192.158.20.102:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 2776358,
                        "optime" : {
                                "ts" : Timestamp(1600783944, 1),
                                "t" : NumberLong(65)
                        },
                        "optimeDate" : ISODate("2020-09-22T14:12:24Z"),
                        "lastHeartbeat" : ISODate("2020-09-22T14:12:24.091Z"),
                        "lastHeartbeatRecv" : ISODate("2020-09-22T14:12:23.009Z"),
                        "pingMs" : NumberLong(0),
                        "syncingTo" : "192.158.20.100:27017",
                        "configVersion" : 70477
                }
        ],
        "ok" : 1
}

Lec_Kozol · September 22, 2020, 7:26pm

Hello Maxime,
Wow, that was quick and efficient. On the non-working node1, I just typed in the mongo connection URI that you suggested and bang… right after the restart it joined the cluster, the cluster side also sees it, all the config is there, it looks perfect!

I forgot to mention, we are running a somewhat older version of MongoDB, 3.2.
And funnily enough, the old, “wrong” connection string was working just fine for a few years :-).

Thank you!
Grateful greetings from Slovenija.

MaBeuLux88 · September 22, 2020, 10:44pm

MongoDB 3.2 is not supported anymore. Same for 3.4.

I strongly suggest upgrading to 4.4 gradually, version by version, following the production notes for upgrades for each version.

Maybe doing a dump and restoring in 4.4 would be easier at this point.
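For the gradual route, the intermediate release series between 3.2 and 4.4 can be listed with a small JavaScript sketch. `RELEASE_SERIES` and `upgradePath` are names made up for this example; the list only covers the series up to 4.4, and each step still needs the upgrade notes for that version.

```javascript
// Release series in order, up to 4.4 (the target mentioned above).
const RELEASE_SERIES = ["3.2", "3.4", "3.6", "4.0", "4.2", "4.4"];

// Hypothetical helper: the versions to step through, one series at a time.
function upgradePath(from, to) {
  const start = RELEASE_SERIES.indexOf(from);
  const end = RELEASE_SERIES.indexOf(to);
  if (start === -1 || end === -1 || end < start) {
    throw new Error(`no known upgrade path from ${from} to ${to}`);
  }
  return RELEASE_SERIES.slice(start + 1, end + 1);
}

console.log(upgradePath("3.2", "4.4").join(" -> "));
// 3.4 -> 3.6 -> 4.0 -> 4.2 -> 4.4
```

That is five separate upgrades from 3.2, which is why a dump and restore may end up being less work.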

Cheers,
Maxime.
