Could not find a primary

In my replica set, I tried to remove a node and re-add it using its host name instead of its IP address, and something went wrong. Now the set has four members (one of them a duplicate), and two of them are not reachable. Because of this, no primary is being elected, and the two remaining nodes both show as SECONDARY. How can I force a node to become primary?
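A replica set can only elect a primary when a strict majority of its voting members can reach each other. The hypothetical helper below is a minimal sketch of that arithmetic, assuming `conf` has the shape of `rs.conf()` and `status` the shape of `rs.status()`. It only counts votes and health; it ignores priority and the other eligibility rules MongoDB applies during an election.

```javascript
// Hypothetical helper (not part of the mongo shell): checks whether enough
// healthy voting members exist for an election to succeed.
function electionCheck(conf, status) {
  // Collect the host names that rs.status() reports as healthy (health === 1).
  var healthyHosts = {};
  status.members.forEach(function (m) {
    if (m.health === 1) healthyHosts[m.name] = true;
  });

  // Count voting members from the config, and how many of them are healthy.
  var voters = 0;
  var healthyVoters = 0;
  conf.members.forEach(function (m) {
    if (m.votes > 0) {
      voters += 1;
      if (healthyHosts[m.host]) healthyVoters += 1;
    }
  });

  // A strict majority of the voting members is required.
  var majority = Math.floor(voters / 2) + 1;
  return {
    voters: voters,
    healthyVoters: healthyVoters,
    majority: majority,
    canElect: healthyVoters >= majority
  };
}
```

Pasted into the shell and run as `electionCheck(rs.conf(), rs.status())` against the set shown below, it should report 4 voters, 2 healthy voters and a majority of 3, so `canElect` is false. Setting `votes` to 0 on one of the unreachable members drops the voter count to 3 and the majority to 2, which the two healthy members can satisfy.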

MongoDB Enterprise m103-repl:SECONDARY> rs.status()
{
        "set" : "m103-repl",
        "date" : ISODate("2019-10-04T01:47:24.916Z"),
        "myState" : 2,
        "term" : NumberLong(2),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1569794137, 1),
                        "t" : NumberLong(2)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1569794137, 1),
                        "t" : NumberLong(2)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.103.100:27001",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 271,
                        "optime" : {
                                "ts" : Timestamp(1569794137, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-09-29T21:55:37Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "could not find member to sync from",
                        "configVersion" : 6,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                },
                {
                        "_id" : 2,
                        "name" : "192.168.103.100:27003",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2019-10-04T01:47:24.811Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "Our replica set configuration is invalid or does not include us",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : -1
                },
                {
                        "_id" : 3,
                        "name" : "m103:27002",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 77,
                        "optime" : {
                                "ts" : Timestamp(1569794137, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1569794137, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-09-29T21:55:37Z"),
                        "optimeDurableDate" : ISODate("2019-09-29T21:55:37Z"),
                        "lastHeartbeat" : ISODate("2019-10-04T01:47:24.636Z"),
                        "lastHeartbeatRecv" : ISODate("2019-10-04T01:47:24.638Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : 6
                },
                {
                        "_id" : 4,
                        "name" : "m103:27003",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2019-10-04T01:47:24.553Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "Our replica set configuration is invalid or does not include us",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : -1
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1569794137, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1569794137, 1),
                "signature" : {
                        "hash" : BinData(0,"f5fBdihZTiVqn1GeMSL5EGov+pM="),
                        "keyId" : NumberLong("6741894410314711041")
                }
        }
}

Hi @jnvdasa, I’m going to look at the status output in a minute. In the meantime, run the following command in the Ubuntu shell and paste the results here:
ps -ef | grep mongod

Also, please use the Preformatted Text button when you paste code so that the formatting is preserved.

    vagrant   2067     1  5 01:42 ?        00:01:10 mongod --config mongod-repl-1.conf
    vagrant   2241     1  5 01:46 ?        00:00:54 mongod --config mongod-repl-3.conf
    vagrant   2507  1950  0 02:03 pts/0    00:00:00 grep --color=auto mongod

This shows that only two of your replica set members are running: the mongod processes started with mongod-repl-1.conf and mongod-repl-3.conf.

Let’s see the output of rs.conf(). Also paste the command you used to log in to the replica set, i.e. the full mongo command.

    vagrant@m103:~$ ps -ef | grep mongod
    vagrant   2067     1  5 01:42 ?        00:01:41 mongod --config mongod-repl-1.conf
    vagrant   2241     1  4 01:46 ?        00:01:23 mongod --config mongod-repl-3.conf
    vagrant   2520     1  5 02:09 ?        00:00:16 mongod --config mongod-repl-2.conf
    vagrant   2621  1950  0 02:14 pts/0    00:00:00 grep --color=auto mongod

Note: I have also started the third member (PID 2520, started with mongod-repl-2.conf).

I start the mongo shell using the following command:

mongo --host "192.168.103.100:27001" --authenticationDatabase "admin" -u "m103-admin" -p "m103-pass"

The result of rs.conf() is as follows:

MongoDB Enterprise m103-repl:SECONDARY> rs.conf()
{
        "_id" : "m103-repl",
        "version" : 6,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "192.168.103.100:27001",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 2,
                        "host" : "192.168.103.100:27003",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 3,
                        "host" : "m103:27002",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 4,
                        "host" : "m103:27003",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("5d90053d2c2e442c641d0159")
        }
}

Ok. I just wanted to make sure that you didn’t have any arbiter or hidden nodes.

First thing we need to do is remove one of the unhealthy nodes. Which one do you want to remove? I’m guessing _id: 4?

Yes, I want to remove _id: 4. I tried, but since there is no primary node, I couldn’t.

You tried rs.remove("m103:27003")?

Yes. But since both healthy nodes are secondaries, I can’t run rs.remove from either of them. The following message appears:

MongoDB Enterprise m103-repl:SECONDARY> rs.remove("m103:27003")
{
        "ok" : 0,
        "errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is SECONDARY; use the \"force\" argument to override",
        "code" : 10107,
        "codeName" : "NotMaster",
        "operationTime" : Timestamp(1569794137, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1569794137, 1),
                "signature" : {
                        "hash" : BinData(0,"f5fBdihZTiVqn1GeMSL5EGov+pM="),
                        "keyId" : NumberLong("6741894410314711041")
                }
        }
}

It says to use force. How do I do that?

For your specific case, use this:

cfg = rs.conf()
cfg.members[4].votes = 0
cfg.members[4].priority = 0
cfg.members[2].priority = 0.5
cfg.members[3].priority = 0.5
cfg.members[0].priority = 1
rs.reconfig(cfg)

Run the above, then keep checking rs.status() until _id: 0 becomes the primary. It shouldn’t take more than 2 minutes.

NB: There are more steps to complete after this, so let me know once the above is done.
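Rather than reading the whole rs.status() document each time, a small helper can pick out just the primary. This is a hypothetical sketch, assuming the `status` argument has the shape of the `rs.status()` output shown earlier; it is not a built-in shell function.

```javascript
// Hypothetical helper: returns the name of the member whose stateStr is
// "PRIMARY" in an rs.status() document, or null if there is none.
function primaryName(status) {
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      return status.members[i].name;
    }
  }
  return null;
}
```

In the mongo shell, `primaryName(rs.status())` returns null while the set has no primary and the member’s host:port string once one is elected, so it can be rerun (or called in a loop with the shell’s `sleep()`) until it shows 192.168.103.100:27001.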

I am getting a type error:

2019-10-04T02:35:35.210+0000 E QUERY    [thread1] TypeError: cfg.members[4] is undefined :
@(shell):1:1

Did you run cfg = rs.conf() before this line? Let’s try again.

I have done that.

MongoDB Enterprise m103-repl:SECONDARY> cfg = rs.conf()
{
        "_id" : "m103-repl",
        "version" : 6,
        "protocolVersion" : NumberLong(1),
        "members" : [
                {
                        "_id" : 0,
                        "host" : "192.168.103.100:27001",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 2,
                        "host" : "192.168.103.100:27003",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 3,
                        "host" : "m103:27002",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                },
                {
                        "_id" : 4,
                        "host" : "m103:27003",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("5d90053d2c2e442c641d0159")
        }
}

I still get the type error.

What do you get when you run cfg.members.length?

Try this sequence:

cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[3].priority = 0
cfg.members[1].priority = 0.5
cfg.members[2].priority = 0.5
cfg.members[0].priority = 1
rs.reconfig(cfg)
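The first attempt failed because cfg.members is an ordinary zero-based array: with four members the valid positions are 0 to 3, and a member’s position does not have to match its _id. This config has no _id: 1, so _id: 4 sits at position 3. The hypothetical helper below (not a shell built-in) is a minimal sketch that looks a position up by _id instead of guessing it.

```javascript
// Hypothetical helper: returns the array position in cfg.members of the
// member with the given _id, or -1 if no such member exists.
function memberIndexById(cfg, id) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i]._id === id) return i;
  }
  return -1;
}
```

For example, after `cfg = rs.conf()`, `memberIndexById(cfg, 4)` should return 3 for the config above, so `cfg.members[memberIndexById(cfg, 4)].votes = 0` targets m103:27003 regardless of how the array is ordered.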

I get the value 4 when I run cfg.members.length.
The sequence worked.
But rs.reconfig() returns an error because I am running it on a secondary.

MongoDB Enterprise m103-repl:SECONDARY> rs.reconfig(cfg)
{
        "ok" : 0,
        "errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is SECONDARY; use the \"force\" argument to override",
        "code" : 10107,
        "codeName" : "NotMaster",
        "operationTime" : Timestamp(1569794137, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1569794137, 1),
                "signature" : {
                        "hash" : BinData(0,"f5fBdihZTiVqn1GeMSL5EGov+pM="),
                        "keyId" : NumberLong("6741894410314711041")
                }
        }
}

Try this line:
rs.reconfig(cfg, {force : true})


Yes, it worked. Now I have a primary node, but the third node is showing as OTHER.

Ok. There are three steps left:

  1. Come out of the current shell and log in to the replica set properly:
    mongo --host "m103-repl/192.168.103.100:27001,m103:27002" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"
    NB: Your previous command connects you to the specified node, not the replica set. Also, in the above command, I’ve entered at least two hosts… it’s good practice and there’s a reason behind that.
  2. Remove the replica member you don’t want, i.e. m103:27003. You know what to do.
  3. Reset the priority of the other members:

cfg = rs.conf()
cfg.members[1].priority = 1
cfg.members[2].priority = 1
rs.reconfig(cfg)

NOTE: Before you run step 3, make sure that cfg.members[1] and cfg.members[2] are the secondaries. You can check with cfg.members[1].host and cfg.members[2].host.
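Another way to avoid depending on array positions is to select members by host name. The helper below is a hypothetical sketch (not a shell built-in), assuming `cfg` has the shape of `rs.conf()`; it copies the given fields onto the matching member and throws if no member has that host.

```javascript
// Hypothetical helper: sets fields (e.g. priority, votes) on the member of
// cfg whose host matches exactly, and returns cfg for convenience.
function setMemberByHost(cfg, host, fields) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i].host === host) {
      for (var key in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, key)) {
          cfg.members[i][key] = fields[key];
        }
      }
      return cfg;
    }
  }
  throw new Error("no member with host " + host);
}
```

For step 3, that would look like `cfg = rs.conf()`, then `setMemberByHost(cfg, "m103:27002", { priority: 1 })` for each member to reset, then `rs.reconfig(cfg)`. A typo in the host name raises an error instead of silently editing the wrong member.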

Wait a minute or two and run rs.status()


Thanks. When I try to start a new node with a config file, I get the following error: ERROR: child process failed, exited with error number 1

What is the output of ps -ef | grep mongo in the Ubuntu shell, and of rs.status() in the mongo shell?

The first one is something you should keep handy. It tells you what mongo(d/s) processes are running.