Lab 2.3. Writes with Failovers: It seems there's no primary node

Hi there:

I connected to a node that I assumed was a secondary (I'm not sure whether being node 2 or 3 means it's actually a secondary, but…) and shut it down:

mongod --config /shared/mongod-repl-2.conf
mongo --host "192.168.103.100:27002" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"
use admin
db.shutdownServer()
quit()

Then I tried to connect to the one I guessed was the primary, i.e. node 1:

mongod --config /shared/mongod-repl-1.conf
mongo --host "192.168.103.100:27001" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

But when the prompt appears…

MongoDB Enterprise m103-repl:SECONDARY>

So when I try to initiate the replica set:

rs.initiate()

I get this:

{
        "info2" : "no configuration specified. Using a default configuration for the set",
        "me" : "192.168.103.100:27001",
        "info" : "try querying local.system.replset to see current configuration",
        "ok" : 0,
        "errmsg" : "already initialized",
        "code" : 23,
        "codeName" : "AlreadyInitialized",
        "operationTime" : Timestamp(1573675455, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1573675455, 1),
                "signature" : {
                        "hash" : BinData(0,"JaBwUlZclM5OLHn/BepeE5zbwaI="),
                        "keyId" : NumberLong("6758783797975580674")
                }
        }
}

I don’t know what it means, but it doesn’t look good.

If I check the status:

rs.status()

I get:

"name" : "192.168.103.100:27001",
"stateStr" : "SECONDARY",

Why?

"name" : "192.168.103.100:27002", 
"stateStr" : "(not reachable/healthy)",

I was expecting this, OK.

"name" : "m103:27003",
"stateStr" : "(not reachable/healthy)",

Why??

I guess it’s related to the previous lab, but I passed it (or at least I got the code).

The numbers 1, 2, 3 in the ports do not dictate which node is primary or secondary. Any eligible node can be elected primary at any point. So before shutting down any node, you need to check its current role by running rs.status() or rs.isMaster().
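For example, from the mongo shell connected to any member (these are standard fields of the isMaster response):

db.isMaster().ismaster   // true only if the node you are connected to is the primary
db.isMaster().primary    // "host:port" of the current primary (may be absent if there is none)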

Let’s see the full rs.status()


OK, I connected to #3:

mongod --config /shared/mongod-repl-3.conf
mongo --host "192.168.103.100:27003" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

It seems it’s a secondary node:

MongoDB Enterprise m103-repl:SECONDARY>

And when I check rs.status():

{
        "set" : "m103-repl",
        "date" : ISODate("2019-11-21T09:32:06.701Z"),
        "myState" : 2,
        "term" : NumberLong(1),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(0, 0),
                        "t" : NumberLong(-1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1573675455, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1573675455, 1),
                        "t" : NumberLong(1)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.103.100:27001",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2019-11-21T09:32:06.562Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "Connection refused",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : -1
                },
                {
                        "_id" : 1,
                        "name" : "192.168.103.100:27002",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2019-11-21T09:32:06.563Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "Connection refused",
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : -1
                },
                {
                        "_id" : 2,
                        "name" : "m103:27003",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 84,
                        "optime" : {
                                "ts" : Timestamp(1573675455, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2019-11-13T20:04:15Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : 5,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1573675455, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1573675455, 1),
                "signature" : {
                        "hash" : BinData(0,"JaBwUlZclM5OLHn/BepeE5zbwaI="),
                        "keyId" : NumberLong("6758783797975580674")
                }
        }
}

The other two are “dead”. And why the different name for the third node?

"name" : "192.168.103.100:27001",
"stateStr" : "(not reachable/healthy)",

"name" : "192.168.103.100:27002",
"stateStr" : "(not reachable/healthy)",

"name" : "m103:27003",
"stateStr" : "SECONDARY",

Hi @JavierBlanco,

I would recommend going through the video lecture on Replication again.

No, these numbers have nothing to do with which node is the primary of your replica set. The primary in your replica set might change from time to time if an election happens.

When you start just a single node (and not the others) and connect to it, it will show as SECONDARY, because with no other nodes in the replica set running there is no majority available to elect a primary.

As you can see in the rs.initiate() output you posted, the replica set is already initialised, hence the command returns an error.
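If you want to inspect the configuration that is already stored (as the "info" field in that error suggests), either of these works from the mongo shell:

rs.conf()                       // the current replica set configuration
use local
db.system.replset.findOne()     // the same document, read directly from the local database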

You have two questions here. The first one is: why are the nodes showing (not reachable/healthy)?

It might be that the other two nodes are not up and running on your system. Please make sure you have started all three of these nodes:

mongod --config /shared/mongod-repl-1.conf
mongod --config /shared/mongod-repl-2.conf
mongod --config /shared/mongod-repl-3.conf
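You can then verify that all three mongod processes are running (the same check used later in this thread):

ps -ef | grep "[m]ongod"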

The second question is why the third node has a different name, i.e. "name" : "m103:27003".

Please read about the concept of hostname-to-IP address mapping. In this case the hostname m103 is mapped to 192.168.103.100, so it resolves to the same IP as the other nodes.
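On the lab VM this mapping lives in /etc/hosts; a minimal entry looks like the one that shows up in the /etc/hosts output further down in this thread:

192.168.103.100    m103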


As mentioned above, please start all three nodes in your replica set and then connect to it using this command.

 mongo --host "m103-repl/192.168.103.100:27003" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

Notice the replica set name in the connection string. This will always connect you to the primary if there is a primary in your replica set at the time of connecting.
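An equivalent way to express this with a MongoDB connection string URI (same credentials and hosts, just a different syntax) would be something like:

mongo "mongodb://m103-admin:m103-pass@192.168.103.100:27001,192.168.103.100:27002,192.168.103.100:27003/admin?replicaSet=m103-repl"

Either form makes the shell discover the replica set members and route you to the current primary.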

Hope it helps!

If the issue still persists then please feel free to get back to us.

Happy Learning :slight_smile:

Thanks,
Shubham Ranjan
Curriculum Support Engineer


@JavierBlanco, additionally, you used m103:27003 when you added this node to the replica set; that's why it shows up like this. Whatever format you used during rs.add() is the format that is saved and displayed in rs.status().
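You can confirm exactly which host strings are stored in the configuration with:

rs.conf().members[2].host                                 // host string of the third member
rs.conf().members.map(function(m) { return m.host; })     // host strings of all members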


OK, I started all nodes and then connected to the replica set using:

mongo --host "m103-repl/192.168.103.100:27003" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

I checked the status and it told me that #1 was primary and the other two were secondary, but after that I kept getting a repeating error message:

2019-11-21T15:44:44.169+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 127.0.1.1:27003, in(checking socket for error after poll), reason: Connection refused

OK, I have checked my notes from the previous lab… Is it OK if I change its name back to the original one?

Hi @JavierBlanco,

It seems that the hostname m103 is not getting mapped to 192.168.103.100.

Can you please refer to this post and update the mapping in your /etc/hosts file?


Alternatively, you can remove m103:27003 from your replica set and add it back using the actual IP address.

Here is how you can do it.

  1. Connect to the primary in your replica set
  2. rs.remove("m103:27003")
  3. rs.add("192.168.103.100:27003")

Hope it helps!

Thanks,
Shubham Ranjan
Curriculum Support Engineer

I reverted the changes in #3 after connecting to #1 (I didn't use the m103-repl/ prefix):

cfg = rs.conf()
cfg.members[2].host = "192.168.103.100:27003"
rs.reconfig(cfg) 

Then I connected to #3 and shut it down, reconnected to #1 (using m103-repl/ this time), checked that #3 was not reachable/healthy and… got the repeating error message again:

2019-11-21T16:42:02.669+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 192.168.103.100:27003, in(checking socket for error after poll), reason: Connection refused

It seems the problem is m103-repl/; connecting without that causes no trouble.

I tried the write operation and the shell returned:

WriteResult({
        "nInserted" : 1,
        "writeConcernError" : {
                "code" : 64,
                "codeName" : "WriteConcernFailed",
                "errInfo" : {
                        "wtimeout" : true
                },
                "errmsg" : "waiting for replication timed out"
        }
})

Is that the expected error message?

Run and show us the outputs of:

  • cat /etc/hosts from the VM shell
  • ps -ef | grep "[m]ongod" from the VM shell
  • rs.status() from the mongo shell

vagrant@m103:~$ cat /etc/hosts

127.0.1.1       m103.mongodb.university m103
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1       localhost localhost.localdomain localhost6 localhost6.localdomain6
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
192.168.103.100    m103 m103

vagrant@m103:~$ ps -ef | grep "[m]ongod"

No output at all.

vagrant@m103:~$ ps -ef | grep mongod

vagrant 1977 1839 0 13:34 pts/0 00:00:00 grep --color=auto mongod

MongoDB Enterprise m103-repl:PRIMARY> rs.status()

{
        "set" : "m103-repl",
        "date" : ISODate("2019-11-22T13:38:06.417Z"),
        "myState" : 1,
        "term" : NumberLong(3),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1574429880, 1),
                        "t" : NumberLong(3)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1574429880, 1),
                        "t" : NumberLong(3)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1574429880, 1),
                        "t" : NumberLong(3)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1574429880, 1),
                        "t" : NumberLong(3)
                }
        },
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.103.100:27001",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 39,
                        "optime" : {
                                "ts" : Timestamp(1574429880, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2019-11-22T13:38:00Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1574429859, 1),
                        "electionDate" : ISODate("2019-11-22T13:37:39Z"),
                        "configVersion" : 8,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                },
                {
                        "_id" : 1,
                        "name" : "192.168.103.100:27002",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 28,
                        "optime" : {
                                "ts" : Timestamp(1574429880, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1574429880, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2019-11-22T13:38:00Z"),
                        "optimeDurableDate" : ISODate("2019-11-22T13:38:00Z"),
                        "lastHeartbeat" : ISODate("2019-11-22T13:38:05.119Z"),
                        "lastHeartbeatRecv" : ISODate("2019-11-22T13:38:05.726Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "",
                        "syncingTo" : "192.168.103.100:27001",
                        "syncSourceHost" : "192.168.103.100:27001",
                        "syncSourceId" : 0,
                        "infoMessage" : "",
                        "configVersion" : 8
                },
                {
                        "_id" : 2,
                        "name" : "192.168.103.100:27003",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 17,
                        "optime" : {
                                "ts" : Timestamp(1574429880, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(1574429880, 1),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2019-11-22T13:38:00Z"),
                        "optimeDurableDate" : ISODate("2019-11-22T13:38:00Z"),
                        "lastHeartbeat" : ISODate("2019-11-22T13:38:05.142Z"),
                        "lastHeartbeatRecv" : ISODate("2019-11-22T13:38:04.753Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "",
                        "syncingTo" : "192.168.103.100:27002",
                        "syncSourceHost" : "192.168.103.100:27002",
                        "syncSourceId" : 1,
                        "infoMessage" : "",
                        "configVersion" : 8
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1574429880, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1574429880, 1),
                "signature" : {
                        "hash" : BinData(0,"IU5GGTajgGgg4lVhwpj5pNSaXps="),
                        "keyId" : NumberLong("6758783797975580674")
                }
        }
}

Remove the “d” and let’s see the output:
ps -ef | grep [m]ongo

No output again. I'm running the command right after setting up the VM; I don't know if that's the appropriate time.

Hi @JavierBlanco,

As I can see in the screenshot you shared in post #12, the output of the rs.status() command shows that all of the nodes in the replica set are now healthy and reachable.

Please make sure you are running this command after launching the mongod instances.

ps -ef | grep mongod

Are you facing any other issue here?

Thanks,
Shubham Ranjan
Curriculum Support Engineer

OK, now it works:

ps -ef | grep [m]ongod

vagrant   3043     1  1 12:45 ?        00:00:03 mongod --config /shared/mongod-repl-1.conf
vagrant   3122     1  1 12:45 ?        00:00:03 mongod --config /shared/mongod-repl-2.conf
vagrant   3213     1  1 12:45 ?        00:00:03 mongod --config /shared/mongod-repl-3.conf

ps -ef | grep [m]ongo

vagrant   3043     1  3 12:45 ?        00:00:01 mongod --config /shared/mongod-repl-1.conf
vagrant   3122     1  3 12:45 ?        00:00:01 mongod --config /shared/mongod-repl-2.conf
vagrant   3213     1  7 12:45 ?        00:00:01 mongod --config /shared/mongod-repl-3.conf

ps -ef | grep mongod

vagrant   3043     1  2 12:45 ?        00:00:02 mongod --config /shared/mongod-repl-1.conf
vagrant   3122     1  2 12:45 ?        00:00:02 mongod --config /shared/mongod-repl-2.conf
vagrant   3213     1  2 12:45 ?        00:00:02 mongod --config /shared/mongod-repl-3.conf
vagrant   3306  1952  0 12:47 pts/0    00:00:00 grep --color=auto mongod

Everything seems OK; you have your Primary and Secondaries. I was just curious why the ps command wasn't returning any results initially, but we know why now.

So this message is the expected one?

Yes. With one member down, a write that asks for acknowledgement from more members than are currently reachable (for example w: 3 in a three-node set with one node shut down) cannot be satisfied before the wtimeout expires, so the shell reports that writeConcernError; the document itself was still written on the available members ("nInserted" : 1). The health and state of your Primary and Secondaries look good. Even @Shubham_Ranjan agrees.
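For reference, an insert of this shape reproduces that result while a member is down (the collection name and document here are placeholders, not necessarily the exact ones the lab uses):

db.new_data.insert(
  { "m103": "writes with failovers" },              // hypothetical test document
  { writeConcern: { w: 3, wtimeout: 1000 } }        // wait for all 3 members to acknowledge, give up after 1 second
)

With only two of the three members reachable, the insert is applied on those two, but the w: 3 acknowledgement never arrives, so after wtimeout milliseconds the shell returns a WriteResult containing a writeConcernError like the one above.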

:beers:

Thanks to everyone, week done.