Chapter 3: Setting Up a Sharded Cluster

I followed the instructions listed and here is what I found. I have nodes 1, 2, and 3 up.

I connected to the 3 nodes and verified that node1 is the primary.

I connected to node2, shut it down, and restarted it with the new configuration file. All good here.
I connected to node3, shut it down, and restarted it with the new configuration file. All good here.

Now I connect to node 1 (port 27011) and see that this node is no longer the primary; it is listed as a secondary, so I am unable to run rs.stepDown(). In the video at 5:04, connecting to the primary node shows PRIMARY, but that is not the case for me.

What am I doing wrong?

Girish

The primary will always show “PRIMARY”. So I guess you did not connect to the primary? :slight_smile: One of the other nodes should have taken over the role, if the cluster worked correctly.

It shows as primary when I upgrade node 2, but after I upgrade node 3 the status of the primary node changes to secondary…

I guess I am stuck…

No, you’re not stuck, you just need to do troubleshooting :slight_smile: Are you sure that one of the other hosts hasn’t become primary? And what’s the cluster status? Etc…
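
For example, from any of the three nodes you can print each member's current state and see which one is PRIMARY now. The ports below assume the lesson's 27011/27012/27013 setup; adjust them to yours:

    mongo --port 27011 --eval "rs.status().members.forEach(function(m) { print(m.name, m.stateStr) })"

    # or ask an individual node directly whether it is the primary
    mongo --port 27012 --eval "db.isMaster().ismaster"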

On a related matter, can you help me, Tess, in clearing out ports 27001 and 27003? I created and sharded a replica set with ports 27002, 27004, 27005, and 27006 because I had to discard 27001 and 27003. But lab validation requires those ports. When I add them to a replica set they promptly become unreachable. As I understand it, I need to delete the local db somewhere. Would that be by mongo-ing into those ports? Maybe I then restart with the config file afresh?

Hi Brian_18814,

Please check my reply on this thread:

Kanika


Well @Brian_18814, the big important question is whether those MongoD daemons already have a configuration set up, or whether you wiped them already.

It is very important, especially at the late stages of M103, to keep track of which MongoD does what. You have to fully understand what you’re building. I mean, we’re juggling no less than nine MongoD processes here :wink:

So yes, you have two options:

  1. Either you run the MongoDs that you already have configured for 27001 and 27003, then use the shell to clear out their current replica set configuration (see the sketch right after this list). That would be an excellent training exercise! The most educational too!
  2. You can of course also just nuke & pave the whole thing :smiley: Just delete the config and data directories, then rebuild them. I mean, it’s not like we’re working with production data here :wink:
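
For option 1, the usual sequence looks roughly like this (the ports are your 27001/27003; the dbPath, log path, and config file name are placeholders for whatever your own setup uses):

    # 1. cleanly shut down the member that still holds the old configuration
    mongo --port 27001 admin --eval "db.shutdownServer()"

    # 2. restart it as a standalone: same dbPath, but NO --replSet / replication section
    mongod --port 27001 --dbpath /var/mongodb/1 --logpath /var/mongodb/1/mongod.log --fork

    # 3. drop the old replica set metadata
    mongo --port 27001 --eval "db.getSiblingDB('local').dropDatabase()"

    # 4. shut it down once more and start it with the config file you actually want
    mongo --port 27001 admin --eval "db.shutdownServer()"
    mongod -f node1.conf

Dropping the local database removes the old replica set configuration and oplog, so the node comes back clean and can be added to the new replica set.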

Hi,

I am having trouble getting the mongod processes to start for nodes 2 and 3.

When I first run them, they say to try running with --fork. When I run them without --fork, nothing really happens.

Thanks

It’s hard to understand the problem without more detail. Please share the configuration files for node2 and node3.
Meanwhile, you can also debug the issue by checking the log files you have specified in the configuration files.
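
For reference, a node config for this setup generally looks something like the one below. Every value in it (paths, IP, port, replica set name) is a placeholder to compare against your own file, not the exact answer:

    # node2.conf (hypothetical example)
    storage:
      dbPath: /var/mongodb/db/2
    systemLog:
      destination: file
      path: /var/mongodb/db/mongod2.log
      logAppend: true
    net:
      bindIp: 192.168.103.100,localhost
      port: 27012
    replication:
      replSetName: m103-example
    processManagement:
      fork: true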

Kanika

Hi

I don’t know what I have done, but now I can’t get the first process to work.

Error: child process failed, exited with error number 100.

I have run chmod 777 on /var/mongodb/1, which is the dbPath.
I have altered the systemLog path to /var/mongodb/db/1/mongod1.log to see if that helps.

It asks me to start without --fork.

Hi RWin,

You need to change the owner of the directory using:

sudo chown vagrant:vagrant -R /var/mongodb/1

You can also check the log file for a detailed error explanation and, as the error message suggests, start without the --fork option to see what is causing the error.
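
If the error is still not obvious, two quick things to try (the config file name below is a placeholder; the log path is the one you mentioned):

    # run in the foreground so the error prints straight to the terminal
    # (temporarily comment out "fork: true" in the config file first)
    mongod -f mongod1.conf

    # or look at the most recent entries in the log
    tail -n 50 /var/mongodb/db/1/mongod1.log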

Kanika

Hi, I found the issue: it was localhost being in the bindIp of the config file. After I removed that from the configs, they all work.

I am just unable to add the other nodes at the moment

Error message: Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: 192.168.103.100:27011

…27013 failed with Connection refused

Make sure mongod is running on port 27013 when you try to add the node.
Also check the hostname you are using for that mongod node.
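
A couple of quick checks before running rs.add() again (the IP is the one from your error message; the commands are just a suggestion):

    # is a mongod actually running for port 27013?
    ps -ef | grep mongod

    # can you reach it from the shell?
    mongo --host 192.168.103.100 --port 27013 --eval "db.serverStatus().ok"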

Kanika

Hi Kanika

I found using the IP addresses rather than host names fixed it.
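
For what it's worth, hostnames only work if every member resolves them to the same address, e.g. via an /etc/hosts entry present on each host. The line below is purely hypothetical:

    # /etc/hosts (hypothetical entry)
    192.168.103.100   m103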

So the lab requires 27001, 27002 and 27003 to be used.

So I removed 2 of the nodes, reconfigured the config files, and added them back in, but I cannot remove the final node. I have made it step down as PRIMARY, and 27001 is now the primary.

When running rs.remove("192.168.103.100:27013") I am getting this errmsg:

Our replica set ID of 5c472…etc did not match that of 192.168.103.100:27002, which is 5c4721b(diff number)


This is the status for node 2

        "_id" : 4,
        "name" : "192.168.103.100:27002",
        "health" : 0,
        "state" : 8,
        "stateStr" : "(not reachable/healthy)",
        "uptime" : 0,
        "optime" : {
            "ts" : Timestamp(0, 0),
            "t" : NumberLong(-1)
        },
        "optimeDurable" : {
            "ts" : Timestamp(0, 0),
            "t" : NumberLong(-1)
        },
        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
        "lastHeartbeat" : ISODate("2019-01-22T14:46:25.972Z"),
        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
        "pingMs" : NumberLong(0),
        "lastHeartbeatMessage" : "replica set IDs do not match, ours: 5c4720b83be68d8503ffb25a; remote node's: 5c4721b3a9204171dc305707",
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "infoMessage" : "",
        "configVersion" : -1
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1548168377, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1548168377, 1),
        "signature" : {
            "hash" : BinData(0,"N4j/1SGyRtNr8ZfCCDbnFHvccl4="),
            "keyId" : NumberLong("6649319353776865281")
        }
Why is 27013 in your replica set? Or should it be 27003?

Your replica set configuration doesn’t match, as the error suggests. Make sure the replSetName is the same in all the nodes. Check the configuration file for every node.
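
If the replSetName already matches everywhere and the IDs still differ, it usually means the member on 27002 was initiated into a different replica set at some point. A rough way out (paths and file names below are placeholders) is to clear that one member's data and let it sync fresh when it rejoins:

    # on the mismatched member only (27002 in your output)
    mongo --port 27002 admin --eval "db.shutdownServer()"
    rm -rf /var/mongodb/db/2/*        # wipes ONLY this member's data files
    mongod -f node2.conf              # start it again with the same config

    # then, from the current primary, add it back if it is no longer in the config
    mongo --port 27001 --eval "rs.add('192.168.103.100:27002')"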

Kanika


They were the same. I ended up stopping the service and deleting the folder it was in.