Servers won't join replica set

When I went through the Configure a Sharded Cluster lab, setting up mongos and the config server RS seemed to go fine. However, my mongods will no longer join their replica set. When I start them up, the prompt is:

m103-repl:OTHER>

and if I run rs.status() I get a state of 10 (‘Removed’) and the message “Our replica set config is invalid or we are not a member of it”.
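For anyone reading a numeric state out of rs.status(), here is a minimal sketch of a lookup from state code to name. The codes are the documented replica set member states; the names MEMBER_STATES and describeState are made up for this example.

```javascript
// Replica set member state codes and their names.
const MEMBER_STATES = {
  0: "STARTUP",
  1: "PRIMARY",
  2: "SECONDARY",
  3: "RECOVERING",
  5: "STARTUP2",
  6: "UNKNOWN",
  7: "ARBITER",
  8: "DOWN",
  9: "ROLLBACK",
  10: "REMOVED",
};

// Hypothetical helper: translate a numeric state into its name.
function describeState(code) {
  return MEMBER_STATES[code] || "UNRECOGNIZED (" + code + ")";
}

console.log(describeState(10)); // REMOVED
```

In the mongo shell the code normally arrives together with its name, in the "state" and "stateStr" fields of rs.status().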

And so if I switch back to mongos and try to add a shard I get:

"Could not find host matching read preference { mode: "primary" } for set m103-repl"

So I stopped them all again and overwrote the .conf files with the examples given here. They start up without error, but they don’t join their RS.

Everyone seems to be running:

vagrant 22465 1 1 02:08 ? 00:01:09 mongod -f csrs_1.conf
vagrant 22499 1 1 02:09 ? 00:01:10 mongod -f csrs_2.conf
vagrant 22533 1 1 02:09 ? 00:01:14 mongod -f csrs_3.conf
vagrant 23437 1 0 02:51 ? 00:00:11 mongos -f mongos.conf
vagrant 23467 6721 0 02:52 pts/3 00:00:00 mongo --port 26000
vagrant 24493 1 0 03:46 ? 00:00:07 mongod -f node1.conf
vagrant 24566 1 0 03:46 ? 00:00:07 mongod -f node2.conf
vagrant 24639 1 0 03:47 ? 00:00:07 mongod -f node3.conf

What’s going on there, and what else should I try?

I have a similar problem and can’t finish the lab. If you know the solution, please post it. I’ll do likewise. Thank you.

You may need to re-add them using rs.add(). In the last chapter, we did a lot of configuring and some might have been removed.

I did try rs.add(), but got an error whose wording I do not remember.

I believe the data directory changed, as I went so far as to copy the node*.conf files from the lesson verbatim, and the data paths were different. I will try again tonight.

Rerun the command and send the output.

Here is the output.

m103-example:OTHER> rs.add("192.168.103.100:27011")
{
	"ok" : 0,
	"errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is REMOVED; use the \"force\" argument to override",
	"code" : 10107,
	"codeName" : "NotMaster"
}

I’m going to try rs.reconfig()

Here is rs.conf(). Note the replica set name, “m103-example”. I have no idea when or how that changed.

m103-example:OTHER> rs.conf()
{
"_id" : "m103-example",
"version" : 3,
"protocolVersion" : NumberLong(1),
"members" : [
	{
		"_id" : 0,
		"host" : "192.168.103.100:27011",
		"arbiterOnly" : false,
		"buildIndexes" : true,
		"hidden" : false,
		"priority" : 1,
		"tags" : {

		},
		"slaveDelay" : NumberLong(0),
		"votes" : 1
	},
	{
		"_id" : 1,
		"host" : "m103:27012",
		"arbiterOnly" : false,
		"buildIndexes" : true,
		"hidden" : false,
		"priority" : 1,
		"tags" : {
			
		},
		"slaveDelay" : NumberLong(0),
		"votes" : 1
	},
	{
		"_id" : 2,
		"host" : "m103:27013",
		"arbiterOnly" : false,
		"buildIndexes" : true,
		"hidden" : false,
		"priority" : 1,
		"tags" : {
			
		},
		"slaveDelay" : NumberLong(0),
		"votes" : 1
	}
],
"settings" : {
	...
}

}

I edited the rs.conf() output to revert the RS name to m103-repl and ran the modified copy through rs.reconfig(…, {force:true}), but it won’t take that because the RS names don’t match.

{
	"ok" : 0,
	"errmsg" : "New and old configurations differ in replica set name; old was m103-example, and new is m103-repl",
	"code" : 103,
	"codeName" : "NewReplicaSetConfigurationIncompatible"
}

That appears to be the culprit, because my .conf files are trying to join RS m103-repl.

processManagement:
  fork: true
replication:
  replSetName: m103-repl
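To make the mismatch concrete, here is a minimal sketch that compares the set name stored in a node's replica set config (the "_id" field of rs.conf()) with the replSetName from its .conf file. The function name checkReplSetName is hypothetical, and the two values are simply copied from the output in this thread.

```javascript
// Hypothetical helper: does the stored replica set config belong to the
// set that the .conf file asks the node to join?
function checkReplSetName(storedConf, configuredName) {
  const storedName = storedConf._id;
  if (storedName === configuredName) {
    return { ok: true, message: 'Set name "' + storedName + '" matches.' };
  }
  return {
    ok: false,
    message:
      'Stored config is for "' + storedName +
      '", but the .conf file asks for "' + configuredName + '".',
  };
}

// Values taken from the rs.conf() output and the .conf snippet above.
const result = checkReplSetName({ _id: "m103-example" }, "m103-repl");
console.log(result.message);
```

In the mongo shell the first argument would be rs.conf() itself. A mismatch like this one is consistent with the nodes reporting REMOVED.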

Now I want to delete the RS entirely but I don’t know how to do that. There does not seem to be a method for it.

If you do not have any documents in your database you may simply remove all the files from the data directories specified in your configuration files.

I decided to drop local instead:

That worked pretty well. I just commented out (#) the replication entry in each .conf, restarted each mongod, connected to one and ran rs.initiate() with no args, and then rs.add() for the other two. Then I stopped them again, uncommented the replication entry, and restarted them.
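As an alternative to calling rs.initiate() with no arguments and then rs.add() for each remaining node, rs.initiate() also accepts a full config document. Below is a rough sketch of building one. The function name buildReplSetConfig is made up, and the host list is an assumption based on the ports mentioned in this thread, so adjust both to your own setup.

```javascript
// Hypothetical helper: build a config document of the shape
// rs.initiate() accepts ("_id" is the set name, one entry per member).
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      return { _id: index, host: host };
    }),
  };
}

// Assumed hosts; replace with the addresses and ports of your mongods.
const config = buildReplSetConfig("m103-repl", [
  "192.168.103.100:27011",
  "192.168.103.100:27012",
  "192.168.103.100:27013",
]);

console.log(JSON.stringify(config, null, 2));
// In the mongo shell, connected to one of the mongods:
//   rs.initiate(config)
```

Passing the whole document in one call means the set name and every member are visible in one place before anything is written.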

Then I switched over to my mongos window and called addShard(), and that worked fine.

However, I think the mongods are running on the wrong ports, meaning not the ports the validation script expects, because the lab validation script says:

Client experienced a timeout when connecting to 'm103-repl' - check that mongod/mongos
processes are running on the correct ports, and that the 'm103-admin' user
authenticates against the admin database.

The lecture’s node*.conf example files show ports 27011 through 27013:

net:
  bindIp: 192.168.103.100,localhost
  port: 27011

But the lab’s text suggests that the mongods should be using 27001-27003:

Once m103-repl has sharding enabled, you can add it as the primary shard with:

sh.addShard("m103-repl/192.168.103.100:27001")

And I think the lab’s ports are what the validation script is connecting to – and failing, because I followed the lecture text.

I can switch the mongods to the 001-003 ports and restart, but I’m not sure what to do about the shard I already have set up.
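A quick way to see which members sit outside the ports the lab text seems to expect is sketched below. The helper name findUnexpectedPorts is hypothetical, the expected ports 27001 to 27003 are inferred from the lab's addShard example, and the member list is copied from the earlier rs.conf() output purely as an illustration.

```javascript
// Hypothetical helper: return the hosts whose port is not in expectedPorts.
function findUnexpectedPorts(members, expectedPorts) {
  return members
    .map(function (member) {
      return member.host;
    })
    .filter(function (host) {
      // The port is whatever follows the last colon in "host:port".
      const port = Number(host.split(":").pop());
      return expectedPorts.indexOf(port) === -1;
    });
}

// Illustrative member list; in the mongo shell use rs.conf().members.
const members = [
  { host: "192.168.103.100:27011" },
  { host: "m103:27012" },
  { host: "m103:27013" },
];

const unexpected = findUnexpectedPorts(members, [27001, 27002, 27003]);
console.log(unexpected);
```

An empty result would mean every member already listens on one of the expected ports.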

Here’s how I solved my problem: I deleted all the .conf files for the replica set and restarted from scratch. It worked. I also found that when I renamed a replica set in the .conf file, funny things happened.

@ThomasKennedy, what is the name of the lab and the name of the script?

I am having the same issue, but I seem to have made things worse when trying to use your fix. All the servers are showing as “REMOVED”, with an invalid replica set config.

MongoDB Enterprise m103-example:OTHER> rs.config()
{
    "_id" : "m103-example",
    "version" : 7,
    "protocolVersion" : NumberLong(1),
    "members" : [
            {
                    "_id" : 0,
                    "host" : "192.168.103.100:27011",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            },
            {
                    "_id" : 1,
                    "host" : "m103.mongodb.university:27012",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            },
            {
                    "_id" : 2,
                    "host" : "m103.mongodb.university:27013",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : false,
                    "priority" : 1,
                    "tags" : {

                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 1
            },
            {
                    "_id" : 3,
                    "host" : "m103.mongodb.university:27014",
                    "arbiterOnly" : false,
                    "buildIndexes" : true,
                    "hidden" : true,
                    "priority" : 0,
                    "tags" : {

                    },
                    "slaveDelay" : NumberLong(0),
                    "votes" : 0
            }
    ],
    "settings" : {
            "chainingAllowed" : true,
            "heartbeatIntervalMillis" : 2000,
            "heartbeatTimeoutSecs" : 10,
            "electionTimeoutMillis" : 10000,
            "catchUpTimeoutMillis" : -1,
            "catchUpTakeoverDelayMillis" : 30000,
            "getLastErrorModes" : {

            },
            "getLastErrorDefaults" : {
                    "w" : 1,
                    "wtimeout" : 0
            },
            "replicaSetId" : ObjectId("5d6acd66f6cfcf90a95107fb")
    }

}
MongoDB Enterprise m103-example:OTHER> rs.status()
{
    "state" : 10,
    "stateStr" : "REMOVED",
    "uptime" : 285,
    "optime" : {
            "ts" : Timestamp(1567351227, 1),
            "t" : NumberLong(4)
    },
    "optimeDate" : ISODate("2019-09-01T15:20:27Z"),
    "lastHeartbeatMessage" : "",
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "infoMessage" : "",
    "ok" : 0,
    "errmsg" : "Our replica set config is invalid or we are not a member of it",
    "code" : 93,
    "codeName" : "InvalidReplicaSetConfig",
    "operationTime" : Timestamp(1567351227, 1),
    "$clusterTime" : {
            "clusterTime" : Timestamp(1567351227, 1),
            "signature" : {
                    "hash" : BinData(0,"KIGQxYZQfc2Oa/CE/g8rJ9opFd4="),
                    "keyId" : NumberLong("6731418439618920449")
            }
    }
}
MongoDB Enterprise m103-example:OTHER>

Hi @Joseph_Jamison_45055,

Please delete all the files in your "--dbpath" directories and all the config files.
After you have cleared all the old files, please re-create all your config files (exactly as given in the lab instructions) and follow all the steps in the lab notes from scratch.

Please let me know if the error is not resolved.

Thanks,
Muskan
Curriculum Support Engineer