Clean up node state to add it back to a replica set

Hi Kanika,

I’m Octavian from the USA. I am mostly interested in database administration, having worked for many years on DB2 LUW and Informix. Very interesting and informative course.
I am looking for a best practice (script or steps) for cleaning up existing nodes that are in an inconsistent state while already part of a replica set, so that I can resync them or remove and add them back to an existing replica set. I inadvertently ran rs.initiate on a Secondary, so I ended up with 2 Primaries. I can use brute force, remove the storage and re-initialize everything, but I was wondering if there is a better way or command.
I have found the following steps, but they still do not work, due to authorization ramifications on the old Secondary that is currently a Primary.
The scary part is that rs.initiate executed immediately, without any warning, changing the state from Secondary to Primary, but reversing the state seems impossible…

Shut down the MongoDB server.
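
For example, a clean shutdown can be issued from the mongo shell (a minimal sketch; it assumes you are connected to the node in question):

use admin
db.shutdownServer();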

Start the MongoDB server in standalone mode, i.e., without --replSet <replicaSetName>.
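
For example, reusing the dbpath from the restart command further down, but leaving off the replica set option (the path is a placeholder):

sudo mongod --dbpath /path/to/mongo/db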

Log in to MongoDB; you can use the admin database.
Make sure that the user has readWrite permission on the local database.
If it does not, use the following command to grant the role to the user.
The following command gives readWrite permission on the local database to the admin user.

db.grantRolesToUser("admin", [{role: "readWrite", db: "local"}]);
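
You can verify the grant afterwards, for example:

use admin
db.getUser("admin");

The returned user document should list readWrite on local under roles.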

Switch to the local database (use local in the mongo shell).
Execute the following command to empty the collection system.replset:

db.system.replset.remove({});

Make sure that system.replset in the local database is empty by executing the following command:

db.system.replset.find();

• Start the MongoDB server with the --replSet option again; the following command can be used:

sudo mongod --dbpath /path/to/mongo/db --replSet rs0

• Log in to MongoDB using the admin user and execute the following command to initialize the replica set:

rs.initiate();

• Switch over to the local database and execute the following command to make sure that an entry for the replica set configuration exists:

db.system.replset.find();
TIA

The general steps seem fine: you would shut down the secondary that got messed up, restart it without the replica set configuration, and remove any RS-related materials from the local database. You then restart it with the replica set enabled and tell the primary to add the secondary back into the replica set.
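
Condensed, that flow would look roughly like this (a sketch only; hostnames, ports, paths and the set name are placeholders):

On the broken node, restarted standalone (no --replSet):

use local
db.system.replset.remove({});

Restart that node with the replica set option:

sudo mongod --dbpath /path/to/mongo/db --replSet rs0

Then, connected to the current primary, add it back (no rs.initiate() needed):

rs.add("<host>:<port>");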

I don’t understand why you would run rs.initiate() again; it’s listed as the second-to-last command in your post. There is already a primary running in the replica set.

Hi @Tess_Sluijter, speaking of messes, I am getting

about to fork child process, waiting until server is ready for connections.
forked process: 9116
ERROR: child process failed, exited with error number 100
To see additional information in this output, start without the "--fork" option.

when I run:

vagrant@m103:~$ mongod -f /shared/mongod-repl-1.conf

Without the

processManagement:
  fork: true

Then, my vagrant doesn’t do anything, therefore I can’t start off my mongod.

Any hint?

P.S. my mongod works just fine, of course, without the config file options…

When MongoD tells you to "start without the --fork option":

Don’t actually do that :expressionless: I think it’s a silly thing they put in there and they should always just refer to the logfile. But that’s my opinion.

So, if you run it forked, you just need to check the MongoD logfile that you configured. It should clearly show what is going wrong.
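
As an aside, a forked mongod needs a log destination to begin with (systemLog.path or syslog); a minimal config sketch, reusing the lab’s dbpath and a guessed log path:

storage:
  dbPath: /var/mongodb/db/1
systemLog:
  destination: file
  path: /var/mongodb/logs/mongod1.log
  logAppend: true
processManagement:
  fork: true

Then the tail of that logfile usually pinpoints the failure:

tail -n 50 /var/mongodb/logs/mongod1.log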

And just to clarify some terminology:

Then, my vagrant doesn’t do anything, therefore I can’t start off my mongod.

Your Vagrant won’t do anything either way, because Vagrant runs on your computer itself. Vagrant is used to make it easier to set up the VirtualBox virtualization software that runs the M103 VM (virtual machine). Mongo and all the things you’re doing run inside M103 (on the Linux OS).

So anyway! Off to get that error log!

Hi Octavian_90298,

Thanks for reaching out. If I haven’t understood your problem correctly, feel free to correct me.

Your problem: you had a replica set running with 3 nodes, 1 Primary and 2 Secondaries. But somehow, you ran rs.initiate() on a Secondary and it became a Primary, so now you have 2 Primaries (which should be impossible). As soon as you run rs.initiate() on any Secondary node that is already part of a replica set, you will get an error message like the one below:

MongoDB Enterprise m103-repl:SECONDARY> rs.initiate()
{
 "operationTime" : Timestamp(1543487090, 1),
 "ok" : 0,
 "errmsg" : "already initialized",
 "code" : 23,
 "codeName" : "AlreadyInitialized",
 "$clusterTime" : {
	"clusterTime" : Timestamp(1543487090, 1),
	"signature" : {
		"hash" : BinData(0,"6mwshgWLsmkTLHZAZoi/NUVXZo4="),
		"keyId" : NumberLong("6581706101702524931")
	}
  }
}
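
If you are ever unsure which node currently believes it is primary, rs.status() on any member lists every member’s state; a small sketch:

rs.status().members.forEach(function (m) { print(m.name + " : " + m.stateStr); });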

And even if you have messed everything up, as @Tess_Sluijter mentioned, the safest way is:

  • Shut down your node
  • Remove it from the replica set using rs.remove()
  • Restart it with your previous replSet configuration
  • Add the node back to the replica set (see the sketch below)
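
For illustration, with a placeholder member address (substitute your own host and port):

// on the primary, drop the broken member from the config
rs.remove("<host>:27012");

// after the node is restarted with its replSet configuration, add it back
rs.add("<host>:27012");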

Let me know if you still have doubts.

Kanika

If I may add, @maulberto won’t see why mongod does not start in the logs if the reason it does not start is related to the existence or permissions of the logfile or its parent directories. @Tess_Sluijter, I too thought it was silly to write "start without --fork", so I tried to find a reason why they did it, and I found the situation above. However, it would be nicer if mongod printed that the logs are not writable; then we would know.
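
A quick way to rule that case out by hand (the paths are placeholders; use whatever your config file names):

ls -ld /var/mongodb/logs
touch /var/mongodb/logs/mongod1.log

If the touch fails, mongod cannot write its log there either.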

1 Like

Ha! That’s an excellent point, Steeve! You’re absolutely right…

Hi Kanika,

Thanks very much for your quick reply. I have reviewed my previous steps and you are right: node 2 was not successfully added as a secondary, so it was most probably in a standalone state when I inadvertently ran rs.initiate. Node 2 was started with a configuration specifying the given replica set, as described in the lab, but the new member was not successfully added, because I used rs.add("m103.mongodb.university:27012") when the command should have been rs.add("192.168.103.100:27012"). I tried numerous options to change node 2’s state from primary, but they did not work, apparently due to permission-related errors. The only option left to clean up was to remove the storage, namely the node1, node2 and node3 directories, and restart from scratch.
Anyway, bearing in mind that users are expected to make mistakes during setup or maintenance, the fact that rs.initiate was allowed on a node started with a replica set in its configuration file, after a Primary node was already up and running, can create problems. Fun and interesting time playing with replication :slight_smile:
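
For anyone hitting the same hostname mix-up, the correction from the primary was roughly (address taken from the post above):

rs.add("192.168.103.100:27012");

followed by rs.status() to confirm the new member progresses to SECONDARY.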

1 Like

Hi @Tess_Sluijter , I went and checked that log and it said something about "DBPathInUse: Another mongodb instance running on the /var/mongodb/db/1 directory". So I thought about killing all instances and did some research; it turns out pgrep mongod shows (all?) running mongod instances, so I ran kill ###, which killed it. Then I fired off mongod -f /shared/mongod-repl-1.conf again, and I think it started well forked, since another vagrant@m103:~$ prompt came back and no error came out.
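
For reference, that sequence is just (the PID is whatever pgrep printed):

pgrep mongod
kill <pid>
mongod -f /shared/mongod-repl-1.conf

A plain kill sends SIGTERM, which mongod handles as a clean shutdown; avoid kill -9, which bypasses it.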

1 Like

Hi Octavian_90298,

I couldn’t agree more. But the behavior you are seeing is not under our control, especially if we are talking about the course; you are talking about replica sets in MongoDB as a whole. And as I already showed you, if a node is already in a replica set, one won’t be able to run rs.initiate() on it.

Let me know if I can help!

Kanika

1 Like