Mongod.lock when replica increases + replica set IDs do not match

jellyx · December 11, 2020, 11:42am

Hi everyone,

I’m using a Docker image “mongo” which is created by Docker officials.

Everything works great except when I increase number of replicas. In logs, can see the following:

{"t":{"$date":"2020-12-10T19:02:25.896+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27032,"dbPath":"/data/db","architecture":"64-bit","host":"f74b3b84f2e8"}}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"W",  "c":"CONTROL",  "id":20720,   "ctx":"initandlisten","msg":"Available memory is less than system memory","attr":{"availableMemSizeMB":512,"systemMemSizeMB":3933}}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.2","gitVersion":"15e73dc5738d2278b688f8929aee605fe4279b0e","openSSLVersion":"OpenSSL 1.1.1  11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}}
{"t":{"$date":"2020-12-10T19:02:25.898+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*","port":27032},"replication":{"enableMajorityReadConcern":false,"replSet":"jm-replica"},"storage":{"dbPath":"/data/db","journal":{"enabled":true}}}}}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"E",  "c":"STORAGE",  "id":20557,   "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory"}}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"REPL",     "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"COMMAND",  "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"NETWORK",  "id":20562,   "ctx":"initandlisten","msg":"Shutdown: going to close listening sockets"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"NETWORK",  "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"STORAGE",  "id":4784906, "ctx":"initandlisten","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"-",        "id":20520,   "ctx":"initandlisten","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"REPL",     "id":4784907, "ctx":"initandlisten","msg":"Shutting down the replica set node executor"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"NETWORK",  "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"CONTROL",  "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"FTDC",     "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"STORAGE",  "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"STORAGE",  "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"-",        "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"CONTROL",  "id":20565,   "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"I",  "c":"CONTROL",  "id":23138,   "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":100}}

Probably the most interesting part is this:

{"t":{"$date":"2020-12-10T19:02:25.899+00:00"},"s":"E",  "c":"STORAGE",  "id":20557,   "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to lock the lock file: /data/db/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /data/db directory"}}

I found this issue happens when we use volume and mount it becasue Dockerfile creates “mongodb” user and volume is created by root.

More on that here: mongodb - Mongo docker with volume get error - Stack Overflow

However, I also face another error:

"members" : [
	{
		"_id" : 0,
		"name" : "mongo-1:27030",
		"health" : 0,
		"state" : 8,
		"stateStr" : "(not reachable/healthy)",
		"uptime" : 0,
		"optime" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"optimeDurable" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
		"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
		"lastHeartbeat" : ISODate("2020-12-11T11:34:57.208Z"),
		"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "Error connecting to mongo-1:27030 :: caused by :: Could not find address for mongo-1:27030: SocketException: Host not found (authoritative)",
		"syncSourceHost" : "",
		"syncSourceId" : -1,
		"infoMessage" : "",
		"configVersion" : -1,
		"configTerm" : -1
	},
	{
		"_id" : 1,
		"name" : "mongo-2:27031",
		"health" : 1,
		"state" : 2,
		"stateStr" : "SECONDARY",
		"uptime" : 107,
		"optime" : {
			"ts" : Timestamp(1607686093, 1),
			"t" : NumberLong(41)
		},
		"optimeDate" : ISODate("2020-12-11T11:28:13Z"),
		"syncSourceHost" : "",
		"syncSourceId" : -1,
		"infoMessage" : "",
		"configVersion" : 95732,
		"configTerm" : -1,
		"self" : true,
		"lastHeartbeatMessage" : ""
	},
	{
		"_id" : 2,
		"name" : "mongo-3:27032",
		"health" : 0,
		"state" : 8,
		"stateStr" : "(not reachable/healthy)",
		"uptime" : 0,
		"optime" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"optimeDurable" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
		"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
		"lastHeartbeat" : ISODate("2020-12-11T11:34:57.096Z"),
		"lastHeartbeatRecv" : ISODate("2020-12-11T11:34:56.923Z"),
		"pingMs" : NumberLong(0),
		"lastHeartbeatMessage" : "replica set IDs do not match, ours: 5fd22a776b0ed74077c65ca8; remote node's: 5fd29e358b77ac2c30973a4c",
		"syncSourceHost" : "",
		"syncSourceId" : -1,
		"infoMessage" : "",
		"configVersion" : -1,
		"configTerm" : -1
	}
],

It says replica set IDs do not match.

Have you ever had these errors? If I try to execute rs.reconfig() nothing will happen. Simply unsolvable.

I am worried if these two errors are connected. If yes, then it’s fine because I’ll somehow find a way to solve the first (and then second won’t appear again).

However, I am worried if these two are not connected because I am not able to solve it. Whatever I try - nothing.

Thanks.

EDIT: Interestng is that I redeployed stack again and my script gets documents although my latest rs.status looks like this:

{
	"set" : "jm-replica",
	"date" : ISODate("2020-12-11T11:45:57.587Z"),
	"myState" : 2,
	"term" : NumberLong(42),
	"syncSourceHost" : "",
	"syncSourceId" : -1,
	"heartbeatIntervalMillis" : NumberLong(2000),
	"majorityVoteCount" : 2,
	"writeMajorityCount" : 2,
	"votingMembersCount" : 3,
	"writableVotingMembersCount" : 3,
	"optimes" : {
		"lastCommittedOpTime" : {
			"ts" : Timestamp(0, 0),
			"t" : NumberLong(-1)
		},
		"lastCommittedWallTime" : ISODate("1970-01-01T00:00:00Z"),
		"appliedOpTime" : {
			"ts" : Timestamp(1607686302, 2),
			"t" : NumberLong(41)
		},
		"durableOpTime" : {
			"ts" : Timestamp(1607686302, 2),
			"t" : NumberLong(41)
		},
		"lastAppliedWallTime" : ISODate("2020-12-11T11:31:42.115Z"),
		"lastDurableWallTime" : ISODate("2020-12-11T11:31:42.115Z")
	},
	"members" : [
		{
			"_id" : 0,
			"name" : "mongo-1:27030",
			"health" : 0,
			"state" : 8,
			"stateStr" : "(not reachable/healthy)",
			"uptime" : 0,
			"optime" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDurable" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2020-12-11T11:45:57.535Z"),
			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "replica set IDs do not match, ours: 5fd29e358b77ac2c30973a4c; remote node's: 5fd22a776b0ed74077c65ca8",
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"configVersion" : -1,
			"configTerm" : -1
		},
		{
			"_id" : 1,
			"name" : "mongo-2:27031",
			"health" : 0,
			"state" : 8,
			"stateStr" : "(not reachable/healthy)",
			"uptime" : 0,
			"optime" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDurable" : {
				"ts" : Timestamp(0, 0),
				"t" : NumberLong(-1)
			},
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2020-12-11T11:45:57.524Z"),
			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "replica set IDs do not match, ours: 5fd29e358b77ac2c30973a4c; remote node's: 5fd22a776b0ed74077c65ca8",
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"configVersion" : -1,
			"configTerm" : -1
		},
		{
			"_id" : 2,
			"name" : "mongo-3:27032",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 96,
			"optime" : {
				"ts" : Timestamp(1607686302, 2),
				"t" : NumberLong(41)
			},
			"optimeDate" : ISODate("2020-12-11T11:31:42Z"),
			"syncSourceHost" : "",
			"syncSourceId" : -1,
			"infoMessage" : "",
			"configVersion" : 1,
			"configTerm" : 41,
			"self" : true,
			"lastHeartbeatMessage" : ""
		}
	],
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1607687155, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1607686302, 2)
}

Thanks

jellyx · December 11, 2020, 3:09pm

Seems like it is not possible to increase # of replicas in Swarm. If I wish to have more replicas I need to add more services (additional secondary nodes). Am I right? Thanks.