com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown)

Bhaskar_Avisha · April 1, 2021, 4:53pm

We have a 6-shard cluster with 3 routers and 1 config replica set. We started getting the error below when connecting through the routers. I couldn't find any specific issue with the cluster: it appears to be up and running, and the issue seems intermittent.

com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server XXX:XXX. The full response is {"ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown", "operationTime": {"$timestamp": {"t": 88608, "i": 2}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 88608, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "XXXXXXXXXXXX", "subType": "00"}}, "keyId": XXXXXXXXXXXXXX}}}
	at com.mongodb.internal.connection.ProtocolHelper.createSpecialException(ProtocolHelper.java:242) ~[mongodb-driver-core-4.1.1.jar:na]
	at com.mongodb.internal.connection.ProtocolHelper.getCommandFailureException(ProtocolHelper.java:171) ~[mongodb-driver-core-4.1.1.jar:na]
	at com.mongodb.internal.connection.InternalStreamConnection.receiveCommandMessageResponse(InternalStreamConnection.java:359) ~[mongodb-driver-core-4.1.1.jar:na]
	at com.mongodb.internal.connection.InternalStreamConnection.receive(InternalStreamConnection.java:316) ~[mongodb-driver-core-4.1.1.jar:na]
	at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.lookupServerDescription(DefaultServerMonitor.java:215) ~[mongodb-driver-core-4.1.1.jar:na]
	at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:144) ~[mongodb-driver-core-4.1.1.jar:na]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]

Kushagra_Kesav · April 1, 2021, 6:57pm

Hi @Bhaskar_Avisha

First of all, welcome to the MongoDB Community Forum.

I believe error 11600 means that the client is trying to perform an operation on a server that is shutting down.

So, if you provide the driver with a connection URI that specifies a replica set, the driver should reconnect to the new primary as soon as it’s available. See Connection String URI Format, specifically the replica set option section.
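As a rough illustration of the connection string shape referred to above, here is a minimal Java sketch that assembles a standard (non-SRV) URI from a seed list. The host names and the replica set name `rs0` are made-up placeholders, and `buildUri` is a hypothetical helper, not part of any driver. The `replicaSet` option itself is a documented connection string option. When the seed list names mongos routers instead of replica set members, the option is simply left out.

```java
import java.util.List;

final class MongoUriSketch {
    // Builds "mongodb://host1,host2/..." from a seed list.
    // replicaSet may be null, e.g. when the hosts are mongos routers.
    static String buildUri(List<String> hosts, String replicaSet) {
        String uri = "mongodb://" + String.join(",", hosts) + "/";
        return replicaSet == null ? uri : uri + "?replicaSet=" + replicaSet;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and replica set name, for illustration only.
        List<String> members = List.of(
                "db1.example.net:27017",
                "db2.example.net:27017",
                "db3.example.net:27017");
        System.out.println(buildUri(members, "rs0"));

        // Hypothetical mongos routers: no replicaSet option.
        List<String> routers = List.of("router1.example.net:27017", "router2.example.net:27017");
        System.out.println(buildUri(routers, null));
    }
}
```

Listing more than one host gives the driver alternatives to select from when one of them goes away, which is the point of the suggestion.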

I hope it works for you.
Let us know if the issue still persists.

In case of any doubts please feel free to reach out.

Regards,
Kushagra

Bhaskar_Avisha · April 2, 2021, 1:34am

Thanks @Kushagra_Kesav

I am getting this error when the application connects through mongos; this is not a direct connection to a replica set.

kevinadi · April 9, 2021, 7:53am

Hi @Bhaskar_Avisha,

As far as I can tell, the com.mongodb.MongoNodeIsRecoveringException is a possible error that happens when you’re connected to a replica set. From the linked page:

An exception indicating that the server is a member of a replica set but is in recovery mode, and therefore refused to execute the operation. This can happen when a server is starting up and trying to join the replica set.
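For context on why code 11600 surfaces as this exception: the drivers' Server Discovery and Monitoring specification groups several server error codes under "node is recovering", and treats two of them as "node is shutting down". The Java sketch below only mirrors that grouping for illustration. The class and method names are hypothetical, and it is not how any driver is actually implemented.

```java
import java.util.Set;

final class StateChangeErrors {
    // "Node is recovering" codes: InterruptedAtShutdown (11600),
    // InterruptedDueToReplStateChange (11602), NotPrimaryOrSecondary (13436),
    // PrimarySteppedDown (189), ShutdownInProgress (91).
    static final Set<Integer> NODE_IS_RECOVERING = Set.of(11600, 11602, 13436, 189, 91);

    // Subset indicating that the node is shutting down.
    static final Set<Integer> NODE_IS_SHUTTING_DOWN = Set.of(11600, 91);

    static boolean isNodeRecovering(int code) {
        return NODE_IS_RECOVERING.contains(code);
    }

    static boolean isNodeShuttingDown(int code) {
        return NODE_IS_SHUTTING_DOWN.contains(code);
    }

    public static void main(String[] args) {
        int code = 11600; // the code reported in this thread
        System.out.println("recovering=" + isNodeRecovering(code)
                + ", shuttingDown=" + isNodeShuttingDown(code));
    }
}
```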

However, I don’t think there’s enough information here to determine what’s going on. Could you post more details?

1. What is your MongoDB version?
2. What is your Java driver version?
3. What is the connection string you used in your Java app (as mentioned by @Kushagra_Kesav)?
4. Typically, when does this error happen? Can you determine a pattern to it? For example, do you see this error during maintenance, network interruptions, high load, etc.?
5. If you try connecting to the mongos using some other method, e.g. the mongo shell or a driver for another language (Python, Node, etc.), do you still see the same error, or does it only happen with the Java driver?

Best regards,
Kevin

Pavel_Grigorenko · January 11, 2022, 10:38am

Hi @Bhaskar_Avisha, did you get your issue resolved?
We’re facing a similar problem: our client application keeps losing the connection to the Atlas cluster. Since we are using tailable cursors, our app has to restart after each interruption, which is highly annoying.
The client connects to the cluster via a ‘mongodb+srv://’ URL.

 Jan 10 23:20:10 ip-***.eu-west-1.compute.internal  [cluster-ClusterId{value='61dc9f7a153a5d03282359a1', description='null'}-demo-shard-00-01.l76nh.mongodb.net:27017] WARN  c.g.r.c.config.MongoDbConfiguration: MongoDB serverHeartbeatFailed: ServerHeartbeatFailedEvent{connectionId=connectionId{localValue:12, serverValue:269}, elapsedTimeNanos=950823445, awaited=true, throwable=com.mongodb.MongoNodeIsRecoveringException: Command failed with error 11600 (InterruptedAtShutdown): 'interrupted at shutdown' on server ***.l76nh.mongodb.net:27017. The full response is {"operationTime": {"$timestamp": {"t": 1641853208, "i": 11}}, "ok": 0.0, "errmsg": "interrupted at shutdown", "code": 11600, "codeName": "InterruptedAtShutdown", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1641853208, "i": 15}}, "signature": {"hash": {"$binary": {"base64": "5cTlPKELM4icjdNscfA1g85B75s=", "subType": "00"}}, "keyId": 7006031879656701953}}}} com.mongodb.event.ServerHeartbeatFailedEvent@583a7488
Jan 10 23:20:10 ip-***.eu-west-1.compute.internal  org.springframework.data.mongodb.UncategorizedMongoDbException: Query failed with error code 11600 and error message 'interrupted at shutdown' on server demo-shard-00-01.l76nh.mongodb.net:27017; nested exception is com.mongodb.MongoQueryException: Query failed with error code 11600 and error message 'interrupted at shutdown' on server ***.l76nh.mongodb.net:27017
 	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:140)
 	at org.springframework.data.mongodb.core.ReactiveMongoTemplate.potentiallyConvertRuntimeException(ReactiveMongoTemplate.java:2814)
 	at org.springframework.data.mongodb.core.ReactiveMongoTemplate.lambda$translateException$90(ReactiveMongoTemplate.java:2797)
 	at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:6910)
 	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
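One way to avoid restarting the whole application when a tailing cursor is interrupted is to remember the last document processed and re-open the cursor from that point. The Java sketch below shows only that control flow, using a simulated source. `openAfter`, `flakySource`, and the numeric ids are hypothetical stand-ins, and a real application would need a monotonically increasing field to resume from (or a change stream with its resume token) plus a backoff between attempts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongFunction;

final class ResumableTail {
    // Consumes ids from a cursor; on failure, re-opens it after the last id seen.
    static List<Long> tail(LongFunction<Iterator<Long>> openAfter, int maxReopens) {
        List<Long> processed = new ArrayList<>();
        long lastSeen = -1L;
        int reopens = 0;
        while (true) {
            try {
                Iterator<Long> cursor = openAfter.apply(lastSeen);
                while (cursor.hasNext()) {
                    long id = cursor.next();
                    processed.add(id);
                    lastSeen = id; // remember the resume position
                }
                return processed; // the simulated stream is finite
            } catch (RuntimeException e) {
                if (++reopens > maxReopens) {
                    throw e; // give up after too many re-opens
                }
                // A real application would wait here before re-opening.
            }
        }
    }

    // Simulated source over ids 0..4 that fails once, just before id 2.
    static Iterator<Long> flakySource(long after, boolean[] failedOnce) {
        return new Iterator<Long>() {
            private long cur = after + 1;

            @Override
            public boolean hasNext() {
                return cur < 5;
            }

            @Override
            public Long next() {
                if (cur == 2 && !failedOnce[0]) {
                    failedOnce[0] = true;
                    throw new IllegalStateException("interrupted at shutdown (simulated)");
                }
                return cur++;
            }
        };
    }

    public static void main(String[] args) {
        boolean[] failedOnce = {false};
        System.out.println(tail(after -> flakySource(after, failedOnce), 3));
    }
}
```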

Vladimir_Beliakov · January 20, 2022, 8:41am

Hi, everyone!
We also encountered this problem while upgrading our MongoDB sharded cluster from 4.0.12 to 4.2.17.
During the upgrade we observed error messages like these in our logs:

Code: 11600;
CodeName: InterruptedAtShutdown;
Command: { "getMore" : NumberLong("9001353061637322596"), "collection" : "some.collection" };
ErrorMessage: interrupted at shutdown;
Result: { "ok" : 0.0, "errmsg" : "interrupted at shutdown", "code" : 11600, "codeName" : "InterruptedAtShutdown" };
ConnectionId: { ServerId : { ClusterId : 1, EndPoint : "Unspecified/some.router.name:27017" }, LocalValue : 290 };
ErrorLabels: System.Collections.Generic.List`1[System.String]

MongoDB.Driver.MongoNodeIsRecoveringException: Server returned node is recovering error (code = 11600, codeName = "InterruptedAtShutdown").
   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.Execute(IConnection connection, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.WireProtocol.CommandWireProtocol`1.Execute(IConnection connection, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.Server.ServerChannel.ExecuteProtocol[TResult](IWireProtocol`1 protocol, ICoreSession session, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.Server.ServerChannel.Command[TResult](ICoreSession session, ReadPreference readPreference, DatabaseNamespace databaseNamespace, BsonDocument command, IEnumerable`1 commandPayloads, IElementNameValidator commandValidator, BsonDocument additionalOptions, Action`1 postWriteAction, CommandResponseHandling responseHandling, IBsonSerializer`1 resultSerializer, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.AsyncCursor`1.ExecuteGetMoreCommand(IChannelHandle channel, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.AsyncCursor`1.GetNextBatch(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.AsyncCursor`1.MoveNext(CancellationToken cancellationToken)
   at MongoDB.Driver.IAsyncCursorExtensions.ToList[TDocument](IAsyncCursor`1 source, CancellationToken cancellationToken)
   at MongoDB.Driver.IAsyncCursorSourceExtensions.ToList[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)

Code: 11600;
CodeName: InterruptedAtShutdown;
Command: { "find" : "some.collection", "filter" : {somefilter} };
ErrorMessage: Encountered non-retryable error during query :: caused by :: interrupted at shutdown;
Result: { "ok" : 0.0, "errmsg" : "Encountered non-retryable error during query :: caused by :: interrupted at shutdown", "code" : 11600, "codeName" : "InterruptedAtShutdown", "operationTime" : Timestamp(1642587846, 583), "$clusterTime" : { "clusterTime" : Timestamp(1642587850, 1385), "signature" : { "some signature"  } } };
ConnectionId: { ServerId : { ClusterId : 2, EndPoint : "Unspecified/some.router.name:27017" }, LocalValue : 4228 };
ErrorLabels: System.Collections.Generic.List`1[System.String]

MongoDB.Driver.MongoNodeIsRecoveringException: Server returned node is recovering error (code = 11600, codeName = "InterruptedAtShutdown").
   at MongoDB.Driver.Core.Operations.RetryableReadOperationExecutor.ExecuteAsync[TResult](IRetryableReadOperation`1 operation, RetryableReadContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.ReadCommandOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.FindCommandOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.FindOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.FindOperation`1.ExecuteAsync(IReadBinding binding, CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.ExecuteReadOperationAsync[TResult](IReadBinding binding, IReadOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperationAsync[TResult](IClientSessionHandle session, IReadOperation`1 operation, ReadPreference readPreference, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)

(I edited out the collection names, filters, signature, and router names.)

The upgrade process was as follows:

1. Stop one of the secondary replica set members by issuing the command systemctl stop mongod or systemctl stop mongos.
2. Upgrade the MongoDB packages to 4.2.17.
3. Start the mongod or mongos processes.
4. After upgrading all the secondaries, change the primary and perform steps 1-3.
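The ordering in the steps above (secondaries first, the former primary last) can be sketched as follows. This is only an illustration of the sequencing described in this post. The `Member` type and host names are hypothetical, and the stop, upgrade, and start actions are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingUpgradeOrder {
    // Hypothetical description of one replica set member.
    static final class Member {
        final String host;
        final boolean primary;

        Member(String host, boolean primary) {
            this.host = host;
            this.primary = primary;
        }
    }

    // Returns hosts in upgrade order: all secondaries, then the primary.
    // The primary is expected to be stepped down before it is stopped.
    static List<String> upgradeOrder(List<Member> members) {
        List<String> order = new ArrayList<>();
        for (Member m : members) {
            if (!m.primary) {
                order.add(m.host);
            }
        }
        for (Member m : members) {
            if (m.primary) {
                order.add(m.host);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        List<Member> shard = List.of(
                new Member("shard0-a.example.net", false),
                new Member("shard0-b.example.net", true),
                new Member("shard0-c.example.net", false));
        System.out.println(upgradeOrder(shard));
    }
}
```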

The errors above showed up right after we stopped a process. We saw them both when connecting with ReadPreference = Primary and with ReadPreference = SecondaryPreferred.

To answer @kevinadi’s questions:

4.0.12 before the upgrade.
2.11.4 (we’re using the C# MongoDB driver).
Here are our C# driver settings:

			var settings = new MongoClientSettings
			{
				Servers = "{router names}",
				ConnectionMode = ConnectionMode.Automatic,
				MaxConnectionIdleTime = TimeSpan.FromMinutes(10),
				MaxConnectionLifeTime = TimeSpan.FromMinutes(30),
				MaxConnectionPoolSize = 100,
				MinConnectionPoolSize = 1,
				ReadPreference = ReadPreference.Primary, // and could be SecondaryPreferred
				SocketTimeout = TimeSpan.Zero,
				WaitQueueTimeout = TimeSpan.FromMinutes(2),
				WriteConcern = WriteConcern.W1,
				ConnectTimeout = TimeSpan.FromSeconds(15),
				ReadConcern = ReadConcern.Default,
				ServerSelectionTimeout = TimeSpan.FromSeconds(15),
			};

It was happening while we were shutting down the servers during the upgrade process as mentioned before.
We didn’t come across any errors like those while using the mongo shell.

What we want to know is how to upgrade a MongoDB sharded cluster more smoothly, without these client-side errors. Any advice is really appreciated.

Tom_Duerr · November 15, 2022, 9:54am

We’re running into this same issue while doing a rolling upgrade from 4.0 → 4.2.
All clients are running Java.

Any updates or solutions available?
Thanks,
Tom

kevinadi · November 16, 2022, 12:35am

Hi @Tom_Duerr welcome to the community!

Are you seeing the same message InterruptedAtShutdown during the upgrade process from the client side, and only during the upgrade and not during any other time?

If yes, the message InterruptedAtShutdown just means that the driver/client was in the middle of an operation, and the server stopped it because the server is shutting down. Most newer drivers implement retryable writes and retryable reads to make this situation smoother, but the error can still happen when the operation in question is not retryable (see the linked page for more details about this).
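For operations that the driver does not retry on its own (a getMore on an existing cursor is one such case), an application-level retry is one possible mitigation. The Java sketch below assumes a hypothetical `ServerError` exception that carries the server's error code. It is not a driver class, and the retry policy shown is an illustration, not a recommendation for every workload. It should only wrap operations that are safe to repeat.

```java
import java.util.function.IntPredicate;
import java.util.function.Supplier;

final class RetrySketch {
    // Hypothetical exception type carrying a server error code.
    static final class ServerError extends RuntimeException {
        final int code;

        ServerError(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    // Runs op, retrying up to maxAttempts times while retryable accepts the error code.
    static <T> T withRetries(Supplier<T> op, IntPredicate retryable,
                             int maxAttempts, long backoffMillis) {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (ServerError e) {
                if (attempt >= maxAttempts || !retryable.test(e.code)) {
                    throw e; // out of attempts, or not a retryable code
                }
                try {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (calls[0]++ == 0) {
                throw new ServerError(11600, "interrupted at shutdown (simulated)");
            }
            return "ok";
        }, code -> code == 11600 || code == 91, 3, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The backoff gives the driver time to discover another mongos or the newly elected primary before the operation is attempted again.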

I don’t believe this is an issue per se, since 1) the server is shutting down, and 2) the operation got killed because the server needs to shut down. However, if you see this error when the server is not shutting down, then this is unexpected and may need further investigation.

Best regards
Kevin