Spark plugin - too many open cursors

I am getting this error:

Caused by: com.mongodb.MongoCommandException: Command failed with error 2: 'Cannot open a new cursor since too many cursors are already opened' on server server_dns:27017. The full response is {"ok": 0.0, "errmsg": "Cannot open a new cursor since too many cursors are already opened", "code": 2}

I think that the plugin has too many connections to the database. I have tried appending

&MaxPoolSize=1

to the MongoDB connection URL, but there does not appear to be any option there for limiting the number of open cursors. I also cannot increase the number of allowed open cursors in the database configuration itself.
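
For reference, this is roughly how the read is configured (a minimal PySpark sketch; the host, credentials, database, and collection names are placeholders, and it assumes a 2.x/3.x connector where the data source short name is mongo):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("docdb-read").getOrCreate()

    # Connection string with the pool-size option appended (placeholder credentials).
    # maxPoolSize caps the number of connections in the driver's pool, but it did not
    # change how many cursors end up open on the server.
    uri = "mongodb://user:pass@server_dns:27017/mydb.mycoll?ssl=true&maxPoolSize=1"

    df = spark.read.format("mongo").option("uri", uri).load()
    df.show()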

How do I limit/control the number of open cursors used by the Apache Spark plugin for MongoDB?

Hi @Dan_S,

Welcome to MongoDB community!

maxPoolSize limits the number of connections, not cursors. Cursors are controlled by the application, which can specify a timeout for them (and the server should also time them out after 10 minutes of inactivity).

You should also be able to kill cursors explicitly.

Check on the Spark side whether you are opening cursors with the noTimeout flag, or whether you simply have many queries running simultaneously.
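
For example, outside of the Spark connector this is what the cursor behaviour looks like with pymongo (a minimal sketch with placeholder database and collection names; DocumentDB may not support every command identically):

    from pymongo import MongoClient

    client = MongoClient("mongodb://server_dns:27017")
    coll = client["mydb"]["mycoll"]

    # A cursor opened with no_cursor_timeout=True is exempt from the server's
    # idle-cursor timeout (10 minutes by default) and stays open until the client
    # exhausts or closes it, so leaking such cursors adds up quickly.
    cursor = coll.find({}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            pass  # process documents
    finally:
        cursor.close()  # always close explicitly, otherwise the cursor leaks

    # A leaked cursor can also be killed by id with the killCursors database command,
    # e.g. client["mydb"].command({"killCursors": "mycoll", "cursors": [cursor_id]})
    # where cursor_id is the numeric id of the cursor to kill.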

Best
Pavel

Hi Pavel,

Thanks for your response.

The problem is that in Spark I am using the MongoDB connector for Spark, which provides a much higher-level interface to the database than the pymongo module. Judging purely from the plugin's behaviour, I assume it is aggressively opening multiple cursors to the database, but I do not believe this behaviour can be controlled by the plugin's user.
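
To illustrate the contrast (a minimal sketch with placeholder names): with pymongo a query gives me a single cursor that I open and close myself, whereas with the connector I only call load() and, as far as I can tell, each partition of the read opens its own cursor behind the scenes.

    from pymongo import MongoClient

    # Direct pymongo access: one find() call creates one server-side cursor,
    # which the application controls and which closes once it is exhausted.
    client = MongoClient("mongodb://server_dns:27017")
    cursor = client["mydb"]["mycoll"].find({"status": "ready"})
    docs = list(cursor)  # exhausting the cursor also closes it on the server
    client.close()

    # Spark connector access: a single load() call, with partitioning (and
    # therefore, presumably, cursor usage) decided internally by the plugin:
    # df = spark.read.format("mongo").option("uri", uri).load()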

Hi @Dan_S,

I can see if our Spark team colleagues could help us more. Can you share your MongoDB version, topology, and Spark connector version?

Best
Pavel

Does Ross Lawley (author of this plugin) pay attention to these forums?

I include below the response from Ross Lawley to my ticket for this issue on MongoDB’s Jira:

Hi Dan S,

Many thanks for the ticket. When working with multiple distributed systems, it can often be difficult to diagnose the root cause of too many cursors. I have a feeling that the error you are seeing is a symptom of an issue rather than the root cause. If that is the case, then adding a configuration option wouldn't be the right thing to do.

Could you provide more detail on how you are hitting this issue? What version of MongoDB are you running? What OS? What version of Spark and what version of the Spark connector? Ideally, a minimal reproducible example would help, so that I can replicate the issue.

I believe that this forum is the most appropriate place for this discussion. The database being used is Amazon's DocumentDB, which is supposed to support the same client-server protocol as MongoDB. The number of open cursors allowed for any database is a fixed limit that depends on the size of the EC2 instances on which the managed database runs (see here). Note that the smallest instance allows up to 30 open cursors.

I have a batch job that runs a Spark application which reads from a MongoDB database using the Spark plugin. If 4 or more instances of this application run at the same time while the database is on the smallest tier, they produce this error. From this I reason the following:

  • A single read operation appears to open more than 7 but fewer than 10 cursors on average against the database, certainly not 1 (see the arithmetic sketch after this list).
  • There appears to be no mechanism to choose how many cursors the plugin opens during read operations. I am also unaware of where this number is published. This makes it impossible to reason about how many applications can simultaneously read before a failure occurs.
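
Here is the arithmetic behind the first bullet (a minimal sketch, assuming the 30-cursor limit of the smallest instance and that three concurrent applications run cleanly while four do not):

    # Smallest DocumentDB instance allows up to 30 open cursors.
    CURSOR_LIMIT = 30

    # Observed: 3 concurrent applications run fine, 4 or more hit the error.
    ok_apps, failing_apps = 3, 4

    # So the average number of cursors opened per read must fall between these bounds:
    lower_bound = CURSOR_LIMIT / failing_apps  # above 7.5, otherwise 4 apps would fit
    upper_bound = CURSOR_LIMIT / ok_apps       # at most 10, otherwise 3 apps would fail

    print(f"each read opens between {lower_bound:.1f} and {upper_bound:.1f} cursors")
    # -> each read opens between 7.5 and 10.0 cursors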

I may be misunderstanding, but it appears that this plugin opens an arbitrary number of cursors to the database, with no mechanism to control that number.

Hi @Dan_S,

Amazon DocumentDB is just an emulation of the MongoDB API; all of the cursor management and other internals are developed and managed by Amazon.

I don't believe we can help you with this database; you should contact Amazon for answers.

I strongly suggest considering the real MongoDB (Atlas) as your backend database.

Thanks
Pavel
