Performance issue when querying a big collection

Lukasz_Kosinski · June 2, 2021, 12:14pm

Hello,

I’m using mongocxx driver version 3.4 and have run into what looks like a performance issue. When I query a big collection (using the pipeline::match() method), the app hangs for a second and then closes. It happens only on big collections. I use exactly the same method to fetch documents from other collections and there is no such issue.

The collection that causes issues has 49.9 K documents.
The biggest collection from the rest has 15 K documents and it works fine.

Is this a known issue? Would upgrading to a newer version of the driver help?

Code:

FYI bsonQuery is empty. I don’t look for any specific field values.

MaBeuLux88 · June 2, 2021, 3:45pm

Hi @Lukasz_Kosinski,

Your application shouldn’t just crash like this. I suspect that you are not handling errors & exceptions correctly here and because you are hitting a timeout, your application stops.
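A minimal sketch of that kind of handling in C++, assuming the failure surfaces as a C++ exception. The helper name `run_guarded` is hypothetical. It relies on the fact that `mongocxx::exception` derives from `std::system_error`, so catching `std::system_error` also covers errors thrown by the driver.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: runs `operation` and turns any exception into a
// message instead of letting it escape and terminate the process.
// Returns an empty string on success, otherwise a description of the failure.
std::string run_guarded(const std::function<void()>& operation) {
    try {
        operation();
        return {};
    } catch (const std::system_error& e) {
        // mongocxx::exception derives from std::system_error, so driver
        // errors would land here together with their error code.
        return "driver/system error " + std::to_string(e.code().value()) +
               ": " + e.what();
    } catch (const std::exception& e) {
        return std::string("error: ") + e.what();
    } catch (...) {
        return "unknown error";
    }
}
```

The callable passed to `run_guarded` would wrap both the `aggregate` call and the loop that reads the cursor, since the driver can throw from either place. A crash that is not an exception (for example an invalid memory access) would not be caught this way.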

Regarding the performance itself, what’s the query exactly? Which index is backing this query? Can you share the explain output with the execution stats if you still have an issue despite using an index for this query?

Cheers,
Maxime.

Lukasz_Kosinski · June 4, 2021, 12:03pm

Hi @MaBeuLux88,

Thanks for your answer.

Regarding handling exceptions and timeout:
That can happen, but you can see that there is a try {} catch in the code snippet I linked to. It handles several exceptions, but apparently not this one.
I tried adding a socket timeout to the connection URI, but it didn’t change anything.
QString("mongodb://%1:%2/?socketTimeoutMS=1200000").arg(host).arg(port);

Regarding the query itself:
Maybe “performance” is not the right word here. I thought it was a performance issue because the same code works for smaller collections. Actually, pipeline.match doesn’t return anything and just causes a crash, so it’s hard to say whether it’s slow or not.
As for the query, it’s empty in this case and I build it with bsoncxx::builder::core{false}.extract_document().

Regarding indexes: I wasn’t aware of that feature. Maybe that’s the cause. What I do is archive data I get from a REST API. I wasn’t really thinking about what I store (that’s the flexibility we have with MongoDB).
The three main collections are:

small collection with ~50 fields in every document
medium collection with ~13 fields in every document
big collection with ~13 fields in every document
All of these collections have only the default index (on _id).
Besides _id, every document in each collection has one “uuid” field that is unique.

What would you suggest I do, then? I guess I should create some indexes.

MaBeuLux88 · June 4, 2021, 1:42pm

All the fields that you use in a query (find or match in the first stage of a pipeline) should be indexed in a perfect world.

MongoDB offers many different types of indexes. If you aren’t sure, take this training:

The doc is also a good source:

I’m not a C++ developer so I’m struggling to read the code, to be honest. Maybe someone else will be able to help.

Also, one thing that could be an issue: if you don’t consume the aggregation (read the documents from the cursor it returns), the pipeline won’t execute. It’s lazy by default.

Cheers,
Maxime.