C# .NET Core 3.1 - Driver (2.11.1) - Slow ToList() data manifestation

Hello,
I have a performance issue that I can’t ignore.
I have a collection with 1000 document. This collection will not grow so the number is stable.
Each document weight 1kb. A single document has one array of subdocument and some other properties.

What I do is:

var result = db.collection("A")
            .Aggregate()
            .Match(Builders<T>.Filter.Eq("<ArrayName>.<PropertyName>", "name"))
            .ToList();

The result of the query operation in MongoDbCompass is 1ms for 200 document and its super ok.
The result of the ToList() of the documents is 10 seconds.

This happen even with property not inside an array.

What can I do ?
Thanks

Edit
I have indexes in place for the properties I need to search for.
Also this result are from localhost:27017, so all in local.

I tested with the FindSync() and FindAsync() and I get the same results.
I tried .Skip(n) and Limit(n + 20) and the performance for 20 documents are 1.5 second. A lot.

1 Like

Can you run the same query in the Mongo shell, mongo and see if you get similar results? You should also try and run the explain() for the query and post it here.

the explain docs are here.

Hello Joe,
thanks for the reply.

I run ‘db.collection.find(…)’ and got the results. The performance was strictly bettere than the ToList()

I also run the ‘db.collection.find().explain()’ and this is the result:

{
queryPlanner: {
plannerVersion: 1,
namespace: 'FirstCollection.Pokemon',
indexFilterSet: false,
parsedQuery: { 'moves.name': { '$eq': 'thunder' } },
winningPlan: {
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { 'moves.name': 1 },
    indexName: 'MovesName',
    isMultiKey: true,
    multiKeyPaths: { 'moves.name': [ 'moves' ] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { 'moves.name': [ '["thunder", "thunder"]' ] }
  }
},
    rejectedPlans: []
},
executionStats: {
executionSuccess: true,
nReturned: 232,
executionTimeMillis: 0,
totalKeysExamined: 232,
totalDocsExamined: 232,
executionStages: {
  stage: 'FETCH',
  nReturned: 232,
  executionTimeMillisEstimate: 0,
  works: 233,
  advanced: 232,
  needTime: 0,
  needYield: 0,
  saveState: 0,
  restoreState: 0,
  isEOF: 1,
  docsExamined: 232,
  alreadyHasObj: 0,
  inputStage: {
    stage: 'IXSCAN',
    nReturned: 232,
    executionTimeMillisEstimate: 0,
    works: 233,
    advanced: 232,
    needTime: 0,
    needYield: 0,
    saveState: 0,
    restoreState: 0,
    isEOF: 1,
    keyPattern: { 'moves.name': 1 },
    indexName: 'MovesName',
    isMultiKey: true,
    multiKeyPaths: { 'moves.name': [ 'moves' ] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { 'moves.name': [ '["thunder", "thunder"]' ] },
    keysExamined: 232,
    seeks: 1,
    dupsTested: 232,
    dupsDropped: 0
  }
  }
 },
serverInfo: {
host: 'DESKTOP-CS3UEH8',
port: 27017,
version: '4.4.0',
gitVersion: '563487e100c4215e2dce98d0af2a6a5a2d67c5cf'
 },
 ok: 1
}

Can you post the output of .explain("executionStats"). This will give a more detailed report. I can see you are using an index stage: 'IXSCAN' which is good. What we are trying to do here is eliminate the possibility that the database is the bottleneck. the executionStats parameter will give us a lot more detail about where the time is spent.

In the previous post if you scroll down the section code you can see the “executionStats”.
Sorry I didn’t specify that I used the .explain("executionStats")

What I see slowing down it is the actual manifestation of the data.
I know I am trying to retrieve a List<BsonDocument> where each document has all the properties, array etc.
I can of course set a $projection to limit what to retrieve (and I will do eventually) but because I want to learn how mongodb works I am trying to understand why this is happening.

I see it now. So the actual query time is effectively 0. So your delay is not in the query or the database engine. Could it be n/w or client delays?

I don’t think so. I am running all locally (mongodb is hosted in localhost:27017). All C# in a basic Console Application project with no client side.
I am doing like this just to understand the MongoDb C# Driver.

So I don’t really know.

Put a time stamp either side of the query on the client. It may just take a while to start your program. How long does it take from the program side to run the query.

So, as I wrote I am doing this in a ConsoleApplication NetcoreApp 3.1.
I executed 2 different ways of querying, one with FindSync() the other with Aggregate().Match().

The first one (FindSync):

The second one (Aggregate):

I am going to ask my C# colleague @yo_adrienne to take a look. She is based in Nevada so it will be a few hours.

1 Like

Hi @Andrea_Zanini!

Can you try your queries but with ToListAsync()?

And if possible, can you also log out the execution stats?

Hello, sorry for the late answer.

I tried the same thing with the await query.ToListAsync(); but even with that the time it takes for manifest the data is the same.

For the execution stats do you mean what @Joe_Drumgoole asked me? Or I can, with C# code, somehow use the execution stats for the ToListAsync() operation?

Hi All,

I am experiencing the same issue. It’s not a matter of a big number of documents retrieved by the query but when doing the tolist(), it takes around 20 seconds to materialize them. Do you have any workaround or estimation for the fix? We also tried with the newest mongoDb driver version, but the issue is still there either using tolist or toListAsync. Please help

Hello Sergio,
I didn’t found the cause of the problem and how to solve it unfortunately.

What I did to go on with the project was start from the scratch and rebuild my document class with all the BsonAttribute to define via code what type are the property of the document (even if, when declared, have already the type)
Example:

public class Example {
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public Guid Id {get; set; }
    
    [BsonRepresentation(BsonType.String)]
    public string Name {get; set; }
}

I changed a lot, more that I wanted to my document, but now the manifestation time it’s what I expected to be.

I didn’t close the issue because I think there is something spooky around it.

Hi Andrea,

Thanks for your response. I tried that but still having the same issue toList is extremely slow:

        var query = collection.Aggregate(new AggregateOptions { AllowDiskUse = true })
            .Match(matching_criteria)
            .Group(grouping_criteria)
            .Sort(sorting_criteria)
            .Project(projection)
            .As<DOCUMENT>();

        var result = query.ToList();

Any other ideas on this?

Hi,

I’m experiencing a similar performance overhead with the client for 1667 results.
Time in Compass : Actual Query Execution Time (ms):120
C# driver: around 6000 ms

Any update on this?

I am going to get some C# experts to check this one out.

1 Like

In my case, the time seems to be in the serialization
It’s returning 1667 instances of Entity, each one with 500 instances of Bar and 150 instances of Foo

   public class Entity
{
	public Guid Id { get; set; }
	public List<Bar> BarValues { get; set; }
	public List<Foo> FooValues { get; set; }
}

public class Bar
{
	public Guid Id { get; set; }
	public Guid B { get; set; }
	public Guid C { get; set; }
}

public class Foo
{
	public Guid Id { get; set; }
	public Guid B { get; set; }
	public object C { get; set; }
}

@Andrea_Zanini, @Sergio_Fiorillo, @Joao_Passos:

There are still many factors that could be affecting these times, so I’d like to ask a few more questions regarding your respective scenarios:

  1. What is your ping time?
  2. What’s the bandwidth between your client and your MongoDB cluster?

The object models collectively posted here seem straightforward, so the deserialization shouldn’t be an issue.

Additionally, regarding these observations:

I’d like to quickly explain why this may be:

When making a call to db.GetCollection(), this is usually ~8ms, as @Andrea_Zanini’s screenshots show:

When creating the Aggregate queries, it’s important to note that this doesn’t result in the first batch being retrieved; it’s only in-memory and doesn’t go out on the wire yet. Calling query.ToList(); takes the longest now because it has to retrieve the first batch (~600ms based on the FindSync log in Andrea’s first screenshot), PLUS all subsequent batches. This is why the total times are 8.5 seconds or higher.

Compass reports query execution time is only 120ms because that only considers the query time on the server. It doesn’t include the time on the wire nor the deserialization from BSON to C# objects.

So, some options:

If you find that ping time is your bottleneck, your wait time is probably due to the amount of roundtrips that are required to retrieve the batches. Based on the earlier example of 9.1 seconds grand total execution time and ~600ms per batch, that’s about 15 round trips to the server (9.1s / 600ms).

Possible solution to ping issue is to locate your app servers closer to the cluster that you’re querying.

If you find that bandwidth is your bottleneck, your wait time is probably due to the time it takes to transmit each 15mb batch across the wire, again, around 15 times.

Possible solution to bandwidth issue is to reduce the amount of data being retrieved, via a projection (like @Sergio_Fiorillo appears to be doing), and/or enabling wire protocol compression in your connection string or via code.

Finally, if all else fails and it still appears to be a deserialization issue, we’d love to see a reproduction of it happening, ideally with some test data. I know it might be additional work, but even a small repo with this information helps us immensely in being able to debug the exact issue you may be having!

Thanks for your patience! Looking forward to hearing from you and seeing if these possible solutions help.

1 Like

Hi,

Thank you for the prompt support.
The object models are more complex than the simplified version I posted
I have MongoDB running on my local machine so bandwidth is not an issue.

Here’s a repo with a working example: https://github.com/jpdlpassos/mongodb

5000 raw values took 675 ms
5000 typed values took 14461 ms

Hopefully, you can help me spot where i can improve my code.

Thanks again!

2 Likes