I am trying to use an aggregation to identify duplicated data.
The code below fails with a MongoCommandException which says:
"Command aggregate failed: Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in"
I am passing new AggregateOptions { AllowDiskUse = true }, but that setting does not seem to reach the MongoDB server.
    var collection = database.GetCollection<BsonDocument>(collectionName);
    string strPipeline = @"
    [
        {
            $group :
            {
                _id : {raw_curve_id : ""$raw_curve_id"", published_date : ""$published_date"", delivery_date : ""$delivery_date"", value : ""$value""},
                ids: { $push: ""$_id""},
                saved_dates: { $push: ""$saved_date""},
                count: {$sum: 1}
            }
        },
        {$match: { count: {$gt: 1} } },
    ]";
    var pipelineDoc = BsonSerializer.Deserialize<BsonDocument[]>(strPipeline);
    var cursor = await collection.AggregateAsync<BsonDocument>(pipelineDoc, new AggregateOptions { AllowDiskUse = true });
    var firstDuplicate = await cursor.FirstOrDefaultAsync();