I added a unique index on the name field and then dropped the indexes. Now, when I try to create the indexes again, I get a duplicate key error because the collection contains duplicate documents. How can I find the duplicate documents? Please help me resolve this issue.
Welcome to the community, @Vinay_reddy_Mamedi!
Let’s assume we have this data in collection ‘test1’:
db.test1.insertMany([
  { _id: 1, val: 'A' },
  { _id: 2, val: 'B' },
  { _id: 3, val: 'C' },
  { _id: 4, val: 'A' },
])
Then, to find duplicates we can use this aggregation:
db.test1.aggregate([
  {
    $group: {
      // collect the ids of the documents that have the same value
      // for a given key (the 'val' prop in this case)
      _id: '$val',
      ids: {
        $push: '$_id'
      },
      // count the number of documents per key
      totalIds: {
        $sum: 1,
      }
    }
  },
  {
    $match: {
      // keep only the keys whose value occurs in more than one document
      totalIds: {
        $gt: 1,
      },
    },
  },
  {
    $project: {
      _id: false,
      documentsThatHaveDuplicatedValue: '$ids',
    }
  },
]);
This will output ids:
{ "documentsThatHaveDuplicatedValue" : [ 1, 4 ] }
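Since the original question is about a name field rather than val, the same pipeline can be wrapped in a small helper that takes the field name as a parameter. This is only a sketch: buildDuplicatesPipeline and the users collection mentioned in the comment are hypothetical names, not part of the answer above, and the helper assumes a top-level field.

```javascript
// Hypothetical helper: builds the duplicate-finding pipeline shown above
// for any top-level field name (for example 'name' instead of 'val').
function buildDuplicatesPipeline(field) {
  return [
    {
      $group: {
        // group documents by the value of the given field
        _id: '$' + field,
        // collect the ids of all documents in the group
        ids: { $push: '$_id' },
        // count how many documents share the value
        totalIds: { $sum: 1 },
      },
    },
    // keep only values that occur in more than one document
    { $match: { totalIds: { $gt: 1 } } },
    {
      $project: {
        _id: false,
        documentsThatHaveDuplicatedValue: '$ids',
      },
    },
  ];
}

// In the mongo shell, assuming a collection named 'users':
// db.users.aggregate(buildDuplicatesPipeline('name'));
```

Note that $group puts documents that lack the field into the same group as documents where it is null, so a group of such documents may also show up in the result.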
It is also possible to join the full documents that have duplicated values, if the ids alone are not enough for you.
You can do this by adding a $lookup stage at the end of the pipeline:
{
  $lookup: {
    // note: you need to use the same collection name here
    from: 'test1',
    localField: 'documentsThatHaveDuplicatedValue',
    foreignField: '_id',
    as: 'documentsThatHaveDuplicatedValue'
  }
}
Output, after adding the $lookup stage:
{
  "documentsThatHaveDuplicatedValue": [
    {
      "_id" : 1,
      "val" : "A"
    },
    {
      "_id" : 4,
      "val" : "A"
    }
  ]
}
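To get from this output to a collection on which the unique index can be created, one option is to keep one document per group and remove the rest. Below is a minimal sketch of the selection step only. The idsToRemove function is a hypothetical helper, and keeping the first document in each group is an arbitrary assumption, so review the groups before deleting anything.

```javascript
// Hypothetical helper: given the output of the pipeline with the $lookup
// stage, return the _id of every document except the first in each group.
// Which document to keep is an assumption; adjust it to your data.
function idsToRemove(groups) {
  const ids = [];
  for (const group of groups) {
    // skip the first document (the one we keep), collect the others
    const [, ...rest] = group.documentsThatHaveDuplicatedValue;
    for (const doc of rest) {
      ids.push(doc._id);
    }
  }
  return ids;
}

// Sample input shaped like the output shown above
const sample = [
  {
    documentsThatHaveDuplicatedValue: [
      { _id: 1, val: 'A' },
      { _id: 4, val: 'A' },
    ],
  },
];

console.log(idsToRemove(sample)); // [ 4 ]
```

In the mongo shell, the returned ids could then be passed to db.test1.deleteMany({ _id: { $in: ids } }), after which db.test1.createIndex({ val: 1 }, { unique: true }) should no longer hit the duplicate values.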
@Vinay_reddy_Mamedi and @slava,
This is a recent post on StackOverflow.com with a similar question and an answer. From the post’s answer:
Assuming a collection documents with a name (using name instead of url) field consisting of duplicate values. I have two aggregations which return some output which can be used to do further processing. I hope you will find this useful.
…