Find all duplicate documents in a MongoDB collection by a key field?
To find all duplicate documents in a MongoDB collection by a key field, use the aggregation framework with $group and $match stages to group by the field and filter groups with count greater than 1.
Syntax
db.collection.aggregate([
{ $group: {
_id: { fieldName: "$fieldName" },
documents: { $addToSet: "$_id" },
count: { $sum: 1 }
}},
{ $match: { count: { $gte: 2 } }},
{ $sort: { count: -1 }}
]);
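The pipeline stages can be easier to follow when written as ordinary code. The following is a minimal sketch in plain JavaScript (Node.js assumed) that computes the same result on an in-memory array. findDuplicates is a hypothetical helper, not part of MongoDB. It assumes scalar key values (strings or numbers), and numeric _id values stand in for ObjectIds.

```javascript
// Hypothetical in-memory emulation of the generic pipeline; not a MongoDB API.
// Assumes scalar key values (strings, numbers) so they can serve as Map keys.
function findDuplicates(docs, fieldName) {
  const groups = new Map(); // key value -> { ids: Set of _id values, count }

  // $group: { _id: { fieldName: "$fieldName" }, documents: { $addToSet: "$_id" }, count: { $sum: 1 } }
  for (const doc of docs) {
    const key = doc[fieldName];
    if (!groups.has(key)) groups.set(key, { ids: new Set(), count: 0 });
    const group = groups.get(key);
    group.ids.add(doc._id); // $addToSet: a Set ignores repeated values
    group.count += 1;       // $sum: 1 adds one per document
  }

  return [...groups]
    .map(([key, g]) => ({ _id: { [fieldName]: key }, documents: [...g.ids], count: g.count }))
    .filter((g) => g.count >= 2)        // $match: { count: { $gte: 2 } }
    .sort((a, b) => b.count - a.count); // $sort: { count: -1 }
}

// Usage with data shaped like this article's sample collection
// (numeric _id values stand in for ObjectIds).
const sampleDocs = [
  { _id: 1, StudentName: "John" },
  { _id: 2, StudentName: "Carol" },
  { _id: 3, StudentName: "Carol" },
  { _id: 4, StudentName: "John" },
  { _id: 5, StudentName: "Sam" },
  { _id: 6, StudentName: "Carol" },
];
console.log(JSON.stringify(findDuplicates(sampleDocs, "StudentName")));
```

Groups with a single document (Sam) are dropped by the filter step, just as $match drops them on the server.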
Sample Data
db.findDuplicateByKeyDemo.insertMany([
{"StudentId": 1, "StudentName": "John"},
{"StudentId": 2, "StudentName": "Carol"},
{"StudentId": 3, "StudentName": "Carol"},
{"StudentId": 4, "StudentName": "John"},
{"StudentId": 5, "StudentName": "Sam"},
{"StudentId": 6, "StudentName": "Carol"}
]);
Display all documents to verify the data:
db.findDuplicateByKeyDemo.find().pretty();
{
"_id": ObjectId("..."),
"StudentId": 1,
"StudentName": "John"
}
{
"_id": ObjectId("..."),
"StudentId": 2,
"StudentName": "Carol"
}
{
"_id": ObjectId("..."),
"StudentId": 3,
"StudentName": "Carol"
}
{
"_id": ObjectId("..."),
"StudentId": 4,
"StudentName": "John"
}
{
"_id": ObjectId("..."),
"StudentId": 5,
"StudentName": "Sam"
}
{
"_id": ObjectId("..."),
"StudentId": 6,
"StudentName": "Carol"
}
Find Duplicate Documents
db.findDuplicateByKeyDemo.aggregate([
{ $group: {
_id: { StudentName: "$StudentName" },
UIDS: { $addToSet: "$_id" },
COUNTER: { $sum: 1 }
}},
{ $match: {
COUNTER: { $gte: 2 }
}},
{ $sort: { COUNTER: -1 }},
{ $limit: 10 }
]).pretty();
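The output lists each duplicated StudentName together with the _id values of its documents: Carol appears 3 times and John 2 times, while Sam appears only once and is filtered out by $match. The order of the IDs inside UIDS is not guaranteed, because $addToSet does not preserve order: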
{
"_id": {
"StudentName": "Carol"
},
"UIDS": [
ObjectId("5c7f5b248d10a061296a3c3c"),
ObjectId("5c7f5b438d10a061296a3c3f"),
ObjectId("5c7f5b1f8d10a061296a3c3b")
],
"COUNTER": 3
}
{
"_id": {
"StudentName": "John"
},
"UIDS": [
ObjectId("5c7f5b2d8d10a061296a3c3d"),
ObjectId("5c7f5b168d10a061296a3c3a")
],
"COUNTER": 2
}
How It Works
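- $group groups documents by the key field (StudentName).
- $addToSet collects the _id of every document in each group.
- $sum: 1 counts the documents in each group.
- $match keeps only groups with a count of 2 or more, i.e. the duplicates.
- $sort orders the results by duplicate count, highest first.
- $limit: 10 returns at most the first 10 duplicate groups.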
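Because each stage consumes the output of the previous one, the pipeline behaves like a chain of functions. The sketch below emulates the StudentName query stage by stage in plain JavaScript (Node.js assumed). groupStage, matchStage, sortStage, limitStage and runPipeline are hypothetical names used only for illustration, and numeric _id values stand in for ObjectIds.

```javascript
// Hypothetical stage functions that mirror the StudentName query; not a MongoDB API.

// $group: one entry per StudentName, with UIDS ($addToSet: "$_id") and COUNTER ($sum: 1)
function groupStage(docs) {
  const byName = new Map();
  for (const doc of docs) {
    if (!byName.has(doc.StudentName)) {
      byName.set(doc.StudentName, { _id: { StudentName: doc.StudentName }, UIDS: [], COUNTER: 0 });
    }
    const group = byName.get(doc.StudentName);
    if (!group.UIDS.includes(doc._id)) group.UIDS.push(doc._id); // set semantics: no repeats
    group.COUNTER += 1;
  }
  return [...byName.values()];
}

// $match: { COUNTER: { $gte: 2 } }
const matchStage = (groups) => groups.filter((g) => g.COUNTER >= 2);

// $sort: { COUNTER: -1 }
const sortStage = (groups) => [...groups].sort((a, b) => b.COUNTER - a.COUNTER);

// $limit: n
const limitStage = (n) => (groups) => groups.slice(0, n);

// Each stage receives the previous stage's output, like an aggregation pipeline.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const students = [
  { _id: 1, StudentName: "John" },
  { _id: 2, StudentName: "Carol" },
  { _id: 3, StudentName: "Carol" },
  { _id: 4, StudentName: "John" },
  { _id: 5, StudentName: "Sam" },
  { _id: 6, StudentName: "Carol" },
];

const result = runPipeline(students, [groupStage, matchStage, sortStage, limitStage(10)]);
console.log(JSON.stringify(result));
```

Reordering the array passed to runPipeline changes the result in the same way that reordering stages does in MongoDB; for example, placing limitStage before matchStage would limit the groups before duplicates are filtered.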
Conclusion
Use the MongoDB aggregation framework with $group and $match stages to identify duplicate documents by any key field. The result provides both the duplicated values and the _id of every document that shares them, which you can use for further processing.
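As one example of further processing, the IDs in each group can be used to decide which copies to discard. The sketch below is a hypothetical plain-JavaScript helper (idsToRemove is not a MongoDB API) that keeps the first _id in each group and collects the rest. Because $addToSet does not guarantee element order, "first" is an arbitrary choice here. The resulting array could then be used in a query filter such as { _id: { $in: ids } }.

```javascript
// Hypothetical post-processing: keep one document per duplicate group and
// collect the _id values of the remaining copies.
function idsToRemove(duplicateGroups) {
  return duplicateGroups.flatMap((group) => group.UIDS.slice(1));
}

// Shaped like the aggregation output (numeric IDs stand in for ObjectIds).
const groups = [
  { _id: { StudentName: "Carol" }, UIDS: [2, 3, 6], COUNTER: 3 },
  { _id: { StudentName: "John" }, UIDS: [1, 4], COUNTER: 2 },
];
console.log(idsToRemove(groups)); // [ 3, 6, 4 ]
```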
