How to remove duplicate records in MongoDB 3.x?
To remove duplicate records in MongoDB, group the documents with the aggregation pipeline's $group stage, collect the _id values of each group with the $addToSet accumulator, and then delete the extra documents with deleteMany().
Syntax
db.collection.aggregate([
  {
    $group: {
      _id: { field: "$field" },
      duplicateIds: { $addToSet: "$_id" },
      count: { $sum: 1 }
    }
  },
  { $match: { count: { $gt: 1 } } }
]);
Sample Data
db.demo438.insertMany([
  { "FirstName": "Chris" },
  { "FirstName": "David" },
  { "FirstName": "Chris" },
  { "FirstName": "Bob" },
  { "FirstName": "David" }
]);
{
  "acknowledged": true,
  "insertedIds": [
    ObjectId("5e775c37bbc41e36cc3caea1"),
    ObjectId("5e775c3dbbc41e36cc3caea2"),
    ObjectId("5e775c40bbc41e36cc3caea3"),
    ObjectId("5e775c44bbc41e36cc3caea4"),
    ObjectId("5e775c47bbc41e36cc3caea5")
  ]
}
Display all documents from the collection:
db.demo438.find();
{ "_id": ObjectId("5e775c37bbc41e36cc3caea1"), "FirstName": "Chris" }
{ "_id": ObjectId("5e775c3dbbc41e36cc3caea2"), "FirstName": "David" }
{ "_id": ObjectId("5e775c40bbc41e36cc3caea3"), "FirstName": "Chris" }
{ "_id": ObjectId("5e775c44bbc41e36cc3caea4"), "FirstName": "Bob" }
{ "_id": ObjectId("5e775c47bbc41e36cc3caea5"), "FirstName": "David" }
Method 1: Identify Duplicate Groups
First, group documents by the field to identify duplicates:
db.demo438.aggregate([
  {
    $group: {
      _id: { FirstName: "$FirstName" },
      duplicateIds: { $addToSet: "$_id" },
      count: { $sum: 1 }
    }
  }
]);
{ "_id": { "FirstName": "David" }, "duplicateIds": [ ObjectId("5e775c47bbc41e36cc3caea5"), ObjectId("5e775c3dbbc41e36cc3caea2") ], "count": 2 }
{ "_id": { "FirstName": "Bob" }, "duplicateIds": [ ObjectId("5e775c44bbc41e36cc3caea4") ], "count": 1 }
{ "_id": { "FirstName": "Chris" }, "duplicateIds": [ ObjectId("5e775c40bbc41e36cc3caea3"), ObjectId("5e775c37bbc41e36cc3caea1") ], "count": 2 }
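The grouping step can be traced without a database. The following is a minimal sketch in plain JavaScript, not MongoDB code: the helper name groupByField and the shortened string ids are assumptions made for illustration. It mimics what $group with $addToSet and $sum: 1 computes, followed by the $match filter from the syntax section.

```javascript
// In-memory stand-in for the demo438 collection (ids shortened, hypothetical).
const docs = [
  { _id: "a1", FirstName: "Chris" },
  { _id: "a2", FirstName: "David" },
  { _id: "a3", FirstName: "Chris" },
  { _id: "a4", FirstName: "Bob" },
  { _id: "a5", FirstName: "David" }
];

// Hypothetical helper: mimics $group with $addToSet on _id and $sum: 1.
function groupByField(items, field) {
  const groups = new Map();
  for (const doc of items) {
    const key = doc[field];
    if (!groups.has(key)) {
      groups.set(key, { _id: { [field]: key }, duplicateIds: [], count: 0 });
    }
    const group = groups.get(key);
    if (!group.duplicateIds.includes(doc._id)) {
      group.duplicateIds.push(doc._id); // set semantics: no repeated ids
    }
    group.count += 1; // same effect as $sum: 1
  }
  return Array.from(groups.values());
}

// Same filter as { $match: { count: { $gt: 1 } } }.
const duplicateGroups = groupByField(docs, "FirstName").filter(g => g.count > 1);
console.log(JSON.stringify(duplicateGroups, null, 2));
```

Only the Chris and David groups survive the filter. The sum of count - 1 over those groups (here 2) is the number of documents a cleanup would delete, which is a useful check before running any delete.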
Method 2: Remove Duplicates (Keep First Occurrence)
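Remove duplicate documents while keeping the first occurrence of each unique value. $addToSet does not guarantee the order of the array it builds, so sort by _id first and collect the ids with $push; default ObjectId values increase roughly with creation time, so the first id in each array belongs to the earliest document:
db.demo438.aggregate([
  // Oldest documents first, so each group's array starts with the first occurrence.
  { $sort: { _id: 1 } },
  {
    $group: {
      _id: { FirstName: "$FirstName" },
      duplicateIds: { $push: "$_id" },
      count: { $sum: 1 }
    }
  },
  // Keep only the groups that actually contain duplicates.
  { $match: { count: { $gt: 1 } } }
]).forEach(function(doc) {
  // Drop the first _id from the list so that document is kept.
  doc.duplicateIds.shift();
  // Delete every remaining document in the group.
  db.demo438.deleteMany({ _id: { $in: doc.duplicateIds } });
});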
Verify Result
db.demo438.find();
{ "_id": ObjectId("5e775c37bbc41e36cc3caea1"), "FirstName": "Chris" }
{ "_id": ObjectId("5e775c3dbbc41e36cc3caea2"), "FirstName": "David" }
{ "_id": ObjectId("5e775c44bbc41e36cc3caea4"), "FirstName": "Bob" }
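The keep-one, delete-the-rest logic can also be traced in plain JavaScript. This is a minimal sketch under stated assumptions, not MongoDB code: the helper name removeDuplicates is hypothetical, and numeric _id values stand in for ObjectIds. It sorts by _id before grouping so that the document kept in each group is the earliest one.

```javascript
// In-memory stand-in for the demo438 collection (numeric ids are hypothetical).
const collection = [
  { _id: 1, FirstName: "Chris" },
  { _id: 2, FirstName: "David" },
  { _id: 3, FirstName: "Chris" },
  { _id: 4, FirstName: "Bob" },
  { _id: 5, FirstName: "David" }
];

// Hypothetical helper: returns the documents that remain after deduplication.
function removeDuplicates(items, field) {
  // Sort a copy by _id so the earliest document leads each group.
  const sorted = [...items].sort((a, b) => a._id - b._id);

  // Collect the ids for each field value, preserving the sorted order.
  const idsByValue = new Map();
  for (const doc of sorted) {
    if (!idsByValue.has(doc[field])) idsByValue.set(doc[field], []);
    idsByValue.get(doc[field]).push(doc._id);
  }

  // shift() drops the id to keep; whatever is left in the array gets deleted.
  const toDelete = new Set();
  for (const ids of idsByValue.values()) {
    ids.shift();
    ids.forEach(id => toDelete.add(id));
  }

  // Stand-in for deleteMany({ _id: { $in: [...] } }).
  return items.filter(doc => !toDelete.has(doc._id));
}

const remaining = removeDuplicates(collection, "FirstName");
console.log(remaining);
```

The documents with ids 1, 2, and 4 remain, which corresponds to the first Chris, the first David, and Bob in the verification output above. A group with a single id, such as Bob's, is left with an empty array after shift(), so nothing is deleted for it.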
Key Points
- $addToSet collects the _id values of each group into an array; the order of that array is not guaranteed
- shift() removes the first element from the array, so the document with that _id is kept
- $match keeps only the groups with a count greater than 1 (duplicates only)
Conclusion
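Use aggregation to group documents by field value, then remove the extras with deleteMany(). This approach keeps one document for each unique value and deletes the rest. To be sure the kept document is the first occurrence, sort by _id before grouping and collect the ids in order, because $addToSet does not guarantee array order.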