Find same contacts in a list of contacts in Python
Finding duplicate contacts in a list is a common problem where we need to group contacts that belong to the same person. Two contacts are considered the same if they share any common field: username, email, or phone number.
This problem can be solved using graph theory with Depth First Search (DFS). We create an adjacency matrix where contacts are nodes, and edges connect contacts that share any field.
Problem Statement
Given a list of contacts with three fields each (username, email, phone), the task is defined as follows:
A contact can store username, email and phone fields in any order
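Two contacts are the same if they share the same username, email, or phone number. The relation is transitive: contacts linked through a chain of such matches also belong to the same person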
Return groups of contact indices that belong to the same person
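Because the fields can appear in any order, the matching rule reduces to asking whether two contacts have any value in common. The sketch below assumes each contact is a plain tuple of three strings, and the function name `is_same_person` is hypothetical:

```python
def is_same_person(a, b):
    """Return True if contacts a and b share at least one field value.

    Assumption: each contact is a tuple of three strings whose order
    (username, email, phone) is not fixed.
    """
    # Set intersection compares every field with every field,
    # which is equivalent to the nine pairwise slot comparisons.
    return bool(set(a) & set(b))


amal = ("Amal", "amal@gmail.com", "+915264")
amal123 = ("Amal123", "+915264", "amal_new@gmail.com")
bimal = ("Bimal", "bimal321@yahoo.com", "+1234567")

print(is_same_person(amal, amal123))  # True: both contain "+915264"
print(is_same_person(amal, bimal))    # False: no value in common
```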
Solution Approach
We'll use a graph-based approach with these steps:
Create adjacency matrix: Build a graph where contacts are connected if they share any field
Apply DFS: Use depth-first search to find connected components
Group results: Each connected component represents contacts of the same person
Implementation
```python
class Contact:
    def __init__(self, slot1, slot2, slot3):
        # The three slots hold username, email and phone in any order
        self.slot1 = slot1
        self.slot2 = slot2
        self.slot3 = slot3


def generate_graph(contacts, n, matrix):
    # Initialize matrix with zeros
    for i in range(n):
        for j in range(n):
            matrix[i][j] = 0

    # Check each pair of contacts once
    for i in range(n):
        for j in range(i + 1, n):
            # Fields may be stored in any order, so compare every slot with every slot
            if (contacts[i].slot1 == contacts[j].slot1 or
                    contacts[i].slot1 == contacts[j].slot2 or
                    contacts[i].slot1 == contacts[j].slot3 or
                    contacts[i].slot2 == contacts[j].slot1 or
                    contacts[i].slot2 == contacts[j].slot2 or
                    contacts[i].slot2 == contacts[j].slot3 or
                    contacts[i].slot3 == contacts[j].slot1 or
                    contacts[i].slot3 == contacts[j].slot2 or
                    contacts[i].slot3 == contacts[j].slot3):
                # Undirected edge: mark both directions.
                # Keep scanning j, since contact i may match several contacts.
                matrix[i][j] = 1
                matrix[j][i] = 1


def visit_using_dfs(i, matrix, visited, group, n):
    # Mark contact i and add it to the current group
    visited[i] = True
    group.append(i)
    # Recurse into every unvisited contact connected to i
    for j in range(n):
        if matrix[i][j] and not visited[j]:
            visit_using_dfs(j, matrix, visited, group, n)


def find_similar_contacts(contacts):
    n = len(contacts)
    matrix = [[0] * n for _ in range(n)]
    visited = [False] * n
    result = []

    # Generate adjacency matrix
    generate_graph(contacts, n, matrix)

    # Find connected components using DFS
    for i in range(n):
        if not visited[i]:
            group = []
            visit_using_dfs(i, matrix, visited, group, n)
            result.append(group)

    return result
```
```python
# Example usage
contacts = [
    Contact("Amal", "amal@gmail.com", "+915264"),
    Contact("Bimal", "bimal321@yahoo.com", "+1234567"),
    Contact("Amal123", "+915264", "amal_new@gmail.com"),
    Contact("AmalAnother", "+962547", "amal_new@gmail.com")
]

groups = find_similar_contacts(contacts)
for group in groups:
    print(group)
```
Output
```
[0, 2, 3]
[1]
```
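The recursive `visit_using_dfs` uses one Python stack frame per contact along a chain of connections, so a group whose DFS path is longer than the interpreter's recursion limit (1000 by default in CPython) raises `RecursionError`. Below is a sketch of an iterative alternative that uses an explicit stack over the same kind of adjacency matrix; the function name `find_groups_iterative` is hypothetical, and it sorts each group, so the order within a group may differ from the recursive version's visiting order:

```python
def find_groups_iterative(matrix):
    """Return connected components of an adjacency matrix as sorted index lists.

    Uses an explicit stack instead of recursion, so group size is not
    bounded by Python's recursion limit.
    """
    n = len(matrix)
    visited = [False] * n
    groups = []
    for start in range(n):
        if visited[start]:
            continue
        # Start a new group from an unvisited contact
        visited[start] = True
        stack = [start]
        group = []
        while stack:
            i = stack.pop()
            group.append(i)
            # Push every unvisited neighbour of contact i
            for j in range(n):
                if matrix[i][j] and not visited[j]:
                    visited[j] = True
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Adjacency matrix of the four example contacts (edges 0-2 and 2-3)
matrix = [
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
print(find_groups_iterative(matrix))  # [[0, 2, 3], [1]]
```

Marking a contact as visited when it is pushed, rather than when it is popped, keeps each contact on the stack at most once.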
How It Works
The algorithm works in three phases:
Graph Construction: Create an adjacency matrix where matrix[i][j] = 1 if contacts i and j share any field
DFS Traversal: For each unvisited contact, perform DFS to find all connected contacts
Grouping: Each DFS traversal gives us one group of related contacts
Example Explanation
In our example:
Contact 0 ("Amal") and Contact 2 ("Amal123") share phone "+915264"
Contact 2 and Contact 3 share email "amal_new@gmail.com"
Contact 1 ("Bimal") shares no fields with others
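This creates two groups: [0, 2, 3] and [1]. Contacts 0 and 3 share no field directly; they end up in the same group because both are connected to contact 2.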
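To make the indirect link visible, the sketch below builds the adjacency matrix for the four example contacts using the same any-field-matches rule. It represents contacts as plain tuples and uses set intersection instead of the `Contact` class, an assumption made only to keep the snippet self-contained:

```python
contacts = [
    ("Amal", "amal@gmail.com", "+915264"),
    ("Bimal", "bimal321@yahoo.com", "+1234567"),
    ("Amal123", "+915264", "amal_new@gmail.com"),
    ("AmalAnother", "+962547", "amal_new@gmail.com"),
]

n = len(contacts)
matrix = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        # Connect i and j if they share any field value
        if set(contacts[i]) & set(contacts[j]):
            matrix[i][j] = matrix[j][i] = 1

for row in matrix:
    print(row)
# [0, 0, 1, 0]
# [0, 0, 0, 0]
# [1, 0, 0, 1]
# [0, 0, 1, 0]
```

Here `matrix[0][3]` is 0, yet a DFS starting at contact 0 reaches contact 3 through contact 2, which is why both land in the same group.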
Time Complexity
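The time complexity is O(n²), where n is the number of contacts. Graph construction compares every pair of contacts (n(n-1)/2 pairs, with up to nine field comparisons each), and DFS over an adjacency matrix scans a full row of n entries for every contact it visits. The matrix itself also takes O(n²) memory.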
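For large contact lists the pairwise comparison dominates. A different technique, not the one used in the implementation above, avoids comparing every pair: map each field value to the first contact seen holding it, and merge contacts with a disjoint-set (union-find) structure. This is a minimal sketch assuming contacts are tuples of hashable field values; the function name `group_contacts_union_find` is hypothetical. Each contact is processed once with one dictionary lookup per field, so with union by size and path halving the total work is close to linear in the number of contacts (assuming average O(1) dictionary operations):

```python
def group_contacts_union_find(contacts):
    """Group contact indices that share any field value (sketch)."""
    n = len(contacts)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Path halving: point nodes closer to the root while walking up
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    first_owner = {}  # field value -> index of the first contact holding it
    for i, fields in enumerate(contacts):
        for value in fields:
            if value in first_owner:
                union(i, first_owner[value])
            else:
                first_owner[value] = i

    # Collect indices by root; groups come out ordered by their smallest index
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


contacts = [
    ("Amal", "amal@gmail.com", "+915264"),
    ("Bimal", "bimal321@yahoo.com", "+1234567"),
    ("Amal123", "+915264", "amal_new@gmail.com"),
    ("AmalAnother", "+962547", "amal_new@gmail.com"),
]
print(group_contacts_union_find(contacts))  # [[0, 2, 3], [1]]
```

Like the nine slot comparisons, this treats all field values as one namespace, so a username that happens to equal another contact's phone string would also count as a match.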
Conclusion
This graph-based approach groups contacts that belong to the same person by finding connected components of the contact graph. Because DFS follows every edge, related contacts are grouped together even when they are connected only indirectly, as with contacts 0 and 3 in the example.
