Python - Records Union

Data manipulation and analysis are common tasks in programming. Python provides several tools for handling and transforming data, including ways to take the union of records, that is, to combine multiple datasets into a single dataset. This article explores three approaches to record union in Python, with practical examples.

What is Records Union?

Records Union refers to combining multiple datasets or records into a single comprehensive dataset. It involves merging datasets based on common attributes or creating a unified dataset containing all unique records from the original datasets.
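
As a rough illustration of the two senses described above, the sketch below first builds a union of unique records and then merges records that share a common attribute. The data, the `id` key, and the other field names are hypothetical and chosen only for illustration.

```python
# Sense 1: a unified collection of all unique records.
source_a = {("alice", 30), ("bob", 28)}
source_b = {("bob", 28), ("eve", 32)}
unique_union = source_a | source_b  # the | operator is equivalent to set.union()

# Sense 2: merging records on a common attribute (a hypothetical "id" key).
profiles = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
ages = [{"id": 2, "age": 28}, {"id": 3, "age": 32}]

merged = {}
for record in profiles + ages:
    # Records with the same id are folded into one dictionary.
    merged.setdefault(record["id"], {}).update(record)

merged_records = list(merged.values())

print(sorted(unique_union))
print(merged_records)
```

In the second case the record with `id` 2 ends up with both a name and an age, while records that appear in only one source keep whatever fields they had.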

Record union is valuable for:

  • Consolidating information from different sources

  • Integrating datasets with overlapping records

  • Data preprocessing and analysis tasks

Method 1: Using Built-in Set Data Structure

Python's built-in set stores only unique elements, which makes it a natural fit for record union, provided the records are hashable (for example numbers, strings, or tuples).

Algorithm

  • Step 1: Convert the first dataset into a set

  • Step 2: Call its union() method with the second dataset (any iterable is accepted)

  • Step 3: Convert the result back to a list

Example

# Using set union for combining datasets
data1 = [19, 99, 15]
data2 = [4, 5, 6, 7, 8, 19]

# Convert to sets and perform union
union_set = set(data1).union(data2)
result = list(union_set)

print("Dataset 1:", data1)
print("Dataset 2:", data2)
print("Union Result:", sorted(result))

Output

Dataset 1: [19, 99, 15]
Dataset 2: [4, 5, 6, 7, 8, 19]
Union Result: [4, 5, 6, 7, 8, 15, 19, 99]
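
A set does not preserve insertion order, which is why the example above sorts the result before printing. Where the original order matters, one possible alternative (not part of the method above) is `dict.fromkeys()`, which keeps the first occurrence of each element in order on Python 3.7 and later. The helper name `ordered_union` is hypothetical.

```python
from itertools import chain


def ordered_union(*datasets):
    """Return the unique elements of all datasets in first-seen order."""
    # Dictionary keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(chain.from_iterable(datasets)))


data1 = [19, 99, 15]
data2 = [4, 5, 6, 7, 8, 19]

print(ordered_union(data1, data2))
# [19, 99, 15, 4, 5, 6, 7, 8]
```

Like the set-based version, this sketch assumes the elements are hashable.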

Method 2: Using Pandas Library

Pandas represents tabular records as DataFrames and provides concat() to stack several of them into a single DataFrame.

Algorithm

  • Step 1: Import the pandas library

  • Step 2: Create DataFrames from both datasets

  • Step 3: Use concat() to concatenate the DataFrames vertically

  • Step 4: Reset the index of the resulting DataFrame

Example

import pandas as pd

# Sample datasets with records
data1 = [['John', 25], ['Alice', 30], ['Bob', 28]]
data2 = [['Charlie', 35], ['David', 27], ['Eve', 32]]

# Create DataFrames
df1 = pd.DataFrame(data1, columns=['Name', 'Age'])
df2 = pd.DataFrame(data2, columns=['Name', 'Age'])

# Concatenate DataFrames
result = pd.concat([df1, df2]).reset_index(drop=True)

print("Combined Dataset:")
print(result)

Output

Combined Dataset:
      Name  Age
0     John   25
1    Alice   30
2      Bob   28
3  Charlie   35
4    David   27
5      Eve   32
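
The two sample datasets above happen not to overlap. When they do, `concat()` on its own keeps every row, including repeats. A minimal sketch of one way to get a duplicate-free union, using made-up overlapping data, chains `drop_duplicates()` before resetting the index:

```python
import pandas as pd

# Hypothetical datasets that share one record: ('Bob', 28)
df1 = pd.DataFrame([['John', 25], ['Bob', 28]], columns=['Name', 'Age'])
df2 = pd.DataFrame([['Bob', 28], ['Eve', 32]], columns=['Name', 'Age'])

# Stack the rows, drop exact duplicate rows, then renumber the index
union_df = (
    pd.concat([df1, df2])
    .drop_duplicates()
    .reset_index(drop=True)
)

print(union_df)
```

By default `drop_duplicates()` compares all columns and keeps the first occurrence of each row, so the shared record appears only once in `union_df`.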

Method 3: Using List Concatenation with set()

For a simple union of records without duplicates, concatenate the lists and pass the result through set(). The records must be hashable, so tuples are used here:

# Using list concatenation and set() for record union
records1 = [('A', 1), ('B', 2), ('C', 3)]
records2 = [('D', 4), ('B', 2), ('E', 5)]

# Combine and remove duplicates
all_records = records1 + records2
unique_records = list(set(all_records))

print("Original records1:", records1)
print("Original records2:", records2)
print("Union result:", sorted(unique_records))

Output

Original records1: [('A', 1), ('B', 2), ('C', 3)]
Original records2: [('D', 4), ('B', 2), ('E', 5)]
Union result: [('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5)]
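
The example above works because tuples are hashable. Records stored as lists are not, and passing them to `set()` raises a `TypeError`. One possible workaround, sketched here with hypothetical data, converts each record to a tuple first:

```python
# Records stored as lists cannot be placed in a set directly
records1 = [['A', 1], ['B', 2]]
records2 = [['B', 2], ['C', 3]]

try:
    set(records1 + records2)
except TypeError as exc:
    print("Direct set() fails:", exc)

# Convert each record to a hashable tuple, then collect the unique ones
unique_records = sorted({tuple(record) for record in records1 + records2})
print(unique_records)
```

The set comprehension removes the repeated `('B', 2)` record, and `sorted()` is used only to make the printed order predictable.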

Comparison

| Method | Best For | Duplicates Handled | Performance |
|---|---|---|---|
| Set Union | Simple hashable values | Automatically removed | Fast |
| Pandas concat() | Structured (tabular) data | Manual removal needed, e.g. drop_duplicates() | Good for large datasets |
| List + Set | Records stored as tuples | Automatically removed | Moderate |

Conclusion

Python offers multiple approaches for record union operations. Use set.union() for simple hashable values, pandas concat() for structured datasets (followed by drop_duplicates() if repeated rows must be removed), and list concatenation with set() for records stored as tuples. Choose the method based on your data structure and duplicate handling requirements.

Updated on: 2026-03-27T13:45:16+05:30
