- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Joining Data in R with data.table
In this article, we will discuss joining data in R using data.table package. By the term “joining data” we mean to say that performing different types of joins operations like INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, AND FULL OUTER JOIN between two or more tables. The main purpose of doing join operations between tables is to access data from multiple tables on the basis of some attribute (or column) condition.
R provides us data.table package with the help of which we can handle tabular data (having rows and columns) very efficiently. This package was launched as an alternative to R’s default data.frame to manipulate data. The data.table is very efficient in terms of speed and code size as compared to data.frame. The syntax of data.table is also very intuitive and terse which makes it an ideal choice for developers. This package provides us functionalities using which we can efficiently perform join operations between tables.
In SQL, a join clause is used to join rows from two or more relations or tables based on the column condition between them.
Before proceeding further, we need to define two tables on which we can perform different join operations.
Creating tables with data.table
The data.table package provides us data.table() function using which we can easily construct these tables in R.
Syntax
This function has the following syntax −
data.table(column_name1 = values, column_name2 = values, column_name3 = values, …)
It Returns a table having data stored as rows and columns.
Example
Let’s create the Employees table −
# Importing data.table package library("data.table") # Creating table1 employeesTable <- data.table(Employee_ID = 1:5, Employee_Name = c("Bhuwanesh", "Jai", "Piyush", "Hitesh", "Anil"), Department_ID = c(1, 11, 4, 11, 8), Salary = c(30000, 27000, 19000, 21000, 25000)) # Print table1 print(employeesTable)
Output
Employee_ID Employee_Name Department_ID Salary 1: 1 Bhuwanesh 1 30000 2: 2 Jai 11 27000 3: 3 Piyush 4 19000 4: 4 Hitesh 11 21000 5: 5 Anil 8 25000
Creating Departments Table −
# Importing data.table package library("data.table") # Creating table2 departmentsTable <- data.table(Department_ID = c(4, 11, 5, 17, 3), Department_Name = c("IT", "Finance", "Sales", "HR", "Marketing")) # Print table2 print(departmentsTable)
Output
Department_ID Department_Name 1: 4 IT 2: 11 Finance 3: 5 Sales 4: 17 HR 5: 3 Marketing
Joining Data with data.table
In R, we have the following types of joins −
Inner join
Left outer join
Right outer join
Full outer join
Note that to perform a join between two tables using data.table you need to set a key on which we can set the ON clause.
The data.table package provides us setKey() function which has the following syntax −
setKey(table1, ON = column_name) setKey(table2, ON = column_name)
Inner Join with data.table
This type of Join returns records that have a matching value in the respective tables.
For example, consider a pair of tables table1 and table2 having some data. Then, we can represent the inner join of both the tables in terms of the Venn diagram as −
Syntax
The inner join is the basic join under data.table and follows the below-given syntax −
innerJoin <- table1[table2, nomatch=0]
Example
Now let’s perform the inner join operation between tables Employees and Departments using data.table.
# Import data.table library library("data.table") # Create table1 employeesTable <- data.table(Employee_ID = 1:5, Employee_Name = c("Bhuwanesh", "Jai", "Piyush", "Hitesh", "Anil"), Department_ID = c(1, 11, 4, 11, 8), Salary = c(30000, 27000, 19000, 21000, 25000)) # Create table2 departmentsTable <- data.table(Department_ID = c(4, 11, 5, 17, 3), Department_Name = c("IT", "Finance", "Sales", "HR", "Marketing")) # Set the ON clause on Department_ID as the key of the tables setkey(employeesTable, ON = Department_ID) setkey(departmentsTable, ON = Department_ID) # Perform the inner join by eliminating not matched rows innerJoin <- employeesTable[departmentsTable, nomatch=0] # Display the result of inner join print(innerJoin)
Output
Employee_ID Employee_Name Department_ID Salary Department_Name 1: 3 Piyush 4 19000 IT 2: 2 Jai 11 27000 Finance 3: 4 Hitesh 11 21000 Finance
Left Outer Join with data.table
This type of Join returns the matching records as well as the remaining records from the left table.
For example, consider a pair of tables table1 and table2 having some data. Then, we can represent left outer join of both the tables in terms of venn diagram as −
The left outer join is the basic join under data.table and follows the below-given syntax −
Syntax
table2[table1]
Example
Now let’s perform the left outer join operation between tables Employees and Departments using data.table −
# Import data.table library library("data.table") # Create table1 employeesTable <- data.table(Employee_ID = 1:5, Employee_Name = c("Bhuwanesh", "Jai", "Piyush", "Hitesh", "Anil"), Department_ID = c(1, 11, 4, 11, 8), Salary = c(30000, 27000, 19000, 21000, 25000)) # Create table2 departmentsTable <- data.table(Department_ID = c(4, 11, 5, 17, 3), Department_Name = c("IT", "Finance", "Sales", "HR", "Marketing")) # Set the ON clause on Department_ID as the key of the tables setkey(employeesTable,Department_ID) setkey(departmentsTable,Department_ID) # Perform the left outer join by including matched rows # and remaining rows from the left leftOuterJoin <- departmentsTable[employeesTable] # Display the result of left outer join print(leftOuterJoin)
Output
Department_ID Department_Name Employee_ID Employee_Name Salary 1: 1 <NA> 1 Bhuwanesh 30000 2: 4 IT 3 Piyush 19000 3: 8 <NA> 5 Anil 25000 4: 11 Finance 2 Jai 27000 5: 11 Finance 4 Hitesh 21000
Right Outer Join with data.table
The right outer join returns the matching records as well as the remaining records from the right table. For example, consider a pair of tables table 1 and table 2 having some data. Then, we can represent right outer join of both the tables in terms of Venn diagram as −
Syntax
The right outer join is the basic join under data.table and follows the below-given syntax −
table1[table2]
Example
Now let’s perform the right outer join operation between tables Employees and Departments using data.table −
# Import data.table library library("data.table") # Create table1 employeesTable <- data.table(Employee_ID = 1:5, Employee_Name = c("Bhuwanesh", "Jai", "Piyush", "Hitesh", "Anil"), Department_ID = c(1, 11, 4, 11, 8), Salary = c(30000, 27000, 19000, 21000, 25000)) # Create table2 departmentsTable <- data.table(Department_ID = c(4, 11, 5, 17, 3), Department_Name = c("IT", "Finance", "Sales", "HR", "Marketing")) # Set the ON clause on Department_ID as the key of the tables setkey(employeesTable,Department_ID) setkey(departmentsTable,Department_ID) # Perform the right join by including matched rows # and remaining rows from the right rightOuterJoin <- employeesTable[departmentsTable] # Print the result of right join result print(rightOuterJoin)
Output
Employee_ID Employee_Name Department_ID Salary Department_Name 1: NA <NA> 3 NA Marketing 2: 3 Piyush 4 19000 IT 3: NA <NA> 5 NA Sales 4: 2 Jai 11 27000 Finance 5: 4 Hitesh 11 21000 Finance 6: NA <NA> 17 NA HR
Full Outer Join
A full outer join returns the matching records as well as the remaining records from the left table and the right table.
For example, consider a pair of tables table1 and table2 having some data. Then, we can represent the full outer join of both the tables in terms of the Venn diagram as −
Syntax
The full outer join is the basic join under data.table and follows the below-given syntax −
merge(table1, table2, all=TRUE)
Example
Now let’s perform the full outer join operation between tables Employees and Departments using data.table −
# Perform the full outer join fullOuterJoin <- merge(employeesTable, departmentsTable, all=TRUE) # Print the full outer join result print(fullOuterJoin)
Output
Department_ID Employee_ID Employee_Name Salary Department_Name 1: 1 1 Bhuwanesh 30000 <NA> 2: 3 NA <NA> NA Marketing 3: 4 3 Piyush 19000 IT 4: 5 NA <NA> NA Sales 5: 8 5 Anil 25000 <NA> 6: 11 2 Jai 27000 Finance 7: 11 4 Hitesh 21000 Finance 8: 17 NA <NA> NA HR
Conclusion
In this tutorial, we have discussed about how we can Join data in R with data.table. We discussed in detail different types of joins: Inner join, Left outer join, Right outer join, and Full outer join. I believe that this tutorial will strengthen your knowledge in the field of data science.
- Related Articles
- Data Manipulation in R with data.table
- How to create pivot table with sum for data stored in data.table object in R?
- How to convert a data frame to data.table in R?
- How to multiply corresponding row values in a data.table object with single row data.table object in R?
- How to deal with missing column for row names when converting data frame to data.table object in R?
- How to fill a data.table row with missing values in R?
- How to remove rows in a data.table object with NA's in R?
- Dealing with Missing Data in R
- How to replace NAs with 0 in an R list that contains data.table objects?
- How to multiply vector values in sequence with columns of a data.table object in R?
- How to create frequency table of data.table in R?
- How to repeat a whole data.table object in R?
- How to unsplit a split data.table object in R?
- How to check for duplicates in data.table object in R?
- How to standardize selected columns in data.table object in R?
