Parallel Programming in R
Parallel programming is a software development practice that involves dividing a computation or task into smaller parts that can be executed concurrently or in parallel. Parallel programming can help improve the performance and efficiency of your R code by utilizing multiple processors or cores in a computer or cluster. The main idea is that if an operation takes S seconds on a single processor, it could ideally finish in S / N seconds on N processors. In practice, the speedup is smaller, because distributing the work and collecting the results adds overhead, and some parts of a program cannot run in parallel.
Need for Parallel Programming in R
Most of the time, R code runs fast enough on a single core. But sometimes operations can −
Consume too much CPU time.
Occupy too much memory.
Take too much time to read from or write to a disk.
Take a lot of time to transfer data.
Hidden Parallelism
R has strong library support. Sometimes we use parallel programming without even knowing it, because many libraries offer built-in parallelism that runs in the background. This hidden parallelism makes our code more efficient, but it is still useful to know what is actually happening behind the scenes.
Let us consider an example of hidden parallelism −
Parallel BLAS
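The Basic Linear Algebra Subprograms (BLAS) library provides low-level routines for vector and matrix operations such as matrix multiplication. Optimized BLAS implementations (for example OpenBLAS or Intel MKL) are tuned for particular types of CPU to take advantage of the chipset architecture, and many of them use multiple cores automatically. Because R calls BLAS for operations such as matrix multiplication, linking R against an optimized BLAS usually improves performance without any change to your R code.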
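A minimal sketch of hidden parallelism, using only base R: matrix multiplication with %*% is normally handed to the BLAS, so if your R installation is linked against a multithreaded BLAS, the multiplication below may use several cores even though the code itself contains nothing parallel. The matrix size is an arbitrary choice, and the timing depends entirely on your machine and on which BLAS is installed.

```r
# Build two random 1000 x 1000 matrices (the size is an arbitrary choice)
set.seed(42)
n <- 1000
a <- matrix(runif(n * n), nrow = n)
b <- matrix(runif(n * n), nrow = n)

# %*% is normally delegated to the BLAS; a multithreaded BLAS spreads
# this single call across cores with no change to the R code
timing <- system.time(product <- a %*% b)
print(timing)

# Recent R versions list the BLAS library in use in this output
sessionInfo()
```

Comparing the timing on the same machine with a reference BLAS and with an optimized one is a simple way to see whether hidden parallelism is at work.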
Embarrassing Parallelism
Embarrassing parallelism is a common pattern in statistics and data science, and many problems in these fields fit it. In this type of parallelism, the problem is divided into multiple independent sections, and all of them are executed simultaneously because they do not depend on each other.
Syntax
In R, the pattern behind embarrassing parallelism is expressed with the lapply() function, which applies the same function to each element of a list independently. This function has the following syntax −
lapply(X, FUN, ...)
Example
It accepts a list and a function, applies the function to each element, and returns a list whose length is equal to the length of the input list. Let us consider a program illustrating how this function works −
# Creating a list
myList <- list(data1 = 1:5, data2 = 10:15)

# Use lapply() function and
# calculate the mean
lapply(myList, mean)
Output
$data1
[1] 3

$data2
[1] 12.5
As you can see in the output, the mean of each list element is displayed.
The lapply() function works like a for loop: it iterates over the elements of the list one by one and applies the function to each of them.
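To make the comparison concrete, here is a sketch of the same computation written as an explicit for loop. It reuses myList from the example above and builds the same named list that lapply() returns, visiting the elements in the same order.

```r
myList <- list(data1 = 1:5, data2 = 10:15)

# Pre-allocate an output list with the same length and names as the input
loopResult <- vector("list", length(myList))
names(loopResult) <- names(myList)

# Visit the elements one at a time and apply mean() to each
for (name in names(myList)) {
  loopResult[[name]] <- mean(myList[[name]])
}

# Same names and values as lapply(myList, mean); prints TRUE
print(identical(loopResult, lapply(myList, mean)))
```

Nothing in the loop body for one element depends on another element, which is exactly what makes this computation embarrassingly parallel.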
Now let us look more closely at what is actually happening −
Because we process the elements one by one, the other elements just sit idle in memory while the function is applied to a single element of the list. We can parallelize this in R. The main idea is to divide the list into subsets, send them to multiple processors, and then apply the function to all the subsets simultaneously.
So, we can achieve parallelism using the following steps −
Split the list into chunks, one for each processor.
Copy the supplied function to each processor.
Apply the function on all the cores simultaneously.
Combine the results from all the cores into a single list.
Display the result.
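The five steps above can be sketched with functions from the parallel package, which is introduced in the next section. This is an illustrative sketch, not the only way to do it: the four-element list and the choice of two workers are assumptions, splitIndices() performs the split, and makeCluster() with parLapply() runs the chunks on separate R processes (a socket cluster, which also works on Windows).

```r
library(parallel)

# Example data (illustrative): four independent list elements
myList <- list(data1 = 1:5, data2 = 10:15, data3 = 20:30, data4 = 1:100)

# Step 1: split the list into one chunk per worker (two workers assumed)
nWorkers <- 2
chunks <- lapply(splitIndices(length(myList), nWorkers),
                 function(idx) myList[idx])

# Start two worker R processes
cl <- makeCluster(nWorkers)

# Steps 2 and 3: parLapply() sends the function to the workers, and each
# worker applies mean() to every element of its own chunk in parallel
partial <- parLapply(cl, chunks, function(chunk) lapply(chunk, mean))
stopCluster(cl)

# Step 4: combine the per-worker lists into a single named list
result <- do.call(c, partial)

# Step 5: display the result
print(result)
```

In everyday code you rarely split the list yourself, because parLapply() and mclapply() already chunk the input internally; the explicit split here only mirrors the steps listed above.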
Parallel Programming package in R
The parallel package is part of base R (since version 2.14.0), so it is available as soon as R is installed. It builds on the functionality of two earlier packages, snow and multicore.
The parallel package is used to distribute tasks across the cores of a machine. One way to do this is the mclapply() function. mclapply() is analogous to lapply(), but it can distribute the work across multiple processor cores. It also collects the results of the function calls, combines them, and returns them as a list with the same length as the original list. The detectCores() function, also part of the parallel package, returns the number of cores available on the system.
Let us consider the following program, which illustrates how the mclapply() function works −
Note − mclapply() relies on forking the R process, which Windows does not support, so values of "mc.cores" greater than one work only on non-Windows operating systems. The code below was therefore executed on an operating system other than Windows.
Example
# Import the parallel library (provides detectCores() and mclapply())
library(parallel)

# Creating a list of two large integer sequences
myList <- list(data1 = 1:10000000, data2 = 1:100000000)

cat("The estimated time using lapply() function:\n")

# Calculate the time taken using lapply()
system.time(
  results <- lapply(myList, mean)
)

# Get the number of cores
numberOfCores <- detectCores()

cat("The estimated time using mclapply() function:\n")

# Calculate the time taken using mclapply()
system.time(
  results <- mclapply(myList, mean, mc.cores = numberOfCores)
)
Output
The estimated time using lapply() function:
   user  system elapsed
   0.40    0.00    0.43
The estimated time using mclapply() function:
   user  system elapsed
   0.12    0.00    0.17
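You can see in the output the difference in elapsed time between the lapply() and mclapply() functions. The exact timings depend on the machine, and because myList has only two elements, at most two tasks can run at the same time.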
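Because mc.cores greater than one is not available on Windows, a common portable alternative is a socket cluster created with makeCluster() and used through parLapply(), both from the parallel package. This sketch reuses the small list from the lapply() example rather than the large one above; leaving one core free is an illustrative choice, not a requirement.

```r
library(parallel)

myList <- list(data1 = 1:5, data2 = 10:15)

# Use all cores but one, and at least one worker if detection returns NA
numberOfCores <- max(1L, detectCores() - 1L, na.rm = TRUE)

# Start separate R worker processes connected through sockets
cl <- makeCluster(numberOfCores)

# parLapply() follows the lapply() contract: a list in, a list out
results <- parLapply(cl, myList, mean)

# Always shut the workers down when finished
stopCluster(cl)

print(results)
```

Socket workers start with empty workspaces, so any data or packages the function needs beyond its arguments must be sent to them first (for example with clusterExport() or clusterEvalQ()).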
Parallel Programming using foreach and doParallel packages
Now we will see how to implement parallel programming using the foreach library in R. But before that, let us see how a basic for loop works in R −
Example
# Iterate using the for loop from 1 to 5
# and print the square of each number
for (data in 1:5) {
  print(data * data)
}
Output
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
As you can see in the output, the square of each number from 1 to 5 is displayed on the console.
Foreach Package
Now let us talk about the foreach package and its method. The foreach package provides the foreach() method, a looping construct that can run its iterations either sequentially or in parallel, depending on the operator used and the registered backend.
Syntax
If you haven’t installed the foreach package yet, run the following command in the R console to install it from CRAN −
install.packages("foreach")
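The foreach() method looks similar to a basic for loop, but its loop body is attached with an operator: %do% evaluates the iterations sequentially, while %dopar% evaluates them in parallel on a registered backend. The two constructs also differ in what they return: a for loop returns NULL, whereas foreach() returns a list with one result per iteration.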
Example
Consider the following program that illustrates the working of the foreach method −
# Import the foreach library
library(foreach)

# Iterate using the foreach loop from 1 to 5
# and return the square of each number
foreach (data = 1:5) %do% {
  data * data
}
Output
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25
As you can see in the output, foreach() returns a list that holds the square of each number from 1 to 5, and R prints this list to the console.
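The difference in return values between a for loop and foreach() can be checked directly. This sketch stores the value of a for loop, which is NULL, and then uses the .combine argument of foreach(), which merges the per-iteration results with the given function (here c()) instead of returning a list.

```r
library(foreach)

# A for loop is run for its side effects; its value is NULL
loopValue <- for (data in 1:5) data * data
print(is.null(loopValue))

# .combine = c collapses the five results into one vector
squares <- foreach(data = 1:5, .combine = c) %do% {
  data * data
}
print(squares)
```

Other combining functions work the same way, for example .combine = rbind to stack per-iteration rows into a matrix.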
doParallel Package
The %dopar% operator comes with the foreach package, but it needs a parallel backend to actually run the iterations in parallel; without one, foreach warns and runs them sequentially. The doParallel package provides such a backend: once it is registered, foreach can run each iteration on a different processing core. You can install the doParallel package from CRAN with the following command −
install.packages("doParallel")
Example
Now let us consider the following program, which demonstrates the foreach method together with the %dopar% operator −
# Import the foreach and doParallel libraries
# (doParallel also attaches parallel, which provides detectCores())
library(foreach)
library(doParallel)

# Get the total number of cores
numOfCores <- detectCores()

# Register all the cores as the parallel backend
registerDoParallel(numOfCores)

# Iterate using the foreach loop from 1 to 5
# and compute the square of each number in parallel
foreach (data = 1:5) %dopar% {
  data * data
}

# Release the workers when finished
stopImplicitCluster()
Output
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25
The result is the same list of squares of the numbers from 1 to 5, but this time the iterations were distributed across the registered cores.
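registerDoParallel() can also take a cluster object instead of a core count, which makes the lifetime of the workers explicit. The sketch below assumes two workers purely for illustration: it creates a socket cluster with makeCluster() (which also works on Windows), combines the results into a vector with .combine = c, and then stops the cluster and restores sequential execution with registerDoSEQ() from foreach.

```r
library(foreach)
library(doParallel)

# Create an explicit socket cluster with two workers (arbitrary choice)
cl <- makeCluster(2)
registerDoParallel(cl)

# Iterations may run on different workers; results keep their order
squares <- foreach(data = 1:5, .combine = c) %dopar% {
  data * data
}

# Stop the workers and fall back to sequential execution
stopCluster(cl)
registerDoSEQ()

print(squares)
```

Because foreach returns results in iteration order by default, the vector is the same as the sequential one even though the iterations may finish in any order.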
Conclusion
In this tutorial, we discussed parallel programming in R. We looked at the mclapply() function from the parallel package and at the foreach and doParallel packages, which make parallel programming achievable in R. Parallel programming is an important concept in any programming language, and I hope this tutorial has helped you apply it in your data science work.
