Parallel Processing in Python


Introduction

The effective completion of computationally difficult jobs is essential for developers and data scientists in today's fast−paced digital environment. Fortunately, Python offers strong capabilities for parallel processing because of its adaptability and wide ecosystem. We can get large performance improvements by splitting up difficult issues into smaller, more manageable activities that can be carried out concurrently.

Python's parallel processing features allow us to work more quickly and effectively on activities like web scraping, scientific simulations, and data analysis by utilizing the available computer resources. We'll set off on a voyage via Python parallel processing in this post. We will examine many approaches, including multi−processing, asynchronous programming, and multi−threading, and learn how to use them efficiently to get around performance barriers in our systems. Join us as we realize the full power of Python's parallel processing and reach new heights of performance and productivity.

Understanding Parallel Processing

Splitting a job into smaller subtasks and running them concurrently on several processors or cores is known as parallel processing. Parallel processing may considerably reduce the total execution time of a program by effectively leveraging the available computing resources. Asynchronous programming, multi−processing, and multi−threading are only a few of the parallel processing methods that Python provides.

Multi−Threading in Python

Using the multi−threading approach, numerous threads inside a single process run simultaneously while sharing the same memory. Multi−threading may be easily implemented using Python's threading module. Multi−threading in Python, however, might not necessarily result in a speedup for CPU−bound operations because of the Global Interpreter Lock (GIL), which only permits one thread to execute Python bytecode at a time. Multi−threading, however, can be useful for I/O−bound jobs since it allows threads to run other operations while they wait for I/O operations to finish.

Let's look at an illustration where we use multi−threading to download many web pages:

Example

import threading import requests 
 
def download_page(url): 
    response = requests.get(url)    
print(f"Downloaded {url}") 
 
urls = [ 
    "https://example.com", 
    "https://google.com", 
    "https://openai.com" 
] 
 
threads = [] 
 for url in 
 urls: 
    thread = threading.Thread(target=download_page,
args=(url,))     thread.start()    threads.append(thread) 
 
for thread in threads: 
    thread.join() 

Output

Downloaded https://example.com 
Downloaded https://google.com 
Downloaded https://openai.com 

Multiple downloads may take place simultaneously thanks to the code snippet above, which downloads each URL in its own thread. The join() function makes sure that the main thread waits until each thread is finished before moving on.

Multi−Processing in Python

Multi−processing, as opposed to multi−threading, offers genuine parallelism by using several processes, each with their own memory space. A high−level interface for implementing multi−processing is provided by the multiprocessing module of Python. Multiprocessing is appropriate for CPU−bound activities since each process runs in a distinct Python interpreter, avoiding the GIL multi−threading restriction.

Multiprocessing is used in the code below.The burden is divided among the available processes by the map() method once the pool class generates a pool of worker processes. The results list is a collection of the results.

Take into account the example below, where we use multi−processing to square each integer in a list:

Example

import multiprocessing 
 
def square(number):    
return number ** 2 
 
numbers = [1, 2, 3, 4, 5] 
 
with multiprocessing.Pool() as pool: 
    results = pool.map(square, numbers) 
 
print(results) 

Output

[1, 4, 9, 16, 25] 

Asynchronous Programming in Python

By leveraging non−blocking operations, asynchronous programming enables the efficient execution of I/O−bound processes. Coroutines, event loops, and futures may all be used to create asynchronous code in Python thanks to the asyncio package. Asynchronous programming has become increasingly important with the rise in popularity of online applications and APIs.

The fetch_page() coroutine in the code sample below utilizes aiohttp to asynchronously fetch web pages. The main() method generates a list of jobs, which are then executed simultaneously by using asyncio.gather(). To wait for tasks to finish and receive the results, use the await keyword.

Let's look at an example of asynchronously obtaining many web pages using asyncio and aiohttp:

Example

import asyncio 
import aiohttp 
 
async def fetch_page(url):     async with aiohttp.ClientSession() as session:         async with session.get(url) as response: 
            return await response.text() 
 
async def main(): 
    urls = [ 
        "https://example.com", 
        "https://google.com", 
        "https://openai.com" 
    ] 
 
    tasks = [fetch_page(url) for url in urls]     pages = await asyncio.gather(*tasks)     
print(pages) 
 
asyncio.run(main()) 

Output

['<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta 
charset="utf-8" />\n    <meta http-equiv="Content-type"content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initialscale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n  margin: 0;\n        padding: 0;\n        font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n  padding: 50px;\n        background-color: #fff;\n        border-radius: 1em;\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (maxwidth: 700px) {\n        body {\n            background-color: #fff;\n        }\n        div {\n  width: auto;\n            margin: 0 auto;\n            border-radius: 0;\n            padding: 1em;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href="https://www.iana.org/domains/example">More information...</a></p>\n</div>\n</body>\n</html>', '<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/logos/doodles/2021/mom-
and-dad-6116550989716480.2-law.gif" itemprop="image"><link href="/logos/doodles/2021/mom-and-dad-6116550989716480.2-law.gif" rel="icon" type="image/gif"><title>Google</title><script nonce="sJwM0Ptp5a/whzxPtTD8Yw==">(function(){window.google={kEI:'cmKgYY37A7 K09QPhzKuACw',kEXPI:'1354557,1354612,1354620,1354954,1355090,1355493,13556
83,3700267,4029815,4031109,4032677,4036527,4038022,4043492,4045841,4048347,4
048490,4052469,4055589,4056520,4057177,4057696,4060329,4060798,4061854,4062 531,4064696,406 '

Choosing the Right Approach

Python's parallel processing techniques vary depending on the particulars of the task at hand. Here are some guidelines to help you make an educated decision:

For I/O−bound activities, where the majority of the execution time is spent waiting for input/output operations, multi−threading is appropriate. It works well for tasks like downloading files, using APIs, and operating on files. Due to Python's Global Interpreter Lock (GIL), multi−threading may not considerably speed up CPU−bound activities.

Multi−processing, on the other hand, is ideal for CPU−bound tasks involving intensive computations. It enables true parallelism by utilizing multiple processes, each with its own memory space, and bypasses the limitations of the GIL. However, it incurs additional overhead in terms of memory consumption and inter−process communication.

For I/O−bound activities involving network operations, asynchronous programming, performed using libraries like asyncio, is useful. It makes use of non−blocking I/O operations so that jobs may go forward without having to wait for each operation to finish. This method effectively manages several concurrent connections, making it appropriate for network server development, web API interactions, and web scraping. Asynchronous programming minimizes I/O operation wait times, ensuring responsiveness and scalability.

Conclusion

Python's capability for parallel processing offers chances to increase the effectiveness of tasks that need complex computing. Whether you choose to employ multi−threading, multi−processing, or asynchronous programming, Python provides the necessary tools and modules to effectively leverage concurrency. You may maximize the benefits of parallel processing and shorten execution times by comprehending the nature of your activity and choosing the proper technique. So go ahead, explore, and utilize Python's parallelism to its fullest potential to create applications that are quicker and more productive.

Updated on: 25-Jul-2023

3K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements