Python multiprocessing is a module that allows developers to run multiple processes simultaneously. It makes it easier to execute tasks in parallel. It leverages the full power of multi-core processors. By default, Python’s Global Interpreter Lock (GIL) restricts execution to a single thread at a time, even on multi-core systems. However, multiprocessing overcomes this limitation by creating separate processes, each with its own Python interpreter and memory space.
Key Concepts
- Processes vs. Threads: While threading is suitable for I/O-bound tasks, multiprocessing is better for CPU-bound tasks. Each process runs independently and can execute code in parallel. This makes it ideal for tasks like data analysis. It is also ideal for image processing or mathematical computations.
- Process Creation: Using the
multiprocessing.Process
class, you can create a new process by defining a target function to run in parallel. This process will execute independently of the main program.
import multiprocessing
def worker():
print("Worker function is running")
if __name__ == "__main__":
process = multiprocessing.Process(target=worker)
process.start()
process.join()
- Communication Between Processes: Since processes have separate memory spaces, sharing data between them requires mechanisms like Queues or Pipes. These tools allow processes to exchange data safely.
- Process Pooling: The
Pool
class simplifies task distribution by managing a pool of worker processes. This is particularly useful when you need to run a large number of tasks concurrently.
from multiprocessing import Pool
def square(x):
return x * x
if __name__ == "__main__":
with Pool(4) as p:
print(p.map(square, [1, 2, 3, 4]))
- Benefits of Multiprocessing:
- Improved performance: It allows parallel execution on multiple CPU cores, making it faster for CPU-intensive tasks.
- Avoids GIL limitations: Unlike threading, multiprocessing does not suffer from the GIL, which ensures better utilization of system resources.
- Fault isolation: Since processes are independent, an error in one process doesn’t affect others.
Use Cases
Multiprocessing is commonly used in scenarios like data processing, scientific computations, and web scraping. It is also used for other tasks that require high CPU usage. It is also helpful when running separate tasks that do not need to share data in real time. It is essential when avoiding Python’s GIL bottleneck.