Memory-mapped file support in Python (mmap)
Python's mmap module provides memory-mapped file support, letting you map a file into your process's address space for efficient reading, writing, and searching. Instead of issuing explicit read() and write() calls, you access the file's bytes as if they were an in-memory buffer, and the operating system loads the needed pages on demand.
Basic Memory Mapped File Reading
A memory-mapped file behaves like both a bytearray and a file object. The operating system pages data in as you touch it, so you can slice specific byte ranges without reading the whole file first:
import mmap

# Create a sample file first
with open('sample.txt', 'w') as f:
    f.write('The emissions from gaseous compounds are harmful to environment')

def read_mmap(fname):
    # Open in binary mode: mmap works on raw bytes, not decoded text
    with open(fname, mode="rb") as fobj:
        # length=0 maps the whole file
        with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            # Slicing returns a bytes object
            print(mmap_obj[4:26])

read_mmap('sample.txt')
b'emissions from gaseous'
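Besides slicing, a mapping also supports file-style methods. The following is a minimal sketch of that dual behavior; the scratch file name and its contents are made up for illustration and are not part of the example above.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "wb") as f:
    f.write(b"first line\nsecond line\n")

with open(path, "rb") as fobj:
    with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        first = mm.readline()   # reads up to and including the newline
        offset = mm.tell()      # current position, just past the first line
        mm.seek(0)              # rewind to the start of the mapping
        head = mm.read(5)       # read five bytes from the current position
        sliced = mm[6:10]       # slicing ignores the current position
        size = len(mm)          # number of mapped bytes

print(first, offset, head, sliced, size)
```

Slicing is usually the more convenient choice for random access, while read(), readline(), and seek() help when existing code expects a file object.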
Finding Text Using mmap
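For large files, searching a memory map can be faster than regular file I/O, because find() scans the mapped bytes directly instead of first reading and decoding the whole file into a Python string: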
import mmap
import time

# Create a larger sample file
with open('large_sample.txt', 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog. ' * 1000)
    f.write('Death is the end of life.')

def regular_io_find(fname):
    # Read and decode the whole file, then search the resulting string
    with open(fname, mode="r", encoding="utf-8") as fobj:
        text = fobj.read()
        return text.find("Death")

def mmap_io_find(fname):
    # Search the mapped bytes directly; the pattern must be a bytes object
    with open(fname, mode="rb") as fobj:
        with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_READ) as mmap_obj:
            return mmap_obj.find(b"Death")

# Test regular I/O (perf_counter is better suited to short timings than time.time)
start_time = time.perf_counter()
position = regular_io_find('large_sample.txt')
regular_time = time.perf_counter() - start_time
print(f"Regular I/O found 'Death' at position: {position}")
print(f"Regular I/O time: {regular_time:.6f} seconds")

# Test mmap I/O
start_time = time.perf_counter()
position = mmap_io_find('large_sample.txt')
mmap_time = time.perf_counter() - start_time
print(f"mmap found 'Death' at position: {position}")
print(f"mmap time: {mmap_time:.6f} seconds")
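Regular I/O found 'Death' at position: 45000
Regular I/O time: 0.002156 seconds
mmap found 'Death' at position: 45000
mmap time: 0.000312 seconds
(The repeated sentence is 45 characters long, so the match starts at offset 45000. Exact timings vary by machine and run.)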
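Searching is not limited to find(). As a rough sketch, the standard re module accepts a mapping as its subject when the pattern is a bytes pattern, and rfind() searches from the end. The log-style file contents here are invented for illustration.

```python
import mmap
import os
import re
import tempfile

# Hypothetical scratch file with made-up log lines.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "wb") as f:
    f.write(b"INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

with open(path, "rb") as fobj:
    with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_READ) as mm:
        first_error = mm.find(b"ERROR")    # offset of the first match
        last_error = mm.rfind(b"ERROR")    # offset of the last match
        # Bytes patterns can scan the mapping without copying it into a string.
        messages = re.findall(rb"ERROR (.*)", mm)

print(first_error, last_error, messages)
```

re.findall() is used here because it returns plain bytes objects, so nothing keeps referring to the mapping after it is closed.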
Writing to Memory Mapped Files
You can modify files in place using mmap with ACCESS_WRITE. Changes are written through to the underlying file, and a slice assignment must supply exactly as many bytes as it replaces:
import mmap

# Create a file to modify
with open('modify_sample.txt', 'w') as f:
    f.write('The original text that will be modified')

def mmap_io_write(fname):
    # The file must be opened for both reading and writing
    with open(fname, mode="r+b") as fobj:
        with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_WRITE) as mmap_obj:
            # Replace bytes 4-11 ("original") with the 8-byte string "CHANGED!"
            mmap_obj[4:12] = b"CHANGED!"
            # Push the modified pages out to the file
            mmap_obj.flush()

print("Before modification:")
with open('modify_sample.txt', 'r') as f:
    print(f.read())

mmap_io_write('modify_sample.txt')

print("\nAfter modification:")
with open('modify_sample.txt', 'r') as f:
    print(f.read())
Before modification:
The original text that will be modified

After modification:
The CHANGED! text that will be modified
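The example above works because "CHANGED!" is exactly as long as the slice it replaces. The sketch below, using a made-up scratch file, shows what happens with a shorter replacement and one simple workaround: padding to the slot width. Whether space padding is acceptable depends on the file format, so treat it as an assumption.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "fixed.txt")
with open(path, "wb") as f:
    f.write(b"The original text")

with open(path, "r+b") as fobj:
    with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_WRITE) as mm:
        try:
            mm[4:12] = b"new"       # 3 bytes into an 8-byte slot
            mismatch_rejected = False
        except (IndexError, ValueError):
            # CPython reports a size mismatch instead of resizing the file.
            mismatch_rejected = True
        mm[4:12] = b"new".ljust(8)  # pad with spaces to exactly 8 bytes
        mm.flush()

with open(path, "rb") as f:
    result = f.read()

print(mismatch_rejected, result)
```

A mapping has a fixed size, so edits that grow or shrink the content need a different approach, such as rewriting the file.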
Access Modes
The mmap module provides three main access modes for different use cases:
| Access Mode | Description | File Mode Required |
|---|---|---|
| ACCESS_READ | Read-only access | "r" |
| ACCESS_WRITE | Write changes to file | "r+" |
| ACCESS_COPY | Copy-on-write (changes in memory only) | "r" |
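To make the difference between the modes concrete, here is a small sketch with a made-up scratch file: a write through ACCESS_READ is rejected, while a write through ACCESS_COPY changes only the in-memory copy and leaves the file untouched.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "rb") as fobj:
    # ACCESS_READ: any attempt to assign into the mapping fails.
    with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_READ) as ro:
        try:
            ro[0:5] = b"HELLO"
            read_only_enforced = False
        except TypeError:
            read_only_enforced = True

    # ACCESS_COPY: the assignment succeeds, but only in private memory.
    with mmap.mmap(fobj.fileno(), length=0, access=mmap.ACCESS_COPY) as cow:
        cow[0:5] = b"HELLO"
        in_memory = cow[:]

with open(path, "rb") as f:
    on_disk = f.read()

print(read_only_enforced, in_memory, on_disk)
```

ACCESS_COPY can be handy for trying out modifications on a file you must not alter.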
Key Benefits
Performance: Memory mapping avoids the extra copy from the kernel's page cache into a user-space buffer that read() requires, which can make it faster for large files.
Memory Efficiency: The OS can page file contents in and out of memory as needed, rather than loading the entire file.
Shared Access: Multiple processes can map the same file, enabling efficient inter-process communication.
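The shared-access point can be sketched without spawning a second process. In this simplified illustration, two independent mappings of the same made-up scratch file stand in for two processes; it assumes a local file on a mainstream desktop or server OS, where shared mappings of one file see each other's changes.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file of 16 zero bytes acting as a shared buffer.
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)

# Two separate file handles and mappings stand in for two processes.
with open(path, "r+b") as writer_file, open(path, "rb") as reader_file:
    with mmap.mmap(writer_file.fileno(), length=0, access=mmap.ACCESS_WRITE) as writer:
        with mmap.mmap(reader_file.fileno(), length=0, access=mmap.ACCESS_READ) as reader:
            writer[0:5] = b"hello"   # write through the first mapping
            seen = reader[0:5]       # read it back through the second

print(seen)
```

Real inter-process use also needs some form of synchronization, such as a lock or a ready flag, which this sketch leaves out.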
Conclusion
Memory-mapped files provide efficient access to file data by mapping it directly into memory. Use mmap for large files where you need random access, searching, or in-place modifications for better performance than traditional file I/O.
