How to extract file extension using Python?
Sometimes we need to extract a file's extension to perform operations based on its type, such as validating image formats or filtering document files. Python's standard library offers several ways to do this, chiefly through the os.path and pathlib modules. In this article, we'll explore how to get a file's extension with different approaches.
Using os.path.splitext()
The os.path.splitext() function of the os.path module splits a path into a root and an extension. It returns a tuple containing the path without the extension and the extension itself (including the dot). If the file has no extension, the second element is an empty string.
Example
In this example, we use the os.path.splitext() function to get the extension of the given file:
import os
filename = "report.pdf"
name, extension = os.path.splitext(filename)
print("Filename:", name)
print("Extension:", extension)
The output of the above code is:
Filename: report
Extension: .pdf
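To see how os.path.splitext() behaves on less tidy input, the following sketch runs it on a few sample names. The sample names are made up for illustration; the results follow the documented behaviour that only the last dot counts and that a leading dot is treated as part of the name.

```python
import os

# Sample inputs chosen for illustration only.
samples = [
    "/home/user/report.pdf",  # full path: directories stay in the root part
    "archive.tar.gz",         # several dots: only the last one is split off
    ".bashrc",                # leading dot: treated as a name, not an extension
    "README",                 # no dot at all
]

results = {}
for path in samples:
    root, extension = os.path.splitext(path)
    results[path] = (root, extension)
    print(repr(path), "->", (root, extension))
```

Note that "archive.tar.gz" yields ".gz", not ".tar.gz", and that names without an extension give an empty string rather than raising an error.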
Using pathlib.Path.suffix
The pathlib module provides a more object-oriented way to work with filesystem paths. The Path.suffix attribute returns the file's final extension, including the dot, and Path.stem returns the file name without that extension.
Example
Here is an example using pathlib.Path.suffix to extract the extension of a file:
from pathlib import Path
file_path = Path("image.jpeg")
extension = file_path.suffix
filename = file_path.stem
print("Filename:", filename)
print("Extension:", extension)
Following is the output of the above program:
Filename: image
Extension: .jpeg
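For names with more than one dot, pathlib also offers the Path.suffixes attribute, which returns every suffix as a list. The sketch below shows it next to Path.suffix; the file names are illustrative.

```python
from pathlib import Path

archive = Path("archive.tar.gz")

last_suffix = archive.suffix              # only the final extension
all_suffixes = archive.suffixes           # every dot-separated suffix, in order
full_extension = "".join(all_suffixes)    # joined back into one string

print("suffix:", last_suffix)
print("suffixes:", all_suffixes)
print("joined:", full_extension)

# A leading dot is part of the name, so a dotfile has no suffix.
dotfile_suffix = Path(".bashrc").suffix
print("dotfile suffix:", repr(dotfile_suffix))
```

Joining suffixes is only appropriate when every dot really separates an extension; a name such as "notes.v2.txt" would give ".v2.txt".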
Using the String split() Method
We can also use the string split() method to extract the extension manually. This approach is simple but does not handle all edge cases, such as filenames with no dot, a leading dot, or dots in directory names.
Example
Below is an example where we extract the extension using the split() method:
filename = "archive.tar.gz"
parts = filename.split(".")
extension = parts[-1]
print("Filename:", ".".join(parts[:-1]))
print("Extension:", extension)
Here is the output of the above program:
Filename: archive.tar
Extension: gz
This method only gives the part after the last dot and does not include the dot in the extension.
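If string methods are preferred, a slightly more defensive variant is possible. The sketch below uses str.rpartition() in a hypothetical helper named split_extension (not part of the standard library) so that names without a dot, or with only a leading dot, report an empty extension instead of returning the whole name.

```python
def split_extension(filename):
    """Return (name, extension_without_dot) using only string methods.

    Hypothetical helper for illustration; it expects a bare file name,
    not a path containing directories.
    """
    name, dot, extension = filename.rpartition(".")
    # rpartition returns ("", "", filename) when there is no dot.
    # An empty name means the only dot is the leading one (e.g. ".bashrc").
    if not dot or not name:
        return filename, ""
    return name, extension


print(split_extension("archive.tar.gz"))
print(split_extension("README"))
print(split_extension(".bashrc"))
```

This still misreads paths whose directories contain dots, which is why os.path.splitext() or pathlib is usually the safer choice.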
Extracting Extension Without the Dot
If you want to extract just the extension suffix without the dot (such as py, txt, docx), you can use string slicing to remove the first character.
Example
The following program demonstrates how to extract extensions without the dot using both methods:
import os
from pathlib import Path
# Using os.path.splitext()
filename = "document.txt"
name, extension = os.path.splitext(filename)
print("Using os.path.splitext():", extension[1:])
# Using pathlib.Path.suffix
file_path = Path("document.txt")
print("Using pathlib.Path.suffix:", file_path.suffix[1:])
The output of the above code is:
Using os.path.splitext(): txt
Using pathlib.Path.suffix: txt
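A dot-free extension is convenient for checks such as validating image formats. The following is a minimal sketch of that idea; the ALLOWED_IMAGE_EXTENSIONS set and the is_allowed_image function are hypothetical names chosen for this example, and the extension is lower-cased so that "photo.JPG" and "photo.jpg" are treated alike.

```python
from pathlib import Path

# Hypothetical allow-list for illustration; adjust it to your application.
ALLOWED_IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def is_allowed_image(filename):
    """Return True if the extension (no dot, case-insensitive) is in the allow-list."""
    # suffix is "" when there is no extension, so the slice is also "".
    extension = Path(filename).suffix[1:].lower()
    return extension in ALLOWED_IMAGE_EXTENSIONS


print(is_allowed_image("photo.JPG"))
print(is_allowed_image("notes.txt"))
print(is_allowed_image("README"))
```

Checking the extension only inspects the name; it says nothing about the actual contents of the file.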
Comparison
| Method | Includes Dot? | Best For | Handles Complex Extensions? |
|---|---|---|---|
| os.path.splitext() | Yes | Traditional file handling | Yes |
| pathlib.Path.suffix | Yes | Modern object-oriented approach | Yes |
| split() | No | Simple cases | Limited |
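The differences in the table become visible when all three approaches are run on the same input. The sketch below does that with a hypothetical compare function and a few illustrative names, including a path whose directory contains a dot.

```python
import os
from pathlib import Path


def compare(path):
    """Return the extension reported by each approach (hypothetical helper)."""
    return {
        "splitext": os.path.splitext(path)[1],
        "suffix": Path(path).suffix,
        "split": path.split(".")[-1],
    }


for sample in ("report.pdf", "README", ".gitignore", "my.project/README"):
    print(sample, "->", compare(sample))
```

For "my.project/README", os.path.splitext() and Path.suffix both report no extension, while split() returns "project/README" because it has no notion of path separators.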
Conclusion
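Use pathlib.Path.suffix for modern Python code, as it is more readable and object-oriented. Use os.path.splitext() when working with plain string paths or with code that predates pathlib (added in Python 3.4). The split() method works for simple file names but misreads names that have no dot, begin with a dot, or sit in a directory whose name contains a dot.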
