How to scan through a directory recursively in Python?


A directory is a collection of files, subdirectories, or both. A directory hierarchy is formed by organizing all the files and subdirectories under a main directory, also known as the “root” directory. The components of a path in this hierarchy are separated by a path separator: “/” on Linux and macOS, and “\” on Windows.

Since a directory is organized as a hierarchy (a tree), scanning it is similar to traversing a tree. There are various ways to do this in Python:

  • Using os.walk() method

  • Using glob.glob() method

  • Using os.listdir() method

Directories are managed by the operating system, so Python accesses them through the os module. The glob and pathlib modules provide higher-level ways to search them by pattern.

Using os.walk() Method

The os.walk() function generates the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at the directory top (including top itself), it yields a three-tuple: (dirpath, dirnames, filenames).

dirpath is a string containing the path to the directory. dirnames is a list of the names of the subdirectories in dirpath, excluding the special entries '.' and '..'. filenames is a list of the names of the non-directory files in dirpath.
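To see the difference between top-down and bottom-up walking, the sketch below builds a small throwaway tree with tempfile and records the order in which os.walk() yields directories. The helper name visit_order and the tree layout (a/b/file.txt) are made up for illustration; topdown is the actual os.walk() parameter that selects the order.

```python
import os
import tempfile

def visit_order(top, topdown=True):
    """Return each directory os.walk() yields, relative to top, in visit order."""
    return [os.path.relpath(dirpath, top)
            for dirpath, _dirnames, _filenames in os.walk(top, topdown=topdown)]

with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree for illustration: top/a/b/file.txt
    os.makedirs(os.path.join(top, "a", "b"))
    open(os.path.join(top, "a", "b", "file.txt"), "w").close()

    print(visit_order(top))                 # parents are yielded before their children
    print(visit_order(top, topdown=False))  # children are yielded before their parents
```

Bottom-up order is useful, for example, when deleting a tree, because each directory is reached only after everything inside it has been handled.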

Example

Here, let us use the os.walk() method to display all the files and subdirectories in the directory tree rooted at the current directory.

import os
path = "."
for root, d_names, f_names in os.walk(path):
   print(root, d_names, f_names)

Output

Running the program above in a directory that contains only main.py produces the following result:

. [] ['main.py']
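The d_names list is an ordinary list, and when os.walk() runs top-down (the default) it only descends into the names left in that list. A common idiom is therefore to prune directories in place. The sketch below skips hidden directories such as .git; the helper name walk_visible is hypothetical. Note that editing the list has no effect when topdown=False.

```python
import os

def walk_visible(top):
    """Yield (dirpath, dirnames, filenames) like os.walk(), but never enter directories starting with '.'."""
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True is the default
        # Slice assignment modifies the list os.walk() is holding,
        # so the removed (hidden) directories are never visited.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        yield dirpath, dirnames, filenames

for root, d_names, f_names in walk_visible("."):
    print(root, d_names, f_names)
```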

Example

We can also build the full path of each file. For that, we use the os.path.join() method, which joins the directory path root with a file name. Each resulting path is then added to a list using the append() method, as shown below.

import os
path = "./TEST"
fname = []
for root,d_names,f_names in os.walk(path):
   for f in f_names:
      fname.append(os.path.join(root, f))
print("fname = %s" %fname)

Output

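The output for the program above is given below. The list is empty, which means no files were found under ./TEST when the program was run. Note that os.walk() yields nothing, without raising an error, if the directory does not exist.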

fname = []
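Building on the example above, here is a minimal sketch that keeps only the files with a given extension and reports them relative to the starting directory. The function name find_files and the suffix-based filter are illustrative assumptions, not part of the os module.

```python
import os

def find_files(top, suffix):
    """Return sorted paths, relative to top, of every file under top whose name ends with suffix."""
    matches = []
    for root, _d_names, f_names in os.walk(top):
        for f in f_names:
            if f.endswith(suffix):
                full_path = os.path.join(root, f)  # path that starts with top
                matches.append(os.path.relpath(full_path, top))
    return sorted(matches)

print(find_files(".", ".py"))
```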

Example

Using the os.walk() method, we can also choose which element of each yielded tuple to print. Let us look at the example program below.

import os
for dirpath, dirs, files in os.walk("."):
   print(dirpath) # prints the path of every directory visited, starting with "."

for dirpath, dirs, files in os.walk("."):
   print(dirs) # prints the list of subdirectory names found in each directory

for dirpath, dirs, files in os.walk("."):
   print(files) # prints the list of file names found in each directory

Output

The output for the program above is as follows:

.
[]
['main.py']
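If only the entries directly inside one directory are needed, there is no need to walk the whole tree: taking just the first tuple with next() is enough. A sketch, with list_top_level as a hypothetical helper name:

```python
import os

def list_top_level(path):
    """Return (subdirectory names, file names) directly inside path, without descending further."""
    # next() takes only the first tuple os.walk() yields, which is the one for path itself.
    # The default tuple avoids StopIteration when path does not exist.
    _dirpath, dirs, files = next(os.walk(path), (path, [], []))
    return dirs, files

dirs, files = list_top_level(".")
print(dirs)
print(files)
```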

Using glob.glob() Method

The glob module finds all the pathnames matching a specified pattern, using the wildcard rules of the Unix shell. The glob.glob() method takes a pattern as its argument and returns a list of the pathnames that match it.

If the pattern is “*” (asterisk), which matches zero or more characters, glob.glob() returns every file and subdirectory in the current directory except names beginning with '.'. This search is not recursive: the contents of subdirectories are not included.
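glob.glob() can search recursively as well: with recursive=True, the pattern “**” matches any files and zero or more directories and subdirectories. The sketch below wraps this in a hypothetical helper, glob_all; glob.escape() is applied to the starting directory so that any special characters in its name are not treated as wildcards.

```python
import glob
import os

def glob_all(top, pattern="*"):
    """Return sorted paths under top, at any depth, whose names match pattern."""
    # With recursive=True, "**" matches top itself and every subdirectory below it.
    search = os.path.join(glob.escape(top), "**", pattern)
    return sorted(glob.glob(search, recursive=True))

# Non-recursive: only entries directly inside the current directory.
print(glob.glob("*"))
# Recursive: every .py file in the whole tree.
print(glob_all(".", "*.py"))
```

As with “*”, names starting with '.' are skipped by default. The pathlib module offers the same kind of recursive search through Path.rglob().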

Example

Let us print the names of all the files and subdirectories in the current directory. The example below uses the Path.glob() method from the pathlib module, which accepts the same kind of wildcard patterns as glob.glob().

from pathlib import Path

root_directory = Path('.')
# Path.glob("*") yields every entry directly inside root_directory
for f in root_directory.glob("*"):
   print(f)

Output

main.py
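The list at the start of this article also mentions os.listdir(). Unlike os.walk(), it returns only the names directly inside one directory, so scanning a tree requires recursion. A minimal sketch, with list_recursive as a hypothetical helper name; symlinked directories are listed but not followed, which avoids infinite loops.

```python
import os

def list_recursive(path):
    """Return the paths of all entries under path (except directories) by calling os.listdir() recursively."""
    files = []
    for name in os.listdir(path):  # names only, in arbitrary order
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            files.extend(list_recursive(full_path))  # descend into the real subdirectory
        else:
            files.append(full_path)  # regular file, or a symlink that is not followed
    return files

for f in list_recursive("."):
    print(f)
```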

Conclusion

We have discussed how to scan through a directory in Python using the os.walk() function, which traverses the whole tree recursively, and glob-style wildcard patterns.

Updated on: 24-Feb-2023
