Find the Last Directory or File Name in a Given Path


Overview

We often work with paths in shell scripts or on the Linux command line. Extracting the last component of a given path is a fairly common task.

For example, given the path /tmp/dir/target, we want to extract target as the file name.

This looks easy enough, but there are some edge cases that can make a naive solution fail.

We’ll take a close look at this problem and explore some common solutions.

Discussion of Common Solutions

We know that Linux file systems don't allow a slash (/) to be part of a file or directory name.

So, if we treat the input path string as a list of slash-separated values, we can simply take the last element.

We can use various commands for this, including sed, which performs text substitutions; awk, which splits each line into fields; and grep, which with -o prints only the matching part of a line:
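As a minimal sketch of this "split on slashes, take the last element" idea, assuming Bash (for read -a and arrays), we can let read do the splitting. The variable names path and parts are only illustrative, and the sketch targets the plain case without a trailing slash:

```shell
#!/usr/bin/env bash
# Split a path on "/" and print the last field (Bash-only sketch).
path="/tmp/dir/target"

# With IFS set to "/", read -a stores each slash-separated field in an array.
# The leading "/" yields an empty first element: ("" "tmp" "dir" "target").
IFS='/' read -r -a parts <<< "$path"

# Print the last element of the array.
printf '%s\n' "${parts[${#parts[@]}-1]}"
```

This prints target.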

$ sed 's#.*/##' <<< "/tmp/dir/target"
target
$ awk -F'/' '{print $NF}' <<< "/tmp/dir/target"
target
$ grep -o '[^/]*$' <<< "/tmp/dir/target"
target

We can also use Bash's parameter expansion to solve the problem:

$ INPUT="/tmp/dir/target"
$ echo "${INPUT##*/}"
target

There are plenty of other tools that could do the same job, but are these approaches really robust enough for production use?

If the input is /tmp/dir/target/, with a trailing slash, then none of the approaches above works, because they all assume that the last character is not a slash:

$ sed 's#.*/##' <<< "/tmp/dir/target/"
( empty output )
$ awk -F'/' '{print $NF}' <<< "/tmp/dir/target/"
( empty output )
$ grep -o '[^/]*$' <<< "/tmp/dir/target/"
( empty output )
$ INPUT="/tmp/dir/target/"
$ echo "${INPUT##*/}"
( empty output )
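To see why, it can help to look at the fields awk produces for a path with a trailing slash. This small diagnostic simply prints the field count and each field in brackets:

```shell
#!/usr/bin/env bash
# Print the number of fields and every field (in brackets) that awk sees
# when it splits a path with a trailing slash on "/".
awk -F'/' '{
    print "NF=" NF
    for (i = 1; i <= NF; i++) printf "[%s]\n", $i
}' <<< "/tmp/dir/target/"
```

It prints NF=5 followed by [], [tmp], [dir], [target] and []. The text after the final slash is the empty string, so $NF is empty; the sed, grep and parameter expansion versions likewise keep only what follows the last slash, which is nothing.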

We might want to fix the solutions above so that they handle paths both with and without a trailing slash. For example, we could modify the awk solution like this:

$ awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target"
target
$ awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target/"
target

The fixed awk one-liner works in most situations, but there are still edge cases where it fails.

Let's now examine them more closely.

Looking Into the Corner Cases

So far, we've only looked at ordinary paths. Now let's look at some other valid patterns a path can take.

First, / is the topmost (root) directory in Linux; it contains all other directories and files. Therefore, / on its own is a valid path string.

Furthermore, most Linux filesystem types allow spaces in file and directory names. Therefore, a path is still valid if a file or directory is named with a single space, " ".

Let's now list these patterns along with the output we expect for each:
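As a quick sketch to confirm this, assuming mktemp is available and the temporary directory lives on a filesystem such as ext4 that permits such names, we can create a directory named with a single space and list it:

```shell
#!/usr/bin/env bash
# Create a directory whose name is a single space inside a scratch directory.
tmp="$(mktemp -d)"
mkdir -p "$tmp/dir/ "

# List the entries under dir, wrapping each name in ^...$ so the space is visible.
for entry in "$tmp/dir/"*; do
    printf '^%s$\n' "${entry##*/}"
done

# Clean up the scratch directory.
rm -rf "$tmp"
```

This prints ^ $, showing that " " is an ordinary directory name.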

Input                  Expected Output
"/tmp/dir/target"      "target"
"/tmp/dir/target/"     "target"
"/"                    "/"
"/tmp/dir/ "           " "
"/tmp/dir/ /"          " "
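To check a candidate command against every row of the table, a small harness could be used. check_last is a hypothetical helper written for this sketch (Bash-only, because of arrays and here-strings); it feeds each input to the given command on stdin and prints PASS or FAIL:

```shell
#!/usr/bin/env bash
# Inputs and expected outputs from the table above.
inputs=("/tmp/dir/target" "/tmp/dir/target/" "/" "/tmp/dir/ " "/tmp/dir/ /")
expected=("target" "target" "/" " " " ")

# check_last (hypothetical helper): run the given command on each input,
# compare its output with the expected value, and print PASS or FAIL.
check_last() {
    local i actual
    for i in "${!inputs[@]}"; do
        actual="$("$@" <<< "${inputs[i]}")"
        if [ "$actual" = "${expected[i]}" ]; then
            printf 'PASS ^%s$\n' "${inputs[i]}"
        else
            printf 'FAIL ^%s$ (got ^%s$, expected ^%s$)\n' \
                "${inputs[i]}" "$actual" "${expected[i]}"
        fi
    done
}

# Run the trailing-slash fix of the awk one-liner against every row.
check_last awk -F'/' '{ a = length($NF) ? $NF : $(NF-1); print a }'
```

With this candidate, only the "/" row fails: awk splits "/" into two empty fields, so both $NF and $(NF-1) are empty.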

We could still extend the awk command to cover all the cases, or write a bash script for the task.

We'll extend the awk one-liner here as an example:

$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target"
target
$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/target/"
target
$ awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/"
/
$ echo "^$( awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/ " )\$"
^ $
$ echo "^$( awk -F'/' '$0==FS{ print $0; next }{ a = length($NF) ? $NF : $(NF-1); print a }' <<< "/tmp/dir/ /" )\$"
^ $

We wrap each result in ^ and $ so that its boundaries are visible, which matters when the result is a single space.

We can see that the awk one-liner works for all the cases, but compared with the first version (awk -F'/' '{print $NF}'), it's now quite complex.

Fortunately, the GNU coreutils package provides a convenient command for solving our problem.
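For comparison, the plain Bash script alternative mentioned earlier could look like the following sketch. last_component is a hypothetical name, and the function relies only on parameter expansion; it is written for the five cases in the table, and other inputs, such as an empty string, are not considered:

```shell
#!/usr/bin/env bash
# last_component (hypothetical name): print the last component of a path
# using only Bash parameter expansion.
last_component() {
    local path="$1"
    # Strip all trailing slashes, one at a time.
    while [ "${path%/}" != "$path" ]; do
        path="${path%/}"
    done
    # Nothing left means the input consisted only of slashes, i.e. the root.
    if [ -z "$path" ]; then
        printf '/\n'
        return
    fi
    # Remove everything up to and including the last remaining slash.
    printf '%s\n' "${path##*/}"
}

for p in "/tmp/dir/target" "/tmp/dir/target/" "/" "/tmp/dir/ " "/tmp/dir/ /"; do
    printf '^%s$\n' "$(last_component "$p")"
done
```

This prints ^target$, ^target$, ^/$, ^ $ and ^ $. It avoids spawning an external process, but it's still more code to read and maintain than a single command.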

Using the basename Command

The basename command strips any trailing slashes and the leading directory components from a given path string.

Moreover, it's robust and handles all the edge cases above. Let's test it with the same input values:

$ basename "/tmp/dir/target"
target
$ basename "/tmp/dir/target/"
target
$ basename "/"
/
$ echo "^$(basename '/tmp/dir/ ')\$"
^ $
$ echo "^$(basename '/tmp/dir/ /')\$"
^ $

Note that basename only manipulates the path string: it doesn't rename or even access the file, so the path doesn't need to exist.

It's also worth mentioning that basename, which prints the final component, has a sibling named dirname, which removes the final component and prints the rest:
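When calling basename from a script, it's a good habit to quote the argument and put -- before it, so that a path beginning with - is not parsed as an option. The list of paths below is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the last component of each path, wrapped in ^...$.
# "--" marks the end of options, so "-not-an-option" is treated as a path.
paths=("/tmp/dir/target/" "-not-an-option" "/")
for p in "${paths[@]}"; do
    name="$(basename -- "$p")"
    printf '^%s$\n' "$name"
done
```

This prints ^target$, ^-not-an-option$ and ^/$.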

$ dirname "/tmp/dir/target"
/tmp/dir
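As a sketch, we can run dirname over similar inputs to see how it treats trailing slashes, the root directory, and a bare name:

```shell
#!/usr/bin/env bash
# Show what dirname prints for several kinds of input,
# wrapping each result in ^...$ to make it visible.
for p in "/tmp/dir/target" "/tmp/dir/target/" "/" "/tmp/dir/ /" "target"; do
    printf '%s -> ^%s$\n' "$p" "$(dirname -- "$p")"
done
```

Like basename, dirname ignores trailing slashes, so "/tmp/dir/target/" and "/tmp/dir/ /" both give /tmp/dir. For "/" it prints /, and for a bare name such as target it prints ., the current directory.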

When we need to handle paths, we should first consider whether basename and/or dirname can solve the problem. Solutions using these two commands are usually robust and easier to read.

awk is a powerful tool, but hand-written one-liners don't always cover every case. If you use them in scripts, be careful not to overlook any edge cases.

Conclusion

We've explored the issue of extracting the last component from a path string.

This simple problem has multiple solutions. We built an awk one-liner that covers all the cases we identified.

We also discussed a simpler way to solve the problem: the basename command.

Updated on: 23-Dec-2022