What does opening a file actually do on Linux?

When we are talking about opening a file, then we have different cases such as in what language and what API are we actually calling when opening a file. While in most of the cases it is quite simple, the higher level languages will eventually call either the C API or directly invoke the Linux open() function which is also written in C.

If we try to talk about different languages then this is a very broad question and cannot be covered in a one single article, and that is because of the sheer complexity that gets added to it when new languages and different API’s are added.

Now once establishing that we are talking about C and Linux, let’s first take a look at the C code that gets called when we open a file in Linux.

Consider the code shown below −

int sys_open(const char *filename, int flags, int mode) {
   char *tmp = getname(filename);
   int fd = get_unused_fd();
   struct file *f = filp_open(tmp, flags, mode);
   fd_install(fd, f);
   putname(tmp);
   return fd;
}

The above code can also be found inside the fs/open.c file on your linux machine.

Now, as we can see there are many functions that gets called from this function, like the first of them is the function named getname() in which we pass the filename as an argument and the code of the getname() function looks something like this −

#define __getname() kmem_cache_alloc(names_cachep, SLAB_KERNEL)
#define putname(name) kmem_cache_free(names_cachep, (void *)(name))

char *getname(const char *filename) {
char *tmp = __getname(); /* allocate some memory */
strncpy_from_user(tmp, filename, PATH_MAX + 1);
return tmp;
}

The above code can be found inside the fs/namei.c file and its main use is to copy the file name from the user space and pass it to the kernel space. Then after the getname() function we have the get_unused_fd() function which returns us an unused file descriptor, which is nothing but an integer index into a growable list of currently opened files. The code of the get_unused_fd() function looks something like this −

int get_unused_fd(void) {
   struct files_struct *files = current->files;
   int fd = find_next_zero_bit(files->open_fds,
                  files->max_fdset, files->next_fd);
   FD_SET(fd, files->open_fds); /* in use now */
   files->next_fd = fd + 1;
   return fd;
}

Now we have the filp_open() function that has the following implementation −

struct file *filp_open(const char *filename, int flags, int mode) {
   struct nameidata nd;
   open_namei(filename, flags, mode, &nd);
   return dentry_open(nd.dentry, nd.mnt, flags);
}

The above function plays two key roles, first, it uses the filesystem to look up the inode which corresponds to the filename of path that was passed in. Next, if creates a struct file with all the essential information about the inode and then returns the file.

Now, the next function in the call stack is the fd_install() function which can be found in the include/linux/file.h file. It is used to store the information returned by the function filp_open(). The code for the fd_install() function is shown below −

void fd_install(unsigned int fd, struct file *file) {
   struct files_struct *files = current->files;
   files->fd[fd] = file;
}

Then we have the store() function that stores the struct that was returned from the filp_open() function and then installs that struct into the process’s list of open files.

Next step is to free the allocated block of kernel-controlled memory. Finally, it return the file descriptor, which can then be passed to other C functions like close() , write() etc.

Mukul Latiyan

Updated on: 2021-07-31T12:22:04+05:30

680 Views

Kickstart Your Career

Get certified by completing the course

Get Started