Drop Empty Columns in Pandas


A pandas data frame is a powerful data manipulation tool. It is a two-dimensional tabular data structure consisting of rows and columns, and its size varies with the dataset. We can create a data frame from many different sources, ranging from databases to files.

Each column in a pandas data frame is a Series holding values of a given type, such as integers, floats, or strings. We can perform numerous operations on these columns, including deletion, indexing, and filtering. In this article, we will perform one such basic operation: dropping empty columns from a pandas data frame.

Firstly, let’s understand what empty columns are in a data frame.

Creating the Data Frame with Empty Columns

We create a data frame so that we can analyse data programmatically. Each column holds a piece of data that carries some significance. With complex datasets, the generated data frame might contain some empty columns, which reduce the relevance of the data frame. To produce a cleaner data frame, we eliminate this kind of unnecessary data from it.

If every value in a column is “NaN” (Not a Number), the column is considered “empty”. A column consisting of “empty spaces” or “zero” values is not “empty”, because an “empty space” and a “zero value” both signify something about the dataset.
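To make this distinction concrete, here is a minimal sketch that checks which columns pandas treats as entirely missing. The column names are hypothetical and chosen only for illustration; they are not part of the hostel dataset used later.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, chosen only to illustrate the distinction
sample = pd.DataFrame({
    "all_nan": [np.nan, np.nan, np.nan],
    "blank_strings": ["", "", ""],
    "zeros": [0, 0, 0],
})

# isna() flags missing values; all() is True only for columns
# in which every single row is missing
empty_columns = sample.isna().all()
print(empty_columns)
```

Only the “all_nan” column is reported as empty here; the blank strings and zeros count as real values.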

When we create a data frame and do not pass any data for a column, an empty column is created. We can drop any column by name, whether it is empty or not, with the “dataframe.drop()” method, but to drop empty columns specifically we use the “dataframe.dropna()” method. Let’s create a data frame with “NaN” values and then begin with the dropping operation.
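The difference between the two methods can be sketched as follows. The frame and its column names are made up for this illustration: “drop()” removes whatever we name, while “dropna()” decides based on the values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two regular columns and one entirely empty column
frame = pd.DataFrame({
    "name": ["a", "b"],
    "score": [1, 2],
    "notes": [np.nan, np.nan],
})

# drop() removes the columns we name, regardless of their contents
without_score = frame.drop(columns=["score"])

# dropna() inspects the values and removes columns that are entirely NaN
without_empty = frame.dropna(axis=1, how="all")

print(list(without_score.columns))  # ['name', 'notes']
print(list(without_empty.columns))  # ['name', 'score']
```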

Example

We imported the “pandas” and “numpy” libraries and then passed a dictionary dataset consisting of information related to different hostels.

We created the data frame with the help of the “DataFrame()” constructor and passed a list of labels for the rows through the “index” argument.

In the dataset we assigned NaN values to the “Hostel location” column with the help of numpy library and finally printed the data frame.

import pandas as pd
import numpy as np

dataset = {"Hostel ID":["DSC224", "DSC124", "DSC568", "DSC345"], "Hostel Rating":[8, 6, 10, 5], "Hostel price":[35000, 32000, 50000, 24000], "Hostel location": [np.nan, np.nan, np.nan, np.nan]}

dataframe = pd.DataFrame(dataset, index= ["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)

Output

         Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1    DSC224              8         35000              NaN
Hostel 2    DSC124              6         32000              NaN
Hostel 3    DSC568             10         50000              NaN
Hostel 4    DSC345              5         24000              NaN

Using dropna() Method to Drop Empty Columns

Let’s apply the dropna() method to the previous data frame.

Example

After creating the data frame, we used the “dropna()” method to drop the columns whose values are all NaN.

Since we are operating on the columns, we specified the axis value as “1”, and we set the “how” value to “all”. It means that a column will be dropped only if all of its values are “NaN”.

Finally, we stored the result in a new data frame, which no longer contains the empty column, and printed it.

import pandas as pd
import numpy as np
dataset = {"Hostel ID":["DSC224", "DSC124", "DSC568", "DSC345"], "Hostel Rating":[8, 6, 10, 5], "Hostel price":[35000, 32000, 50000, 24000], "Hostel location": [np.nan, np.nan, np.nan, np.nan]}
dataframe = pd.DataFrame(dataset, index= ["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)
Emp_drop = dataframe.dropna(how= "all", axis=1)
print("After dropping the empty columns using dropna() we get: -")
print(Emp_drop)

Output

         Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1    DSC224              8         35000              NaN
Hostel 2    DSC124              6         32000              NaN
Hostel 3    DSC568             10         50000              NaN
Hostel 4    DSC345              5         24000              NaN
After dropping the empty columns using dropna() we get: -
         Hostel ID  Hostel Rating  Hostel price
Hostel 1    DSC224              8         35000
Hostel 2    DSC124              6         32000
Hostel 3    DSC568             10         50000
Hostel 4    DSC345              5         24000

Note − If we want to modify the current data frame instead of creating a new one, we pass the “inplace=True” argument.

dataframe.dropna(how= "all", axis=1, inplace=True)
print(dataframe)
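The “how” argument decides how strict the drop is. As a small sketch, using a hypothetical frame with one partly filled column, we can compare “all” with the stricter “any”:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one full, one partly missing, one entirely missing column
frame = pd.DataFrame({
    "full": [1, 2, 3],
    "partly_missing": [1.0, np.nan, 3.0],
    "all_missing": [np.nan, np.nan, np.nan],
})

# how="all" drops a column only when every value in it is NaN
drop_all = frame.dropna(axis=1, how="all")

# how="any" drops a column as soon as it contains a single NaN
drop_any = frame.dropna(axis=1, how="any")

print(list(drop_all.columns))  # ['full', 'partly_missing']
print(list(drop_any.columns))  # ['full']
```

For removing only the truly empty columns, “all” is the setting to use, since “any” would also discard columns that still hold useful data.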

Using notnull() Method to Drop Empty Columns

After creating the data frame, we used the notnull() method along with the loc indexer to keep only those columns that contain at least one non-“NaN” value. Calling any(axis=0) evaluates each column down its rows, so the entirely empty column is filtered out, and we printed the resulting data frame.

Example

import pandas as pd
import numpy as np
dataset = {"Hostel ID":["DSC224", "DSC124", "DSC568", "DSC345"], "Hostel Rating":[8, 6, 10, 5], "Hostel price":[35000, 32000, 50000, 24000], "Hostel location": [np.nan, np.nan, np.nan, np.nan]}
dataframe = pd.DataFrame(dataset, index= ["Hostel 1", "Hostel 2", "Hostel 3", "Hostel 4"])
print(dataframe)
dataframe = dataframe.loc[:, dataframe.notnull().any(axis=0)]
print("Using notnull() method to remove empty columns: -")
print(dataframe)

Output

         Hostel ID  Hostel Rating  Hostel price  Hostel location
Hostel 1    DSC224              8         35000              NaN
Hostel 2    DSC124              6         32000              NaN
Hostel 3    DSC568             10         50000              NaN
Hostel 4    DSC345              5         24000              NaN
Using notnull() method to remove empty columns: -
         Hostel ID  Hostel Rating  Hostel price
Hostel 1    DSC224              8         35000
Hostel 2    DSC124              6         32000
Hostel 3    DSC568             10         50000
Hostel 4    DSC345              5         24000
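To see what the filter does, it can help to print the intermediate boolean mask. The following sketch uses a shortened version of the hostel data and also checks that the result matches the one from “dropna()”; the variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Shortened version of the hostel data
frame = pd.DataFrame({
    "Hostel ID": ["DSC224", "DSC124"],
    "Hostel Rating": [8, 6],
    "Hostel location": [np.nan, np.nan],
})

# True for every column that has at least one non-NaN value
mask = frame.notnull().any(axis=0)
print(mask)

# Keep all rows, and only the columns where the mask is True
filtered = frame.loc[:, mask]

# Compare against the dropna() approach
same = filtered.equals(frame.dropna(axis=1, how="all"))
print(same)
```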

Conclusion

In this article, we walked through different methods of dropping empty columns, i.e., columns consisting entirely of “NaN” values. We discussed the “dropna()” method and the “notnull()” method and how they are used to remove empty columns from a data frame. We also saw why getting rid of this unnecessary data matters and how it increases the relevance of the data frame.

Updated on: 05-May-2023
