Drop Columns in a DataFrame by Label Names or by Index Positions


A pandas data frame is a two-dimensional, labelled data structure made up of columns, each of which is a Series. It is widely used for analysing tabular data: the data is arranged in rows and columns, with each row representing one record.

A pandas data frame comes with a large set of built-in methods for inspecting and transforming data. Each column in a data frame is a labelled series of values. In this article, we will operate on these columns and discuss the various methods to drop columns in a pandas data frame.

A single column or multiple columns can be dropped either by specifying the column names or by using their index positions. We will look at both of these methods, but first we have to prepare a dataset and generate a data frame.

Creating The Data Frame

While creating a data frame, we can assign column names and row labels to our table. This step matters because the column names are the “label names” we will pass to “drop()”, and the order of the columns determines their “index positions”, which start at 0.

In the example below, we import the pandas library as “pd” and build the dataset as a dictionary of lists. Each key is a column name, and the list associated with it holds that column's values. We create the data frame using the pandas “DataFrame()” constructor and assign the row labels with the help of the “index” parameter. Once the data frame is ready, we will drop columns from it, starting with column names.

Example

import pandas as pd
dataset = {"Employee ID":["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"], "Age":[25, 28, 27, 26, 25], "Salary":[200000, 250000, 180000, 300000, 280000], "Role":["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)

Output

       Employee ID  Age  Salary              Role
Nimesh       CIR45   25  200000  Junior Developer
Arjun        CIR12   28  250000           Analyst
Mohan        CIR18   27  180000        Programmer
Ritesh       CIR50   26  300000  Senior Developer
Raghav       CIR28   25  280000                HR
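To make the link between label names and index positions concrete, the following minimal sketch lists the column labels of the same data frame and maps each one to its zero-based position. The variable names “column_labels” and “positions” are illustrative only and are not part of the original example.

```python
import pandas as pd

dataset = {
    "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
    "Age": [25, 28, 27, 26, 25],
    "Salary": [200000, 250000, 180000, 300000, 280000],
    "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"],
}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Column labels in the order they appear in the data frame
# (illustrative variable name).
column_labels = list(dataframe.columns)

# Zero-based position of each label; get_loc returns an integer
# here because every column name is unique.
positions = {label: dataframe.columns.get_loc(label) for label in column_labels}

print(column_labels)
print(positions)
```

With this dataset, “Salary” sits at position 2 and “Role” at position 3, which are the positions used later when dropping by index.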

Using Column Names and Drop() Method

After generating the data frame, we use the “dataframe.drop()” method to remove the “Salary” and “Role” columns, passing the column names as a list.

We set “axis” to 1 because we are operating on the column axis. The method returns a new data frame and leaves the original unchanged, so we store the result in a variable “colDrop” and print it.

Example

import pandas as pd
dataset = {"Employee ID":["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"], "Age":[25, 28, 27, 26, 25], "Salary":[200000, 250000, 180000, 300000, 280000], "Role":["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)
colDrop = dataframe.drop(["Role", "Salary"], axis=1)
print("After dropping the Role and salary column:")
print(colDrop)

Output

       Employee ID  Age  Salary              Role
Nimesh       CIR45   25  200000  Junior Developer
Arjun        CIR12   28  250000           Analyst
Mohan        CIR18   27  180000        Programmer
Ritesh       CIR50   26  300000  Senior Developer
Raghav       CIR28   25  280000                HR
After dropping the Role and salary column:
       Employee ID  Age
Nimesh       CIR45   25
Arjun        CIR12   28
Mohan        CIR18   27
Ritesh       CIR50   26
Raghav       CIR28   25
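As a variation on the example above, “drop()” also accepts a “columns” keyword, which can be used instead of passing the labels together with “axis=1”. The sketch below compares the two spellings and checks that the original data frame keeps all of its columns. The variable names “by_axis” and “by_keyword” are illustrative.

```python
import pandas as pd

dataset = {
    "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
    "Age": [25, 28, 27, 26, 25],
    "Salary": [200000, 250000, 180000, 300000, 280000],
    "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"],
}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Labels plus axis=1, as in the example above.
by_axis = dataframe.drop(["Role", "Salary"], axis=1)

# The same operation written with the columns keyword.
by_keyword = dataframe.drop(columns=["Role", "Salary"])

# Both calls give the same result.
print(by_axis.equals(by_keyword))

# drop() returned new data frames; the original still has four columns.
print(list(dataframe.columns))
```

Which spelling to use is largely a matter of taste; the keyword form makes it explicit that columns, not rows, are being removed.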

Using Index Values and Drop() Method

We can also use index positions to pick the columns that we want to remove.

Example

Here, we use the “dataframe.columns” attribute together with “dataframe.drop()” to specify the index positions of the columns to be dropped. Indexing “dataframe.columns” with “[[2, 3]]” returns the labels at positions 2 and 3, that is “Salary” and “Role”, and “drop()” then removes those columns. The snippet continues from the data frame created in the previous example.

Together with dropping by name, this covers both basic methods for dropping columns.

colDrop = dataframe.drop(dataframe.columns[[2, 3]], axis=1)
print("After dropping salary and role: -")
print(colDrop)

Output

After dropping salary and role: -
       Employee ID  Age
Nimesh       CIR45   25
Arjun        CIR12   28
Mohan        CIR18   27
Ritesh       CIR50   26
Raghav       CIR28   25
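The position-based approach works in two steps: positions are first translated into labels, and the labels are then dropped. The sketch below separates the two steps so the intermediate result can be inspected. The names “labels_to_drop” and “remaining” are illustrative.

```python
import pandas as pd

dataset = {
    "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
    "Age": [25, 28, 27, 26, 25],
    "Salary": [200000, 250000, 180000, 300000, 280000],
    "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"],
}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Step 1: look up the labels stored at positions 2 and 3.
labels_to_drop = dataframe.columns[[2, 3]]
print(list(labels_to_drop))

# Step 2: drop those labels along the column axis.
remaining = dataframe.drop(labels_to_drop, axis=1)
print(list(remaining.columns))
```

Because the drop itself is still done by label, a data frame with duplicate column names would lose every column that shares a dropped label, not only the one at the given position.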

Dropping a Range of Columns from the Data Frame

In the examples discussed above, we dropped specific columns (Salary and Role). pandas also lets us select a contiguous range of columns and drop the whole range at once. Let’s implement this logic.

Using iloc() Function

After generating the data frame, we use the “iloc” indexer to select a range of columns by position and remove it from the data frame. “iloc” is indexed with square brackets and takes a position range for both rows and columns. We select all rows with “:” and the columns at positions “1:4”; the end position is excluded, so this covers positions 1, 2 and 3. We then pass the “.columns” of that selection to the “dataframe.drop()” method to drop these columns.

Example

import pandas as pd
dataset = {"Employee ID":["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"], "Age":[25, 28, 27, 26, 25], "Salary":[200000, 250000, 180000, 300000, 280000], "Role":["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"]}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])
print(dataframe)
# Select the columns at positions 1 to 3 and pass their labels to drop()
colDrop = dataframe.drop(dataframe.iloc[:, 1:4].columns, axis=1)
print("Dropping a range of columns from 'Age' to 'Role' using iloc() function")
print(colDrop)

Output

       Employee ID  Age  Salary              Role
Nimesh       CIR45   25  200000  Junior Developer
Arjun        CIR12   28  250000           Analyst
Mohan        CIR18   27  180000        Programmer
Ritesh       CIR50   26  300000  Senior Developer
Raghav       CIR28   25  280000                HR
Dropping a range of columns from 'Age' to 'Role' using iloc() function
       Employee ID
Nimesh       CIR45
Arjun        CIR12
Mohan        CIR18
Ritesh       CIR50
Raghav       CIR28
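A positional range can also be taken directly from “dataframe.columns” without going through “iloc”. The sketch below, using the illustrative names “via_iloc”, “via_columns” and “remaining”, checks that both routes give the same labels and that the end position of the slice is excluded.

```python
import pandas as pd

dataset = {
    "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
    "Age": [25, 28, 27, 26, 25],
    "Salary": [200000, 250000, 180000, 300000, 280000],
    "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"],
}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Labels of the columns at positions 1, 2 and 3 (position 4 is excluded).
via_iloc = dataframe.iloc[:, 1:4].columns
via_columns = dataframe.columns[1:4]

# Both expressions describe the same range of labels.
print(via_iloc.equals(via_columns))
print(list(via_columns))

# Dropping the range leaves only the first column.
remaining = dataframe.drop(columns=via_columns)
print(list(remaining.columns))
```

Slicing “dataframe.columns” avoids building an intermediate data frame, which may be preferable on large tables.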

Using loc() Function

If we want to use labels instead of positions to create a range, we use the “loc” indexer.

Example

We create the range with the help of the “loc” indexer, which selects columns by name. Unlike “iloc”, a “loc” slice includes its end label, so the slice from “Age” to “Role” covers Age, Salary and Role. Finally, we print the new data frame with the remaining columns.

# Continues from the data frame created in the previous example
print(dataframe)
colDrop = dataframe.drop(dataframe.loc[:, "Age": "Role"].columns, axis=1)
print("Dropping a range of columns from Age to Role using loc() function")
print(colDrop)

Output

       Employee ID  Age  Salary              Role
Nimesh       CIR45   25  200000  Junior Developer
Arjun        CIR12   28  250000           Analyst
Mohan        CIR18   27  180000        Programmer
Ritesh       CIR50   26  300000  Senior Developer
Raghav       CIR28   25  280000                HR
Dropping a range of columns from Age to Role using loc() function
       Employee ID
Nimesh       CIR45
Arjun        CIR12
Mohan        CIR18
Ritesh       CIR50
Raghav       CIR28
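The difference between the two kinds of slice is easiest to see side by side. In the sketch below, the label slice from “Age” to “Salary” includes “Salary”, while the positional slice “1:2” stops before position 2. The variable names are illustrative.

```python
import pandas as pd

dataset = {
    "Employee ID": ["CIR45", "CIR12", "CIR18", "CIR50", "CIR28"],
    "Age": [25, 28, 27, 26, 25],
    "Salary": [200000, 250000, 180000, 300000, 280000],
    "Role": ["Junior Developer", "Analyst", "Programmer", "Senior Developer", "HR"],
}
dataframe = pd.DataFrame(dataset, index=["Nimesh", "Arjun", "Mohan", "Ritesh", "Raghav"])

# Label-based slice: both end points are included.
label_range = list(dataframe.loc[:, "Age":"Salary"].columns)

# Position-based slice: the end position (2) is excluded.
position_range = list(dataframe.iloc[:, 1:2].columns)

print(label_range)
print(position_range)
```

To cover the same columns by position as the label slice above, the positional slice would have to be written as “1:3”.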

Conclusion

This article covered the simple operation of dropping columns from a pandas data frame. We discussed two techniques, namely “dropping by label names” and “dropping by index positions”. We also used the “loc” and “iloc” indexers to drop ranges of columns.

Updated on: 05-May-2023
