Check if a Column Starts/Ends with a given String in Pandas DataFrame


Pandas is a popular Python library used for data manipulation and analysis. It provides powerful tools for working with structured data, such as tables or spreadsheets. Pandas can handle a variety of data formats, including CSV, Excel, SQL databases, and JSON, among others.

One of the critical features of Pandas is its two primary data structures: Series and DataFrame.

A Series is a one-dimensional array-like object that can hold any data type, such as integers, floating-point numbers, strings, or even Python objects. Series are labelled, meaning that they have an index that is used to access and manipulate the data.

A DataFrame is a two-dimensional table-like data structure, with rows and columns, that is similar to a spreadsheet or SQL table. It can contain multiple data types and can be thought of as a collection of Series. DataFrames are very powerful and flexible, as they can be manipulated in many ways, such as filtering, merging, grouping, and transforming the data.
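As a rough illustration of the two structures described above, the sketch below builds a small labelled Series and shows that a single DataFrame column is itself a Series. The names `ages`, `df`, and `name_column`, and the sample values, are made up for this example.

```python
import pandas as pd

# A Series is a labelled one-dimensional array; these labels are arbitrary.
ages = pd.Series([25, 32, 18], index=["john", "emily", "mark"])
print(ages["emily"])  # access a value by its label

# Selecting one column of a DataFrame gives back a Series
# that shares the DataFrame's row index.
df = pd.DataFrame({"name": ["John", "Emily"], "age": [25, 32]})
name_column = df["name"]
print(type(name_column).__name__)
```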

Pandas provides many tools for working with DataFrames, including methods for indexing, selecting, and filtering the data, as well as statistical and mathematical operations. Pandas also includes tools for handling missing data, reshaping the data, and handling time series data.

To create a DataFrame, you can pass a dictionary or a list of lists to the DataFrame constructor. Each key in the dictionary represents a column in the DataFrame, and the values represent the data in that column. Alternatively, you can create a DataFrame from a CSV, Excel, SQL database, or other data format.
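The two constructor inputs mentioned above can be sketched as follows. The column names and values are placeholders chosen for this illustration. With a list of lists, each inner list is one row, so the column names have to be supplied separately.

```python
import pandas as pd

# From a dictionary: each key becomes a column name.
from_dict = pd.DataFrame({
    "Name": ["John", "Emily"],
    "Department": ["Sales", "HR"],
})

# From a list of lists: each inner list is a row,
# and the column names are passed through `columns`.
from_rows = pd.DataFrame(
    [["John", "Sales"], ["Emily", "HR"]],
    columns=["Name", "Department"],
)

# Both constructions should describe the same table.
print(from_dict.equals(from_rows))
```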

Overall, Pandas is a powerful and flexible library for working with structured data in Python, and its DataFrame data structure is one of its most important features. With its extensive functionality and ease of use, Pandas is an essential tool for any data scientist or analyst working with data in Python.

Now that we have covered Pandas and its DataFrame structure, let's look at the Pandas endswith() method.

endswith()

In Pandas, the endswith() method is available through the .str accessor of a Series, so it can be applied to a single DataFrame column to check whether each element in that column ends with a given string. The method returns a boolean Series with the same length and index as the column, not a DataFrame.

The syntax for using the endswith() method with a DataFrame in Pandas is as follows:

DataFrame[column_name].str.endswith(suffix, na=None)

Here, DataFrame is the DataFrame you want to work with, column_name is the name of the column to check, suffix is the string each element is tested against (Pandas itself names this parameter pat), and na is an optional parameter that sets the value placed in the result for missing or null elements. The comparison is case-sensitive.
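To make the role of the na parameter more concrete, here is a minimal sketch on a column that contains a missing value. The Series `departments` and its contents are invented for this example. How the missing entry is reported when na is left out can depend on the column's dtype and the installed Pandas version, so the sketch only relies on the explicit na=False case.

```python
import pandas as pd

# A column with one missing entry (None).
departments = pd.Series(["Sales", "IT", None, "Audit"])

# Without na: the result for the missing entry depends on the
# dtype and pandas version (often it stays a missing value).
default_result = departments.str.endswith("IT")

# With na=False: the missing entry is reported as False,
# which keeps the result usable as a boolean filter.
filled_result = departments.str.endswith("IT", na=False)
print(filled_result.tolist())
```

Note that "Audit" does not match, because the check is case-sensitive and "it" is not the same as "IT".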

To illustrate this, let's consider an example where we have a dataset of employees containing information such as 'Employee_ID', 'Name', 'Department', and 'Salary'.

Example

# Importing the pandas library and renaming it as pd
import pandas as pd

# Creating a DataFrame for employees
employee_df = pd.DataFrame({
    'Employee_ID': ['E101', 'E102', 'E103', 'E104', 'E105'],
    'Name': ['John', 'Emily', 'Mark', 'Sarah', 'Jessica'],
    'Department': ['Sales', 'HR', 'IT', 'Marketing', 'Finance'],
    'Salary': [50000, 60000, 75000, 80000, 90000]
})

# Printing the original DataFrame
print("Printing Original Employee DataFrame:")
print(employee_df)

Explanation

This code uses the Pandas library to create a DataFrame from a dictionary. Here's what each part of the code does:

  • import pandas as pd: This line imports the Pandas library and renames it as "pd" so we can refer to it more easily in our code.

  • employee_df = pd.DataFrame({...}): This statement creates a DataFrame called "employee_df" from a dictionary with the keys 'Employee_ID', 'Name', 'Department', and 'Salary'. Pandas automatically uses the keys of the dictionary as the column headers, and the values in each list as the values in the corresponding column.

  • print("Printing Original Employee DataFrame:"): This line prints a label so the output is easier to read.

  • print(employee_df): This line prints the DataFrame to the console. The result is shown in the Output section below.

To run the above code we need to run the command shown below.

Command

python3 main.py

Output

Printing Original Employee DataFrame:
  Employee_ID     Name Department  Salary
0        E101     John      Sales   50000
1        E102    Emily         HR   60000
2        E103     Mark         IT   75000
3        E104    Sarah  Marketing   80000
4        E105  Jessica    Finance   90000

Now let's make use of the endswith() method in the above code.

In the first example, we will check whether each value in the Department column of the DataFrame ends with 'IT'.

Consider the code shown below.

Example

# Importing the pandas library and renaming it as pd
import pandas as pd

# Creating a DataFrame for employees
employee_df = pd.DataFrame({
    'Employee_ID': ['E101', 'E102', 'E103', 'E104', 'E105'],
    'Name': ['John', 'Emily', 'Mark', 'Sarah', 'Jessica'],
    'Department': ['Sales', 'HR', 'IT', 'Marketing', 'Finance'],
    'Salary': [50000, 60000, 75000, 80000, 90000]
})

# Printing a header line; the DataFrame itself is printed after the new column is added
print("Printing Original Employee DataFrame:")

# Applying a lambda function to each value in the "Department" column
# The lambda function uses Python's built-in `endswith()` string method to check if the string ends with "IT"
# `map()` applies the lambda function to each value lazily, and `list()` collects the boolean results
# The resulting list is used to create a new column in the `employee_df` DataFrame
employee_df['TutorialsPoint_Emp'] = list(
    map(lambda x: x.endswith('IT'), employee_df['Department']))

# Printing the new DataFrame with the added column
print(employee_df)

In this case, Python's built-in endswith() string method is applied to each value through map() to perform a conditional check, and a new column is created from the results. This is useful for data manipulation and filtering, especially when working with text data.

The output of this code is a modified DataFrame with an additional column called "TutorialsPoint_Emp". This column contains boolean values that indicate whether the value in the Department column ends with "IT" (True) or not (False).

To run the above code we need to run the command shown below.

Command

python3 main.py

Output

Printing Original Employee DataFrame:
  Employee_ID     Name Department  Salary  TutorialsPoint_Emp
0        E101     John      Sales   50000               False
1        E102    Emily         HR   60000               False
2        E103     Mark         IT   75000                True
3        E104    Sarah  Marketing   80000               False
4        E105  Jessica    Finance   90000               False
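The same check can also be written with the .str.endswith() syntax introduced earlier instead of map() and a lambda. The sketch below uses a trimmed copy of the employee data and compares both approaches; the variable names `ends_with_it`, `mapped`, and `it_rows` are chosen for this illustration only. It also shows the boolean result used directly as a row filter.

```python
import pandas as pd

# A trimmed copy of the employee data from the example above.
employee_df = pd.DataFrame({
    "Employee_ID": ["E101", "E102", "E103", "E104", "E105"],
    "Department": ["Sales", "HR", "IT", "Marketing", "Finance"],
})

# Vectorized check through the .str accessor.
ends_with_it = employee_df["Department"].str.endswith("IT")

# The map/lambda version from the example, kept for comparison.
mapped = list(map(lambda x: x.endswith("IT"), employee_df["Department"]))

# The boolean Series can be used directly to keep only matching rows.
it_rows = employee_df[ends_with_it]
print(it_rows)
```

One practical difference: the lambda version raises an error if the column contains a missing value, while the .str accessor lets you decide how to treat it through the na parameter.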

Now let's use the endswith() method on another column.

Consider the code shown below.

Example

# Importing the pandas library and renaming it as pd
import pandas as pd

# Creating a DataFrame for employees
employee_df = pd.DataFrame({
    'Employee_ID': ['E101', 'E102', 'E103', 'E104', 'E105'],
    'Name': ['John', 'Emily', 'Mark', 'Sarah', 'Jessica'],
    'Department': ['Sales', 'HR', 'IT', 'Marketing', 'Finance'],
    'Salary': [50000, 60000, 75000, 80000, 90000]
})

# Printing a header line; the DataFrame itself is printed after the new column is added
print("Printing Original Employee DataFrame:")

# Adding a new column to the DataFrame
# Python's built-in endswith() method checks whether each name ends with "Sarah"
employee_df['TutorialsPoint_Emp'] = list(
    map(lambda x: x.endswith('Sarah'), employee_df['Name']))

# Printing the new DataFrame with the added column
print(employee_df)

Explanation

In the above code we are making use of the map() function and a lambda function to check whether the name of each employee ends with the string "Sarah". This is accomplished using the endswith() method, which returns a boolean value indicating whether a given string ends with a specified suffix.

The resulting boolean values are then converted to a list using the list() function and stored in a new column of the DataFrame called "TutorialsPoint_Emp".

Finally, the modified DataFrame is printed using the print() function. The output will display the original employee information along with a new column indicating whether each employee has a name that ends with "Sarah".

To run the above code we need to run the command shown below.

Command

python3 main.py

Output

Printing Original Employee DataFrame:
  Employee_ID     Name Department  Salary  TutorialsPoint_Emp
0        E101     John      Sales   50000               False
1        E102    Emily         HR   60000               False
2        E103     Mark         IT   75000               False
3        E104    Sarah  Marketing   80000                True
4        E105  Jessica    Finance   90000               False
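The title also mentions checking the start of a string. The examples above only cover endswith(), so here is a brief sketch of the counterpart, .str.startswith(), which works the same way on a prefix. The Series `names` reuses the employee names from the examples; the prefix "J" is an arbitrary choice for this illustration.

```python
import pandas as pd

names = pd.Series(["John", "Emily", "Mark", "Sarah", "Jessica"])

# True for every name that begins with the prefix "J".
starts_with_j = names.str.startswith("J")

# Use the boolean Series to keep only the matching names.
print(names[starts_with_j].tolist())
```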

Conclusion

In conclusion, the endswith() method, whether called through the .str accessor of a DataFrame column or applied to each value with map(), allows us to check whether the elements of a given column end with a specified suffix. The resulting boolean values can be stored as a new column or used to filter and transform data according to specific string patterns, which makes the method useful for data analysis and data cleaning tasks.

Updated on: 02-Aug-2023
