How does pandas series argsort handle NaN values?


In pandas, the Series argsort() method returns a new series object holding the integer positions that would sort the values of the original series. If the Series object contains any null or missing values, the argsort() method skips them and puts -1 at their positions in the result.
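The -1 behavior described above can be reproduced by hand. The following is a minimal sketch, not the pandas implementation: argsort_skipping_nan is a hypothetical helper name, and it uses NumPy directly so that it does not depend on how a particular pandas release treats missing values (newer releases may warn about, or change, the -1 behavior).

```python
import numpy as np
import pandas as pd


def argsort_skipping_nan(series):
    """Hypothetical helper: positions that sort the non-missing values, -1 elsewhere."""
    values = series.to_numpy(dtype=float)
    missing = np.isnan(values)

    # Start with -1 everywhere, then fill only the non-missing slots.
    result = np.full(len(values), -1, dtype=np.int64)
    # np.argsort sees only the non-missing values here, so the positions
    # it returns are relative to that filtered array, not the full series.
    result[~missing] = np.argsort(values[~missing], kind="quicksort")
    return pd.Series(result, index=series.index)


demo = argsort_skipping_nan(pd.Series([None, 5, 2, None, 7]))
print(demo.tolist())  # [-1, 1, 0, -1, 2]
```

The non-negative numbers are positions within the non-missing values [5, 2, 7], which is why they never exceed 2 even though the series has five elements.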

By default, the argsort method uses the quicksort algorithm. We can choose another sorting algorithm, 'mergesort', 'heapsort', or 'stable', by using the kind parameter.
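As a small illustration of the kind parameter, the sketch below sorts a series with repeated values and no missing data. The variable names are made up for this example. A stable algorithm keeps equal values in their original relative order, while heapsort only guarantees that the positions produce a sorted result.

```python
import pandas as pd

# Two pairs of equal values, no missing data.
scores = pd.Series([2, 1, 2, 1])

# 'stable' and 'mergesort' keep ties in their original relative order.
stable_order = scores.argsort(kind="stable")
merge_order = scores.argsort(kind="mergesort")
print(stable_order.tolist())  # [1, 3, 0, 2]

# 'heapsort' is not stable, so only the sorted values are checked here.
heap_order = scores.argsort(kind="heapsort")
heap_sorted = scores.iloc[heap_order.to_numpy()].tolist()
print(heap_sorted)  # [1, 1, 2, 2]
```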

The argsort method returns a series whose values are the positions that put the data in sorted order. It does not change the index labels of the original series object.

Example 1

import pandas as pd

# creating series
series = pd.Series([None,5,2,None,7])
print(series)

# apply argsort(); sep="\n" prints the result on its own lines
print("Output argsort:", series.argsort(), sep="\n")

Explanation

In this example, we created a series from a Python list that contains some None values. Then we applied the argsort() method to that series.

Output

0    NaN
1    5.0
2    2.0
3    NaN
4    7.0
dtype: float64

Output argsort:
0   -1
1    1
2    0
3   -1
4    2
dtype: int64

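In the above output block, we can see the series returned by the argsort method. The -1 values mark the positions where the original series holds None (NaN). The remaining numbers are positions within the non-missing values 5, 2, 7: the smallest value, 2, sits at position 1, then 5 at position 0, then 7 at position 2.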

Example 2

import pandas as pd

# creating dates
date = pd.date_range("2021-07-01", periods=5, freq="M")

# creating pandas Series with date range index
s = pd.Series([9, None, 2, 5, 6], index=date)
print(s)

# apply argsort; sep="\n" prints the result on its own lines
print("Output of argsort:", s.argsort(), sep="\n")

Explanation

Let’s take another example of a pandas series object. Here we created a series from a list of integer values plus one None value, using a date range as the index, and then applied the argsort method to it.

Output

2021-07-31    9.0
2021-08-31    NaN
2021-09-30    2.0
2021-10-31    5.0
2021-11-30    6.0
Freq: M, dtype: float64

Output of argsort:
2021-07-31    1
2021-08-31   -1
2021-09-30    2
2021-10-31    3
2021-11-30    0
Freq: M, dtype: int64

The element at index label 2021-07-31 of the output series object is 1. It means the smallest number, 2.0, sits at position 1 once the missing value is skipped (the non-missing values are 9, 2, 5, 6). The None value is skipped by the argsort method, so -1 appears at its label, 2021-08-31.

In the same way, the element at index label 2021-11-30 of the output series object is 0. It means the largest number, 9.0, sits at position 0 of the original series. In this way, the pandas series argsort() method handles None values.
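To turn these positions into sorted values, one option is to drop the missing values first, so that no -1 entries appear at all. This is a sketch under that assumption, using the same numbers as Example 2 with a default integer index; the names clean, order and sorted_values are chosen only for illustration.

```python
import pandas as pd

s = pd.Series([9, None, 2, 5, 6])

# Remove the missing value, then ask for the sorting positions.
clean = s.dropna()
order = clean.argsort()
print(order.tolist())  # [1, 2, 3, 0]

# iloc takes integer positions, so it reorders the non-missing values.
sorted_values = clean.iloc[order.to_numpy()]
print(sorted_values.tolist())  # [2.0, 5.0, 6.0, 9.0]
```

The positions 1, 2, 3, 0 are the same non-negative numbers that appear in the Example 2 output, which shows that they refer to the values left after the missing entry is skipped.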

raja
Updated on 09-Mar-2022 07:07:56
