How to Find The Largest Or Smallest Items in Python?


This article is aimed at developers who want to find the largest or smallest items in a collection with Python. I will show a few methods and conclude with guidance on which one suits your case.

Method – 1: Slice approach on a List

If you are simply trying to find the single smallest or largest item, i.e. N = 1, it is faster to use min() and max() than to sort the list.

Let us begin by generating some random integers.

import random
# Create a random list of integers
random_list = random.sample(range(1,10),9)
random_list

Output

[2, 4, 5, 1, 7, 9, 6, 8, 3]

FINDING THE SMALLEST & LARGEST ITEM (N=1)

# Find the smallest number (N=1)
min(random_list)

Output

1

# Find the largest number (N=1)
max(random_list)

Output

9
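As a small aside, min() and max() accept any iterable and take optional key and default arguments. The sketch below uses a fixed list named scores (my own name, chosen so the results are reproducible) rather than the random list above.

```python
# Hypothetical fixed data so the results are reproducible
scores = [2, 4, 5, 1, 7, 9, 6, 8, 3]

smallest = min(scores)
largest = max(scores)

# key changes what is compared: here, the distance from 5
closest_to_five = min(scores, key=lambda x: abs(x - 5))

# default avoids a ValueError when the iterable is empty
empty_min = min([], default=None)

print(smallest, largest, closest_to_five, empty_min)
```

The default argument is handy when the list may be empty, because a bare min([]) raises ValueError.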

FINDING THE 3 SMALLEST & LARGEST ITEMS (N=3)

If N is about the same size as the collection itself, it is usually faster to sort the collection first and take a slice of N items.

# Get the 3 smallest items using the slice approach (N=3)
sorted(random_list)[:3]

Output

[1, 2, 3]

# Get the 3 largest items using the slice approach (N=3)
sorted(random_list)[-3:]

Output

[7, 8, 9]
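Note that slicing from the end of a sorted list returns the largest items in ascending order. If you would rather have the largest item first, one option is to sort in descending order and slice from the front. A minimal sketch, assuming a fixed list named numbers (my own name, used for reproducibility):

```python
# Hypothetical fixed data so the results are reproducible
numbers = [2, 4, 5, 1, 7, 9, 6, 8, 3]

# Smallest three, in ascending order
three_smallest = sorted(numbers)[:3]

# Largest three, still in ascending order
three_largest_ascending = sorted(numbers)[-3:]

# Largest three, largest first
three_largest_descending = sorted(numbers, reverse=True)[:3]

print(three_smallest)
print(three_largest_ascending)
print(three_largest_descending)
```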

Method – 2: heapq Method on a List

The heapq module has two functions, nlargest() and nsmallest(), that return the N largest or N smallest items of a collection.

import heapq
import random
random_list = random.sample(range(1,10),9)

# nsmallest items (N=3)
heapq.nsmallest(3,random_list)

Output

[1, 2, 3]

# nlargest items (N=3)
heapq.nlargest(3,random_list)

Output

[9, 8, 7]
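As the outputs suggest, nsmallest() returns its items smallest first and nlargest() returns them largest first, so both agree with the corresponding sorted slices. The sketch below checks this on a fixed list named numbers (an assumption of mine, not the random list above).

```python
import heapq

# Hypothetical fixed data so the results are reproducible
numbers = [2, 4, 5, 1, 7, 9, 6, 8, 3]
n = 3

smallest = heapq.nsmallest(n, numbers)
largest = heapq.nlargest(n, numbers)

# Compare against the sort-and-slice approach
same_smallest = smallest == sorted(numbers)[:n]
same_largest = largest == sorted(numbers, reverse=True)[:n]

print(smallest, largest, same_smallest, same_largest)
```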

For more complicated data, such as a list of dictionaries, nsmallest() and nlargest() accept a key parameter that specifies what to compare.

import heapq
grandslams = [
    {'name': 'Roger Federer', 'titles': 20},
    {'name': 'Rafael Nadal', 'titles': 19},
    {'name': 'Novak Djokovic', 'titles': 17},
    {'name': 'Andy Murray', 'titles': 3},
]

# Players with the fewest titles (N=3)
less = heapq.nsmallest(3, grandslams, key=lambda s: s['titles'])
less

Output

[{'name': 'Andy Murray', 'titles': 3},
{'name': 'Novak Djokovic', 'titles': 17},
{'name': 'Rafael Nadal', 'titles': 19}]

# Players with the most titles (N=3)
more = heapq.nlargest(3, grandslams, key=lambda s: s['titles'])
more

Output

[{'name': 'Roger Federer', 'titles': 20},
{'name': 'Rafael Nadal', 'titles': 19},
{'name': 'Novak Djokovic', 'titles': 17}]
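The key function does not have to be a lambda. As one possible alternative, operator.itemgetter builds the same kind of function, and the same key also works with min() and max() when N = 1. The list name players and the surname-only records below are my own illustration.

```python
import heapq
from operator import itemgetter

# Hypothetical records, keyed the same way as the example above
players = [
    {'name': 'Federer', 'titles': 20},
    {'name': 'Nadal', 'titles': 19},
    {'name': 'Djokovic', 'titles': 17},
    {'name': 'Murray', 'titles': 3},
]

# itemgetter('titles') behaves like lambda s: s['titles']
by_titles = itemgetter('titles')

top_two = heapq.nlargest(2, players, key=by_titles)
fewest = min(players, key=by_titles)
most = max(players, key=by_titles)

print([p['name'] for p in top_two])
print(fewest['name'], most['name'])
```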

Finding the N Largest and Smallest Items in a DataFrame

Well, the world is full of CSV files. Yes, it really is!

So it is safe to assume that at some point in your Python development you will encounter CSV files and, along with them, the pandas DataFrame.

I will show you a couple of methods to find the N largest or smallest rows in a DataFrame.

In the first method, we sort the rows with the sort_values() method and pick the first N rows with the head() method.

import pandas as pd
import io
# Define your data
data = """
player,titles
Djokovic,17
Nadal,19
Federer,20
Murray,3
"""
throwaway_storage = io.StringIO(data)
df = pd.read_csv(throwaway_storage,index_col = "player")


# nsmallest (N = 3)
df.sort_values("titles").head(3)

Output

player    titles
________________
Murray    3
Djokovic  17
Nadal     19


# nlargest (N = 3)
df.sort_values("titles",ascending = False).head(3)

Output

player    titles
________________
Federer   20
Nadal     19
Djokovic  17

Instead of sorting the rows and using the .head() method, we can call the .nsmallest() and .nlargest() methods.

df.nsmallest(3,columns="titles")

Output

player    titles
________________
Murray    3
Djokovic  17
Nadal     19


df.nlargest(3,columns = "titles")

Output

player    titles
________________
Federer   20
Nadal     19
Djokovic  17

Conclusion

If you are trying to find a relatively small number of items compared to the size of the collection, then the heapq functions nlargest() and nsmallest() are most appropriate.

If you are simply trying to find the single smallest or largest item (N=1), it is faster to use min() and max().

Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice.

Finally, the actual implementation of nlargest() and nsmallest() is adaptive in how it operates and will carry out some of these optimizations on your behalf.
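If you want to check these trade-offs on your own data, a rough timing sketch with timeit is one way to do it. The list size, the value of n, the repeat count and the helper names with_heapq and with_sorted below are all arbitrary assumptions of mine, and the timings will vary by machine and Python version.

```python
import heapq
import random
import timeit

random.seed(0)  # fixed seed so the data is the same on every run
data = [random.random() for _ in range(10_000)]
n = 10  # small N relative to the collection size

def with_heapq():
    # Heap-based selection of the n smallest items
    return heapq.nsmallest(n, data)

def with_sorted():
    # Sort everything, then slice off the first n items
    return sorted(data)[:n]

# Both approaches should return the same items in the same order
results_match = with_heapq() == with_sorted()

heapq_time = timeit.timeit(with_heapq, number=20)
sorted_time = timeit.timeit(with_sorted, number=20)

print(f"results match:   {results_match}")
print(f"heapq.nsmallest: {heapq_time:.4f}s")
print(f"sorted + slice:  {sorted_time:.4f}s")
```

Try changing n to 1 or to a value close to len(data) to see how the comparison shifts.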

raja
Published on 05-Nov-2020 11:53:15