How to correctly sort a string with a number inside in Python?
Sorting strings that contain numbers, such as ("xy1", "xy2", "xy10"), can be complex in Python. For example, if we sort the list ["xy1", "xy2", "xy10"] using the built-in sort() method, it results in ["xy1", "xy10", "xy2"]. But we expect "xy2" to come before "xy10".
This happens because Python's default sort is lexicographic: it compares strings character by character, from left to right, by Unicode code point. The first two characters of "xy10" and "xy2" are identical, and at the third position '1' sorts before '2', so "xy10" is treated as smaller even though the number 10 is greater than 2.
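The default behaviour described above can be reproduced in a few lines. This is a minimal sketch; the variable name default_sorted is only illustrative.

```python
# Default string sorting compares character by character using Unicode code points.
items = ["xy1", "xy2", "xy10"]
default_sorted = sorted(items)
print(default_sorted)        # ['xy1', 'xy10', 'xy2']

# The third character decides the order: '1' (code point 49) < '2' (code point 50).
print(ord("1"), ord("2"))    # 49 50
print("xy10" < "xy2")        # True
```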
This is where we need natural sorting, which understands that "xy2" should come before "xy10" by treating embedded numbers as integers instead of strings. To implement this, we can use regular expressions.
Using re.split() Method
The first approach uses Python's re.split() function, which splits a string at each occurrence of a specified regular expression pattern.
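We use the regex pattern '(\d+)' along with re.split() to split strings into parts, separating digits and non-digits. Because the pattern is wrapped in a capturing group, the matched digits are kept in the result instead of being discarded. We then convert the digit parts to integers so that sorting behaves numerically.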
Syntax
re.split(pattern, string)
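To see why the parentheses in the pattern matter, the sketch below splits the same string with and without a capturing group. The sample string "xy10ab3" and the variable names are assumptions chosen for illustration.

```python
import re

# With a capturing group, the matched digits are kept in the result.
with_group = re.split(r'(\d+)', "xy10ab3")
print(with_group)      # ['xy', '10', 'ab', '3', '']

# Without the group, the digits are used as separators and dropped.
without_group = re.split(r'\d+', "xy10ab3")
print(without_group)   # ['xy', 'ab', '']
```

Only the first form preserves the numbers, which is what a natural sort key needs.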
Example
Let's sort the list ["xy1", "xy10", "xy2"] using the re module:
import re

def natural_key(s):
    # Split into text and digit parts; digit parts become ints, text is lowercased
    return [int(x) if x.isdigit() else x.lower() for x in re.split(r'(\d+)', s)]

items = ["xy1", "xy10", "xy2"]
items.sort(key=natural_key)  # sorts the list in place
print(items)
['xy1', 'xy2', 'xy10']
Using sorted() Function
The Python sorted() function returns a new sorted list from an iterable object without modifying the original list. We can use the same natural sorting key function with sorted().
Syntax
sorted(iterable, key=None, reverse=False)
Example
Here we use the sorted() function to sort without changing the original list:
import re

def natural_key(s):
    # Split into text and digit parts; digit parts become ints, text is lowercased
    return [int(part) if part.isdigit() else part.lower() for part in re.split(r'(\d+)', s)]

items = ["A1", "A10", "A2"]
result = sorted(items, key=natural_key)  # returns a new list
print("Original:", items)
print("Sorted:", result)
Original: ['A1', 'A10', 'A2']
Sorted: ['A1', 'A2', 'A10']
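The reverse parameter from the syntax above combines with the same key function. The following is a small sketch of a descending natural sort; the name descending is only illustrative.

```python
import re

def natural_key(s):
    # Digit runs become ints, everything else is lowercased text
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', s)]

items = ["A1", "A10", "A2"]
descending = sorted(items, key=natural_key, reverse=True)
print(descending)   # ['A10', 'A2', 'A1']
print(items)        # ['A1', 'A10', 'A2'] (unchanged)
```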
How It Works
The natural sorting algorithm works by:
- Splitting: re.split(r'(\d+)', s) breaks the string into alternating text and number parts
- Converting: Number parts are converted to integers for proper numerical comparison
- Comparing: Python compares the resulting lists element by element
Example
Let's see how the splitting works:
import re

def show_split(s):
    parts = re.split(r'(\d+)', s)
    processed = [int(x) if x.isdigit() else x.lower() for x in parts]
    print(f"'{s}' -> {parts} -> {processed}")

strings = ["xy1", "xy2", "xy10"]
for s in strings:
    show_split(s)
'xy1' -> ['xy', '1', ''] -> ['xy', 1, '']
'xy2' -> ['xy', '2', ''] -> ['xy', 2, '']
'xy10' -> ['xy', '10', ''] -> ['xy', 10, '']
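The comparing step can be sketched on its own by comparing two processed keys directly. The names key_2, key_10 and leading_digits are assumptions used only for this illustration.

```python
import re

# Processed keys for "xy2" and "xy10"
key_2 = ['xy', 2, '']
key_10 = ['xy', 10, '']

# Lists compare element by element: 'xy' == 'xy', then 2 < 10 decides.
print(key_2 < key_10)     # True

# Plain strings compare the characters '2' and '1' instead.
print("xy2" < "xy10")     # False

# A string that starts with digits still yields a leading text part (an empty
# string), so text is compared with text and numbers with numbers.
leading_digits = re.split(r'(\d+)', "10xy")
print(leading_digits)     # ['', '10', 'xy']
```

Because the split always alternates text and number parts, keys built this way line up position by position, which keeps integers from being compared against strings.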
Comparison
| Method | Modifies Original | Best For |
|---|---|---|
| list.sort() | Yes | In-place sorting |
| sorted() | No | Keeping original intact |
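The difference summarized in the table can be checked with a short sketch. The variable names new_list and return_value are illustrative.

```python
import re

def natural_key(s):
    # Digit runs become ints, everything else is lowercased text
    return [int(x) if x.isdigit() else x.lower() for x in re.split(r'(\d+)', s)]

items = ["xy10", "xy2", "xy1"]

# sorted() returns a new list and leaves the input untouched.
new_list = sorted(items, key=natural_key)
print(new_list)       # ['xy1', 'xy2', 'xy10']
print(items)          # ['xy10', 'xy2', 'xy1']

# list.sort() reorders the list in place and returns None.
return_value = items.sort(key=natural_key)
print(return_value)   # None
print(items)          # ['xy1', 'xy2', 'xy10']
```

Since list.sort() returns None, assigning its result to a variable is a common mistake; use sorted() when a new list is needed.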
Conclusion
Use re.split(r'(\d+)', string) with a custom key function to achieve natural sorting of strings containing numbers. The sort() method modifies the original list, while sorted() returns a new sorted list.
