How to split on successions of newline characters using Python regular expressions?

Python's built-in splitlines() method and the split() method with \n as a delimiter split strings on individual newline characters, but they return empty strings wherever two or more newlines appear in a row. A regular expression can treat a whole run of newlines as a single delimiter. This article explores several ways to split strings on sequences of newline characters using Python's re module.
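To make the difference concrete, here is a minimal sketch comparing the built-in methods with re.split() on a string that contains a run of newlines. The sample text and variable names are chosen only for illustration.

```python
import re

text = "first\n\n\nsecond\nthird"

# str.split treats every single "\n" as a separator, so a run of
# newlines leaves empty strings between the real lines.
by_split = text.split("\n")
print(by_split)       # ['first', '', '', 'second', 'third']

# str.splitlines reports the blank lines as empty strings as well.
by_splitlines = text.splitlines()
print(by_splitlines)  # ['first', '', '', 'second', 'third']

# A regular expression can consume the whole run as one separator.
by_regex = re.split(r"\n+", text)
print(by_regex)       # ['first', 'second', 'third']
```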

Splitting on One or More Newlines

Python's re.split() function splits a string wherever a regular expression matches. We'll use the pattern \n+, which means one or more newlines. re.split() finds each run of newlines, splits the string there, and returns a list of the resulting pieces.

Example

The following code splits the text string into a list using one or more newline characters (\n+) as delimiters:

import re

text = "This is the first line.\nThis is the second line.\n\n\nThis is the third line."
result = re.split(r'\n+', text)
print(result)

The output of the above code is:

['This is the first line.', 'This is the second line.', 'This is the third line.']
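One caveat worth checking: when the string starts or ends with newlines, re.split() returns empty strings at the edges of the list. The sketch below shows two common ways to avoid them, assuming empty pieces are not wanted; the sample text is made up for illustration.

```python
import re

text = "\n\nfirst line\n\nsecond line\n"

# Leading and trailing newline runs produce empty strings at the edges.
raw = re.split(r"\n+", text)
print(raw)       # ['', 'first line', 'second line', '']

# Option 1: strip newlines from both ends before splitting.
stripped = re.split(r"\n+", text.strip("\n"))
print(stripped)  # ['first line', 'second line']

# Option 2: split first, then drop the empty pieces.
filtered = [part for part in re.split(r"\n+", text) if part]
print(filtered)  # ['first line', 'second line']
```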

Splitting with a Maximum Number of Splits

The re.split() function allows us to specify a maxsplit argument, which limits the number of splits performed. This can be useful when you only want to split the string a certain number of times from the beginning and leave the remaining portion.

Example

Let's extract the first two text blocks from a larger string, leaving the rest as a single combined block. Setting maxsplit=2 will achieve this:

import re

text = "This is the first line.\nThis is the second line.\nThis is the third line.\nThis is the fourth line."
result = re.split(r'\n+', text, maxsplit=2)
print(result)

The output of the above code is:

['This is the first line.', 'This is the second line.', 'This is the third line.\nThis is the fourth line.']
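Because maxsplit fixes the maximum number of pieces, it pairs naturally with tuple unpacking. The following sketch separates a heading from the rest of a message with maxsplit=1; the message text and the names header and body are hypothetical.

```python
import re

message = "Subject: hello\n\nBody line one.\nBody line two."

# maxsplit=1 splits at the first run of newlines only, so the result has
# exactly two pieces (as long as the text contains at least one newline).
header, body = re.split(r"\n+", message, maxsplit=1)

print(header)      # Subject: hello
print(repr(body))  # 'Body line one.\nBody line two.'
```

Passing maxsplit by keyword, as here, also avoids confusing it with the flags argument.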

Splitting on Any Newline Character (Cross-Platform)

Different operating systems use different newline character representations. Unix-like systems (including macOS) typically use \n, Windows uses \r\n, and older Macs used \r. To handle all these possibilities, we can use the character class [\r\n]+ in our regular expression.

Example

The following example demonstrates how to split on any newline character across different operating systems:

import re

text = "This is the first line.\r\nThis is the second line.\nThis is the third line.\rThis is the fourth line."
result = re.split(r'[\r\n]+', text)
print(result)

The output of the above code is:

['This is the first line.', 'This is the second line.', 'This is the third line.', 'This is the fourth line.']
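Note that [\r\n]+ also collapses blank lines, because it matches any run of \r and \n characters. If blank lines need to be preserved, one possible alternative is to match exactly one line break at a time with the alternation \r\n|\r|\n. This is a sketch of that variation, not a pattern taken from the example above.

```python
import re

text = "first\r\n\r\nsecond\nthird\rfourth"

# The character class swallows the whole run, so the blank line disappears.
collapsed = re.split(r"[\r\n]+", text)
print(collapsed)   # ['first', 'second', 'third', 'fourth']

# Matching one line break per split keeps the blank line as ''.
# "\r\n" is listed first so a Windows break counts as a single separator.
each_break = re.split(r"\r\n|\r|\n", text)
print(each_break)  # ['first', '', 'second', 'third', 'fourth']
```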

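Splitting on Blank Lines and the re.MULTILINE Flag

To split only where one or more blank lines separate blocks of text, use the pattern \n{2,}, which matches two or more consecutive newlines. The re.MULTILINE flag is not needed here: it only changes how the ^ and $ anchors match, and this pattern uses neither, so the result is the same with or without the flag:

import re

text = "Line 1\n\nLine 2\n\n\nLine 3"
# \n{2,} contains no ^ or $ anchors, so flags=re.MULTILINE would change nothing
result = re.split(r'\n{2,}', text)
print(result)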

The output shows splitting only on two or more consecutive newlines:

['Line 1', 'Line 2', 'Line 3']
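The \n{2,} pattern assumes Unix newlines. For text that may also contain Windows line endings, one possible variation is (?:\r?\n){2,}, which treats each \n or \r\n as one line break and splits only where two or more occur in a row. This sketch assumes bare \r line endings do not appear in the input.

```python
import re

text = "Para one, line 1\r\nPara one, line 2\r\n\r\nPara two\n\n\nPara three"

# (?:\r?\n) matches one line break in Unix or Windows form;
# {2,} requires at least two of them, i.e. at least one blank line.
paragraphs = re.split(r"(?:\r?\n){2,}", text)
print(paragraphs)
# ['Para one, line 1\r\nPara one, line 2', 'Para two', 'Para three']
```

Single line breaks inside a paragraph are left untouched, so each paragraph can be split into lines in a second step if needed.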

Comparison of Methods

Pattern        Matches                             Best For
r'\n+'         One or more Unix newlines           Unix/Linux systems
r'[\r\n]+'     Any newline combination             Cross-platform compatibility
r'\n{2,}'      Two or more consecutive newlines    Splitting paragraphs
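To compare the three patterns side by side, the sketch below runs each of them on the same short string. The input mixes a single newline, a blank line, and a Windows line ending; it is made up for illustration.

```python
import re

text = "a\nb\n\nc\r\nd"

# \n+ splits on Unix newlines only, so the \r from "\r\n" stays attached to 'c'.
unix_runs = re.split(r"\n+", text)
print(unix_runs)    # ['a', 'b', 'c\r', 'd']

# [\r\n]+ splits on any run of \r and \n characters.
any_newline = re.split(r"[\r\n]+", text)
print(any_newline)  # ['a', 'b', 'c', 'd']

# \n{2,} splits only at the blank line between 'b' and 'c'.
blank_lines = re.split(r"\n{2,}", text)
print(blank_lines)  # ['a\nb', 'c\r\nd']
```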

Conclusion

Use re.split(r'\n+', text) for basic newline splitting, the pattern r'[\r\n]+' for cross-platform compatibility, r'\n{2,}' to split on blank lines, and the maxsplit parameter to limit the number of splits. Regular expressions provide flexible control over how text is split on newline sequences.

Updated on: 2026-03-24T19:16:53+05:30
