Python - K length consecutive characters


Consecutive characters are characters that appear one after another in a string. A k-length run of consecutive characters is the same character repeated k times in a row. In this article, we will look at several ways to find such runs. We will start with a brute force approach that uses a loop, then solve the same problem with regular expressions and with the sliding window technique. A sliding window is a common way to structure this kind of search, and the NumPy library offers a built-in function for generating the windows.
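To make the definition concrete, the following sketch lists every run in a string together with its length, so the runs that contain at least k identical characters are easy to spot. It uses itertools.groupby from the standard library, which is not one of the approaches covered in this article, and the helper name list_runs is hypothetical.

```python
from itertools import groupby

def list_runs(string):
    # groupby yields one group per maximal block of equal adjacent characters
    return [(char, len(list(group))) for char, group in groupby(string)]

runs = list_runs("aaabcedfffghikkk")
print(runs)
# Characters whose run is at least 3 long
print([char for char, length in runs if length >= 3])  # ['a', 'f', 'k']
```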

Using the Brute Force Method

Brute force is the most direct algorithm we can think of, written without much concern for optimization. For our problem, we can use the following approach:

  • Initialize an empty list.

  • Iterate over the string. Since each window needs k characters, there are only n-k+1 starting positions to check, where n is the length of the string.

  • In each iteration, take the substring made of the next k characters.

  • Make a set out of the substring and find its length. If the length is 1, all the characters in the substring are the same, so append the substring to the list.

Example

In the following example, we define a function named find_consecutive_characters. It takes the string and the value k as arguments. We first create an empty list named result and then iterate n-k+1 times. In each iteration, we use string slicing to get a k-length substring. We pass the substring to the set function to get its unique characters, and use the len function to check whether the set contains exactly one element. If so, we append the substring to the list.

def find_consecutive_characters(string, k):
    n = len(string)
    result = []
    
    for i in range(n - k + 1):
        substring = string[i:i+k]
        if len(set(substring)) == 1:
            result.append(substring)
    
    return result

test_string="aaabcedfffghikkk"
k=3
print(f"Consecutive characters with length {k} are: {find_consecutive_characters(test_string, k)}")

Output

Consecutive characters with length 3 are: ['aaa', 'fff', 'kkk']
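Note that the brute force approach checks every starting position, so a run longer than k is reported once per window that fits inside it. The sketch below makes this visible by returning the start index alongside each substring. The function name find_consecutive_positions and the test string are hypothetical, chosen only for illustration.

```python
def find_consecutive_positions(string, k):
    # Same loop as the brute force approach, but each result
    # also records the index where the window starts.
    result = []
    for i in range(len(string) - k + 1):
        substring = string[i:i + k]
        if len(set(substring)) == 1:
            result.append((i, substring))
    return result

# The run "aaaa" contains two overlapping windows of length 3.
print(find_consecutive_positions("aaaab", 3))  # [(0, 'aaa'), (1, 'aaa')]
```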

Using re Library

The re library in Python supports regular expressions. Regular expressions, often abbreviated as regex or regexp, are patterns used to match and manipulate text based on specific rules. The library allows us to search for patterns in a string, extract specific parts of a string, replace parts of a string, and so on. Although we can build similar logic ourselves, the re library lets us express the search compactly as a single pattern.

Example

In the following code, after importing the re library, we create the function find_consecutive_characters, which takes the string and the length k. Next, we build the pattern and store it in the pattern variable. In the pattern, (.) captures any single character except a newline, \2{k-1} requires that same character to repeat k-1 more times, and the outer group captures the whole k-length run. We use the findall function of the re library to find all substrings that match the pattern. Because the pattern has two groups, findall returns a list of tuples: the first element of each tuple is the k-length run and the second is the single repeated character. We use a list comprehension to keep the first element of each tuple and return the result. Note that findall returns non-overlapping matches, so a run longer than k is reported fewer times than with the brute force approach. For example, "aaaa" with k=3 yields a single match. Finally, we call the function with a test string and a value of k and print the result.

import re

def find_consecutive_characters(string, k):
    pattern = r"((.)\2{%d})" % (k - 1)
    result = re.findall(pattern, string)
    result = [match[0] for match in result]
    
    return result

test_string="abdffghttpplihdf"
k=2
print(f"Consecutive characters with length {k} are: {find_consecutive_characters(test_string, k)}")

Output

Consecutive characters with length 2 are: ['ff', 'tt', 'pp']
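Because findall returns non-overlapping matches, the pattern above does not report every window inside a run that is longer than k. One possible way to get the same overlapping results as the brute force approach is to wrap the pattern in a lookahead, as in the following sketch. The function name find_overlapping_runs is hypothetical, the sketch assumes k is at least 1, and the re.DOTALL flag is an added choice so that newline characters are also matched by the dot.

```python
import re

def find_overlapping_runs(string, k):
    # (?=...) is a zero-width lookahead: it checks for a k-length run
    # without consuming it, so the scan moves forward one character
    # at a time and matches may overlap.
    pattern = r"(?=((.)\2{%d}))" % (k - 1)
    matches = re.findall(pattern, string, flags=re.DOTALL)
    # Each match is a tuple (k-length run, repeated character)
    return [match[0] for match in matches]

print(find_overlapping_runs("aaaa", 3))              # ['aaa', 'aaa']
print(find_overlapping_runs("abdffghttpplihdf", 2))  # ['ff', 'tt', 'pp']
```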

Using Sliding Window

The sliding window is a popular programming technique for searching for patterns in a sequence such as a string, an array, or a list. The idea is to move a window of fixed length through the data one position at a time and examine the contents of the window at each step. The technique is particularly useful when dealing with subarrays and substrings. For our problem, we want to find the substrings of length k in which all the characters are the same, so a sliding window is a natural fit.

Example

In the following code, we create the find_consecutive_characters function, which returns the list of k-length consecutive characters. Inside the function, we first define an empty list named result. Next, we take the first k characters as the initial window and convert it into a set. If the length of the set is one, we append the window to the list. We then slide the window one character at a time, dropping its first character and adding the next character of the string, and repeat the same check for each new window. Finally, we return the list.

def find_consecutive_characters(string, k):
    n = len(string)
    result = []
    # No window of length k exists when k is out of range
    if k < 1 or k > n:
        return result
    # Check the first window
    window = string[:k]
    if len(set(window)) == 1:
        result.append(window)
    # Slide one character at a time: drop the leftmost, append the next
    for i in range(k, n):
        window = window[1:] + string[i]
        if len(set(window)) == 1:
            result.append(window)
    return result

test_string="xxxxangduuuu"
k=4
print(f"Consecutive characters with length {k} are: {find_consecutive_characters(test_string, k)}")

Output

Consecutive characters with length 4 are: ['xxxx', 'uuuu']
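The window-based versions rebuild a set of k characters at every position. As a possible refinement, the sketch below tracks only the length of the current run, so each character is examined once. This is a variation on the idea rather than the method shown above, and the function name find_consecutive_characters_linear is hypothetical.

```python
def find_consecutive_characters_linear(string, k):
    result = []
    if k < 1:
        return result
    run_length = 0  # length of the run of equal characters ending at index i
    for i, char in enumerate(string):
        if i > 0 and char == string[i - 1]:
            run_length += 1
        else:
            run_length = 1
        # A run of at least k characters means the k-length window
        # ending at index i consists of a single repeated character.
        if run_length >= k:
            result.append(char * k)
    return result

print(find_consecutive_characters_linear("xxxxangduuuu", 4))  # ['xxxx', 'uuuu']
```

Like the brute force approach, this reports overlapping windows, so "aaaa" with k=3 yields 'aaa' twice.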

Using The Numpy Library

NumPy is a popular Python library for numerical computation. It stores data in arrays that are implemented efficiently, which makes it a common choice for numeric work. NumPy also has a function, sliding_window_view (available since NumPy 1.20), that produces sliding windows over an array. Hence we can use it to generate the k-length windows and keep those made of a single character.

Example

In the following code, we import NumPy and create the function find_consecutive_characters, which takes the string and the length k as parameters. Inside the function, we encode the string to bytes and use the frombuffer function to view those bytes as a NumPy array of uint8 values. We then use sliding_window_view to produce every k-length window over the array. Finally, a list comprehension keeps only the windows in which the count of unique values is one and decodes each of them back into a string. Because the array holds UTF-8 bytes rather than characters, this approach assumes ASCII input, where each character is exactly one byte.

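import numpy as np

def find_consecutive_characters(string, k):
    # sliding_window_view raises ValueError if the window is longer than the array
    if k < 1 or k > len(string):
        return []
    # View the encoded bytes as uint8 values (assumes ASCII input)
    arr = np.frombuffer(string.encode(), dtype=np.uint8)
    windows = np.lib.stride_tricks.sliding_window_view(arr, k)
    result = [window.tobytes().decode() for window in windows if np.unique(window).size == 1]
    return result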

test_string="xxxxangduuuu"
k=4
print(f"Consecutive characters with length {k} are: {find_consecutive_characters(test_string, k)}")

Output

Consecutive characters with length 4 are: ['xxxx', 'uuuu']

Conclusion

In this article, we saw how to find k-length consecutive characters in a string. We can write the logic ourselves, or use libraries that do part of the work for us. We first saw the brute force approach, which is simple to understand but not the most efficient. The re library lets us express the same search as a single pattern. We also used the sliding window approach, a popular programming technique, both by hand and through NumPy.

Updated on: 18-Jul-2023
