Run Length Encoding in Python

Run-length encoding compresses a string by grouping consecutive identical characters and representing them as character + count. For example, "aaabbc" becomes "a3b2c1".

A related but different technique counts the total occurrences of each character across the whole string rather than the length of each consecutive run. Let's explore both approaches.

Character Frequency Encoding

This approach counts total occurrences of each character:

import collections

def character_frequency_encoding(string):
    # Initialize ordered dictionary to maintain character order
    count_dict = collections.OrderedDict.fromkeys(string, 0)
    
    # Count occurrences of each character
    for char in string:
        count_dict[char] += 1
    
    # Build encoded string
    encoded_string = ""
    for key, value in count_dict.items():
        encoded_string += key + str(value)
    
    return encoded_string

# Test examples
string1 = "tutorialspoint"
result1 = character_frequency_encoding(string1)
print(f"'{string1}' -> '{result1}'")

string2 = "aaaaaabbbbbccccccczzzzzz"
result2 = character_frequency_encoding(string2)
print(f"'{string2}' -> '{result2}'")

Output:

'tutorialspoint' -> 't3u1o2r1i2a1l1s1p1n1'
'aaaaaabbbbbccccccczzzzzz' -> 'a6b5c7z6'
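
The same counting can also be sketched with collections.Counter, which tallies the characters in one call. This is a minimal alternative, not the version used above: the function name character_frequency_encoding_counter is illustrative, and the sketch assumes Python 3.7 or later, where Counter (a dict subclass) keeps keys in first-seen order.

```python
from collections import Counter

def character_frequency_encoding_counter(text):
    # Counter tallies every character; on Python 3.7+ its keys
    # stay in the order in which each character first appeared
    counts = Counter(text)
    # Join "character + count" pairs into the encoded string
    return "".join(f"{char}{count}" for char, count in counts.items())

print(character_frequency_encoding_counter("tutorialspoint"))  # t3u1o2r1i2a1l1s1p1n1
```

Building the result with str.join avoids repeated string concatenation inside a loop.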

True Run-Length Encoding

Traditional run-length encoding groups consecutive identical characters:

def run_length_encoding(string):
    if not string:
        return ""
    
    encoded = ""
    current_char = string[0]
    count = 1
    
    for i in range(1, len(string)):
        if string[i] == current_char:
            count += 1
        else:
            encoded += current_char + str(count)
            current_char = string[i]
            count = 1
    
    # Add the last group
    encoded += current_char + str(count)
    return encoded

# Test examples
test1 = "aaabbc"
result1 = run_length_encoding(test1)
print(f"'{test1}' -> '{result1}'")

test2 = "aabbbaabbcc"
result2 = run_length_encoding(test2)
print(f"'{test2}' -> '{result2}'")

Output:

'aaabbc' -> 'a3b2c1'
'aabbbaabbcc' -> 'a2b3a2b2c2'
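
As an alternative to the explicit loop, the same encoding can be sketched with itertools.groupby from the standard library, which yields one group per run of equal consecutive items. The names run_length_encoding_groupby and run_length_decoding are illustrative, and the decoder is an addition for checking the round trip. It assumes the original text contains no digit characters, since digits would be ambiguous with the counts.

```python
import re
from itertools import groupby

def run_length_encoding_groupby(text):
    # groupby yields (character, iterator over that run) for each consecutive run
    return "".join(f"{char}{sum(1 for _ in run)}" for char, run in groupby(text))

def run_length_decoding(encoded):
    # Assumption: the original text had no digits, so each token is
    # one non-digit character followed by its decimal count
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)

encoded = run_length_encoding_groupby("aabbbaabbcc")
print(encoded)                       # a2b3a2b2c2
print(run_length_decoding(encoded))  # aabbbaabbcc
```

Because each run is stored in order, decoding restores the input exactly. A string without repeated neighbours grows under this scheme: "abc" encodes to "a1b1c1".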

Comparison

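Method               Input      Output     Use Case
Character Frequency  "aabbcc"   "a2b2c2"   Character counting
Run-Length Encoding  "aabbcc"   "a2b2c2"   Data compression

For "aabbcc" the two outputs coincide because every character occurs in a single run. They differ as soon as a character appears in more than one separate run.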
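
To see where the two methods diverge, the sketch below runs both on "aabbbaabbcc", an input in which "a" and "b" each occur in two separate runs. The helper names frequency_encode and run_length_encode are compact stand-ins written for this illustration, not the functions defined earlier.

```python
from collections import Counter
from itertools import groupby

def frequency_encode(text):
    # Total count per character, in first-seen order (Python 3.7+)
    counts = Counter(text)
    return "".join(f"{char}{count}" for char, count in counts.items())

def run_length_encode(text):
    # One "character + count" pair per consecutive run
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))

sample = "aabbbaabbcc"
print(frequency_encode(sample))   # a4b5c2
print(run_length_encode(sample))  # a2b3a2b2c2
```

The frequency output merges the separate runs of "a" and "b" into single totals, while the run-length output keeps each run in its original position.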

Conclusion

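Character frequency encoding counts the total occurrences of each character (here using OrderedDict) and discards position information, so the original string cannot in general be recovered from it. True run-length encoding records each run of consecutive identical characters in order, which makes it reversible and suited to data compression, particularly for inputs with long runs.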

Updated on: 2026-03-25T08:05:48+05:30
