Program to find minimum length of lossy Run-Length Encoding in Python


Suppose we have a lowercase string s and another value k. Now consider an operation where we perform a run-length encoding on a string by putting repeated successive characters as a count and character. So if the string is like "aaabbc" would be encoded as "3a2bc". Here we do not put "1c" for "c" since it only appears once successively. So we can first remove any k consecutive characters in s, then find the minimum length possible of the resulting run-length encoding.

So, if the input is like s = "xxxxxyyxxxxxzzxxx", k = 2, then the output will be 6, as the two obvious choices are to remove the "yy"s or the "zz"s. If we remove the "yy"s, then we will get "10x2z3x" which has length of 7. If we remove the "zz"s, then will get "5x2y8x" which has length of 6, this is the smallest.

To solve this, we will follow these steps −

  • Define a function calc_cost() . This will take l

  • if l is same as 0, then

    • return 0

  • if l is same as 1, then

    • return 1

  • otherwise,

    • return size of str(l) + 1

  • Define a function prefix() . This will take s

    • pre := a list initially with pair [0, 0]

    • last := null

    • for each c in s, do

      • if c is same as last, then

        • insert a pair (0th element of last item of pre, 1 + 1st element of last item of pre) into pre

      • otherwise,

        • insert (0th element of last item of pre) + calc_cost(1st element of last item of pre, 1) into pre

      • last := c

    • return pre

  • From the main method do the following:

  • pre := prefix(s)

  • suf := reverse of prefix(s in reverse order)

  • ans := infinity

  • for i in range 0 to size of s - k + 1, do

    • j := i + k

    • pair (left, midl) := pre[i]

    • pair (right, midr) := suf[j]

    • cost := left + right

    • c1 := s[i - 1] if i > 0 otherwise null

    • c2 := s[j] if j < size of s otherwise null

    • if c1 is same as c2, then

      • cost := cost + calc_cost(midl + midr)

    • otherwise,

      • cost := cost + calc_cost(midl) + calc_cost(midr)

    • ans := minimum of ans and cost

  • return ans

Let us see the following implementation to get better understanding −

Example

 Live Demo

def calc_cost(l):
   if l == 0:
      return 0
   if l == 1:
      return 1
   else:
      return len(str(l)) + 1
class Solution:
   def solve(self, s, k):
      def prefix(s):
         pre = [[0, 0]]
         last = None
         for c in s:
            if c == last:
               pre.append([pre[-1][0], pre[-1][1] + 1])
            else:
               pre.append([pre[-1][0] + calc_cost(pre[-1][1]),1])
            last = c
         return pre
      pre = prefix(s)
      suf = prefix(s[::-1])[::-1]
      ans = float("inf")
      for i in range(len(s) - k + 1):
         j = i + k
         left, midl = pre[i]
         right, midr = suf[j]
         cost = left + right
         c1 = s[i - 1] if i > 0 else None
         c2 = s[j] if j < len(s) else None
         if c1 == c2:
            cost += calc_cost(midl + midr)
         else:
            cost += calc_cost(midl) + calc_cost(midr)
         ans = min(ans, cost)
         return ans
ob = Solution()
s = "xxxxxyyxxxxxzzxxx"
print(ob.solve(s, 2))

Input

s = "xxxxxyyxxxxxzzxxx"

Output

6

Updated on: 10-Oct-2020

161 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements