Program to perform a letter frequency attack on a monoalphabetic substitution cipher


Given a string Str of length K that holds the ciphertext of a monoalphabetic cipher, the task is to display the five most probable plaintexts that a letter frequency attack can recover from it.

Let us first see what a frequency attack is.

Frequency analysis rests on the fact that individual letters and letter combinations occur with different frequencies in any sizeable stretch of written language. Moreover, practically every sample of that language shows the same general pattern in its distribution of letters. To make this concrete:

The English alphabet has 26 letters, but they are not used equally often in written English. If you examine the letters in a book or a newspaper, you will notice that E, T, A, and O appear very frequently, while J, X, Q, and Z are rare. This fact can be used to decrypt messages encrypted with a monoalphabetic substitution cipher. The term "frequency analysis" refers to this method.

In a simple substitution cipher, each letter of the plaintext is replaced with a different letter, and any given plaintext letter is always replaced with the same ciphertext letter. For instance, if every instance of the letter a is converted to the letter X, a ciphertext message with many repetitions of the letter X would suggest to the cryptanalyst that X stands in for a.
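To illustrate the fixed letter-for-letter mapping, here is a minimal C sketch of such a cipher. The function name substitute and the key layout (key[0] is the ciphertext letter for A, key[1] for B, and so on) are assumptions made for this example and do not appear in the program later in the article. The sketch only transforms uppercase A-Z.

```c
#include <string.h>   /* size_t */

/* Hypothetical helper: encrypt `plain` with a 26-letter substitution key.
 * key[0] is the ciphertext letter for 'A', key[1] for 'B', and so on.
 * Characters outside A-Z (such as spaces) are copied unchanged.
 * `out` must have room for strlen(plain) + 1 bytes. */
void substitute(const char *plain, const char *key, char *out)
{
    size_t i;
    for (i = 0; plain[i] != '\0'; i++) {
        char c = plain[i];
        out[i] = (c >= 'A' && c <= 'Z') ? key[c - 'A'] : c;
    }
    out[i] = '\0';
}
```

With the illustrative key "ZABCDEFGHIJKLMNOPQRSTUVWXY", which replaces every letter by its predecessor, substitute("THIS IS THE CODE", key, out) writes "SGHR HR SGD BNCD" to out. Every E becomes D, which is exactly the regularity a frequency attack exploits.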

Sample Example 1

Let T be the string formed by concatenating the English letters in decreasing order of their frequency in English text.

String T = "ETAOINSHRDLCUMWFGYPBVKJXQZ"
Given string Str = "SGHR HR SGD BNCD"
Output:
THIS IS THE CODE
FTUE UE FTQ OAPQ
LZAK AK LZW UGVW
PDEO EO PDA YKZA
IWXH XH IWT RDST

Problem Statement

Implement a program to perform a letter frequency attack on a monoalphabetic substitution cipher.

Solution Approach

To perform a letter frequency attack on a monoalphabetic substitution cipher, we apply frequency analysis.

Frequency analysis is a widely known technique for breaking ciphertext. It is based on studying how often individual letters or groups of letters appear in the ciphertext, and it works because every language uses its letters at different rates.

For example, take the word "APPLE". The frequency of the letter "A" is 1 since it occurs only once; similarly, the frequencies of "L" and "E" are also 1. The frequency of the letter "P" is 2 since it occurs twice.

That's how we find the frequencies of the letters.
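The same counting can be sketched in C as follows. The function name count_letters is an assumption chosen for this example, and the sketch only counts uppercase A-Z.

```c
#include <string.h>   /* memset, size_t */

/* Count how often each letter A-Z occurs in `text`.
 * counts[0] receives the count for 'A', counts[25] the count for 'Z'.
 * Anything outside A-Z is ignored. */
void count_letters(const char *text, int counts[26])
{
    memset(counts, 0, 26 * sizeof counts[0]);
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] >= 'A' && text[i] <= 'Z') {
            counts[text[i] - 'A']++;
        }
    }
}
```

After count_letters("APPLE", counts), counts['P' - 'A'] is 2, while the entries for A, L, and E are 1 and every other entry is 0.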

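Now consider how often each of the 26 letters appears in typical English text. Ranked from highest to lowest frequency, the most common letter is E, followed by T, then A, and so on:

"ETAOINSHRDLCUMWFGYPBVKJXQZ" is the complete list of letters in decreasing order of frequency.

The program in this article treats the cipher as a fixed shift of the alphabet. For each of the five most frequent ciphertext letters in turn, it assumes that the letter stands for the English letter of the same rank in this list (E, then T, A, O, and I), computes the shift this implies, and applies that shift to the whole message.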
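A minimal sketch of how this ordering can be turned into a candidate key, assuming the cipher is a plain shift of the alphabet, as the program in this article does. The names FREQ_ORDER, candidate_shift, and apply_shift are hypothetical and chosen for this example.

```c
#include <string.h>   /* size_t */

/* English letters from most to least frequent, as listed above. */
static const char FREQ_ORDER[] = "ETAOINSHRDLCUMWFGYPBVKJXQZ";

/* Shift (0-25) that maps `cipher_letter` (A-Z) onto the English letter
 * of the given frequency rank (0 = 'E', 1 = 'T', ...). */
int candidate_shift(char cipher_letter, int rank)
{
    int diff = FREQ_ORDER[rank] - cipher_letter;
    return ((diff % 26) + 26) % 26;   /* normalise negative differences */
}

/* Apply `shift` (0-25) to every letter A-Z of `cipher`; other
 * characters are copied unchanged. `out` needs strlen(cipher) + 1 bytes. */
void apply_shift(const char *cipher, int shift, char *out)
{
    size_t i;
    for (i = 0; cipher[i] != '\0'; i++) {
        char c = cipher[i];
        out[i] = (c >= 'A' && c <= 'Z')
                     ? (char)('A' + (c - 'A' + shift) % 26)
                     : c;
    }
    out[i] = '\0';
}
```

If D is taken as the most frequent ciphertext letter, candidate_shift('D', 0) returns 1, and apply_shift("SGHR HR SGD BNCD", 1, out) yields "THIS IS THE CODE".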

Algorithm

The algorithm to perform a letter frequency attack on a monoalphabetic substitution cipher is given below:

  • Step 1 − Start

  • Step 2 − Define a function that decrypts the ciphertext using a frequency attack

  • Step 3 − Create storage for the five candidate plaintexts

  • Step 4 − Create an array to store the frequency of each letter in the ciphertext

  • Step 5 − Traverse the string Str, count each letter, and sort a copy of the counts in descending order

  • Step 6 − Iterate i over the range [0, 5)

  • Step 7 − Iterate over the range [0, 26) to find an unused ciphertext letter whose count equals the ith largest count, and compute the shift x that maps it to the ith letter of T

  • Step 8 − Define a temporary string "cur" to build one plaintext at a time

  • Step 9 − Create the ith plaintext using the calculated shift, visiting each position k of Str

  • Step 10 − Shift the kth letter of the ciphertext by x, wrapping around the alphabet; copy spaces unchanged

  • Step 11 − Write the shifted letter to position k of the temporary string cur

  • Step 12 − Print the five generated candidate plaintexts

  • Step 13 − Stop
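The ranking part of the algorithm, finding the most frequent ciphertext letters, can also be sketched on its own. The version below is a hedged illustration: it selects the largest remaining count directly instead of sorting a copy of the counts and matching against it, and the name top_letters is an assumption made for this example. Ties go to the letter that comes first in the alphabet.

```c
/* Fill out[0..n-1] with the indices (0 = 'A') of the n most frequent
 * letters in `counts`, most frequent first. Ties go to the letter that
 * comes first in the alphabet. Assumes 0 <= n <= 26. */
void top_letters(const int counts[26], int n, int out[])
{
    int used[26] = { 0 };
    for (int r = 0; r < n; r++) {
        int best = -1;
        for (int m = 0; m < 26; m++) {
            /* Strict '>' keeps the earliest letter when counts are equal */
            if (!used[m] && (best == -1 || counts[m] > counts[best])) {
                best = m;
            }
        }
        used[best] = 1;
        out[r] = best;
    }
}
```

In "SGHR HR SGD BNCD" the letters D, G, H, R, and S each occur twice, so with alphabetical tie-breaking the ranking is D, G, H, R, S. That ordering decides which candidate plaintext is produced first.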

Example: C Program

Here is a C implementation of the above algorithm. It expects the ciphertext to consist of uppercase letters A-Z and spaces.

#include <stdio.h>
#include <string.h>

// Decrypt a shift-based monoalphabetic cipher by frequency analysis and
// print up to five candidate plaintexts. Str is expected to hold
// uppercase letters A-Z and spaces; other characters are copied unchanged.
void printTheString(char Str[], int K){

   // Stores the (up to) five candidate plaintexts
   char ptext[5][K+1];
   int found = 0;

   // Frequency of every letter in the ciphertext
   int fre[26] = { 0 };

   // The same frequencies, sorted in descending order
   int freSorted[26];

   // Marks the ciphertext letters that have already been matched
   int Used[26] = { 0 };

   // Traverse Str and count its letters, skipping anything that is not A-Z
   for (int i = 0; i < K; i++) {
      if (Str[i] >= 'A' && Str[i] <= 'Z') {
         fre[Str[i] - 'A']++;
      }
   }

   // Copy the frequency array
   for (int i = 0; i < 26; i++) {
      freSorted[i] = fre[i];
   }

   // T holds the English letters in decreasing order of frequency
   char T[] = "ETAOINSHRDLCUMWFGYPBVKJXQZ";

   // Sort the copied frequencies in descending order
   for (int i = 0; i < 26; i++) {
      for (int j = i + 1; j < 26; j++) {
         if (freSorted[j] > freSorted[i]) {
            int temp = freSorted[i];
            freSorted[i] = freSorted[j];
            freSorted[j] = temp;
         }
      }
   }

   // Build one candidate per iteration, for i in [0, 5)
   for (int i = 0; i < 5; i++) {
      int ch = -1;

      // Find the first unused letter whose count is the ith largest
      for (int m = 0; m < 26; m++) {
         if (freSorted[i] == fre[m] && Used[m] == 0) {
            Used[m] = 1;
            ch = m;
            break;
         }
      }
      if (ch == -1)
         break;

      // Shift that maps ciphertext letter ch to the ith most
      // frequent English letter, T[i]
      int x = (T[i] - 'A') - ch;

      // Temporary string cur holds the plaintext being built
      char cur[K+1];

      // Generate the ith plaintext using the calculated shift
      for (int k = 0; k < K; k++) {

         // Spaces and other non-letters are copied without change
         if (Str[k] < 'A' || Str[k] > 'Z') {
            cur[k] = Str[k];
            continue;
         }

         // Shift the kth cipher letter by x, wrapping around the alphabet
         int y = (Str[k] - 'A' + x + 26) % 26;

         // Write the shifted letter to position k of cur
         cur[k] = 'A' + y;
      }
      cur[K] = '\0';

      // Store the ith candidate plaintext
      strcpy(ptext[found++], cur);
   }

   // Print the candidate plaintexts
   for (int i = 0; i < found; i++) {
      printf("%s\n", ptext[i]);
   }
}
int main(){
   char Str[] = "SGHR HR SGD BNCD";
   int K = (int)strlen(Str);
   printTheString(Str, K);
   return 0;
}

Output

THIS IS THE CODE
FTUE UE FTQ OAPQ
LZAK AK LZW UGVW
PDEO EO PDA YKZA
IWXH XH IWT RDST

Conclusion

In this article, we looked at how a letter frequency attack works and implemented one in C. The program counts the letters of the ciphertext, ranks them by frequency, aligns the most frequent ones with E, T, A, O, and I in turn, and prints the five resulting candidate plaintexts.

Both the algorithm and the C code for the attack are provided above.

Updated on: 28-Jul-2023
