
Repeated DNA Sequences in C++
Suppose we have a DNA sequence. All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within it.
We have to write a method that finds all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, if the input is “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”, then the output will be ["AAAAACCCCC", "CCCCCAAAAA"].
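Before looking at the bit-mask approach, it can help to have a direct baseline to compare against. The following is a minimal sketch, not the method this article goes on to describe: it stores every 10-letter window as a string in a hash map and reports a window the second time it is seen. The function name repeatedWindowsBaseline is a hypothetical name chosen for this illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Baseline sketch (hypothetical helper, not the article's method):
// count every 10-letter window and report each one the second time
// it appears, so results come out in order of first repetition.
std::vector<std::string> repeatedWindowsBaseline(const std::string& s) {
    const std::size_t kLen = 10;
    std::vector<std::string> result;
    if (s.size() < kLen) return result;  // no complete window exists

    std::unordered_map<std::string, int> seen;
    for (std::size_t i = 0; i + kLen <= s.size(); ++i) {
        std::string window = s.substr(i, kLen);
        // Push only on the second occurrence to avoid duplicates.
        if (++seen[window] == 2) result.push_back(window);
    }
    return result;
}
```

This version keeps a full 10-character string per window, which is the cost the 2-bit encoding below is meant to reduce.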
To solve this, we will follow these steps −
Define an array ret, let n := size of s, and create two sets called visited and visited2.
Define a map called bitVal.
Store the values 0, 1, 2, 3 for A, C, G, T into bitVal.
mask := 0
for i in range 0 to n - 1
   mask := mask * 4
   mask := mask OR bitVal[s[i]]
   mask := mask AND 0xFFFFF (keeps the lowest 20 bits, that is, the last 10 letters at 2 bits each)
   if i < 9, then continue to the next iteration
   if mask is in visited and not in visited2, then
      insert the substring of length 10 starting at index i - 9 into ret
      insert mask into visited2
   insert mask into visited
return ret
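The mask update in the steps above can be looked at in isolation. The sketch below shows only that rolling step, assuming the A=0, C=1, G=2, T=3 mapping from the steps; nucleotideCode and pushNucleotide are hypothetical helper names introduced for illustration, and treating any other character as 0 is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string>

// 2-bit code for a nucleotide: A=0, C=1, G=2, T=3.
// Assumption for this sketch: any other character is treated like 'A'.
std::uint32_t nucleotideCode(char c) {
    switch (c) {
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return 0;
    }
}

// Append one nucleotide to the rolling window encoding.
// Shifting left by 2 is the same as multiplying by 4; the AND with
// 0xFFFFF drops everything above 20 bits, i.e. the letter that just
// fell out of the 10-letter window.
std::uint32_t pushNucleotide(std::uint32_t mask, char c) {
    return ((mask << 2) | nucleotideCode(c)) & 0xFFFFFu;
}
```

Because each window maps to a distinct 20-bit value, two windows are equal exactly when their masks are equal, which is why the sets can store integers instead of strings.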
Example (C++)
Let us see the following implementation to get a better understanding −
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

// Prints the vector as [a, b, ] (every element is followed by ", ").
template <typename T>
void print_vector(const vector<T>& v) {
   cout << "[";
   for (size_t i = 0; i < v.size(); i++) {
      cout << v[i] << ", ";
   }
   cout << "]" << endl;
}

class Solution {
public:
   vector<string> findRepeatedDnaSequences(string s) {
      vector<string> ret;
      int n = s.size();
      set<int> visited;   // masks seen at least once
      set<int> visited2;  // masks already added to ret
      map<char, int> bitVal;
      bitVal['A'] = 0;
      bitVal['C'] = 1;
      bitVal['G'] = 2;
      bitVal['T'] = 3;
      int mask = 0;
      for (int i = 0; i < n; i++) {
         mask <<= 2;              // make room for the next 2-bit code
         mask |= bitVal[s[i]];
         mask &= 0xfffff;         // keep only the last 10 letters (20 bits)
         if (i < 9) continue;     // the first full window ends at index 9
         if (visited.count(mask) && !visited2.count(mask)) {
            ret.push_back(s.substr(i - 9, 10));
            visited2.insert(mask);
         }
         visited.insert(mask);
      }
      return ret;
   }
};

int main() {
   Solution ob;
   print_vector(ob.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
   return 0;
}
Input
"AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output
[AAAAACCCCC, CCCCCAAAAA, ]
