DNA Pattern Recognition - Problem

Biologists are studying basic patterns in DNA sequences stored in a database. Given a table Samples containing DNA sequences, you need to identify which samples contain specific genetic patterns.

Pattern Requirements:

  • Start Codon: Sequences that start with ATG (a common start codon)
  • Stop Codons: Sequences that end with either TAA, TAG, or TGA (stop codons)
  • ATAT Motif: Sequences containing the motif ATAT (a simple repeated pattern)
  • Triple G: Sequences that contain at least 3 consecutive G characters (e.g., GGG or GGGG)
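The four rules come down to plain string predicates: two anchored checks (prefix and suffix) and two substring checks. The Python sketch below pins down what each flag means. The helper name `pattern_flags` is hypothetical and not part of the problem, and it assumes uppercase input, which the constraints guarantee. "At least 3 consecutive G" is equivalent to containing the substring GGG, because any longer run of G's contains it.

```python
# Hypothetical reference helper: spells out what each flag means.
# Assumes seq is an uppercase string over A, C, G, T (as the constraints state).
STOP_CODONS = ("TAA", "TAG", "TGA")


def pattern_flags(seq: str) -> dict:
    """Return the four 1/0 pattern flags for one DNA sequence."""
    return {
        "has_start": int(seq.startswith("ATG")),     # prefix only, not anywhere
        "has_stop": int(seq.endswith(STOP_CODONS)),  # endswith accepts a tuple
        "has_atat": int("ATAT" in seq),              # substring anywhere
        "has_ggg": int("GGG" in seq),                # any run of 3+ G contains GGG
    }


if __name__ == "__main__":
    for s in ("ATGCTAGCTAGCTAA", "GGGTCAATCATC", "CGTATGCGTCGTA"):
        print(s, pattern_flags(s))
```

For CGTATGCGTCGTA this gives has_start = 0 even though ATG occurs inside the sequence, because only the prefix counts.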

Return a result table containing sample_id, dna_sequence, species, and four flag columns (has_start, has_stop, has_atat, has_ggg), each set to 1 if the corresponding pattern is present and 0 otherwise, ordered by sample_id in ascending order.

Table Schema

Samples
Column Name   Type      Description
sample_id     int (PK)  Unique identifier for each DNA sample
dna_sequence  varchar   DNA sequence represented as a string of A, T, G, C characters
species       varchar   Species from which the DNA sample was collected
Primary Key: sample_id
Note: Each row contains a DNA sequence and its corresponding species information

Input & Output

Example 1 — Multiple Pattern Detection
Input Table:
sample_id  dna_sequence     species
1          ATGCTAGCTAGCTAA  Human
2          GGGTCAATCATC     Human
3          ATATATCGTAGCTA   Human
4          ATGGGGTCATCATAA  Mouse
Output:
sample_id  dna_sequence     species  has_start  has_stop  has_atat  has_ggg
1          ATGCTAGCTAGCTAA  Human    1          1         0         0
2          GGGTCAATCATC     Human    0          0         0         1
3          ATATATCGTAGCTA   Human    0          0         1         0
4          ATGGGGTCATCATAA  Mouse    1          1         0         1
💡 Note:

Sample 1: Starts with ATG (has_start=1), ends with TAA (has_stop=1), no ATAT motif, no triple G.

Sample 2: Starts with GGG (has_ggg=1), but no other patterns match.

Sample 3: Contains ATAT at the beginning (has_atat=1), but no other patterns.

Sample 4: Has all patterns except ATAT: starts with ATG, contains GGGG, ends with TAA.
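One way to produce this output in SQL is a CASE expression per flag over LIKE patterns. The sketch below runs such a query against an in-memory SQLite database loaded with the Example 1 rows, using Python's standard `sqlite3` module. SQLite is an assumption here (the problem does not name a SQL dialect), and `run` is a hypothetical helper name.

```python
import sqlite3

# Example 1 rows from the problem statement.
ROWS = [
    (1, "ATGCTAGCTAGCTAA", "Human"),
    (2, "GGGTCAATCATC", "Human"),
    (3, "ATATATCGTAGCTA", "Human"),
    (4, "ATGGGGTCATCATAA", "Mouse"),
]

# One CASE per flag: 'ATG%' is a prefix match, '%TAA' a suffix match,
# and '%ATAT%' / '%GGG%' match the substring anywhere.
QUERY = """
SELECT sample_id,
       dna_sequence,
       species,
       CASE WHEN dna_sequence LIKE 'ATG%' THEN 1 ELSE 0 END AS has_start,
       CASE WHEN dna_sequence LIKE '%TAA'
              OR dna_sequence LIKE '%TAG'
              OR dna_sequence LIKE '%TGA' THEN 1 ELSE 0 END AS has_stop,
       CASE WHEN dna_sequence LIKE '%ATAT%' THEN 1 ELSE 0 END AS has_atat,
       CASE WHEN dna_sequence LIKE '%GGG%' THEN 1 ELSE 0 END AS has_ggg
FROM Samples
ORDER BY sample_id
"""


def run(rows):
    """Load rows into an in-memory Samples table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Samples (sample_id INTEGER PRIMARY KEY, "
        "dna_sequence TEXT NOT NULL, species TEXT)"
    )
    conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(*row)
```

SQLite's LIKE is case-insensitive for ASCII letters by default. That does not matter here, because the constraints restrict sequences to uppercase A, T, G, C.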

Example 2 — Few or No Pattern Matches
Input Table:
sample_id  dna_sequence   species
5          TCAGTCAGTCAG   Mouse
6          ATATCGCGCTAG   Zebrafish
7          CGTATGCGTCGTA  Zebrafish
Output:
sample_id  dna_sequence   species    has_start  has_stop  has_atat  has_ggg
5          TCAGTCAGTCAG   Mouse      0          0         0         0
6          ATATCGCGCTAG   Zebrafish  0          1         1         0
7          CGTATGCGTCGTA  Zebrafish  0          0         0         0
💡 Note:

Sample 5: No patterns match, so all flags are 0.

Sample 6: Starts with ATAT (has_atat=1) and ends with TAG (has_stop=1).

Sample 7: Contains ATG, but not at the start, so has_start=0. No other patterns are detected, so all flags are 0.
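In SQLite, a GLOB or LIKE match (and an OR of matches) already evaluates to the integer 1 or 0, so the CASE wrappers can be dropped. The sketch below assumes SQLite and runs a GLOB-based query over the Example 2 rows. GLOB is case-sensitive and uses * as its wildcard. `run` is again a hypothetical helper name.

```python
import sqlite3

# Example 2 rows from the problem statement.
ROWS = [
    (5, "TCAGTCAGTCAG", "Mouse"),
    (6, "ATATCGCGCTAG", "Zebrafish"),
    (7, "CGTATGCGTCGTA", "Zebrafish"),
]

# SQLite-specific: each GLOB match (and OR of matches) yields 1 or 0,
# so no CASE wrapper is needed. GLOB is case-sensitive; * is its wildcard.
QUERY = """
SELECT sample_id,
       dna_sequence,
       species,
       dna_sequence GLOB 'ATG*' AS has_start,
       (dna_sequence GLOB '*TAA'
        OR dna_sequence GLOB '*TAG'
        OR dna_sequence GLOB '*TGA') AS has_stop,
       dna_sequence GLOB '*ATAT*' AS has_atat,
       dna_sequence GLOB '*GGG*' AS has_ggg
FROM Samples
ORDER BY sample_id
"""


def run(rows):
    """Load rows into an in-memory Samples table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Samples (sample_id INTEGER PRIMARY KEY, "
        "dna_sequence TEXT NOT NULL, species TEXT)"
    )
    conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(*row)
```

Engines such as PostgreSQL return a boolean type for these expressions instead of 1/0, so the explicit CASE form is the more portable choice.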

Constraints

  • 1 ≤ sample_id ≤ 1000
  • dna_sequence contains only characters 'A', 'T', 'G', 'C'
  • 1 ≤ dna_sequence.length ≤ 1000
  • species is a non-empty string
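These constraints are stated as guarantees about the input, not as part of the schema. If you want the database itself to enforce them, one option is CHECK clauses. The sketch below is a hypothetical SQLite DDL, not given by the problem, exercised through Python's `sqlite3`. The clause `NOT GLOB '*[^ACGT]*'` rejects any character outside A, T, G, C. The helper names `make_db` and `try_insert` are illustrative.

```python
import sqlite3

# Hypothetical DDL (SQLite dialect): the problem states these limits as input
# guarantees; here they are enforced with CHECK clauses instead.
DDL = """
CREATE TABLE Samples (
    sample_id    INTEGER PRIMARY KEY
                 CHECK (sample_id BETWEEN 1 AND 1000),
    dna_sequence TEXT NOT NULL
                 CHECK (length(dna_sequence) BETWEEN 1 AND 1000)
                 CHECK (dna_sequence NOT GLOB '*[^ACGT]*'),
    species      TEXT NOT NULL
                 CHECK (length(species) > 0)
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the constrained Samples table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Insert one row; return False if any constraint rejects it."""
    try:
        conn.execute("INSERT INTO Samples VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, (1, "ATGCTAGCTAGCTAA", "Human")))  # valid row
    print(try_insert(db, (2, "ATGX", "Human")))             # X is not A/T/G/C
```

Because GLOB is case-sensitive, lowercase input such as atg is rejected, which matches the uppercase-only alphabet in the constraints.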

Visualization

Figure: DNA Pattern Recognition

Input:
id  sequence
1   ATGCGATAA
2   GCATATGGG
3   ATGATATGA
4   CCCTTTAAA

Patterns to find:
  • Start: ATG (begins with)
  • Stop: TAA/TAG/TGA (ends with)
  • Motif: ATAT (contains)
  • Triple G: GGG+ (3+ consecutive)

Algorithm steps:
  1. Check start codon: LEFT(seq, 3) = 'ATG'
  2. Check stop codons: RIGHT(seq, 3) IN ('TAA', 'TAG', 'TGA')
  3. Find ATAT motif: INSTR(seq, 'ATAT') > 0
  4. Find triple G: INSTR(seq, 'GGG') > 0

Query outline:
SELECT sample_id,
       CASE WHEN LEFT(seq, 3) = 'ATG' THEN 1 ELSE 0 END AS has_start,
       ... (pattern checks)
ORDER BY sample_id;

Final result (1 = pattern found, 0 = not found):
id  start  stop  ATAT  GGG
1   1      1     0     0
2   0      0     1     1
3   1      1     1     0
4   0      0     0     0

Verification example (id = 3), sequence ATGATATGA:
  • ATG at start: OK
  • TGA at end: OK
  • Contains ATAT: OK
  • No GGG found: 0

Key insight: Use SQL string functions: LEFT() for prefix matching, RIGHT() for suffix matching, and INSTR() or LIKE for substring detection. CASE expressions convert matches to 1/0 flags.

Time complexity: O(n * m), where n = number of rows and m = average sequence length.

Source: TutorialsPoint - DNA Pattern Recognition | Optimal Solution
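SQLite has no LEFT() or RIGHT() functions, so to try the outlined approach there, substr() can stand in for both. substr(seq, 1, 3) takes the first three characters, and a negative start, substr(seq, -3), counts from the right. The sketch below assumes SQLite via Python's `sqlite3` and reproduces the final result table for the four sequences in the figure. The figure gives no species, so NULL is stored for that column. `run` is a hypothetical helper name.

```python
import sqlite3

# The four sequences from the figure; it shows no species, so None (NULL) is stored.
ROWS = [
    (1, "ATGCGATAA", None),
    (2, "GCATATGGG", None),
    (3, "ATGATATGA", None),
    (4, "CCCTTTAAA", None),
]

# SQLite substitutes: substr(x, 1, 3) for LEFT(x, 3), substr(x, -3) for RIGHT(x, 3).
# instr(x, y) returns the 1-based position of y in x, or 0 if absent.
QUERY = """
SELECT sample_id,
       CASE WHEN substr(dna_sequence, 1, 3) = 'ATG' THEN 1 ELSE 0 END AS has_start,
       CASE WHEN substr(dna_sequence, -3) IN ('TAA', 'TAG', 'TGA')
            THEN 1 ELSE 0 END AS has_stop,
       CASE WHEN instr(dna_sequence, 'ATAT') > 0 THEN 1 ELSE 0 END AS has_atat,
       CASE WHEN instr(dna_sequence, 'GGG') > 0 THEN 1 ELSE 0 END AS has_ggg
FROM Samples
ORDER BY sample_id
"""


def run(rows):
    """Load rows into an in-memory Samples table and return the flag rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Samples (sample_id INTEGER PRIMARY KEY, "
        "dna_sequence TEXT NOT NULL, species TEXT)"
    )
    conn.executemany("INSERT INTO Samples VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run(ROWS):
        print(*row)
```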