Biopython - Sequence
A sequence is a series of letters used to represent an organism's protein, DNA or RNA. It is represented by the Seq class, which is defined in the Bio.Seq module.
Let's create a simple sequence in Biopython as shown below −
>>> from Bio.Seq import Seq
>>> seq = Seq("AGCT")
>>> seq
Seq('AGCT')
>>> print(seq)
AGCT
Here, we have created a simple sequence, AGCT. Read as a protein, its letters represent Alanine, Glycine, Cysteine and Threonine; the same four letters are also valid DNA bases (Adenine, Guanine, Cytosine and Thymine).
Each Seq object holds one important piece of information −
the sequence data − the actual sequence string (AGCT), which str(seq) returns
Biopython also exposes bioinformatics-related reference data through the Bio.Data module. For example, IUPACData.protein_letters holds the letters of the IUPAC protein alphabet.
>>> from Bio.Data import IUPACData
>>> IUPACData.protein_letters
'ACDEFGHIKLMNPQRSTVWY'
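As a small illustration of how these letter tables might be used, the sketch below checks whether every letter of a sequence belongs to a given alphabet string. The helper name uses_only is hypothetical and not part of Biopython; only IUPACData.protein_letters and IUPACData.unambiguous_dna_letters come from the library.

```python
from Bio.Data import IUPACData
from Bio.Seq import Seq


def uses_only(seq, letters):
    """Return True if every letter of seq appears in the string letters.

    Hypothetical helper for illustration; not part of Biopython.
    """
    return set(str(seq).upper()) <= set(letters)


dna_like = Seq("AGCT")
peptide = Seq("MKWVTF")

# "AGCT" passes both checks, so the letters alone cannot tell us
# whether it was meant as DNA or as protein.
print(uses_only(dna_like, IUPACData.unambiguous_dna_letters))  # True
print(uses_only(dna_like, IUPACData.protein_letters))          # True

# "MKWVTF" contains letters that are not DNA bases.
print(uses_only(peptide, IUPACData.unambiguous_dna_letters))   # False
print(uses_only(peptide, IUPACData.protein_letters))           # True
```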
Basic Operations
This section briefly explains the basic operations available in the Seq class. Sequences behave much like Python strings, so we can perform string operations such as slicing, counting, concatenation, find, split and strip on them.
The examples below show each of these operations in turn.
To get the first letter of the sequence.
>>> seq_string = Seq("AGCTAGCT")
>>> seq_string[0]
'A'
To get the first two letters.
>>> seq_string[0:2]
Seq('AG')
To get all the letters.
>>> seq_string[:]
Seq('AGCTAGCT')
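Since Seq slicing follows the Python string conventions, negative indices and step values work as well. The following is a minimal sketch of a few more slices on the same sequence; the variable names are chosen here for illustration only.

```python
from Bio.Seq import Seq

seq_string = Seq("AGCTAGCT")

# A single index returns a one-letter Python string.
last_letter = seq_string[-1]

# A slice, with or without a step, returns a new Seq object.
every_second = seq_string[::2]
reversed_seq = seq_string[::-1]

# Cut the sequence into consecutive chunks of three letters;
# the final chunk is shorter when the length is not a multiple of 3.
chunks = [seq_string[i:i + 3] for i in range(0, len(seq_string), 3)]

print(last_letter)                 # T
print(every_second)                # ACAC
print(reversed_seq)                # TCGATCGA
print([str(c) for c in chunks])    # ['AGC', 'TAG', 'CT']
```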
To perform length and count operations.
>>> len(seq_string)
8
>>> seq_string.count('A')
2
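The length and count operations combine naturally. As a sketch, the fraction of G and C letters in a sequence can be computed by hand as follows; the variable names are illustrative.

```python
from Bio.Seq import Seq

seq_string = Seq("AGCTAGCT")

# Count G and C separately, then divide by the total length.
gc_count = seq_string.count("G") + seq_string.count("C")
gc_fraction = gc_count / len(seq_string)

print(gc_count)     # 4
print(gc_fraction)  # 0.5

# A slice is itself a Seq, so the same methods work on part of a sequence.
first_half = seq_string[:4]
print(first_half.count("A"))  # 1
```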
To add two sequences.
>>> seq1 = Seq("AGCT")
>>> seq2 = Seq("TCGA")
>>> seq1+seq2
Seq('AGCTTCGA')
Here, seq1 and seq2 are both generic DNA sequences, so you can add them to produce a new sequence.
To add more than two sequences, first store them in a Python list, then iterate over the list with a for loop and add each sequence to a running total, as shown below −
>>> seqs = [Seq("AGCT"), Seq("TCGA"), Seq("AAA")]
>>> for s in seqs:
...     print(s)
...
AGCT
TCGA
AAA
>>> final_seq = Seq("")
>>> for s in seqs:
...     final_seq = final_seq + s
...
>>> final_seq
Seq('AGCTTCGAAAA')
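The accumulation loop above can also be written more compactly. The sketch below shows two alternatives, assuming a Biopython version in which Seq provides a join() method that works like str.join(); the variable names are illustrative.

```python
from Bio.Seq import Seq

seqs = [Seq("AGCT"), Seq("TCGA"), Seq("AAA")]

# Pass an empty Seq as the start value so that every addition
# performed by sum() is Seq + Seq.
joined_with_sum = sum(seqs, Seq(""))

# Seq.join() mirrors str.join(): the Seq it is called on is the
# separator placed between the items. Here the separator is empty.
joined_with_join = Seq("").join(seqs)

print(joined_with_sum)   # AGCTTCGAAAA
print(joined_with_join)  # AGCTTCGAAAA
```

Note that the starting value must be an empty sequence, Seq(""), and not Seq(" "), otherwise the result begins with a space.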
The examples below cover a few more common operations.
To change the case of a sequence.
>>> rna = Seq("agct")
>>> rna.upper()
Seq('AGCT')
To use Python's membership (in) and identity (is) operators.
>>> rna = Seq("agct")
>>> 'a' in rna
True
>>> 'A' in rna
False
>>> rna1 = Seq("AGCT")
>>> rna is rna1
False
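The identity operator only tells us whether two names refer to the same object. To compare the letters themselves, use ==. A minimal sketch, assuming a recent Biopython release in which Seq objects compare like strings:

```python
from Bio.Seq import Seq

first = Seq("AGCT")
second = Seq("AGCT")
lower = Seq("agct")

# Identity: two separately created objects are not the same object.
print(first is second)         # False

# Equality: the letters are compared, as with Python strings.
print(first == second)         # True
print(first == "AGCT")         # True

# The comparison is case-sensitive, so normalise the case first.
print(lower == first)          # False
print(lower.upper() == first)  # True
```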
To find a single letter or a run of letters inside the given sequence. The find method returns the index of the first match.
>>> rna_seq = Seq('AGUACACUGGU')
>>> rna_seq.find('G')
1
>>> rna_seq.find('GG')
8
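Because find() reports only the first match, locating every occurrence takes a small loop that restarts the search just past the previous hit. The helper find_all below is a hypothetical name used for illustration, not a Biopython function.

```python
from Bio.Seq import Seq


def find_all(seq, sub):
    """Return the start index of every occurrence of sub in seq.

    Hypothetical helper; overlapping matches are included.
    """
    positions = []
    start = seq.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one position after the previous match.
        start = seq.find(sub, start + 1)
    return positions


rna_seq = Seq("AGUACACUGGU")

print(find_all(rna_seq, "G"))   # [1, 8, 9]
print(find_all(rna_seq, "AC"))  # [3, 5]

# find() returns -1 when there is no match.
print(rna_seq.find("T"))        # -1
```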
To split a sequence on a given letter.
>>> rna_seq = Seq('AGUACACUGGU')
>>> rna_seq.split('A')
[Seq(''), Seq('GU'), Seq('C'), Seq('CUGGU')]
To strip leading and trailing whitespace from a sequence.
>>> strip_seq = Seq(" AGCT ")
>>> strip_seq
Seq(' AGCT ')
>>> strip_seq.strip()
Seq('AGCT')
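Strip and split are often used together when cleaning raw input. The sketch below removes surrounding whitespace, splits on a letter, and drops the empty fragment that split produces when the sequence starts with the separator. The input value and variable names are made up for illustration.

```python
from Bio.Seq import Seq

# Raw input with leading spaces and a trailing newline.
raw = Seq("  AGUACACUGGU\n")

# strip() with no argument removes whitespace from both ends.
cleaned = raw.strip()

# split() yields an empty Seq before a leading separator; filter it out.
fragments = [frag for frag in cleaned.split("A") if len(frag) > 0]

print(cleaned)                      # AGUACACUGGU
print([str(f) for f in fragments])  # ['GU', 'C', 'CUGGU']
```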