Biopython Resources

Selected Reading

Biopython - Creating Simple Application

Quiz

Let us create a simple Biopython application to parse a bioinformatics file and print the content. This will help us understand the general concept of the Biopython and how it helps in the field of bioinformatics.

example.fasta

Step 1 − First, create a sample sequence file, example.fasta and put the below content into it.

>sp|P25730|FMS1_ECOLI CS1 fimbrial subunit A precursor (CS1 pilin) 
MKLKKTIGAMALATLFATMGASAVEKTISVTASVDPTVDLLQSDGSALPNSVALTYSPAV
NNFEAHTINTVVHTNDSDKGVVVKLSADPVLSNVLNPTLQIPVSVNFAGKPLSTTGITID 
SNDLNFASSGVNKVSSTQKLSIHADATRVTGGALTAGQYQGLVSIILTKSTTTTTTTKGT 

>sp|P15488|FMS3_ECOLI CS3 fimbrial subunit A precursor (CS3 pilin) 
MLKIKYLLIGLSLSAMSSYSLAAAGPTLTKELALNVLSPAALDATWAPQDNLTLSNTGVS 
NTLVGVLTLSNTSIDTVSIASTNVSDTSKNGTVTFAHETNNSASFATTISTDNANITLDK 
NAGNTIVKTTNGSQLPTNLPLKFITTEGNEHLVSGNYRANITITSTIKGGGTKKGTTDKK

The extension, fasta refers to the file format of the sequence file. FASTA originates from the bioinformatics software, FASTA and hence it gets its name. FASTA format has multiple sequence arranged one by one and each sequence will have its own id, name, description and the actual sequence data.

simple_example.py

Step 2 − Create a new python script, *simple_example.py" and enter the below code and save it.

from Bio.SeqIO import parse 
from Bio.SeqRecord import SeqRecord 
from Bio.Seq import Seq 

file = open("example.fasta") 

records = parse(file, "fasta") 
for record in records:    
   print("Id: %s" % record.id) 
   print("Name: %s" % record.name) 
   print("Description: %s" % record.description) 
   print("Annotations: %s" % record.annotations) 
   print("Sequence Data: %s" % record.seq)

Let us take a little deeper look into the code −

Line 1 imports the parse class available in the Bio.SeqIO module. Bio.SeqIO module is used to read and write the sequence file in different format and `parse class is used to parse the content of the sequence file.

Line 2 imports the SeqRecord class available in the Bio.SeqRecord module. This module is used to manipulate sequence records and SeqRecord class is used to represent a particular sequence available in the sequence file.

*Line 3" imports Seq class available in the Bio.Seq module. This module is used to manipulate sequence data and Seq class is used to represent the sequence data of a particular sequence record available in the sequence file.

Line 5 opens the example.fasta file using regular python function, open.

Line 7 parse the content of the sequence file and returns the content as the list of SeqRecord object.

Line 9-15 loops over the records using python for loop and prints the attributes of the sequence record (SqlRecord) such as id, name, description, sequence data, etc.

Output

Step 3 − Open a command prompt and go to the folder containing sequence file, example.fasta and run the below command −

> python simple_example.py

Step 4 − Python runs the script and prints all the sequence data available in the sample file, example.fasta. The output will be similar to the following content.

(myenv) D:\biopython\myenv>py simple_example.py
Id: sp|P25730|FMS1_ECOLI
Name: sp|P25730|FMS1_ECOLI
Description: sp|P25730|FMS1_ECOLI CS1 fimbrial subunit A precursor (CS1 pilin)
Annotations: {}
Sequence Data: MKLKKTIGAMALATLFATMGASAVEKTISVTASVDPTVDLLQSDGSALPNSVALTYSPAVNNFEAHTINTVVHTNDSDKGVVVKLSADPVLSNVLNPTLQIPVSVNFAGKPLSTTGITIDSNDLNFASSGVNKVSSTQKLSIHADATRVTGGALTAGQYQGLVSIILTKSTTTTTTTKGT
Id: sp|P15488|FMS3_ECOLI
Name: sp|P15488|FMS3_ECOLI
Description: sp|P15488|FMS3_ECOLI CS3 fimbrial subunit A precursor (CS3 pilin)
Annotations: {}
Sequence Data: MLKIKYLLIGLSLSAMSSYSLAAAGPTLTKELALNVLSPAALDATWAPQDNLTLSNTGVSNTLVGVLTLSNTSIDTVSIASTNVSDTSKNGTVTFAHETNNSASFATTISTDNANITLDKNAGNTIVKTTNGSQLPTNLPLKFITTEGNEHLVSGNYRANITITSTIKGGGTKKGTTDKK

We have seen three classes, parse, SeqRecord and Seq in this example. These three classes provide most of the functionality and we will learn those classes in the coming section.

Previous Quiz Next