Biopython - Plotting



This chapter explains how to plot sequences. Before moving to this topic, let us understand the basics of plotting.

Plotting

Matplotlib is a Python plotting library which produces quality figures in a variety of formats. We can create different types of plots such as line charts, histograms, bar charts, pie charts, scatter plots, etc.

pylab is a module that belongs to matplotlib and combines the numerical module numpy with the graphical plotting module pyplot. Biopython uses the pylab module for plotting sequences. To use it, import the module as shown below −

>>> import pylab

Before importing, we need to install the matplotlib package using the pip command given below −

(myenv) D:\biopython\myenv>pip3 install matplotlib
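
To confirm that the installation worked, a short check along the following lines may help. This is only a sketch: the Agg backend and the file name check.png are assumptions chosen so that the script runs without a display, and it calls matplotlib.pyplot directly instead of pylab.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

print("matplotlib version:", matplotlib.__version__)

# Draw a trivial line and save it into a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "check.png")  # hypothetical file name
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
fig.savefig(out_path)
plt.close(fig)

print("saved:", out_path)
```

If the script prints a version number and a path without raising ImportError, matplotlib is installed in the active environment.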

Sample Input File - plot.fasta

Create a sample file named plot.fasta in your Biopython directory and add the following content −

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG
CCGCCTCGGGAGCGTCCATGGCGGGTTTGAACCTCTAGCCCGGCGCAGTTTGGGCGCCAAGCCATATGAA
AGCATCACCGGCGAATGGCATTGTCTTCCCCAAAACCCGGAGCGGCGGCGTGCTGTCGCGTGCCCAATGA
ATTTTGATGACTCTCGCAAACGGGAATCTTGGCTCTTTGCATCGGATGGAAGGACGCAGCGAAATGCGAT
AAGTGGTGTGAATTGCAAGATCCCGTGAACCATCGAGTCTTTTGAACGCAAGTTGCGCCCGAGGCCATCA
GGCTAAGGGCACGCCTGCTTGGGCGTCGCGCTTCGTCTCTCTCCTGCCAATGCTTGCCCGGCATACAGCC
AGGCCGGCGTGGTGCGGATGTGAAAGATTGGCCCCTTGTGCCTAGGTGCGGCGGGTCCAAGAGCTGGTGT
TTTGATGGCCCGGAACCCGGCAAGAGGTGGACGGATGCTGGCAGCAGCTGCCGTGCGAATCCCCCATGTT
GTCGTGCTTGTCGGACAGGCAGGAGAACCCTTCCGAACCCCAATGGAGGGCGGTTGACCGCCATTCGGAT
GTGACCCCAGGTCAGGCGGGGGCACCCGCTGAGTTTACGC

>gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAACAGAATATATGATCGAGTG
AATCTGGAGGACCTGTGGTAACTCAGCTCGTCGTGGCACTGCTTTTGTCGTGACCCTGCTTTGTTGTTGG
GCCTCCTCAAGAGCTTTCATGGCAGGTTTGAACTTTAGTACGGTGCAGTTTGCGCCAAGTCATATAAAGC
ATCACTGATGAATGACATTATTGTCAGAAAAAATCAGAGGGGCAGTATGCTACTGAGCATGCCAGTGAAT
TTTTATGACTCTCGCAACGGATATCTTGGCTCTAACATCGATGAAGAACGCAGCTAAATGCGATAAGTGG
TGTGAATTGCAGAATCCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCTCGAGGCCATCAGGCTAAG
GGCACGCCTGCCTGGGCGTCGTGTGTTGCGTCTCTCCTACCAATGCTTGCTTGGCATATCGCTAAGCTGG
CATTATACGGATGTGAATGATTGGCCCCTTGTGCCTAGGTGCGGTGGGTCTAAGGATTGTTGCTTTGATG
GGTAGGAATGTGGCACGAGGTGGAGAATGCTAACAGTCATAAGGCTGCTATTTGAATCCCCCATGTTGTT
GTATTTTTTCGAACCTACACAAGAACCTAATTGAACCCCAATGGAGCTAAAATAACCATTGGGCAGTTGA
TTTCCATTCAGATGCGACCCCAGGTCAGGCGGGGCCACCCGCTGAGTTGAGGC

>gi|2765656|emb|Z78531.1|CFZ78531 C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAGCAGAACATACGATCGAGTG
AATCCGGAGGACCCGTGGTTACACGGCTCACCGTGGCTTTGCTCTCGTGGTGAACCCGGTTTGCGACCGG
GCCGCCTCGGGAACTTTCATGGCGGGTTTGAACGTCTAGCGCGGCGCAGTTTGCGCCAAGTCATATGGAG
CGTCACCGATGGATGGCATTTTTGTCAAGAAAAACTCGGAGGGGCGGCGTCTGTTGCGCGTGCCAATGAA
TTTATGACGACTCTCGGCAACGGGATATCTGGCTCTTGCATCGATGAAGAACGCAGCGAAATGCGATAAG
TGGTGTGAATTGCAGAATCCCGCGAACCATCGAGTCTTTGAACGCAAGTTGCGCCCGAGGCCATCAGGCT
AAGGGCACGCCTGCCTGGGCGTCGTGTGCTGCGTCTCTCCTGATAATGCTTGATTGGCATGCGGCTAGTC
TGTCATTGTGAGGACGTGAAAGATTGGCCCCTTGCGCCTAGGTGCGGCGGGTCTAAGCATCGGTGTTCTG
ATGGCCCGGAACTTGGCAGTAGGTGGAGGATGCTGGCAGCCGCAAGGCTGCCGTTCGAATCCCCCGTGTT
GTCGTACTCGTCAGGCCTACAGAAGAACCTGTTTGAACCCCCAGTGGACGCAAAACCGCCCTCGGGCGGT
GATTTCCATTCAGATGCGACCCCAGTCAGGCGGGCCACCCGTGAGTAA

>gi|2765655|emb|Z78530.1|CMZ78530 C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAAACAACATAATAAACGATTGAGTG
AATCTGGAGGACTTGTGGTAATTTGGCTCGCTAGGGATATCCTTTTGTGGTGACCATGATTTGTCATTGG
GCCTCATTGAGAGCTTTCATGGCGGGTTTGAACCTCTAGCACGGTCCAGTTTGCACCAAGGTATATAAAG
AATCACCGATGAATGACATTATTGCCCCACACAACGTCGGAGGTGTGGTGTGTTAATGTTCATTCCAATG
AATTTTGATGACTCTCGGCAGACGGATATCTTGACTCTTGCATCGATGAAGAACGCACCGAAATGTGATA
AGTGGTGTGAATTGCAGAATCCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCCCGAGGCCATCAGG
CTAAGGGCACGCCTGCCTGGGCGTCGTATGTTTTATCTCTCCTTCCAATGCTTGTCCAGCATATAGCTAG
GCCATCATTGTGTGGATGTGAAAGATTGGCCCCTTGTGCTTAGGTGCGGTGGGTCTAAGGATATGTGTTT
TGATGGTCTGAAACTTGGCAAGAGGTGGAGGATGCTGGCAGCCGCAAGGCTATTGTTTGAATCCCCCATG
TTGTCATGTTTGTTGGGCCTATAGAACAACTTGTTTGGACCCTAATTAAGGCAAAACAATCCTTGGGTGG
TTGATTTCCAATCAGATGCGACCCCAGTCAGGGGGCCACCCCAT

>gi|2765654|emb|Z78529.1|CLZ78529 C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DNA
ACGGCGAGCTGCCGAAGGACATTGTTGAGACAGCAGAATATACGATTGAGTGAATCTGGAGGACTTGTGG
TTATTTGGCTCGCTAGGGATTTCCTTTTGTGGTGACCATGATTTGTCATTGGGCCTCATTGAGAGCTTTC
ATGGCGGGTTTGAACCTCTAGCACGGTGCAGTTTGCACCAAGGTATATAAAGAATCACCGATGAATGACA
TTATTGTCAAAAAAGTCGGAGGTGTGGTGTGTTATTGGTCATGCCAATGAATTGTTGATGACTCTCGCCG
AGGGATATCTTGGCTCTTGCATCGATGAAGAATCCCACCGAAATGTGATAAGTGGTGTGAATTGCAGAAT
CCCGTGAACCATCGAGTCTTTGAACGCAAGTTGCGCCCGAGGCCATCAGGCTAAGGGCACGCCTGCCTGG
GCGTCGTATGTTTTATCTCTCCTTCCAATGCTTGTCCAGCATATAGCTAGGCCATCATTGTGTGGATGTG
AAAGATTGGCCCCTTGTGCTTAGGTGCGGTGGGTCTAAGGATATGTGTTTTGATGGTCTGAAACTTGGCA
AGAGGTGGAGGATGCTGGCAGCCGCAAGGCTATTGTTTGAATCCCCCATGTTGTCATATTTGTTGGGCCT
ATAGAACAACTTGTTTGGACCCTAATTAAGGCAAAACAATCCTTGGGTGGTTGATTTCCAATCAGATGCG
ACCCCAGTCAGCGGGCCACCAGCTGAGCTAAAA

Line Plot

Now, let us create a simple line plot for the above fasta file.

Step 1 − Import the SeqIO module to read the fasta file.

>>> from Bio import SeqIO

Step 2 − Parse the input file.

>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")] 
>>> len(records) 
5 
>>> max(records) 
753 
>>> min(records) 
733
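
Note that, despite its name, records holds one integer length per sequence, not the sequence records themselves. The sketch below shows the same idea while keeping each record's identifier next to its length. The two short sequences are made up for illustration and are not taken from plot.fasta; they are read from an in-memory handle so the example does not depend on any file.

```python
from io import StringIO

from Bio import SeqIO

# Two made-up records, used only for illustration (not from plot.fasta).
fasta_text = """>seq1 hypothetical example
ACGTACGTAC
ACGT
>seq2 hypothetical example
ACGTAC
"""

# SeqIO.parse accepts an open handle as well as a file name.
# rec.id is the first word of the header line; len(rec) is the sequence length.
lengths = {rec.id: len(rec) for rec in SeqIO.parse(StringIO(fasta_text), "fasta")}

for seq_id, length in lengths.items():
    print(seq_id, length)
```

Sequence lines that belong to the same record are joined before the length is taken, so seq1 is counted as one sequence spanning two lines.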

Step 3 − Let us import the pylab module.

>>> import pylab

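Step 4 − Configure the line chart by assigning x and y axis labels. The plot method draws each sequence length on the y axis against its position in the list on the x axis.

>>> pylab.xlabel("sequence index") 
Text(0.5, 0, 'sequence index') 

>>> pylab.ylabel("sequence length") 
Text(0, 0.5, 'sequence length') 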

Step 5 − Configure the line chart by setting grid display.

>>> pylab.grid()

Step 6 − Draw a simple line chart by calling the plot method and supplying records as input.

>>> pylab.plot(records) 
[<matplotlib.lines.Line2D object at 0x000001CDC4BC6D80>]

Step 7 − Finally, save the chart using the command below.

>>> pylab.savefig("lines.png")

Result

After executing the above command, you can see the following image saved in your Biopython directory.

[Image: Line Plot]
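
The interactive steps above can also be collected into a script. The following is a minimal sketch, not the only way to do it: the helper name save_line_plot, the Agg backend and the temporary output directory are assumptions, and the list of lengths was obtained by counting the sequences in the listing above (the session output confirms only the minimum 733 and the maximum 753).

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render to a file without a display
import matplotlib.pyplot as plt


def save_line_plot(lengths, out_path):
    """Plot sequence lengths in file order and save the figure (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot(range(1, len(lengths) + 1), lengths, marker="o")
    ax.set_xlabel("sequence number")
    ax.set_ylabel("sequence length")
    ax.grid(True)
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so later plots start clean
    return out_path


# Assumed lengths of the five records in plot.fasta, in file order.
lengths = [740, 753, 748, 744, 733]
saved = save_line_plot(lengths, os.path.join(tempfile.mkdtemp(), "lines.png"))
print(saved)
```

In a real run, lengths would be replaced by the records list built with SeqIO.parse in Step 2.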

Histogram Chart

A histogram is used for continuous data, where the bins represent ranges of data. Drawing a histogram follows the same steps as the line chart, except that instead of pylab.plot we call the hist method of the pylab module with records and a custom value for bins (5). The complete code is as follows −

Step 1 − Import the SeqIO module to read the fasta file.

>>> from Bio import SeqIO

Step 2 − Parse the input file.

>>> records = [len(rec) for rec in SeqIO.parse("plot.fasta", "fasta")] 
>>> len(records) 
5 
>>> max(records) 
753 
>>> min(records) 
733

Step 3 − Let us import the pylab module.

>>> import pylab

Step 4 − Configure the histogram by assigning x and y axis labels.

>>> pylab.xlabel("sequence length") 
Text(0.5, 0, 'sequence length') 

>>> pylab.ylabel("count") 
Text(0, 0.5, 'count') 

Step 5 − Configure the histogram by setting grid display.

>>> pylab.grid()

Step 6 − Draw the histogram by calling the hist method and supplying records and the number of bins as input.

>>> pylab.hist(records,bins=5) 
(array([1., 1., 1., 1., 1.]), array([733., 737., 741., 745., 749., 753.]), <BarContainer object of 5 artists>)
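
The two arrays in this output are the per-bin counts and the six bin edges. To make clear where they come from, here is a plain Python sketch of equal-width binning. The function name equal_width_histogram is hypothetical, the list of lengths is assumed from counting the sequences in plot.fasta, and the code illustrates the idea without reproducing matplotlib's internal implementation.

```python
def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values).

    Assumes the values are not all equal, so the bin width is non-zero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Every bin is half-open [left, right) except the last, which includes hi.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges


# Assumed lengths of the five records in plot.fasta, in file order.
lengths = [740, 753, 748, 744, 733]
counts, edges = equal_width_histogram(lengths, bins=5)
print(counts)
print(edges)
```

With a range of 733 to 753 and 5 bins, each bin is 4 wide, and each of the five lengths falls into a different bin.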

Step 7 − Finally, save the chart using the command below.

>>> pylab.savefig("hist.png")

Result

After executing the above command, you can see the histogram saved in your Biopython directory.
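
One practical point: pylab draws on the current figure, so if the histogram steps are run in the same interpreter session as the line plot, the earlier line stays on the chart unless a new figure is started. The sketch below avoids this by creating a separate figure for each chart. The helper name save_histogram, the Agg backend, the temporary output directory and the list of lengths (counted from the plot.fasta listing) are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: render to a file without a display
import matplotlib.pyplot as plt


def save_histogram(lengths, out_path, bins=5):
    """Draw a histogram of sequence lengths on its own figure and save it."""
    fig, ax = plt.subplots()  # a fresh figure, independent of earlier plots
    counts, edges, _ = ax.hist(lengths, bins=bins)
    ax.set_xlabel("sequence length")
    ax.set_ylabel("count")
    ax.grid(True)
    fig.savefig(out_path)
    plt.close(fig)
    return list(counts), list(edges)


# Assumed lengths of the five records in plot.fasta, in file order.
lengths = [740, 753, 748, 744, 733]
hist_path = os.path.join(tempfile.mkdtemp(), "hist.png")
counts, edges = save_histogram(lengths, hist_path)
print(counts)
print(edges)
```

In an interactive pylab session, calling pylab.figure() before Step 4 has a similar effect, since it makes a new empty figure the current one.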
