Biopython - Motif Objects



A sequence motif is a nucleotide or amino-acid sequence pattern. A motif can also arise from the three-dimensional arrangement of amino acids that are not adjacent in the sequence. Biopython provides a separate module, Bio.motifs, for working with sequence motifs. Import it as shown below −

from Bio import motifs

Creating Simple DNA Motif

Let us create a simple DNA motif from three sequences using the below commands −

>>> from Bio import motifs 
>>> from Bio.Seq import Seq 
>>> DNA_motif = [ Seq("AGCT"), 
...               Seq("TCGA"), 
...               Seq("AACT"), 
...             ] 
>>> seq = motifs.create(DNA_motif) 
>>> print(seq)
AGCT
TCGA
AACT

To see how often each nucleotide occurs at each position of the motif, use the below command −

>>> print(seq.counts) 
         0       1      2       3 
A:    2.00    1.00   0.00    1.00 
C:    0.00    1.00   2.00    0.00 
G:    0.00    1.00   1.00    0.00 
T:    1.00    0.00   0.00    2.00

Use the following code to get the counts of ‘A’ at each position −

>>> seq.counts["A", :] 
(2, 1, 0, 1)

If you want to access the columns of counts, use the below command −

>>> seq.counts[:, 3] 
{'A': 1, 'C': 0, 'T': 2, 'G': 0}
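The count matrix and the row and column lookups above can be reproduced by hand, which may help to show what the numbers mean. The following is a minimal sketch in plain Python, not the Bio.motifs implementation; the helper names count_matrix and column, and the fixed "ACGT" alphabet, are assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet for this DNA-only sketch


def count_matrix(instances, alphabet=ALPHABET):
    """Return {letter: [count at position 0, 1, ...]} for equal-length strings."""
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("all motif instances must have the same length")
    matrix = {letter: [0] * length for letter in alphabet}
    # zip(*instances) yields one tuple of letters per motif position
    for position, letters in enumerate(zip(*instances)):
        for letter, n in Counter(letters).items():
            matrix[letter][position] = n
    return matrix


def column(matrix, position):
    """Return {letter: count} for a single motif position (hypothetical helper)."""
    return {letter: row[position] for letter, row in matrix.items()}


counts = count_matrix(["AGCT", "TCGA", "AACT"])
row_a = tuple(counts["A"])   # counts of 'A' at each position
col_3 = column(counts, 3)    # counts of every letter at position 3

print(row_a)   # (2, 1, 0, 1)
print(col_3)   # {'A': 1, 'C': 0, 'G': 0, 'T': 2}
```

Each row of the matrix belongs to one nucleotide and each column to one position in the motif, so every column sums to the number of sequences used to build it (three here).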

Creating a Sequence Logo

We shall now discuss how to create a Sequence Logo.

Consider the below sequences −

AGCTTACG 
ATCGTACC 
TTCCGAAT 
GGTACGTA 
AAGCTTGG

You can create your own logo using the following link − http://weblogo.berkeley.edu/

Paste the above sequences there, create a new logo, and save the image as seq.png in your biopython folder.

Figure: Sequence Logo (seq.png)

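A motif object can also produce a logo itself. The weblogo method sends the motif to the WebLogo service, so it requires an internet connection, and saves the returned image under the given file name −

>>> seq.weblogo("seq.png")

The saved image represents the DNA motif held in seq as a sequence logo.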

JASPAR Database

JASPAR is one of the most popular motif databases. It stores meta-information for each motif. The module Bio.motifs contains a specialized class, jaspar.Motif, that represents this meta-information as attributes. This class also provides the facilities common to all motif formats: reading motifs, writing motifs and scanning sequences for motif instances.

It has the following notable attributes −

  • matrix_id − Unique JASPAR motif ID
  • name − The name of the motif
  • tf_family − The family of the motif, e.g. ’Helix-Loop-Helix’
  • data_type − The type of data used in the motif

Let us create a file in the JASPAR sites format, named sample.sites, in the biopython folder. Its contents are given below −

sample.sites
>MA0001 ARNT 1 
AACGTGatgtccta 
>MA0001 ARNT 2 
CAGGTGggatgtac 
>MA0001 ARNT 3 
TACGTAgctcatgc 
>MA0001 ARNT 4 
AACGTGacagcgct 
>MA0001 ARNT 5 
CACGTGcacgtcgt 
>MA0001 ARNT 6 
cggcctCGCGTGc

In the above file, we have created motif instances. Now, let us create a motif object from the above instances −

>>> from Bio import motifs 
>>> with open("sample.sites") as handle: 
...     data = motifs.read(handle, "sites") 
... 
>>> print(data) 
TF name None 
Matrix ID None 
Matrix:
            0       1       2       3       4       5 
A:       2.00    5.00    0.00    0.00    0.00    1.00 
C:       3.00    0.00    5.00    0.00    0.00    0.00 
G:       0.00    1.00    1.00    6.00    0.00    5.00 
T:       1.00    0.00    0.00    0.00    6.00    0.00

Here, data holds the motif built from all the instances in the sample.sites file.

To print all the instances from data, use the below command −

>>> for instance in data.instances: 
...    print(instance) 
... 
AACGTG 
CAGGTG 
TACGTA 
AACGTG 
CACGTG 
CGCGTG
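Comparing the file with the printed instances shows that only the upper-case letters of each record are kept, while the lower-case flanking letters are dropped. The sketch below imitates that behaviour in plain Python to make the file layout easier to follow. It is an illustration based on the output shown above, not the parser used by Bio.motifs, and the function name read_site_instances is hypothetical.

```python
SAMPLE_SITES = """\
>MA0001 ARNT 1
AACGTGatgtccta
>MA0001 ARNT 2
CAGGTGggatgtac
>MA0001 ARNT 3
TACGTAgctcatgc
>MA0001 ARNT 4
AACGTGacagcgct
>MA0001 ARNT 5
CACGTGcacgtcgt
>MA0001 ARNT 6
cggcctCGCGTGc
"""


def read_site_instances(text):
    """Collect the upper-case part of every sequence line (hypothetical helper)."""
    instances = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and the '>' header of each record
        # Upper-case letters mark the site; lower-case letters are flanking sequence
        instances.append("".join(ch for ch in line if ch.isupper()))
    return instances


sites = read_site_instances(SAMPLE_SITES)
print(sites)
```

Note that the site does not have to start at the first letter of a line: in the last record it sits between lower-case flanks on both sides.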

Use the below command to print the count matrix of the motif −

>>> print(data.counts)
            0       1       2       3       4       5 
A:       2.00    5.00    0.00    0.00    0.00    1.00 
C:       3.00    0.00    5.00    0.00    0.00    0.00 
G:       0.00    1.00    1.00    6.00    0.00    5.00 
T:       1.00    0.00    0.00    0.00    6.00    0.00