Biopython - Genome Analysis



A genome is complete set of DNA, including all of its genes. Genome analysis refers to the study of individual genes and their roles in inheritance.

Genome Diagram

Genome diagram represents the genetic information as charts. Biopython uses Bio.Graphics.GenomeDiagram module to represent GenomeDiagram. The GenomeDiagram module requires ReportLab to be installed.

Steps for creating a diagram

The process of creating a diagram generally follows the below simple pattern −

  • Create a FeatureSet for each separate set of features you want to display, and add Bio.SeqFeature objects to them.

  • Create a GraphSet for each graph you want to display, and add graph data to them.

  • Create a Track for each track you want on the diagram, and add GraphSets and FeatureSets to the tracks you require.

  • Create a Diagram, and add the Tracks to it.

  • Tell the Diagram to draw the image.

  • Write the image to a file.

example.gbk

Let us take an example of input GenBank file −

LOCUS       Z78533                   740 bp    DNA     linear   PLN 30-NOV-2006
DEFINITION  C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA.
ACCESSION   Z78533
VERSION     Z78533.1  GI:2765658
KEYWORDS    5.8S ribosomal RNA; 5.8S rRNA gene; internal transcribed spacer;
            ITS1; ITS2.
SOURCE      Cypripedium irapeanum
  ORGANISM  Cypripedium irapeanum
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; Liliopsida; Asparagales; Orchidaceae;
            Cypripedioideae; Cypripedium.
REFERENCE   1
  AUTHORS   Cox,A.V., Pridgeon,A.M., Albert,V.A. and Chase,M.W.
  TITLE     Phylogenetics of the slipper orchids (Cypripedioideae:
            Orchidaceae): nuclear rDNA ITS sequences
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 740)
  AUTHORS   Cox,A.V.
  TITLE     Direct Submission
  JOURNAL   Submitted (19-AUG-1996) Cox A.V., Royal Botanic Gardens, Kew,
            Richmond, Surrey TW9 3AB, UK
FEATURES             Location/Qualifiers
     source          1..740
                     /organism="Cypripedium irapeanum"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:49711"
     misc_feature    1..380
                     /note="internal transcribed spacer 1"
     gene            381..550
                     /gene="5.8S rRNA"
     rRNA            381..550
                     /gene="5.8S rRNA"
                     /product="5.8S ribosomal RNA"
     misc_feature    551..740
                     /note="internal transcribed spacer 2"
ORIGIN      
        1 cgtaacaagg tttccgtagg tgaacctgcg gaaggatcat tgatgagacc gtggaataaa
       61 cgatcgagtg aatccggagg accggtgtac tcagctcacc gggggcattg ctcccgtggt
      121 gaccctgatt tgttgttggg ccgcctcggg agcgtccatg gcgggtttga acctctagcc
      181 cggcgcagtt tgggcgccaa gccatatgaa agcatcaccg gcgaatggca ttgtcttccc
      241 caaaacccgg agcggcggcg tgctgtcgcg tgcccaatga attttgatga ctctcgcaaa
      301 cgggaatctt ggctctttgc atcggatgga aggacgcagc gaaatgcgat aagtggtgtg
      361 aattgcaaga tcccgtgaac catcgagtct tttgaacgca agttgcgccc gaggccatca
      421 ggctaagggc acgcctgctt gggcgtcgcg cttcgtctct ctcctgccaa tgcttgcccg
      481 gcatacagcc aggccggcgt ggtgcggatg tgaaagattg gccccttgtg cctaggtgcg
      541 gcgggtccaa gagctggtgt tttgatggcc cggaacccgg caagaggtgg acggatgctg
      601 gcagcagctg ccgtgcgaat cccccatgtt gtcgtgcttg tcggacaggc aggagaaccc
      661 ttccgaaccc caatggaggg cggttgaccg ccattcggat gtgaccccag gtcaggcggg
      721 ggcacccgct gagtttacgc
//

Install reportLab using pip if not installed.

(myenv) D:\biopython\myenv>pip3 install reportlab
Collecting reportlab
  Obtaining dependency information for reportlab from https://files.pythonhosted.org/packages/0e/ee/5f7a31ab05cf817e0cc70ae6df51a1a4fda188c899790a3131a24dd78d18/reportlab-4.4.6-py3-none-any.whl.metadata
  Downloading reportlab-4.4.6-py3-none-any.whl.metadata (1.7 kB)
Collecting pillow>=9.0.0 (from reportlab)
  Obtaining dependency information for pillow>=9.0.0 from https://files.pythonhosted.org/packages/a2/0b/d87733741526541c909bbf159e338dcace4f982daac6e5a8d6be225ca32d/pillow-12.0.0-cp312-cp312-win_amd64.whl.metadata
  Downloading pillow-12.0.0-cp312-cp312-win_amd64.whl.metadata (9.0 kB)
Collecting charset-normalizer (from reportlab)
  Obtaining dependency information for charset-normalizer from https://files.pythonhosted.org/packages/3d/2d/1e5ed9dd3b3803994c155cd9aacb60c82c331bad84daf75bcb9c91b3295e/charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl.metadata
  Using cached charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl.metadata (38 kB)
Downloading reportlab-4.4.6-py3-none-any.whl (2.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 6.9 MB/s eta 0:00:00
Downloading pillow-12.0.0-cp312-cp312-win_amd64.whl (7.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 5.3 MB/s eta 0:00:00
Using cached charset_normalizer-3.4.4-cp312-cp312-win_amd64.whl (107 kB)
Installing collected packages: pillow, charset-normalizer, reportlab
Successfully installed charset-normalizer-3.4.4 pillow-12.0.0 reportlab-4.4.6

We shall import all the modules first as shown below −

>>> from reportlab.lib import colors 
>>> from reportlab.lib.units import cm 
>>> from Bio.Graphics import GenomeDiagram

Now, import SeqIO module to read data −

>>> from Bio import SeqIO 
record = SeqIO.read("example.gbk", "genbank")

Here, the record reads the sequence from genbank file.

Now, create an empty diagram to add track and feature set −

>>> diagram = GenomeDiagram.Diagram(
   "Yersinia pestis biovar Microtus plasmid pPCP1") 
>>> track = diagram.new_track(1, name="Annotated Features") 
>>> feature = track.new_set()

Let us draw a diagram for the above input records −

>>> diagram.draw(
   format = "linear", orientation = "landscape", pagesize = 'A4', 
   ... fragments = 4, start = 0, end = len(record)) 
>>> diagram.write("orchid.pdf", "PDF") 
>>> diagram.write("orchid.eps", "EPS") 
>>> diagram.write("orchid.svg", "SVG") 

After executing the above command, you could see the following image saved in your Biopython directory.

genome.svg

Creating Diagram

You can also draw the image in circular format by making the below changes −

>>> diagram.draw(
   format = "circular", circular = True, pagesize = (20*cm,20*cm), 
   ... start = 0, end = len(record), circle_core = 0.7) 
>>> diagram.write("circular.pdf", "PDF")

Chromosomes Overview

DNA molecule is packaged into thread-like structures called chromosomes. Each chromosome is made up of DNA tightly coiled many times around proteins called histones that support its structure.

Chromosomes are not visible in the cells nucleus not even under a microscope when the cell is not dividing. However, the DNA that makes up chromosomes becomes more tightly packed during cell division and is then visible under a microscope.

In humans, each cell normally contains 23 pairs of chromosomes, for a total of 46. Twenty-two of these pairs, called autosomes, look the same in both males and females. The 23rd pair, the sex chromosomes, differ between males and females. Females have two copies of the X chromosome, while males have one X and one Y chromosome.

Advertisements