Biopython - Phenotype Microarray


A phenotype is an observable character or trait that an organism exhibits in response to a particular chemical or environment. A phenotype microarray measures the reaction of an organism to a large number of chemicals and environments simultaneously, and the resulting data is analyzed to understand gene mutations, gene characteristics, and so on.

Biopython provides the Bio.phenotype module to analyze phenotypic data. In this chapter, let us learn how to parse, interpolate, extract and analyze phenotype microarray data.


Phenotype microarray data can be in one of two formats: CSV and JSON. Biopython supports both. The Biopython parser reads the phenotype microarray data and returns it as a collection of PlateRecord objects. Each PlateRecord object contains a collection of WellRecord objects, arranged as 8 rows and 12 columns. The eight rows are labelled A to H and the 12 columns are labelled 01 to 12. For example, the well in the 4th row and 6th column is D06.
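The naming scheme can be sketched in plain Python. The helper names below (well_id_to_index, index_to_well_id) are hypothetical and are not part of Biopython; the sketch assumes zero-based row and column indices, as in ordinary Python indexing.

```python
import string

# Rows are labelled A to H, columns 01 to 12 (8 x 12 = 96 wells per plate).
ROWS = string.ascii_uppercase[:8]
N_COLUMNS = 12


def well_id_to_index(well_id):
    """Convert a well identifier such as 'D06' to zero-based (row, column)."""
    row_letter, column_number = well_id[0], int(well_id[1:])
    if row_letter not in ROWS or not 1 <= column_number <= N_COLUMNS:
        raise ValueError(f"invalid well identifier: {well_id!r}")
    return ROWS.index(row_letter), column_number - 1


def index_to_well_id(row, column):
    """Convert zero-based (row, column) indices back to an identifier."""
    if not (0 <= row < len(ROWS) and 0 <= column < N_COLUMNS):
        raise ValueError(f"index out of range: ({row}, {column})")
    return f"{ROWS[row]}{column + 1:02d}"


print(well_id_to_index("D06"))   # 4th row, 6th column
print(index_to_well_id(0, 3))    # 1st row, 4th column
```

With zero-based indices, the 4th row and 6th column correspond to the pair (3, 5).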

Let us understand the format and the concept of parsing with the following example −

Step 1 − Download the Plates.csv file provided by the Biopython team.

Step 2 − Load the phenotype module as below −

>>> from Bio import phenotype

Step 3 − Invoke the phenotype.parse method, passing the data file and the format option ("pm-csv"). It returns an iterator of PlateRecord objects, which is converted to a list below −

>>> plates = list(phenotype.parse('Plates.csv', "pm-csv")) 
>>> plates 
[PlateRecord('WellRecord['A01'], WellRecord['A02'], WellRecord['A03'], ..., WellRecord['H12']'), 
PlateRecord('WellRecord['A01'], WellRecord['A02'], WellRecord['A03'], ..., WellRecord['H12']'), 
PlateRecord('WellRecord['A01'], WellRecord['A02'], WellRecord['A03'], ..., WellRecord['H12']'), 
PlateRecord('WellRecord['A01'], WellRecord['A02'], WellRecord['A03'], ..., WellRecord['H12']')] 

Step 4 − Access the first plate from the list as below −

>>> plate = plates[0] 
>>> plate 
PlateRecord('WellRecord['A01'], WellRecord['A02'], WellRecord['A03'], ..., WellRecord['H12']')

Step 5 − As discussed earlier, a plate contains 8 rows, each having 12 wells. A WellRecord can be accessed in two ways, by its identifier or by zero-based row and column indices, as specified below −

>>> well = plate["A04"] 
>>> well = plate[0, 3] 
>>> well 
WellRecord('(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0), 
   (1.0, 0.0), ..., (71.75, 388.0)')

Step 6 − Each well holds a series of measurements taken at different time points, which can be accessed using a for loop as specified below −

>>> for v1, v2 in well: 
...     print(v1, v2) 
... 
0.0 0.0 
0.25 0.0 
0.5 0.0 
0.75 0.0 
1.0 0.0 
...
71.25 388.0 
71.5 388.0 
71.75 388.0
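Because iteration yields (time, signal) pairs, ordinary Python tools work on them directly. The following is a minimal sketch that uses a plain list as a stand-in for a WellRecord, holding only the pairs shown in the output above (the elided measurements are not included). The function first_time_above is a hypothetical helper, not a Biopython method.

```python
# Stand-in for a WellRecord: (time in hours, signal) pairs copied from
# the abridged output above. A real well holds many more measurements.
measurements = [
    (0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0), (1.0, 0.0),
    (71.25, 388.0), (71.5, 388.0), (71.75, 388.0),
]

# Split the pairs into two parallel tuples, for example for plotting.
times, signals = zip(*measurements)


def first_time_above(pairs, threshold):
    """Return the first time whose signal exceeds threshold, or None."""
    for time, signal in pairs:
        if signal > threshold:
            return time
    return None


print(times[-1], signals[-1])
print(first_time_above(measurements, 100.0))
```

Since the helper only relies on iteration over pairs, it should accept any iterable of (time, signal) tuples, under the assumption that the pairs arrive in time order.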


Interpolation gives more insight into the data. Biopython can interpolate WellRecord data to estimate the signal at intermediate time points. The syntax is similar to list indexing, so it is easy to learn.

To get the data at 20.1 hours, pass the time as the index value, as specified below −

>>> well[20.10] 

We can also pass a start time point and an end time point, as specified below −

>>> well[20:30] 
[67.0, 84.0, 102.0, 119.0, 135.0, 147.0, 158.0, 168.0, 179.0, 186.0]

The above command interpolates data from 20 hours up to, but not including, 30 hours at 1 hour intervals, which is why ten values are returned. By default, the interval is 1 hour, and we can change it to any value. For example, let us use a 15 minute (0.25 hour) interval as specified below −

>>> well[20:21:0.25] 
[67.0, 73.0, 75.0, 81.0]
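The idea behind these lookups can be illustrated with a small pure-Python sketch. It assumes linear interpolation between the two neighbouring measurements and a half-open range (end point excluded); the functions interpolate and interpolate_range are hypothetical names for illustration, not Biopython's implementation. The sample signals for 20.0 to 20.75 hours are taken from the output above, and the value at 21.0 hours from the earlier 1 hour slice.

```python
from bisect import bisect_left


def interpolate(times, signals, t):
    """Linearly interpolate the signal at time t.

    times must be sorted in ascending order and match signals in length.
    """
    if t < times[0] or t > times[-1]:
        raise ValueError("time is outside the measured range")
    i = bisect_left(times, t)
    if times[i] == t:
        return signals[i]          # exact measurement, nothing to estimate
    t0, t1 = times[i - 1], times[i]
    s0, s1 = signals[i - 1], signals[i]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)


def interpolate_range(times, signals, start, stop, step=1.0):
    """Interpolate at start, start + step, ... up to but excluding stop."""
    values = []
    k = 0
    while start + k * step < stop:
        values.append(interpolate(times, signals, start + k * step))
        k += 1
    return values


times = [20.0, 20.25, 20.5, 20.75, 21.0]
signals = [67.0, 73.0, 75.0, 81.0, 84.0]

print(interpolate_range(times, signals, 20, 21, 0.25))
print(interpolate(times, signals, 20.125))  # halfway between 67.0 and 73.0
```

On this sample the range call reproduces the four values listed above, because every requested time coincides with a measured time point.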

Analyze and Extract

Biopython provides a fit method to analyze WellRecord data using the Gompertz, Logistic and Richards sigmoid functions. By default, the fit method tries the Gompertz function first and falls back to Logistic and then Richards if the fit fails. We need to call the fit method of the WellRecord object to get the task done. The coding is as follows −

>>> well.fit() 
Traceback (most recent call last): 
  ... 
Bio.MissingPythonDependencyError: Install scipy to extract curve parameters. 
>>> well.model 
>>> getattr(well, 'min') 
0.0 
>>> getattr(well, 'max') 
388.0 
>>> getattr(well, 'average_height') 

Biopython depends on the scipy module for the curve-fitting part of the analysis. The min, max and average_height values are calculated without scipy, which is why they are available even though the fit call above raised an error.
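What these three scipy-free values represent can be sketched in plain Python. This is an illustration only: summarize is a hypothetical helper, and the sketch assumes that average_height is the arithmetic mean of the signal values. The sample pairs are made up for the example.

```python
from statistics import mean


def summarize(pairs):
    """Compute simple summary values from (time, signal) pairs."""
    signals = [signal for _, signal in pairs]
    if not signals:
        raise ValueError("no measurements to summarize")
    return {
        "min": min(signals),
        "max": max(signals),
        # Assumption: average height is the mean of all signal values.
        "average_height": mean(signals),
    }


# Made-up sample data, not taken from Plates.csv.
sample = [(0.0, 0.0), (0.25, 2.0), (0.5, 4.0)]
print(summarize(sample))
```

None of these values require curve fitting, which is consistent with them being available without scipy.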