How to generate an array for bi-clustering using Scikit-learn?
In this tutorial, we will learn how to generate arrays with structured patterns for bi-clustering analysis using Python Scikit-learn. We'll cover two main approaches: creating arrays with constant block diagonal structure and block checkerboard structure.
What is Bi-clustering?
Bi-clustering is a data mining technique that simultaneously clusters rows and columns of a data matrix to find coherent sub-matrices. It's particularly useful in gene expression analysis and collaborative filtering.
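To make the idea concrete, here is a minimal NumPy sketch. The toy matrix and the chosen row and column indices are hypothetical values for illustration only. A bicluster is simply the submatrix where a subset of rows meets a subset of columns, and `np.ix_` extracts exactly that intersection.

```python
import numpy as np

# Hypothetical 5 x 6 matrix with one coherent block hidden in it
data = np.array([
    [1, 9, 2, 9, 3, 0],
    [4, 0, 5, 1, 6, 2],
    [7, 9, 8, 9, 2, 1],
    [3, 2, 1, 0, 4, 5],
    [0, 9, 6, 9, 5, 3],
])

# Hypothetical bicluster: rows 0, 2, 4 together with columns 1 and 3
bicluster_rows = [0, 2, 4]
bicluster_cols = [1, 3]

# np.ix_ builds an open mesh, so this indexing returns the 3 x 2 submatrix
submatrix = data[np.ix_(bicluster_rows, bicluster_cols)]
print(submatrix)
```

The block is coherent only on that particular combination of rows and columns: the same two columns look unrelated in rows 1 and 3, which is why rows and columns have to be clustered together rather than separately.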
Generating an Array with Constant Block Diagonal Structure
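The make_biclusters function creates a synthetic dataset with a constant block diagonal structure: each bicluster is a rectangular block filled with a single constant value, the blocks lie along the main diagonal when shuffle=False, and every entry outside the blocks is zero before noise is added.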
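A small noise-free sketch shows what "constant block diagonal" means in practice. The shape (60, 40) and the 3 clusters are arbitrary choices, and noise=0 is used only so the structure is exact. It relies on the rows and columns indicator arrays that make_biclusters returns to locate each block.

```python
import numpy as np
from sklearn.datasets import make_biclusters

# Small, noise-free dataset so the structure is exact
data, rows, columns = make_biclusters(
    shape=(60, 40), n_clusters=3, noise=0.0, shuffle=False, random_state=0
)

# Mark every cell that belongs to some bicluster
in_block = np.zeros(data.shape, dtype=bool)
for k in range(rows.shape[0]):
    # Outer product of the row and column indicators selects block k
    block_mask = np.outer(rows[k], columns[k])
    in_block |= block_mask
    block_values = data[block_mask]
    # Each bicluster holds a single constant value
    print(f"Bicluster {k}: {block_values.size} cells, unique values {np.unique(block_values)}")

# With noise=0, everything outside the diagonal blocks is exactly zero
print("Off-block cells all zero:", bool(np.all(data[~in_block] == 0)))
```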
Example
Let's generate an array of shape (500, 500) with 6 clusters:
```python
# Importing libraries
from sklearn.datasets import make_biclusters
import matplotlib.pyplot as plt

# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create bi-cluster test dataset
data, rows, columns = make_biclusters(
    shape=(500, 500),
    n_clusters=6,
    noise=5,
    shuffle=False,
    random_state=0
)

# Plot the array
plt.matshow(data, cmap=plt.cm.Reds)
plt.title("Array with Constant Block Diagonal Structure\nfor Biclustering")
plt.show()

# rows and columns are indicator arrays with one entry per cluster
print(f"Data shape: {data.shape}")
print(f"Number of row clusters: {len(rows)}")
print(f"Number of column clusters: {len(columns)}")
```
Output

```
Data shape: (500, 500)
Number of row clusters: 6
Number of column clusters: 6
```
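The rows and columns values returned by make_biclusters are boolean indicator matrices of shape (n_clusters, n_rows) and (n_clusters, n_cols), which is why len(rows) reports 6. The sketch below regenerates the same dataset and converts the indicators into one label per row and per column, then counts how many rows and columns each cluster received.

```python
import numpy as np
from sklearn.datasets import make_biclusters

data, rows, columns = make_biclusters(
    shape=(500, 500), n_clusters=6, noise=5, shuffle=False, random_state=0
)

print("rows indicator shape:", rows.shape)        # (n_clusters, n_rows)
print("columns indicator shape:", columns.shape)  # (n_clusters, n_cols)

# Every row and column belongs to exactly one cluster here,
# so argmax over the cluster axis recovers a label vector
row_labels = np.argmax(rows, axis=0)
col_labels = np.argmax(columns, axis=0)

print("Rows per cluster:", np.bincount(row_labels, minlength=6))
print("Columns per cluster:", np.bincount(col_labels, minlength=6))
```

The cluster sizes are drawn at random, so they are roughly but not exactly equal.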
Generating an Array with Block Checkerboard Structure
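The make_checkerboard function creates datasets with a checkerboard structure: the rows are split into row clusters and the columns into column clusters, and every pairing of a row cluster with a column cluster forms its own constant-valued block, so the blocks tile the whole matrix in a grid.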
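A small noise-free sketch makes the difference from make_biclusters visible. The shape (40, 30) and the (2, 3) cluster layout are arbitrary choices for illustration. With noise=0, each block should hold one value and, because the blocks cover the entire matrix, no cell should be left at zero.

```python
import numpy as np
from sklearn.datasets import make_checkerboard

# Noise-free checkerboard: 2 row clusters x 3 column clusters
data, rows, columns = make_checkerboard(
    shape=(40, 30), n_clusters=(2, 3), noise=0.0, shuffle=False, random_state=0
)

for k in range(rows.shape[0]):
    block = data[np.outer(rows[k], columns[k])]
    # Every block is filled with a single constant value
    print(f"Bicluster {k}: {block.size} cells, value(s) {np.unique(block)}")

# Unlike make_biclusters, the blocks tile the whole matrix, so no cell stays zero
print("Any zero cells:", bool(np.any(data == 0)))
```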
Example
Let's generate an array of shape (600, 600) with 4 row clusters and 3 column clusters:
```python
# Importing libraries
from sklearn.datasets import make_checkerboard
import matplotlib.pyplot as plt

# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create checkerboard test dataset (4 row clusters, 3 column clusters)
n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
    shape=(600, 600),
    n_clusters=n_clusters,
    noise=10,
    shuffle=False,
    random_state=0
)

# Plot the array
plt.matshow(data, cmap=plt.cm.Greens)
plt.title("Array with Block Checkerboard Structure\nfor Biclustering")
plt.show()

print(f"Data shape: {data.shape}")
print(f"Row clusters: {n_clusters[0]}")
print(f"Column clusters: {n_clusters[1]}")
```
Output

```
Data shape: (600, 600)
Row clusters: 4
Column clusters: 3
```
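For make_checkerboard, the returned rows and columns arrays hold one indicator per (row cluster, column cluster) pair, so n_clusters=(4, 3) yields 12 biclusters: each row belongs to 3 of them and each column to 4. The sketch below regenerates the same dataset and arranges the per-bicluster means into a 4 × 3 grid. The reshape assumes the indicators are ordered row-cluster-major (bicluster k pairs row cluster k // 3 with column cluster k % 3); that matches the current scikit-learn implementation as we understand it, but the documentation does not spell it out.

```python
import numpy as np
from sklearn.datasets import make_checkerboard

n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
    shape=(600, 600), n_clusters=n_clusters, noise=10, shuffle=False, random_state=0
)

# One indicator per (row cluster, column cluster) pair: 4 * 3 = 12 biclusters
print("rows indicator shape:", rows.shape)
print("columns indicator shape:", columns.shape)

# Each row sits in one bicluster per column cluster, and vice versa
print("Biclusters per row:", np.unique(rows.sum(axis=0)))
print("Biclusters per column:", np.unique(columns.sum(axis=0)))

# Average each bicluster and lay the means out as a 4 x 3 grid
# (assumes row-cluster-major ordering of the indicators)
block_means = np.array(
    [data[np.outer(rows[k], columns[k])].mean() for k in range(rows.shape[0])]
).reshape(n_clusters)
print(np.round(block_means, 1))
```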
Key Parameters
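Both functions accept the same main parameters:

| Parameter | Description | Default |
|---|---|---|
| shape | Output array dimensions (n_rows, n_cols) | Required |
| n_clusters | Number of biclusters (make_biclusters); an int or a (n_row_clusters, n_column_clusters) pair (make_checkerboard) | Required |
| noise | Standard deviation of the Gaussian noise added to the data | 0.0 |
| minval | Minimum value of a bicluster | 10 |
| maxval | Maximum value of a bicluster | 100 |
| shuffle | Whether to shuffle the rows and columns of the output | True |
| random_state | Random seed for reproducibility | None |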
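Because shuffle defaults to True, the blocks are normally scattered across the matrix. The returned indicators always describe the shuffled order, so they can be used to regroup the rows and columns. In this sketch the shape, cluster count, and noise=0 are arbitrary choices; with noise=0 and the same random_state, the regrouped array should match the shuffle=False output, since both calls draw the same block values and cluster sizes before shuffling (an implementation detail rather than a documented guarantee).

```python
import numpy as np
from sklearn.datasets import make_biclusters

params = dict(shape=(100, 80), n_clusters=4, noise=0.0, random_state=0)

# Same settings, with and without shuffling
ordered, _, _ = make_biclusters(shuffle=False, **params)
shuffled, rows, columns = make_biclusters(shuffle=True, **params)

# The indicators refer to the shuffled order, giving each row/column its cluster
row_labels = np.argmax(rows, axis=0)
col_labels = np.argmax(columns, axis=0)

# Sorting by cluster label groups the rows and columns back into blocks
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
restored = shuffled[np.ix_(row_order, col_order)]

print("Shuffled equals ordered:", np.array_equal(shuffled, ordered))
print("Restored equals ordered:", np.array_equal(restored, ordered))
```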
Conclusion
Scikit-learn provides make_biclusters for constant block diagonal patterns and make_checkerboard for checkerboard patterns. Because both functions return the true row and column memberships alongside the data, they are useful for testing biclustering algorithms on synthetic datasets with a known cluster structure.
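As an illustration of that testing workflow, here is a sketch that fits scikit-learn's SpectralCoclustering to shuffled make_biclusters data and scores the result against the ground truth with sklearn.metrics.consensus_score. The settings (300 × 300, 5 clusters, noise=5) are our own choice rather than anything from the examples above; a score near 1.0 means the true biclusters were recovered.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Shuffled data whose true biclusters are returned alongside it
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=True, random_state=0
)

# Fit a biclustering algorithm on the shuffled matrix
model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)

# Compare the found biclusters with the ground truth (1.0 is a perfect match)
score = consensus_score(model.biclusters_, (rows, columns))
print(f"Consensus score: {score:.3f}")
```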
