How to generate an array for bi-clustering using Scikit-learn?

In this tutorial, we will learn how to generate arrays with structured patterns for bi-clustering analysis using Python's scikit-learn library. We'll cover two approaches: creating arrays with a constant block diagonal structure and creating arrays with a block checkerboard structure.

What is Bi-clustering?

Bi-clustering is a data mining technique that simultaneously clusters rows and columns of a data matrix to find coherent sub-matrices. It's particularly useful in gene expression analysis and collaborative filtering.

Generating an Array with Constant Block Diagonal Structure

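The make_biclusters function creates a synthetic matrix with a constant block diagonal structure: each bicluster is a rectangular block filled with a single constant value, and the blocks lie along the main diagonal when the rows and columns are not shuffled. Gaussian noise can optionally be added on top. The function returns the data together with the row and column indicators of each bicluster.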

Example

Let's generate an array of shape (500, 500) with 6 clusters:

# Importing libraries
from sklearn.datasets import make_biclusters
import matplotlib.pyplot as plt

# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create bi-cluster test dataset
data, rows, columns = make_biclusters(
    shape=(500, 500), 
    n_clusters=6, 
    noise=5, 
    shuffle=False, 
    random_state=0
)

# Plot the array
plt.matshow(data, cmap=plt.cm.Reds)
plt.title("Array with Constant Block Diagonal Structure\nfor Biclustering")
plt.show()

print(f"Data shape: {data.shape}")
print(f"Number of row clusters: {len(rows)}")
print(f"Number of column clusters: {len(columns)}")

Output

Data shape: (500, 500)
Number of row clusters: 6
Number of column clusters: 6
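
The rows and columns values returned above are boolean indicator arrays with one row per bicluster. As a minimal sketch (the variable names k, block, and off_block are illustrative and not part of scikit-learn), they can be used to pull a single bicluster out of the data and compare it with the part of the same rows that lies outside the block:

```python
import numpy as np
from sklearn.datasets import make_biclusters

data, rows, columns = make_biclusters(
    shape=(500, 500), n_clusters=6, noise=5, shuffle=False, random_state=0
)

# One boolean indicator row per bicluster: shape (n_clusters, n_rows / n_cols)
print(rows.shape, columns.shape)

# Indices of the rows and columns that belong to bicluster k (k is illustrative)
k = 0
row_idx = np.flatnonzero(rows[k])
col_idx = np.flatnonzero(columns[k])
other_cols = np.flatnonzero(~columns[k])

# np.ix_ builds an open mesh so that we select a sub-matrix, not a diagonal
block = data[np.ix_(row_idx, col_idx)]
off_block = data[np.ix_(row_idx, other_cols)]

# Inside the block the values sit around the bicluster's constant;
# outside it, in the same rows, only the Gaussian noise remains.
print(f"Mean inside bicluster {k}: {block.mean():.2f}")
print(f"Mean outside bicluster {k}: {off_block.mean():.2f}")
```

In the block diagonal case every row and every column belongs to exactly one bicluster, so each column of rows (and of columns) contains a single True value.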

Generating an Array with Block Checkerboard Structure

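The make_checkerboard function creates a matrix with a block checkerboard structure: the rows are split into row clusters and the columns into column clusters, and every cell of the resulting grid (one row cluster crossed with one column cluster) is a bicluster with its own constant value.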

Example

Let's generate an array of shape (600, 600) with 4 row clusters and 3 column clusters, passed as n_clusters=(4, 3):

# Importing libraries
from sklearn.datasets import make_checkerboard
import matplotlib.pyplot as plt

# Set the figure size
plt.rcParams["figure.figsize"] = [7.50, 3.50]
plt.rcParams["figure.autolayout"] = True

# Create checkerboard test dataset
n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
    shape=(600, 600), 
    n_clusters=n_clusters, 
    noise=10, 
    shuffle=False, 
    random_state=0
)

# Plot the array
plt.matshow(data, cmap=plt.cm.Greens)
plt.title("Array with Block Checkerboard Structure\nfor Biclustering")
plt.show()

print(f"Data shape: {data.shape}")
print(f"Row clusters: {n_clusters[0]}")
print(f"Column clusters: {n_clusters[1]}")

Output

Data shape: (600, 600)
Row clusters: 4
Column clusters: 3
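
Unlike the block diagonal case, the indicator arrays returned by make_checkerboard describe one bicluster per grid cell, so with n_clusters=(4, 3) there are 4 x 3 = 12 of them. The following sketch (the memberships_per_row and memberships_per_col names are illustrative) checks the shapes and counts how many biclusters each row and column belongs to:

```python
import numpy as np
from sklearn.datasets import make_checkerboard

n_row_clusters, n_col_clusters = 4, 3
data, rows, columns = make_checkerboard(
    shape=(600, 600),
    n_clusters=(n_row_clusters, n_col_clusters),
    noise=10,
    shuffle=False,
    random_state=0,
)

# One indicator row per grid cell: 4 * 3 = 12 biclusters
print(rows.shape)     # (12, 600)
print(columns.shape)  # (12, 600)

# Each data row takes part in one bicluster per column cluster,
# and each data column in one bicluster per row cluster.
memberships_per_row = rows.sum(axis=0)
memberships_per_col = columns.sum(axis=0)
print(np.unique(memberships_per_row))  # [3]
print(np.unique(memberships_per_col))  # [4]
```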

Key Parameters

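Parameter | Description | Default
shape | Output array dimensions as (n_rows, n_cols) | Required
n_clusters | Number of biclusters; make_checkerboard also accepts a tuple (n_row_clusters, n_column_clusters) | Required
noise | Standard deviation of the Gaussian noise | 0.0
shuffle | Whether to shuffle the rows and columns of the generated array | True
random_state | Random seed for reproducibility | None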
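
To illustrate how these parameters interact, here is a small sketch with noise=0.0 and shuffle=True. It also passes minval and maxval, which are part of the function's signature but are not listed above; they bound the constant value drawn for each bicluster. The row_order, col_order, and restored names are illustrative. Because the returned indicators refer to the shuffled array, sorting by cluster label brings the block diagonal layout back:

```python
import numpy as np
from sklearn.datasets import make_biclusters

data, rows, columns = make_biclusters(
    shape=(60, 40),
    n_clusters=3,
    noise=0.0,      # no noise: every bicluster is exactly constant
    minval=10,      # lower bound for the bicluster constants
    maxval=100,     # upper bound for the bicluster constants
    shuffle=True,   # rows and columns are randomly permuted
    random_state=0,
)

# Turn the boolean indicators into one integer label per row / column
row_labels = rows.argmax(axis=0)
col_labels = columns.argmax(axis=0)

# Sorting by label groups each cluster's rows and columns together again
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")
restored = data[row_order][:, col_order]

# With noise=0.0 each bicluster holds a single value between minval and maxval
for k in range(3):
    block = data[np.ix_(np.flatnonzero(rows[k]), np.flatnonzero(columns[k]))]
    print(f"Bicluster {k}: shape {block.shape}, distinct values {np.unique(block).size}")
```

Plotting restored with plt.matshow should show the same kind of block diagonal picture as the shuffle=False example, only at a smaller size.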

Conclusion

Scikit-learn provides make_biclusters for block diagonal patterns and make_checkerboard for checkerboard patterns. Both functions return the true row and column memberships alongside the data, which makes them useful for testing bi-clustering algorithms against a known cluster structure.
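
As one possible way to use such a dataset for testing, the sketch below fits scikit-learn's SpectralCoclustering to a shuffled block diagonal array and compares the recovered biclusters with the true ones using consensus_score. The choice of algorithm and metric is an assumption made for illustration; any bi-clustering method that exposes row and column indicators could be evaluated in the same way.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Shuffled data, so the algorithm cannot rely on the ordering
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=True, random_state=0
)

# Fit a co-clustering model with the same number of clusters
model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)

# biclusters_ is a (rows, columns) pair of indicator arrays, like the ground truth
score = consensus_score(model.biclusters_, (rows, columns))
print(f"Consensus score: {score:.3f}")
```

The consensus score lies between 0 and 1, where 1 means the recovered biclusters match the true ones exactly.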

Updated on: 2026-03-26T22:11:06+05:30
