What are the algorithms of Grid-Based Clustering?

A grid is an efficient way to organize a set of data, at least in low dimensions. The idea is to divide the possible values of each attribute into a number of contiguous intervals, creating a set of grid cells. Each object falls into the grid cell whose attribute intervals contain the values of the object.

Objects can be assigned to grid cells in one pass through the data, and information about each cell, such as the number of points in the cell, can be gathered at the same time.
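The single-pass assignment can be illustrated with a short Python sketch. It assumes equal-width cells and a known grid origin; the function name `assign_to_cells` and the sample points are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def assign_to_cells(points, origin, cell_width):
    """Map each point to its grid cell index and count the points per cell.

    Assumes every attribute uses the same interval width (cell_width).
    """
    counts = Counter()
    for p in points:
        # The cell index along each attribute is the interval the value falls into.
        cell = tuple(math.floor((x - o) / cell_width) for x, o in zip(p, origin))
        counts[cell] += 1
    return counts


# Illustrative two-dimensional points (made up for this example).
points = [(0.2, 0.3), (0.4, 0.9), (1.5, 0.1), (1.7, 0.2), (2.9, 2.9)]
counts = assign_to_cells(points, origin=(0.0, 0.0), cell_width=1.0)
```

Only one loop over the data is needed, and the per-cell counts are available as soon as the loop finishes.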

There are several ways to perform clustering using a grid, but most approaches are based on density. The basic grid-based clustering algorithm is as follows −

  • Define a set of grid cells.

  • Assign objects to the appropriate cells and compute the density of each cell.

  • Remove cells having a density below a defined threshold, r.

  • Form clusters from contiguous (adjacent) groups of dense cells.
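The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes equal-width cells anchored at the origin, uses the point count of a cell as its density, and treats cells that touch at a face or a corner as adjacent. The function name `grid_cluster` and its parameters are hypothetical.

```python
from collections import Counter, deque
from itertools import product
import math


def grid_cluster(points, cell_width, threshold):
    """Return a dict mapping each dense cell to a cluster label."""
    # Steps 1-2: define equal-width cells implicitly and count points per cell.
    counts = Counter(tuple(math.floor(x / cell_width) for x in p) for p in points)

    # Step 3: remove cells whose density (here, point count) is below the threshold.
    dense = {cell for cell, n in counts.items() if n >= threshold}

    # Step 4: flood-fill over adjacent dense cells to form clusters.
    dims = len(next(iter(dense))) if dense else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    labels = {}
    next_label = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


# Illustrative points (made up): two dense groups and one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.6),
          (5.1, 5.2), (5.3, 5.4), (9.0, 0.5)]
labels = grid_cluster(points, cell_width=1.0, threshold=2)
```

With these inputs, the cells (0, 0) and (1, 0) are adjacent and dense, so they share a label; the cell (5, 5) forms its own cluster; and the cell containing the isolated point is dropped in step 3.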

Defining grid cells − This is a key step in the process, but also the least well defined, as there are many ways to divide the possible values of each attribute into a number of contiguous intervals. For continuous attributes, one common approach is to divide the values into equal-width intervals. If this approach is applied to each attribute, then the resulting grid cells all have the same volume, and the density of a cell can conveniently be defined as the number of points in the cell.

The Density of Grid Cells − A natural way to define the density of a grid cell is as the number of points divided by the volume of the region. In other words, density is the number of points per unit of space, regardless of the dimensionality of that space.
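As a small worked example of this definition, the sketch below computes density as the point count divided by the cell volume, where the volume is the product of the interval widths along each attribute. The function name `cell_density` and the numbers are assumptions made for illustration.

```python
import math


def cell_density(num_points, interval_widths):
    """Density of a cell: number of points divided by the cell's volume."""
    # The volume is the product of the interval widths, in any number of dimensions.
    volume = math.prod(interval_widths)
    return num_points / volume


# A 2 x 3 cell holding 12 points has 2 points per unit of area.
wide_cell = cell_density(12, (2.0, 3.0))

# With unit-width intervals the volume is 1, so density equals the point count.
unit_cell = cell_density(12, (1.0, 1.0))
```

When every cell has the same volume, dividing by it does not change the ranking of cells, which is why the raw point count can stand in for density on an equal-width grid.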

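Forming Clusters from Dense Grid Cells − Forming clusters from adjacent groups of dense cells is relatively straightforward. There are, however, some issues. First, it is necessary to define what is meant by adjacent cells; for example, a two-dimensional grid cell can be considered to have either 4 or 8 adjacent neighbors. Second, the basic approach has some limitations that can be addressed by making the algorithm slightly more sophisticated. For instance, there are likely to be partially empty cells on the boundary of a cluster; such cells are often not dense, so they may be discarded and part of the cluster lost.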
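The choice of adjacency can be made explicit in code. The sketch below is one possible way to enumerate the neighbors of a cell under two definitions: cells that share a face only, or cells that also touch at an edge or corner. The function name `neighbours` and the flag `include_diagonals` are hypothetical.

```python
from itertools import product


def neighbours(cell, include_diagonals):
    """List the cells adjacent to `cell` under the chosen adjacency definition."""
    result = []
    for offset in product((-1, 0, 1), repeat=len(cell)):
        moved = sum(1 for o in offset if o != 0)
        if moved == 0:
            continue  # the cell itself is not its own neighbor
        if not include_diagonals and moved > 1:
            continue  # face adjacency: exactly one coordinate may change
        result.append(tuple(c + o for c, o in zip(cell, offset)))
    return result


face_2d = neighbours((0, 0), include_diagonals=False)
all_2d = neighbours((0, 0), include_diagonals=True)
```

In two dimensions this yields 4 neighbors under face adjacency and 8 when diagonals are included, so two dense cells that touch only at a corner end up in the same cluster under one definition and in different clusters under the other.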

It is possible to enhance basic grid-based clustering by using more than just density information. In many cases, the data has both spatial and non-spatial attributes. In other words, some attributes describe the location of objects in time or space, while other attributes describe other aspects of the objects.

A common example is houses, which have both a location and a number of other characteristics, such as price or floor space in square feet. Because of spatial (or temporal) autocorrelation, objects in a particular cell often have similar values for their other attributes.
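One way to use such non-spatial information is to summarize it per cell while the objects are being assigned to the grid. The sketch below assumes each record is a hypothetical `(x, y, value)` triple, for example a house location and its price, and computes the mean value in each cell; the function name and the sample records are made up for illustration.

```python
from collections import defaultdict
import math


def mean_attribute_per_cell(records, cell_width):
    """Average a non-spatial attribute over the objects in each grid cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for x, y, value in records:
        # Only the spatial attributes (x, y) decide which cell the object falls into.
        cell = (math.floor(x / cell_width), math.floor(y / cell_width))
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


# Illustrative records: (x, y, price in thousands), invented for this example.
houses = [(0.2, 0.4, 300.0), (0.7, 0.1, 320.0), (3.5, 3.2, 150.0), (3.9, 3.8, 170.0)]
means = mean_attribute_per_cell(houses, cell_width=1.0)
```

A per-cell summary like this could then be considered alongside density when deciding which cells to keep or merge, although how to combine the two is a design choice the basic algorithm leaves open.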