How can we approach the problem of clustering with obstacles?


A partitioning clustering method is preferable because it minimizes the distance between objects and their cluster centers. If the k-means method is chosen, however, a cluster center may not be accessible given the presence of obstacles.

For instance, the cluster center could turn out to be in the middle of a lake. In contrast, the k-medoids method chooses an object within the cluster as its center and thus guarantees that such a problem cannot occur.
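The difference can be seen in a small sketch. The circular lake, the `in_lake` helper, and the sample points are assumptions made up for illustration; only the contrast between a mean center and a medoid comes from the text.

```python
import math

def mean_center(points):
    """k-means style center: the coordinate-wise average of the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def medoid(points, dist=math.dist):
    """k-medoids style center: the point with the smallest total distance to the others."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))

def in_lake(point, center=(0.0, 0.0), radius=1.0):
    """Hypothetical obstacle: a circular lake of radius 1 around the origin."""
    return math.dist(point, center) < radius

# Objects scattered around the shore of the lake.
shore = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (2.0, 1.0)]

print(in_lake(mean_center(shore)))  # the average lands in the water
print(in_lake(medoid(shore)))       # the medoid is one of the shore objects
```

Because the medoid is always an actual object, it can only be inside an obstacle if an object is.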

Each time a new medoid is selected, the distance between each object and its newly selected cluster center has to be recomputed. Because there can be obstacles between two objects, the distance between them may have to be derived by geometric computations (e.g., involving triangulation).

The computational cost can get high if a large number of objects and obstacles are involved. The clustering with obstacles problem can be represented using a graphical notation. First, a point, p, is visible from another point, q, in the region, R, if the straight line joining p and q does not intersect any obstacle.
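A minimal sketch of the visibility test, assuming obstacles are given as polygons (lists of vertex tuples), that p and q lie outside every obstacle, and that the line never passes exactly through an obstacle vertex. The function names are illustrative, not taken from any library.

```python
def orient(a, b, c):
    """Signed area test: > 0 if a, b, c turn counter-clockwise, < 0 if clockwise, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def visible(p, q, obstacles):
    """p is visible from q if segment pq crosses no edge of any obstacle polygon."""
    for poly in obstacles:
        n = len(poly)
        for i in range(n):
            if properly_cross(p, q, poly[i], poly[(i + 1) % n]):
                return False
    return True

# A square obstacle standing between x = 1 and x = 2.
square = [(1, -1), (2, -1), (2, 1), (1, 1)]
print(visible((0, 0), (3, 0), [square]))  # the line runs through the square
print(visible((0, 0), (0, 3), [square]))  # the line stays clear of it
```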

A visibility graph is the graph, VG = (V, E), such that each vertex of the obstacles has a corresponding node in V, and two nodes, v1 and v2, in V are joined by an edge in E if and only if the corresponding vertices they represent are visible to each other.

Let VG’ = (V’, E’) be a visibility graph created from VG by adding two additional points, p and q, to V’. E’ contains an edge joining two points in V’ if the two points are mutually visible. The shortest obstacle-avoiding path between p and q is then a path in VG’.
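One way to turn this definition into an obstructed distance is to build VG’ explicitly and run a shortest-path search over it. The sketch below assumes convex polygonal obstacles, points in general position (no segment passing exactly through a vertex), and uses Dijkstra's algorithm as the search; these choices and all function names are assumptions for illustration.

```python
import heapq
import math
from itertools import combinations

def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def properly_cross(a, b, c, d):
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def strictly_inside(pt, poly):
    """True if pt lies strictly inside a convex polygon."""
    signs = [orient(poly[i], poly[(i + 1) % len(poly)], pt) for i in range(len(poly))]
    return all(s > 0 for s in signs) or all(s < 0 for s in signs)

def visible(a, b, obstacles):
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    for poly in obstacles:
        # The midpoint check rejects diagonals that run through an obstacle's interior.
        if strictly_inside(mid, poly):
            return False
        n = len(poly)
        if any(properly_cross(a, b, poly[i], poly[(i + 1) % n]) for i in range(n)):
            return False
    return True

def obstructed_distance(p, q, obstacles):
    """Shortest path length from p to q in VG' (obstacle vertices plus p and q)."""
    nodes = [p, q] + [v for poly in obstacles for v in poly]
    adj = {v: [] for v in nodes}
    for a, b in combinations(nodes, 2):
        if visible(a, b, obstacles):
            w = math.dist(a, b)
            adj[a].append((b, w))
            adj[b].append((a, w))
    best = {p: 0.0}
    heap = [(0.0, p)]
    while heap:  # Dijkstra's algorithm
        d, u = heapq.heappop(heap)
        if u == q:
            return d
        if d > best.get(u, math.inf):
            continue
        for v, w in adj[u]:
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return math.inf

square = [(1, -1), (2, -1), (2, 1), (1, 1)]
d = obstructed_distance((0, 0), (3, 0), [square])
print(round(d, 4))  # 3.8284, versus a straight-line distance of 3
```

The path found goes from p to a corner of the square, along one side, and then to q, which is why its length is 1 + 2√2.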

To reduce the cost of distance computation between any two objects or points, several preprocessing and optimization techniques can be used. One approach groups points that are close together into microclusters. This can be done by first triangulating the region R into triangles, and then grouping nearby points in the same triangle into microclusters, using a method similar to BIRCH or DBSCAN.

By processing microclusters rather than individual points, the overall computation is reduced. After that, precomputation can be performed to build two kinds of join indices based on the computation of the shortest paths:

  • VV indices, for any pair of obstacle vertices.

  • MV indices, for any pair of microcluster and obstacle vertex. Using these indices helps further optimize the overall performance.

With such precomputation and optimization, the distance between any two points (at the granularity level of microclusters) can be computed efficiently. Thus, the clustering process can be performed in a manner similar to a typical efficient k-medoids algorithm, such as CLARANS, and achieve good clustering quality for large data sets.
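To show how an obstructed distance plugs into the clustering step, here is a minimal sketch. It uses a simple alternating k-medoids loop (assign, then re-pick medoids), not CLARANS, and takes the distance as a function argument. The wall with a single bridge, the `wall_distance` helper, and the sample points are assumptions invented for the demo; the points are assumed to be distinct.

```python
import math

def assign(points, medoids, dist):
    """Attach every point to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return clusters

def k_medoids(points, k, dist, max_iter=100):
    medoids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids, dist)
        # In each cluster, pick the member with the smallest total distance to the rest.
        updated = [min(c, key=lambda cand: sum(dist(cand, x) for x in c))
                   for c in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids, dist)

BRIDGE = (0.0, 0.0)

def wall_distance(a, b):
    """Toy obstructed distance: a wall along x = 0 that can only be crossed at BRIDGE."""
    if a[0] * b[0] < 0:  # opposite sides of the wall
        return math.dist(a, BRIDGE) + math.dist(BRIDGE, b)
    return math.dist(a, b)

points = [(-1.0, 5.0), (1.0, 5.0), (-1.0, 6.0), (1.0, 6.0), (-2.0, 5.5), (2.0, 5.5)]
medoids, clusters = k_medoids(points, 2, wall_distance)
for m, members in clusters.items():
    print(m, members)
```

Although points such as (-1, 5) and (1, 5) are close in a straight line, the detour through the bridge keeps the two banks in separate clusters.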

Updated on 17-Feb-2022 11:08:03