What is STREAM?
STREAM is a single-pass, constant-factor approximation algorithm that was developed for the k-medians problem. The k-medians problem is to cluster N data points into k clusters or groups such that the sum of squared errors (SSQ) between the points and the cluster centers to which they are assigned is minimized. The idea is to assign similar points to the same cluster, where these points are dissimilar from points in other clusters.
In the stream data model, data points can only be seen once, and memory and time are limited. To achieve high-quality clustering under these constraints, the STREAM algorithm processes data streams in buckets (or batches) of m points, with each bucket fitting in main memory.
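The SSQ objective described above can be illustrated with a few lines of Python. This is a minimal sketch, assuming points and centers are plain tuples of floats and that distance is Euclidean; the function name `ssq` is a hypothetical helper chosen for this example.

```python
def ssq(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance to the closest center.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

# Two centers, one per natural group: every point is at squared distance 1.
print(ssq(points, [(0.0, 1.0), (10.0, 1.0)]))  # 4.0

# A single center in the middle: every point is at squared distance 26.
print(ssq(points, [(5.0, 1.0)]))  # 104.0
```

A lower SSQ means the chosen centers summarize the points more tightly, which is the quantity a k-medians style algorithm tries to minimize.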
For each bucket, b_i, STREAM clusters the bucket’s points into k clusters. It then summarizes the bucket information by retaining only the information regarding the k centers, with each cluster center being weighted by the number of points assigned to its cluster.
STREAM then discards the points, retaining only the center information. Once enough centers have been collected, the weighted centers are clustered again to produce another set of O(k) cluster centers. This is repeated so that at every level, at most m points are retained. This approach results in a one-pass, O(kN)-time, O(N^ε)-space (for some constant ε < 1), constant-factor approximation algorithm for data stream k-medians.
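The bucket-and-summarize idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the published algorithm: a simple weighted Lloyd-style routine (`weighted_kmeans`, a hypothetical helper) stands in for the constant-factor approximation step that STREAM actually uses, and the multi-level hierarchy is collapsed into a single list of retained centers that is re-clustered whenever it grows past m entries.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Stand-in clustering step: returns a list of (center, weight) pairs."""
    rng = random.Random(seed)
    centers = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in centers]
        totals = [0.0] * len(centers)
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points (if it has any).
        centers = [tuple(s / t for s in sums[j]) if t > 0 else centers[j]
                   for j, t in enumerate(totals)]
    # Weight each final center by the total weight assigned to it.
    totals = [0.0] * len(centers)
    for p, w in zip(points, weights):
        totals[nearest(p, centers)] += w
    return [(c, t) for c, t in zip(centers, totals) if t > 0]


def recluster(weighted_centers, k):
    points = [c for c, _ in weighted_centers]
    weights = [w for _, w in weighted_centers]
    return weighted_kmeans(points, weights, k)


def stream_cluster(stream, k, m):
    """One pass over the stream, keeping only weighted centers in memory."""
    retained, bucket = [], []
    for point in stream:
        bucket.append(tuple(point))
        if len(bucket) == m:
            # Summarize the bucket as at most k weighted centers, drop the points.
            retained += weighted_kmeans(bucket, [1.0] * m, k)
            bucket = []
            if len(retained) > m:
                # Too many centers collected: cluster the centers themselves.
                retained = recluster(retained, k)
    if bucket:
        retained += weighted_kmeans(bucket, [1.0] * len(bucket), k)
    return recluster(retained, k) if retained else []


# Toy one-dimensional stream alternating between two well-separated groups.
stream = [((i % 2) * 100.0 + (i % 5),) for i in range(200)]
result = stream_cluster(stream, k=2, m=10)
for center, weight in result:
    print(center, weight)
```

Because every center carries the number of points it represents, the total weight of the final centers equals the number of points seen, even though the raw points were discarded bucket by bucket.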
STREAM derives quality k-medians clusters with limited space and time. However, it considers neither the evolution of the data nor time granularity. The clustering can become dominated by the older, outdated data of the stream. The nature of the clusters can vary with both the moment at which they are computed and the time horizon over which they are measured.
For example, a user may wish to examine clusters occurring last week, last month, or last year. These may be different. Hence, a data stream clustering algorithm must also support the flexibility to compute clusters over user-defined time periods in an interactive manner.
CluStream is an algorithm for the clustering of evolving data streams based on user-specified, online clustering queries. It divides the clustering process into online and offline components.
The online component computes and stores summary statistics about the data stream using micro-clusters, and performs incremental online computation and maintenance of the micro-clusters. The offline component performs macro-clustering and answers various user questions using the stored summary statistics, which are based on the tilted time frame model.
To cluster evolving data streams based on both historical and current stream data information, the tilted time frame model (such as a progressive logarithmic model) is adopted, which stores the snapshots of a set of micro-clusters at different levels of granularity depending on recency.
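As an illustration of what such summary statistics might look like, the sketch below keeps a count, per-dimension linear and squared sums, and sums of timestamps and squared timestamps. The class name `MicroCluster` and its method names are hypothetical, and the exact fields are an assumption based on the usual cluster-feature style of summary; the point is that the statistics are additive, so points can be absorbed and micro-clusters merged without storing raw data.

```python
class MicroCluster:
    """Additive summary of the points absorbed so far (illustrative only)."""

    def __init__(self, dim):
        self.n = 0                 # number of points
        self.ls = [0.0] * dim      # per-dimension linear sum
        self.ss = [0.0] * dim      # per-dimension sum of squares
        self.lt = 0.0              # sum of timestamps
        self.st = 0.0              # sum of squared timestamps

    def insert(self, x, t):
        """Absorb one point x observed at time t."""
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v
        self.lt += t
        self.st += t * t

    def merge(self, other):
        """Fold another micro-cluster into this one (additivity)."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.lt += other.lt
        self.st += other.st

    def centroid(self):
        return [s / self.n for s in self.ls]

    def mean_time(self):
        return self.lt / self.n


a = MicroCluster(dim=2)
a.insert((1.0, 2.0), t=1)
a.insert((3.0, 4.0), t=2)

b = MicroCluster(dim=2)
b.insert((5.0, 6.0), t=3)

a.merge(b)
print(a.n, a.centroid(), a.mean_time())  # 3 [3.0, 4.0] 2.0
```

The offline component can then treat each micro-cluster centroid, weighted by its count, as a pseudo-point when it performs macro-clustering.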
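One way to picture a progressive logarithmic time frame is the sketch below. It is a simplified assumption, not the exact scheme used by CluStream: a snapshot taken at clock tick t is filed under the largest order i such that t is divisible by base^i, and each order keeps only its most recent few snapshots. The class name `TiltedTimeFrame` and the parameters `base`, `max_order`, and `capacity` are hypothetical.

```python
from collections import deque


class TiltedTimeFrame:
    """Keeps recent snapshots densely and older snapshots sparsely."""

    def __init__(self, base=2, max_order=10, capacity=2):
        self.base = base
        # One bounded queue per order; old entries fall off automatically.
        self.frames = [deque(maxlen=capacity) for _ in range(max_order + 1)]

    def order_of(self, t):
        """Largest order i (capped at max_order) with t divisible by base**i."""
        order = 0
        while (order < len(self.frames) - 1
               and t % (self.base ** (order + 1)) == 0):
            order += 1
        return order

    def store(self, t, snapshot):
        self.frames[self.order_of(t)].append((t, snapshot))

    def timestamps(self):
        return sorted(t for frame in self.frames for t, _ in frame)


frame = TiltedTimeFrame(base=2, max_order=10, capacity=2)
for tick in range(1, 17):
    # In a real system the snapshot would be the current set of micro-clusters.
    frame.store(tick, snapshot=f"micro-clusters@{tick}")

print(frame.timestamps())  # [4, 8, 10, 12, 13, 14, 15, 16]
```

After 16 ticks only 8 snapshots remain, and they are spaced more closely near the present than in the past, which is the recency-dependent granularity the model is meant to provide.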