Data Mining Articles


How can we discover frequent substructures?

Ginni
Updated on 25-Nov-2021 432 Views

The discovery of frequent substructures usually consists of two steps. The first step generates frequent substructure candidates. The second step checks the frequency of each candidate. Most studies on frequent substructure discovery focus on optimizing the first step, because the second step involves a subgraph isomorphism test, which is NP-complete and therefore computationally expensive. There are various methods for frequent substructure mining, including the following. Apriori-based approach: Apriori-based frequent substructure mining algorithms share similar characteristics with Apriori-based frequent itemset mining algorithms. The search for frequent graphs begins with graphs of ...
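The two steps described above can be sketched in a few lines of Python. This is a minimal illustration under a strong simplifying assumption that is not part of the original text: every vertex label is unique within a graph, so the subgraph isomorphism test reduces to a plain subset check on edge sets. The function names and the edge-set representation are hypothetical, and a real miner would need a genuine isomorphism test.

```python
def support(candidate, graphs):
    """Step 2: count the graphs that contain every edge of the candidate."""
    return sum(1 for g in graphs if candidate <= g)


def frequent_substructures(graphs, min_support):
    """Level-wise search; each graph is a frozenset of edges,
    and each edge is a frozenset of two vertex labels.

    Assumption: vertex labels are unique per graph, so containment
    is a subset test rather than a subgraph isomorphism test.
    """
    all_edges = set().union(*graphs)
    frequent_edges = {
        e for e in all_edges
        if support(frozenset([e]), graphs) >= min_support
    }
    level = {frozenset([e]) for e in frequent_edges}
    result = {}
    while level:
        for c in level:
            result[c] = support(c, graphs)
        # Step 1: generate candidates by attaching one frequent edge
        # that shares a vertex with the current connected substructure.
        candidates = set()
        for c in level:
            vertices = set().union(*c)
            for e in frequent_edges:
                if e not in c and e & vertices:
                    candidates.add(c | {e})
        # Step 2: keep only the candidates that are frequent enough.
        level = {c for c in candidates if support(c, graphs) >= min_support}
    return result
```

For example, with three small graphs over the edges A-B, B-C, and C-D and a minimum support of 2, the sketch reports the three single edges and the two-edge path A-B-C as frequent.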


What is Periodicity analysis?

Ginni
Updated on 25-Nov-2021 3K+ Views

Periodicity analysis is the mining of periodic patterns, that is, the search for recurring patterns in time-related sequence data. It applies to many important areas. For example, seasons, tides, planet trajectories, daily power consumption, daily traffic patterns, and weekly TV programs all exhibit periodic patterns. Periodicity analysis is often performed over time-series data, which consists of sequences of values or events typically measured at equal time intervals (e.g., hourly, daily, weekly). It can also be applied to other time-related sequence data where the value or event may occur at unequal time intervals or at any time (e.g., online transactions). ...
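One common way to look for a recurring pattern in an equally spaced series is autocorrelation. The excerpt does not name a specific technique, so the sketch below is an assumption chosen for illustration: it scores each candidate lag and picks the one at which the series best matches a shifted copy of itself. The function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no measurable periodicity
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean)
        for i in range(n - lag)
    )
    return cov / var


def best_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(
        range(1, max_lag + 1),
        key=lambda lag: autocorrelation(series, lag),
    )
```

On a series that repeats the values 1, 5, 2, 8 over and over, the highest score falls at lag 4. This approach assumes equal time intervals; irregularly timed events such as online transactions would need a different treatment.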


What is a time-series database?

Ginni
Updated on 25-Nov-2021 1K+ Views

A time-series database consists of sequences of values or events obtained over repeated measurements of time. The values are typically measured at equal time intervals (e.g., hourly, daily, weekly). Time-series databases are popular in many applications, such as stock market analysis, economic and sales forecasting, budgetary analysis, utility studies, inventory studies, yield projections, workload projections, process and quality control, observation of natural phenomena (such as atmosphere, temperature, wind, and earthquakes), scientific and engineering experiments, and medical treatments. A time-series database is also a sequence database. A sequence database is any database that consists of sequences of ordered events, with or without a concrete ...


What is CluStream?

Ginni
Updated on 25-Nov-2021 2K+ Views

CluStream is an algorithm for clustering evolving data streams based on user-specified, online clustering queries. It divides the clustering process into online and offline components. The online component computes and stores summary statistics about the data stream using micro-clusters, and performs incremental online computation and maintenance of the micro-clusters. The offline component performs macro-clustering and answers various user queries using the stored summary statistics, which are based on the tilted time frame model. To cluster evolving data streams based on both historical and current stream data, the tilted time frame model (such as a progressive logarithmic model) is adopted, ...
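The kind of summary statistics a micro-cluster might hold can be sketched as follows. The excerpt does not list the exact fields, so this assumes a simplified micro-cluster that keeps a point count, per-dimension linear and squared sums, and timestamp sums. The class and attribute names are hypothetical. Because every field is a running sum, inserting a point and merging two micro-clusters are both cheap incremental operations.

```python
import math


class MicroCluster:
    """Additive summary of the stream points absorbed so far (simplified)."""

    def __init__(self, dim):
        self.n = 0                 # number of points absorbed
        self.ls = [0.0] * dim      # per-dimension linear sum of values
        self.ss = [0.0] * dim      # per-dimension sum of squared values
        self.ts_sum = 0.0          # sum of timestamps
        self.ts_sq_sum = 0.0       # sum of squared timestamps

    def insert(self, point, timestamp):
        """Absorb one point without storing the point itself."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.ts_sum += timestamp
        self.ts_sq_sum += timestamp * timestamp

    def merge(self, other):
        """Combine two summaries by adding their running sums."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.ts_sum += other.ts_sum
        self.ts_sq_sum += other.ts_sq_sum

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root of the summed per-dimension variances."""
        var = sum(
            max(0.0, sq / self.n - (s / self.n) ** 2)
            for s, sq in zip(self.ls, self.ss)
        )
        return math.sqrt(var)
```

An offline macro-clustering step could then treat the centroids of the stored micro-clusters as weighted points, which is one way to answer queries without revisiting the raw stream.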


What is the Hoeffding Tree Algorithm?

Ginni
Updated on 25-Nov-2021 6K+ Views

The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web clickstreams and construct models to predict which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a decision tree nearly identical to that of traditional batch learners. It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound (or additive Chernoff bound). Suppose we make N independent observations of a random ...
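In its standard form, the Hoeffding bound says that after N independent observations of a random variable with range R, the true mean is within epsilon of the sample mean with probability 1 - delta, where epsilon = sqrt(R^2 ln(1/delta) / (2N)). The sketch below computes that value and applies it as a split test. The function names and the split rule, which compares the two best attributes' evaluation scores, are illustrative assumptions rather than a reproduction of the truncated text.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the true mean lies within epsilon of the
    sample mean with probability 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_score, second_score, value_range, delta, n):
    """Split once the observed lead of the best attribute exceeds epsilon.

    Assumption: scores come from some split criterion (for example,
    information gain) whose values span `value_range`.
    """
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_score - second_score) > epsilon
```

Because epsilon shrinks as n grows, a node that cannot decide yet simply waits for more stream examples; this is why a small sample is often sufficient to commit to a split.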


What is a distance-based outlier?

Ginni
Updated on 25-Nov-2021 3K+ Views

An object o in a data set S is a distance-based (DB) outlier with parameters p and d, i.e., DB(p, d), if at least a fraction p of the objects in S lie at a distance greater than d from o. In other words, instead of depending on statistical tests, we can think of distance-based outliers as those objects that do not have enough neighbors. Neighbors are defined based on distance from the given object. In comparison with statistical-based methods, distance-based outlier detection generalizes and unifies the ideas behind discordancy testing for standard distributions. Hence, a distance-based outlier is also ...
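The DB(p, d) definition translates almost directly into code. The following is a minimal brute-force sketch, assuming Euclidean distance and points given as tuples of numbers; neither choice is stated in the excerpt, and the function names are hypothetical. It compares every pair of objects, so it is only suitable for small data sets.

```python
import math


def is_db_outlier(o, data, p, d):
    """True if at least a fraction p of the objects in `data`
    lie at a distance greater than d from o."""
    far = sum(1 for x in data if math.dist(o, x) > d)
    return far / len(data) >= p


def db_outliers(data, p, d):
    """Return every object in `data` that is a DB(p, d) outlier."""
    return [o for o in data if is_db_outlier(o, data, p, d)]
```

For instance, with four points clustered near the origin and one point at (10, 10), setting p = 0.75 and d = 3 flags only the distant point, because four of the five objects lie farther than 3 from it.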


What is Conceptual Clustering?

Ginni
Updated on 24-Nov-2021 3K+ Views

Conceptual clustering is a form of clustering in machine learning that, given a set of unlabeled objects, produces a classification scheme over the objects. Unlike conventional clustering, which generally identifies groups of similar objects, conceptual clustering goes one step further by also discovering characteristic descriptions for each group, where each group represents a concept or class. Therefore, conceptual clustering is a two-step process: clustering is performed first, followed by characterization. Thus, clustering quality is not solely a function of the individual objects. Most techniques of conceptual clustering adopt a statistical approach that uses probability measurements to determine the concepts or clusters. Probabilistic ...


What are the types of Constraint-Based Cluster Analysis?

Ginni
Updated on 24-Nov-2021 4K+ Views

Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints. Depending on the nature of the constraints, constraint-based clustering can adopt rather different approaches. There are several categories of constraints, including the following. Constraints on individual objects: Constraints can be specified on the objects to be clustered. In a real estate application, for instance, one may want to spatially cluster only those luxury mansions worth over a million dollars. This constraint confines the set of objects to be clustered. It can easily be handled by preprocessing (e.g., performing selection using an SQL query), after which ...
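The preprocessing idea for constraints on individual objects can be illustrated with a simple selection step. The sketch below filters records in Python instead of SQL; the field names, the sample listings, and the function name are all hypothetical and exist only for illustration.

```python
# Hypothetical real estate records; none of these values come from the source.
listings = [
    {"id": 1, "kind": "mansion", "price": 2_500_000, "x": 3.1, "y": 7.4},
    {"id": 2, "kind": "condo", "price": 450_000, "x": 1.0, "y": 2.2},
    {"id": 3, "kind": "mansion", "price": 1_200_000, "x": 3.4, "y": 7.9},
    {"id": 4, "kind": "mansion", "price": 900_000, "x": 8.5, "y": 0.6},
]


def select_objects(records, kind, min_price):
    """Keep only the records that satisfy the object-level constraint."""
    return [
        r for r in records
        if r["kind"] == kind and r["price"] > min_price
    ]


# Only the selected objects would be passed on to a clustering step.
selected = select_objects(listings, "mansion", 1_000_000)
```

Here the selection keeps listings 1 and 3, and any spatial clustering algorithm could then run on their coordinates alone.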


What is Expectation-Maximization?

Ginni
Updated on 24-Nov-2021 1K+ Views

The EM (Expectation-Maximization) algorithm is a popular iterative refinement algorithm that can be used to find parameter estimates. It can be considered an extension of the k-means paradigm, which assigns an object to the cluster with which it is most similar, based on the cluster mean. Instead of assigning each object to a single cluster, EM assigns each object to a cluster according to a weight representing its probability of membership. In other words, there are no strict boundaries between clusters. Thus, new means are computed based on weighted measures. EM begins with an initial estimate or “guess” of the parameters of the mixture model (collectively referred to as the parameter ...
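The contrast with k-means is easiest to see in code: the expectation step produces membership weights instead of hard assignments, and the maximization step recomputes each mean as a weighted average. The sketch below is a deliberately reduced version for one-dimensional data, assuming Gaussian components with a shared, fixed standard deviation and equal mixing proportions. Those simplifications and the function names are assumptions made for illustration; a full EM implementation would also re-estimate the variances and mixing proportions.

```python
import math


def e_step(data, means, sigma=1.0):
    """Membership weight of each point in each cluster (rows sum to 1)."""
    weights = []
    for x in data:
        dens = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
        total = sum(dens)
        weights.append([v / total for v in dens])
    return weights


def m_step(data, weights, k):
    """New cluster means as weighted averages of all points."""
    new_means = []
    for j in range(k):
        wsum = sum(w[j] for w in weights)
        new_means.append(
            sum(w[j] * x for w, x in zip(weights, data)) / wsum
        )
    return new_means


def em(data, means, sigma=1.0, iters=20):
    """Alternate the two steps for a fixed number of iterations."""
    for _ in range(iters):
        weights = e_step(data, means, sigma)
        means = m_step(data, weights, len(means))
    return means, e_step(data, means, sigma)
```

Starting from the rough guesses 0 and 6 on data grouped around 1 and 5, the means settle close to 1 and 5, while every point retains a small nonzero weight in the other cluster.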


Why is wavelet transformation useful for clustering?

Ginni
Updated on 24-Nov-2021 1K+ Views

WaveCluster is a multiresolution clustering algorithm that first summarizes the data by imposing a multidimensional grid structure onto the data space. It then uses a wavelet transformation to transform the original feature space, finding dense regions in the transformed space. In this method, each grid cell summarizes the information of a group of points that map into the cell. This summary information typically fits into main memory for use by the multiresolution wavelet transform and the subsequent cluster analysis. A wavelet transform is a signal processing approach that decomposes a signal into multiple frequency subbands. The wavelet model can be used ...
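The decomposition into frequency subbands can be illustrated with the Haar wavelet, the simplest wavelet. The excerpt does not say which wavelet WaveCluster uses, so the choice of Haar, the averaging form of the transform, and the function names below are assumptions for illustration. Each step splits a one-dimensional signal into a low-frequency approximation (pairwise averages) and a high-frequency detail band (pairwise half-differences).

```python
def haar_step(signal):
    """One level of the Haar transform on an even-length signal."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Apply haar_step repeatedly to the approximation band.

    Returns the coarsest approximation and the detail bands,
    ordered from the finest level to the coarsest.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details
```

Applied to a row of grid-cell counts, each further level gives a coarser view of where the points concentrate, which is the sense in which the approach is multiresolution.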
