Found 426 Articles for Data Mining

Updated on 16-Feb-2022 12:24:47
ROCK stands for Robust Clustering using links. It is a hierarchical clustering algorithm that uses the concept of links (the number of common neighbours between two objects) for data with categorical attributes. It shows that distance measures alone cannot lead to high-quality clusters when clustering categorical data. Moreover, most clustering algorithms assess only the similarity between points when clustering; that is, at each step, the most similar points are merged into a single cluster. This “localized” approach is prone to errors. For instance, two distinct clusters can have a few points or outliers that are close together; thus, relying on the similarity among points to ...
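The link count described above can be sketched in a few lines. This is a minimal illustration, not the full ROCK algorithm: the Jaccard similarity, the threshold name `theta`, and the sample baskets are assumptions chosen for the example, and a point is not counted as its own neighbour here.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of categorical items."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def neighbours(points, theta):
    """Map each point index to the indices whose similarity is at least theta."""
    return {
        i: {j for j, q in enumerate(points) if j != i and jaccard(p, q) >= theta}
        for i, p in enumerate(points)
    }


def link(nbrs, i, j):
    """Number of common neighbours shared by points i and j."""
    return len(nbrs[i] & nbrs[j])


# Hypothetical market baskets (illustrative data only).
baskets = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"x", "y", "z"},
    {"x", "y", "w"},
]
nbrs = neighbours(baskets, theta=0.5)
print(link(nbrs, 0, 1))  # baskets 0 and 1 share basket 2 as a neighbour
print(link(nbrs, 0, 3))  # baskets from different groups share no neighbours
```

Counting shared neighbours brings in information from the surrounding points, which is the contrast with the purely pairwise similarity that the summary calls “localized”.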
Updated on 16-Feb-2022 12:23:12
The k-means algorithm takes the input parameter, k, and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster’s centroid or center of gravity. The k-means algorithm proceeds as follows. First, it randomly selects k of the objects, each of which initially represents a cluster mean or center. For each of the remaining objects, an object is assigned to the cluster to which it is the ...
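The procedure outlined above can be sketched as follows. This is a minimal version under stated assumptions: squared Euclidean distance is used as the (dis)similarity, the function name `kmeans`, the `iters` and `seed` parameters, and the sample points are all hypothetical, and an empty cluster simply keeps its previous center.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Partition points (tuples of floats) into k clusters around their means."""
    rng = random.Random(seed)
    # Randomly pick k of the objects as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        # Assign every object to the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Recompute each center as the mean of the objects assigned to it.
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so the assignment is stable
            break
        centers = new_centers
    return centers, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

On this small, well-separated data set the two centers settle on the means of the low and high groups; on real data the result depends on the random initial choice.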
Updated on 16-Feb-2022 12:18:00
A binary variable has only two states, 0 or 1, where 0 means that the variable is absent and 1 means that it is present. Given the variable smoker describing a patient, for example, 1 indicates that the patient smokes, while 0 indicates that the patient does not. Treating binary variables as if they are interval-scaled can lead to misleading clustering results. Hence, methods specific to binary data are necessary for computing dissimilarities. One approach involves computing a dissimilarity matrix from the given binary data. If all binary variables are thought of as having ...
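One way to compute such a dissimilarity is from the counts of matching and mismatching variables. The sketch below is an assumption about what the truncated text goes on to describe: it uses the simple matching form for symmetric variables and a Jaccard-style form (ignoring joint absences) for asymmetric ones. The function name and the patient vectors are hypothetical.

```python
def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two equal-length 0/1 vectors.

    q: both 1, r: x is 1 and y is 0, s: x is 0 and y is 1, t: both 0.
    Symmetric:  (r + s) / (q + r + s + t)
    Asymmetric: (r + s) / (q + r + s), so joint absences (t) are ignored.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = q + r + s + (t if symmetric else 0)
    return (r + s) / denom if denom else 0.0


# Hypothetical patient records: six yes/no attributes each.
patient_a = [1, 0, 1, 0, 0, 0]
patient_b = [1, 0, 1, 0, 1, 0]
print(binary_dissimilarity(patient_a, patient_b))                   # one mismatch out of six
print(binary_dissimilarity(patient_a, patient_b, symmetric=False))  # one mismatch out of three
```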
Updated on 16-Feb-2022 12:01:16
Interval-scaled variables are continuous measurements on a roughly linear scale. Examples include weight and height, latitude and longitude coordinates (e.g., when clustering homes), and weather temperature. The measurement unit used can affect the clustering analysis. For instance, changing measurement units from meters to inches for height, or from kilograms to pounds for weight, can lead to a very different clustering structure. In general, expressing a variable in smaller units will lead to a larger range for that variable, and therefore a larger effect on the resulting clustering structure. To avoid dependence on the choice of measurement units, the data must be ...
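The unit effect can be seen directly in a small example. The sketch below uses z-scores as one common way to remove the dependence on units; this choice is an assumption, since the text above is cut off before naming a method, and the heights are made-up values.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / sd for v in values]


heights_m = [1.60, 1.75, 1.90]                  # hypothetical heights in meters
heights_in = [h * 39.3701 for h in heights_m]   # the same heights in inches

# The raw range grows when the unit shrinks...
print(max(heights_m) - min(heights_m))
print(max(heights_in) - min(heights_in))

# ...but the standardized values agree regardless of unit.
print(zscores(heights_m))
print(zscores(heights_in))
```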
Updated on 16-Feb-2022 11:53:36
ROC stands for Receiver Operating Characteristic. ROC curves are a useful visual tool for comparing two classification models. ROC curves come from signal detection theory, which was developed during World War II for the analysis of radar images. An ROC curve shows the trade-off between the true positive rate or sensitivity (proportion of positive tuples that are correctly identified) and the false positive rate (proportion of negative tuples that are incorrectly identified as positive) for a given model. Given a two-class problem, it allows us to visualize the trade-off between the rate at which the model can accurately recognize ‘yes’ cases versus the rate ...
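The points of an ROC curve can be computed by ranking tuples by the model's score and lowering the decision threshold one tuple at a time. The following is a minimal sketch that assumes labels are 1 (positive) or 0 (negative) and that scores have no ties; the function name and the sample data are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative tuple")
    # Rank tuples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1  # a positive tuple is now classified as positive
        else:
            fp += 1  # a negative tuple is now wrongly classified as positive
        points.append((fp / neg, tp / pos))
    return points


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for fpr, tpr in roc_points(labels, scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A curve that rises quickly toward the top-left corner corresponds to a model that identifies many positives before admitting many false positives.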
Updated on 16-Feb-2022 11:52:19
Generalized linear models provide the theoretical foundation on which linear regression can be applied to the modeling of categorical response variables. In generalized linear models, the variance of the response variable, y, is a function of the mean value of y, unlike in linear regression, where the variance of y is constant. Generalized linear models (GLMs) are an extension of traditional linear models. This algorithm fits generalized linear models to the data by maximizing the log-likelihood. The elastic net penalty can be used for parameter regularization. The model fitting computation is parallel, extremely fast, and scales extremely well for models with ...
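As a small illustration of fitting by maximizing the log-likelihood, the sketch below fits a logistic regression (a GLM with a binary response) using plain gradient ascent. It is not the parallel algorithm the text refers to and applies no elastic net penalty; the function names, learning rate, step count, and data are assumptions for the example.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            mu = sigmoid(b0 + b1 * x)  # current mean of y given x
            g0 += y - mu               # gradient with respect to b0
            g1 += (y - mu) * x         # gradient with respect to b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor values
ys = [0, 0, 1, 0, 1, 1]              # hypothetical binary responses
b0, b1 = fit_logistic(xs, ys)
for x in (0.0, 5.0):
    mu = sigmoid(b0 + b1 * x)
    # For a binary response the variance is mu * (1 - mu), a function of the mean.
    print(f"x={x}: mean={mu:.3f}, variance={mu * (1 - mu):.3f}")
```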
Updated on 16-Feb-2022 11:50:51
CBR stands for case-based reasoning. CBR classifiers use a database of problem solutions to solve new problems. Unlike nearest-neighbor classifiers, which store training tuples as points in Euclidean space, CBR stores the tuples or “cases” for problem solving as complex symbolic descriptions. Business applications of CBR include problem resolution for customer service help desks, where cases describe product-related diagnostic problems. CBR has also been applied to areas such as engineering and law, where cases are technical designs or legal rulings, respectively. Medical education is another area for CBR, where patient case histories and treatments are used to help diagnose and treat ...
Updated on 16-Feb-2022 11:49:48
Backpropagation refers to the whole procedure encompassing both the computation of the gradient and its use in stochastic gradient descent. Technically, backpropagation is used to calculate the gradient of the error of the network with respect to the network’s modifiable weights. Backpropagation is an iterative, recursive and efficient approach for computing the updated weights to improve the network until it can perform the task for which it is being trained. Backpropagation requires the derivatives of the activation function to be known at network design time. Backpropagation is generally used in neural network training and computes the ...
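The gradient computation can be illustrated on the smallest possible network. The sketch below assumes one input, one sigmoid hidden unit, one sigmoid output unit, no biases, and a squared-error loss; the weight names `w1` and `w2`, the learning rate, and the sample values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> hidden activation -> output activation."""
    h = sigmoid(w1 * x)
    out = sigmoid(w2 * h)
    return h, out


def loss(x, target, w1, w2):
    """Squared error of the network output for one training example."""
    _, out = forward(x, w1, w2)
    return 0.5 * (out - target) ** 2


def backprop(x, target, w1, w2):
    """Gradient of the loss with respect to w1 and w2 via the chain rule."""
    h, out = forward(x, w1, w2)
    # Error signal at the output; out * (1 - out) is the sigmoid derivative.
    delta_out = (out - target) * out * (1.0 - out)
    grad_w2 = delta_out * h
    # Propagate the error signal back through w2 and the hidden sigmoid.
    delta_h = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_h * x
    return grad_w1, grad_w2


x, target, w1, w2 = 1.0, 1.0, 0.5, -0.3
g1, g2 = backprop(x, target, w1, w2)

# One gradient descent step using the computed gradients.
lr = 0.5
new_w1, new_w2 = w1 - lr * g1, w2 - lr * g2
print(loss(x, target, w1, w2), loss(x, target, new_w1, new_w2))
```

The sigmoid derivative appears explicitly in both error signals, which is why the derivative of the activation function has to be known in advance.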
Updated on 16-Feb-2022 11:49:01
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Bayesian classifiers have also shown high efficiency and speed when applied to large databases. Once classes are defined, the system should infer rules that govern the classification; therefore, the system should be able to find the description of each class. The descriptions should refer only to the predicting attributes of the training set, so that only the positive examples satisfy the description, not the negative examples. A rule is said to be correct if its description covers ...
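Class membership probabilities follow from Bayes' theorem: the posterior of each class is proportional to its prior times the likelihood of the sample under that class. A minimal sketch, with a hypothetical function name and made-up priors and likelihoods:

```python
def posterior(priors, likelihoods):
    """Posterior P(class | sample) from P(class) and P(sample | class)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(sample), the normalizing constant
    if evidence == 0:
        raise ValueError("sample has zero probability under every class")
    return {c: joint[c] / evidence for c in joint}


# Illustrative numbers only: two classes with assumed priors and likelihoods.
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {"yes": 0.2, "no": 0.5}
probs = posterior(priors, likelihoods)
print(probs)
print(max(probs, key=probs.get))  # the class with the highest posterior
```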
Updated on 16-Feb-2022 11:46:57
An attribute selection measure is a heuristic for selecting the splitting criterion that “best” separates a given data partition, D, of class-labeled training tuples into individual classes. If D were split into smaller partitions according to the outcomes of the splitting criterion, ideally each partition would be pure (i.e., all the tuples that fall into a given partition would belong to the same class). Conceptually, the “best” splitting criterion is the one that most closely results in such a scenario. Attribute selection measures are also called splitting rules because they determine how the tuples at a given node are to be split. The attribute selection ...
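Information gain is one widely used attribute selection measure; it is shown here as an assumed example, since the text above is cut off before naming a specific measure. The sketch scores an attribute by how much splitting on it reduces the entropy of the class labels. Function names and the toy tuples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels; 0 for a pure partition."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning the tuples on attr."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
    {"a": "y", "b": "q"},
]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, "a"))  # splitting on "a" gives pure partitions
print(information_gain(rows, labels, "b"))  # splitting on "b" separates nothing
```

Under this measure, the attribute with the highest gain would be chosen as the splitting attribute at the node.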