# What are Randomized Algorithms and Data Stream Management Systems in Data Mining?

**Randomized Algorithms** − Randomized algorithms, in the form of random sampling and sketching, are used to deal with massive, high-dimensional data streams. The use of randomization often leads to simpler and more efficient algorithms than the known deterministic alternatives.
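As an illustration of random sampling over a stream, here is a minimal sketch of reservoir sampling (Algorithm R), one well-known technique for keeping a uniform sample of `k` items from a stream whose length is not known in advance. The function name `reservoir_sample` is hypothetical; the sketch assumes the stream is any Python iterable that can be read only once.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Draw a uniform random sample of k items from a stream in one pass.

    Memory use is O(k) no matter how long the stream is (Algorithm R).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform index in [0, i]
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    # The generator is consumed lazily; the full stream is never stored.
    readings = (x * x for x in range(100_000))
    print(reservoir_sample(readings, k=5, rng=random.Random(42)))
```

Each arriving item is examined once and then either kept or discarded, which is why only `k` items ever occupy memory.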

If a randomized algorithm always returns the correct answer but its running time varies, it is called a Las Vegas algorithm. In contrast, a Monte Carlo algorithm has bounds on its running time but may not return the correct result. The discussion here mainly concerns Monte Carlo algorithms. One way to think of a randomized algorithm is simply as a probability distribution over a set of deterministic algorithms.
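The difference between the two families can be seen on a toy problem: find any position holding a target value in an array where half the positions hold it. The sketch below is a textbook-style illustration, not a data-stream algorithm; the function names are hypothetical. The Las Vegas version keeps probing until it succeeds (always correct, random running time), while the Monte Carlo version stops after a fixed number of probes (bounded running time, but it can fail).

```python
import random
from typing import Optional, Sequence


def las_vegas_find(items: Sequence[str], target: str, rng: random.Random) -> int:
    """Las Vegas search: the answer is always correct; the number of probes is random.

    Assumes `target` occurs in `items`; otherwise the loop never terminates.
    """
    while True:
        i = rng.randrange(len(items))
        if items[i] == target:
            return i


def monte_carlo_find(
    items: Sequence[str], target: str, rng: random.Random, max_probes: int
) -> Optional[int]:
    """Monte Carlo search: at most `max_probes` probes, but it may fail and return None.

    If a fraction f of the items equals `target`, the failure probability is (1 - f) ** max_probes.
    """
    for _ in range(max_probes):
        i = rng.randrange(len(items))
        if items[i] == target:
            return i
    return None


if __name__ == "__main__":
    rng = random.Random(3)
    items = ["a", "b"] * 50  # half of the positions hold the target "a"
    print(las_vegas_find(items, "a", rng))
    print(monte_carlo_find(items, "a", rng, max_probes=10))  # fails with probability 2**-10
```

Raising `max_probes` trades running time for a smaller failure probability, which is the usual knob for Monte Carlo methods.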

Because a randomized algorithm returns a random variable as its result, we would like to have bounds on the tail probability of that random variable. Such bounds tell us that the probability of the random variable deviating far from its expected value is small. One basic tool for this is Chebyshev’s inequality.

Let X be a random variable with mean µ and standard deviation σ (variance σ²). Chebyshev’s inequality says that

$$\mathrm{P(|X-\mu|\ge k)\le\frac{\sigma^2}{k^2}}$$

for any positive real number k. The inequality uses the variance of X to bound the probability that X deviates from its mean by at least k. In many cases, multiple random variables can be used to increase the confidence in the results. As long as these random variables are fully independent, Chernoff bounds can be used.
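To see how Chebyshev’s inequality, P(|X − µ| ≥ k) ≤ σ²/k², behaves in practice, the sketch below simulates X as the number of heads in 100 fair coin flips (µ = 50, σ² = 25) and compares the observed tail frequency with the bound. The coin-flip setup and the helper names are assumptions chosen for illustration.

```python
import random
from typing import Sequence


def chebyshev_bound(variance: float, k: float) -> float:
    """Upper bound on P(|X - mu| >= k) given only the variance (k > 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return variance / k ** 2


def empirical_tail(samples: Sequence[float], mu: float, k: float) -> float:
    """Fraction of samples that lie at least k away from mu."""
    return sum(abs(x - mu) >= k for x in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 100, 0.5
    mu, variance = n * p, n * p * (1 - p)  # X = number of heads in 100 fair flips
    samples = [sum(rng.random() < p for _ in range(n)) for _ in range(5_000)]
    for k in (5, 10, 15):
        print(f"k={k}: observed={empirical_tail(samples, mu, k):.4f}, "
              f"Chebyshev bound={chebyshev_bound(variance, k):.4f}")
```

The bound always holds but is loose, because it uses nothing about X except its variance; the bound only shrinks polynomially (as 1/k²).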

Let X₁, X₂, …, Xₙ be independent Poisson trials. In a Poisson trial, the probability of success varies from trial to trial. If X is the sum of X₁ through Xₙ and µ = E[X] is the sum of the individual success probabilities, then a weaker version of the Chernoff bound tells us that

$$\mathrm{P[X\ge(1+\delta)\mu]\le e^{-\mu\delta^2/3}}$$

where δ ∈ (0, 1]. This shows that the probability decreases exponentially as X moves away from its mean, which makes poor estimates much less likely.
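The sketch below checks the commonly quoted upper-tail form P[X ≥ (1 + δ)µ] ≤ e^(−µδ²/3), for δ ∈ (0, 1], by simulation. It assumes 200 independent Poisson trials whose success probabilities rise linearly from 0.1 to 0.9, so µ = 100; the probability schedule and function names are illustrative choices, not part of the original text.

```python
import math
import random
from typing import List


def chernoff_upper(mu: float, delta: float) -> float:
    """Chernoff bound on P[X >= (1 + delta) * mu] for a sum of independent Poisson trials."""
    if not 0 < delta <= 1:
        raise ValueError("delta must be in (0, 1]")
    return math.exp(-mu * delta ** 2 / 3)


def simulate_poisson_trials(probs: List[float], trials: int, rng: random.Random) -> List[int]:
    """Each sample is X = X_1 + ... + X_n with independent X_i ~ Bernoulli(probs[i])."""
    return [sum(rng.random() < p for p in probs) for _ in range(trials)]


if __name__ == "__main__":
    rng = random.Random(7)
    probs = [0.1 + 0.8 * i / 199 for i in range(200)]  # success probability varies per trial
    mu = sum(probs)  # E[X] is the sum of the p_i, here 100
    samples = simulate_poisson_trials(probs, 5_000, rng)
    for delta in (0.1, 0.2, 0.3):
        threshold = (1 + delta) * mu
        observed = sum(x >= threshold for x in samples) / len(samples)
        print(f"delta={delta}: observed={observed:.4f}, "
              f"Chernoff bound={chernoff_upper(mu, delta):.4f}")
```

Because µ appears in the exponent, a larger number of independent trials drives the bound down exponentially, whereas a variance-only bound such as Chebyshev’s shrinks only polynomially.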

**Data Stream Management System** − In a data stream management system (DSMS), there are multiple data streams. They arrive online and are continuous, temporally ordered, and potentially infinite. Once an element from a data stream has been processed, it is discarded or archived, and it cannot easily be retrieved unless it is explicitly stored in memory.
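The "process once, then discard" constraint means a DSMS typically maintains compact summaries instead of raw data. A minimal sketch, assuming a numeric stream, is Welford’s online algorithm for the running mean and variance: each element updates a constant-size summary and is then dropped. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's online algorithm).

    Each element updates the summary and is then dropped, so the raw stream
    is never kept in memory.
    """

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self) -> float:
        """Population variance of the elements seen so far (0.0 for an empty stream)."""
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Process the stream in a single pass; elements are not stored."""
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    stats = summarize(math.sin(t / 10) for t in range(100_000))
    print(stats.count, round(stats.mean, 4), round(stats.variance, 4))
```

Welford’s update is used here instead of accumulating Σx and Σx² because it is less prone to catastrophic cancellation on long streams.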

A stream data query processing architecture includes three components: the end user, the query processor, and the scratch space (which can consist of main memory and disks). An end user issues a query to the DSMS, and the query processor takes the query, processes it using the data stored in the scratch space, and returns the results to the user.
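The three components can be sketched as a toy model. Here the scratch space is assumed to be a bounded in-memory window (disk spill is not modeled), the query processor ingests arriving elements and evaluates queries against that window, and the end user is simply the caller of `submit`. The class names `ScratchSpace` and `QueryProcessor` are hypothetical and do not describe any particular DSMS.

```python
from collections import deque
from typing import Any, Callable, Deque, List, TypeVar

R = TypeVar("R")


class ScratchSpace:
    """Bounded working memory that keeps only the `capacity` most recent elements.

    Hypothetical stand-in for main memory; spilling to disk is not modeled.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._buffer: Deque[Any] = deque(maxlen=capacity)

    def store(self, item: Any) -> None:
        self._buffer.append(item)  # once full, the oldest element is evicted

    def contents(self) -> List[Any]:
        return list(self._buffer)


class QueryProcessor:
    """Feeds arriving elements into the scratch space and answers end-user queries."""

    def __init__(self, scratch: ScratchSpace) -> None:
        self.scratch = scratch

    def ingest(self, item: Any) -> None:
        self.scratch.store(item)

    def submit(self, query: Callable[[List[Any]], R]) -> R:
        # A query can only see what the scratch space retained, not the whole stream.
        return query(self.scratch.contents())


if __name__ == "__main__":
    processor = QueryProcessor(ScratchSpace(capacity=3))
    for price in [10, 12, 11, 15, 14]:  # stream elements arriving over time
        processor.ingest(price)
    print(processor.submit(max))  # end user asks for the max over the retained window
```

With a capacity of 3, the window holds `[11, 15, 14]` after the five arrivals, so the query can no longer see the earlier values 10 and 12.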

Queries can be either one-time queries or continuous queries. A one-time query is evaluated once over a point-in-time snapshot of the data set, and the answer is returned to the user. A continuous query is evaluated continuously as data streams continue to arrive.
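The contrast between the two query types can be sketched with plain Python: a one-time query runs once over a copy of the data, while a continuous query is a generator that yields a fresh answer after every arrival and therefore also works on an infinite stream. The example query (counting readings above 100) and the function names are hypothetical.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")
S = TypeVar("S")


def one_time_query(data: List[T], query: Callable[[List[T]], R]) -> R:
    """Evaluate `query` once over a point-in-time snapshot (a copy) of the data."""
    snapshot = list(data)
    return query(snapshot)


def continuous_query(
    stream: Iterable[T], update: Callable[[S, T], S], initial: S
) -> Iterator[S]:
    """Yield a refreshed answer every time a new element arrives.

    Works lazily, so `stream` may be infinite.
    """
    state = initial
    for item in stream:
        state = update(state, item)
        yield state


def count_over_100(rows: List[int]) -> int:
    return sum(r > 100 for r in rows)


def add_if_over_100(running: int, reading: int) -> int:
    return running + (reading > 100)


if __name__ == "__main__":
    readings = [98, 101, 97, 105, 110]
    print(one_time_query(readings[:3], count_over_100))  # snapshot of the first three readings
    print(list(continuous_query(readings, add_if_over_100, 0)))  # one answer per arrival
    # Infinite stream 95, 96, 97, ...: take only the first eight running answers.
    print(list(islice(continuous_query(count(95), add_if_over_100, 0), 8)))
```

The continuous version keeps only its running state between arrivals, in line with the one-pass constraint on stream elements.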
