# How does the Lossy Counting algorithm find frequent items?

The user supplies two input parameters: the minimum support threshold, σ, and the error bound, ε. The incoming stream is conceptually divided into buckets of width w = ⌈1/ε⌉.
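As a small illustration of the bucketing, the sketch below computes the bucket width from ε and maps a 1-based stream position to its bucket number. The function names `bucket_width` and `bucket_id` are hypothetical and chosen only for this example.

```python
import math


def bucket_width(epsilon):
    """Bucket width w = ceil(1 / epsilon)."""
    return math.ceil(1 / epsilon)


def bucket_id(position, width):
    """Bucket number of the item at a 1-based stream position."""
    return math.ceil(position / width)


w = bucket_width(0.25)  # epsilon = 0.25 gives buckets of 4 items
ids = [bucket_id(n, w) for n in range(1, 10)]
print(w, ids)
```

With ε = 0.25, the first four items fall into bucket 1, the next four into bucket 2, and so on.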

Let N be the current stream length, i.e., the number of items seen so far. The algorithm maintains a frequency-list data structure for all items with frequency greater than 0. For every item, the list stores f, the approximate frequency count, and ∆, the maximum possible error of f.

The algorithm processes buckets of items as follows. When a new bucket arrives, the items in the bucket are added to the frequency list. If a given item already exists in the list, its frequency count, f, is simply incremented. Otherwise, the item is inserted into the list with a frequency count of 1. If the new item is from the bth bucket, ∆, the maximum possible error on the frequency count of the item, is set to b−1.

Whenever a bucket boundary is reached (i.e., N has reached a multiple of width w, such as w, 2w, 3w, etc.), the frequency list is examined and pruned. Let b be the current bucket number. An item entry is deleted if, for that entry, f + ∆ ≤ b. In this way, the algorithm aims to keep the frequency list small so that it can fit in main memory. The frequency count stored for each item will be either the true frequency of the item or an underestimate of it.
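The insertion and pruning steps described above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `lossy_count` and the choice of a dictionary mapping each item to `[f, delta]` are assumptions made for this example.

```python
import math


def lossy_count(stream, epsilon):
    """Return (entries, n), where entries maps item -> [f, delta]."""
    width = math.ceil(1 / epsilon)  # bucket width w
    entries = {}                    # the frequency list
    n = 0                           # stream length N seen so far
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # current bucket number b
        if item in entries:
            entries[item][0] += 1      # existing item: increment f
        else:
            entries[item] = [1, bucket - 1]  # new item: f = 1, delta = b - 1
        if n % width == 0:
            # Bucket boundary: delete every entry with f + delta <= b.
            stale = [k for k, (f, d) in entries.items() if f + d <= bucket]
            for key in stale:
                del entries[key]
    return entries, n


entries, n = lossy_count("aabacada", epsilon=0.25)
print(entries, n)
```

In this short run the bucket width is 4. The items `b`, `c`, and `d` each appear once and are pruned at a bucket boundary, while `a` survives with its exact count.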

An essential factor in approximation algorithms is the approximation ratio (or error bound). Let's look at the case where an item is deleted. This happens when f + ∆ ≤ b for an item, where b is the current bucket number.

At a bucket boundary, b = N/w, and since w ≥ 1/ε, it follows that b ≤ εN. The actual frequency of an item is at most f + ∆. Therefore, the most that an item can be underestimated is εN. If the actual support of this item is σ (this is the minimum support, or lower bound, for it to be considered frequent), then the actual frequency is σN, and the frequency, f, on the frequency list should be at least (σN − εN).
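The bound can be written out step by step. In the sketch below, $f_{\text{true}}$ is a symbol introduced here for the actual frequency of an item; it does not appear in the text above.

```latex
% For an item currently in the frequency list:
f \;\le\; f_{\text{true}} \;\le\; f + \Delta,
\qquad \Delta \;\le\; b - 1 \;\le\; \varepsilon N .

% For an item deleted at a bucket boundary:
f_{\text{true}} \;\le\; f + \Delta \;\le\; b \;\le\; \varepsilon N .

% Hence, for a frequent item with f_true >= sigma N:
f \;\ge\; f_{\text{true}} - \varepsilon N \;\ge\; \sigma N - \varepsilon N .
```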

Thus, if we output all of the items in the frequency list having an f value of at least (σN − εN), then all of the frequent items will be output. In addition, some subfrequent items (with an actual frequency of at least σN − εN but less than σN) will be output too.
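The output rule can be sketched as a simple filter over the frequency list. The function name `frequent_items` and the sample counts below are hypothetical; the frequency list is assumed to be a dictionary mapping each item to `[f, delta]`.

```python
def frequent_items(entries, n, sigma, epsilon):
    """Return the items whose stored count f is at least (sigma - epsilon) * n."""
    threshold = (sigma - epsilon) * n
    return sorted(item for item, (f, _delta) in entries.items() if f >= threshold)


# Hypothetical frequency list after N = 1000 items.
sample = {"x": [120, 0], "y": [95, 3], "z": [40, 9]}
result = frequent_items(sample, n=1000, sigma=0.1, epsilon=0.01)
print(result)
```

Here the threshold is roughly 90, so `x` and `y` are reported. Note that `y` may be a subfrequent item: its actual frequency lies between 95 and 98, which is below σN = 100.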
