How to Calculate Percentiles For Monitoring Data?


Introduction

Monitoring online systems, especially data-intensive ones, is essential for continuous health checks, for detecting and analyzing downtime, and for improving performance. The percentile-based method is an efficient technique for gauging the behavior of such systems. Let's have a look at this method.

A General Refresher

What are percentiles and why are they useful?

In statistics, a percentile (or centile) is the value below which a given percentage of observations falls. For example, if a student has scored at the 90^th percentile, 90% of the students have scored less than that student. Similarly, if the response time of an HTTP request is at the 90^th percentile, 90% of the response times lie below it.
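As a small illustration of this definition, the sketch below computes the percentage of observations that fall strictly below a given value. The function name `percentile_rank` and the sample scores are hypothetical and chosen only for this example.

```python
def percentile_rank(values, x):
    """Return the percentage of observations in `values` that are strictly below x."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical exam scores for ten students.
scores = [35, 42, 50, 58, 61, 67, 73, 80, 88, 95]

top_rank = percentile_rank(scores, 95)  # nine of the ten scores are lower
mid_rank = percentile_rank(scores, 61)  # four of the ten scores are lower
print(top_rank, mid_rank)
```

Here the student who scored 95 sits at the 90^th percentile, because 9 of the 10 scores are lower.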

The range between the 25^th percentile and the 75^th percentile, measured as their difference, is known as the interquartile range.

The 25^th percentile is also known as the 1^st quartile, the 50^th as the 2^nd quartile, and the 75^th as the 3^rd quartile.
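The quartiles and the interquartile range can be computed with Python's standard `statistics` module (Python 3.8 or later). The response times below are made-up sample data, and `statistics.quantiles` uses its default interpolation method, which is only one of several conventions for computing quartiles.

```python
import statistics

# Hypothetical response times in milliseconds.
response_times_ms = [12, 15, 18, 20, 22, 25, 30, 34, 40, 55, 120]

# n=4 splits the data into four groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(response_times_ms, n=4)

# The interquartile range is the difference between the 3rd and 1st quartiles.
iqr = q3 - q1

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3, "IQR:", iqr)
```

Note that the single slow response of 120 ms has no effect on the interquartile range, which is why it is often used as a robust measure of spread.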

Percentiles are very useful when we want to know where a value lies relative to other observations. This can be seen on a distribution graph of the values. Related statistical terms include the mean, median, and mode.

The formula for calculating percentile can be given as

$$\mathrm{n\:=\:\frac{P}{100}\:\times\:N}$$

Where P = percentile, N = number of values in the dataset (sorted in ascending order), and n = ordinal rank of the value at that percentile.
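The formula can be sketched in Python as follows. Rounding n up to the next whole number (the "nearest-rank" convention) is an assumption made here, since the formula itself does not say how to handle a fractional rank. The function name and sample values are hypothetical.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the value at ordinal rank n = ceil(P / 100 * N) in the sorted data."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)               # N values in ascending order
    n = math.ceil(p * len(ordered) / 100)  # ordinal rank (1-based), rounded up
    return ordered[n - 1]                  # convert the rank to a 0-based index


# Twenty hypothetical response times: 10, 20, ..., 200 ms.
samples = list(range(10, 210, 10))

p90 = nearest_rank_percentile(samples, 90)  # n = 90 / 100 * 20 = 18
p50 = nearest_rank_percentile(samples, 50)  # n = 50 / 100 * 20 = 10
print(p90, p50)
```

With N = 20 and P = 90, the rank is n = 18, so the 90^th percentile is the 18th value in sorted order.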

Monitoring Data-Intensive Systems - Calculating Percentiles

In monitoring tasks we mainly use percentiles, because other measures such as the average are highly influenced by outliers. In online systems, collectors are used to gather data and calculate quantiles for it.
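The effect of an outlier on the average can be seen in a small sketch. The numbers are invented for illustration: 99 requests that take 100 ms and a single request that takes 10 seconds.

```python
import statistics

# Hypothetical response times: 99 fast requests and one very slow outlier.
response_times_ms = [100] * 99 + [10_000]

average = statistics.mean(response_times_ms)   # pulled upward by the outlier
median = statistics.median(response_times_ms)  # the 50th percentile

print("average:", average, "ms")
print("median: ", median, "ms")
```

The single slow request almost doubles the average, while the median still reports the 100 ms that nearly every user experienced.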

A Common Approach

In the case of HTTP request monitoring, the request times can be divided into quantiles. A particular quantile, say the 50% quantile, is the value that a random observation does not exceed with 50% probability. Suppose the data stream of HTTP requests contains n elements. Finding an exact quantile then means keeping and ordering all n elements, which can be as large as 1 GB in size.

A solution to this is to calculate approximate quantiles for the data stream. In this approach, the whole data stream is compressed into a set of segments. Each segment has a fixed width (w) and a length (l).

Percentiles for live capture data

For example, let's say we want to store 1000 values in memory at a particular moment in time.

Let's pick the size k = 100 and also a minimum width (resolution) of 1 ms.

The first bin holds values between 0 and 1 ms (width = 1 ms).

Each following bin is twice as wide as the previous one:

Second bin: 1 to 3 ms (width = 2 ms)

Third bin: 3 to 7 ms (width = 4 ms)

and so on, up to the 10^th bin: 511 to 1023 ms (width = 512 ms)
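The doubling bin layout above can be generated with a short sketch. The names `make_bins` and `bin_index` are hypothetical, and treating each bin as a half-open interval (lower bound included, upper bound excluded) is an assumption, since the text does not say which bin a boundary value belongs to.

```python
import bisect


def make_bins(num_bins=10, min_width_ms=1):
    """Return (lower, upper) bounds in ms; each bin is twice as wide as the last."""
    bins = []
    lower = 0
    width = min_width_ms
    for _ in range(num_bins):
        bins.append((lower, lower + width))
        lower += width
        width *= 2
    return bins


BINS = make_bins()
UPPER_BOUNDS = [upper for _, upper in BINS]


def bin_index(millis):
    """Return the 0-based bin for a response time; len(BINS) means 'above all bins'."""
    # bisect_right sends a value equal to an upper bound into the next bin,
    # which matches the half-open [lower, upper) assumption.
    return bisect.bisect_right(UPPER_BOUNDS, millis)


for number, (lower, upper) in enumerate(BINS, start=1):
    print(f"bin {number}: {lower} to {upper} ms (width = {upper - lower} ms)")
```

Ten bins are enough to cover response times from 0 ms to just over one second, with fine resolution for fast requests and coarse resolution for slow ones.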

Calculation

Create bins for our response times (e.g., 0 to 100 ms, 100 ms to 200 ms, 200 ms to 400 ms, ...).
Count the total number of responses and the number of responses in each bin.
To calculate the n^th percentile, sum the bin counters until the sum reaches or exceeds n percent of the total.

Pseudocode snippet in Python

Example


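import bisect

_limits = [100, 200, 400, 800, 1600]  # upper bin limits in ms (example values)
_counts = [0] * len(_limits)          # number of responses counted in each bin
_total = 0                            # total number of responses seen

def increment(millis):
    """Record one response time in the bin it falls into."""
    global _total
    i = bisect.bisect_left(_limits, millis)  # first bin whose limit >= millis
    if i < len(_limits):
        _counts[i] += 1
    _total += 1

def estimate_percentile(percentile):
    """Return the upper limit of the bin that contains the given percentile."""
    if percentile < 0.0 or percentile > 100.0:
        raise ValueError(f"percentile must be between 0.0 and 100.0, was {percentile}")
    if _total == 0:
        return None  # nothing recorded yet
    running = 0
    for limit, count in zip(_limits, _counts):
        running += count  # sum bin counters until the target share is reached
        if 100.0 * running / _total >= percentile:
            return limit
    return float("inf")  # percentile lies above the largest bin limit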

Conclusion

Performance monitoring and health checks are key to every data-intensive application today. Percentile-based approaches are robust to outliers and can be estimated from a fixed number of bin counters, which makes them a practical tool for this task.

Mithilesh Pradhan

Updated on: 2022-12-30T12:32:47+05:30
