What is trend analysis?


Trend analysis refers to techniques for extracting an underlying pattern of behavior in a time series that may be partly or entirely hidden by noise. These methods are widely used in public health: detecting outbreaks and unexpected increases or decreases in disease occurrence, monitoring disease trends, and evaluating the effectiveness of disease control and health care programs and policies.

Various techniques can be used to detect trends in time series. Smoothing is an approach used to remove the non-systematic behavior found in a time series. It usually takes the form of a moving average of attribute values over a window of time around a particular time point.

The local average of the attribute values in the window is used in place of the specific value found at that point. The median is often used instead of the mean because it is less sensitive to outliers. Smoothing filters out noise and outliers, and it can help in predicting future values because the smoothed data are easier to fit to a known function (linear, logarithmic, exponential, etc.).
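The smoothing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `smooth`, the centered window, the clipping of the window at the ends of the series, and the sample data are all assumptions made for the example.

```python
from statistics import mean, median


def smooth(values, window=3, use_median=True):
    """Replace each point with the median (or mean) of a centered window.

    Hypothetical helper: the window is clipped at both ends of the series,
    so edge points are averaged over fewer neighbors.
    """
    half = window // 2
    center = median if use_median else mean
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(center(values[lo:hi]))
    return smoothed


# Made-up series with a single outlier (50).
series = [10, 11, 12, 50, 13, 14, 15]

median_smoothed = smooth(series, window=3, use_median=True)
mean_smoothed = smooth(series, window=3, use_median=False)

print("median:", median_smoothed)
print("mean:  ", mean_smoothed)
```

With this data the moving median removes the outlier entirely, while the moving mean spreads it over the neighboring points, which is the sensitivity difference noted above.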

Detecting seasonal patterns in time series data is more difficult. One method is to look for correlations between values at evenly spaced intervals. For example, in monthly sales data a correlation may be found between every twelfth value. The time difference between the related values is referred to as the lag.

Autocorrelation functions can be generated to determine the correlations between data values at different lag intervals. A correlogram graphically displays the autocorrelation values for several lag values.
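As a rough sketch of how a correlogram might be produced, the Python below computes autocorrelation values for lags 1 to 18 and prints them as a crude text chart. The estimator used here (deviations from the overall mean, normalized by the total sum of squares) is one common choice and is an assumption, as are the function name `autocorrelation` and the synthetic 12-month sine data.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation at `lag`, using deviations from the overall mean.

    Assumed estimator for illustration; other definitions exist.
    """
    n = len(values)
    avg = sum(values) / n
    denom = sum((v - avg) ** 2 for v in values)
    num = sum((values[t] - avg) * (values[t + lag] - avg) for t in range(n - lag))
    return num / denom


# Synthetic "monthly" data: four years of a repeating 12-month cycle.
monthly = [math.sin(2 * math.pi * m / 12) for m in range(48)]

acf = {k: autocorrelation(monthly, k) for k in range(1, 19)}

# Text correlogram: one row per lag, bar length proportional to |r|.
for k, r in acf.items():
    bar = "#" * round(abs(r) * 20)
    print(f"lag {k:2d}  {r:+.2f}  {bar}")

# Lags whose autocorrelation is higher than at both neighboring lags.
peaks = [k for k in range(2, 18) if acf[k] > acf[k - 1] and acf[k] > acf[k + 1]]
print("local peaks:", peaks)
```

For this synthetic series the only local peak in the range falls at lag 12, matching the seasonal period. Lag 1 also shows a high value simply because adjacent points of a smooth series are similar, so the peak beyond the first few lags is the one that suggests seasonality.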

The covariance measures how two variables change together. It can be used as the basis for determining the relationship between two time series or for detecting seasonal trends within one time series. An autocorrelation coefficient, r_k, measures the correlation between time series values that are a certain distance, lag k, apart.

Several approaches have been used to compute autocorrelation. Positive values indicate that the two variables increase together, while negative values indicate that as one increases the other decreases.

A value close to zero indicates that there is little correlation between the two variables. One typical measure of correlation is the correlation coefficient r, also known as Pearson's r.

Given two time series, X and Y, with means X' and Y', each with n elements, the formula for r is

$$\mathrm{r=\frac{\sum(x_i-X')(y_i-Y')}{\sqrt{\sum(x_i-X')^2\sum(y_i-Y')^2}}}$$

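Applying this formula to find the correlation coefficient at lag k, r_k, on a single time series X = (x_1, x_2, …, x_n) is straightforward: the series is split into two overlapping subseries of length n − k (here the primes label the subseries, not the means). The first is

$$\mathrm{X'=(x_1,x_2,\ldots,x_{n-k})}$$

while the second is

$$\mathrm{X''=(x_{k+1},x_{k+2},\ldots,x_n)}$$

Pearson's r computed on X′ and X″ is then r_k.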
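The lag-k calculation can be sketched directly from Pearson's formula. This is a minimal illustration under stated assumptions: the function names `pearson_r` and `lag_correlation` and the toy series are made up for the example, and no handling is included for constant series (where the denominator would be zero).

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same number of elements")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den


def lag_correlation(series, k):
    """Correlate the series with itself shifted by k positions."""
    first = series[: len(series) - k]  # x_1 .. x_(n-k)
    second = series[k:]                # x_(k+1) .. x_n
    return pearson_r(first, second)


# Toy series that repeats every 3 steps.
series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

r1 = lag_correlation(series, 1)
r3 = lag_correlation(series, 3)
print(f"r_1 = {r1:.3f}")
print(f"r_3 = {r3:.3f}")
```

Because the toy series repeats every three steps, the two subseries at lag 3 are identical and r_3 is 1, while at lag 1 the coefficient is negative.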

raja
Updated on 16-Feb-2022 06:26:57
