Time Series - Naive Methods


Advertisements

Introduction

Naive Methods such as assuming the predicted value at time ‘t’ to be the actual value of the variable at time ‘t-1’ or rolling mean of series, are used to weigh how well do the statistical models and machine learning models can perform and emphasize their need.

In this chapter, let us try these models on one of the features of our time-series data.

First we shall see the mean of the ‘temperature’ feature of our data and the deviation around it. It is also useful to see maximum and minimum temperature values. We can use the functionalities of numpy library here.

// Code Snippet 6: Showing statistics

We have the statistics for all 9357 observations across equi-spaced timeline which are useful for us to understand the data.

Now we will try the first naïve method, setting the predicted value at present time equal to actual value at previous time and calculate the root mean squared error(RMSE) for it to quantify the performance of this method.

In [135]:

import numpy
print ('Mean: ',numpy.mean(df['T']), '; Standard Deviation: ',numpy.std(df['T']),'; \nMaximum Temperature: ',max(df['T']),'; Minimum Temperature: ',min(df['T']))

Mean: 9.778305012290264; Standard Deviation: 43.201314375146595;

Maximum Temperature: 44.6 ; Minimum Temperature: -200.0

// Code Snippet 7: Showing 1st naïve method

Let us see the next naïve method, where predicted value at present time is equated to the mean of the time periods preceding it. We will calculate the RMSE for this method too.

In [136]:

df['T']
df['T_t-1'] = df['T'].shift(1)

In [137]:

df_naive = df[['T','T_t-1']][1:]

In [138]:

from sklearn import metrics
from math import sqrt

true = df_naive['T']
prediction = df_naive['T_t-1']
error = sqrt(metrics.mean_squared_error(true,prediction))
print ('RMSE for Naive Method 1: ', error)

RMSE for Naive Method 1: 12.901140576492974

// Code Snippet 8: Showing 2nd naïve method

Here, you can experiment with various number of previous time periods also called ‘lags’ you want to consider, which is kept as 3 here. In this data it can be seen that as you increase the number of lags and error increases. If lag is kept 1, it becomes same as the naïve method used earlier.

In [139]:

df['T_rm'] = df['T'].rolling(3).mean().shift(1)
df_naive = df[['T','T_rm']].dropna()

In [140]:

true = df_naive['T']
prediction = df_naive['T_rm']
error = sqrt(metrics.mean_squared_error(true,prediction))
print ('RMSE for Naive Method 2: ', error)

RMSE for Naive Method 2: 14.957633272839242

Points to Note

  • You can write a very simple function for calculating root mean squared error. Here, we have used the mean squared error function from the package ‘sklearn’ and then taken its square root.

  • In pandas df[‘column_name’] can also be written as df.column_name, however for this dataset df.T will not work the same as df[‘T’] because df.T is the function for transposing a dataframe. So use only df[‘T’] or consider renaming this column before using the other syntax.

Advertisements