Python Prophet - Basics of Time Series



Time series forecasting is an important task in fields such as finance, economics, and supply chain management, where predicting future values from past data supports decision-making. One popular tool for time series forecasting in Python is the Prophet library.

The Prophet library is an open-source tool for forecasting time series data. Let's understand the basics of time series data and its key features.

What is a Time Series?

A time series is a sequence of data points recorded in time order, usually at regular intervals. Each observation has a timestamp, and the order of the observations matters for analysis. Examples include daily stock prices, hourly temperature readings, monthly sales figures, or yearly GDP data.

The main feature of time series data is that values often depend on time and on previous observations.
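To make this concrete, here is a minimal sketch of a time series held in a pandas DataFrame. It assumes made-up daily sales values. The `ds` (timestamp) and `y` (value) column names follow the input format Prophet expects, while `y_prev` is a hypothetical helper column that pairs each value with the previous observation.

```python
import pandas as pd

# Made-up daily sales figures, used only for illustration.
values = [12, 15, 14, 18, 21, 19, 23]
dates = pd.date_range(start="2024-01-01", periods=len(values), freq="D")

# Prophet expects a "ds" (timestamp) column and a "y" (value) column.
df = pd.DataFrame({"ds": dates, "y": values})

# Time order matters, so keep the rows sorted by timestamp.
df = df.sort_values("ds").reset_index(drop=True)

# Hypothetical helper column: each value next to the one observed before it.
df["y_prev"] = df["y"].shift(1)
print(df)
```

Shuffling the rows would break the link between each value and its predecessor, which is why time series tools work on time-sorted data.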

Fundamental Components of Time Series

A time series can be described by the following components and properties, each explaining a different aspect of the data −

  • Trend − The long-term movement of the series, showing the overall direction over time. Trends can be linear or nonlinear; a logistic trend, for example, grows quickly at first and then levels off as it approaches a saturation point.
  • Seasonality − Predictable patterns that repeat at regular intervals, such as yearly, weekly, or daily cycles. Seasonality is periodic and often tied to calendar or natural events.
  • Cyclical Patterns − Fluctuations that occur at irregular intervals, usually lasting longer than a year. Unlike seasonality, these patterns do not have fixed periods and often relate to economic or business cycles.
  • Irregular Component − Random variations that remain after accounting for trend, seasonality, and cyclical patterns. These represent unexpected events or measurement errors.
  • Autocorrelation − The correlation between a data point and its own past values (lags). Strictly, this is a property of the series rather than a separate component, but it shows how strongly past values influence future ones, which helps reveal seasonality and improve forecasts.
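The components above can be simulated to see how they combine. This is a minimal sketch with made-up parameters: a linear trend, a 7-day sine wave for seasonality, and Gaussian noise as the irregular part (a cyclical component is omitted for brevity). It then uses pandas' `Series.autocorr` to measure autocorrelation at lags 1 and 7.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 365
t = np.arange(n)

trend = 0.05 * t                              # slow linear growth
seasonality = 3 * np.sin(2 * np.pi * t / 7)   # repeating weekly cycle
irregular = rng.normal(scale=0.5, size=n)     # random noise

y = pd.Series(trend + seasonality + irregular,
              index=pd.date_range("2024-01-01", periods=n, freq="D"))

# Autocorrelation: correlation between the series and a lagged copy of itself.
for lag in (1, 7):
    print(f"lag {lag}: {y.autocorr(lag=lag):.2f}")
```

Because the trend dominates, both values are high; the lag-7 value is higher because the weekly cycle lines up exactly at that lag.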

Additive vs Multiplicative Models

Time series decomposition can be done using two main approaches, depending on how seasonal patterns behave.

Additive Model

(Y(t) = Trend + Seasonality + Irregular) − The additive model adds the trend, seasonality, and irregular components. It is used when seasonal fluctuations stay roughly the same size, regardless of whether the series rises or falls.

Multiplicative Model

(Y(t) = Trend × Seasonality × Irregular) − The multiplicative model multiplies the trend, seasonality, and irregular components. It is used when seasonal fluctuations change in proportion to the trend, growing larger as the overall level of the series increases.
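The difference between the two models shows up in how wide the seasonal swings are. This is a minimal sketch with made-up numbers: both series share the same rising trend, but the additive one adds a fixed ±20 seasonal term while the multiplicative one scales the trend by ±20%. The helper `seasonal_swing` is hypothetical, written only for this illustration.

```python
import numpy as np

t = np.arange(48)                             # 4 years of monthly points
trend = 100 + 5 * t                           # level rises from 100 to 335
season = np.sin(2 * np.pi * t / 12)           # yearly cycle in [-1, 1]

additive = trend + 20 * season                # swing of +/-20 at every level
multiplicative = trend * (1 + 0.2 * season)   # swing of +/-20% of the level

def seasonal_swing(y, trend, year):
    """Peak-to-trough deviation from the trend within one 12-month block."""
    dev = (y - trend)[year * 12:(year + 1) * 12]
    return dev.max() - dev.min()

for year in range(4):
    print(year,
          round(seasonal_swing(additive, trend, year), 1),
          round(seasonal_swing(multiplicative, trend, year), 1))
```

The additive series swings by the same 40 units every year, while the multiplicative swing grows with the trend. In Prophet, this choice corresponds to the `seasonality_mode` argument of the `Prophet` constructor, which defaults to `"additive"` and also accepts `"multiplicative"`.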

Time Series Frequency and Granularity

The frequency of data collection determines what patterns can be observed in a time series. The main types are −

  • High-frequency data (per second, per minute) − Captures very short-term patterns and rapid fluctuations. Common in financial markets and sensor readings.
  • Hourly data − Reveals intra-day patterns and daily seasonality. Useful for operational monitoring and real-time analysis.
  • Daily data − Shows weekly and yearly seasonality. Common in business metrics, weather data, and web analytics.
  • Weekly data − Captures monthly and yearly patterns but cannot detect weekly seasonality, because each observation already summarizes a whole week.
  • Monthly data − Limited to yearly seasonality and long-term trends. Often used in economic indicators or aggregated business reports.
  • Quarterly/Yearly data − Focuses on long-term trends and multi-year cycles. Used in macroeconomic analysis and strategic planning.
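The effect of granularity can be seen by aggregating a series. This minimal sketch assumes a made-up daily series in which weekends are lower than weekdays; once it is summed into weekly totals with pandas `resample`, the weekly pattern disappears entirely.

```python
import numpy as np
import pandas as pd

days = pd.date_range("2024-01-01", periods=8 * 7, freq="D")   # 8 full weeks, Mon to Sun
weekly_pattern = np.where(days.dayofweek >= 5, 50.0, 100.0)   # weekends dip
daily = pd.Series(weekly_pattern, index=days)

# Aggregate to weekly totals (weeks ending on Sunday).
weekly = daily.resample("W").sum()

print(daily.groupby(daily.index.dayofweek).mean())  # clear weekday/weekend gap
print(weekly)                                        # every week looks the same
```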

Stationarity in Time Series

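A time series is stationary when its statistical properties, such as its mean and variance, stay stable over time. Stationarity matters because many classical models, such as ARIMA, assume it, so non-stationary series are often transformed (for example, by differencing) before modeling. Prophet, by contrast, models trend and seasonality directly and does not require a stationary series. The following points describe the main types.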

  • Strict Stationarity − The joint probability distribution of the observations does not change when the series is shifted in time.
  • Weak Stationarity − The mean and variance are constant over time, and the autocorrelation between two observations depends only on the lag between them, not on when they occur.
  • Non-Stationary Series − These series show trends, changing variance, or evolving seasonal patterns. Many real-world time series, such as stock prices or weather data, are non-stationary.
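A common way to remove a trend is first differencing, which replaces each value with its change from the previous value. This minimal sketch assumes a made-up series with a linear trend and uses a crude check (comparing the means of the two halves) rather than a formal stationarity test; `half_means` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
t = np.arange(200)
y = pd.Series(2.0 * t + rng.normal(scale=1.0, size=200))  # linear trend + noise

diff = y.diff().dropna()   # first difference: y[t] - y[t-1]

def half_means(s):
    """Mean of the first and second half; very different means suggest a trend."""
    mid = len(s) // 2
    return s.iloc[:mid].mean(), s.iloc[mid:].mean()

print("original:", half_means(y))        # means differ by about 200
print("differenced:", half_means(diff))  # both close to 2
```

For a formal check, statsmodels provides the Augmented Dickey-Fuller test as `statsmodels.tsa.stattools.adfuller`.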

Data Requirements for Time Series Analysis

The following points explain the main requirements for data in time series analysis.

  • There should be enough historical data to identify patterns. For example, weekly patterns need at least 8-12 weeks of data, and yearly patterns require 2-3 full years. Patterns should repeat several times to be meaningful.
  • Observations should be recorded at regular intervals. If the timing is irregular, the data may need some adjustment or filling in to make it ready for analysis.
  • A minimum of 50-100 observations is usually needed for meaningful analysis, although more may be required for complex cases.
  • The data should be clean and accurate. Missing values, outliers, or errors can affect the results, so preprocessing may be necessary before analysis.
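Irregular timing is easy to check with pandas. This sketch assumes a small made-up daily series with two missing days: it counts the step sizes between consecutive timestamps, then uses `asfreq("D")` to put the data on a regular daily grid, which exposes the gaps as missing values.

```python
import pandas as pd

# Made-up daily readings with two missing days (Jan 3 and Jan 4).
df = pd.DataFrame({
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05", "2024-01-06"]),
    "y": [10.0, 11.0, 14.0, 15.0],
})

gaps = df["ds"].diff().value_counts()
print(gaps)   # two 1-day steps and one 3-day step

regular = df.set_index("ds").asfreq("D")   # insert missing dates as NaN
print(regular)
print("observations:", len(df), "missing after regularizing:", regular["y"].isna().sum())
```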

Handling Missing Data

Below are some common ways to handle missing data in time series −

  • Deletion − Remove observations with missing values. This works well only when missing data is minimal and random, and it leaves gaps in the otherwise regular spacing of the series.
  • Forward Fill − Fill missing values with the last observed value. This is useful for slowly changing series.
  • Backward Fill − Fill missing values with the next observed value. This method is less common but can be useful in specific cases.
  • Interpolation − Estimate missing values based on surrounding data using linear, polynomial, or spline methods.
  • Model-based Imputation − Use forecasting or statistical models to predict missing values from observed patterns.
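Most of these strategies map directly onto pandas methods. This minimal sketch applies deletion, forward fill, backward fill, and linear interpolation to a made-up daily series with two consecutive gaps; model-based imputation is not shown because it depends on the chosen model.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0],
              index=pd.date_range("2024-01-01", periods=5, freq="D"))

filled = pd.DataFrame({
    "original": s,
    "ffill": s.ffill(),                        # carry last value forward
    "bfill": s.bfill(),                        # pull next value backward
    "linear": s.interpolate(method="linear"),  # straight line between neighbors
})
dropped = s.dropna()                           # deletion: remove gaps entirely

print(filled)
print("rows left after deletion:", len(dropped))
```

Linear interpolation fills the gap with 12 and 14, on the straight line between 10 and 16, while the two fill methods simply repeat a neighboring value.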

Characteristics of Good Time Series Data

The following are some important features of good time series data that help make analysis and forecasting reliable −

  • Adequate Length − The dataset should have enough observations to show patterns clearly.
  • Consistency − Measurement methods and definitions should stay the same throughout the series.
  • Relevance − The data should accurately reflect what you are studying.
  • Timeliness − Recent data is usually more useful for forecasting than old data.
  • Granularity − The level of detail should match the purpose of the analysis.

Time Series vs. Cross-Sectional Data

Cross-sectional data is collected from many subjects at a single point in time. For example, if you record the income of different people today, that is cross-sectional data because it describes a single moment.

Time series data, on the other hand, shows how something changes over time. For example, tracking a company's sales each month or the temperature every day forms time series data. In this type of data, the order of observations is important because each value is often influenced by the values that came before it.

Conclusion

In this chapter, we learned what time series data is, why it matters, and the key concepts needed to work with it. Understanding these basics will help us see how forecasting works and why tools like the Prophet library are useful for making predictions from past data.
