Python Prophet - Getting Started



Prophet is a Python library for time series forecasting that automatically handles trends, seasonality, and missing data. This chapter covers the complete workflow for getting started, including preparing data, fitting the model, generating forecasts, visualizing results, and saving predictions.

Understanding the Prophet Workflow

The Prophet model is designed to produce time series forecasts with minimal configuration. The forecasting process follows these steps −

  • The first step is to prepare the time series data in the required format.
  • The second step is to initialize the Prophet model.
  • The third step is to fit the model to the historical data.
  • The fourth step is to create a future dataframe to define the forecast horizon.
  • The fifth step is to generate the forecasts.
  • The final step is to visualize and interpret the results.

Prophet's Required Data Format

Prophet requires the dataset to be in a structured format that contains at least the following two columns −

  • ds − This is the date or timestamp column (stands for "datestamp").
  • y − This is the numeric value to forecast.

The column names must be exactly "ds" and "y". Prophet will not work with columns named "date" or "value".
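If existing data uses other column names, renaming the columns is usually enough. The following is a minimal sketch with pandas only; the input column names "date" and "value" and the variable names raw and df_renamed are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical raw data whose column names do not match Prophet's convention
raw = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'value': [123, 150, 98],
})

# Rename to the names Prophet expects and parse the date strings
df_renamed = raw.rename(columns={'date': 'ds', 'value': 'y'})
df_renamed['ds'] = pd.to_datetime(df_renamed['ds'])

print(df_renamed.dtypes)
```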

The ds column should hold dates in a format that pandas can parse, such as '2020-01-01' or '2020-01-01 14:30:00'. Unix timestamps should be converted to datetimes first, for example with pd.to_datetime. The y column contains the numeric values to forecast, such as sales numbers, website visits, or temperatures.

The timestamps must be valid dates, and keeping them in chronological order is good practice. Missing dates are allowed, and Prophet handles gaps automatically.
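A quick check before fitting can confirm that the timestamps parse correctly and show where the gaps are. The sketch below uses pandas only; the sample values and variable names are made up for illustration, and the Unix timestamps are assumed to be in seconds.

```python
import pandas as pd

# Hypothetical Unix timestamps in seconds, with one day (2021-01-03) missing
unix_seconds = [1609459200, 1609545600, 1609718400]
values = [123, 150, 98]

# unit='s' tells pandas the integers are seconds since 1970-01-01
sample = pd.DataFrame({
    'ds': pd.to_datetime(unix_seconds, unit='s'),
    'y': values,
}).sort_values('ds').reset_index(drop=True)

# Compare against a complete daily range to list the missing dates
full_range = pd.date_range(sample['ds'].min(), sample['ds'].max(), freq='D')
missing = full_range.difference(pd.DatetimeIndex(sample['ds']))
print(missing)
```

Gaps found this way do not need to be filled before fitting; the check only makes them visible.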

Here's an example of the required structure −

ds          y
2021-01-01  123
2021-01-02  150
2021-01-03  98

Creating the First Dataset in Python

To get started with Prophet, the first step is to create a dataset in the required format. In this example, daily sales for a full year are created, including trend, weekly seasonality, and random noise for realism.

First, import the necessary libraries −

import pandas as pd
import numpy as np

Here, pandas is used to handle data, and numpy is used for numerical operations such as generating random numbers.

Next, generate a date range for the entire year 2023 −

dates = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')

This generates a list of dates from January 1 to December 31, 2023, with daily intervals.

Now, create the sales data −

np.random.seed(42)
trend = np.linspace(100, 150, len(dates))
seasonality = 20 * np.sin(np.arange(len(dates)) * 2 * np.pi / 7)
noise = np.random.normal(0, 5, len(dates))
sales = trend + seasonality + noise

Here, trend represents a gradual increase in sales from 100 to 150 over the year, seasonality is the repeating weekly pattern in sales, and noise is the random variation added to the data.
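One way to confirm the weekly pattern built into this synthetic series is to check that the seasonal term repeats every 7 days. The sketch below rebuilds that term with NumPy alone; it is only a sanity check on the generated data, not something Prophet requires, and the names n_days, seasonal, and repeats_weekly are illustrative.

```python
import numpy as np

n_days = 365  # number of days in 2023
seasonal = 20 * np.sin(np.arange(n_days) * 2 * np.pi / 7)

# Values one week apart should match up to floating-point error
repeats_weekly = bool(np.allclose(seasonal[:-7], seasonal[7:]))
print(repeats_weekly)

# The seasonal term stays within 20 units above or below the trend
print(np.abs(seasonal).max() <= 20)
```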

Finally, create a DataFrame with the dates and sales data. The ds column will hold the dates, and the y column will hold the sales values.

df = pd.DataFrame({'ds': dates, 'y': sales})
print(df.head())

Following is the output after running the code.

          ds           y
0 2023-01-01  102.483571
1 2023-01-02  115.082671
2 2023-01-03  123.011726
3 2023-01-04  116.704912
4 2023-01-05   90.701009

Initializing and Fitting the Prophet Model

Now that the dataset is prepared, the next step is to create a Prophet model and fit it to the data. First, import Prophet −

from prophet import Prophet

Then, create the Prophet model −

model = Prophet()

If an error occurs, uninstall the old version by running pip uninstall prophet and reinstall the new version by running pip install prophet.

Next, fit the model to the dataset created earlier −

model.fit(df)

Creating Future Dates for Forecasting

Before making predictions, Prophet needs to know how far into the future to forecast. The make_future_dataframe method creates a dataframe of dates that, by default, contains the historical dates followed by the requested number of future periods. For example, to forecast 90 days into the future, use the following −

future = model.make_future_dataframe(periods=90)
print(future.tail())
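To make the result concrete, the following sketch rebuilds the same kind of date frame with pandas alone. It is an approximation for illustration, not Prophet's internal implementation, and it assumes the daily 2023 history used in this chapter; the names history, extra, and future_sketch are hypothetical.

```python
import pandas as pd

history = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')

# 90 extra daily dates starting the day after the last historical date
extra = pd.date_range(start=history.max() + pd.Timedelta(days=1),
                      periods=90, freq='D')

# Historical dates followed by the new ones, in a single 'ds' column
future_sketch = pd.DataFrame({'ds': history.append(extra)})

print(len(future_sketch))         # 365 historical + 90 future rows
print(future_sketch['ds'].max())  # last date in the forecast horizon
```

Because the historical dates are included, the forecast produced later also contains fitted values for 2023, not only the 90 new days.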

Generating Predictions

Once the future dates are ready, use the predict method to generate forecasts −

forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

This gives the predicted values (yhat) along with their lower and upper bounds (yhat_lower and yhat_upper).

After running the command, you will see output similar to the following −

            ds        yhat  yhat_lower  yhat_upper
450 2024-03-26  180.776435  174.862981  186.764119
451 2024-03-27  171.161638  165.014731  176.792737
452 2024-03-28  153.311795  147.210514  159.180881
453 2024-03-29  143.098987  137.143426  149.134078
454 2024-03-30  147.461709  141.532230  153.979530
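The width of the uncertainty interval can be computed directly from these columns. The sketch below copies the five rows shown above into a small DataFrame by hand, so it runs without a fitted model; the variable name tail and the column name interval_width are hypothetical additions.

```python
import pandas as pd

# Rows copied from the printed output above
tail = pd.DataFrame({
    'ds': pd.to_datetime(['2024-03-26', '2024-03-27', '2024-03-28',
                          '2024-03-29', '2024-03-30']),
    'yhat': [180.776435, 171.161638, 153.311795, 143.098987, 147.461709],
    'yhat_lower': [174.862981, 165.014731, 147.210514, 137.143426, 141.532230],
    'yhat_upper': [186.764119, 176.792737, 159.180881, 149.134078, 153.979530],
})

# Distance between the upper and lower bounds for each forecast date
tail['interval_width'] = tail['yhat_upper'] - tail['yhat_lower']
print(tail[['ds', 'interval_width']])
```

With a real model, the same subtraction can be applied to the forecast DataFrame itself. By default, Prophet's bounds describe an 80% uncertainty interval.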

Visualizing the Forecast

To visualize the forecast, Prophet has a built-in plotting function. First, import matplotlib to create the plot. Then, use the following code to generate the visualization.

import matplotlib.pyplot as plt

fig = model.plot(forecast)
plt.title('Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

This plot shows the historical data alongside the forecasted values. The shaded area represents the range in which future values are likely to fall, reflecting the model's uncertainty in its predictions.


Visualizing Forecast Components

Prophet breaks the forecast into trend and seasonal components, making it easy to see how each part affects the predictions. The plot_components function of the model is used to create the graph, and plt.show() displays it.

fig2 = model.plot_components(forecast)
plt.show()

This creates separate plots showing the trend, which is the overall direction (upward, downward, or flat), and the weekly pattern, which shows which days are higher or lower.

The resulting graph shows the long-term trend and the seasonal cycles, such as weekly or yearly patterns.


Filtering Only Future Predictions

The forecast can be filtered to keep only future dates beyond the historical data, showing predicted values along with their lower and upper bounds.

future_predictions = forecast[forecast['ds'] > df['ds'].max()]
print(future_predictions[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())

The output gives a clear view of upcoming predictions without including past data.

            ds        yhat  yhat_lower  yhat_upper
365 2024-01-01  166.784287  160.628995  172.777860
366 2024-01-02  169.256622  163.276815  175.819721
367 2024-01-03  159.641825  153.648941  165.565299
368 2024-01-04  141.791982  135.316069  147.882443
369 2024-01-05  131.579174  126.007299  137.371299

Saving the Forecast

The forecast can be saved to a CSV file using the to_csv() function. It can be saved as a full CSV containing all data or as a CSV containing only the date, predicted values, and their lower and upper uncertainty bounds.

# Save the full forecast with all columns
forecast.to_csv('sales_forecast.csv', index=False)

# Save only the key columns: date, predicted value, and uncertainty bounds
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_csv('sales_forecast_simple.csv', index=False)
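When a saved forecast is loaded again, the ds column comes back as plain text unless pandas is told to parse it. The sketch below illustrates a round trip using a small stand-in DataFrame built from two rows shown earlier in this chapter; the temporary folder and the names small, path, and loaded are assumptions made for the example.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the key forecast columns
small = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-01', '2024-01-02']),
    'yhat': [166.784287, 169.256622],
    'yhat_lower': [160.628995, 163.276815],
    'yhat_upper': [172.777860, 175.819721],
})

# Write to a temporary folder so no file in the working directory is touched
path = os.path.join(tempfile.mkdtemp(), 'sales_forecast_simple.csv')
small.to_csv(path, index=False)

# parse_dates restores 'ds' as datetimes instead of plain strings
loaded = pd.read_csv(path, parse_dates=['ds'])
print(loaded.dtypes)
```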

Conclusion

In this chapter, we learned how to use Prophet for time series forecasting: preparing data in the ds and y format, fitting the model, generating predictions, inspecting the trend and seasonal components, and saving the results. With these steps, a basic forecast takes only a few lines of code.
