Adding Categorical Variables
Learn how to incorporate categorical variables in your TimeGPT forecasts to improve accuracy.
Categorical Variables in TimeGPT
This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts. Categorical variables can capture external factors such as events or conditions that have a finite number of possible values, which can significantly improve forecasting accuracy.
What Are Categorical Variables?
Categorical variables are external factors that take on a limited set of discrete values and group observations into categories. For example, a dataset describing product demand might label each day's event as “Sporting” or “Cultural”.
Why Use Categorical Variables?
By capturing unique external conditions, categorical variables enhance the predictive power of your model and can reduce forecasting error. They are easy to incorporate by merging each time series data point with its corresponding categorical data.
1. Import Packages
Install and Import Dependencies
Make sure you have the necessary libraries installed: pandas, nixtla, and datasetsforecast.
Initialize the Nixtla Client
Using an Azure AI endpoint
If you’re connecting through an Azure AI endpoint, remember to set base_url when initializing the client.
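As a minimal sketch of the initialization step, the helper below builds a client from an environment variable and passes base_url only when one is supplied. The helper name make_client and the choice to read the key from NIXTLA_API_KEY are assumptions made for illustration; adapt them to however you manage credentials.

```python
import os


def make_client(base_url=None):
    """Build a NixtlaClient from the NIXTLA_API_KEY environment variable.

    make_client is a hypothetical helper for this tutorial, not part of nixtla.
    """
    # Imported inside the function so this snippet can be loaded even
    # where the nixtla package is not installed yet.
    from nixtla import NixtlaClient

    kwargs = {"api_key": os.environ["NIXTLA_API_KEY"]}
    if base_url is not None:
        # Only needed for a custom endpoint such as Azure AI.
        kwargs["base_url"] = base_url
    return NixtlaClient(**kwargs)
```

Calling make_client() targets the default endpoint, while make_client(base_url="...") with your own endpoint URL covers the Azure AI case.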
2. Load M5 Data
We use the M5 dataset, a collection of daily product sales across 10 US stores, to showcase how categorical variables can improve forecasts.
Fetch, Parse, and Display the Data
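One way to fetch and parse the data is sketched below. It assumes the M5 loader from datasetsforecast, which returns the sales series, the exogenous variables, and a static table; the wrapper name load_m5 and the default directory "data" are hypothetical choices for this example.

```python
import pandas as pd


def load_m5(directory="data"):
    """Return (Y_df, X_df) for M5 with the ds column parsed as datetimes.

    load_m5 is a hypothetical wrapper; the download location is an assumption.
    """
    # Imported lazily so the snippet loads without datasetsforecast installed.
    from datasetsforecast.m5 import M5

    # The third returned frame (static features) is not used in this tutorial.
    Y_df, X_df, _ = M5.load(directory=directory)
    Y_df["ds"] = pd.to_datetime(Y_df["ds"])
    X_df["ds"] = pd.to_datetime(X_df["ds"])
    return Y_df, X_df
```

After Y_df, X_df = load_m5(), Y_df.head(10) displays the first rows of the sales data.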
First 10 rows of sales data (Y_df)
Focus on the Relevant Categorical Columns
Categorical events in X_df
Notice that there is a Sporting event on February 6, 2011, listed under event_type_1.
3. Forecast Product Demand Using Categorical Variables
We’ll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts.
Select a High-Selling Product and Merge Data
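The selection and merge can be sketched on toy data as follows. The sales values and the second series are made up for illustration; only the column names, the product ID, and the Sporting event on February 6, 2011 come from the tutorial.

```python
import pandas as pd

dates = pd.to_datetime(["2011-02-05", "2011-02-06", "2011-02-07"])

# Toy stand-ins for Y_df and X_df; the y values are invented.
Y_df = pd.DataFrame({
    "unique_id": ["FOODS_3_090_CA_3"] * 3 + ["OTHER_ITEM"] * 3,
    "ds": list(dates) * 2,
    "y": [10.0, 25.0, 12.0, 1.0, 0.0, 2.0],
})
X_df = pd.DataFrame({
    "unique_id": ["FOODS_3_090_CA_3"] * 3 + ["OTHER_ITEM"] * 3,
    "ds": list(dates) * 2,
    "event_type_1": [None, "Sporting", None] * 2,
})

# Keep one product, then attach its categorical column row by row.
product = (
    Y_df[Y_df["unique_id"] == "FOODS_3_090_CA_3"]
    .merge(
        X_df[["unique_id", "ds", "event_type_1"]],
        on=["unique_id", "ds"],
        how="left",
    )
    .reset_index(drop=True)
)
print(product)
```

Merging on both unique_id and ds keeps each time series data point aligned with the event recorded for that series on that day.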
One-Hot Encode Categorical Events
One-hot encoding transforms each category into a separate column containing binary indicators (0 or 1).
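A small sketch of this encoding with pandas is shown below. The toy frame and its values are assumptions for illustration; days without an event are represented here as missing values, which get_dummies encodes as all zeros.

```python
import pandas as pd

# Toy merged frame; y values and the Cultural event date are invented.
product = pd.DataFrame({
    "unique_id": ["FOODS_3_090_CA_3"] * 4,
    "ds": pd.date_range("2011-02-05", periods=4, freq="D"),
    "y": [10.0, 25.0, 12.0, 11.0],
    "event_type_1": [None, "Sporting", None, "Cultural"],
})

# One indicator column per category; missing values yield 0 everywhere.
event_ohe = pd.get_dummies(product["event_type_1"], dtype=int)

# Replace the original text column with the binary indicator columns.
product = pd.concat([product.drop(columns="event_type_1"), event_ohe], axis=1)
print(product)
```

The resulting frame has one numeric column per event type, which is the form used for exogenous features in the rest of the tutorial.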
Prepare Data for Forecasting
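The preparation step amounts to splitting the encoded frame into a training set (history with the target) and a frame of future exogenous values (no target) covering the forecast horizon. The sketch below does this on toy data; the horizon h, the dates, and all values are assumptions for illustration.

```python
import pandas as pd

h = 2  # forecast horizon in days (toy value)

# Toy encoded frame; all numbers are invented.
product = pd.DataFrame({
    "unique_id": ["FOODS_3_090_CA_3"] * 6,
    "ds": pd.date_range("2016-05-17", periods=6, freq="D"),
    "y": [10.0, 25.0, 12.0, 11.0, 9.0, 14.0],
    "Cultural": [0, 0, 0, 0, 0, 1],
    "Sporting": [0, 1, 0, 0, 0, 0],
})

# First date of the forecast window: the last h days are held out.
cutoff = product["ds"].max() - pd.Timedelta(days=h - 1)

# History: target plus categorical indicators.
df_train = product[product["ds"] < cutoff]

# Future exogenous values: same indicator columns, but no target.
future_ex_vars_df = product[product["ds"] >= cutoff].drop(columns="y")

# Held-out actuals, kept for evaluating the forecasts later.
df_test = product.loc[product["ds"] >= cutoff, ["unique_id", "ds", "y"]]
```

Holding out the final h days works here because event calendars are known in advance, so the indicator columns are available for the forecast window.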
4. Compare Forecasts: With and Without Categorical Variables
When using Azure AI, set model="azureai" in your forecast call.
Forecast without categorical variables
Forecast with categorical variables
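The two forecasts can be sketched with a helper like the one below. It assumes a client object exposing a forecast method that accepts df, h, and X_df and returns a frame with a TimeGPT column, as NixtlaClient does; the helper name and the renamed output columns are choices made for this example.

```python
def forecast_with_and_without(client, df_train, future_ex_vars_df, h, model=None):
    """Run one forecast on the target alone and one with categorical features.

    Hypothetical helper; `client` is expected to behave like a NixtlaClient.
    """
    # Pass `model` only when given, e.g. model="azureai" on Azure AI.
    extra = {} if model is None else {"model": model}

    # Baseline: only the id, timestamp, and target columns are sent.
    base = client.forecast(df=df_train[["unique_id", "ds", "y"]], h=h, **extra)

    # With categorical variables: history includes the indicator columns,
    # and X_df carries their known values over the forecast horizon.
    with_cat = client.forecast(df=df_train, X_df=future_ex_vars_df, h=h, **extra)

    base = base.rename(columns={"TimeGPT": "TimeGPT-without-cat-vars"})
    with_cat = with_cat.rename(columns={"TimeGPT": "TimeGPT-with-cat-vars"})
    return base, with_cat
```

Keeping both calls in one place makes it clear that the only difference between the two runs is the presence of the categorical columns and X_df.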
5. Evaluate Forecast Accuracy
Finally, we calculate the Mean Absolute Error (MAE) for the forecasts with and without categorical variables:
| unique_id | TimeGPT-without-cat-vars | TimeGPT-with-cat-vars |
|---|---|---|
| FOODS_3_090_CA_3 | 24.285649 | 20.028514 |
Mean Absolute Error comparison
Including categorical variables noticeably improves forecast accuracy for this series, reducing MAE from 24.29 to 20.03, or about 18%.
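To make the metric concrete, the sketch below defines MAE with NumPy, applies it to a few invented values, and then computes the relative reduction from the two MAE figures reported in the table. The function name mae and the toy arrays are assumptions for illustration.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |actual - forecast|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Toy check with invented values: absolute errors are 1, 2, and 0.
toy_mae = mae([10.0, 12.0, 9.0], [11.0, 10.0, 9.0])

# MAE values reported in the comparison table.
mae_without = 24.285649
mae_with = 20.028514

# Share of the baseline error removed by adding categorical variables.
relative_reduction = (mae_without - mae_with) / mae_without
print(f"Relative MAE reduction: {relative_reduction:.1%}")
```

The relative reduction is the difference between the two MAE values divided by the baseline MAE, which is how the percentage improvement is derived.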
Conclusion
Categorical variables are powerful additions to TimeGPT forecasts, helping capture valuable external factors. By properly encoding these variables and merging them with your time series, you can significantly enhance predictive performance.
Continue exploring more advanced techniques or different datasets to further improve your TimeGPT forecasting models.