Categorical Variables in TimeGPT

This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts. Categorical variables capture external factors, such as events or conditions, that take a finite number of possible values. When those factors influence the target series, including them can improve forecasting accuracy.

What Are Categorical Variables?

Categorical variables are external factors that take one of a limited set of discrete values, grouping observations by category. For example, in a dataset describing product demand, each day might be tagged with an event type such as “Sporting” or “Cultural”.

Why Use Categorical Variables?

By capturing unique external conditions, categorical variables enhance the predictive power of your model and can reduce forecasting error. They are easy to incorporate by merging each time series data point with its corresponding categorical data.

Example Usage

1. Import Packages

Install and Import Dependencies

Make sure you have the necessary libraries installed: pandas, nixtla, datasetsforecast, and utilsforecast (used for the MAE calculation in step 5).

Import Dependencies
import pandas as pd
import os

from nixtla import NixtlaClient
from datasetsforecast.m5 import M5

Initialize the Nixtla Client

Initialize NixtlaClient
nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
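
Hard-coding the key is fine for a quick test, but you may prefer to keep it out of the script. The sketch below reads it from an environment variable instead. The helper name `load_api_key` is hypothetical (it is not part of nixtla), and the variable name `NIXTLA_API_KEY` is an assumption you can change.

```python
import os


def load_api_key(env_var: str = "NIXTLA_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key
```

You could then initialize the client with `NixtlaClient(api_key=load_api_key())`.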

Using an Azure AI endpoint
If you’re connecting through an Azure AI endpoint, remember to set base_url:

Azure AI Endpoint Configuration
nixtla_client = NixtlaClient(
    base_url="your azure ai endpoint",
    api_key="your api_key"
)

2. Load M5 Data

We use the M5 dataset, a collection of daily product sales across 10 US stores, to show how categorical variables can improve forecasts.

Fetch, Parse, and Display the Data

Load M5 Data
Y_df, X_df, _ = M5.load(directory=os.getcwd())

Y_df['ds'] = pd.to_datetime(Y_df['ds'])
X_df['ds'] = pd.to_datetime(X_df['ds'])

Y_df.head(10)

First 10 rows of sales data (Y_df)

Focus on the Relevant Categorical Columns

Extract Categorical Columns
X_df = X_df[['unique_id', 'ds', 'event_type_1']]
X_df.head(10)

Categorical events in X_df

Notice that there is a Sporting event on February 6, 2011, listed under event_type_1.

3. Forecast Product Demand Using Categorical Variables

We’ll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts.

Select a High-Selling Product and Merge Data

Select and Merge Product Data
product = 'FOODS_3_090_CA_3'

Y_df_product = Y_df.query('unique_id == @product')
X_df_product = X_df.query('unique_id == @product')

df = Y_df_product.merge(X_df_product)
df.head(10)
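
The merge above relies on pandas finding the shared columns on its own. As a hedged alternative, the sketch below names the join keys explicitly and asks pandas to verify that they are unique on both sides. The toy frames `sales` and `events` are made-up stand-ins for Y_df_product and X_df_product.

```python
import pandas as pd

# Toy stand-ins for Y_df_product and X_df_product (values are made up).
dates = pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03"])
sales = pd.DataFrame({"unique_id": ["A"] * 3, "ds": dates, "y": [10.0, 12.0, 9.0]})
events = pd.DataFrame(
    {"unique_id": ["A"] * 3, "ds": dates, "event_type_1": ["nan", "Sporting", "nan"]}
)

# Name the join keys and let pandas raise if either side has duplicate keys.
merged = sales.merge(events, on=["unique_id", "ds"], how="inner", validate="one_to_one")

# An inner join silently drops unmatched dates, so compare row counts.
assert len(merged) == len(sales), "some dates have no matching event row"
print(merged)
```

If the row counts differ, some timestamps in the sales data have no categorical value attached and would be lost before forecasting.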

One-Hot Encode Categorical Events

One-hot encoding transforms each category into a separate column containing binary indicators (0 or 1).

One-Hot Encoding
event_type_1_ohe = pd.get_dummies(df['event_type_1'], dtype=int)

df = pd.concat([df, event_type_1_ohe], axis=1)
df = df.drop(columns=['event_type_1'])

df.tail(10)
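
To see what one-hot encoding produces without loading M5, here is a minimal sketch on a made-up series. It assumes that days without an event carry a placeholder label (here the string "nan"), so they get their own indicator column.

```python
import pandas as pd

# Made-up event labels; the string "nan" stands for days with no event.
events = pd.Series(["nan", "Sporting", "Cultural", "nan"], name="event_type_1")

# One column per distinct label, holding 1 where the row has that label.
one_hot = pd.get_dummies(events, dtype=int)
print(one_hot)

# Every row belongs to exactly one category, so each row sums to 1.
assert (one_hot.sum(axis=1) == 1).all()
```

Note that if missing events were stored as true missing values rather than a label, `pd.get_dummies` would by default leave those rows with all zeros.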

Prepare Data for Forecasting

Prepare Forecast Data
# Prepare future external data for Feb 1-7, 2016
future_ex_vars_df = df.drop(columns=['y']).query("ds >= '2016-02-01' & ds <= '2016-02-07'")
future_ex_vars_df.head(10)

# Separate training data before Feb 1, 2016
df_train = df.query("ds < '2016-02-01'")
df_train.tail(10)
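
Before calling the API, it can help to confirm that the future exogenous frame lines up with the training frame and the horizon. The sketch below is one way to do that. `check_future_exog` is a hypothetical helper, not a nixtla function, and the toy data is made up.

```python
import pandas as pd


def check_future_exog(train: pd.DataFrame, future: pd.DataFrame, h: int) -> None:
    """Raise ValueError if `future` does not match `train` and horizon `h`."""
    # The future frame should hold every training column except the target.
    expected_cols = sorted(c for c in train.columns if c != "y")
    if sorted(future.columns) != expected_cols:
        raise ValueError(f"expected columns {expected_cols}, got {sorted(future.columns)}")
    # Each series needs one row per forecast step.
    rows_per_series = future.groupby("unique_id").size()
    if not (rows_per_series == h).all():
        raise ValueError(f"each series needs exactly {h} future rows")
    # Future rows must come strictly after the training period.
    if future["ds"].min() <= train["ds"].max():
        raise ValueError("future dates overlap the training period")


# Toy data: 3 training days and a 2-day horizon.
train = pd.DataFrame({
    "unique_id": ["A"] * 3,
    "ds": pd.to_datetime(["2016-01-29", "2016-01-30", "2016-01-31"]),
    "y": [5.0, 7.0, 6.0],
    "Sporting": [0, 0, 0],
})
future = pd.DataFrame({
    "unique_id": ["A"] * 2,
    "ds": pd.to_datetime(["2016-02-01", "2016-02-02"]),
    "Sporting": [0, 1],
})

check_future_exog(train, future, h=2)  # passes silently
```

In this tutorial the equivalent call would be `check_future_exog(df_train, future_ex_vars_df, h=7)`.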

4. Compare Forecasts: With and Without Categorical Variables

Forecast Without Categorical Variables
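# Keep only the id, timestamp, and target columns so the one-hot
# event columns cannot influence the baseline forecast
timegpt_fcst_without_cat_vars_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=7,
    level=[80, 90]
)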

timegpt_fcst_without_cat_vars_df.head()

When using Azure AI, set model="azureai" in your forecast call.

Visualize Forecast Without Categorical Variables
# Visualize the forecast without categorical variables
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_without_cat_vars_df,
    max_insample_length=28,
)

Forecast without categorical variables

Forecast With Categorical Variables
timegpt_fcst_with_cat_vars_df = nixtla_client.forecast(
    df=df_train,
    X_df=future_ex_vars_df,
    h=7,
    level=[80, 90]
)

timegpt_fcst_with_cat_vars_df.head()

Visualize Forecast With Categorical Variables
# Visualize the forecast with categorical variables
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_with_cat_vars_df,
    max_insample_length=28,
)

Forecast with categorical variables

5. Evaluate Forecast Accuracy

Finally, we calculate the Mean Absolute Error (MAE) for the forecasts with and without categorical variables:

Calculate MAE for Forecasts
from utilsforecast.losses import mae

df_target = df[['unique_id', 'ds', 'y']].query("ds >= '2016-02-01' & ds <= '2016-02-07'")

df_target = df_target.merge(
    timegpt_fcst_without_cat_vars_df[['unique_id', 'ds', 'TimeGPT']].rename(
        columns={'TimeGPT': 'TimeGPT-without-cat-vars'}
    ),
    on=['unique_id', 'ds']
)

df_target = df_target.merge(
    timegpt_fcst_with_cat_vars_df[['unique_id', 'ds', 'TimeGPT']].rename(
        columns={'TimeGPT': 'TimeGPT-with-cat-vars'}
    ),
    on=['unique_id', 'ds']
)

mean_absolute_errors = mae(df_target, ['TimeGPT-without-cat-vars', 'TimeGPT-with-cat-vars'])
mean_absolute_errors

unique_id	TimeGPT-without-cat-vars	TimeGPT-with-cat-vars
FOODS_3_090_CA_3	24.285649	20.028514

Mean Absolute Error comparison

For this product and forecast window, including categorical variables lowers the MAE from about 24.29 to about 20.03, a reduction of roughly 17.5%.
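
The size of the improvement can be worked out directly from the two MAE values in the table. The short calculation below only does that arithmetic; the variable names are illustrative.

```python
# MAE values copied from the comparison table.
mae_without = 24.285649
mae_with = 20.028514

# Absolute and relative reduction in error from adding categorical variables.
absolute_gain = mae_without - mae_with
relative_gain = absolute_gain / mae_without

print(f"MAE reduction: {absolute_gain:.2f} units ({relative_gain:.1%})")
```

A single product over a single 7-day window is a small sample, so treat this figure as an illustration rather than a general estimate.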

Conclusion

Categorical variables are a useful addition to TimeGPT forecasts because they capture external factors that the target series alone does not reveal. By one-hot encoding these variables, merging them with your time series, and supplying their future values, you can improve predictive performance, as the M5 example above shows.

Continue exploring more advanced techniques or different datasets to further improve your TimeGPT forecasting models.
