Data Requirements
Overview of the data format and requirements for TimeGPT forecasting.
TimeGPT accepts pandas and polars dataframes in long format. The minimum required columns are:
Required Columns
- ds (timestamp): String or datetime in YYYY-MM-DD or YYYY-MM-DD HH:MM:SS format.
- y (numeric): Numerical target variable to forecast.
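As a minimal sketch of this layout, the following builds a small pandas DataFrame with the two required columns and checks their types. The sample values and the helper name has_required_columns are made up for illustration; they are not part of the TimeGPT API.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration.
df = pd.DataFrame(
    {
        "ds": ["2024-01-01", "2024-01-02", "2024-01-03"],
        "y": [10.0, 12.5, 11.8],
    }
)

# Strings in YYYY-MM-DD form are accepted, but parsing them up front
# surfaces malformed dates early.
df["ds"] = pd.to_datetime(df["ds"], format="%Y-%m-%d")


def has_required_columns(frame: pd.DataFrame) -> bool:
    """Return True when `ds` is datetime-like and `y` is numeric."""
    return (
        {"ds", "y"}.issubset(frame.columns)
        and pd.api.types.is_datetime64_any_dtype(frame["ds"])
        and pd.api.types.is_numeric_dtype(frame["y"])
    )


print(has_required_columns(df))
```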
Optional Index
A DataFrame that lacks the ds column but uses a DatetimeIndex is also supported.
TimeGPT also supports distributed dataframe libraries such as dask, spark, and ray.
You can include additional exogenous features in the same DataFrame. See the Exogenous Variables tutorial for details.
Example DataFrame
Below is a sample of a valid input DataFrame for TimeGPT (with columns named timestamp and value instead of ds and y):
| | timestamp | value |
|---|---|---|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
Sample Data Preview
In this example:
- timestamp corresponds to ds.
- value corresponds to y.
Matching Columns to TimeGPT
You can choose how to align your DataFrame columns with TimeGPT’s expected structure:
Rename timestamp to ds and value to y, so that your DataFrame has the explicitly required columns.
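A minimal sketch of the rename with pandas, using the first rows of the sample table. The variable names df and df_renamed are illustrative.

```python
import pandas as pd

# First rows of the sample table above.
df = pd.DataFrame(
    {
        "timestamp": ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"],
        "value": [112, 118, 132, 129, 121],
    }
)

# rename returns a new DataFrame; the original df is left untouched.
df_renamed = df.rename(columns={"timestamp": "ds", "value": "y"})
print(df_renamed.columns.tolist())  # ['ds', 'y']
```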
Alternatively, specify the column names directly when calling NixtlaClient. This way, you don’t need to rename your DataFrame columns, as TimeGPT will know which ones to treat as ds and y.
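A sketch of this option, assuming the nixtla Python package, whose NixtlaClient.forecast method takes time_col and target_col arguments. The wrapper name forecast_with_custom_columns and the horizon of 12 are illustrative choices, and the client is passed in as an argument so the snippet itself makes no network call.

```python
import pandas as pd


def forecast_with_custom_columns(client, df: pd.DataFrame, h: int):
    """Forecast without renaming, by pointing the client at the existing columns.

    `client` is expected to be a nixtla.NixtlaClient instance.
    """
    return client.forecast(
        df=df,
        h=h,
        time_col="timestamp",  # column to treat as ds
        target_col="value",    # column to treat as y
    )


# Usage (requires the nixtla package and a valid API key):
#   from nixtla import NixtlaClient
#   client = NixtlaClient(api_key="YOUR_API_KEY")
#   forecast_df = forecast_with_custom_columns(client, df, h=12)
```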
Example Forecast
When you run the forecast method, the output looks like this:
| | timestamp | TimeGPT |
|---|---|---|
| 0 | 1961-01-01 | 437.83792 |
| 1 | 1961-02-01 | 426.06270 |
| 2 | 1961-03-01 | 463.11655 |
| 3 | 1961-04-01 | 478.24450 |
| 4 | 1961-05-01 | 505.64648 |
Forecast Output Preview
TimeGPT attempts to automatically infer your data’s frequency (freq). You can override this by specifying the freq parameter (e.g., freq='MS').
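To see what frequency a set of timestamps implies before sending it, pandas offers infer_freq. This is only a local sanity check and not necessarily the logic TimeGPT uses internally; the timestamps are the ones from the sample table.

```python
import pandas as pd

timestamps = pd.to_datetime(
    ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"]
)

# infer_freq needs at least three timestamps and returns None
# when no regular frequency fits.
inferred = pd.infer_freq(timestamps)
print(inferred)  # MS (month start)
```

If the result is None or not what you expect, that is a sign to pass freq explicitly.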
For more information, see the TimeGPT Quickstart.
Multiple Series
When forecasting multiple time series simultaneously, the DataFrame must include a column called unique_id that identifies the series each row belongs to:
| | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
Multiple-Series Data Preview
Simply call the forecast method on this DataFrame; TimeGPT will produce forecasts for all unique IDs in it simultaneously.
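The long format for several series can be sketched as follows. The "BE" rows come from the preview above, while the "FR" series and its values are made up for illustration.

```python
import pandas as pd

# "BE" rows come from the preview above; "FR" is a made-up second series.
be = pd.DataFrame(
    {
        "unique_id": "BE",
        "ds": pd.to_datetime(["2016-10-22 00:00:00", "2016-10-22 01:00:00"]),
        "y": [70.00, 37.10],
    }
)
fr = pd.DataFrame(
    {
        "unique_id": "FR",
        "ds": pd.to_datetime(["2016-10-22 00:00:00", "2016-10-22 01:00:00"]),
        "y": [65.20, 41.30],
    }
)

# Long format: series are stacked row-wise, not placed side by side.
df = pd.concat([be, fr], ignore_index=True)

# Each (unique_id, ds) pair should appear exactly once.
has_duplicates = bool(df.duplicated(subset=["unique_id", "ds"]).any())
print(df["unique_id"].nunique(), has_duplicates)  # 2 False
```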
Exogenous Variables
TimeGPT can use exogenous variables in your forecasts. If you have future values for these variables, provide them in a separate DataFrame.
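A sketch of how the two DataFrames might be laid out, assuming a daily series and a single exogenous column. The column name price, the horizon, and all values are hypothetical. In the nixtla Python package the future frame is passed through the X_df argument of forecast.

```python
import pandas as pd

h = 2  # hypothetical forecast horizon

# Historical data: target plus one exogenous column ("price" is a made-up name).
df = pd.DataFrame(
    {
        "ds": pd.date_range("2024-01-01", periods=4, freq="D"),
        "y": [10.0, 12.0, 11.0, 13.0],
        "price": [1.0, 1.1, 1.0, 1.2],
    }
)

# Future data: the next h timestamps with known exogenous values and no y.
future_start = df["ds"].max() + pd.Timedelta(days=1)
future_df = pd.DataFrame(
    {
        "ds": pd.date_range(future_start, periods=h, freq="D"),
        "price": [1.1, 1.3],
    }
)
print(future_df)
```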
Important Considerations
Warning: Data passed to TimeGPT must not contain missing values or time gaps.
To handle missing data, see Dealing with Missing Values in TimeGPT.
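Before sending data, both conditions can be checked locally. The sketch below handles a single series and assumes the expected frequency is known; the helper name missing_timestamps is illustrative, and the sample reuses the monthly values from earlier with one month deliberately removed.

```python
import pandas as pd


def missing_timestamps(df: pd.DataFrame, freq: str, time_col: str = "ds") -> pd.DatetimeIndex:
    """Return the timestamps expected at `freq` that are absent from a single series."""
    observed = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    expected = pd.date_range(observed.min(), observed.max(), freq=freq)
    return expected.difference(observed)


# Monthly sample with 1949-03-01 deliberately left out.
df = pd.DataFrame(
    {
        "ds": ["1949-01-01", "1949-02-01", "1949-04-01", "1949-05-01"],
        "y": [112, 118, 129, 121],
    }
)

print(int(df["y"].isna().sum()))  # 0 missing target values
print(missing_timestamps(df, freq="MS"))  # contains 1949-03-01
```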
Minimum Data Requirements (Azure AI)
These are the minimum data sizes required for each frequency when using Azure AI:
When preparing your data, also consider:
- Forecast horizon (h): Number of future periods you want to predict.
- Number of validation windows (n_windows): How many times to test the model’s performance.
- Gaps (step_size): Periodic offset between validation windows during cross-validation.
This ensures you have enough data for both training and evaluation.
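How these three settings add up can be sketched with a little arithmetic. This assumes the usual rolling-origin layout, in which each window is h periods long, consecutive windows start step_size periods apart, and the last window ends at the final observation. The function names and the minimum training size of 48 are hypothetical.

```python
def validation_span(h: int, n_windows: int, step_size: int) -> int:
    """Number of trailing observations covered by all validation windows.

    Assumes a rolling-origin layout: each window is h periods long, consecutive
    windows start step_size periods apart, and the last one ends at the final
    observation.
    """
    return h + (n_windows - 1) * step_size


def has_enough_data(n_obs: int, min_train_size: int, h: int, n_windows: int, step_size: int) -> bool:
    """True if the earliest window still leaves min_train_size observations for training."""
    return n_obs - validation_span(h, n_windows, step_size) >= min_train_size


# Hypothetical numbers: 144 monthly observations, 3 windows of 12 months,
# spaced 12 months apart, and an assumed minimum training size of 48.
print(validation_span(h=12, n_windows=3, step_size=12))  # 36
print(has_enough_data(144, min_train_size=48, h=12, n_windows=3, step_size=12))  # True
```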