Run TimeGPT in a distributed manner using Dask.

Dask is an open-source parallel computing library for Python. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting tasks.

Before proceeding, make sure you have an API key from Nixtla.

Highlights

• Simplify distributed computing with Fugue.
• Run TimeGPT at scale on a Dask cluster.
• Seamlessly convert pandas DataFrames to Dask.

Outline

  1. Installation
  2. Load Your Data
  3. Import Dask
  4. Use TimeGPT on Dask

Step 1: Installation

Step 2: Load Your Data

You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:

Load Electricity Data
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()

Step 3: Import Dask

Convert the pandas DataFrame into a Dask DataFrame for parallel processing.

Convert to Dask DataFrame
import dask.dataframe as dd

# Split the pandas DataFrame into 2 partitions that Dask can process in parallel
dask_df = dd.from_pandas(df, npartitions=2)
dask_df

When converting to a Dask DataFrame, the npartitions argument sets how many pieces the data is split into. Choose it based on your data size and the number of workers available; the example above uses 2.
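
As a rough illustration of choosing a partition count, the sketch below derives one from the in-memory size of the data and the number of workers. The helper names suggest_npartitions and to_dask are hypothetical, and the 100 MB target per partition is an assumption loosely based on common Dask guidance, not a Nixtla recommendation.

```python
import math


def suggest_npartitions(n_bytes, n_workers, target_partition_bytes=100 * 1024**2):
    """Hypothetical rule of thumb: at least one partition per worker, and
    enough partitions that each one stays near the target size."""
    if n_bytes < 0 or n_workers < 1 or target_partition_bytes < 1:
        raise ValueError("n_bytes must be >= 0; n_workers and target size must be >= 1")
    by_size = math.ceil(n_bytes / target_partition_bytes)
    return max(n_workers, by_size)


def to_dask(df, n_workers):
    """Convert a pandas DataFrame using the suggested partition count."""
    import dask.dataframe as dd  # imported lazily; requires dask to be installed

    # memory_usage(deep=True) also counts the contents of object (string) columns
    n_bytes = int(df.memory_usage(deep=True).sum())
    return dd.from_pandas(df, npartitions=suggest_npartitions(n_bytes, n_workers))
```

For a small dataset such as the one in this guide, a helper like this would simply return the worker count, which is consistent with the small fixed value used above.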

Step 4: Use TimeGPT on Dask

To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla’s client methods instead of a pandas DataFrame.

Important Concept: NixtlaClient

Instantiate the NixtlaClient class to interact with Nixtla’s API.

Initialize NixtlaClient
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
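
To keep the key out of source code, one option is to read it from an environment variable. The sketch below is a minimal example of that idea; load_api_key and make_client are hypothetical helper names, and the variable name NIXTLA_API_KEY is an assumption that you can change to match your setup.

```python
import os


def load_api_key(env_var="NIXTLA_API_KEY"):
    """Return the API key stored in env_var, or raise a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Nixtla API key."
        )
    return key


def make_client():
    """Build a NixtlaClient from the environment instead of a hardcoded key."""
    from nixtla import NixtlaClient  # imported lazily; requires the nixtla package

    return NixtlaClient(api_key=load_api_key())
```

With the variable set in your shell, nixtla_client = make_client() can stand in for the initialization shown above.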

You can use any method from the NixtlaClient, such as forecast or cross_validation.

Forecast with TimeGPT and Dask
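# h=12 requests a 12-step-ahead forecast for each series in dask_df
fcst_df = nixtla_client.forecast(dask_df, h=12)
# The result is a lazy Dask DataFrame; compute() materializes it as pandas
fcst_df.compute().head()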
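
The same pattern should carry over to cross_validation. The sketch below wraps the call in a hypothetical helper, cross_validate_on_dask, that takes the client as an argument. The values h=12 and n_windows=2 are illustrative, and the assumption that a Dask input yields a lazy Dask result mirrors the forecast example rather than anything stated in this guide.

```python
def cross_validate_on_dask(client, dask_df, h=12, n_windows=2):
    """Run rolling-window cross-validation and return the materialized result.

    `client` is expected to be a NixtlaClient and `dask_df` a Dask DataFrame.
    """
    cv_df = client.cross_validation(dask_df, h=h, n_windows=n_windows)
    # Assumption: as with forecast, a Dask input yields a lazy Dask result.
    # If the returned object has no compute(), hand it back unchanged.
    return cv_df.compute() if hasattr(cv_df, "compute") else cv_df
```

A call such as cross_validate_on_dask(nixtla_client, dask_df).head() would then show the first rows of the cross-validation output.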

TimeGPT with Dask also supports exogenous variables. Refer to the Exogenous Variables Tutorial for details, and pass Dask DataFrames wherever the tutorial uses pandas DataFrames.
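
As a rough sketch of what that substitution might look like, the hypothetical helper below passes both the historical data and the future exogenous values as Dask DataFrames. It assumes the forecast method accepts the future values through an X_df keyword, as in the pandas workflow; check the Exogenous Variables Tutorial for the exact inputs your version expects.

```python
def forecast_with_exog(client, dask_df, dask_future_exog_df, h):
    """Forecast h steps ahead using exogenous variables on Dask.

    `dask_df` holds the history (target plus exogenous columns) and
    `dask_future_exog_df` holds the exogenous values for the next h steps.
    Both are assumed to be Dask DataFrames; `client` is a NixtlaClient.
    """
    # Assumption: X_df is the keyword for future exogenous values.
    fcst_df = client.forecast(df=dask_df, X_df=dask_future_exog_df, h=h)
    # Materialize the lazy result if the returned object supports it.
    return fcst_df.compute() if hasattr(fcst_df, "compute") else fcst_df
```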