Computing at Scale Tutorial

Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may need to forecast sales for thousands of products across hundreds of stores. Similarly, when dealing with electricity consumption data, you may need to predict consumption for thousands of households across multiple regions.

Nixtla’s TimeGPT lets you scale these operations efficiently by integrating with several distributed computing frameworks. Currently, Spark, Dask, and Ray are supported through Fugue.

TimeGPT’s distributed capabilities help you handle large datasets by parallelizing forecasts across the time series in your data, which can substantially reduce computation time.

[Figure: High-level overview of distributed time series forecasting with TimeGPT]

Outline

1. Getting Started

To use TimeGPT in any scenario, distributed or not, you first need an API key. Make sure you have registered with Nixtla and confirmed your signup email.

Upon registration, you will receive an email prompting you to confirm your signup. Once confirmed, you can access your dashboard. Navigate to the API Keys section to retrieve your key.

For detailed steps on connecting your API key to Nixtla’s SDK, see the Setting Up Your Authentication Key tutorial.

2. Forecasting at Scale

Using TimeGPT with distributed computing frameworks is straightforward. The process differs only slightly from non-distributed usage.

1. Instantiate the NixtlaClient class

NixtlaClient Instantiation
from nixtla import NixtlaClient

# Replace 'YOUR_API_KEY' with the key obtained from your Nixtla dashboard
client = NixtlaClient(api_key="YOUR_API_KEY")
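
Rather than hardcoding the key, you may prefer to read it from an environment variable. The following is a minimal sketch: the helper name load_api_key is hypothetical, and the variable name NIXTLA_API_KEY is assumed here to match the one the SDK looks for when no api_key argument is passed.

```python
import os


def load_api_key(env_var="NIXTLA_API_KEY"):
    """Return the API key stored in an environment variable, or fail loudly."""
    # Assumption: the key was exported beforehand, e.g. in your shell profile.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Usage (commented out because it needs the nixtla package and a real key):
# client = NixtlaClient(api_key=load_api_key())
```

Keeping the key out of the source file also keeps it out of version control and shared notebooks.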

2. Load your data into a pandas DataFrame

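Make sure your data is in long format, with each time series uniquely identified (e.g., by store or product). By default, NixtlaClient looks for the series identifier in a unique_id column, the timestamp in ds, and the target value in y.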

Loading Time Series Data
import pandas as pd

data = pd.read_csv("your_time_series_data.csv")
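
To make the expected layout concrete, here is a small sketch with made-up data. The store names and values are hypothetical, and the column names unique_id, ds, and y are assumed to be the defaults used by the SDK. The check at the end flags repeated (series, timestamp) pairs, which often indicate that two series share an identifier.

```python
import pandas as pd

# Hypothetical toy data in long format: one row per (series, timestamp).
data = pd.DataFrame(
    {
        "unique_id": ["store_1", "store_1", "store_2", "store_2"],
        "ds": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]
        ),
        "y": [10.0, 12.0, 5.0, 7.0],
    }
)

# Each (unique_id, ds) pair should appear exactly once.
has_duplicates = bool(data.duplicated(subset=["unique_id", "ds"]).any())

# Number of distinct series, i.e. the units that can be forecast in parallel.
n_series = data["unique_id"].nunique()

if has_duplicates:
    raise ValueError("Found repeated (unique_id, ds) rows; check the series identifiers.")
```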

3. Initialize a distributed computing framework

Currently, TimeGPT supports Spark, Dask, and Ray.

See the dedicated tutorial for each framework for setup examples.

4. Use NixtlaClient methods to forecast at scale

Once your framework is initialized and your data is loaded, you can apply the forecasting methods:

Forecasting Example with NixtlaClient
# Example call; in a distributed run, pass the Spark, Dask, or Ray DataFrame instead
forecast_results = client.forecast(
    df=data,  # the input DataFrame is passed as `df`
    h=14,     # horizon (e.g., 14 days)
)

5. Stop the distributed computing framework

When you’re finished, you may need to terminate your Spark, Dask, or Ray session. This depends on your environment and setup.
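
For Spark specifically, steps 3 to 5 could be combined roughly as in the sketch below. The function name forecast_with_spark and the app name are hypothetical. The sketch assumes that pyspark is installed, that client.forecast accepts a Spark DataFrame and returns one, and that the result is small enough to collect with toPandas(). The try/finally block stops the session even if the forecast call fails.

```python
def forecast_with_spark(client, pandas_df, h):
    """Start Spark, forecast on a Spark DataFrame, and always stop the session."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("timegpt-at-scale").getOrCreate()
    try:
        # Step 3: move the pandas data into the distributed framework.
        spark_df = spark.createDataFrame(pandas_df)
        # Step 4: same call as in the non-distributed case, with a Spark input.
        forecast_df = client.forecast(df=spark_df, h=h)
        # Collect before the session is stopped (assumes a small result).
        return forecast_df.toPandas()
    finally:
        # Step 5: release cluster resources regardless of success or failure.
        spark.stop()
```

For large outputs, writing the Spark result to storage instead of collecting it to the driver would likely be the safer choice.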

Parallelization in these frameworks operates across multiple time series within your dataset. Ensure each series is uniquely identified so the parallelization can be fully leveraged.
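
The idea of parallelizing across series can be illustrated with the standard library alone. This is only a conceptual sketch and not how TimeGPT or Fugue work internally: the naive "repeat the last value" model, the function names, and the toy rows are all hypothetical stand-ins. It shows why the identifier matters, since rows are grouped by it before each group is forecast independently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def naive_forecast(values, h):
    """Stand-in model: repeat the last observed value h times."""
    return [values[-1]] * h


def forecast_by_series(rows, h, max_workers=4):
    """Group long-format (unique_id, ds, y) rows by series and forecast each group."""
    groups = defaultdict(list)
    for unique_id, ds, y in rows:
        groups[unique_id].append((ds, y))

    def run(item):
        unique_id, points = item
        # Sort by timestamp (ISO date strings sort chronologically).
        ordered = [y for _, y in sorted(points)]
        return unique_id, naive_forecast(ordered, h)

    # Each series is an independent task, so tasks can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, groups.items()))


rows = [
    ("store_1", "2024-01-01", 10.0),
    ("store_1", "2024-01-02", 12.0),
    ("store_2", "2024-01-01", 5.0),
    ("store_2", "2024-01-02", 7.0),
]
forecasts = forecast_by_series(rows, h=3)
```

If two stores accidentally shared the same identifier, their rows would be merged into a single group and forecast as one series, which is why distinct identifiers are required.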

3. Important Considerations

When to Use a Distributed Computing Framework

Choosing the Right Framework

Key Concept: Time Series Forecasting at Scale

Distribute your forecasts across multiple compute nodes to handle very large datasets without exhausting the memory or compute resources of a single machine.

Key Concept: Parallelization

Make sure your data has distinct identifiers for each series. Correct labeling is crucial for successful multi-series parallel forecasts.

With these guidelines, you can efficiently forecast large-scale time series data using TimeGPT and the distributed computing framework that best fits your environment.
