# Describe your data, don't reshape it

Most forecasting APIs ask you to flatten your world. You take whatever lives in
your warehouse — stores, SKUs, regions, daily and weekly grains, gaps and all —
collapse it into a positional array, throw away the timestamps, and re-upload
the whole thing on every call:

```json
{ 
  "values": [142, 137, 145, 139, ..., 90, 84, 88], 
  "series_lengths": [730, 365], 
  "horizon": 28 
}
```

Every time-series is concatenated into that one array; `series_lengths` is all
that marks where one ends and the next begins. Your data has structure — and
it's the first thing this shape throws away, leaving you to rebuild it by hand,
in glue code, before every request.
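Here is the client-side bookkeeping the positional shape forces on you, as a minimal Python sketch: cutting `values` back into separate series using nothing but `series_lengths`. The helper name `split_series` is hypothetical. The toy series are shorter than the 730 and 365 points above.

```python
from itertools import accumulate


def split_series(values: list[float], series_lengths: list[int]) -> list[list[float]]:
    """Recover individual series from one concatenated array.

    The lengths are the only structure left: no timestamps, no identifiers.
    """
    if sum(series_lengths) != len(values):
        raise ValueError("series_lengths must add up to len(values)")
    # Running totals give each series' end offset; the previous end is the next start.
    ends = list(accumulate(series_lengths))
    starts = [0] + ends[:-1]
    return [values[s:e] for s, e in zip(starts, ends)]


# Two toy series, 4 and 3 points long, concatenated the way the payload above expects.
values = [142, 137, 145, 139, 90, 84, 88]
print(split_series(values, [4, 3]))  # [[142, 137, 145, 139], [90, 84, 88]]
```

Nothing in the result says which list is which store or SKU, or which date each point falls on. That mapping lives only in your own code.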

Our beta API inverts that. **You describe your data; we do the reshaping.** You
hand us a schema that says what your columns *mean*, and a selector that says
which slice you want to forecast. Everything else — aggregating to the right
grain, filling gaps, rolling up a hierarchy, running a backtest across dozens of
dates — falls out of that one description, and our time-series engine handles
it automatically.

## Describe a family once

A real dataset isn't one time-series. It's a *family* of them — every store,
every SKU, recorded at whatever grain the source happens to use. Here's the kind
of raw, long-format table that lands in your lake:

<DataTable
  data={{
    date: ["2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02", "2024-01-02"],
    sku: ["SKU_1", "SKU_1", "SKU_2", "SKU_1", "SKU_1", "SKU_2"],
    color: ["black", "black", "red", "black", "black", "red"],
    product_family: ["shoes", "shoes", "shirts", "shoes", "shoes", "shirts"],
    store: ["S1", "S2", "S1", "S1", "S2", "S1"],
    city: ["Paris", "Lyon", "Paris", "Paris", "Lyon", "Paris"],
    sales: [100, 80, 60, 102, 78, 55],
  }}
  truncated
/>

You don't reshape it — you describe it. Assign each column a role, say how each
signal behaves when it's combined, and declare how your identifiers roll up:

```json
{
  "input": {
    "source": "s3://acme/retail/sales/*.parquet",
    "schema": {
      "date":           { "kind": "time", "frequency": "1d" },
      "sku":            "identifier",
      "color":          "identifier",
      "product_family": "identifier",
      "store":          "identifier",
      "city":           "identifier",
      "sales":          { "kind": "target", "aggregate": "sum", "impute": { "const": 0.0 } }
    },
    "hierarchies": {
      "sku":   [["sku", "color"], ["sku", "product_family"]],
      "store": [["store", "city"]]
    }
  },
  "selector": { "product_family": "any", "city": "any" },
  "prediction_length": "3w"
}
```
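To make `"impute": { "const": 0.0 }` concrete, here is a minimal Python sketch of constant gap-filling at a `1d` grain for a single series. It assumes every missing day between the first and last observation gets the constant. It is an illustration of the idea, not the engine's implementation. `fill_daily_gaps` is a hypothetical name, and this post does not say whether the API also pads beyond the observed range.

```python
from datetime import date, timedelta


def fill_daily_gaps(observed: dict[date, float], const: float = 0.0) -> dict[date, float]:
    """Return one value per day from the first to the last observed date,
    using `const` for every day the source did not record."""
    if not observed:
        return {}
    start, end = min(observed), max(observed)
    filled = {}
    day = start
    while day <= end:
        # Recorded values pass through unchanged; only the gaps get the constant.
        filled[day] = observed.get(day, const)
        day += timedelta(days=1)
    return filled


sales = {date(2024, 1, 1): 100.0, date(2024, 1, 4): 90.0}
for d, v in fill_daily_gaps(sales).items():
    print(d.isoformat(), v)
# 2024-01-01 100.0
# 2024-01-02 0.0
# 2024-01-03 0.0
# 2024-01-04 90.0
```

A constant of zero suits a sales target, where a missing row typically means nothing was sold. A signal such as a price would call for a different rule.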

Read the `hierarchies` block as roll-up paths: each one tells the API which
identifiers can be aggregated into which. Individual SKUs can be grouped by
`color` or by `product_family`; individual stores can be grouped by `city`. The
`selector` is the dial: it picks which level you want to forecast. Here,
`{ "product_family": "any", "city": "any" }{:json}` asks for one time-series per
product family and city. Click around and watch the request change:

<SelectorExplorer
  source="s3://acme/retail/sales/*.parquet"
  hierarchies={{
    sku: [["sku", "color"], ["sku", "product_family"]],
    store: [["store", "city"]],
  }}
  initial={{ sku: "product_family", store: "city" }}
  values={{
    sku: ["SKU_1", "SKU_2", "SKU_3"],
    color: ["black", "red", "blue"],
    product_family: ["shoes", "shirts"],
    store: ["S1", "S2", "S3"],
    city: ["Paris", "Lyon"],
  }}
/>
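A roll-up path such as `["sku", "product_family"]` can be read as a many-to-one mapping, in which every SKU belongs to exactly one product family. The Python sketch below checks that property against long-format rows like the table above. This is an assumption about what makes a path valid, written for illustration only. The API may enforce it differently or not at all, and `rollup_map` is a hypothetical helper.

```python
def rollup_map(rows: list[dict[str, str]], child: str, parent: str) -> dict[str, str]:
    """Map each `child` identifier to its single `parent`.

    Raises if a child appears under two different parents, since summing
    children into parents would then be ambiguous.
    """
    mapping: dict[str, str] = {}
    for row in rows:
        c, p = row[child], row[parent]
        # setdefault records the first parent seen; any later mismatch is a conflict.
        if mapping.setdefault(c, p) != p:
            raise ValueError(f"{child}={c!r} rolls up to both {mapping[c]!r} and {p!r}")
    return mapping


rows = [
    {"sku": "SKU_1", "color": "black", "product_family": "shoes", "store": "S1", "city": "Paris"},
    {"sku": "SKU_1", "color": "black", "product_family": "shoes", "store": "S2", "city": "Lyon"},
    {"sku": "SKU_2", "color": "red", "product_family": "shirts", "store": "S1", "city": "Paris"},
]
print(rollup_map(rows, "sku", "product_family"))  # {'SKU_1': 'shoes', 'SKU_2': 'shirts'}
print(rollup_map(rows, "store", "city"))          # {'S1': 'Paris', 'S2': 'Lyon'}
```

Because `sku` has two paths, the check runs once for `color` and once for `product_family`. The two groupings are independent, which matches how a selector can pick either one.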

Roll SKUs up to families, stores into cities, days into weeks or months — and
you never hand-roll that `GROUP BY` again. Missing days, mixed grains, ragged
history: you declared how to handle them once, in the schema above.
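For comparison, here is roughly the `GROUP BY` the selector replaces, as a minimal Python sketch over the six rows visible in the table. It sums `sales` per period for each `(product_family, city)` pair, following `"aggregate": "sum"`. Reading `"any"` as "one series per distinct value" is our interpretation. The names `rollup` and `iso_week` are hypothetical, and bucketing by ISO week (weeks starting Monday) is an assumption about how "days into weeks" would be anchored.

```python
from collections import defaultdict
from collections.abc import Callable, Hashable
from datetime import date

COLUMNS = ["date", "sku", "color", "product_family", "store", "city", "sales"]
TABLE = [
    (date(2024, 1, 1), "SKU_1", "black", "shoes", "S1", "Paris", 100),
    (date(2024, 1, 1), "SKU_1", "black", "shoes", "S2", "Lyon", 80),
    (date(2024, 1, 1), "SKU_2", "red", "shirts", "S1", "Paris", 60),
    (date(2024, 1, 2), "SKU_1", "black", "shoes", "S1", "Paris", 102),
    (date(2024, 1, 2), "SKU_1", "black", "shoes", "S2", "Lyon", 78),
    (date(2024, 1, 2), "SKU_2", "red", "shirts", "S1", "Paris", 55),
]
rows = [dict(zip(COLUMNS, r)) for r in TABLE]


def rollup(rows: list[dict], keys: list[str],
           bucket: Callable[[date], Hashable] = lambda d: d) -> dict[tuple, dict]:
    """Sum `sales` into one series per distinct combination of `keys`.

    `bucket` maps each date to its period; the default keeps the daily grain.
    """
    series: dict[tuple, dict] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        series_key = tuple(row[k] for k in keys)
        series[series_key][bucket(row["date"])] += row["sales"]
    return {k: dict(v) for k, v in series.items()}


def iso_week(d: date) -> str:
    """Label a date with its ISO year and week number, e.g. '2024-W01'."""
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"


for key, points in rollup(rows, ["product_family", "city"], iso_week).items():
    print(key, points)
# ('shoes', 'Paris') {'2024-W01': 202.0}
# ('shoes', 'Lyon') {'2024-W01': 158.0}
# ('shirts', 'Paris') {'2024-W01': 115.0}
```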

## Point at the data where it lives

Your data already lives somewhere. There's no reason to serialize megabytes into
a request body to forecast it. Point the API at the source:

```json
"source": "s3://acme/retail/sales/*.parquet"
```

Inline arrays, Parquet files on object storage, a glob across many files: the same
schema describes all of them. Your lake stays the source of truth, and more
connectors (warehouses and databases) are on the way.
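This post does not spell out how a glob such as `s3://acme/retail/sales/*.parquet` is expanded. The sketch below assumes shell-style semantics, in which `*` matches within a single path segment; that is not documented behavior. Under that assumption, Python's `pathlib.PurePosixPath.match` shows which object keys such a pattern would select. The keys are invented; only the pattern comes from the request above, and `matches` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def matches(key: str, pattern: str) -> bool:
    """Shell-style glob match where `*` never crosses a "/".

    Prefixing both sides with "/" makes them absolute, so PurePosixPath.match
    compares the whole key rather than only its trailing segments.
    """
    return PurePosixPath("/" + key).match("/" + pattern)


# The pattern from the request above, with the "s3://" scheme stripped.
pattern = "acme/retail/sales/*.parquet"

# Hypothetical object keys, invented for illustration.
keys = [
    "acme/retail/sales/2024-01.parquet",
    "acme/retail/sales/2024-02.parquet",
    "acme/retail/sales/archive/2023-12.parquet",  # one folder deeper: not matched
    "acme/retail/sales/README.md",                # wrong extension: not matched
]
print([k for k in keys if matches(k, pattern)])
# ['acme/retail/sales/2024-01.parquet', 'acme/retail/sales/2024-02.parquet']
```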

## One query, every forecast — and every backtest

Because you described a *family*, a single query fans out into a forecast for
every time-series it contains — every family, every city, in one request.

Once the API understands your data's shape, backtesting is another parameter,
not another pipeline. A backtest is just that forecast re-run from many points
in time, so say so:

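```json
{
  "input": {
    "source": "s3://acme/retail/sales/*.parquet",
    "schema": {
      "date":           { "kind": "time", "frequency": "1d" },
      "sku":            "identifier",
      "color":          "identifier",
      "product_family": "identifier",
      "store":          "identifier",
      "city":           "identifier",
      "sales":          { "kind": "target", "aggregate": "sum", "impute": { "const": 0.0 } }
    },
    "hierarchies": {
      "sku":   [["sku", "color"], ["sku", "product_family"]],
      "store": [["store", "city"]]
    }
  },
  "selector": { "product_family": "any", "city": "any" },
  "context": "90d",
  "prediction_length": "28d",
  "cutoff": [{ "every": "2w" }],
  "quantiles": [0.1, 0.5, 0.9]
}
```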

`"cutoff": [{ "every": "2w" }]` re-runs the forecast from a cutoff every two
weeks, across every time-series, with 90 days of context each time. You get back
quantiles *and* accuracy metrics per cutoff — a full rolling-origin evaluation
from the request you were already going to send. Pass a list of explicit dates
instead, and you control exactly when each forecast is made.
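Here are the windows that request implies, as a minimal Python sketch: cutoffs every two weeks, each paired with 90 days of context before it and a 28-day horizon after it. Several details are assumptions rather than documented behavior:

- The cutoff is taken as the first forecasted day, so context ends the day before.
- A cutoff inside the observed history is labelled a backtest, and one past it a forecast.
- The first and last cutoffs are chosen by the caller.

`Window`, `rolling_windows`, and their parameters are hypothetical names standing in for the request's `context`, `prediction_length`, and `cutoff`.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Window:
    cutoff: date         # first forecasted day (assumption)
    context_start: date  # first day of history the model sees
    horizon_end: date    # last forecasted day, inclusive
    kind: str            # "backtest" or "forecast"


def rolling_windows(
    first_cutoff: date,
    last_cutoff: date,
    last_observed: date,
    every: timedelta = timedelta(weeks=2),    # "cutoff": [{ "every": "2w" }]
    context: timedelta = timedelta(days=90),  # "context": "90d"
    horizon: timedelta = timedelta(days=28),  # "prediction_length": "28d"
) -> list[Window]:
    windows = []
    cutoff = first_cutoff
    while cutoff <= last_cutoff:
        # Context covers the `context` days strictly before the cutoff.
        context_start = cutoff - context
        horizon_end = cutoff + horizon - timedelta(days=1)
        # Cutoffs within the observed history can be scored against actuals.
        kind = "backtest" if cutoff <= last_observed else "forecast"
        windows.append(Window(cutoff, context_start, horizon_end, kind))
        cutoff += every
    return windows


for w in rolling_windows(date(2024, 1, 1), date(2024, 2, 26), last_observed=date(2024, 2, 20)):
    print(w.cutoff, w.context_start, w.horizon_end, w.kind)
# 2024-01-01 2023-10-03 2024-01-28 backtest
# 2024-01-15 2023-10-17 2024-02-11 backtest
# 2024-01-29 2023-10-31 2024-02-25 backtest
# 2024-02-12 2023-11-14 2024-03-10 backtest
# 2024-02-26 2023-11-28 2024-03-24 forecast
```

The 2024-02-12 window runs past the last observation, so only its first days could be scored. How the API treats such partially observed windows isn't covered here.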

Each cutoff reuses the same request shape: historical cutoffs become backtests,
and future-facing cutoffs become forecasts. Expand the controls to see how
`cutoff`, `lead_time`, `span`, and `prediction_length` carve up the timeline:

<BacktestExplorer source="s3://acme/retail/sales/*.parquet" />
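This post does not list which accuracy metrics come back per cutoff. One standard way to score quantile forecasts like `[0.1, 0.5, 0.9]` against actuals is the pinball (quantile) loss. The Python sketch below computes it over one forecast window, to show how such a score works; it does not describe what the API returns. The function names are hypothetical and the numbers are invented.

```python
def pinball_loss(actual: float, predicted: float, q: float) -> float:
    """Quantile loss: under-prediction costs q per unit, over-prediction costs (1 - q)."""
    diff = actual - predicted
    return max(q * diff, (q - 1) * diff)


def mean_pinball(actuals: list[float], forecasts: dict[float, list[float]]) -> dict[float, float]:
    """Average loss per quantile over one forecast window."""
    return {
        q: sum(pinball_loss(a, p, q) for a, p in zip(actuals, preds)) / len(actuals)
        for q, preds in forecasts.items()
    }


actuals = [100.0, 102.0, 98.0]
forecasts = {
    0.1: [90.0, 92.0, 90.0],
    0.5: [99.0, 100.0, 101.0],
    0.9: [110.0, 111.0, 112.0],
}
print({q: round(loss, 3) for q, loss in mean_pinball(actuals, forecasts).items()})
# {0.1: 0.933, 0.5: 1.0, 0.9: 1.1}
```

The asymmetric weights are what make each quantile honest. A 0.9 forecast is penalized lightly for sitting above the actuals and heavily for falling below them.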

## Forecast among peers

Much of the signal for forecasting one time-series comes from its neighbors. A
newly launched product has no history of its own — but its predecessors do. So
instead of forecasting every time-series in isolation, you can pair each target
with the peers that should inform it — a small map from one selector to another.

Forecasting `{"sku": ["phone10"]}{:json}`? Let it borrow from `{"sku": ["phone8", "phone9"]}{:json}`.
A fresh store leans on the established ones nearby; this season's launch on the
last two generations. Each target points at its own hand-picked peers, and the
model sees them alongside the target at request time — no fine-tuning, no
separate training job. The forecast simply borrows the patterns it needs.
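This post does not show the request field that carries the target-to-peer map, and the model uses peers in context. Purely for intuition, here is a much cruder technique built on the same kind of map: a naive analog baseline. It forecasts a new product's first weeks as the average of its peers' launch-aligned sales. The `peers` dict mirrors the selectors above. `analog_forecast` is a hypothetical helper, and the sales figures are made up.

```python
# Target -> hand-picked peers, mirroring {"sku": ["phone10"]} -> {"sku": ["phone8", "phone9"]}.
peers = {"phone10": ["phone8", "phone9"]}

# Weekly unit sales counted from each peer's launch week; invented numbers.
launch_history = {
    "phone8": [120.0, 90.0, 70.0, 60.0],
    "phone9": [150.0, 110.0, 85.0, 70.0],
}


def analog_forecast(target: str, horizon: int) -> list[float]:
    """Average the peers' launch curves week by week for the first `horizon` weeks."""
    curves = [launch_history[p] for p in peers[target]]
    if any(len(c) < horizon for c in curves):
        raise ValueError("every peer needs at least `horizon` weeks of launch history")
    return [sum(c[week] for c in curves) / len(curves) for week in range(horizon)]


print(analog_forecast("phone10", 3))  # [135.0, 100.0, 77.5]
```

A fixed average like this weights every peer and every week the same. Showing the peers to the model alongside the target instead lets the forecast draw on whichever of their patterns it needs.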

## Describe once, forecast anything

That's the whole idea. Describe your data where it lives, once — its grains, its
hierarchy, how its signals combine — then reuse that same description on every
request, changing only what you ask for: a slice, a different granularity, a
longer horizon, a year of backtests, a forecast leaning on its neighbors. The
shape of the request barely changes; the leverage is enormous.

This API is in beta. Grab an [API key](/settings/api-keys), point it at the data
you already have, describe it once, and forecast any slice of it you can name.

Questions or feedback? Reach us at support@theforecastingcompany.com.
