Time Series Are Not Tables
By Geoff
Open any data science textbook and you'll find time series data stored in CSV files, loaded into DataFrames, and treated as rows in a table. This seems natural at first glance — after all, a time series is just a list of (timestamp, value) pairs, right?
Wrong. And this misunderstanding is costing organizations real money. Hedge funds and Amazon know this already: time series are not tables.
The table illusion
The recent wave of tabular foundation models, such as TabPFN and TabICL, which are pretrained on large collections of synthetic tabular datasets and perform classification and regression without task-specific training, has been remarkably useful for general-purpose prediction tasks. People have long used tree-based regressors like CatBoost or LightGBM for time-series forecasting, so it seemed reasonable to use the shiny new tabular models on the block for this too, right?
On the surface, it works. You create (a lot of) columns for lag features, rolling averages, calendar variables, and feed the table to a model. Gradient boosted trees dominated the M5 forecasting competition (Walmart, 2020) using exactly this approach. More recently, TabPFN-TS reformulates forecasting as tabular regression and achieves strong results on covariate-informed benchmarks — with zero time-series-specific pretraining.
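Here is a minimal sketch of that recipe in plain Python, using the eight daily values shown later in this post. The helper `make_table` and the chosen lags and window are illustrative assumptions, not a reference implementation. pandas does the same job with `shift` and `rolling`.

```python
from datetime import date, timedelta

def make_table(start, values, lags=(1, 2), window=3):
    """Turn a daily series into rows of lag, rolling-mean and calendar features.

    Hypothetical helper, for illustration only.
    """
    rows = []
    first = max(max(lags), window)  # earliest index with a full history behind it
    for t in range(first, len(values)):
        day = start + timedelta(days=t)
        row = {"date": day.isoformat(), "dow": day.weekday(), "target": values[t]}
        for k in lags:
            row[f"lag_{k}"] = values[t - k]
        # trailing mean over the `window` days before t (t itself excluded)
        row[f"roll_mean_{window}"] = sum(values[t - window:t]) / window
        rows.append(row)
    return rows

series = [20.00, 21.78, 27.69, 33.70, 34.51, 32.38, 32.05, 32.70]
table = make_table(date(2026, 1, 1), series)
for row in table:
    print(row)
```

The first rows are dropped because their lags would reach back before the start of the data. Once the table exists, nothing in it says that one row follows another. Any tabular regressor will happily fit these rows. The fit is real, but only skin-deep.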
This surface-level compatibility hides deep structural differences. Tabular foundation models treat rows as exchangeable (the independent and identically distributed assumption of tabular data), which means they cannot natively encode the arrow of time. They fail to extrapolate simple linear trends. Their confidence intervals don't account for temporal autocorrelation. They don't understand regime changes. And because a series' own history serves as the model's context, you cannot batch different time series together without mixing their information. Every series needs its own forward pass, which makes prediction very slow at the volumes typical of forecasting setups.
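To see why extrapolation fails, consider a stand-in: k-nearest-neighbours regression on the time index. Like a regression tree, whose leaves average training targets, it can only output averages of values it has already seen. This is an illustrative assumption and does not model TabPFN's internals. On a clean linear trend it flattens out beyond the training range, while an ordinary least-squares line continues the trend.

```python
def knn_predict(ts, ys, t_new, k=2):
    """Average of the k training targets whose time index is closest to t_new."""
    nearest = sorted(range(len(ts)), key=lambda i: abs(ts[i] - t_new))[:k]
    return sum(ys[i] for i in nearest) / k

def linear_predict(ts, ys, t_new):
    """Ordinary least-squares line through (t, y), evaluated at t_new."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    return mean_y + slope * (t_new - mean_t)

ts = list(range(10))
ys = [2 * t + 5 for t in ts]          # noise-free trend: 5, 7, ..., 23

print(knn_predict(ts, ys, 15))        # 22.0: stuck near the last observed values
print(linear_predict(ts, ys, 15))     # 35.0: the trend continues
```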
The cleanest way to feel the difference is to look at the same data both ways, and then shuffle the rows.
- 2026-Jan-01: 20.00
- 2026-Jan-02: 21.78
- 2026-Jan-03: 27.69
- 2026-Jan-04: 33.70
- 2026-Jan-05: 34.51
- 2026-Jan-06: 32.38
- 2026-Jan-07: 32.05
- 2026-Jan-08: 32.70
Sort that table by timestamp or shuffle its rows: either way it is the same set of rows, and a model that treats rows as exchangeable sees no difference. Shuffle a trajectory and it is destroyed. Everything below follows from that asymmetry.
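A small sketch makes the asymmetry concrete. It uses the eight rows above, with a fixed permutation (every other row first) in place of a random shuffle so the result is reproducible. The helper `lag1_autocorr` is the standard sample lag-1 autocorrelation and does not come from any specific library.

```python
rows = [("2026-01-01", 20.00), ("2026-01-02", 21.78), ("2026-01-03", 27.69),
        ("2026-01-04", 33.70), ("2026-01-05", 34.51), ("2026-01-06", 32.38),
        ("2026-01-07", 32.05), ("2026-01-08", 32.70)]
shuffled = rows[::2] + rows[1::2]     # same rows, different order

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation: how much each value resembles its predecessor."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

# As a table, nothing changed: the same multiset of (date, value) rows.
print(sorted(rows) == sorted(shuffled))                  # True
# As a trajectory, the reordering destroys the temporal structure.
print(round(lag1_autocorr([v for _, v in rows]), 2))     # 0.6: strong dependence on yesterday
print(round(lag1_autocorr([v for _, v in shuffled]), 2)) # -0.04: essentially none
```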
What makes time series special
1. Time is not just another column
In a table, all columns are created equal. But in time series data, the time dimension is structurally different from all others. It implies ordering, local correlation in time, and correlation across simultaneous events.
Tables have rows. Time series are trajectories.
2. Time is irregular by convention
If nothing about a business changes, should you expect the same monthly revenue in February and March? No. February has 28 days (29 in leap years) and March has 31, roughly a 10% difference in selling days. Holidays shift between weekdays and weekends. Daylight saving time means some days have 23 hours and others have 25 (and which days depends on where you are!).
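This minimal sketch uses Python's standard `calendar` module to show how much of a month-over-month comparison is pure calendar. The helper names are illustrative. Daylight-saving effects are left out because they need time-zone data, but the same idea applies at hourly granularity.

```python
import calendar
from datetime import date

def days_in_month(year, month):
    # calendar.monthrange returns (weekday of the 1st, number of days)
    return calendar.monthrange(year, month)[1]

def weekday_counts(year, month):
    """How many Mondays, Tuesdays, ..., Sundays (index 0..6) a month contains."""
    counts = [0] * 7
    for day in range(1, days_in_month(year, month) + 1):
        counts[date(year, month, day).weekday()] += 1
    return counts

# A business that sells exactly 1,000 per day, every day:
for month in (2, 3):
    n = days_in_month(2026, month)
    weekend = sum(weekday_counts(2026, month)[5:])
    print(f"{calendar.month_name[month]}: {n} days, revenue {1000 * n}, weekend days {weekend}")
```

Dividing revenue by `days_in_month` removes the 28-versus-31 effect. The weekday mix still differs (8 versus 9 weekend days in 2026) and needs its own treatment.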
3. Two kinds of temporal data
Borrowing terms from physics, we can distinguish between extensive quantities (volumes), such as revenue, transactions, and production output, and intensive quantities (rates or levels), such as temperature, pressure, and exchange rates.
Missing data means completely different things for each type:
- A missing day of revenue likely means zero (the store was closed). You add it as zero, or you don't add anything at all when aggregating.
- A missing day of temperature does not mean it was 0°C. You interpolate. You never sum.
A tabular schema with a single "value" column can't carry this distinction. A temporal type system can.
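Here is a sketch of what such a type system could look like. Each series is tagged with its kind, and the kind decides how gaps are filled and how values aggregate. The `Kind` enum and both functions are hypothetical names for illustration. Linear interpolation for levels is one reasonable choice among several.

```python
from enum import Enum

class Kind(Enum):
    EXTENSIVE = "extensive"   # volumes: revenue, transactions, output
    INTENSIVE = "intensive"   # levels: temperature, pressure, exchange rates

def fill_gaps(values, kind):
    """Fill None entries: zeros for volumes, linear interpolation for levels."""
    if kind is Kind.EXTENSIVE:
        return [0.0 if v is None else v for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("a level series needs at least one observation")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:        # edge gap: carry the nearest value
            filled[i] = values[right if left is None else left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

def aggregate(values, kind):
    """Sum volumes over a period; average levels over it."""
    filled = fill_gaps(values, kind)
    return sum(filled) if kind is Kind.EXTENSIVE else sum(filled) / len(filled)

print(aggregate([100.0, None, 120.0], Kind.EXTENSIVE))   # 220.0: closed day counts as zero
print(aggregate([10.0, None, 14.0], Kind.INTENSIVE))     # 12.0: gap interpolated, then averaged
```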
4. The hierarchy problem
Most real-world forecasting happens in hierarchies. A retailer needs forecasts at the level of SKU × store × day, but also at the level of category × region × week, and at the level of total revenue × country × month. Every level matters: SKU-day for replenishment, category-week for buying, total-month for finance.
The catch: these forecasts must be coherent, and for probabilistic forecasts that means accounting for the correlations between series. If the SKU forecasts sum to one number and the category forecast says something different, your operations team and your CFO are looking at incompatible plans.
You cannot simply add quantiles. The P90 of a sum is, in general, not the sum of the P90s; the two coincide only when the series move in perfect lockstep. We've studied this problem formally: the CLOVER method (co-developed when I was at Amazon Forecasting Science) achieves probabilistic coherence by construction.
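A quick Monte Carlo check shows the effect. It assumes two SKUs with independent, normally distributed daily demand (mean 100, standard deviation 20; made-up numbers for illustration). The sum of the two P90s overshoots the P90 of total demand because the SKUs rarely peak on the same day. This illustrates the problem only and is not the CLOVER method.

```python
import random

def quantile(samples, q):
    """Empirical quantile: the value below which a fraction q of samples fall."""
    s = sorted(samples)
    return s[min(len(s) - 1, int(q * len(s)))]

rng = random.Random(42)
n = 100_000
a = [rng.gauss(100, 20) for _ in range(n)]    # SKU A demand (hypothetical)
b = [rng.gauss(100, 20) for _ in range(n)]    # SKU B, independent of A

sum_of_p90s = quantile(a, 0.9) + quantile(b, 0.9)
p90_of_sum = quantile([x + y for x, y in zip(a, b)], 0.9)
print(f"sum of P90s: {sum_of_p90s:.1f}, P90 of sum: {p90_of_sum:.1f}")

# Perfect lockstep (B == A): now the quantiles do add up.
lockstep = quantile([x + x for x in a], 0.9)
print(abs(lockstep - 2 * quantile(a, 0.9)) < 1e-9)   # True
```

Analytically, the independent case gives about 251 versus 236. Adding quantiles would overstate the P90 of the total by roughly 15 units, and the gap grows with the number of independent series being summed.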
5. Three kinds of input, not one
Time series forecasting has three distinct types of input:
- Historical variables — the past of the target series and other observed series.
- Future-known variables — holidays, scheduled promotions, store openings, official calendars.
- Static variables — product descriptions, store coordinates, category metadata.
Picture the time axis split at the forecast origin t₀: everything to its right is unknown. Historical variables stop at t₀. Future-known variables continue across the boundary because we already know their values on both sides. Static features aren't on the time axis at all; they're metadata attached to the series (a product SKU, a store location, a region). A temporal system reasons about where each input lives in time.
Tables flatten all three into identical columns, losing these semantic distinctions. The model has no way to know that "is_holiday" extends into the future but "yesterday's revenue" does not. Either you leak future information into training, or you starve the model of legitimately knowable context. Both are common; both are silently wrong.
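One way to keep the three kinds apart is to give each its own slot and check its extent against the forecast origin. The `ForecastInputs` class below is a hypothetical sketch, not the API of any existing library. For a step past t₀ it offers only the last observed revenue. A table's `lag_1` column for step t₀ + 1 would need revenue at t₀, which is not yet known.

```python
from dataclasses import dataclass

@dataclass
class ForecastInputs:
    """Hypothetical container that keeps the three input kinds apart."""
    t0: int               # forecast origin: first step whose target is unknown
    horizon: int          # number of steps to forecast
    past: dict            # name -> values for steps 0 .. t0-1 (historical)
    future_known: dict    # name -> values for steps 0 .. t0+horizon-1
    static: dict          # name -> metadata with no time axis

    def __post_init__(self):
        for name, values in self.past.items():
            if len(values) != self.t0:       # anything longer would leak the future
                raise ValueError(f"historical variable {name!r} must end at t0")
        for name, values in self.future_known.items():
            if len(values) != self.t0 + self.horizon:
                raise ValueError(f"future-known variable {name!r} must cover the horizon")

    def features_at(self, t):
        """Everything a model may legitimately use when forecasting step t."""
        if not self.t0 <= t < self.t0 + self.horizon:
            raise IndexError("t must lie inside the forecast horizon")
        row = {f"last_{name}": values[-1] for name, values in self.past.items()}
        row.update({name: values[t] for name, values in self.future_known.items()})
        row.update(self.static)
        return row

inputs = ForecastInputs(
    t0=3, horizon=2,
    past={"revenue": [10.0, 12.0, 11.0]},
    future_known={"is_holiday": [0, 0, 0, 1, 0]},
    static={"store": "S1"},
)
print(inputs.features_at(3))   # {'last_revenue': 11.0, 'is_holiday': 1, 'store': 'S1'}
```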
6. The two-dates problem
Every observation has two temporal coordinates: the event date and the information date. A hotel booking made March 1 for April 15 is information from March 1 about demand on April 15.
This leads to the concept of the information cutoff date: the latest date for which a variable's values are known at the moment you forecast. It can be different for different variables. Sales arrive overnight; web traffic arrives in seconds; macroeconomic indicators arrive months late and get revised.
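Here is a minimal sketch of an as-of query. It assumes each record carries both coordinates: the event date it describes and the date it became known. The indicator values and revision are invented for illustration. A query with a cutoff returns only what was knowable at that point, so it gets the first release instead of the later revision.

```python
from datetime import date

# (event_date, known_date, value): a hypothetical monthly indicator with one revision
records = [
    (date(2026, 1, 31), date(2026, 3, 15), 2.0),   # first release, six weeks late
    (date(2026, 1, 31), date(2026, 4, 15), 2.4),   # revision of the same month
    (date(2026, 2, 28), date(2026, 4, 15), 1.8),   # first release for February
]

def as_of(records, cutoff):
    """Latest known value per event date, using only records known by `cutoff`."""
    latest = {}
    for event, known, value in sorted(records, key=lambda r: r[1]):
        if known <= cutoff:
            latest[event] = value     # later knowledge overwrites earlier releases
    return latest

print(as_of(records, date(2026, 3, 31)))   # January's first release only
print(as_of(records, date(2026, 4, 30)))   # revised January plus February
```

To reproduce what the model would actually have seen, train with `as_of(records, t)` at each historical forecast date. Training on the latest revised values leaks information that arrived later.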
7. Temporal joins are hard
Joining weather data with restaurant traffic looks like a one-line operation. It is not. You need to decide which hours matter (lunch vs. dinner peaks differ by city), what geographic resolution to align (airport-level weather, store-level traffic), whether to lag or lead (a forecasted afternoon thunderstorm shifts lunch one way, an observed one shifts it the other), and how to handle different reporting cadences.
A table model treats all features as contemporaneous. A temporal system reasons about alignment: which value of weather was knowable, in which geography, at the moment a customer decided whether to walk to the restaurant.
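The mechanical core of such an alignment is a backward as-of join. For each traffic timestamp, it takes the most recent weather value at or before that time and rejects values older than a tolerance. The sketch below uses the standard-library `bisect` module. pandas offers the same operation as `merge_asof` with `direction="backward"` and a `tolerance`. The timestamps, temperatures and one-hour tolerance are illustrative assumptions. The harder choices above (which hours, which geography, lag or lead) still have to be made separately.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def asof_join(left_times, right_times, right_values, tolerance):
    """For each left timestamp, the latest right value at or before it,
    or None if no right value is that recent. right_times must be sorted."""
    joined = []
    for t in left_times:
        i = bisect_right(right_times, t) - 1    # last right index with time <= t
        if i >= 0 and t - right_times[i] <= tolerance:
            joined.append(right_values[i])
        else:
            joined.append(None)
    return joined

day = datetime(2026, 1, 5)
weather_times = [day.replace(hour=11), day.replace(hour=12)]
weather_temp = [4.0, 6.0]                      # hourly observations (made up)
traffic_times = [day.replace(hour=10, minute=30), day.replace(hour=11, minute=30),
                 day.replace(hour=12, minute=15), day.replace(hour=14)]

print(asof_join(traffic_times, weather_times, weather_temp, timedelta(hours=1)))
# [None, 4.0, 6.0, None]: nothing before 10:30, and the 12:00 reading is stale by 14:00
```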
8. Data isn't just missing — it's corrupted in structured ways
Censored data is everywhere. Demand capped by inventory. Restaurants turning away guests when full. Call centers dropping calls. Energy demand throttled by the grid.
Training on censored data as if it were ground truth creates a vicious cycle of underforecasting: the model learns the cap, the planner under-orders, demand gets censored harder, the model learns a lower cap.
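The vicious cycle can be reproduced in a few lines under deliberately simple assumptions. Daily demand D is normal with mean μ = 100 and standard deviation σ = 20, and sales are capped at the stock s on hand. The model learns the mean of observed sales, and the planner stocks exactly that forecast. Real planners stock a higher quantile, which slows the spiral but does not remove the bias. The code uses the standard closed form E[min(D, s)] = μ − E[(D − s)⁺], with E[(D − s)⁺] = (μ − s)(1 − Φ(z)) + σφ(z) and z = (s − μ)/σ. All numbers are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, sd 1: provides pdf (phi) and cdf (Phi)

def expected_sales(mu, sigma, stock):
    """E[min(D, stock)] for D ~ Normal(mu, sigma): the mean a naive model observes."""
    z = (stock - mu) / sigma
    lost = (mu - stock) * (1 - STD_NORMAL.cdf(z)) + sigma * STD_NORMAL.pdf(z)  # E[(D - stock)+]
    return mu - lost

def censoring_spiral(mu=100.0, sigma=20.0, rounds=5):
    """Forecast per round when each forecast is trained on sales censored by the last one."""
    stock = mu              # round 0: the planner happens to stock true mean demand
    forecasts = []
    for _ in range(rounds):
        forecast = expected_sales(mu, sigma, stock)   # model learns the capped mean
        forecasts.append(forecast)
        stock = forecast                              # planner orders what was forecast
    return forecasts

print([round(f, 1) for f in censoring_spiral()])   # strictly decreasing, all below 100
```

One way out is to tell the model which observations were capped. For example, flag stock-out days and treat their sales as lower bounds on demand, not as demand itself.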
Building a temporal data layer
These eight problems are not edge cases. They are the everyday substrate of operational forecasting. Solving them in glue code, dataset by dataset, is how forecasting teams end up with thousands of bespoke pipelines that nobody trusts.
Our temporal engine, Tolars, addresses them in a single layer. It is written in Rust with Apache Arrow integration.
Stay tuned for more.
Why all of this points to decoder models
Everything described above leads to an architectural conclusion: time series models should be causal decoders, not encoders.
A causal decoder can only look backward, which is the correct inductive bias for temporal data. It makes backtesting at scale possible via teacher forcing. Each position's prediction is computed from its own past alone, so every position in the sequence is simultaneously a training example and a held-out evaluation point, and a single forward pass scores every forecast origin at once. And it makes information leakage from the future structurally impossible, not just a thing you remember to check for.
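The structural guarantee comes from the attention mask. Below is a deliberately tiny single-head self-attention over scalar inputs. Queries, keys and values are the inputs themselves, an illustrative simplification with no learned weights. With the causal mask, changing a future value cannot change any earlier output. Without it, every output moves.

```python
import math

def self_attention(xs, causal=True):
    """Scalar self-attention; with causal=True, position i sees only positions j <= i."""
    out = []
    for i, q in enumerate(xs):
        visible = xs[: i + 1] if causal else xs        # the mask: drop keys j > i
        scores = [q * k for k in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        out.append(sum(w * v for w, v in zip(weights, visible)) / sum(weights))
    return out

history = [1.0, 2.0, 3.0, 4.0]
edited = [1.0, 2.0, 3.0, 100.0]      # only the last ("future") value changes

print(self_attention(history)[:3] == self_attention(edited)[:3])                # True
print(self_attention(history, False)[:3] == self_attention(edited, False)[:3])  # False
```

A real decoder applies the same mask to a matrix of attention scores, and each output position doubles as the prediction of the next value. That is what lets one forward pass evaluate every forecast origin.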
Encoder-style models — and tabular models, which are encoders by another name — have no notion of "past" baked into their architecture. They rely on the practitioner to mask carefully, to engineer features without peeking, to validate on the right splits. Sometimes the practitioner gets it right. More often, time leaks in quietly and the offline metrics look great until production starts forecasting tomorrow.
That's why our foundation model — and the temporal engine underneath it — are built decoder-first, on top of a data layer that takes time seriously. The next post in this series digs into what's inside that model, and how it got there. More coming in part 3.
