Learn AI Series (#28) - Time Series Fundamentals - When Order Matters

What will I learn
- why time series data is fundamentally different from the tabular data we've used throughout this series -- shuffling rows destroys the signal;
- autocorrelation -- how past values predict future values, and why this is THE defining property of time series;
- partial autocorrelation (PACF) -- separating direct from indirect influence;
- stationarity -- the assumption most time series methods require, and how to test for it;
- trend, seasonality, and residuals: decomposing a time series into its components;
- moving averages and exponential smoothing -- the simplest forecasting models and why they're hard to beat;
- differencing -- making non-stationary data stationary so models can work with it;
- the Augmented Dickey-Fuller test for formal stationarity testing;
- the golden rules of time series that will guide everything we build from here.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Python 3(.11+) distribution;
- The ambition to learn AI and machine learning.
Difficulty
- Beginner
Curriculum (of the Learn AI Series):
- Learn AI Series (#1) - What Machine Learning Actually Is
- Learn AI Series (#2) - Setting Up Your AI Workbench - Python and NumPy
- Learn AI Series (#3) - Your Data Is Just Numbers - How Machines See the World
- Learn AI Series (#4) - Your First Prediction - No Math, Just Intuition
- Learn AI Series (#5) - Patterns in Data - What "Learning" Actually Looks Like
- Learn AI Series (#6) - From Intuition to Math - Why We Need Formulas
- Learn AI Series (#7) - The Training Loop - See It Work Step by Step
- Learn AI Series (#8) - The Math You Actually Need (Part 1) - Linear Algebra
- Learn AI Series (#9) - The Math You Actually Need (Part 2) - Calculus and Probability
- Learn AI Series (#10) - Your First ML Model - Linear Regression From Scratch
- Learn AI Series (#11) - Making Linear Regression Real
- Learn AI Series (#12) - Classification - Logistic Regression From Scratch
- Learn AI Series (#13) - Evaluation - How to Know If Your Model Actually Works
- Learn AI Series (#14) - Data Preparation - The 80% Nobody Talks About
- Learn AI Series (#15) - Feature Engineering and Selection
- Learn AI Series (#16) - Scikit-Learn - The Standard Library of ML
- Learn AI Series (#17) - Decision Trees - How Machines Make Decisions
- Learn AI Series (#18) - Random Forests - Wisdom of Crowds
- Learn AI Series (#19) - Gradient Boosting - The Kaggle Champion
- Learn AI Series (#20) - Support Vector Machines - Drawing the Perfect Boundary
- Learn AI Series (#21) - Mini Project - Predicting Crypto Market Regimes
- Learn AI Series (#22) - K-Means Clustering - Finding Groups
- Learn AI Series (#23) - Advanced Clustering - Beyond K-Means
- Learn AI Series (#24) - Dimensionality Reduction - PCA
- Learn AI Series (#25) - Advanced Dimensionality Reduction - t-SNE and UMAP
- Learn AI Series (#26) - Anomaly Detection - Finding What Doesn't Belong
- Learn AI Series (#27) - Recommendation Systems - "Users Like You Also Liked..."
- Learn AI Series (#28) - Time Series Fundamentals - When Order Matters (this post)
At the end of episode #27 I mentioned that all the data we'd been working with throughout this series -- from the tabular datasets in episodes #10-11, through the clustering work in #22-23, all the way to the recommendation matrices in #27 -- shared one crucial property: the order of the rows didn't matter. Shuffle the rows in a user-item rating matrix and you get the same recommendations. Shuffle the rows in a classification dataset and you get the same decision boundaries. Each row was an independent observation, and we've been exploiting that independence assumption in everything from train/test splits to cross-validation.
Today that assumption dies.
Time series data is fundamentally different. Shuffle the rows and you destroy the data. Today's temperature is related to yesterday's temperature. This month's revenue depends on last month's. The value of a stock at 14:30 is connected to its value at 14:29, and at 14:28, and at 13:00 this morning, and at close yesterday, and maybe even to the same day last year. The serial dependence between observations is not noise to be removed -- it IS the signal. It's the whole point.
We actually got a taste of this back in episode #21 when we built the crypto regime predictor. Remember how we discovered that random train/test splits don't work for time-ordered data? We had to train on past data and test on future data -- the walk-forward validation approach. That was a preview of a much deeper principle. Today we dig into the foundations: the properties that make time series data tick, the tools for understanding them, and the assumptions that every time series model (whether classical statistics or bleeding-edge ML) either respects or violates at its peril.
Let's dive right in.
Autocorrelation: the past predicts the future
The defining property of time series data is autocorrelation -- the correlation of a signal with a delayed copy of itself. If today's stock price is correlated with yesterday's, the series has autocorrelation at lag 1. If it's also correlated with the price from a week ago, it has autocorrelation at lag 7. If there's a monthly pattern, you'll see autocorrelation peak at lag 30.
This is the signal that time series models exploit. No autocorrelation means no temporal structure, and if there's no temporal structure then you might as well treat the data as independent rows (and use all the tabular ML tools from episodes #10-20). Autocorrelation is what makes time series special, and understanding it is the first step to working with temporal data.
Let's build a synthetic time series with known structure so we can see exactly what autocorrelation reveals:
import numpy as np
import pandas as pd
np.random.seed(42)
# Build a time series with three known components:
# 1. Upward trend (100 -> 150 over a year)
# 2. Monthly seasonality (30-day cycle)
# 3. Random noise
n = 365
trend = np.linspace(100, 150, n)
seasonality = 10 * np.sin(np.arange(n) * 2 * np.pi / 30)
noise = np.random.randn(n) * 3
ts = trend + seasonality + noise
dates = pd.date_range('2024-01-01', periods=n, freq='D')
series = pd.Series(ts, index=dates, name='value')
# Compute autocorrelation at different lags
print("Autocorrelation by lag:")
for lag in [1, 7, 14, 15, 30, 60, 90]:
    ac = series.autocorr(lag=lag)
    bar = "#" * int(abs(ac) * 40)
    print(f"  Lag {lag:>3d}: {ac:>7.3f} {bar}")
Look at the output carefully. Lag 1 will show very high autocorrelation (probably 0.95+) because consecutive days are extremely similar -- the trend and seasonality barely change from one day to the next. Lag 15 will show noticeably lower autocorrelation -- that's half a seasonal cycle, so the seasonal component is working against you (you're comparing peaks to troughs). Lag 30 will bounce back up because that's a full seasonal cycle -- peaks align with peaks again. And as the lag increases beyond 60-90, autocorrelation gradually decays because the trend makes distant values less similar.
This pattern -- the shape of autocorrelation across lags -- is called the autocorrelation function (ACF), and it's the single most informative diagnostic tool for time series analysis. An ACF that decays slowly suggests a strong trend. An ACF with periodic spikes reveals seasonality. An ACF that drops to near-zero after lag 1 suggests a random walk. Before you build ANY time series model, plot the ACF. It tells you what kind of structure your data has ;-)
Partial autocorrelation: cutting through the chain
There's a subtlety to autocorrelation that trips people up. If lag 1 autocorrelation is 0.95, then lag 2 autocorrelation will be high too -- but maybe NOT because there's a direct relationship between today and two days ago. Maybe the only real relationship is day-to-day (lag 1), and the apparent lag-2 autocorrelation is just a chain effect: today correlates with yesterday, yesterday correlates with the day before, so today correlates with two days ago through the chain.
The partial autocorrelation function (PACF) strips out these indirect effects. PACF at lag K measures the correlation between a value and its value K steps ago, AFTER removing the effects of all intermediate lags. It answers: "does knowing the value K steps back give me any NEW information beyond what lags 1 through K-1 already told me?"
from statsmodels.tsa.stattools import pacf
# Compute PACF for our series
pacf_values = pacf(ts, nlags=40, method='ywm')
print("Partial Autocorrelation (PACF) by lag:")
for lag in [1, 2, 3, 5, 10, 15, 30]:
    bar = "#" * int(abs(pacf_values[lag]) * 40)
    print(f"  Lag {lag:>3d}: {pacf_values[lag]:>7.3f} {bar}")
PACF is critical for model selection. If PACF cuts off sharply after lag 2 (significant at lags 1 and 2, then near-zero), that tells you the series can be modeled with an autoregressive model of order 2 (AR(2)) -- only the two most recent values directly influence the current value. If PACF decays gradually, you might need more lags or a different model structure entirely. We'll use this diagnostic heavily when we move to formal forecasting models.
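To make that cutoff concrete, here's a minimal numpy-only sketch (the coefficients 0.6 and 0.3 are arbitrary illustrative choices) that simulates an AR(2) process and estimates PACF via its regression definition -- the lag-k coefficient when regressing each value on its k most recent predecessors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: y[t] = 0.6*y[t-1] + 0.3*y[t-2] + noise
# (the coefficients are arbitrary illustrative choices)
n = 2000
y = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.6 * y[t-1] + 0.3 * y[t-2] + eps[t]

def pacf_ols(y, k):
    """PACF at lag k: the lag-k coefficient when regressing
    y[t] on y[t-1], ..., y[t-k] (plus an intercept)."""
    n = len(y)
    target = y[k:]
    lags = np.column_stack([y[k-1-j : n-1-j] for j in range(k)])
    X = np.column_stack([np.ones(len(target)), lags])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta[-1]

for k in [1, 2, 3, 4]:
    print(f"PACF lag {k}: {pacf_ols(y, k):>7.3f}")
```

For this process, theory gives a PACF of about 0.86 at lag 1 and exactly the AR coefficient 0.3 at lag 2, while lags 3 and beyond hover near zero -- the sharp cutoff after lag 2 is precisely the AR(2) signature described above.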
Stationarity: the assumption behind everything
A time series is stationary if its statistical properties -- mean, variance, autocorrelation structure -- don't change over time. A stationary series fluctuates around a constant level with constant variability. Most classical time series methods (ARIMA, exponential smoothing, and their variants) assume stationarity, or at least that the series can be MADE stationary through transformation.
Our synthetic series is clearly non-stationary: the mean increases over time (the trend goes from 100 to 150), and the seasonality introduces periodic structure. Real-world data is almost always non-stationary too -- prices trend upward, website traffic grows, seasonal patterns shift, and volatility changes with market conditions.
A simple first check: split the series in half and compare the statistics:
half = len(ts) // 2
print("Stationarity check (split-half comparison):")
print(f"  First half:  mean = {ts[:half].mean():.1f}, "
      f"std = {ts[:half].std():.1f}")
print(f"  Second half: mean = {ts[half:].mean():.1f}, "
      f"std = {ts[half:].std():.1f}")
print(f"  Means differ by: {abs(ts[:half].mean() - ts[half:].mean()):.1f}")
print(f"  (stationary series would show ~0 difference)")
The means will differ by roughly 25 (the trend shifts the center from ~112 in the first half to ~137 in the second). That's a dead giveaway of non-stationarity. For a formal statistical test, use the Augmented Dickey-Fuller (ADF) test from statsmodels:
from statsmodels.tsa.stattools import adfuller
# ADF test on the original series
result = adfuller(ts, autolag='AIC')
print(f"ADF test on original series:")
print(f" Test statistic: {result[0]:.4f}")
print(f" P-value: {result[1]:.4f}")
print(f" Lags used: {result[2]}")
print(f" Observations: {result[3]}")
print(f" Critical values:")
for key, val in result[4].items():
    print(f"    {key}: {val:.4f}")
if result[1] < 0.05:
    print("\n  --> Reject null: series IS stationary")
else:
    print("\n  --> Fail to reject: series is NOT stationary")
The ADF test's null hypothesis is "the series has a unit root" (i.e., it's non-stationary). A low p-value (< 0.05) means you reject the null and conclude stationarity. A high p-value means you can't reject non-stationarity -- and you'll need to transform the data before modeling it. Our trended+seasonal series should fail this test decisively.
Why does stationarity matter so much? Because most time series models learn patterns from the past and project them into the future. If the statistical properties keep changing, the patterns the model learned from the first half of the data won't apply to the second half. A model trained on data with a mean of 112 will produce forecasts centered around 112 -- useless if the data has shifted to a mean of 137 by the time you're predicting. Stationarity guarantees that the patterns are stable, so what the model learns is still valid at prediction time. Same principle as the i.i.d. assumption in supervised learning (episode #13), just adapted for temporal data.
Decomposition: trend + seasonality + residual
Any time series can be decomposed into three components: trend (the long-term direction), seasonality (regular periodic patterns), and residuals (everything left over -- the random fluctuations after removing trend and seasonality).
Understanding these components separately is valuable for diagnosing what's going on. The trend tells you WHERE things are heading. Seasonality tells you WHEN things happen (monthly sales spikes, weekly traffic patterns, annual cycles). Residuals tell you how predictable the series actually is. If residuals are small relative to the original signal, the series is highly predictable. If residuals are large and noisy, no model will forecast it well -- you're bumping up against the irreducible noise floor.
# Manual decomposition (additive model: ts = trend + seasonal + residual)
# Trend: centered moving average with window = seasonal period
trend_ma = pd.Series(ts).rolling(30, center=True).mean().values
# Seasonality: average pattern per position in the cycle
detrended = ts - trend_ma
seasonal_pattern = np.array([
    np.nanmean(detrended[i::30]) for i in range(30)
])
# Tile to match the full series length
seasonal_full = np.tile(seasonal_pattern, len(ts) // 30 + 1)[:len(ts)]
# Residual: what's left after removing trend and seasonality
residual = ts - trend_ma - seasonal_full
# Signal-to-noise ratio
signal_std = np.nanstd(trend_ma) + np.nanstd(seasonal_full)
noise_std = np.nanstd(residual)
print(f"Decomposition results:")
print(f"  Trend range: {np.nanmin(trend_ma):.1f} to "
      f"{np.nanmax(trend_ma):.1f}")
print(f" Seasonal amplitude: {np.nanstd(seasonal_full):.1f}")
print(f" Residual std: {noise_std:.1f}")
print(f" Signal-to-noise: {signal_std / noise_std:.1f}")
print(f" (>3 = very predictable, ~1 = mostly noise)")
The signal-to-noise ratio here should be well above 3, because we built the series with a strong trend and clear seasonality and relatively small noise (std=3). A real-world series might have a ratio of 1.5-2 for a reasonably predictable process (monthly sales data, weather) or below 1 for a chaotic process (minute-by-minute stock returns). When you see a signal-to-noise ratio close to 1 or below, that's the data telling you: "don't expect accurate forecasts." No amount of model complexity will extract signal that isn't there.
Having said that, statsmodels provides a cleaner decomposition tool that handles edge cases better than our manual approach:
from statsmodels.tsa.seasonal import seasonal_decompose
# statsmodels wants a pandas Series with a DatetimeIndex
result = seasonal_decompose(series, model='additive', period=30)
print(f"statsmodels decomposition:")
print(f"  Trend range: {result.trend.dropna().min():.1f} to "
      f"{result.trend.dropna().max():.1f}")
print(f"  Seasonal range: {result.seasonal.min():.1f} to "
      f"{result.seasonal.max():.1f}")
print(f" Residual std: {result.resid.dropna().std():.1f}")
The model='additive' assumes the components add up: observed = trend + seasonal + residual. For data where the seasonal amplitude grows with the level (e.g., sales that swing +/- 20% of current level, not +/- a fixed amount), use model='multiplicative' instead: observed = trend * seasonal * residual. Choosing the right model matters -- an additive decomposition on multiplicative data will leave seasonal artifacts in the residuals, and vice versa.
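To see why the choice matters, here's a small sketch on synthetic data (the +/-10% seasonal swing is an invented illustration) showing that multiplicative seasonality grows with the level on the raw scale but becomes constant -- i.e., additive -- after a log transform, since log(T * S * R) = log T + log S + log R:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 365
trend = np.linspace(100, 150, n)
# Multiplicative seasonality: a +/-10% swing around the current level
seasonal = 1 + 0.1 * np.sin(np.arange(n) * 2 * np.pi / 30)
noise = rng.normal(1.0, 0.02, n)
ts_mult = trend * seasonal * noise

def seasonal_std(x, lo, hi):
    """Std of the detrended series (30-day centered MA removed)
    over x[lo:hi] -- a rough measure of seasonal amplitude."""
    detr = pd.Series(x) - pd.Series(x).rolling(30, center=True).mean()
    return detr[lo:hi].std()

# Raw scale: the seasonal swing grows with the level...
early = seasonal_std(ts_mult, 15, 105)
late = seasonal_std(ts_mult, 260, 350)
print(f"Raw-scale seasonal std, early vs late: {early:.2f} vs {late:.2f}")

# ...but on the log scale the swing is constant, so an ADDITIVE
# decomposition fits the log-transformed series
log_early = seasonal_std(np.log(ts_mult), 15, 105)
log_late = seasonal_std(np.log(ts_mult), 260, 350)
print(f"Log-scale seasonal std, early vs late: {log_early:.4f} vs {log_late:.4f}")
```

This is also the practical trick: if your data looks multiplicative, either pass model='multiplicative' or log-transform first and decompose additively.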
Moving averages: the simplest forecasting model
A moving average smooths the series by averaging over a window of recent values. It removes short-term noise and reveals the underlying trend. But it's also a forecasting model in its own right: the simplest forecast for tomorrow is "the average of the last K days."
windows = [3, 7, 14, 30]
print(f"Moving Average forecasting performance:")
print(f" {'Window':>8s} {'RMSE':>8s} {'MAE':>8s}")
print("-" * 30)
for w in windows:
    ma = pd.Series(ts).rolling(w).mean()
    # Forecast: MA(t) predicts y(t+1),
    # so we compare y[w:] with ma[w-1:-1]
    actual = ts[w:]
    forecast = ma.values[w-1:-1]
    errors = actual - forecast
    rmse = np.sqrt(np.mean(errors**2))
    mae = np.mean(np.abs(errors))
    print(f"  MA({w:>2d}) {rmse:>8.2f} {mae:>8.2f}")
The window size involves a classic bias-variance tradeoff (remember episode #13?). A short window (MA(3)) is responsive -- it adapts quickly to recent changes -- but noisy, because three points don't average out the randomness very well. A long window (MA(30)) is smooth and stable, but laggy -- it takes weeks to recognize that conditions have changed. On our data, the 30-day window also happens to match the seasonal period, which means it smooths out the seasonality entirely and shows only the trend. That can be useful or terrible depending on whether you care about the seasonal pattern.
Moving averages are also powerful features, not just models. In episode #21, we used moving average ratios (price divided by its SMA) as features for the crypto regime predictor. The ratio tells you whether the current value is above or below its recent average -- a simple but effective momentum indicator. Feature engineering with lagged values and moving averages is the bread and butter of time series ML, and we'll see a lot more of it when we build forecasting models.
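As a sketch of that kind of feature (the series and the 20-day window here are illustrative choices, not the exact setup from episode #21):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
# Illustrative price-like series: a random walk with mild upward drift
price = pd.Series(100 + np.cumsum(rng.normal(0.1, 1.0, 500)))

# Momentum feature: current price relative to its own 20-day SMA.
# rolling() is backward-looking, so the feature at time t uses only
# data available at time t -- no future leakage
ratio = price / price.rolling(20).mean()

print(f"Days above the 20-day SMA: {(ratio > 1).mean():.0%}")
print(f"Days below the 20-day SMA: {(ratio < 1).mean():.0%}")
```

Note the first 19 values are NaN (the window isn't full yet) -- in a real pipeline you'd drop those rows rather than fill them with information from the future.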
Exponential smoothing: weighted recency
Moving averages weight all points in the window equally -- the value from 30 days ago counts as much as yesterday's. That feels wrong intuitively. Yesterday should matter more than last month for predicting today, shouldn't it?
Exponential smoothing fixes this by weighting recent values more heavily, with weights that decay exponentially into the past. The most recent observation gets weight alpha, the one before gets alpha * (1 - alpha), the one before that gets alpha * (1 - alpha)^2, and so on. In theory all past values contribute, but in practice, values from more than a few "half-lives" back have negligible weight.
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing.
    alpha controls responsiveness: high = follow data closely."""
    result = np.zeros(len(series))
    result[0] = series[0]
    for t in range(1, len(series)):
        result[t] = alpha * series[t] + (1 - alpha) * result[t-1]
    return result
print(f"Exponential Smoothing performance:")
print(f" {'Alpha':>8s} {'RMSE':>8s} {'Effective window':>18s}")
print("-" * 40)
for alpha in [0.05, 0.1, 0.3, 0.5, 0.9]:
    smoothed = exponential_smoothing(ts, alpha)
    # Forecast: smoothed[t] predicts y[t+1]
    errors = ts[1:] - smoothed[:-1]
    rmse = np.sqrt(np.mean(errors**2))
    # Effective window: how many observations carry 95% of the weight
    eff_window = int(np.log(0.05) / np.log(1 - alpha)) if alpha < 1 else 1
    print(f"  {alpha:>8.2f} {rmse:>8.2f} {eff_window:>18d} days")
The alpha parameter controls speed of adaptation. Low alpha (0.05-0.1) means the model changes slowly, smoothing heavily like a long moving average. High alpha (0.8-0.9) makes it follow the data closely, like a short moving average. The "effective window" column shows how many past observations carry most of the weight -- it's a useful way to compare exponential smoothing to moving averages in terms of their practical memory.
Exponential smoothing is the foundation for more sophisticated methods. Double Exponential Smoothing (Holt's method) adds a separate trend component that tracks the slope of the series -- so it can forecast a rising series instead of always predicting a flat line. Triple Exponential Smoothing (Holt-Winters) adds a seasonal component on top of that, handling the three-part decomposition (trend + seasonality + noise) in a single model. These extended methods handle all the structure we decomposed earlier, combining everything into one unified forecast.
I mention Holt-Winters specifically because it's one of those methods that sounds like it should be obsolete in the age of neural networks and gradient boosting, but it consistently competes with or beats complex ML models on time series benchmarks. The M4 competition (a major forecasting competition with 100,000 time series) found that simple statistical methods like exponential smoothing performed shockingly well against deep learning approaches. Don't underestimate simple models -- they encode strong, well-calibrated assumptions about how temporal data behaves.
Differencing: making data stationary
When a series has a trend, differencing removes it. First-order differencing computes the change between consecutive values: d(t) = y(t) - y(t-1). If the original series has a linear trend (constantly rising by about the same amount), the differenced series will fluctuate around a constant mean -- the average daily change. That's stationarity.
# First differencing removes trend
diff_1 = np.diff(ts)
print(f"Before differencing:")
print(f" Mean: {ts.mean():.1f} (drifts over time)")
print(f" Std: {ts.std():.1f}")
print(f"\nAfter first differencing:")
print(f" Mean: {diff_1.mean():.3f} (stable around ~0.137)")
print(f" Std: {diff_1.std():.2f}")
# Verify stationarity with ADF test
result_diff = adfuller(diff_1, autolag='AIC')
print(f"\n ADF test p-value: {result_diff[1]:.6f}")
if result_diff[1] < 0.05:
    print(f"  --> Stationary! (p < 0.05)")
else:
    print(f"  --> Still non-stationary")
First differencing should make our series pass the ADF test. The trend is gone (the daily changes don't drift), though the seasonal pattern might still be there in the differenced data. For that, you need seasonal differencing: subtracting the value from one full seasonal period ago.
# Seasonal differencing: remove the 30-day cycle
seasonal_diff = ts[30:] - ts[:-30]
print(f"Seasonal differencing (lag 30):")
print(f" Mean: {seasonal_diff.mean():.3f}")
print(f" Std: {seasonal_diff.std():.2f}")
# First difference THEN seasonal difference (both at their proper lags)
first_diff = np.diff(ts)
double_diff = first_diff[30:] - first_diff[:-30]
print(f"\nFirst + seasonal differencing:")
print(f" Mean: {double_diff.mean():.4f}")
print(f" Std: {double_diff.std():.2f}")
result_double = adfuller(double_diff, autolag='AIC')
print(f" ADF p-value: {result_double[1]:.6f}")
You can stack differencing operations: first difference to remove the trend, then seasonal difference to remove the periodicity. The resulting series should be fully stationary -- ready for models like ARIMA that require stationarity as a precondition.
A word of caution about over-differencing. If you difference a series that's already stationary, you introduce artificial negative autocorrelation (each value becomes negatively correlated with its neighbors because the differencing operation creates that pattern). One or two rounds of differencing usually suffice. If you need three or more, the series might have a more fundamental issue -- a structural break, a regime change (like the ones we detected in episode #21), or a nonlinear transformation that differencing can't fix. When in doubt, check the ADF test after each round.
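A quick demonstration of that artificial negative autocorrelation: difference a series that's already stationary (white noise) and the lag-1 autocorrelation jumps from roughly zero to the theoretical value of -0.5:

```python
import numpy as np

rng = np.random.default_rng(3)
# White noise is already stationary -- no differencing needed
wn = rng.normal(size=5000)

def acf1(x):
    """Lag-1 autocorrelation."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).sum() / (x * x).sum()

print(f"Lag-1 autocorr of white noise:     {acf1(wn):>6.3f}")
# Differencing it anyway INTRODUCES structure: d[t] = e[t] - e[t-1]
# has a theoretical lag-1 autocorrelation of exactly -0.5
print(f"Lag-1 autocorr after differencing: {acf1(np.diff(wn)):>6.3f}")
```

That -0.5 comes from each noise term appearing with opposite signs in two consecutive differences -- structure created by the transformation, not present in the data.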
Walk-forward validation: the ONLY proper evaluation
I need to hammer this point home because it's the most violated rule in time series ML, and violating it gives you results that look spectacular on paper and crumble the moment you deploy. We touched on this in episode #21 with the crypto regime predictor, but it deserves its own section here because it applies to EVERYTHING in time series.
Never use random train/test splits on time series data. Ever. The k-fold cross-validation we've been using since episode #13? Not valid here. Random splits let the model peek at future data points during training -- that's not just data leakage, it's time travel. And time travel makes models look brilliant. The model effectively learns "the test set happened to go up next week" and tells you it predicted that. In production, next week hasn't happened yet.
# The RIGHT way: walk-forward validation
def walk_forward_evaluate(data, train_size, step=1):
"""Expanding window walk-forward validation.
Train on [0..t], predict t+1, then expand."""
errors = []
for t in range(train_size, len(data) - step):
# Train on everything up to time t
train = data[:t]
# Simple forecast: last value (naive baseline)
forecast = train[-1]
actual = data[t + step - 1]
errors.append((actual - forecast) ** 2)
rmse = np.sqrt(np.mean(errors))
return rmse
# Compare the naive forecast under walk-forward vs a random split
# Walk-forward (correct)
rmse_wf = walk_forward_evaluate(ts, train_size=180)
# Random split (WRONG for time series)
train_idx = np.sort(np.random.choice(len(ts), size=180, replace=False))
test_idx = np.setdiff1d(np.arange(len(ts)), train_idx)
# "Predict" each test point using the nearest past training point
errors_random = []
for idx in test_idx:
    past_train = train_idx[train_idx < idx]
    if len(past_train) > 0:
        forecast = ts[past_train[-1]]
        errors_random.append((ts[idx] - forecast) ** 2)
rmse_random = np.sqrt(np.mean(errors_random)) if errors_random else 0
print(f"Walk-forward RMSE (correct): {rmse_wf:.2f}")
print(f"Random split RMSE (cheating): {rmse_random:.2f}")
print(f"\nRandom split looks {'better' if rmse_random < rmse_wf else 'worse'} "
      f"-- but it's INVALID")
The random split RMSE will often look better because test points end up surrounded by training points in time -- the model effectively interpolates between known values instead of extrapolating into the unknown future. That's cheating. Walk-forward validation forces the model to do the hard thing: predict genuinely unseen future values. The RMSE will be higher, but it honestly reflects how well the model will perform in production.
The golden rules of time series
Before we move to building actual forecasting models, let's consolidate the principles that will guide everything from here:
Never use future data. Every feature, every preprocessing step, every validation split must respect the time ordering. If you compute a moving average over the entire dataset and then use it as a feature for a model evaluated on the last 30%, you've leaked future information into the training set. The average includes future values. This is the most common and most devastating mistake in time series ML.
Stationarity matters. Most models assume it (at least after transformation). Always test with ADF, always difference or detrend if needed. A model fit on non-stationary data will learn a pattern that was true in the past and is already obsolete.
Simpler models are hard to beat. Exponential smoothing and ARIMA regularly outperform complex ML models on time series benchmarks. The M4 and M5 competitions demonstrated this repeatedly. Don't dismiss simple methods -- they encode decades of statistical wisdom about temporal data into remarkably few parameters.
Evaluate properly. Walk-forward validation (expanding or sliding window) is mandatory. Never use random k-fold. Never use a single train/test split without respecting temporal order. If your time series model achieves 95%+ accuracy with random cross-validation, it's almost certainly cheating.
Autocorrelation is your diagnostic compass. Plot the ACF and PACF before you do anything else. They tell you what kind of temporal structure exists in the data, which directly informs model selection. A slowly decaying ACF with a sharp PACF cutoff suggests an AR model. A quickly decaying ACF with a slowly decaying PACF suggests an MA model. Periodic spikes reveal seasonality. No significant autocorrelation at any lag means the data might be a random walk -- and you can't forecast a random walk better than "tomorrow will be the same as today."
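That random-walk point is easy to verify empirically: on a simulated random walk, the naive "tomorrow = today" forecast beats a 7-day moving average, because averaging drags the forecast toward stale values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# A pure random walk: each step is independent noise
walk = np.cumsum(rng.normal(size=2000))

# Naive forecast: tomorrow = today
naive_rmse = np.sqrt(np.mean((walk[1:] - walk[:-1]) ** 2))
# "Smarter" forecast: tomorrow = mean of the last 7 days
ma7 = pd.Series(walk).rolling(7).mean().values
ma_rmse = np.sqrt(np.mean((walk[7:] - ma7[6:-1]) ** 2))

print(f"Naive RMSE: {naive_rmse:.3f}")
print(f"MA(7) RMSE: {ma_rmse:.3f}")
```

For a pure random walk the naive forecast is provably optimal -- if your fancy model beats it on such data, suspect leakage before celebrating.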
So, what have we learned?
Let me tie together everything from today. We've entered a completely new domain of ML -- one where the ordering of observations carries critical information that we've been ignoring for 27 episodes. Here's the full picture:
- Time series data has a fundamental property: observations are dependent. Today is related to yesterday, and shuffling destroys the data. This violates the independence assumption behind every tabular ML method we've learned so far;
- Autocorrelation measures how strongly past values predict future ones -- it's the signal that time series models exploit. The ACF and PACF are your primary diagnostic tools: always plot them before modeling;
- Stationarity (constant mean, variance, autocorrelation over time) is required by most classical methods. Use the ADF test to check, and use differencing to achieve it;
- Decomposition into trend, seasonality, and residuals reveals what's predictable and what's noise. The signal-to-noise ratio tells you the ceiling on forecast accuracy before you train a single model;
- Moving averages smooth noise by averaging over a window. Exponential smoothing weights recent values more heavily. Both are legitimate forecasting models, not just preprocessing tools;
- Differencing removes trends (first difference) and seasonality (seasonal difference), converting non-stationary series to stationary ones. Don't over-difference -- one or two rounds is almost always enough;
- Walk-forward validation is the ONLY proper way to evaluate time series models. Random splits are invalid because they let models peek at the future. Same principle we discovered in episode #21, now formalized as an absolute rule;
- The golden rules: no future data in any form, stationarity before modeling, simple methods are hard to beat, and autocorrelation is your compass.
This episode lays the groundwork for everything that follows in the time series domain. We now understand the properties that make temporal data special and the tools for diagnosing that structure. The natural next step is to put this knowledge to work: building models that actually produce forecasts, starting with the classical statistical approaches that remain the backbone of production forecasting systems worldwide. The concepts we covered today -- autocorrelation, stationarity, differencing -- will show up directly as parameters and design choices in those models.