Background: Time Series Cross-Validation (Purged CV)

"Using standard K-Fold on financial data is like using tomorrow's newspaper to predict today's stock price."

Why Standard Cross-Validation Fails in Finance?

Standard K-Fold Assumption: Samples are mutually independent.

Financial Data Reality:

Today's return is highly correlated with yesterday's (autocorrelation)
Features used to predict day 100 may include information from days 99, 98
Labels (returns) often involve multi-day windows

Result: Information "leaks" from test set to training set, causing severe overfitting.

A Concrete Leakage Case

Scenario: Predicting 5-day future returns

Data: Days 1-100
Label: ret_5d[t] = (close[t+5] - close[t]) / close[t]

Sample 95's label uses: Days 95-100 prices
Sample 96's label uses: Days 96-101 prices (partial overlap!)

Standard K-Fold might:

Training set includes sample 95 (label involves days 95-100)
Test set includes sample 96 (label involves days 96-101)
Days 96-100 information is in both training and testing!

Three Time Series CV Methods

Method 1: Simple Time Split (Walk-Forward)

Fold 1: Train [1-60]  -> Test [61-70]
Fold 2: Train [1-70]  -> Test [71-80]
Fold 3: Train [1-80]  -> Test [81-90]
Fold 4: Train [1-90]  -> Test [91-100]

Pros: Simple, no future information leakage Cons: Training set grows larger, early data may be stale

Method 2: Rolling Window

Fold 1: Train [1-60]   -> Test [61-70]
Fold 2: Train [11-70]  -> Test [71-80]
Fold 3: Train [21-80]  -> Test [81-90]
Fold 4: Train [31-90]  -> Test [91-100]

Pros: Fixed training set size, uses most recent data Cons: Low sample utilization

Method 3: Purged K-Fold (Recommended)

On top of standard K-Fold:

Time Ordering: Split K folds in chronological order
Purge: Remove training samples whose labels overlap with test set
Embargo: Add safety buffer beyond the purge zone

Purged CV Explained

Problem Setup:

Feature window: Past 20 days
Label window: Future 5 days
Data: Days 1-100

Steps:

1. Split Folds (assume 5 folds)
   Fold 3 test set: Days 41-60

2. Identify Leakage Zone
   Test labels involve: Days 41-65 (41+5-1 to 60+5-1)
   Training features involve: Days 21-40 may also affect test

3. Purge
   Remove training samples whose labels overlap with test set
   Remove: Days 36-40 (labels involve 36-45, overlaps with 41-65)

4. Embargo
   Remove N additional samples after the Purge boundary
   If Embargo = 5 days, remove training samples from days 41-45

Visualization:

The Role of Embargo

Why is Embargo Needed?

Even after purging label overlap, there may still be:

Feature autocorrelation (today's MA20 and tomorrow's MA20 are nearly identical)
Information propagation delay (news impact lasts several days)
Market state persistence (trends don't disappear overnight)

Suggested Embargo Length:

Data Frequency	Label Window	Suggested Embargo
Daily	5 days	3-5 days
Daily	20 days	10-20 days
Minute	1 hour	30-60 minutes
Tick	100 Ticks	50-100 Ticks

Rule of Thumb: Embargo ≈ 0.5 x Label Window

Practical Calculation Example

Setup:

Data: 1000 samples (4 years daily)
Label: 10-day future return
K = 5 folds
Embargo = 5 days

Fold Allocation:

Fold	Original Test Range	Purge Removal	Embargo Removal	Effective Training Samples
1	1-200	None	None	210-1000 (790)
2	201-400	191-200	401-405	1-190, 406-1000 (785)
3	401-600	391-400	601-605	1-390, 606-1000 (785)
4	601-800	591-600	801-805	1-590, 806-1000 (785)
5	801-1000	791-800	None	1-790 (790)

Note: Each fold loses about 15 samples to prevent leakage.

Comparison with Other Methods

Method	Info Leakage	Sample Utilization	Compute Complexity	Suitable For
Standard K-Fold	Severe	High	Low	Not for finance
Simple Time Split	None	Medium	Low	Quick validation
Rolling Window	None	Low	Medium	Strategy stability testing
Purged K-Fold	None	Higher	Medium	Model selection, hyperparameter tuning
Purged + Embargo	None	Medium	Medium	Most rigorous validation

Multi-Agent Perspective

In multi-agent systems, different Agents need different CV strategies:

Signal Agent (predicting 5-day returns):
  - Purge: 5-day label window
  - Embargo: 3 days
  - Conservative model performance estimate

Regime Agent (identifying market states):
  - Purge: Usually not needed (state is current)
  - Embargo: Longer (state transitions have inertia)
  - Focus on accuracy during state transitions

Risk Agent (predicting volatility):
  - Purge: Volatility window (e.g., 20 days)
  - Embargo: 5 days
  - Volatility clustering requires longer Embargo

Common Misconceptions

Misconception 1: Using Purged CV prevents overfitting

Wrong. Purged CV only prevents information leakage, it cannot prevent:

Overfitting from too many features
Data snooping (repeatedly testing until finding good results)
Excessive model complexity

Misconception 2: Longer Embargo is always better

Not entirely true. Too long Embargo:

Wastes effective training samples
May make training data too stale
Increases computational cost

Misconception 3: Only need Purged CV for final testing

Wrong. Hyperparameter tuning must also use Purged CV, otherwise you'll select overfitting parameters.

Practical Recommendations

1. Check if Purging is Needed

Need Purging when:
- Labels involve multi-day windows (e.g., future N-day returns)
- Features involve long windows (e.g., 60-day moving average)
- Samples have overlap

Less Need for Purging when:
- Labels are instantaneous (e.g., next tick direction)
- Samples are completely independent (e.g., cross-section of different stocks)

2. Validate Purge Effectiveness

Comparison Experiment:
1. Train with standard K-Fold, record test accuracy
2. Train with Purged K-Fold, record test accuracy
3. Larger difference indicates more severe original leakage

3. Reserve Completely Independent Test Set

Data Allocation:
- 70%: Purged K-Fold for model selection and hyperparameter tuning
- 30%: Completely isolated final test set, use only once

Summary

Key Point	Explanation
Core Problem	Time series samples are not independent, standard CV causes leakage
Purge Function	Remove training samples whose labels overlap with test set
Embargo Function	Add safety buffer beyond Purge boundary
Recommended Method	Purged K-Fold + Embargo
Validation Method	Compare standard CV and Purged CV results