Background: Common Pitfalls in Feature Engineering
In quantitative trading, 80% of the alpha comes from feature engineering, but so do 80% of the failures.
Pitfall 1: Future Information Leakage
Problem: Accidentally using future data in feature calculation.
```python
import pandas as pd  # df is assumed to be a DataFrame with a 'close' column

# Wrong: the indicator includes today's close, but the trade is decided at the open,
# before today's close is known.
df['ma20'] = df['close'].rolling(20).mean()  # window includes today's close
signal = df['close'] > df['ma20']            # compares against future information

# Correct: shift(1) so only data available before today is used,
# on both sides of the comparison.
df['ma20'] = df['close'].rolling(20).mean().shift(1)
signal = df['close'].shift(1) > df['ma20']   # yesterday's close vs. the lagged MA
```
Detection Methods:
- Feature-label correlation > 0.9 -> Likely leakage
- Model accuracy > 90% -> Almost certainly leakage
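As a quick automated check, here is a minimal sketch that scans each candidate feature's absolute correlation with the label and flags anything above the 0.9 threshold from the list above. The column names (`feature_cols`, the label column) are placeholders, not names from the text.

```python
import pandas as pd

def leakage_scan(df: pd.DataFrame, feature_cols, label: str,
                 threshold: float = 0.9) -> pd.Series:
    """Flag features whose absolute correlation with the label exceeds the threshold."""
    corr = df[feature_cols].corrwith(df[label]).abs()
    return corr[corr > threshold].sort_values(ascending=False)

# Example: suspects = leakage_scan(df, ['ma20', 'rsi14'], 'forward_return')
```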
Pitfall 2: Global Normalization
Problem: Normalizing with whole-dataset statistics leaks future distribution information.
```python
# Wrong: mean and std are computed over the full history, including future rows.
mean = df['close'].mean()
std = df['close'].std()
df['normalized'] = (df['close'] - mean) / std

# Correct: rolling-window normalization, so each row only sees its own past.
df['normalized'] = (
    (df['close'] - df['close'].rolling(20).mean()) /
    df['close'].rolling(20).std()
)
# As in Pitfall 1, shift(1) the result if today's close is not known at decision time.
```
Pitfall 3: Highly Correlated Feature Redundancy
Problem: Multiple features are highly correlated, creating information redundancy and overfitting risk.
| Feature Group | Correlation | Problem |
|---|---|---|
| MA5, MA10, MA20 | 0.95+ | Nearly identical information |
| RSI, Stochastic | 0.8+ | Both measure overbought/oversold |
| Close, VWAP | 0.99 | Nearly identical |
Solution:
- When two features correlate above 0.7, keep only one (a pruning sketch follows this list)
- Use PCA for dimensionality reduction
- Keep the representative with the clearest economic interpretation
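A minimal pruning sketch, assuming `features` is a DataFrame with one column per candidate feature; it greedily drops one member of every pair correlated above 0.7:

```python
import numpy as np
import pandas as pd

def drop_correlated(features: pd.DataFrame, threshold: float = 0.7) -> pd.DataFrame:
    """Greedily drop one feature from each pair correlated above the threshold."""
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop)
```

Greedy dropping is order-dependent; in practice you would keep the member of each pair with the stronger standalone IC or the clearer economic story.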
Pitfall 4: Overfitting to Noise
Problem: More features mean more chance of memorizing training data noise.
Rule of thumb for samples vs. feature count (a small classifier is sketched below):
- Samples / Features > 20 -> safe zone
- Samples / Features ≈ 10 -> warning zone
- Samples / Features < 5 -> danger zone
Example: with 1,000 samples, use at most roughly 50 features.
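One way to encode the rule of thumb; treating everything between the danger and safe anchors as the warning zone is an assumption, since the text only pins down the three anchor ratios:

```python
def sample_feature_zone(n_samples: int, n_features: int) -> str:
    """Classify the samples-per-feature ratio per the rule of thumb above."""
    ratio = n_samples / n_features
    if ratio > 20:
        return 'safe'
    if ratio < 5:
        return 'danger'
    return 'warning'  # assumption: everything between the anchors is warning territory

print(sample_feature_zone(1000, 40))  # ratio 25 -> 'safe'
```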
Solution:
- Recursive feature elimination while monitoring validation-set performance
- L1 regularization for automatic selection (see the sketch after this list)
- Domain knowledge takes priority over statistical significance
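A minimal L1-selection sketch on synthetic data; the 1000 x 50 shape mirrors the example above, and the data and signal structure are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))            # 1000 samples, 50 candidate features
y = 0.1 * X[:, 0] + rng.standard_normal(1000)  # only feature 0 carries signal

model = LassoCV(cv=5).fit(X, y)                # cross-validated L1 penalty
selected = np.flatnonzero(model.coef_)         # features with nonzero coefficients survive
print(len(selected), selected[:10])
```

With real features, standardize first (e.g., the rolling z-scores from Pitfall 2), since the L1 penalty is scale-sensitive.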
Pitfall 5: Categorical Feature Encoding Errors
Problem: Incorrect handling of categorical features leads to information loss or spurious relationships.
```python
import pandas as pd

# Wrong: integer codes impose an ordering the categories do not have.
sector_map = {'Technology': 1, 'Healthcare': 2, 'Finance': 3}
df['sector'] = df['sector_name'].map(sector_map)  # model treats Finance > Healthcare > Technology

# Correct: one-hot encoding, one binary column per sector.
df = pd.get_dummies(df, columns=['sector_name'])
```
Pitfall 6: Ignoring Time-Varying Correlation
Problem: Feature predictive power is unstable across different time periods.
| Period | Momentum Factor IC | Value Factor IC |
|---|---|---|
| 2015-2017 | 0.05 | 0.02 |
| 2018-2019 | 0.01 | 0.04 |
| 2020-2021 | 0.06 | -0.02 |
Solution:
- Use rolling IC instead of a single full-sample IC (a sketch follows this list)
- IC stability (ICIR = mean(IC) / std(IC)) matters more than the absolute IC level
- Evaluate feature effectiveness conditional on the market regime
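A sketch of a per-date rank IC series plus a rolling ICIR, assuming a long-format panel with hypothetical `date`, `factor`, and `forward_return` columns:

```python
import pandas as pd

def ic_by_date(panel: pd.DataFrame) -> pd.Series:
    """Cross-sectional Spearman IC between factor and forward return, per date."""
    return panel.groupby('date').apply(
        lambda g: g['factor'].corr(g['forward_return'], method='spearman')
    )

def rolling_icir(ic: pd.Series, window: int = 12) -> pd.Series:
    """Rolling IC information ratio: mean(IC) / std(IC) over the window."""
    return ic.rolling(window).mean() / ic.rolling(window).std()
```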
Pitfall 7: Data Snooping
Problem: Repeatedly testing features until something looks "effective" is itself a form of overfitting, this time to the evaluation set.
Test 100 features
-> expect ~5 to be "significant" at p < 0.05 by chance alone
-> those 5 may be pure noise
Solution:
- Apply a Bonferroni correction: p-value threshold = 0.05 / number of tests (sketched below)
- Reserve an independent out-of-sample (OOS) dataset
- Log every feature tested, not just the "successful" ones
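A minimal Bonferroni sketch with made-up p-values:

```python
p_values = {'momentum': 0.004, 'value': 0.030, 'size': 0.0004}  # hypothetical test results
alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: 0.05 / number of tests ~= 0.0167
survivors = {name: p for name, p in p_values.items() if p < threshold}
print(survivors)  # {'momentum': 0.004, 'size': 0.0004}; 'value' no longer passes
```

Note that the denominator must count every feature ever tested, not just the promising ones, which is why the log in the last bullet matters.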
Pitfall 8: Ignoring Trading Cost Impact
Problem: High-turnover features get eaten by costs in live trading.
Feature A: IC = 0.05, turnover 200%/month
Feature B: IC = 0.03, turnover 50%/month
Assuming a one-way cost of 0.2% (doubled for the round trip):
- Feature A cost: 200% x 0.2% x 2 = 0.8%/month
- Feature B cost: 50% x 0.2% x 2 = 0.2%/month
After costs, Feature B may come out ahead despite its lower IC.
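The same arithmetic as a small helper; the x 2 converts the one-way cost into a round trip:

```python
def monthly_cost_drag(turnover: float, one_way_cost: float) -> float:
    """Monthly cost drag: turnover x one-way cost x 2 (round trip)."""
    return turnover * one_way_cost * 2

print(monthly_cost_drag(2.0, 0.002))  # Feature A: 0.008 -> 0.8%/month
print(monthly_cost_drag(0.5, 0.002))  # Feature B: 0.002 -> 0.2%/month
```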
Feature Engineering Checklist
| Check Item | Passing Standard |
|---|---|
| No future information | All features use shift(1) or earlier data |
| Rolling normalization | No global mean/std used |
| Low correlation | Inter-feature correlation < 0.7 |
| Sample ratio | Samples / Features > 20 |
| IC stability | mean(IC) / std(IC) > 0.5 |
| Cost feasibility | Still positive returns after turnover costs |