Quantitative Research · Results & Comparison

DPF vs BPF: Full Comparison Results

Comprehensive metrics across warning signal quality, episode discrimination, filtering accuracy, and per-crisis lead times. 4,526 trading days · 9 labelled crisis episodes · 2 held-out OOS tests.

DPF (Differentiable · Sinkhorn OT): Adaptive Filter
Endo. Episode AUC: 0.567
Daily ROC-AUC: 0.682
PR-AUC: 0.102
Brier Score: 0.212
Signal-to-noise: 2.67
Crisis/Calm ratio: 340×

Scoreboard: DPF wins 7 metrics · BPF wins 4 metrics

BPF (Bootstrap · Fixed Params): Stable Filter
Posterior CI width: 0.219
Tail prob. AUC: 0.726
Mean lead time: 50.2d
Bear Stearns peak L: 4.53
COVID peak L: 5.62
Particles N: 1000 (2×)

Full Performance Comparison

3 features · z-scored · COVID + SVB held out
Metric | DPF | BPF | Winner

Warning Signal Quality (30-day pre-crisis labels)
Brier Score | 0.2124 | 0.2180 | DPF ✓
Brier Skill Score | −2.88 | −2.98 | DPF ✓
Daily ROC-AUC | 0.6819 | 0.6695 | DPF ✓
PR-AUC | 0.1021 | 0.0965 | DPF ✓

Episode-Level Discrimination
Endogenous Episode AUC | 0.5667 | 0.5000 | DPF ✓
Exogenous Episode AUC | 0.0000 | 0.0000 | Tie (correct result)

Lead Time (episodes with signals only)
Mean Lead Time (days) | 47.3 | 50.2 | BPF ✓
Median Lead Time (days) | 57.0 | 59.0 | BPF ✓

Filtering Quality
Posterior 90% CI width | 0.548 | 0.219 | BPF ✓ (tighter)
Responsiveness std(|ΔL_t|) | 0.046 | 0.036 | DPF ✓ (more reactive)
Signal-to-noise E[L|crisis]−E[L|calm] | 2.67 | 2.21 | DPF ✓
Tail prob AUC P(L>2) | 0.709 | 0.726 | BPF ✓

Crisis Probability Quality
GFC mean crisis prob | 0.784 | 0.788 | Comparable
Calm period mean crisis prob | 0.002 | 0.014 | DPF ✓ (sharper)
Crisis/calm ratio | 340× | 54× | DPF ✓

Calibrated Feature Weights
L level coefficient β₁ | +0.774 | +0.703 | Both positive ✓
dL momentum coefficient β₂ | −0.295 | −0.254 | Both negative ✓
Drawdown coefficient β₃ | +0.566 | +0.581 | Both positive ✓
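
The negative Brier Skill Scores in the table are easier to interpret next to the definition. The following is a minimal sketch, assuming the common convention BSS = 1 − BS / BS_ref with a climatological reference forecast (always predicting the base rate). The page does not state which reference it uses, and the function names and toy data are illustrative only.

```python
def brier_score(probs, labels):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs, labels):
    """BSS = 1 - BS / BS_ref, where BS_ref always forecasts the base rate.

    Assumption: climatological reference. BSS > 0 beats the base-rate
    forecast; BSS < 0 is worse than it.
    """
    base_rate = sum(labels) / len(labels)
    bs_ref = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / bs_ref


# Toy data (hypothetical): 1 positive day in 10, and a forecaster that over-warns.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
probs = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.9]

bs = brier_score(probs, labels)         # raw Brier score
bss = brier_skill_score(probs, labels)  # negative: worse than the base rate
```

When positive labels are rare, the reference score p(1 − p) is small, so a Brier score near 0.21 can coexist with a strongly negative skill score and a ROC-AUC above 0.5. Under the assumed convention, the reported pairs (0.2124, −2.88) and (0.2180, −2.98) would both imply a reference score of roughly 0.055.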

Per-Crisis Warning Lead Times

60-day search window · threshold P > 0.5
Crisis Episode | DPF Lead | BPF Lead | Type | Notes
GFC (2007-08) | 13d | 26d | Endogenous | BPF earlier; DPF fires at peak stress buildup
Eurozone I (2010) | 58d | 58d | Endogenous | Both detect 2 months out
Eurozone II (2011) | 56d | 60d | Endogenous | Both detect near maximum window
China (2015) | 60d | 60d | Endogenous | Both at maximum lead
Brexit (2016) | 60d | 60d | Exogenous | Concurrent Eurozone stress: legitimate microstructure signal
Q4 Selloff (2018) | no signal | no signal | Endogenous | Gradual multi-month selloff; beyond 60d horizon
COVID-19 (2020) | no signal | no signal | Exogenous | Correct: no microstructure precursor for pandemic
Rate Shock (2022) | no signal | no signal | Endogenous | Policy-driven over 9 months; beyond 60d horizon
SVB (2023) | 37d | 37d | Exogenous | Unrealised Treasury losses visible in credit spreads ~5 weeks pre-run
Mean (signals only) | 47.3d | 50.2d | | 6 of 9 episodes detected by both models within the 60-day window
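
One way to read the lead-time columns is sketched below. It assumes the lead is the number of days between crisis onset and the earliest day inside the 60-day pre-onset window on which the crisis probability exceeds 0.5. The page does not publish the exact rule, so the function name, the onset index, and the toy probability path are all hypothetical. The last lines recompute the summary row from the six per-episode leads listed in the table.

```python
from statistics import mean, median


def lead_time(crisis_prob, onset, window=60, threshold=0.5):
    """Days between the earliest threshold breach in the search window and onset.

    Scans the `window` days before `onset` (exclusive) and returns
    onset - t for the first t with crisis_prob[t] > threshold,
    or None if the threshold is never exceeded ("no signal").
    """
    start = max(0, onset - window)
    for t in range(start, onset):
        if crisis_prob[t] > threshold:
            return onset - t
    return None


# Toy path (hypothetical): calm until day 63, elevated afterwards, onset at day 100.
probs = [0.01] * 63 + [0.8] * 37
toy_lead = lead_time(probs, onset=100)

# Cross-check of the summary row using the per-episode leads from the table.
dpf_leads = [13, 58, 56, 60, 60, 37]
bpf_leads = [26, 58, 60, 60, 60, 37]
dpf_mean, bpf_mean = mean(dpf_leads), mean(bpf_leads)
dpf_median, bpf_median = median(dpf_leads), median(bpf_leads)
```

Because the search window is capped at 60 days, a reported lead of 60d means only that the threshold was already exceeded at the start of the window. The true lead may be longer.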

Performance Dashboard & Analysis Figures

Animated Comparisons

Dual-filter forecast cones · research/animate_comparison.py

Key Findings

DPF wins on early warning
DPF leads on all four warning-quality metrics (Brier score, Brier skill score, daily ROC-AUC, PR-AUC), on endogenous episode AUC (0.567 vs 0.500), and on crisis/calm discrimination (340× vs 54×). Its adaptive parameters learn each crisis's microstructure signature.
BPF wins on posterior quality
Tighter posteriors (90% CI width 0.22 vs 0.55) and higher tail-probability AUC (0.726 vs 0.709). Fixed parameters act as a regulariser, which makes BPF better suited to contemporaneous detection.
Exogenous shocks are unpredictable
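Neither model gave advance warning for COVID-19 or for the 2022 Rate Shock (labelled endogenous in the per-crisis table, but policy-driven). This is the expected result: financial microstructure cannot predict pandemics or central bank pivots, and the model correctly draws this boundary. The other two exogenous-labelled episodes, Brexit and SVB, did produce signals, which the table attributes to concurrent Eurozone stress and to unrealised Treasury losses visible in credit spreads.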
Bias-variance tradeoff confirmed
DPF requires a separate fixed-parameter SSM for forward simulation. Rolling z-score normalisation is required for calibration. This operational complexity is the cost of adaptivity.
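
The rolling z-score normalisation mentioned above can be sketched as follows. This is a minimal illustration, assuming a trailing window that excludes the current observation so that no future information leaks into the feature. The window length, function name, and toy series are hypothetical and are not taken from the project.

```python
from statistics import mean, stdev


def rolling_zscore(series, window=252):
    """Z-score each point against the trailing `window` observations.

    Uses only data strictly before index t (no look-ahead). Returns None
    until a full window is available or when the window has zero variance.
    """
    out = []
    for t, x in enumerate(series):
        if t < window:
            out.append(None)
            continue
        hist = series[t - window:t]
        sd = stdev(hist)
        out.append((x - mean(hist)) / sd if sd > 0 else None)
    return out


# Toy feature (hypothetical): a flat-ish series followed by a jump.
feature = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 10.0]
z = rolling_zscore(feature, window=4)
```

Excluding the current point from its own window keeps the normalised feature usable in a held-out test, at the cost of leaving the first `window` observations undefined.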