Quantitative Inference Portfolio

State Estimation Under Nonlinearity and Dimensional Scaling

A controlled benchmark of Kalman, particle, and deterministic flow filters evaluated on calibration, stability, and computational cost.

Figure: Particle cloud evolution in range-bearing localization (nonlinear tracking setting).

Key Insights

Calibration is a first-class metric

The normalized estimation error squared (NEES) exposed overconfidence in filters that appeared acceptable on point error alone.

Numerical stabilization has boundaries

Joseph-form covariance updates improved robustness but did not eliminate nonlinear divergence mechanisms.

Deterministic flow proposals shift trade-offs

Flow-based proposals improved robustness in hard regimes, at a runtime of roughly 5 to 15 s against 1 to 2 s for the Kalman-type and bootstrap particle filters in the benchmark table.

Scaling governs deployability

High-dimensional behavior determined practical feasibility more than single-scenario leaderboard performance.

Benchmark Results

The table combines accuracy (RMSE), calibration (NEES), and system cost (runtime and memory). RMSE alone is not sufficient for decision-critical inference.

Method               RMSE   NEES (target ~ 1)  Runtime (s)  Memory (MB)  Verdict
EKF                  1.660  40.034             0.990        450.5        Fast but severely miscalibrated in nonlinear settings.
UKF                  1.316  6.398              1.032        452.8        Better than EKF, still overconfident under stress.
Bootstrap PF         0.906  1.253              1.912        494.0        Best calibrated among NEES-logged methods with strong accuracy.
PF-PF (LEDH)         2.492  n/a                10.857       n/a          Stable proposals, high runtime cost in current configuration.
LEDH                 3.543  n/a                5.197        n/a          Deterministic flow behavior with moderate compute overhead.
Kernel-PFF (matrix)  3.322  n/a                14.642       n/a          High-dimensional robustness signal, but expensive in runtime.

Interpretation: Bootstrap PF gave the strongest combined signal on accuracy and calibration where NEES was logged, while deterministic flows showed useful stress robustness at roughly 3 to 8 times the bootstrap PF runtime. For downstream decision systems, calibration quality is as important as error magnitude.

NEES Emphasis

NEES = (x - x̂)^T P^{-1} (x - x̂)

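Here x is the true state, x̂ the filter estimate, and P the covariance the filter reports. A filter with low RMSE but inflated NEES reports a covariance smaller than its actual error: it is statistically overconfident and unsafe for downstream decision-making.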
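The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the benchmark repository: the function name `nees` and the toy numbers are assumptions. The raw quadratic form has expected value equal to the state dimension for a consistent filter, so the sketch divides by the dimension on the assumption that the table's target of ~1 refers to a dimension-normalized NEES.

```python
import numpy as np

def nees(x_true, x_est, P, normalize=True):
    """NEES for a single time step, optionally divided by the state dimension."""
    err = np.asarray(x_true, dtype=float) - np.asarray(x_est, dtype=float)
    # Solve P y = err rather than forming the inverse of P explicitly.
    value = float(err @ np.linalg.solve(P, err))
    return value / err.size if normalize else value

# Toy example (illustrative numbers): same point error, two reported covariances.
x_true = np.array([1.0, 2.0])
x_est = np.array([1.5, 1.0])
honest = nees(x_true, x_est, np.diag([0.25, 1.0]))         # covariance matches the error scale
overconfident = nees(x_true, x_est, np.diag([0.01, 0.04]))  # covariance far too small
```

Both calls share the same point error, so RMSE cannot distinguish them; only the second yields a NEES far above 1, which is the overconfidence pattern described above.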

Figure: NEES comparison and calibration diagnostics in nonlinear range-bearing localization.

Scaling Study

As dimensionality increases, method choice is governed by an accuracy-efficiency frontier, not a single scalar metric.

Figure: Scaling trade-off from the runtime-memory dimensional scaling study.

Computational Awareness

Method               Time Complexity (approx.)  Memory Complexity (approx.)
KF / EKF / UKF       O(d^3)                     O(d^2)
Bootstrap PF         O(N d)                     O(N d)
PF-PF / LEDH         O(N d^2)                   O(N d)
Kernel-PFF (matrix)  O(N^2 d)                   O(N^2)

Here d is the state dimension and N is the number of particles.

Runtime and memory were explicitly profiled per method in the benchmark pipeline.
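One way such per-method profiling could be set up with the Python standard library is sketched below. The helper `profile` and the stand-in workload `toy_filter` are hypothetical names for illustration; the repository's actual profiling code may differ. Note that `tracemalloc` only tracks allocations made through Python's allocator, so it would not reproduce process-level memory figures.

```python
import time
import tracemalloc

def profile(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds, peak_traced_bytes)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if fn raises.
        tracemalloc.stop()
    return result, elapsed, peak

def toy_filter(n_steps):
    # Stand-in workload for illustration only, not a real filter.
    return sum(i * i for i in range(n_steps))

result, seconds, peak_bytes = profile(toy_filter, 10_000)
```

Wrapping every method in the same helper keeps the timing and memory columns comparable across filters.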

Experimental Design

Design Philosophy

This benchmark isolates three axes:

  1. Linearity vs nonlinearity
  2. Gaussian vs non-Gaussian uncertainty
  3. Dimensional scaling behavior

Each experiment evaluates:

  • Point accuracy
  • Covariance calibration
  • Numerical stability
  • Computational complexity

Progressive Stress Framework

Act 1: Linear-Gaussian Baseline

KF stability and covariance update behavior under controlled assumptions.

Figure: KF baseline stability.
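For reference, the Joseph-form covariance update mentioned in the Key Insights can be sketched as follows. This is a minimal NumPy version of the standard linear Kalman measurement update; the function name `kf_update_joseph` and the toy numbers are assumptions, not the repository's implementation.

```python
import numpy as np

def kf_update_joseph(x, P, z, H, R):
    """Kalman measurement update using the Joseph-form covariance."""
    S = H @ P @ H.T + R                    # innovation covariance
    K = np.linalg.solve(S, H @ P).T        # gain K = P H^T S^{-1} (P, S symmetric)
    x_new = x + K @ (z - H @ x)
    I_KH = np.eye(P.shape[0]) - K @ H
    # Joseph form: a sum of two symmetric PSD terms, valid for any gain K.
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T
    return x_new, P_new

# Toy example (illustrative numbers): observe the first state component.
x0 = np.array([0.0, 0.0])
P0 = np.diag([4.0, 1.0])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
z = np.array([2.0])
x_new, P_new = kf_update_joseph(x0, P0, z, H, R)
```

With the optimal gain this agrees with the shorter form (I - KH)P in exact arithmetic; the Joseph form costs more multiplications but preserves symmetry and positive semidefiniteness under round-off. It does not address linearization error, which is why it cannot prevent the nonlinear divergence discussed under Failure Modes.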

Act 2: Nonlinear Stress

Approximate Gaussian filters tested under range-bearing nonlinearity and calibration pressure.

Figure: NEES behavior under nonlinear stress.

Act 3: Deterministic Flow Proposals

PF-PF and flow-based methods evaluated under stronger degeneracy and transport constraints.

Figure: Deterministic flow proposal performance.
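To make the idea of a deterministic flow concrete, the sketch below integrates the exact Daum-Huang (EDH) flow with forward Euler steps. It assumes a linear-Gaussian measurement z = Hx + v with v ~ N(0, R); for range-bearing measurements H would be a Jacobian (evaluated at the mean for EDH, or per particle for LEDH), which this sketch does not cover. The function name `edh_flow`, the step count, and the toy data are illustrative assumptions rather than the benchmark's configuration.

```python
import numpy as np

def edh_flow(particles, x_bar, P, z, H, R, n_steps=200):
    """Migrate prior particles along the EDH flow using forward Euler steps."""
    x = np.array(particles, dtype=float)       # shape (N, d), copied
    eye = np.eye(P.shape[0])
    PHt = P @ H.T
    gain_z = PHt @ np.linalg.solve(R, z)       # P H^T R^{-1} z
    dlam = 1.0 / n_steps
    for k in range(n_steps):
        lam = k * dlam                         # pseudo-time in [0, 1)
        A = -0.5 * PHt @ np.linalg.solve(lam * (H @ PHt) + R, H)
        b = (eye + 2.0 * lam * A) @ ((eye + lam * A) @ gain_z + A @ x_bar)
        x = x + dlam * (x @ A.T + b)           # dx/dlam = A x + b for each particle
    return x

rng = np.random.default_rng(0)
prior = rng.normal(size=(500, 2))              # toy prior particle cloud
x_bar = prior.mean(axis=0)
P = np.cov(prior, rowvar=False)
H = np.array([[1.0, 0.0]])                     # observe the first coordinate only
R = np.array([[1.0]])
z = np.array([1.0])

posterior = edh_flow(prior, x_bar, P, z, H, R)
# Reference: Kalman posterior mean for the same prior moments.
kalman_mean = x_bar + P @ H.T @ np.linalg.solve(H @ P @ H.T + R, z - H @ x_bar)
```

In this linear case the flowed particle mean should land close to the Kalman posterior mean, up to Euler discretization error. The loop over pseudo-time steps, each with a linear solve, also indicates where the runtime overhead of flow-based methods comes from.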

Failure Modes

EKF divergence under strong nonlinearity

First-order linearization produced unstable covariance behavior and inflated NEES when measurement geometry became strongly nonlinear.
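The mechanism can be illustrated with a range-bearing measurement model and its Jacobian. The sketch assumes a 2D position-only state and a single known landmark, which may be simpler than the benchmark's actual state vector; all function names and numbers are hypothetical.

```python
import numpy as np

def h_range_bearing(x, landmark):
    """Range and bearing from position x[:2] to a landmark."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx)])

def jacobian_range_bearing(x, landmark):
    """Jacobian of h_range_bearing with respect to the position."""
    dx, dy = landmark[0] - x[0], landmark[1] - x[1]
    q = dx * dx + dy * dy
    r = np.sqrt(q)
    return np.array([[-dx / r, -dy / r],
                     [dy / q, -dx / q]])

def linearization_error(x, delta, landmark):
    """Norm of h(x + delta) minus its first-order prediction h(x) + J delta."""
    # Bearing wrap-around is ignored; the toy geometry stays away from +/- pi.
    predicted = h_range_bearing(x, landmark) + jacobian_range_bearing(x, landmark) @ delta
    return float(np.linalg.norm(h_range_bearing(x + delta, landmark) - predicted))

x = np.array([0.0, 0.0])
delta = np.array([0.5, 0.5])                   # same state perturbation in both cases
err_far = linearization_error(x, delta, np.array([30.0, 40.0]))
err_near = linearization_error(x, delta, np.array([0.6, 0.8]))
```

For the same state perturbation, the first-order prediction is much less accurate when the landmark is close, because the bearing changes rapidly with position there. An EKF that trusts this linearization will report a covariance that does not account for the neglected curvature.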

Particle degeneracy when N is insufficient

Weight collapse reduced effective sample size and increased variance in posterior approximation under sparse or hard observations.
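Effective sample size (ESS) and a resampling step can be sketched as below. The ESS formula 1 / sum(w_i^2) for normalized weights is standard; systematic resampling is one common scheme, used here as an assumption since the text does not say which scheme the benchmark uses. The function names and toy weights are hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def systematic_resample(weights, rng):
    """Return resampled particle indices using one shared uniform offset."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    positions = (rng.random() + np.arange(n)) / n   # evenly spaced points in [0, 1)
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0                            # guard against round-off
    return np.searchsorted(cumulative, positions)

rng = np.random.default_rng(0)
ess_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])
collapsed = [0.97, 0.01, 0.01, 0.01]                # one particle carries almost all weight
ess_collapsed = effective_sample_size(collapsed)
idx = systematic_resample(collapsed, rng)
```

A common heuristic is to resample only when ESS falls below a fraction of N, such as N/2. Resampling restores equal weights but duplicates the few surviving particles, so it does not replace a sufficiently large N or a better proposal.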

Runtime overhead in deterministic flows

EDH/LEDH and kernelized transport improved certain stress cases but incurred substantial runtime overhead relative to KF/PF baselines.

Overconfidence detected by NEES

Calibration failure occurred even when point-error metrics appeared acceptable, reinforcing NEES as a non-optional diagnostic.

Reproducibility

No new experiments were added in this portfolio redesign. The page uses existing outputs from the benchmark repository.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bash scripts/run_part1.sh

Source repository: github.com/meamresh/MLCOE_Q2_PF

Relevance to Quantitative Research

State-space reasoning

Filtering maps naturally to latent-state inference, where hidden factors evolve sequentially under uncertainty.

Real-time inference discipline

The benchmark emphasizes online updates, calibration quality, and failure awareness rather than static offline fit.

Calibration as control signal

NEES provides a direct signal for uncertainty reliability, critical when downstream policies consume posterior covariance.

High-dimensional systems perspective

Scaling diagnostics connect algorithmic complexity to practical deployment constraints in large latent-state systems.