Quantitative Inference Portfolio
A controlled benchmark of Kalman, particle, and deterministic flow filters evaluated on calibration, stability, and computational cost.
- NEES exposed overconfidence in filters that appeared acceptable on point error alone.
- Joseph-form covariance updates improved robustness but did not eliminate nonlinear divergence mechanisms.
- Flow-based proposals improved robustness in hard regimes with measurable runtime overhead.
- High-dimensional behavior determined practical feasibility more than single-scenario leaderboard performance.
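The Joseph-form update mentioned above can be sketched as follows. This is an illustrative NumPy snippet, not the benchmark's own code; the matrices are a toy two-state, one-measurement example:

```python
import numpy as np

def joseph_update(P, K, H, R):
    """Joseph-form covariance update: stays symmetric and positive
    semi-definite even when the gain K is slightly suboptimal."""
    I = np.eye(P.shape[0])
    A = I - K @ H
    return A @ P @ A.T + K @ R @ K.T

# Toy 2-state / 1-measurement example.
P = np.diag([2.0, 1.0])          # prior covariance
H = np.array([[1.0, 0.0]])       # measurement model
R = np.array([[0.5]])            # measurement noise
S = H @ P @ H.T + R              # innovation covariance
K = P @ H.T @ np.linalg.inv(S)   # Kalman gain

P_post = joseph_update(P, K, H, R)
assert np.allclose(P_post, P_post.T)  # symmetric by construction
```

The naive `(I - K H) P` form is cheaper but can drift away from symmetry under rounding, which is the robustness gap the finding refers to.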
The table below combines accuracy, calibration, and systems cost; RMSE alone is not sufficient for decision-critical inference.
| Method | RMSE | NEES (target ~ 1) | Runtime (s) | Memory (MB) | Verdict |
|---|---|---|---|---|---|
| EKF | 1.660 | 40.034 | 0.990 | 450.5 | Fast but severely miscalibrated in nonlinear settings. |
| UKF | 1.316 | 6.398 | 1.032 | 452.8 | Better than EKF, still overconfident under stress. |
| Bootstrap PF | 0.906 | 1.253 | 1.912 | 494.0 | Best calibrated among NEES-logged methods with strong accuracy. |
| PF-PF (LEDH) | 2.492 | n/a | 10.857 | n/a | Stable proposals, high runtime cost in current configuration. |
| LEDH | 3.543 | n/a | 5.197 | n/a | Deterministic flow behavior with moderate compute overhead. |
| Kernel-PFF (matrix) | 3.322 | n/a | 14.642 | n/a | High-dimensional robustness signal, but expensive in runtime. |
Interpretation: Bootstrap PF gave the strongest combined signal on accuracy and calibration where NEES was logged, while deterministic flows showed useful stress robustness at significantly higher compute cost. For downstream decision systems, calibration quality is as important as error magnitude.
NEES = (x − x̂)ᵀ P⁻¹ (x − x̂)
A filter with low RMSE but inflated NEES is statistically overconfident and unsafe for downstream decision-making.
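A minimal sketch of the NEES check, assuming NumPy/SciPy (not taken from the benchmark code). Dividing by the state dimension matches the table's target of ~1, and a chi-square interval on the time-averaged value flags miscalibration:

```python
import numpy as np
from scipy.stats import chi2

def nees(x_true, x_est, P):
    """Normalized estimation error squared for one time step."""
    e = x_true - x_est
    return float(e @ np.linalg.solve(P, e))

# A calibrated filter: errors actually drawn from N(0, P).
rng = np.random.default_rng(0)
d, T = 2, 5000
P = np.diag([1.0, 4.0])
errs = rng.multivariate_normal(np.zeros(d), P, size=T)
avg = np.mean([nees(e, np.zeros(d), P) for e in errs]) / d  # normalize by dim

# 95% chi-square bounds on the time-averaged normalized NEES.
lo, hi = chi2.ppf([0.025, 0.975], df=T * d) / (T * d)
print(avg, lo, hi)
```

An overconfident filter reports a `P` that is too small, pushing the average well above the upper bound even when RMSE looks fine.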
As dimensionality increases, method choice is governed by an accuracy-efficiency frontier, not a single scalar metric.
| Method | Time Complexity (approx.) | Memory Complexity (approx.) |
|---|---|---|
| KF / EKF / UKF | O(d^3) | O(d^2) |
| Bootstrap PF | O(Nd) | O(Nd) |
| PF-PF / LEDH | O(Nd^2) | O(Nd) |
| Kernel-PFF (matrix) | O(N^2d) | O(N^2) |
Runtime and memory were explicitly profiled per method in the benchmark pipeline.
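The cubic term in the Gaussian-filter row can be observed directly with a timing sketch like the one below (illustrative only; the benchmark's own profiling harness is in the repository). The dense d×d products and inverse dominate the update:

```python
import time
import numpy as np

def kf_update_cost(d, reps=3):
    """Average wall-clock time of one dense Kalman-style covariance
    update at state dimension d; dominated by ~O(d^3) matrix ops."""
    rng = np.random.default_rng(0)
    P = np.eye(d)
    H = rng.standard_normal((d, d))
    R = np.eye(d)
    t0 = time.perf_counter()
    for _ in range(reps):
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        P_post = (np.eye(d) - K @ H) @ P  # discarded; timing only
    return (time.perf_counter() - t0) / reps

for d in (50, 100, 200):
    print(d, f"{kf_update_cost(d):.4f}s")
```

Doubling d should roughly multiply the per-update cost by 8 once BLAS overheads are amortized, which is what makes particle-based O(Nd) methods competitive at high dimension.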
This benchmark isolates three axes; each experiment evaluates one of:
- KF stability and covariance-update behavior under controlled assumptions.
- Approximate Gaussian filters (EKF, UKF) tested under range-bearing nonlinearity and calibration pressure.
- PF-PF and flow-based methods evaluated under stronger degeneracy and transport constraints.
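For context on the particle-based axis, one predict-weight-resample cycle of a bootstrap PF can be sketched on a toy 1D linear-Gaussian model (not the benchmark's range-bearing setup; all names and parameters here are illustrative):

```python
import numpy as np

def bootstrap_pf_step(particles, y, rng, q=0.1, r=0.5):
    """One bootstrap PF cycle for x_t = x_{t-1} + N(0, q),
    y_t = x_t + N(0, r)."""
    # Predict: propagate particles through the transition prior.
    particles = particles + rng.normal(0.0, np.sqrt(q), particles.shape)
    # Weight: Gaussian measurement log-likelihood, stabilized.
    logw = -0.5 * (y - particles) ** 2 / r
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Resample: multinomial draw back to equal weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

rng = np.random.default_rng(1)
parts = rng.normal(0.0, 1.0, 500)
for y in (0.2, 0.3, 0.25):
    parts = bootstrap_pf_step(parts, y, rng)
print(parts.mean())  # posterior mean pulled toward the observations
```

Because the proposal is just the transition prior, sparse or sharp observations concentrate the weights, which is exactly the degeneracy pressure the third axis stresses.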
Observed failure modes:
- First-order linearization produced unstable covariance behavior and inflated NEES when measurement geometry became strongly nonlinear.
- Weight collapse reduced effective sample size and increased variance in posterior approximation under sparse or hard observations.
- EDH/LEDH and kernelized transport improved certain stress cases but incurred substantial runtime overhead relative to KF/PF baselines.
- Calibration failure occurred even when point-error metrics appeared acceptable, reinforcing NEES as a non-optional diagnostic.
No new experiments were added in this portfolio redesign. The page uses existing outputs from the benchmark repository.
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
bash scripts/run_part1.sh
```
Source repository: github.com/meamresh/MLCOE_Q2_PF
Filtering maps naturally to latent-state inference, where hidden factors evolve sequentially under uncertainty.
The benchmark emphasizes online updates, calibration quality, and failure awareness rather than static offline fit.
NEES provides a direct signal for uncertainty reliability, critical when downstream policies consume posterior covariance.
Scaling diagnostics connect algorithmic complexity to practical deployment constraints in large latent-state systems.