Dispensight Research
Three Bayesian Engines · Convergence Study · Testville #002

Different paths, same destination.

Three distinct Bayesian inference engines (Gibbs-based MCMC, Metropolis-Hastings, and an analytical conjugate solver) process the same retail data through fundamentally different computational philosophies. Despite varying acceptance rates, uncertainty estimates, and coefficient magnitudes, they converge on remarkably similar forecasts. This study explores why.
MCMC (Gibbs)
Full Gibbs sampling from conditional posteriors. Every draw is accepted by construction, giving efficient exploration of parameter space.
Acceptance Rate: ~100%
Noise Variance (σ²): 0.0847
Intercept: 7.27 ± 0.15
Philosophy: Conditional sampling
Metropolis-Hastings
Proposal-based MCMC with an accept/reject mechanism. The very low acceptance rate points to wide proposals that mostly land outside high-probability regions and are rejected.
Acceptance Rate: 1.6%
Noise Variance (σ²): 0.1899
Intercept: 2.86 ± 0.92
Philosophy: Accept/reject proposals
Conjugate
Analytical posterior via conjugate priors. No sampling required: a closed-form solution provides deterministic inference.
Acceptance Rate: N/A (analytical)
Noise Variance (σ²): 0.0198
Intercept: 9.38 ± 0.08
Philosophy: Closed-form

The Paradox of Agreement

🎯 Core Question
How can three engines with dramatically different coefficient estimates, noise levels, and computational approaches produce nearly identical forecasts?

Coefficient Comparison: log1p_tx (Transactions)

Engine    | Coefficient | Std Dev | z-score
MCMC      | +0.705      | 0.070   | 10.08
MH        | +0.352      | 0.111   | 3.16
Conjugate | +0.911      | 0.034   | 26.69

Lagged Sales Coefficient

Engine    | Coefficient | Std Dev | Role
MCMC      | +0.246      | 0.015   | Moderate
MH        | +0.702      | 0.095   | Dominant
Conjugate | +0.029      | 0.008   | Minimal

Why They Diverge

MCMC (Gibbs)

Samples from conditional posteriors sequentially: each parameter is updated given the current values of the others. Every draw is accepted, so exploration is efficient and posterior intervals come out tight. The low noise variance (0.085) means this engine attributes less of the data to randomness and more to signal.
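
As a concrete illustration, here is a minimal Gibbs sampler for Bayesian linear regression. It is a sketch, not Dispensight's implementation: the Normal prior on the coefficients, the Inverse-Gamma prior on σ², and the hyperparameters (tau2, a0, b0) are illustrative assumptions.

import numpy as np

def gibbs_linreg(X, y, n_iter=2000, tau2=10.0, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y = X @ beta + N(0, sigma2) noise (illustrative)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta, sigma2 = np.zeros(p), 1.0
    beta_draws, sigma2_draws = [], []
    for _ in range(n_iter):
        # Conditional posterior of beta | sigma2, y is Gaussian.
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / tau2)
        beta = rng.multivariate_normal(V @ (Xty / sigma2), V)
        # Conditional posterior of sigma2 | beta, y is Inverse-Gamma.
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + resid @ resid / 2))
        beta_draws.append(beta)
        sigma2_draws.append(sigma2)
    return np.array(beta_draws), np.array(sigma2_draws)

Every iteration yields a usable draw (the "~100%" acceptance above), which is why Gibbs chains typically mix quickly on well-behaved regression posteriors.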

Metropolis-Hastings

Proposes jumps and accepts or rejects them based on the ratio of posterior densities. A 1.6% acceptance rate points to an overly wide proposal distribution: most proposed jumps land in low-probability regions and are rejected, far below the ~23% acceptance usually targeted for random-walk MH. The high noise variance (0.190) compensates by attributing more variation to stochastic effects, and the lagged-sales coefficient dominates (0.702) to stabilize predictions.
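
A random-walk Metropolis-Hastings sketch for the same regression makes the acceptance-rate behavior concrete. The fixed sigma2, flat prior, and step size here are assumptions for illustration; shrinking step raises the acceptance rate at the cost of slower exploration.

import numpy as np

def mh_linreg(X, y, sigma2=0.19, n_iter=20_000, step=0.5, seed=0):
    """Random-walk MH over beta with a flat prior (illustrative)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]

    def log_post(beta):
        # Gaussian log-likelihood up to a constant; a flat prior adds nothing.
        resid = y - X @ beta
        return -0.5 * resid @ resid / sigma2

    beta, lp = np.zeros(p), log_post(np.zeros(p))
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = beta + step * rng.standard_normal(p)  # symmetric jump
        lp_new = log_post(proposal)
        if np.log(rng.uniform()) < lp_new - lp:          # accept/reject step
            beta, lp, accepted = proposal, lp_new, accepted + 1
        draws.append(beta)
    return np.array(draws), accepted / n_iter            # chain + acceptance rate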

Conjugate (Analytical)

No sampling. The posterior is derived analytically as likelihood × prior, yielding the lowest noise variance (σ² = 0.020) and extremely tight intervals (intercept ± 0.08). Heavily weights transactions (0.911) with minimal lag dependence. Inference is deterministic; the only randomness is in the data.
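
For the Gaussian-linear case the conjugate posterior has a textbook closed form. The sketch below assumes a N(0, tau2·I) prior on the coefficients and a known noise variance; a full Normal-Inverse-Gamma treatment would also update σ² analytically.

import numpy as np

def conjugate_linreg(X, y, sigma2=0.02, tau2=10.0):
    """Closed-form Gaussian posterior over beta (illustrative prior scales)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2  # posterior precision
    cov = np.linalg.inv(precision)                   # posterior covariance
    mean = cov @ (X.T @ y) / sigma2                  # posterior mean
    return mean, cov  # deterministic: same data in, same posterior out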

Why They Converge

🔄 Compensation Mechanisms
Each engine balances signal and noise differently: a high transaction coefficient pairs with a low lagged-sales coefficient, and a low transaction coefficient with a high one. The linear combination of features produces equivalent predicted trajectories despite wildly different parameter estimates (a toy demonstration follows the list below).
Key Convergence Factors:
  • All three detect same underlying patterns in training data
  • Transformation (log1p, z-score) creates bounded feature space
  • Forecasts average out parameter uncertainty—median predictions stable
  • Anomaly detection flags same indices [892-896] across all engines
  • Strong signal-to-noise ratio makes model class less critical
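
The compensation effect is easy to reproduce with synthetic data. In the sketch below (all numbers illustrative, not the study's data), two correlated features stand in for transactions and lagged sales, and two very different coefficient pairs produce nearly identical trajectories.

import numpy as np

rng = np.random.default_rng(1)
n = 500
tx = rng.standard_normal(n)                    # stand-in for log1p_tx (z-scored)
lag = 0.9 * tx + 0.1 * rng.standard_normal(n)  # lagged sales, highly correlated

pred_conjugate_like = 0.91 * tx + 0.03 * lag   # heavy on transactions
pred_mh_like = 0.35 * tx + 0.65 * lag          # heavy on lagged sales

corr = np.corrcoef(pred_conjugate_like, pred_mh_like)[0, 1]
print(f"prediction correlation: {corr:.3f}")   # close to 1: near-identical paths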
Empirical Result

Despite a roughly 10x spread in noise variance (0.020 to 0.190) and a roughly 24x spread in lagged-sales coefficients (0.029 to 0.702), 48-step-ahead forecasts align within 2% median error.

The Fisher Information Paradox

Coefficient     | MCMC Fisher | MH Fisher | Conjugate Fisher | Interpretation
total_sales_lag | 1,023,816   | 456,943   | 4,390,515        | Extremely high precision; all agree on importance
Intercept       | 10,798      | 4,819     | 46,307           | Conjugate most confident in baseline
log1p_tx        | 10,798      | 4,819     | 46,307           | Uniform across features (standardized design)

Key Insight: Within each engine, the Fisher diagonal is identical for the intercept and log1p_tx (a consequence of the standardized design) and roughly 95x larger for the lagged term, which is where the models store "memory." Across engines, the diagonals differ by exactly the ratio of the noise-variance estimates, as theory predicts: for Gaussian regression the Fisher information is X'X/σ², and all three engines share the same design matrix (cond ≈ 46,007 for all). The engines then compensate by adjusting other coefficients to maintain forecast stability.
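
This relationship can be checked directly from the reported numbers: multiplying each engine's log1p_tx Fisher value by its σ² estimate should recover the same design-matrix diagonal entry, since Fisher = X'X/σ².

# Reported per-engine noise variances and log1p_tx Fisher diagonals.
sigma2 = {"MCMC": 0.0847, "MH": 0.1899, "Conjugate": 0.0198}
fisher_tx = {"MCMC": 10_798, "MH": 4_819, "Conjugate": 46_307}

for name, s2 in sigma2.items():
    # Fisher = (X'X)/sigma2, so Fisher * sigma2 recovers the shared X'X entry.
    print(f"{name}: {fisher_tx[name] * s2:.0f}")  # ~915 for all three engines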

Multi-Target Effects: Where They Disagree

Transaction Effect (log1p_tx)

MCMC total_sales: +0.70
MH total_sales: +0.35
Conjugate total_sales: +0.91

Conjugate believes transactions are the primary driver. MH hedges with lag dependence. MCMC balances both.

Team Performance Effect

MCMC total_sales: -0.08
MH total_sales: -0.35
Conjugate total_sales: +0.03

Major disagreement. MCMC and MH see a negative impact; Conjugate sees a slightly positive one. Yet forecasts still converge, because this feature has little overall influence.

Practical Implications

Model Selection

Use Conjugate for fast inference with tight bounds. Use MCMC (Gibbs) for interpretable coefficients. Use MH when you need conservative uncertainty estimates.

Ensemble Strategy

Since forecasts converge despite parameter differences, an equally-weighted ensemble provides robustness without over-complicating. The median of three engines is more stable than any single model.
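
A minimal sketch of the median ensemble, assuming each engine exposes its 48-step forecast as a NumPy array (the variable names are illustrative):

import numpy as np

def median_ensemble(forecasts):
    """Elementwise median across engines; robust to one outlier engine."""
    return np.median(np.vstack(forecasts), axis=0)  # shape: (horizon,)

# Usage with the three hypothetical engine outputs:
# combined = median_ensemble([gibbs_forecast, mh_forecast, conjugate_forecast])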

Diagnostic Value

When coefficients differ but forecasts align, it suggests model misspecification matters little for prediction, only for causal interpretation. If forecasts diverge, your data needs more structure or a longer history.