Why Layered Optimization?
Traditional portfolio optimization typically treats the problem as a single-stage mathematical program: given a universe of assets and some return/covariance estimates, find the weight vector that maximizes a risk-adjusted objective (Sharpe, Sortino, minimum variance, etc.). This works cleanly in theory but struggles in practice because real markets exhibit regime shifts, non-stationary correlations, and survivorship effects that a flat optimization cannot anticipate.
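For reference, here is the flat single-stage problem in its maximum-Sharpe form. It assumes a long-only, fully invested portfolio with estimated mean vector $\hat{\mu}$, estimated covariance $\hat{\Sigma}$, and risk-free rate $r_f$. The long-only and budget constraints are illustrative choices, not ones stated in this write-up:

```latex
\max_{w \in \mathbb{R}^n} \; \frac{w^\top \hat{\mu} - r_f}{\sqrt{w^\top \hat{\Sigma}\, w}}
\qquad \text{s.t.} \qquad \mathbf{1}^\top w = 1, \quad w \ge 0
```

Both $\hat{\mu}$ and $\hat{\Sigma}$ are estimated once from history. Regime shifts and non-stationary correlations invalidate exactly these inputs.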
The key insight is that portfolio construction naturally decomposes into distinct decisions — which assets to consider, what market regime we're in, how to size positions, and what risk constraints to enforce. Each of these can be treated as a separate optimization layer with its own objective, state representation, and feedback signal. The layers are stacked: outputs from one layer become inputs or constraints for the next.
Layer 1 — Pilot Screening
Filters the investable universe using fundamental and technical criteria. Produces a screened candidate set per rebalance period.
Layer 2 — Attractiveness Scoring
Assigns a composite score to each candidate using multi-indicator fusion: momentum, mean-reversion, volatility-normalized signals.
Layer 3 — Regime Detection
Classifies the current market regime (trending, mean-reverting, high-vol, crisis) to modulate risk budgets and strategy weights.
Layer 4 — Allocation
Converts scores and regime into position weights. Supports rule-based sizing, mean-variance optimization, and RL policy approaches.
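The stacking can be illustrated with a minimal sketch in which each layer is a plain function whose output feeds the next one. All function names, thresholds, the equal-weight signal fusion, and the halving of exposure in a high-vol regime are hypothetical choices for illustration. They are not the project's actual criteria.

```python
import numpy as np

# Hypothetical four-layer pipeline: each function's output feeds the next layer.
# prices: array of shape (T, n_assets), rows ordered oldest -> newest.

def screen(prices, lookback=60, min_return=0.0):
    """Layer 1 (illustrative criterion): keep assets whose trailing return exceeds min_return."""
    trailing = prices[-1] / prices[-lookback - 1] - 1.0
    return np.flatnonzero(trailing > min_return)

def _zscore(x):
    """Cross-sectional z-score; all zeros if the cross-section has no dispersion."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def score(prices, candidates, lookback=60):
    """Layer 2: equal-weight fusion of three z-scored signals (weights are an assumption)."""
    logp = np.log(prices[:, candidates])
    rets = np.diff(logp, axis=0)[-lookback:]
    momentum = rets.sum(axis=0)
    mean_reversion = logp[-lookback:].mean(axis=0) - logp[-1]  # positive when below trailing mean
    vol_normalized = momentum / (rets.std(axis=0) + 1e-12)
    return (_zscore(momentum) + _zscore(mean_reversion) + _zscore(vol_normalized)) / 3.0

def detect_regime(prices, lookback=60, high_vol=0.02):
    """Layer 3 (toy rule): 'high-vol' if the equal-weight basket's daily vol exceeds a threshold."""
    basket = np.diff(np.log(prices), axis=0)[-lookback:].mean(axis=1)
    return "high-vol" if basket.std() > high_vol else "normal"

def allocate(candidates, scores, regime, top_n=5):
    """Layer 4: equal-weight top-N, with total exposure halved in a high-vol regime."""
    budget = 0.5 if regime == "high-vol" else 1.0
    chosen = candidates[np.argsort(scores)[::-1][:top_n]]
    return {int(a): budget / len(chosen) for a in chosen}

def run_pipeline(prices):
    candidates = screen(prices)
    if candidates.size == 0:
        return {}  # nothing survived screening: stay in cash
    scores = score(prices, candidates)
    regime = detect_regime(prices)
    return allocate(candidates, scores, regime)
```

Because each layer only consumes the previous layer's output, any one of them can be swapped out independently. For example, `allocate` could be replaced by an optimizer or an RL policy without touching `screen` or `score`.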
Methodology
Walk-Forward Validation
All backtests use a strict walk-forward (anchored expanding window) framework to avoid look-ahead bias. Each rebalance period trains on all data up to that point, generates a forward allocation, and records out-of-sample returns. No future information leaks into any layer's parameters.
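An anchored expanding-window splitter is small enough to show in full. The sketch below yields index ranges only. The function and parameter names (`anchored_walk_forward`, `first_train_end`, `step`) are hypothetical.

```python
from typing import Iterator, Tuple

def anchored_walk_forward(n_obs: int, first_train_end: int, step: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) index ranges for an anchored, expanding-window walk-forward.

    Training always starts at observation 0 and stops just before the rebalance
    point t; the test window is the next `step` observations. Every test index is
    therefore strictly later than every training index.
    """
    if not 0 < first_train_end <= n_obs or step <= 0:
        raise ValueError("need 0 < first_train_end <= n_obs and step > 0")
    t = first_train_end
    while t < n_obs:
        yield range(0, t), range(t, min(t + step, n_obs))
        t += step  # the training window expands to include the period just tested
```

In a backtest loop, every layer would be fit only on the `train` slice, and the resulting allocation would be scored on the `test` slice. With daily data, `step=21` roughly corresponds to a monthly rebalance.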
Provenance Ledgers
Every experiment run produces a provenance ledger: a structured log of which screening criteria were active, what regime was detected per period, which signals contributed to the attractiveness score, and the resulting allocation vector. This makes it possible to trace any portfolio outcome back to the specific decisions made at each layer — essential for debugging and understanding what drives performance.
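One way to structure such a ledger is one JSON Lines record per rebalance period. The field names and the JSON Lines format below are assumptions about what the ledger could look like, not the project's actual schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List

@dataclass
class LedgerEntry:
    """One provenance record: what every layer decided for one rebalance period."""
    period: str                                        # rebalance date, e.g. "2020-03-31"
    screening_criteria: Dict[str, float]               # Layer 1 thresholds active this period
    candidates: List[str]                              # assets that survived screening
    regime: str                                        # Layer 3 label for this period
    signal_contributions: Dict[str, Dict[str, float]]  # asset -> {signal name: score contribution}
    weights: Dict[str, float]                          # Layer 4 allocation vector
    realized_return: float = float("nan")              # filled in after the out-of-sample period

def to_json_line(entry: LedgerEntry) -> str:
    """Serialize one entry as a single JSON line (sorted keys keep diffs stable)."""
    return json.dumps(asdict(entry), sort_keys=True) + "\n"

def from_json_line(line: str) -> LedgerEntry:
    return LedgerEntry(**json.loads(line))

def append_entry(path: str, entry: LedgerEntry) -> None:
    """Append to a JSON Lines ledger file, one entry per rebalance period."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(entry))
```

Keeping one flat record per period makes tracing straightforward. Filtering the ledger by `regime`, or following one asset's `signal_contributions` over time, answers "why was this position held?" without re-running the backtest.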
Pilot Universe Design
Multiple screened pilot universes are tested in parallel:
- Momentum-screened: assets above a trailing return threshold with liquidity filters
- Low-vol screened: assets with below-median realized volatility
- Multi-factor screened: composite filter combining momentum, quality, and size factors
- Unconstrained: full universe as a baseline
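The screened universes could be sketched as simple index filters over a `(T, n_assets)` price array. Lookback lengths, the dollar-volume floor used as the liquidity filter, and the equal-weight factor composite are all illustrative assumptions. In the multi-factor screen, the factor values themselves are taken as given inputs.

```python
import numpy as np

def momentum_screen(prices, volumes, lookback=126, min_return=0.0, min_dollar_volume=1e6):
    """Trailing return above a threshold AND average dollar volume above a floor (liquidity)."""
    trailing = prices[-1] / prices[-lookback - 1] - 1.0
    dollar_volume = (prices[-lookback:] * volumes[-lookback:]).mean(axis=0)
    return np.flatnonzero((trailing > min_return) & (dollar_volume >= min_dollar_volume))

def low_vol_screen(prices, lookback=63):
    """Realized volatility of log returns strictly below the cross-sectional median."""
    vol = np.diff(np.log(prices[-lookback - 1:]), axis=0).std(axis=0)
    return np.flatnonzero(vol < np.median(vol))

def multi_factor_screen(factor_scores, keep_fraction=0.3):
    """Keep the top fraction by an equal-weight average of z-scored factors.

    factor_scores maps a factor name (e.g. "momentum", "quality", "size") to a
    per-asset array, already signed so that higher is better.
    """
    z = [(v - v.mean()) / (v.std() or 1.0) for v in factor_scores.values()]
    composite = np.mean(z, axis=0)
    k = max(1, int(round(keep_fraction * len(composite))))
    return np.sort(np.argsort(composite)[::-1][:k])

def unconstrained(n_assets):
    """Baseline: the full universe."""
    return np.arange(n_assets)
```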
Regime Detection Approaches
Several regime classifiers are compared:
- Rolling volatility quantiles — simple, interpretable, fast
- Hidden Markov Models (HMM) — probabilistic regime assignment with smoother transitions
- Trend-following filter — moving-average crossovers with adaptive thresholds
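The two non-probabilistic classifiers are sketched below. The HMM variant is omitted because it depends on a third-party library. Several choices here are assumptions: the window lengths, the 70th/90th percentile cut-offs, and the volatility-scaled band used as the "adaptive threshold". The same applies to `combined_regime`, which maps the outputs onto the four labels named for Layer 3. Thresholds are computed only from history available at each step, so the classifiers introduce no look-ahead.

```python
import numpy as np

def rolling_vol(returns, window=21):
    """Trailing realized volatility; entry t uses returns[t-window+1 : t+1] only."""
    vol = np.full(len(returns), np.nan)
    for t in range(window - 1, len(returns)):
        vol[t] = returns[t - window + 1 : t + 1].std()
    return vol

def vol_quantile_regime(returns, window=21, hi_q=0.7, crisis_q=0.9, min_history=63):
    """Label each period by where today's vol sits in its own expanding history."""
    vol = rolling_vol(returns, window)
    labels = ["unknown"] * len(returns)
    for t in range(len(returns)):
        past = vol[: t + 1]
        past = past[~np.isnan(past)]
        if len(past) < min_history:
            continue  # not enough history to estimate quantiles yet
        hi, crisis = np.quantile(past, [hi_q, crisis_q])
        labels[t] = "crisis" if vol[t] > crisis else "high-vol" if vol[t] > hi else "normal"
    return labels

def trend_signal(prices, fast=20, slow=100, k=0.5, vol_window=21):
    """+1 / -1 / 0 from a fast-vs-slow moving average of log prices.

    The band around zero scales with recent volatility (one possible 'adaptive threshold').
    """
    logp = np.log(prices)
    rets = np.diff(logp)
    out = np.zeros(len(prices), dtype=int)
    for t in range(slow - 1, len(prices)):
        gap = logp[t - fast + 1 : t + 1].mean() - logp[t - slow + 1 : t + 1].mean()
        recent = rets[max(0, t - vol_window) : t]  # returns observable at time t
        band = k * recent.std() * np.sqrt(fast) if len(recent) > 1 else 0.0
        out[t] = 1 if gap > band else -1 if gap < -band else 0
    return out

def combined_regime(vol_label, trend):
    """Illustrative mapping onto trending / mean-reverting / high-vol / crisis."""
    if vol_label in ("crisis", "high-vol", "unknown"):
        return vol_label
    return "trending" if trend != 0 else "mean-reverting"
```

Comparing the expanding-quantile rule against a fixed vol threshold shows a design trade-off. The quantile version adapts to each asset's own history but needs `min_history` observations before it emits a label.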
RL Allocation Layer (In Development)
The current rule-based allocation layer (equal-weight top-N, volatility-weighted) serves as a strong baseline. The next phase replaces this with a reinforcement learning policy trained using PufferLib, where the state space includes the attractiveness scores, regime probabilities, and portfolio-level risk metrics. The action space is continuous position sizing, with the reward function defined as risk-adjusted return over a forward window.
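The two rule-based baselines can be sketched as follows. This assumes that "volatility-weighted" means inverse-volatility weighting within the top N. The per-regime budget values in `apply_regime_budget` are made up for illustration and are not the project's settings.

```python
import numpy as np

def equal_weight_top_n(scores, n):
    """Equal weight across the n highest-scoring assets; zero elsewhere."""
    w = np.zeros(len(scores))
    top = np.argsort(scores)[::-1][:n]
    w[top] = 1.0 / len(top)
    return w

def inverse_vol_top_n(scores, vols, n):
    """Among the n highest-scoring assets, weight proportionally to 1 / volatility."""
    w = np.zeros(len(scores))
    top = np.argsort(scores)[::-1][:n]
    inv = 1.0 / np.asarray(vols, dtype=float)[top]
    w[top] = inv / inv.sum()
    return w

def apply_regime_budget(w, regime, budgets=None):
    """Scale gross exposure by a per-regime risk budget; the remainder stays in cash."""
    budgets = budgets or {"trending": 1.0, "mean-reverting": 0.8, "high-vol": 0.5, "crisis": 0.25}
    return np.asarray(w) * budgets.get(regime, 1.0)
```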
This is where the layered approach shines: the RL agent doesn't need to learn screening or regime detection from raw price data, because the lower layers already handle both. Its job is strictly to answer one question: given these signals and this regime, how much should I allocate? This dramatically reduces the state space and should lead to faster, more stable training.
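To make the reduced interface concrete, here is a framework-agnostic sketch of the three pieces an RL environment for this layer would need. These are an observation built from lower-layer outputs, a mapping from a continuous action to weights, and a risk-adjusted reward. It deliberately does not use PufferLib's API. Several details are assumptions: the softmax-with-cash action mapping, the choice of risk metrics, and the unannualized forward Sharpe reward.

```python
import numpy as np

def build_observation(scores, regime_probs, risk_metrics):
    """Flatten lower-layer outputs into one observation vector.

    scores:       attractiveness score per candidate slot (fixed length, zero-padded)
    regime_probs: probability per regime (e.g. trending, mean-reverting, high-vol, crisis)
    risk_metrics: portfolio-level state, e.g. [current drawdown, trailing vol]
    """
    parts = [np.asarray(x, dtype=float) for x in (scores, regime_probs, risk_metrics)]
    return np.concatenate(parts).astype(np.float32)

def action_to_weights(action, max_gross=1.0):
    """Map one logit per asset plus a final cash logit to long-only weights via softmax."""
    z = np.asarray(action, dtype=float)
    z = np.exp(z - z.max())  # subtract the max for numerical stability
    p = z / z.sum()
    return max_gross * p[:-1]  # drop the cash slot; its share is left uninvested

def forward_sharpe_reward(weights, fwd_returns, eps=1e-8):
    """Mean / std of portfolio returns over the forward window (rows = periods, cols = assets)."""
    port = np.asarray(fwd_returns, dtype=float) @ np.asarray(weights, dtype=float)
    return float(port.mean() / (port.std() + eps))
```

The observation length here is fixed by the number of candidate slots, the number of regimes, and the number of risk metrics. It does not depend on the size of the raw price history, which is the state-space reduction described above.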
Current Findings
Preliminary walk-forward results across multiple pilot universes show several consistent patterns:
- Screening matters: screened universes consistently outperform the unconstrained baseline on risk-adjusted metrics, especially during drawdown periods
- Regime-awareness reduces tail risk: dynamic risk budgeting based on detected regime reduces maximum drawdown and volatility-of-volatility compared to static allocations
- Provenance tracking reveals hidden drift: several strategies that looked good on aggregate metrics showed clear performance decay when examined period-by-period through the ledger
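The metrics behind these comparisons could be computed along the following lines. This is a sketch that assumes daily returns, 252 periods per year, and a 21-period window for volatility-of-volatility. It also treats per-window Sharpe ratios as the period-by-period view used to spot decay.

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio (risk-free rate assumed already subtracted)."""
    r = np.asarray(returns, dtype=float)
    return float(np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1))

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))])
    peaks = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peaks))

def vol_of_vol(returns, window=21):
    """Standard deviation of the rolling realized volatility."""
    r = np.asarray(returns, dtype=float)
    rolling = np.lib.stride_tricks.sliding_window_view(r, window).std(axis=1)
    return float(rolling.std())

def per_period_sharpe(period_returns, periods_per_year=252):
    """Sharpe for each walk-forward test window; a downward trend suggests decay."""
    return [sharpe(r, periods_per_year) for r in period_returns]
```

An aggregate Sharpe can look healthy even while `per_period_sharpe` trends steadily downward. This is the kind of drift the ledger surfaces.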
Next steps: Complete PufferLib RL integration for the allocation layer, add cross-asset-class pilot universes, implement transaction cost modeling, and produce a formal paper with full statistical significance testing across all layer combinations.