System Optimization Methods

A research exploration into reframing portfolio construction not as a single prediction problem, but as a stacked optimization architecture — where candidate screening, regime detection, position sizing, and risk controls operate as independent tunable layers with walk-forward provenance tracking.

Why Layered Optimization?

Traditional portfolio optimization typically treats the problem as a single-stage mathematical program: given a universe of assets and some return/covariance estimates, find the weight vector that maximizes a risk-adjusted objective (Sharpe, Sortino, minimum variance, etc.). This works cleanly in theory but struggles in practice because real markets exhibit regime shifts, non-stationary correlations, and survivorship effects that a flat optimization cannot anticipate.
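To make the single-stage formulation concrete, one common instance is mean-variance optimization. The notation here is ours, not the project's: risk aversion λ, estimated mean returns μ̂ and estimated covariance Σ̂.

```latex
w^\star = \arg\max_{w \in \mathbb{R}^n} \; w^\top \hat{\mu} - \frac{\lambda}{2}\, w^\top \hat{\Sigma}\, w
\qquad\Longrightarrow\qquad
\hat{\mu} - \lambda \hat{\Sigma} w^\star = 0
\qquad\Longrightarrow\qquad
w^\star = \frac{1}{\lambda}\, \hat{\Sigma}^{-1} \hat{\mu}
```

With no constraints and Σ̂ positive definite, the optimum is closed form. This also shows the fragility: every weight depends on a single pair of estimates, so a regime shift that moves μ̂ or Σ̂ moves the whole portfolio at once.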

The key insight is that portfolio construction naturally decomposes into distinct decisions — which assets to consider, what market regime we're in, how to size positions, and what risk constraints to enforce. Each of these can be treated as a separate optimization layer with its own objective, state representation, and feedback signal. The layers are stacked: outputs from one layer become inputs or constraints for the next.
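The stacking idea can be sketched as a chain of functions, each reading a shared context and adding its own output for the next layer. This is a minimal illustration under assumed names: `run_layers`, the context keys, and the toy rules inside each layer are hypothetical, not the project's code.

```python
from typing import Callable, Dict, List

# A "context" is the shared state handed from layer to layer (hypothetical design).
Context = Dict[str, object]
Layer = Callable[[Context], Context]


def run_layers(layers: List[Layer], ctx: Context) -> Context:
    """Apply layers in order; each layer's output becomes the next layer's input."""
    for layer in layers:
        ctx = layer(ctx)
    return ctx


def screen(ctx: Context) -> Context:
    # Layer 1 (toy rule): keep assets with a positive trailing return.
    rets = ctx["trailing_returns"]
    return {**ctx, "candidates": [a for a, r in rets.items() if r > 0.0]}


def score(ctx: Context) -> Context:
    # Layer 2 (toy rule): score = trailing return; a real layer would fuse indicators.
    rets = ctx["trailing_returns"]
    return {**ctx, "scores": {a: rets[a] for a in ctx["candidates"]}}


def detect_regime(ctx: Context) -> Context:
    # Layer 3 (toy rule): threshold on market volatility.
    return {**ctx, "regime": "high_vol" if ctx["market_vol"] > 0.25 else "normal"}


def allocate(ctx: Context) -> Context:
    # Layer 4 (toy rule): equal-weight candidates, halving gross exposure in high vol.
    names = ctx["candidates"]
    gross = 0.5 if ctx["regime"] == "high_vol" else 1.0
    return {**ctx, "weights": {a: gross / len(names) for a in names} if names else {}}


result = run_layers(
    [screen, score, detect_regime, allocate],
    {"trailing_returns": {"AAA": 0.12, "BBB": -0.03, "CCC": 0.05}, "market_vol": 0.30},
)
print(result["weights"])  # {'AAA': 0.25, 'CCC': 0.25}
```

Because each layer only touches its own keys, any one of them can be swapped (for example, a different regime classifier) without retuning the others.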

Layer 1 — Pilot Screening

Filters the investable universe using fundamental and technical criteria. Produces a screened candidate set per rebalance period.

Layer 2 — Attractiveness Scoring

Assigns a composite score to each candidate by fusing multiple indicators: momentum, mean-reversion, and volatility-normalized signals.
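One common way to fuse heterogeneous signals is to z-score each one across the candidate set and take a weighted sum, so no signal dominates merely because of its units. The sketch below assumes that approach; the specific inputs (12-month return, 1-month reversal, return per unit of volatility) and the weights are illustrative choices, not the project's.

```python
import statistics
from typing import Dict, Tuple


def zscore(values: Dict[str, float]) -> Dict[str, float]:
    """Cross-sectional z-score across assets; all zeros if the signal is flat."""
    xs = list(values.values())
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    if sd == 0:
        return {k: 0.0 for k in values}
    return {k: (v - mu) / sd for k, v in values.items()}


def composite_score(
    ret_12m: Dict[str, float],  # trailing 12-month return (momentum input)
    ret_1m: Dict[str, float],   # last-month return (short-term reversal input)
    vol: Dict[str, float],      # realized volatility, must be > 0
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # hypothetical fusion weights
) -> Dict[str, float]:
    momentum = zscore(ret_12m)
    reversal = zscore({k: -v for k, v in ret_1m.items()})         # recent losers score higher
    vol_norm = zscore({k: ret_12m[k] / vol[k] for k in ret_12m})  # return per unit of risk
    w_mom, w_rev, w_vol = weights
    return {
        k: w_mom * momentum[k] + w_rev * reversal[k] + w_vol * vol_norm[k]
        for k in ret_12m
    }


scores = composite_score(
    ret_12m={"AAA": 0.20, "BBB": 0.05, "CCC": -0.10},
    ret_1m={"AAA": 0.02, "BBB": -0.04, "CCC": 0.01},
    vol={"AAA": 0.25, "BBB": 0.15, "CCC": 0.30},
)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Since every z-scored signal sums to zero across assets, the composite does too; only the relative ordering carries information into the allocation layer.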

Layer 3 — Regime Detection

Classifies the current market regime (trending, mean-reverting, high-vol, crisis) to modulate risk budgets and strategy weights.

Layer 4 — Allocation

Converts scores and regime into position weights. Supports rule-based sizing, mean-variance optimization, and RL policy approaches.
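The two rule-based sizers can be sketched as follows: pick the top-N candidates by score, size them either equally or inversely to volatility, and scale gross exposure by a per-regime risk budget, leaving the remainder in cash. The `RISK_BUDGET` multipliers and function names are hypothetical placeholders, not values from the project.

```python
from typing import Dict, List

# Hypothetical gross-exposure multipliers per regime; the remainder is held in cash.
RISK_BUDGET = {"trending": 1.0, "mean_reverting": 0.8, "high_vol": 0.5, "crisis": 0.2}


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Names of the n highest-scoring candidates."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def equal_weight_top_n(scores: Dict[str, float], n: int, regime: str) -> Dict[str, float]:
    picks = top_n(scores, n)
    if not picks:
        return {}
    budget = RISK_BUDGET[regime]
    return {a: budget / len(picks) for a in picks}


def inverse_vol_top_n(
    scores: Dict[str, float], vol: Dict[str, float], n: int, regime: str
) -> Dict[str, float]:
    picks = top_n(scores, n)
    if not picks:
        return {}
    inv = {a: 1.0 / vol[a] for a in picks}  # lower volatility -> larger position
    total = sum(inv.values())
    budget = RISK_BUDGET[regime]
    return {a: budget * x / total for a, x in inv.items()}
```

Mean-variance or an RL policy would replace these two functions while keeping the same inputs (scores, regime) and output (a weight dictionary).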

Methodology

Walk-Forward Validation

All backtests use a strict walk-forward (anchored expanding window) framework to avoid look-ahead bias. Each rebalance period trains on all data up to that point, generates a forward allocation, and records out-of-sample returns. No future information leaks into any layer's parameters.
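An anchored expanding window can be expressed as a generator of (train, test) index ranges in which training always starts at the first observation and ends just before the test block. The function name and the toy "fit" below are illustrative assumptions; the point is that each decision uses only indices strictly earlier than the period it is scored on.

```python
from typing import Iterator, Tuple


def anchored_walk_forward(
    n_periods: int, min_train: int, step: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for an anchored expanding window.

    Training is always [0, start) and testing is [start, start + step), so no
    test-period observation is visible when the model is fit.
    """
    start = min_train
    while start < n_periods:
        end = min(start + step, n_periods)
        yield range(0, start), range(start, end)
        start = end


# Usage: a toy rule "fit" on past returns only, scored out of sample.
returns = [0.01, -0.02, 0.015, 0.005, -0.03, 0.02]
oos = []
for train, test in anchored_walk_forward(len(returns), min_train=3):
    mean_train = sum(returns[i] for i in train) / len(train)  # uses past data only
    position = 1.0 if mean_train > 0 else 0.0                  # toy allocation rule
    oos.extend(position * returns[i] for i in test)            # out-of-sample P&L
```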

Provenance Ledgers

Every experiment run produces a provenance ledger: a structured log of which screening criteria were active, what regime was detected per period, which signals contributed to the attractiveness score, and the resulting allocation vector. This makes it possible to trace any portfolio outcome back to the specific decisions made at each layer — essential for debugging and understanding what drives performance.
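A ledger entry per rebalance period might look like the record below, appended as one JSON line so that runs can be diffed and replayed. The field names and the JSON Lines format are our assumptions about a reasonable layout, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class LedgerEntry:
    """One rebalance period's decisions, layer by layer (hypothetical schema)."""
    run_id: str
    period: str                                        # rebalance date, e.g. "2020-03-31"
    screen_criteria: Dict[str, float]                  # active screening thresholds
    candidates: List[str]                              # Layer 1 output
    signal_contributions: Dict[str, Dict[str, float]]  # asset -> signal -> contribution
    regime: str                                        # Layer 3 output
    weights: Dict[str, float]                          # Layer 4 output


def append_entry(path: str, entry: LedgerEntry) -> None:
    """Append one entry as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(entry), sort_keys=True) + "\n")


def load_ledger(path: str) -> List[LedgerEntry]:
    """Read every entry back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [LedgerEntry(**json.loads(line)) for line in fh if line.strip()]
```

Tracing a result then amounts to filtering entries by `period` and reading which candidates, signals, and regime produced the weights.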

Pilot Universe Design

Several pilot universes are tested in parallel, three screened and one unscreened baseline:

Momentum-screened: assets above a trailing return threshold with liquidity filters
Low-vol screened: assets with below-median realized volatility
Multi-factor screened: composite filter combining momentum, quality, and size factors
Unconstrained: full universe as a baseline
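The four pilot universes can be sketched as simple filters over per-asset statistics. The thresholds, the `AssetStats` fields, and the rank-sum rule for the multi-factor screen are hypothetical choices made for illustration.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AssetStats:
    trailing_return: float    # e.g. 12-month total return
    realized_vol: float       # annualized realized volatility
    avg_dollar_volume: float  # liquidity proxy
    quality: float            # hypothetical quality-factor score (higher is better)
    size: float               # hypothetical size-factor score (higher = smaller cap)


Universe = Dict[str, AssetStats]


def momentum_screen(u: Universe, min_ret: float = 0.0, min_adv: float = 1e6) -> List[str]:
    """Trailing return above a threshold, plus a minimum-liquidity filter."""
    return [a for a, s in u.items()
            if s.trailing_return > min_ret and s.avg_dollar_volume >= min_adv]


def low_vol_screen(u: Universe) -> List[str]:
    """Realized volatility strictly below the cross-sectional median."""
    median_vol = statistics.median(s.realized_vol for s in u.values())
    return [a for a, s in u.items() if s.realized_vol < median_vol]


def _ranks(u: Universe, attr: str) -> Dict[str, int]:
    """Rank 0..n-1 by one attribute, higher value -> higher rank."""
    order = sorted(u, key=lambda a: getattr(u[a], attr))
    return {a: i for i, a in enumerate(order)}


def multi_factor_screen(u: Universe, keep_fraction: float = 0.5) -> List[str]:
    """Sum of momentum, quality and size ranks; keep the top fraction."""
    ranks = [_ranks(u, f) for f in ("trailing_return", "quality", "size")]
    composite = {a: sum(r[a] for r in ranks) for a in u}
    k = max(1, round(keep_fraction * len(u)))
    return sorted(composite, key=composite.get, reverse=True)[:k]


def unconstrained(u: Universe) -> List[str]:
    """Full universe as the baseline."""
    return list(u)
```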

Regime Detection Approaches

Several regime classifiers are compared:

Rolling volatility quantiles — simple, interpretable, fast
Hidden Markov Models (HMM) — probabilistic regime assignment with smoother transitions
Trend-following filter — moving-average crossovers with adaptive thresholds
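Two of the three classifiers can be sketched with the standard library alone; the HMM variant is omitted here because it needs a third-party package. In the volatility-quantile sketch, each period is labelled by the percentile rank of its rolling volatility among values observed up to that period, so the classifier never looks ahead. In the trend sketch, the crossover threshold is a band proportional to recent return volatility. Labels, windows, and the band multiplier `k` are all hypothetical.

```python
import math
import statistics
from typing import List


def rolling_vol(returns: List[float], window: int) -> List[float]:
    """Sample std of the trailing `window` returns; NaN until enough history exists."""
    return [
        statistics.stdev(returns[t + 1 - window : t + 1]) if t + 1 >= window else math.nan
        for t in range(len(returns))
    ]


def vol_quantile_regime(
    vols: List[float], t: int, low: float = 1 / 3, high: float = 2 / 3, min_history: int = 20
) -> str:
    """Label period t by the percentile rank of its vol among vols seen up to t."""
    history = [v for v in vols[: t + 1] if not math.isnan(v)]  # past and present only
    if len(history) < min_history:
        return "unknown"                                      # warm-up period
    pct = sum(v <= vols[t] for v in history) / len(history)
    if pct >= high:
        return "high_vol"
    return "low_vol" if pct <= low else "normal"


def trend_regime(prices: List[float], fast: int = 20, slow: int = 60, k: float = 0.5) -> str:
    """Moving-average crossover with a band that widens when recent volatility is high."""
    if len(prices) < slow + 1:
        return "unknown"
    fast_ma = statistics.mean(prices[-fast:])
    slow_ma = statistics.mean(prices[-slow:])
    rets = [prices[i] / prices[i - 1] - 1.0 for i in range(len(prices) - fast, len(prices))]
    band = k * statistics.stdev(rets) * slow_ma  # adaptive threshold, in price units
    if fast_ma - slow_ma > band:
        return "trending_up"
    if fast_ma - slow_ma < -band:
        return "trending_down"
    return "range_bound"
```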

RL Allocation Layer (In Development)

The current rule-based allocation layer (equal-weight top-N, volatility-weighted) serves as a strong baseline. The next phase replaces this with a reinforcement learning policy trained using PufferLib, where the state space includes the attractiveness scores, regime probabilities, and portfolio-level risk metrics. The action space is continuous position sizing, with the reward function defined as risk-adjusted return over a forward window.
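The state/action/reward design can be illustrated with a toy environment that follows the Gymnasium-style `reset()` / `step()` return convention. This is a hypothetical sketch that imports no RL library; it does not show PufferLib's API or how PufferLib would wrap such an environment. Assumptions: long-only weights with gross exposure capped at 1, and a reward equal to the mean over standard deviation of portfolio returns across the next `horizon` periods.

```python
import statistics
from typing import List, Sequence


class AllocationEnv:
    """Toy allocation environment in the Gymnasium reset/step style (hypothetical)."""

    def __init__(self, scores, regime_probs, asset_returns, horizon=5, eps=1e-8):
        self.scores = scores              # scores[t][i]: Layer 2 output per asset
        self.regime_probs = regime_probs  # regime_probs[t][k]: Layer 3 output
        self.returns = asset_returns      # asset_returns[t][i]: realized return in period t
        self.horizon, self.eps = horizon, eps
        self.reset()

    def _obs(self) -> List[float]:
        # State = scores + regime probabilities + trailing realized portfolio vol.
        vol = statistics.pstdev(self.history[-20:]) if self.history else 0.0
        return list(self.scores[self.t]) + list(self.regime_probs[self.t]) + [vol]

    def reset(self):
        self.t, self.history = 0, []
        return self._obs(), {}

    def step(self, action: Sequence[float]):
        raw = [max(0.0, a) for a in action]                         # long-only
        gross = sum(raw)
        weights = [a / gross for a in raw] if gross > 1.0 else raw  # cap gross at 1
        # Reward: Sharpe-like ratio of portfolio returns over the next `horizon` periods.
        window = range(self.t + 1, min(self.t + 1 + self.horizon, len(self.returns)))
        fwd = [sum(w * r for w, r in zip(weights, self.returns[s])) for s in window]
        if len(fwd) < 2:
            reward = 0.0  # too few forward periods to estimate risk
        else:
            reward = statistics.mean(fwd) / (statistics.pstdev(fwd) + self.eps)
        # Advance one period; only the next period's return enters the observed history.
        self.t += 1
        if fwd:
            self.history.append(fwd[0])
        terminated = self.t >= len(self.returns) - 1
        return self._obs(), reward, terminated, False, {"weights": weights}
```

The observation is a short vector of layer outputs rather than raw prices, which is the state-space reduction argued for in the next paragraph.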

This is where the layered approach pays off: the RL agent does not need to learn screening or regime detection from raw price data, because the lower layers already handle both. Its job is narrowly to answer: "Given these signals and this regime, how much should I allocate?" This shrinks the state space dramatically and should lead to faster, more stable training.

Current Findings

Preliminary walk-forward results across multiple pilot universes show several consistent patterns:

Screening matters: screened universes consistently outperform the unconstrained baseline on risk-adjusted metrics, especially during drawdown periods
Regime-awareness reduces tail risk: dynamic risk budgeting based on detected regime reduces maximum drawdown and volatility-of-volatility compared to static allocations
Provenance tracking reveals hidden drift: several strategies that looked good on aggregate metrics showed clear strategy decay when examined period-by-period through the ledger
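The metrics behind these findings can be computed from a series of out-of-sample period returns. The sketch below shows maximum drawdown, volatility-of-volatility (here the standard deviation of a rolling-volatility series, one of several possible definitions), and a Sharpe-like ratio per consecutive block, which can expose decay that a single aggregate figure hides. Function names and the unannualized Sharpe convention are our assumptions.

```python
import math
import statistics
from typing import List


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a positive fraction."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        mdd = max(mdd, 1.0 - equity / peak)
    return mdd


def vol_of_vol(returns: List[float], window: int) -> float:
    """Standard deviation of the rolling `window`-period volatility series."""
    vols = [statistics.stdev(returns[i - window : i]) for i in range(window, len(returns) + 1)]
    return statistics.stdev(vols)


def sharpe_by_block(returns: List[float], block: int) -> List[float]:
    """Unannualized mean/std per consecutive block (e.g. per year); NaN if std is zero."""
    out = []
    for i in range(0, len(returns) - block + 1, block):
        chunk = returns[i : i + block]
        sd = statistics.stdev(chunk)
        out.append(statistics.mean(chunk) / sd if sd > 0 else math.nan)
    return out
```

A strategy whose aggregate ratio is positive but whose later blocks turn negative is exactly the kind of decay the ledger is meant to surface.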

Next steps: Complete PufferLib RL integration for the allocation layer, add cross-asset-class pilot universes, implement transaction cost modeling, and produce a formal paper with full statistical significance testing across all layer combinations.

Portfolio Allocation as Layered System Optimization

⚠ Work in Progress