Portfolio Allocation as Layered System Optimization

01

ECE-GY 6233 · System Optimization Methods · Spring 2026


Mathematical modelling, control decomposition, and a 60-arm walk-forward evaluation of where optimization actually pays.

Author

Ashesh Kaji
New York University · ask9184@nyu.edu

Method at a glance

3 universes × 5 screens × 4 controllers · 60 arms · chronological walk-forward · policy-gradient meta-router · game-theoretic equilibrium extension.

May 2026

02

Thesis

The best recipe changes with the kitchen.

If a single recipe worked everywhere, the decomposition wouldn't matter. The fact that it changes is evidence that the layers interact.

Top-100

Screen does the work.

Sharpe 1.47

21d momentum + equal weight. The binding constraint is which names, not how much.

Top-250

Risk filtering wins.

Sharpe 0.86

63d low-vol + equal weight. Middle universes are noisy — the best move is to filter, not chase.

Top-500

Two-stage ranking pays.

Sharpe 1.76

63d vol-adj momentum + 21d rank-and-hold. Enough dispersion survives screening to justify a second active layer.

03

The Baseline

Markowitz is convex, clean, and assumes the menu is fixed.

Classical theory: pick weights w on the simplex to maximize μᵀw − λwᵀΣw. But nobody tells you which μ and Σ to use. The upstream decision — which stocks, which screen, which horizon — changes the optimization problem entirely.
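For a fixed menu, the baseline is a small concave program. Below is a minimal sketch that solves it with projected gradient ascent in NumPy. This is one of several standard solvers, and a QP solver would do the same job. λ = 2.0 is only an illustrative default:

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]   # last index still in the support
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def markowitz_simplex(mu: np.ndarray, Sigma: np.ndarray,
                      lam: float = 2.0, steps: int = 2000) -> np.ndarray:
    """Maximize mu^T w - lam * w^T Sigma w over the simplex by projected gradient ascent."""
    n = len(mu)
    w = np.full(n, 1.0 / n)
    # Step size 1/L, where L = 2 * lam * ||Sigma||_2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * lam * np.linalg.norm(Sigma, 2) + 1e-12)
    for _ in range(steps):
        grad = mu - 2.0 * lam * Sigma @ w
        w = project_to_simplex(w + step * grad)
    return w
```

Take two uncorrelated names with μ = (0.10, 0.05) and variances (0.04, 0.01). The first-order condition gives w = (0.45, 0.55), and the loop recovers it. The slide's point still stands: the function never asks where `mu` and `Sigma` came from.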


04

Friction

Trading cost is control effort, not just a fee.

A strategy that rebalances aggressively needs to earn back its movement cost. Equal weight on a stable screen trades at near-zero turnover, while rank-and-hold churns its book 22–40 times per year. Its signal has to be that much better.
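Treating trading as control effort means charging a cost proportional to how far the weights move. Below is a minimal sketch that measures turnover as the ℓ₁ change in target weights at each rebalance. Price drift between rebalances is ignored. Whether the deck's 22–40× figures use this exact one- or two-sided convention is an assumption:

```python
import numpy as np

def turnover_and_cost(weights: np.ndarray, cost_bps: float = 10.0):
    """weights: (T, N) target weights at successive rebalance dates (rows sum to 1).

    Turnover at each rebalance is sum_i |w_{i,t} - w_{i,t-1}|, and the proportional
    cost is cost_bps / 1e4 times that turnover, expressed as a fraction of NAV.
    """
    dw = np.abs(np.diff(weights, axis=0)).sum(axis=1)   # l1 movement per rebalance
    cost = dw * cost_bps / 1e4                           # linear "control effort" charge
    return dw, cost
```

Holding the same equally weighted names costs nothing. Swapping one name out of an equally weighted pair costs a full unit of turnover, and at 10 bps that is 0.1% of NAV.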


05

The decomposition

Two stacked choices: what to consider, then how much of each.

Step 1, screen: pick which K items from a long list. (The shopping list.) Step 2, weighting: pick how much of each. (The recipe.) Both are optimization choices; classical theory only handles step 2.

1 · Universe: Which market can I shop in?

2 · Screen: Which names make the active list?

3 · Controller: How much of each name?

4 · Friction: How costly is movement?

5 · Router: Which recipe should run now?

Index the choices by universe breadth b, screen rule j, and weighting controller k. The whole stack is one optimization:

$$\begin{aligned} \max_{b,\,j,\,k} \quad & \underbrace{\tfrac{\widehat{\mu}_{b,j,k}}{\widehat{\sigma}_{b,j,k}}}_{\text{risk-adj. return}} \;-\; \phi\,\underbrace{\widehat{\text{DD}}_{b,j,k}}_{\text{drawdown}} \;-\; \psi\,\underbrace{\widehat{\text{TO}}_{b,j,k}}_{\text{turnover}} \\[2pt] \text{s.t.} \quad & H_{b,j}(t) = S_j\!\left(\mathcal{U}_b(t),\mathcal{F}_t\right) && \text{(shopping list)} \\ & w_t = C_k\!\left(H_{b,j}(t),\mathcal{F}_t,w_{t^-}\right) && \text{(recipe)} \\ & |H_{b,j}(t)| \le K_{\max}, \quad w_t \in \Delta_{|H_{b,j}(t)|}. \end{aligned}$$

Step 1 · The shopping list (screen)

Score each candidate by a rule q_i(t) and keep the top K. A 0–1 cardinality problem:

$$\max_{z\in\{0,1\}^{|\mathcal{U}|}} \sum_{i\in\mathcal{U}} q_i(t)\, z_i \;\;\text{s.t.}\;\; \sum_i z_i = K.$$
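With a single cardinality constraint, this 0–1 program needs no MIP solver. Its LP relaxation is integral, and the optimum is simply the K highest scores. Below is a minimal NumPy sketch that keeps exactly K names. The function name is hypothetical:

```python
import numpy as np

def screen_top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Solve max_z sum_i q_i z_i s.t. sum_i z_i = k, z in {0,1}^n (requires 1 <= k <= n).

    The optimum is the k largest scores, found with a partial sort.
    """
    idx = np.argpartition(-scores, k - 1)[:k]   # indices of the k largest scores
    z = np.zeros(scores.shape[0], dtype=int)
    z[idx] = 1
    return z
```

This matters for the low-vol screen, whose scores q = −σ̂ are all negative. A rule that keeps exactly K names still returns the calmest ones rather than an empty list.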

The score is the personality of the screen. Five screens were tested, built from four scores (momentum enters at both 21d and 63d):

$$\begin{aligned} q_i^{\text{mom},\tau} &= \tfrac{P_i(t)}{P_i(t-\tau)} - 1 && \text{recent winners} \\ q_i^{\text{vam},\tau} &= \tfrac{P_i(t)/P_i(t-\tau)\,-\,1}{\hat\sigma_{i,\tau}(t)+\varepsilon} && \text{winners, noise-discounted} \\ q_i^{\text{lvol},\tau} &= -\hat\sigma_{i,\tau}(t) && \text{calmest names} \\ q_i^{\text{cluster}} &= q_i^{\text{mom},63} && \text{winners, sector-capped} \end{aligned}$$
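The scores take a few lines to compute from a price panel. Below is a minimal sketch that assumes daily closes and takes σ̂ as the sample standard deviation of the last τ daily returns, since the deck does not pin down the estimator. The cluster screen's sector cap uses a hypothetical `cap` parameter. Greedy selection is exactly optimal here because "at most K names, at most `cap` per sector" defines a matroid:

```python
import numpy as np

def screen_scores(prices: np.ndarray, tau: int, eps: float = 1e-8) -> dict:
    """prices: (T, N) daily closes with T >= tau + 1; scores are taken at the last row."""
    mom = prices[-1] / prices[-1 - tau] - 1.0
    rets = prices[-tau:] / prices[-tau - 1:-1] - 1.0   # last tau daily simple returns
    sigma = rets.std(axis=0, ddof=1)
    return {"mom": mom, "vam": mom / (sigma + eps), "lvol": -sigma}

def sector_capped_top_k(scores: np.ndarray, sectors, k: int, cap: int) -> list:
    """Greedy top-k by score with at most `cap` names per sector (hypothetical cap rule)."""
    chosen, counts = [], {}
    for i in np.argsort(-scores):
        s = sectors[i]
        if counts.get(s, 0) < cap:
            chosen.append(int(i))
            counts[s] = counts.get(s, 0) + 1
            if len(chosen) == k:
                break
    return sorted(chosen)
```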

Step 2 · The recipe (controller)

Given a shopping list H_t, pick weights. Two families tested.

Equal weight. Split evenly — no estimation, no opinions:

$$w_{i,t}^{\text{EW}} = \tfrac{1}{|H_t|}\,\mathbf{1}\{i \in H_t\}.$$

Rank-and-hold. Inside the list, keep only the top k recent winners:

$$w_{i,t}^{\text{MRH},\tau,k} = \tfrac{1}{k}\,\mathbf{1}\!\left\{ i \in \operatorname{TopK}_{\ell\in H_t}\!\left(\tfrac{P_\ell(t)}{P_\ell(t-\tau)} - 1\right)\right\}.$$

It's a simple recipe, but it isn't free — every re-rank means trading. Whether the extra ranking earns back its cost is what slide 9 measures.
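Once the screened list is fixed, both controllers are short. Below is a minimal NumPy sketch. It assumes `active` holds the column indices of H_t and that rank-and-hold is evaluated only on re-rank dates. The re-rank schedule itself is left out:

```python
import numpy as np

def equal_weight(n_assets: int, active: np.ndarray) -> np.ndarray:
    """w_i = 1/|H| for i in the screened list H, else 0."""
    w = np.zeros(n_assets)
    w[active] = 1.0 / len(active)
    return w

def rank_and_hold(prices: np.ndarray, active: np.ndarray, tau: int, k: int) -> np.ndarray:
    """Inside H, hold the top-k tau-period winners at 1/k each (computed at the last row)."""
    mom = prices[-1, active] / prices[-1 - tau, active] - 1.0
    winners = active[np.argsort(-mom)[:k]]
    w = np.zeros(prices.shape[1])
    w[winners] = 1.0 / k
    return w
```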

What we actually solve

The headline equation is mixed-integer, and we don't solve it directly. We sample it instead. All 3×5×4 = 60 named (b, j, k) combinations are evaluated out of sample under the same walk-forward protocol, and then we map the surface.
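Sampling the headline problem amounts to a grid search over named arms. Below is a minimal sketch. The screen labels follow the strategies named on slide 14. The controller labels beyond `ew` and `rh21`, and the φ and ψ values, are hypothetical placeholders:

```python
import itertools

UNIVERSES = ["top100", "top250", "top500"]
SCREENS = ["mom21", "mom63", "lvol63", "vam63", "cluster_mom63"]
CONTROLLERS = ["ew", "rh21", "ctrl_3", "ctrl_4"]   # last two: hypothetical placeholders

def arm_objective(sharpe, max_dd, turnover, phi=0.5, psi=0.01):
    """Sharpe - phi * |drawdown| - psi * turnover; phi and psi are illustrative."""
    return sharpe - phi * abs(max_dd) - psi * turnover

def best_arm_per_universe(results, phi=0.5, psi=0.01):
    """results: {(universe, screen, controller): (sharpe, max_dd, turnover)}."""
    best = {}
    for (b, j, k), (sharpe, dd, to) in results.items():
        score = arm_objective(sharpe, dd, to, phi, psi)
        if b not in best or score > best[b][1]:
            best[b] = ((j, k), score)
    return best

ARMS = list(itertools.product(UNIVERSES, SCREENS, CONTROLLERS))   # 3 x 5 x 4 = 60
```

Drawdown enters as a magnitude, so `abs` makes the sign convention of the reported worst drawdown irrelevant.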

06

The Shopping List

Screening is discrete optimization, not preprocessing.

Same 500 candidates. Three different scoring rules. Three different active sets. The controller downstream never sees the same menu twice.


07

How we kept ourselves honest

No peeking at tomorrow.

Chronological walk-forward splits. Screens and controllers see only training-window data. Performance is measured strictly out of sample. 60 arms × 3 cost slices: all recorded, no cherry-picking.
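Below is a minimal sketch of chronological splitting. It assumes fixed-length rolling training windows. The deck does not say whether windows roll or expand, and the lengths used here are hypothetical:

```python
from typing import Iterator, Optional, Tuple

def walk_forward_splits(n_obs: int, train: int, test: int,
                        step: Optional[int] = None) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) ranges in time order; each test window starts
    right after its training window, so fitted choices never see test-period data."""
    step = step or test
    start = 0
    while start + train + test <= n_obs:
        yield range(start, start + train), range(start + train, start + train + test)
        start += step
```

Fit screens and controllers on `train`, score them on `test`, and advance. Every pair respects time order by construction.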

08

The whole map, in one picture

The bright spot moves as the kitchen grows.

If the layers didn't interact, the same cell would win everywhere. It doesn't: across the three universes, the best (darkest) cells of the screen × controller map land in different places.

09

Headline result

Different kitchens want different recipes.

Top-100 · 22 splits

21d momentum top 10 + equal weight

Sharpe 1.47

20.4% ann. return · 20.2% vol · −44.9% worst DD · ≈0 turnover

Why: picking the right 10 names does all the heavy lifting. Adding active ranking inside the screened set only adds turnover.

Top-250 · 11 splits

63d low-vol top 10 + equal weight

Sharpe 0.86

11.2% ann. return · 14.1% vol · −10.5% worst DD · ≈0 turnover

Why: the middle universe is noisy — momentum doesn't stick. Filtering for stability beats chasing trends.

Top-500 · 9 splits

63d vol-adj momentum top 10 + 21d rank-and-hold top 5

Sharpe 1.76

35.1% ann. return · 20.8% vol · −15.5% worst DD · 22.1× annual turnover

Why: the broad universe leaves enough dispersion that a second active layer earns back its turnover cost.

10

Layer attribution

Which layer is actually pulling its weight?

If the decomposition didn't matter, the same controller would dominate everywhere. Here's what actually happens — averaged across all 5 screens in each universe:

Mean controller Sharpe (screens averaged)

Turnover cost of each controller

The interaction is real.

Equal weight's mean Sharpe is stable across universes: Top-100 → 1.38, Top-500 → 1.42. 21d rank-and-hold climbs from 0.95 to 1.43 and overtakes equal weight in Top-500, so the controller ranking inverts. The gap between the best and worst controller is 0.51–0.53 Sharpe in every universe. This is not config tuning: the same controllers are solving fundamentally different problems.
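The attribution is a group-by. Average each controller's Sharpe over the five screens within a universe, then compare orderings across universes. Below is a minimal standard-library sketch. The function names are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def controller_means(rows):
    """rows: iterable of (universe, screen, controller, sharpe); averages over screens."""
    buckets = defaultdict(list)
    for universe, _screen, controller, sharpe in rows:
        buckets[(universe, controller)].append(sharpe)
    return {key: mean(vals) for key, vals in buckets.items()}

def ranking_inverts(means, a, b, narrow, broad):
    """True if the controller that is better in the narrow universe is worse in the broad one."""
    return (means[(narrow, a)] > means[(narrow, b)]) != (means[(broad, a)] > means[(broad, b)])
```

Feed in the screen-averaged means reported above: equal weight 1.38 and 1.42, 21d rank-and-hold 0.95 and 1.43. With these values, `ranking_inverts` returns `True` for Top-100 versus Top-500.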

11

Honesty Check

The exact winner is fragile. The pattern is not.

Block-bootstrap over walk-forward splits (B = 2000 resamples). The point-estimate best arm is selected in only about one-third of resamples, so the specific winner is fragile. Gaps between the winner and the runner-up are narrow (0.03–0.10 Sharpe), and most confidence intervals on those gaps include zero, consistent with only 9–22 independent folds per universe. But the interaction direction survives resampling: EW dominates narrow universes, and active ranking becomes viable in broad ones.
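Selection frequency under resampling can be estimated directly. Below is a minimal sketch of a moving-block bootstrap over the split axis. It assumes a per-split Sharpe matrix and a hypothetical block length of 3 splits, since the deck does not state its block length:

```python
import numpy as np

def winner_frequency(split_sharpes: np.ndarray, n_boot: int = 2000,
                     block: int = 3, seed: int = 0) -> np.ndarray:
    """split_sharpes: (n_splits, n_arms) out-of-sample Sharpe per arm per split.

    Resample contiguous blocks of splits with replacement, average each arm over
    the resampled splits, and count which arm wins. Returns win fractions per arm.
    """
    rng = np.random.default_rng(seed)
    n_splits, n_arms = split_sharpes.shape
    n_blocks = int(np.ceil(n_splits / block))
    wins = np.zeros(n_arms)
    for _ in range(n_boot):
        starts = rng.integers(0, n_splits - block + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block) for s in starts])[:n_splits]
        wins[split_sharpes[idx].mean(axis=0).argmax()] += 1
    return wins / n_boot
```

An arm that wins on every split is selected in every resample. A winner that is picked only about a third of the time is not that kind of arm.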


12

Robustness

Does the result depend on 10 bps?

No. The same winners are selected at 0, 10, and 25 bps in all three universes. The Top-500 active ranking winner decays from Sharpe 1.81 to 1.67 as costs rise — but it stays on top. Equal-weight winners don't move because they don't trade.
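Proportional costs enter as a per-period drag of cost × turnover. That is why zero-turnover arms don't move across the sweep. Below is a minimal sketch. It assumes per-period gross return and turnover series, 252 periods per year, and no risk-free rate:

```python
import numpy as np

def net_sharpe(gross_returns: np.ndarray, turnover: np.ndarray,
               cost_bps: float, periods_per_year: int = 252) -> float:
    """Annualized Sharpe after charging cost_bps per unit of turnover each period."""
    net = gross_returns - turnover * cost_bps / 1e4
    return float(np.sqrt(periods_per_year) * net.mean() / net.std(ddof=1))
```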

13

A head chef that learns the menu

Can a learned router beat "do whatever worked recently"?

We don't ask a neural net to design portfolios from scratch — that would be a black box. Instead we ask it to pick from the recipes we already understand.

Each recipe is an "arm" $a$. State $s_t$ (time, kitchen size, recent rewards). Softmax policy:

$$\pi_\theta(a \mid s_t) = \frac{\exp(f_\theta(s_t)_a)}{\sum_{a'} \exp(f_\theta(s_t)_{a'})}.$$

Router reward with switching penalty:

$$r^{\text{router}}_t = R_t(a_t) - \xi\,\mathbf{1}\{a_t \ne a_{t-1}\}.$$

Trained via REINFORCE:

$$\nabla_\theta J = \mathbb{E}_\pi\Bigl[\sum_t (G_t - b_t)\,\nabla_\theta \log \pi_\theta(a_t \mid s_t)\Bigr].$$
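Below is a compact version of the router in NumPy, a sketch under several assumptions:

- f_θ is linear.
- The state features (bias, normalized time, previous arm) are hypothetical stand-ins for the deck's time/kitchen-size/recent-reward state.
- G_t is the undiscounted reward-to-go.
- The baseline b_t is the batch mean of G_t.

`updates=30` mirrors the slide. The learning rate and batch size are hypothetical:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())                       # shift for numerical stability
    return z / z.sum()

def features(t: int, T: int, prev_arm, n_arms: int) -> np.ndarray:
    """Hypothetical state s_t: bias, normalized time, one-hot of the previous arm."""
    s = np.zeros(n_arms + 2)
    s[0], s[1] = 1.0, t / T
    if prev_arm is not None:
        s[2 + prev_arm] = 1.0
    return s

def rollout(theta, R, xi, rng):
    """One pass over the reward stream R (T x n_arms) under pi_theta."""
    T, n_arms = R.shape
    prev, steps = None, []
    for t in range(T):
        s = features(t, T, prev, n_arms)
        p = softmax(theta @ s)
        a = int(rng.choice(n_arms, p=p))
        switch = prev is not None and a != prev
        r = R[t, a] - xi * switch                 # r_t = R_t(a_t) - xi * 1{a_t != a_{t-1}}
        steps.append((s, a, p, r))
        prev = a
    return steps

def reinforce(R, xi=0.01, lr=0.5, updates=30, batch=16, seed=0):
    """REINFORCE with a per-timestep batch-mean baseline; returns theta."""
    rng = np.random.default_rng(seed)
    T, n_arms = R.shape
    theta = np.zeros((n_arms, n_arms + 2))
    for _ in range(updates):
        episodes = [rollout(theta, R, xi, rng) for _ in range(batch)]
        rewards = np.array([[step[3] for step in ep] for ep in episodes])
        G = np.cumsum(rewards[:, ::-1], axis=1)[:, ::-1]   # reward-to-go G_t
        b = G.mean(axis=0)                                  # baseline b_t
        grad = np.zeros_like(theta)
        for ep, g in zip(episodes, G):
            for t, (s, a, p, _) in enumerate(ep):
                # Gradient of log-softmax with linear logits: (e_a - pi) outer s.
                grad += (g[t] - b[t]) * np.outer(np.eye(n_arms)[a] - p, s)
        theta += lr * grad / batch
    return theta
```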

The fair comparison

Same reward stream, four head chefs:

Learned (RL): the policy above.
Train-fixed: pick the best training recipe, never change.
Trailing window: "do whatever worked recently."
Random / Oracle: floor and ceiling.
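The trailing-window baseline needs no training at all. Below is a minimal sketch. It assumes the same per-period reward matrix R as the learned router and a hypothetical window length:

```python
import numpy as np

def trailing_window_router(R: np.ndarray, window: int = 4) -> np.ndarray:
    """At each period t, pick the arm with the best mean reward over the previous
    `window` periods (past data only). Period 0 defaults to arm 0."""
    T, _ = R.shape
    choice = np.zeros(T, dtype=int)
    for t in range(1, T):
        past = R[max(0, t - window):t]
        choice[t] = past.mean(axis=0).argmax()
    return choice

def realized_reward(R: np.ndarray, choice: np.ndarray, xi: float = 0.0) -> float:
    """Sum of chosen-arm rewards minus xi per switch (same penalty as the RL router)."""
    switches = np.count_nonzero(choice[1:] != choice[:-1])
    return float(R[np.arange(len(choice)), choice].sum() - xi * switches)
```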

Cumulative test reward by router (10 bps)

RL beats train-fixed on the combined panel (+0.08 cumulative test reward) but trails the trailing-window router (−0.33).

Training trajectory · single seed

30 policy-gradient updates. Reward stabilizes after ~8 steps; high variance from few independent trajectories.

What this tells us

The learner beats holding one recipe but loses to the dumb-but-honest "what worked recently" rule. In this finite sample, recency already squeezes most of the predictive juice out of recent rewards. A learned router only earns its keep when its state carries information beyond recency, such as structural descriptors of the kitchen itself. That's the next study.

14

Multi-Agent Extension

Crowding as a diversification mechanism

6 strategy families, one shared candidate universe. Each player maximizes its own utility minus a crowding penalty. At moderate crowding (γ = 0.05), strategies partially separate while retaining ~97% of independent utility. The Nash equilibrium becomes a tool for engineering orthogonal strategy ensembles — γ controls how much diversification you enforce.
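Best-response dynamics are easy to sketch once each player's subproblem has a closed form. The utility below is a hypothetical stand-in, not the deck's exact specification. It has three terms:

- a linear score,
- a quadratic concentration term,
- a crowding term γ·w_iᵀΣ_{j≠i} w_j.

With this utility, each best response is a Euclidean projection onto the simplex. The crowding term is symmetric, so the game has an exact potential. For λ > γ, the round-robin updates are therefore block-coordinate ascent on a concave function:

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    return np.maximum(v - (css[rho] - 1.0) / (rho + 1.0), 0.0)

def best_response_equilibrium(alpha: np.ndarray, gamma: float, lam: float = 1.0,
                              iters: int = 500, tol: float = 1e-10) -> np.ndarray:
    """alpha: (P, N) per-player scores over a shared universe of N names.

    Player i maximizes alpha_i.w - (lam/2)||w||^2 - gamma * w.(sum_{j != i} w_j) on the
    simplex; the maximizer is the projection of (alpha_i - gamma * others) / lam.
    Players update in turn until no weight moves by more than tol.
    """
    P, N = alpha.shape
    W = np.full((P, N), 1.0 / N)
    for _ in range(iters):
        max_move = 0.0
        for i in range(P):
            others = W.sum(axis=0) - W[i]
            new = project_to_simplex((alpha[i] - gamma * others) / lam)
            max_move = max(max_move, float(np.abs(new - W[i]).max()))
            W[i] = new
        if max_move < tol:
            break
    return W
```

Consider the two-player example in the test. Raising γ from 0 to 0.5 moves the players from (0.55, 0.45) and (0.45, 0.55) to (0.6, 0.4) and (0.4, 0.6). Each leans further into the name the other neglects.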

Equilibrium at γ = 0.05 (moderate crowding)

Strategy	Single-agent utility	Equilibrium utility	Δ
21d mom / EW	0.205	0.209	−1.9%
63d mom / RH 21d	0.625	0.614	+1.8%
63d low-vol / EW	−0.046	−0.015	−67.4%
63d VAM / RH 21d	0.039	0.048	−23.1%
63d mom / EW	0.187	0.185	+1.1%
Cluster mom / EW	0.192	0.192	0.0%

Negative Δ = better in equilibrium (less competition for those names).
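Δ is a relative change with a sign flip. Below is a minimal sketch that assumes Δ = (single − equilibrium)/|single|. The absolute value keeps "negative = better" valid when the single-agent utility is itself negative, as in the low-vol row. Recomputing from the rounded utilities shown matches the table's magnitudes to within rounding. For example, the 21d row comes out at −2.0% rather than −1.9%:

```python
def relative_delta(single: float, equilibrium: float) -> float:
    """Delta = (single - equilibrium) / |single|; negative means better in equilibrium."""
    return (single - equilibrium) / abs(single)
```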

Three things this tells us

1. γ is a diversification dial. At γ=0, strategies overlap on 25% of allocations. At γ=0.05, overlap drops to 10% — a 58% reduction — while retaining 97% of independent utility. You can tune γ to engineer the desired level of strategy orthogonality.

2. Low-volatility benefits from separation. Other players are pushed AWAY from low-vol names, reducing competition. A strategy that looks weak in isolation (single-agent utility −0.046) improves substantially in equilibrium (−0.015). Crowding reveals hidden value.

3. No monoculture emerges. Over 20 quarterly windows, the dominant player's utility share fluctuates between 28% and 36%. The equilibrium preserves a genuinely diversified strategy ecology. This isn't a robustness check; it's an ensemble construction method.

The contribution isn't new game theory — best-response dynamics are standard. It's the empirical validation: crowding can be used as a principled diversification tool, and γ gives you a dial for controlling strategy orthogonality.

15

Crowding Dynamics

Three regimes as crowding increases.

Overlap vs. crowding coefficient γ

γ	Overlap	Regime
0.00	24.6%	Independent play
0.01	14.1%	Partial separation
0.05	10.3%	Partial separation
0.10	6.7%	Partial separation
0.25	2.4%	Near-disjoint
0.50	0.0%	Disjoint
1.00	0.0%	Disjoint
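The deck does not spell out how overlap is measured. One natural definition, assumed here, is the mean over player pairs of Σ_n min(w_i[n], w_j[n]). A value of 1.0 means identical books and 0.0 means disjoint ones. The relative-reduction arithmetic behind the 58% figure on slide 14 is included:

```python
import itertools
import numpy as np

def pairwise_overlap(W: np.ndarray) -> float:
    """Mean over player pairs of sum_n min(w_i[n], w_j[n]) for a (P, N) weight matrix."""
    pairs = itertools.combinations(range(W.shape[0]), 2)
    return float(np.mean([np.minimum(W[i], W[j]).sum() for i, j in pairs]))

# Relative overlap reduction between the gamma = 0 and gamma = 0.05 rows of the table.
reduction = 1 - 0.103 / 0.246   # about 0.58
```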

Why this matters

This is a system optimization method applied to a multi-agent version of the same problem. The single-agent study says "this stack is best." The game theory extension says "but you can deploy the equilibrium ensemble instead — use γ to control diversification."

The method, iterative best response with convex subproblems, converged reliably in our runs. The equilibrium is a diversification tool. Rather than picking a single winner, deploy the equilibrium composition to capture orthogonal alpha while reducing concentration risk.

The answer: moderate crowding (γ≈0.05) partially separates strategies while keeping most of their independent alpha. You get a diversified strategy portfolio with minimal efficiency cost. γ lets you dial in the desired level of orthogonality.

16

Where the evidence runs out

What this study does not yet establish.

Data

Point-in-time membership.

Current universes use approximate constituents. Delisted names are silently missing, which biases performance estimates upward. Rebuilding with proper membership filters is the top priority.

Inference

Small fold counts.

Top-100 has 22 splits; Top-250 has 11; Top-500 has 9. RL has only 4–14 independent routing episodes per scope. Confidence intervals on mean Sharpe differences are wide. The results are mechanism evidence, not deployment claims.

Costs

Proportional only.

Real market impact is nonlinear in order size and varies by liquidity. The 0–25 bps sweep helps but doesn't replace a proper impact model.

17

Why This Matters

Every optimization method from class has a job in this system.

Methods deployed

Method	Where
Convex QP	Markowitz with ℓ₁ turnover penalty
Combinatorial (top-K)	Cardinality-constrained screening
Dynamic programming	Bellman recursion with state-dependent costs
Online learning	Trailing-window model selection
Policy-gradient RL	Learned meta-controller over arms
Nash equilibrium	Best-response dynamics for 6 players

The system optimization insight

The class teaches these methods separately. This project shows they compose.

When the binding constraint is in the feasible set → use combinatorial screening.

When it's in the weight space → use convex QP.

When it's in timing → use online learning or RL.

When it's across agents → use game theory.

System optimization = knowing which tool to reach for at which layer.

18

Conclusion

Three things I want you to remember.

The decomposition is real. The same controller wins in narrow universes and loses in broad ones. The layers interact. If you treat allocation as one flat problem, you miss where optimization is actually binding.
RL shows the architecture works. The features don't. Learned meta-control over interpretable arms is the right structure. But the current router can't beat "do whatever worked recently." Give it structural descriptors and the story changes.
The crowding mechanism is a diversification tool. The Nash equilibrium doesn't just show strategies survive — it shows how to engineer a diversified ensemble. At moderate γ, strategies partially separate while retaining their alpha. γ is a dial: turn it up for more orthogonality, down for strategies closer to their single-agent optima.