Portfolio Allocation as Layered System Optimization
Mathematical modelling, control decomposition, and a 60-arm walk-forward evaluation of where optimization actually pays.
Author
Ashesh Kaji
New York University · ask9184@nyu.edu
Method at a glance
3 universes × 5 screens × 4 controllers · 60 arms · chronological walk-forward · policy-gradient meta-router · game-theoretic equilibrium extension.
The best recipe changes with the kitchen.
Top-100
Screen does the work.
21d momentum + equal weight. The binding constraint is which names, not how much.
Top-250
Risk filtering wins.
63d low-vol + equal weight. Middle universes are noisy — the best move is to filter, not chase.
Top-500
Two-stage ranking pays.
63d vol-adj momentum + 21d rank-and-hold. Enough dispersion survives screening to justify a second active layer.
Markowitz is convex, clean, and assumes the menu is fixed.
Trading cost is control effort, not just a fee.
Two stacked choices: what to consider, then how much of each.
Index the choices by universe breadth b, screen rule j, and weighting controller k. The whole stack is one optimization:
Step 1 · The shopping list (screen)
Score each candidate by a rule q_i(t) and keep the top K. A 0–1 cardinality problem:
The score is the personality of the screen. Five were tested:
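A minimal sketch of the screening step, assuming daily prices sit in a NumPy array with one row per day and one column per candidate. The function names and the two example scores (trailing momentum and negative trailing volatility) are illustrative choices, not the deck's exact definitions of its five screens.

```python
import numpy as np

def momentum_score(prices, lookback):
    """Trailing return over `lookback` rows, one score per column (asset)."""
    return prices[-1] / prices[-1 - lookback] - 1.0

def low_vol_score(prices, lookback):
    """Negative trailing volatility of daily returns, so calmer names score higher."""
    window = prices[-1 - lookback:]
    rets = window[1:] / window[:-1] - 1.0
    return -rets.std(axis=0, ddof=1)

def top_k_screen(scores, k):
    """Indices of the k highest scores.

    This solves: maximize sum_i z_i * q_i  subject to  sum_i z_i = k, z_i in {0, 1}.
    With a linear objective the 0-1 problem is solved exactly by sorting.
    """
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:k])

# Hypothetical toy data: 3 days, 3 candidates.
prices = np.array([[100.0, 100.0, 100.0],
                   [110.0,  90.0, 100.0],
                   [121.0,  99.0, 100.0]])
shopping_list = top_k_screen(momentum_score(prices, lookback=2), k=2)
```

Sorting is enough here only because the objective is linear in the 0-1 selection vector; extra constraints such as sector caps would need a proper integer program.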
Step 2 · The recipe (controller)
Given a shopping list H_t, pick weights. Two families tested.
Equal weight. Split evenly — no estimation, no opinions:
Rank-and-hold. Inside the list, keep only the top k recent winners:
It's a simple recipe, but it isn't free — every re-rank means trading. Whether the extra ranking earns back its cost is what slide 9 measures.
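The two controller families and the cost of switching between them can be sketched as follows. This assumes weights live in a full-length vector over all candidates and that trading cost is proportional to the one-norm of the weight change; the helper names and the 10 bps default are illustrative.

```python
import numpy as np

def equal_weight(held, n_assets):
    """Split capital evenly across the held indices."""
    w = np.zeros(n_assets)
    w[list(held)] = 1.0 / len(held)
    return w

def rank_and_hold(held, recent_returns, k, n_assets):
    """Equal-weight the k names in `held` with the highest recent return."""
    held = np.asarray(held)
    order = np.argsort(-recent_returns[held], kind="stable")
    return equal_weight(held[order[:k]], n_assets)

def turnover_cost(w_new, w_old, cost_bps=10.0):
    """Proportional cost: (bps / 1e4) times the one-norm of the weight change."""
    return cost_bps / 1e4 * np.abs(w_new - w_old).sum()

# Hypothetical example: 5 candidates, 3 survive the screen, keep the top 2.
recent = np.array([0.05, 0.90, -0.01, 0.03, 0.20])
w_eq = equal_weight([0, 2, 3], n_assets=5)
w_rh = rank_and_hold([0, 2, 3], recent, k=2, n_assets=5)
cost = turnover_cost(w_rh, w_eq)
```

Candidate 1 has the best recent return but is ignored because it is not on the shopping list, which is the sense in which the screen constrains the controller.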
What we actually solve
Screening is discrete optimization, not preprocessing.
No peeking at tomorrow.
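One way to enforce this in a chronological walk-forward evaluation is to generate splits whose test block always starts after its training block ends. The sketch below assumes a rolling training window that advances by one test block per split; the deck does not state whether its windows roll or expand, and the function name and lengths are hypothetical.

```python
def walk_forward_splits(n_obs, train_len, test_len):
    """Yield (train, test) index ranges; every test index is later than every train index."""
    start = 0
    while start + train_len + test_len <= n_obs:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # advance by one test block so test blocks never overlap

splits = list(walk_forward_splits(n_obs=10, train_len=4, test_len=2))
```

Screen and controller choices are then made on each training range only, and scored on the test range that follows it.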
The bright spot moves as the kitchen grows.
Different kitchens want different recipes.
Top-100 · 22 splits
21d momentum top 10 + equal weight
20.4% ann. return · 20.2% vol · −44.9% worst DD · ≈0 turnover
Why: picking the right 10 names does all the heavy lifting. Adding active ranking inside the screened set only adds turnover.
Top-250 · 11 splits
63d low-vol top 10 + equal weight
11.2% ann. return · 14.1% vol · −10.5% worst DD · ≈0 turnover
Why: the middle universe is noisy — momentum doesn't stick. Filtering for stability beats chasing trends.
Top-500 · 9 splits
63d vol-adj momentum top 10 + 21d rank-and-hold top 5
35.1% ann. return · 20.8% vol · −15.5% worst DD · 22.1× turnover
Why: the broad universe leaves enough dispersion that a second active layer earns back its turnover cost.
Which layer is actually pulling its weight?
Mean controller Sharpe (screens averaged)
Turnover cost of each controller
The interaction is real.
The exact winner is fragile. The pattern is not.
Does the result depend on 10 bps?
Can a learned router beat "do whatever worked recently"?
Each recipe is an "arm" $a$. The state $s_t$ encodes time, kitchen size, and recent rewards. Softmax policy:
Router reward with switching penalty:
Trained via REINFORCE:
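The slide equations did not survive extraction. For orientation, the standard textbook forms of the three objects named above are given below; they are an assumption about the general shape, not a recovery of the deck's exact equations. The feature map $\phi$, per-arm parameters $\theta_a$, switching penalty $\lambda$, realized arm reward $R_{a}(t)$, and baseline $b_t$ are notation introduced here for illustration.

```latex
% Softmax policy over arms (assumed linear-in-features parameterization)
\pi_\theta(a \mid s_t) =
  \frac{\exp\!\big(\theta_a^\top \phi(s_t)\big)}
       {\sum_{a'} \exp\!\big(\theta_{a'}^\top \phi(s_t)\big)}

% Router reward: realized reward of the chosen arm minus a penalty for switching
r_t = R_{a_t}(t) - \lambda \, \mathbb{1}\!\left[a_t \neq a_{t-1}\right]

% REINFORCE gradient estimate with return-to-go G_t and baseline b_t
\nabla_\theta J(\theta) =
  \mathbb{E}_{\pi_\theta}\!\left[ \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,(G_t - b_t) \right],
  \qquad G_t = \sum_{u \ge t} r_u
```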
The fair comparison
Same reward stream, four head chefs:
- Learned (RL): the policy above.
- Train-fixed: pick the best training recipe, never change.
- Trailing window: "do whatever worked recently."
- Random / Oracle: floor and ceiling.
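The non-learned baselines are simple enough to sketch directly. This assumes a reward matrix with one row per period and one column per arm; the function names, the default to arm 0 when no history exists, and the toy rewards are hypothetical.

```python
import numpy as np

def train_fixed_choice(train_rewards):
    """Pick the arm with the best mean training reward and never change."""
    return int(np.argmax(train_rewards.mean(axis=0)))

def trailing_window_choices(rewards, window):
    """At step t, pick the arm with the best mean reward over the previous `window` steps."""
    choices = []
    for t in range(len(rewards)):
        past = rewards[max(0, t - window):t]  # strictly before t, so no look-ahead
        choices.append(int(np.argmax(past.mean(axis=0))) if len(past) else 0)
    return choices

def cumulative_reward(rewards, choices, switch_penalty=0.0):
    """Sum of chosen-arm rewards minus a penalty each time the arm changes."""
    total, prev = 0.0, None
    for t, a in enumerate(choices):
        total += rewards[t, a]
        if prev is not None and a != prev:
            total -= switch_penalty
        prev = a
    return total

# Toy stream: arm 0 pays for two periods, then arm 1 takes over.
rewards = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
choices = trailing_window_choices(rewards, window=1)
oracle = rewards.max(axis=1).sum()
```

On this toy stream the trailing-window rule lags the regime change by one period, which is the cost it pays for using only past rewards.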
Cumulative test reward by router (10 bps)
Training trajectory · single seed
What this tells us
The learner beats holding one recipe, but loses to the dumb-but-honest "what worked recently" rule. In this finite sample, recency already squeezes most predictive juice from recent rewards. A learned router only earns its keep with more than recency — like structural descriptors about the kitchen itself. That's the next study.
Crowding as a diversification mechanism
Equilibrium at γ = 0.05 (moderate crowding)
| Strategy | Single-agent | Equilibrium | Δ |
|---|---|---|---|
| 21d mom / EW | 0.205 | 0.209 | −1.9% |
| 63d mom / RH 21d | 0.625 | 0.614 | +1.8% |
| 63d low-vol / EW | −0.046 | −0.015 | −67.4% |
| 63d VAM / RH 21d | 0.039 | 0.048 | −23.1% |
| 63d mom / EW | 0.187 | 0.185 | +1.1% |
| Cluster mom / EW | 0.192 | 0.192 | 0.0% |
Negative Δ = better in equilibrium (less competition for those names).
Three things this tells us
1. γ is a diversification dial. At γ=0, strategies overlap on 25% of allocations. At γ=0.05, overlap drops to 10% — a 58% reduction — while retaining 97% of independent utility. You can tune γ to engineer the desired level of strategy orthogonality.
2. Low-volatility benefits from separation. Other players are pushed AWAY from low-vol names, reducing competition. A strategy that looks weak in isolation (single-agent utility −0.046) improves substantially in equilibrium (−0.015). Crowding reveals hidden value.
3. No monoculture emerges. Over 20 quarterly windows, the dominant player's utility share fluctuates between 28% and 36%. The equilibrium preserves a genuinely diversified strategy ecology, so it is more than a robustness check: it is an ensemble construction method.
Three regimes as crowding increases.
Overlap vs. crowding coefficient γ
| γ | Overlap | Regime |
|---|---|---|
| 0.00 | 24.6% | Independent play |
| 0.01 | 14.1% | Partial separation |
| 0.05 | 10.3% | Partial separation |
| 0.10 | 6.7% | Partial separation |
| 0.25 | 2.4% | Near-disjoint |
| 0.50 | 0.0% | Disjoint |
| 1.00 | 0.0% | Disjoint |
Why this matters
This is a system optimization method applied to a multi-agent version of the same problem. The single-agent study says "this stack is best." The game-theoretic extension says "but you can deploy the equilibrium ensemble instead, using γ to control diversification."
The method — iterative best-response with convex subproblems — converges reliably. The equilibrium is a diversification tool: rather than picking a single winner, deploy the equilibrium composition to capture orthogonal alpha while reducing concentration risk.
In short, moderate crowding (γ≈0.05) partially separates strategies while keeping most of their independent alpha. The result is a diversified strategy portfolio with minimal efficiency cost, and γ sets the desired level of orthogonality.
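Iterative best-response with a crowding penalty can be illustrated with a toy game. The sketch assumes each player maximizes a linear expected-return term, minus a ridge term, minus γ times the overlap with the other players' total weights, over long-only fully invested portfolios. That utility, the overlap metric, and all names are assumptions made for illustration, since the deck's own utility is not shown. Under this assumed utility each best response is a convex problem solved by a Euclidean projection onto the simplex.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def best_response(mu_i, others_total, gamma, ridge=1.0):
    """argmax over the simplex of mu_i.w - gamma * w.others_total - (ridge / 2) * ||w||^2."""
    return project_to_simplex((mu_i - gamma * others_total) / ridge)

def iterate_best_response(mu, gamma, ridge=1.0, sweeps=200, tol=1e-10):
    """Players update one at a time until no weight moves by more than tol."""
    n_players, n_assets = mu.shape
    w = np.full((n_players, n_assets), 1.0 / n_assets)
    for _ in range(sweeps):
        max_change = 0.0
        for i in range(n_players):
            others = w.sum(axis=0) - w[i]
            new_w = best_response(mu[i], others, gamma, ridge)
            max_change = max(max_change, np.abs(new_w - w[i]).max())
            w[i] = new_w
        if max_change < tol:
            break
    return w

def mean_pairwise_overlap(w):
    """Average over player pairs of the sum of elementwise minimum weights."""
    n = w.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean([np.minimum(w[i], w[j]).sum() for i, j in pairs]))

# Two hypothetical players with identical views on two assets.
mu = np.array([[0.2, 0.0], [0.2, 0.0]])
overlap_independent = mean_pairwise_overlap(iterate_best_response(mu, gamma=0.0))
overlap_crowded = mean_pairwise_overlap(iterate_best_response(mu, gamma=1.0))
```

With γ = 0 the two players hold identical portfolios; with γ > 0 one of them is pushed toward the less crowded asset, so the overlap falls even though their views are the same.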
What this study does not yet establish.
Data
Point-in-time membership.
Current universes use approximate constituents. Delisted names are silently missing, which biases performance estimates upward. Rebuilding with proper membership filters is the top priority.
Inference
Small fold counts.
Top-100 has 22 splits, Top-250 has 11, and Top-500 has 9, and the RL router sees only 4 to 14 independent routing episodes per scope. Confidence intervals on mean Sharpe differences are wide. The results are mechanism evidence, not deployment claims.
Costs
Proportional only.
Real market impact is nonlinear in order size and varies by liquidity. The 0–25 bps sweep helps but does not replace a proper impact model.
Every optimization method from class has a job in this system.
Methods deployed
| Method | Where |
|---|---|
| Convex QP | Markowitz with ℓ₁ turnover penalty |
| Combinatorial (top-K) | Cardinality-constrained screening |
| Dynamic programming | Bellman recursion with state-dependent costs |
| Online learning | Trailing-window model selection |
| Policy-gradient RL | Learned meta-controller over arms |
| Nash equilibrium | Best-response dynamics for 6 players |
The system optimization insight
The class teaches these methods separately. This project shows they compose.
When the binding constraint is in the feasible set → use combinatorial screening.
When it's in the weight space → use convex QP.
When it's in timing → use online learning or RL.
When it's across agents → use game theory.
System optimization = knowing which tool to reach for at which layer.
Three things I want you to remember.
- The decomposition is real. The same controller wins in narrow universes and loses in broad ones. The layers interact. If you treat allocation as one flat problem, you miss where optimization is actually binding.
- RL shows the architecture works. The features don't. Learned meta-control over interpretable arms is the right structure. But the current router can't beat "do whatever worked recently." Give it structural descriptors and the story changes.
- The crowding mechanism is a diversification tool. The Nash equilibrium doesn't just show strategies survive — it shows how to engineer a diversified ensemble. At moderate γ, strategies partially separate while retaining their alpha. γ is a dial: turn it up for more orthogonality, down for strategies closer to their single-agent optima.