Multi-Pair Pairs Trading: Why Diversification Makes Things Worse on US Stocks

We ran a multi-pair portfolio construction backtest on US stocks from 2005 to 2024, testing portfolio sizes from 5 to 20 pairs with two allocation methods. The result contradicts the standard diversification narrative: adding more pairs made returns worse, not better. Transaction costs erode marginal convergence gains, and the best configuration (5-pair equal weight) still lost 0.43% annually against a benchmark that gained 9.81%.

Contents

  1. Method
  2. What We Found
  3. The transaction cost problem
  4. The diversification paradox
  5. When the strategy works: crisis periods
  6. Sharpe ratio trend by portfolio size
  7. 20-year summary (20-pair inv-vol)
  8. Why Not Run It Without Costs?
  9. Backtest Methodology
  10. Limitations
  11. Takeaway
  12. Part of a Series

Method

  • Data source: Ceta Research (FMP financial data warehouse)
  • Universe: NYSE, NASDAQ, AMEX, market cap > $1B USD
  • Period: 2005-2024 (20 years)
  • Benchmark: S&P 500 Total Return (SPY)

Pair selection uses the same engine as our single-pair pairs-zscore strategy. Candidates are same-sector pairs with correlation >= 0.70 over at least 200 common trading days. A half-life filter keeps pairs whose spread mean-reverts within 5-60 trading days, estimated with an Ornstein-Uhlenbeck fit. The formation period is the prior calendar year; the trading period is the current year. Up to 30 candidate pairs are formed per exchange.
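The three formation-period filters can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: the article doesn't say whether correlation is measured on prices or returns, how the spread is built, or how the OU fit is done. The sketch assumes correlation on price levels, a log-price spread with a hedge ratio of 1, and the common AR(1) regression form of the OU fit, where the half-life is -ln(2)/ln(1 + b). The names `pearson`, `ou_half_life` and `is_candidate_pair` are hypothetical, and the same-sector check and top-30 ranking are left out.

```python
import math

# Thresholds stated in the article; everything else here is an assumption.
MIN_COMMON_DAYS = 200
MIN_CORRELATION = 0.70
HALF_LIFE_RANGE = (5.0, 60.0)  # trading days


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def ou_half_life(spread):
    """Half-life from a discrete OU fit, i.e. the AR(1) regression
    ds_t = a + b * s_{t-1} + e_t estimated by ordinary least squares.

    Mean reversion requires -1 < b < 0, giving half-life = -ln(2) / ln(1 + b).
    Any other slope returns math.inf so the pair fails the filter.
    """
    lagged = spread[:-1]
    delta = [s1 - s0 for s0, s1 in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    mx, my = sum(lagged) / n, sum(delta) / n
    sxx = sum((x - mx) ** 2 for x in lagged)
    if sxx == 0:
        return math.inf
    sxy = sum((x - mx) * (y - my) for x, y in zip(lagged, delta))
    b = sxy / sxx  # OLS slope
    if not -1.0 < b < 0.0:
        return math.inf
    return -math.log(2) / math.log(1.0 + b)


def is_candidate_pair(prices_a, prices_b):
    """Apply the correlation, history-length and half-life filters to two
    aligned formation-period price series (same-sector check not shown)."""
    if len(prices_a) != len(prices_b) or len(prices_a) < MIN_COMMON_DAYS:
        return False
    # Assumption: correlation measured on price levels.
    if pearson(prices_a, prices_b) < MIN_CORRELATION:
        return False
    # Assumption: spread = log(A) - log(B), i.e. a hedge ratio of 1.
    spread = [math.log(a) - math.log(b) for a, b in zip(prices_a, prices_b)]
    lo, hi = HALF_LIFE_RANGE
    return lo <= ou_half_life(spread) <= hi
```

A spread that is flat or trending gets an infinite half-life and fails the filter; a spread that decays by 5% of its gap per day has a half-life of about 13.5 days and passes.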

Each year, we form the top N pairs by correlation and run them as a portfolio under two allocation methods:

  • Equal weight: each pair gets 1/N of capital
  • Inverse-vol: each pair is weighted by 1/spread_vol, normalized to sum to 1
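The two allocation rules reduce to a few lines. This is a minimal sketch with hypothetical function names; the article doesn't specify how spread_vol is measured (lookback window, units), so the volatilities in the usage note are made up for illustration.

```python
def equal_weights(n):
    """Each of the n pairs gets 1/n of capital."""
    return [1.0 / n] * n


def inverse_vol_weights(spread_vols):
    """Weight each pair by 1/spread_vol, normalized so the weights sum to 1."""
    if any(v <= 0 for v in spread_vols):
        raise ValueError("spread volatilities must be positive")
    raw = [1.0 / v for v in spread_vols]
    total = sum(raw)
    return [r / total for r in raw]
```

For example, `inverse_vol_weights([0.01, 0.02, 0.04])` gives weights of 4/7, 2/7 and 1/7: the calmest spread gets the largest allocation regardless of its expected return.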

Entry: |z| crosses above 2.0. Exit: |z| falls below 0.5 (convergence), a 60-day time stop, or a -5% loss stop, whichever comes first. Size-tiered transaction costs are applied to every trade leg.
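The entry and exit rules amount to a small state machine. The sketch below takes the z-score as given (the article doesn't say which window or mean it is computed from), assumes the spread is A minus B so that a high z means selling A and buying B, and checks exits in an order we chose ourselves: convergence, then loss stop, then time stop. Function names are hypothetical.

```python
ENTRY_Z = 2.0
EXIT_Z = 0.5
TIME_STOP_DAYS = 60
LOSS_STOP = -0.05  # -5% trade return


def entry_signal(z):
    """+1 = long the spread (buy A, sell B), -1 = short it, 0 = no trade.

    Assumes spread = A - B, so a very high z means A is rich relative to B.
    """
    if z > ENTRY_Z:
        return -1
    if z < -ENTRY_Z:
        return 1
    return 0


def exit_reason(z, days_held, trade_return):
    """Return the exit rule that fires, or None to keep holding.

    The order of checks is our assumption; the article doesn't specify it.
    """
    if abs(z) < EXIT_Z:
        return "convergence"
    if trade_return <= LOSS_STOP:
        return "loss_stop"
    if days_held >= TIME_STOP_DAYS:
        return "time_stop"
    return None
```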

We tested portfolio sizes N = 5, 10, 15, 20. The primary configuration reported throughout is 20-pair inverse-vol (the most diversified setup).


What We Found

The strategy loses money across all configurations on US stocks.

Multi-pair portfolio: cumulative growth of $1,000 (US, 2005-2024)

The transaction cost problem

2,680 trades over 20 years, and 86.6% of them ended in convergence. Pairs mean-reverted as expected. The problem isn't the trade success rate; it's the economics.

Metric                 Value
Total trades (20yr)    2,680
Convergence rate       86.6%
Time stop exits        0.7%
Loss stop exits        7.6%
Avg hold period        15.4 days
Avg trade return       -0.181%

The average trade return is -0.181%. Each pairs trade has 4 legs (buy stock A and sell stock B on entry; reverse both on exit). At 5-8 basis points per leg, the round-trip cost is 20-32 bps. The spread converges, but the convergence profit is smaller than the friction: an 86.6% convergence rate still nets -0.181% per trade on average.
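The cost arithmetic is simple enough to write down. This is a minimal sketch that assumes costs subtract additively from a trade's return; in practice each leg is charged on its own notional, so how basis points per leg translate into a per-trade return depends on how the backtest normalizes trade returns. The function names are hypothetical.

```python
LEGS_PER_ROUND_TRIP = 4  # buy A + sell B on entry, reverse both on exit


def round_trip_cost_bps(per_leg_bps, legs=LEGS_PER_ROUND_TRIP):
    """Total friction for one pairs trade, in basis points."""
    return per_leg_bps * legs


def net_trade_return_bps(gross_bps, per_leg_bps):
    """Net return of one trade after subtracting the round-trip cost."""
    return gross_bps - round_trip_cost_bps(per_leg_bps)
```

At 5 bps per leg, a hypothetical trade with a 15 bps gross edge nets -5 bps; at 8 bps per leg, the gross edge has to exceed 32 bps just to break even.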

The diversification paradox

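Here's what the textbooks don't cover: diversification only improves results when the underlying trades have positive expectancy.

With positive expected trade return, adding more pairs pulls the realized portfolio return toward that positive mean (the law of large numbers working in your favor). With negative expected trade return, the same averaging works against you: it pulls the portfolio toward a reliable loss. Pairs beyond the top five are also lower-ranked by correlation, and the results suggest they carry worse expected returns, so adding them lowers the portfolio average. You're averaging more losers.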
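A stylized model makes the point concrete. It assumes pairs are independent and identical with normally distributed annual returns, which real pairs are not, and the mean and volatility in the usage note are hypothetical. Under those assumptions, adding pairs leaves the mean unchanged and only shrinks volatility by the square root of N, so the probability of a positive year moves toward 1 when the mean is positive and toward 0 when it is negative.

```python
import math


def prob_portfolio_positive(mean_pair_return, pair_vol, n_pairs):
    """P(equal-weight portfolio return > 0) for n independent, identical pairs.

    Assumes each pair's return is normal with the given mean and volatility.
    Averaging n of them keeps the mean and divides volatility by sqrt(n).
    """
    portfolio_vol = pair_vol / math.sqrt(n_pairs)
    z = mean_pair_return / portfolio_vol
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
```

With a hypothetical -0.5% mean and 3% volatility per pair, the chance of a positive year falls from roughly 35% with 5 pairs to roughly 23% with 20 pairs. This model only captures the variance effect; it doesn't include the separate effect of lower-ranked pairs having worse means.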

N pairs   Equal CAGR   Inv-vol CAGR   Equal MaxDD   Inv-vol MaxDD
5         -0.43%       -0.99%         -12.06%       -18.16%
10        -1.11%       -1.25%         -20.06%       -22.24%
15        -1.31%       -1.28%         -23.20%       -23.69%
20        -1.22%       -1.23%         -21.81%       -21.95%

The 5-pair equal-weight configuration is the least bad: -0.43% CAGR, -12.06% max drawdown. Moving to 20 pairs worsens the CAGR (-1.22%) and nearly doubles the drawdown (-21.81%).

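Equal weight beats inverse-vol on CAGR at every portfolio size except N=15, where inverse-vol edges ahead by 0.03 percentage points, and it has the shallower max drawdown at every size. Inverse-vol allocates more capital to pairs with lower spread volatility, but those aren't necessarily the ones with better expected returns.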

Annual returns vs SPY (US, 20-pair inv-vol)

When the strategy works: crisis periods

The 20-year record shows only two meaningful positive years:

Year   Portfolio   SPY
2008   +1.45%      -36.23%
2009   +1.77%      +22.66%

2008 stands out. A market-neutral strategy that returned +1.45% while the S&P 500 lost 36% shows the decorrelation working exactly as designed: the long-short spread positions were insulated from the directional crash. In 2009, a rebound year, the portfolio returned +1.77% while SPY rallied 22.66%, a small absolute gain against a strong market recovery. Both good years are small positives; every other year was a small negative.

The strategy isn't terrible during crises. It's just expensive to run in normal markets.

Sharpe ratio trend by portfolio size

Portfolio size vs risk-adjusted returns (US, equal weight vs inverse-vol)

Sharpe ratio deteriorates steadily as portfolio size grows. The Sharpe at N=5 equal weight (-1.082) is far better than at N=10 (-2.211), N=15 (-2.402) or N=20 (-2.750). CAGR and max drawdown improve slightly from N=15 to N=20, but Sharpe does not. Averaging across more pairs smooths individual-pair variance and likely lowers portfolio volatility, and when the excess return is negative, a smaller denominator pushes the ratio further below zero.
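Because Sharpe divides excess return by volatility, more pairs can hurt it even when CAGR improves slightly: with a negative excess return, lower volatility makes the ratio more negative. This is a simplified sketch that assumes Sharpe is annual return minus a risk-free rate, divided by annualized volatility. The 2% risk-free rate is our inference, not a figure from the source: plugging in the 20-pair inverse-vol numbers (CAGR -1.23%, volatility 1.18%) and SPY's (CAGR +9.81%, volatility 17.04%) lands within about 0.003 of the reported -2.735 and +0.459.

```python
RISK_FREE_PCT = 2.0  # assumed; inferred from the reported ratios, not stated


def sharpe(annual_return_pct, annual_vol_pct, risk_free_pct=RISK_FREE_PCT):
    """Simplified annualized Sharpe: (return - risk-free) / volatility."""
    return (annual_return_pct - risk_free_pct) / annual_vol_pct


portfolio = sharpe(-1.23, 1.18)   # about -2.74
benchmark = sharpe(9.81, 17.04)   # about +0.46
# Same negative excess return, half the volatility: Sharpe doubles in magnitude.
smoother = sharpe(-1.23, 0.59)    # about -5.47
```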

20-year summary (20-pair inv-vol)

Metric             Portfolio   SPY
CAGR               -1.23%      +9.81%
Total Return       -21.95%     +550.1%
Max Drawdown       -21.95%     -36.23%
Sharpe Ratio       -2.735      +0.459
Annualized Vol     1.18%       17.04%
Win Rate vs SPY    15.0%
Beta to SPY        -0.025

A beta of -0.025 is consistent with a genuinely market-neutral strategy. A down capture of 4.0% means that when SPY falls, this portfolio barely moves. When SPY rises, it also barely moves (up capture -7.73%, a slight average loss in up periods). Market neutral, but losing about 1.2% per year after costs.
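For readers who want to reproduce the market-neutrality statistics, here is a minimal sketch of beta and up/down capture on aligned periodic returns. Capture ratios have several conventions (many tools compound returns geometrically); this one uses a simple arithmetic-mean version, and the article doesn't say which definition produced its 4.0% and -7.73% figures. Function names are hypothetical.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def beta(port, bench):
    """Beta = cov(portfolio, benchmark) / var(benchmark)."""
    mp, mb = _mean(port), _mean(bench)
    cov = sum((p - mp) * (b - mb) for p, b in zip(port, bench))
    var = sum((b - mb) ** 2 for b in bench)
    return cov / var


def capture_ratio(port, bench, up=True):
    """Mean portfolio return / mean benchmark return, using only periods
    when the benchmark was up (up=True) or down (up=False)."""
    rows = [(p, b) for p, b in zip(port, bench) if (b > 0 if up else b < 0)]
    if not rows:
        raise ValueError("no qualifying benchmark periods")
    return _mean([p for p, _ in rows]) / _mean([b for _, b in rows])
```

A portfolio that always moves half as much as the benchmark has a beta of 0.5 and up and down captures of 0.5; a market-neutral book shows all three near zero.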


Why Not Run It Without Costs?

The backtest framework supports a --no-costs flag. Without transaction costs, the 20-pair equal-weight strategy returns approximately +2.5% CAGR. The spread convergence works. The alpha is there in gross terms.

This matters for context: pairs trading isn't a failed strategy in theory. It fails in practice on US large-cap stocks because the gross edge per trade (~20-30 bps) is too thin to survive realistic execution costs. Institutional players running pairs at much lower cost structures can capture this.

For retail and small institutional size using market orders, the friction is the problem.


Backtest Methodology

Parameter                Value
Universe                 NYSE, NASDAQ, AMEX; MCap > $1B USD
Pair selection           Correlation >= 0.70, min 200 common days
Half-life filter         5-60 trading days (OU fit)
Formation period         Prior calendar year
Z-score entry            |z| > 2.0
Z-score exit             |z| < 0.5
Time stop                60 days
Loss stop                -5%
Portfolio sizes tested   5, 10, 15, 20
Allocation methods       Equal weight, Inverse-vol
Transaction costs        Size-tiered (5-8 bps per leg)
Period                   2005-2024 (20 years)
Benchmark                SPY (S&P 500 Total Return)

Limitations

Transaction costs are the margin of error. Before costs, the return on US stocks is modestly positive (roughly +2.5% CAGR for 20-pair equal weight). The strategy is borderline: small changes in cost assumptions can flip it from slightly positive to slightly negative.

Formation-period stability. Pairs are selected on prior-year correlation and half-life, and a pair that was highly correlated in the formation year may decohere during the trading year. This happens more often than the 86.6% convergence rate suggests, because a pair can converge without profiting if the convergence is slow or partial.

Sector concentration. The top-5 most correlated pairs often cluster in 1-2 sectors. Financials pairs dominate in crisis periods. The strategy carries implicit sector bets that aren't obvious from the market-neutral framing.

Survivorship bias. Universe is filtered by current-period market cap at rebalance, not prospective availability. Delistings and mergers during the holding period aren't modeled.

India is different. This analysis is US-only. On Indian stocks, the 5-pair equal-weight configuration returns +11.80% CAGR over the same period. The pair universe quality matters. See our India analysis.


Takeaway

Pairs trading on US large-cap stocks fails the cost test. 86.6% convergence rate, but average trade return of -0.181%. The trades work; the economics don't.

The diversification paradox is the sharpest finding: running 20 pairs doesn't help compared to 5. When expected trade return is negative, more pairs means a lower portfolio mean. The diversification "benefit" disappears.

The strategy's one genuine feature is crisis resilience. +1.45% in 2008 while the market lost 36% demonstrates real market neutrality. For portfolio diversification purposes, as a hedge against directional risk, multi-pair pairs trading has value. As a standalone return generator, the numbers don't work.

The interesting question: does the same engine work better in less efficient markets? See our comparison across 11 exchanges, and the India deep-dive where the top-5 pairs genuinely outperform.


Part of a Series


Global | India

Run It Yourself

Explore the data behind this analysis on Ceta Research. Query our financial data warehouse with SQL, build custom screens, and run your own backtests across 70,000+ stocks on 20 exchanges.

Data: Ceta Research, FMP financial data warehouse. Universe: NYSE + NASDAQ + AMEX, MCap > $1B. 2005-2024. Size-tiered transaction costs included.
