Piotroski F-Score Backtest: What 28 Years of US Data Actually Show

We ran the Piotroski F-Score on US value stocks from 1997 to 2024. Score 8-9 returned 6.1% vs 9.4% for SPY. Selection alpha is -1.6% per year. Avoidance alpha is +2.8%. The F-Score works as a floor (skip Score 0-2) but not as a ceiling in large-cap US.

Growth of $1 invested in Piotroski Score 8-9 vs Score 0-2 vs S&P 500 from 1997 to 2024

An accounting professor created a nine-point checklist to separate winning value stocks from future bankruptcies. We ran it on US equities from 1997 to 2024. The high-quality basket (Score 8-9) returned 6.1% annually with a Sharpe of 0.164. The S&P 500 returned 9.4% over the same window. The F-Score in large-cap US value doesn't beat the market on raw returns or on risk-adjusted returns. The avoidance signal still works: skipping Score 0-2 stocks saves you from a -71.9% max drawdown. But the selection signal does not deliver alpha here.

Contents

  1. Method
  2. What is the Piotroski F-Score?
  3. The Screens
  4. Simple F-Score Screen
  5. Advanced Screen (Value + Quality)
  6. What We Found
  7. The headline: the F-Score loses to the S&P 500 in large-cap US value.
  8. Avoidance vs selection alpha
  9. Score 8-9 vs the S&P 500
  10. The All Value baseline tells you the universe is the problem
  11. Decade breakdown
  12. Portfolio concentration
  13. When It Works and When It Struggles
  14. Run This Screen Yourself
  15. Reproduce the backtest
  16. Limitations
  17. References

Data: FMP financial data warehouse, 1997–2024. Updated May 2026.


Method

Parameter | Value
--- | ---
Data source | FMP via Ceta Research warehouse
Universe | US-listed (NYSE, NASDAQ, AMEX), bottom 20% by price-to-book, market cap > $1B
Period | 1997-2024 (28 years)
Rebalancing | Annual (April 1, after annual reports)
Weighting | Equal weight
Transaction costs | 0.1-0.5% per trade (size-tiered)
Execution | Next-day close (MOC)
Benchmark | S&P 500 (SPY)
Data quality | Phantom holiday rows and 3-5x adjClose oscillations removed before backtest. Individual stock returns filtered: entry price > $1, single-period return < 200%.

Piotroski scores computed from raw financial statements (income statement, balance sheet, cash flow statement) at each rebalance date. No pre-computed scores used in the backtest. Point-in-time data to avoid look-ahead bias.
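To make the universe rules concrete, here is a minimal Python sketch of one rebalance under the parameters above: apply the $1B floor, keep the cheapest 20% by price-to-book, then apply the two return filters before equal weighting. The `Stock` class, the function names, and the sample data are hypothetical, and dropping non-positive P/B before ranking is an assumption. The repository's actual code may differ.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    symbol: str
    market_cap: float     # USD at the rebalance date
    price_to_book: float
    entry_price: float    # price at entry
    exit_price: float     # price one rebalance later


def value_universe(stocks, min_cap=1e9, pb_quantile=0.20):
    """Apply the market-cap floor, then keep the cheapest 20% by price-to-book."""
    # Assumption: non-positive P/B (negative book value) is dropped before ranking.
    eligible = [s for s in stocks if s.market_cap > min_cap and s.price_to_book > 0]
    eligible.sort(key=lambda s: s.price_to_book)
    keep = int(len(eligible) * pb_quantile)
    return eligible[:keep]


def equal_weight_return(stocks, min_price=1.0, max_return=2.0):
    """Equal-weight one-period return after the two data-quality filters."""
    kept = []
    for s in stocks:
        if s.entry_price <= min_price:
            continue  # entry price must be above $1
        r = s.exit_price / s.entry_price - 1
        if r >= max_return:
            continue  # single-period return must be below 200%
        kept.append(r)
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical ten-stock market; P/B rises from 0.5 to 5.0.
market = [Stock(f"S{i}", 2e9, 0.5 * (i + 1), 10.0, 11.0) for i in range(10)]
universe = value_universe(market)
print([s.symbol for s in universe], equal_weight_return(universe))
```

Transaction costs are left out because the article does not spell out the size tiers.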


What is the Piotroski F-Score?

Joseph Piotroski published his research in 2000 while at the University of Chicago. He wanted to answer a specific question: among cheap stocks, which ones recover and which ones fail?

His answer: a nine-point binary checklist using only financial statement data. One point for each criterion met.

Profitability (4 points):

  1. Positive net income
  2. Positive operating cash flow
  3. Return on assets improved from last year
  4. Cash flow exceeds net income (quality earnings)

Leverage and Liquidity (3 points):

  5. Long-term debt decreased
  6. Current ratio improved
  7. No new shares issued (no dilution)

Operating Efficiency (2 points):

  8. Gross margin improved
  9. Asset turnover improved

Score 9 = fundamentally strong. Score 0 = red flags across the board.
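The checklist translates almost directly into code. The sketch below scores one company from two consecutive fiscal years. The `Financials` fields and the `f_score` name are illustrative, not the FMP schema or the backtest's code. It assumes year-end total assets, non-zero denominators, and long-term debt compared in levels as the list above states.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    net_income: float
    operating_cash_flow: float
    total_assets: float
    long_term_debt: float
    current_assets: float
    current_liabilities: float
    shares_outstanding: float
    gross_profit: float
    revenue: float


def f_score(cur: Financials, prev: Financials) -> int:
    """Count how many of the nine binary criteria the current year meets."""
    checks = [
        # Profitability
        cur.net_income > 0,
        cur.operating_cash_flow > 0,
        cur.net_income / cur.total_assets > prev.net_income / prev.total_assets,
        cur.operating_cash_flow > cur.net_income,
        # Leverage and liquidity
        cur.long_term_debt < prev.long_term_debt,
        cur.current_assets / cur.current_liabilities
        > prev.current_assets / prev.current_liabilities,
        cur.shares_outstanding <= prev.shares_outstanding,
        # Operating efficiency
        cur.gross_profit / cur.revenue > prev.gross_profit / prev.revenue,
        cur.revenue / cur.total_assets > prev.revenue / prev.total_assets,
    ]
    return sum(checks)


# Hypothetical company that improved on every criterion.
last_year = Financials(80, 90, 1000, 250, 350, 200, 100, 380, 1000)
this_year = Financials(120, 150, 1000, 200, 400, 200, 100, 450, 1100)
print(f_score(this_year, last_year))
```

Each criterion is worth exactly one point regardless of magnitude, which is the equal-weighting issue noted under Limitations.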


The Screens

Simple F-Score Screen

-- Piotroski F-Score: Simple Screen
-- Run at: cetaresearch.com/data-explorer

SELECT
    s.symbol,
    p.companyName,
    s.piotroskiScore,
    f.priceToBookRatioTTM as price_to_book,
    k.marketCap / 1000000000 as market_cap_billions
FROM scores s
JOIN key_metrics_ttm k ON s.symbol = k.symbol
JOIN financial_ratios_ttm f ON s.symbol = f.symbol
JOIN profile p ON s.symbol = p.symbol
WHERE
    s.piotroskiScore >= 8
    AND k.marketCap > 1000000000
    AND p.exchange IN ('NYSE', 'NASDAQ', 'AMEX')
ORDER BY s.piotroskiScore DESC, f.priceToBookRatioTTM ASC
LIMIT 50

Returns up to 50 US-listed stocks scoring 8 or 9 with market cap above $1B, sorted by score and then by lowest price-to-book. The pre-calculated score saves you from running nine separate queries.

Run this query on Ceta Research

Advanced Screen (Value + Quality)

-- Piotroski F-Score: Value + Quality Screen
-- Run at: cetaresearch.com/data-explorer

SELECT
    s.symbol,
    p.companyName,
    s.piotroskiScore,
    f.priceToBookRatioTTM as price_to_book,
    f.priceToEarningsRatioTTM as pe_ratio,
    k.currentRatioTTM,
    f.debtToEquityRatioTTM as debt_to_equity,
    k.returnOnEquityTTM * 100 as roe_pct,
    k.marketCap / 1000000000 as market_cap_billions
FROM scores s
JOIN key_metrics_ttm k ON s.symbol = k.symbol
JOIN financial_ratios_ttm f ON s.symbol = f.symbol
JOIN profile p ON s.symbol = p.symbol
WHERE
    s.piotroskiScore >= 7
    AND f.priceToBookRatioTTM > 0
    AND f.priceToBookRatioTTM < 1.5
    AND f.priceToEarningsRatioTTM > 0
    AND f.priceToEarningsRatioTTM < 20
    AND k.marketCap > 1000000000
    AND p.exchange IN ('NYSE', 'NASDAQ', 'AMEX')
    AND p.sector NOT IN ('Financial Services')
ORDER BY s.piotroskiScore DESC, f.priceToBookRatioTTM ASC
LIMIT 30

Lowers the score threshold to 7 and adds price-to-book (between 0 and 1.5) and P/E (between 0 and 20) filters to match Piotroski's original value stock focus. Excludes financials, where book value comparisons are less meaningful.

Run this query on Ceta Research


What We Found

Growth of $1 invested in Piotroski Score 8-9 vs Score 0-2 vs S&P 500 from 1997 to 2024

Growth of $1: Score 8-9 (high quality value), Score 0-2 (low quality value), All Value (bottom 20% P/B), and S&P 500. 1997-2024.

The headline: the F-Score loses to the S&P 500 in large-cap US value.

Score 8-9 stocks returned 6.1% annually. The S&P 500 returned 9.4%. That's -3.3 percentage points per year over 28 years. Compounded, $1 in Score 8-9 grew to $5.22 vs $12.35 in SPY. The selection signal does not produce alpha in this universe.
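The terminal values and the annual rates are two views of the same compounding. As a quick check, the sketch below backs the CAGR out of the quoted growth-of-$1 figures, assuming 28 annual compounding periods. The function name is illustrative.

```python
def cagr(terminal_value, years, start_value=1.0):
    """Compound annual growth rate implied by a start and end value."""
    return (terminal_value / start_value) ** (1 / years) - 1


score_89 = cagr(5.22, 28)   # growth of $1 in Score 8-9
spy = cagr(12.35, 28)       # growth of $1 in SPY

print(f"Score 8-9: {score_89:.2%}")
print(f"SPY:       {spy:.2%}")
print(f"Excess:    {score_89 - spy:+.2%}")
```

The implied rates round to the 6.1% and 9.4% quoted above. Going the other way, compounding the rounded 6.1% for 28 years gives a figure slightly above $5.22, which is rounding in the published rate rather than a discrepancy.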

Portfolio | CAGR | Volatility | Sharpe | Sortino | Max Drawdown | % Negative Years
--- | --- | --- | --- | --- | --- | ---
Score 8-9 | 6.1% | 24.8% | 0.164 | 0.289 | -48.1% | 36%
All Value | 7.7% | 27.8% | 0.204 | 0.425 | -51.4% | 29%
Score 0-2 | 4.9% | 33.8% | 0.086 | 0.185 | -71.9% | 39%
S&P 500 | 9.4% | 21.5% | 0.343 | 0.711 | -39.3% | 21%

The Score 8-9 vs Score 0-2 spread is +1.2 percentage points per year. Score 8-9 wins, but the gap is small.

Three observations matter more than the headline:

First: Score 8-9 still beats Score 0-2 on risk-adjusted returns. Sharpe of 0.164 vs 0.086. Sortino of 0.289 vs 0.185. The high-quality basket runs at lower volatility (24.8% vs 33.8%) and has a much smaller max drawdown (-48.1% vs -71.9%). The F-Score is sorting cheap stocks correctly within the value universe.

Second: the avoidance signal is +2.8%. The full value universe returned 7.7% a year against 4.9% for the Score 0-2 basket, so the bottom tier trails its own universe by 2.8 points. The F-Score's most reliable role is telling you what to skip.

Third: the selection signal is -1.6%. Restricting to Score 8-9 stocks dropped CAGR from 7.7% (all value) to 6.1%. The top tier of F-Score winners isn't where the value-universe alpha lives.
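For readers who want to compute the risk statistics in the table on their own return series, here is a minimal sketch using only the standard library. The article does not state the risk-free rate, the sampling frequency, or the exact Sortino convention behind its figures, so the `risk_free` default, the use of period returns as given, and the downside-deviation formula are all assumptions. The sample returns are made up.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def sortino_ratio(returns, risk_free=0.0):
    """Mean excess return divided by downside deviation (below-target moves only)."""
    excess = [r - risk_free for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return statistics.mean(excess) / downside  # undefined if no period is negative


def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


# Hypothetical period returns, not the backtest's series.
sample = [0.10, -0.20, 0.30, -0.10]
print(statistics.stdev(sample), sharpe_ratio(sample),
      sortino_ratio(sample), max_drawdown(sample))
```

A drawdown measured on annual returns will usually look milder than one measured on daily prices, so the sampling frequency matters when comparing against the table.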

Avoidance vs selection alpha

  • Selection alpha (Score 8-9 vs All Value): -1.6% per year
  • Avoidance alpha (All Value vs Score 0-2): +2.8% per year

The pattern is clear. The F-Score's value in large-cap US is in the floor it sets, not the ceiling. Buying only Score 8-9 stocks costs you alpha. Avoiding Score 0-2 stocks captures alpha.
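The two alphas are simple differences of the CAGR figures in the results table, and together they add up to the Score 8-9 vs Score 0-2 spread. A short sketch of the arithmetic, with variable names chosen for illustration:

```python
# CAGR figures as reported in the results table.
cagr = {"Score 8-9": 0.061, "All Value": 0.077, "Score 0-2": 0.049}

selection_alpha = cagr["Score 8-9"] - cagr["All Value"]   # top tier vs universe
avoidance_alpha = cagr["All Value"] - cagr["Score 0-2"]   # universe vs bottom tier
spread = cagr["Score 8-9"] - cagr["Score 0-2"]            # top tier vs bottom tier

print(f"Selection alpha: {selection_alpha:+.1%}")
print(f"Avoidance alpha: {avoidance_alpha:+.1%}")
print(f"Spread:          {spread:+.1%}")
```

Because the spread is the sum of the two components, a small positive spread can hide a negative selection effect behind a larger avoidance effect, which is what happens here.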

Score 8-9 vs the S&P 500

Metric | Value
--- | ---
Excess CAGR | -3.31%
Up Capture | 85.0%
Down Capture | 101.9%
Beta | 0.794
Alpha | -1.79%

Score 8-9 captured 85% of the SPY upside but 102% of the downside. The negative alpha (-1.79%) means after adjusting for beta, the portfolio still underperformed. This is the opposite of what most readers expect from a quality-tilted value screen.
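As a rough guide to how these benchmark-relative metrics are typically computed, the sketch below derives beta, Jensen's alpha, and capture ratios from paired period returns. Capture ratios have several conventions (arithmetic vs geometric averages), and the article does not say which one it uses or what risk-free rate it applies, so treat the formulas, the function names, and the sample data as assumptions.

```python
import statistics


def beta(portfolio, benchmark):
    """Slope of portfolio returns on benchmark returns (covariance / variance)."""
    mean_p = statistics.mean(portfolio)
    mean_b = statistics.mean(benchmark)
    cov = sum((p - mean_p) * (b - mean_b) for p, b in zip(portfolio, benchmark))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    return cov / var


def jensens_alpha(portfolio, benchmark, risk_free=0.0):
    """Average excess return left over after accounting for beta exposure."""
    b = beta(portfolio, benchmark)
    return (statistics.mean(portfolio) - risk_free) - b * (
        statistics.mean(benchmark) - risk_free
    )


def capture_ratio(portfolio, benchmark, up=True):
    """Average portfolio return over average benchmark return in up (or down) periods."""
    pairs = [(p, b) for p, b in zip(portfolio, benchmark) if (b > 0 if up else b < 0)]
    return statistics.mean(p for p, _ in pairs) / statistics.mean(b for _, b in pairs)


# Hypothetical returns: the portfolio is 0.8x the benchmark plus 1 point.
bench = [0.10, -0.10, 0.20, -0.20]
port = [0.09, -0.07, 0.17, -0.15]
print(beta(port, bench), jensens_alpha(port, bench),
      capture_ratio(port, bench, up=True), capture_ratio(port, bench, up=False))
```

A down capture above 100% alongside a beta below 1 is possible because capture ratios look only at the sign of the benchmark period, while beta weighs every period by its size.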

The All Value baseline tells you the universe is the problem

The unfiltered value universe (bottom 20% P/B, market cap > $1B) returned 7.7%. SPY returned 9.4%. The value premium isn't showing up in large-cap US over this window. The F-Score didn't break here. The universe did.

Decade breakdown

Annual returns for Score 8-9, Score 0-2, and S&P 500

Annual returns by year. The spread between Score 8-9 and Score 0-2 varies widely. Score 0-2 comes out ahead in three of the four periods, though the 2010s gap is only 0.3 points.

Decade | Score 8-9 Avg | Score 0-2 Avg | Spread
--- | --- | --- | ---
1990s | -1.2% | 9.6% | -10.8%
2000s | 12.3% | 9.2% | +3.0%
2010s | 3.0% | 3.3% | -0.3%
2020-25 | 21.3% | 24.2% | -2.9%

The 2000s is the only decade with a clear positive spread (+3.0%). In the 1990s, Score 0-2 beat Score 8-9 by 10.8 points, but with the backtest starting in 1997 that bucket covers only three years. The 2010s were a wash. The 2020s favored Score 0-2 because beaten-down value stocks bounced harder coming out of COVID.
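A decade table like this can be built by bucketing annual returns by decade and averaging. The sketch below assumes a simple arithmetic mean per bucket, which is what "Avg" suggests but the article does not confirm. The returns are made up for illustration.

```python
from collections import defaultdict


def decade_averages(annual_returns):
    """Arithmetic mean of annual returns, grouped by calendar decade."""
    buckets = defaultdict(list)
    for year, r in annual_returns.items():
        buckets[year // 10 * 10].append(r)
    return {decade: sum(v) / len(v) for decade, v in sorted(buckets.items())}


# Hypothetical annual returns keyed by rebalance year.
returns = {1997: 0.10, 1998: -0.20, 1999: 0.40, 2000: 0.05, 2001: 0.15}
for decade, avg in decade_averages(returns).items():
    print(f"{decade}s: {avg:+.1%}")
```

Buckets with few years, such as a 1990s bucket that starts in 1997, are dominated by one or two observations, so their averages carry less weight than the full decades.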

Portfolio concentration

Post-2010, the Score 8-9 basket averaged about 50 stocks per year. Earlier decades had thinner coverage. The portfolio carries meaningful concentration risk in the early sample, less so post-2010.
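Under equal weighting, the effect of one holding on the portfolio's annual return is its own return divided by the number of holdings. A tiny sketch of that arithmetic, where the 10-stock case is a hypothetical thin year rather than a count from the backtest:

```python
def single_stock_impact(n_stocks, stock_return):
    """Contribution of one equal-weighted holding to the portfolio return."""
    return stock_return / n_stocks


# One stock doubling (+100%) in a 50-stock basket vs a hypothetical 10-stock basket.
print(f"{single_stock_impact(50, 1.0):.1%}")
print(f"{single_stock_impact(10, 1.0):.1%}")
```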


When It Works and When It Struggles

Works as an avoidance filter:

  • Score 0-2 stocks trail the unfiltered value universe by 2.8 points per year (4.9% vs 7.7% CAGR).
  • Score 0-2 stocks experienced a -71.9% max drawdown. Avoiding that hole alone justifies the screen.

Struggles as a selector:

  • Score 8-9 stocks trail the unfiltered value universe by 1.6 points per year.
  • Speculative recoveries (1990s, 2020-22) reward beaten-down low-score stocks more than improving high-score ones.

The market cap constraint matters. Piotroski's original research focused on small and micro-cap value stocks, where the F-Score effect was strongest. Our $1B market cap threshold restricts the universe to large-cap value, where the effect is documented to be weakest. Institutional coverage, analyst attention, and market efficiency all reduce the informational advantage of the F-Score in large caps.


Run This Screen Yourself

The current Piotroski screen for US stocks is live on Ceta Research:

cetaresearch.com/data-explorer

The screen runs on live FMP data. Results update as new annual filings come in.

Reproduce the backtest

Full backtest code is on GitHub: ceta-research/backtests

git clone https://github.com/ceta-research/backtests.git
cd backtests
pip install -r requirements.txt
export CR_API_KEY="your_key_here"

# Run the Piotroski screen on current data
python3 piotroski/screen.py

# Run the full backtest
python3 piotroski/backtest.py --preset us --verbose

Limitations

Large-cap bias. The $1B market cap floor puts this squarely in large-cap territory. Piotroski's original edge was strongest in small and micro caps where information asymmetry is greatest. Our results likely understate the effect documented in the academic literature for smaller stocks.

Backward-looking. The F-Score measures what happened last year. It doesn't predict disruption, regulatory changes, or competitive shifts.

Equal weighting of criteria. A 0.1% improvement in gross margin gets the same point as doubling cash flow. Not all improvements are equal.

Underperformance vs SPY. Score 8-9 trails the S&P 500 by 3.31% per year. If you can buy an index fund, the F-Score doesn't help your CAGR in large-cap US.

Concentration risk. With smaller stock counts in earlier years, single-stock outcomes can move the annual return substantially.

No sector balancing. The screen doesn't control for sector concentration. Some years may over-weight a single industry.


References

  • Piotroski, J. (2000). "Value Investing: The Use of Historical Financial Statement Information to Separate Winners from Losers." Journal of Accounting Research, 38 (suppl.), 1-41.
  • Asness, C., Frazzini, A. & Pedersen, L. (2019). "Quality Minus Junk." Review of Accounting Studies, 24(1), 34-112.
  • Fama, E. & French, K. (1993). "Common Risk Factors in the Returns on Stocks and Bonds." Journal of Financial Economics, 33(1), 3-56.

Data: Ceta Research, FMP financial data. US exchanges (NYSE, NASDAQ, AMEX). Backtest: 1997-2024 (28 years), annual rebalance (April 1), equal weight, value universe (bottom 20% P/B, market cap > $1B). Transaction costs: 0.1-0.5% per trade (size-tiered). Next-day close (MOC) execution. Data quality guards: phantom holiday rows removed, individual stock returns filtered for adjClose artifacts.