From a pile of papers to tested ideas in minutes.

Outsample reads the most relevant work in your corpus and returns strategies, research programs, and answers with real citations. It shows you where the literature disagrees, stress-tests every idea it proposes, and tells you when the evidence is not there. Start from the included 14,500-paper library or upload your own.

Start free · See the methodology

No card. 3 full synthesis runs included.

14,500+ papers in the corpus. 7 research programs in 61 s for $0.09. 21 adversarial findings on the last strategy run.

The bottleneck is not access. You already have the papers. SSRN adds more in a week than you can read in a month.

The cost is synthesis. Reading fifty papers to find the three disagreements that matter, pulling out what is actually implementable, turning it into something you can test. That takes weeks per idea, and most ideas die in the data, so most of those weeks go to disproving things a careful read would have flagged on day one.

The shortcut everyone tries is pasting abstracts into a general chatbot. It answers fluently, cites papers that do not exist, and never once says the evidence is not there.

I once spent the better part of a month on market microstructure and orderbook convexity. I got through six papers. Two were usable, and the strategy I built on them was mediocre, because two papers is not a foundation.

A real run

Weeks to minutes is a measurement, not a slogan.

On June 11 we asked for research programs on intraday momentum and microstructure-based short-horizon signals. Sixty-one seconds and nine cents later: seven scoped programs, each with a thesis, specific research questions, seed papers from the corpus, and a rationale for why the gap exists. The screenshot below is that run, untouched.

Outsample Suggest-a-Project run. Seven priority-ranked research topics returned from the corpus, each with a thesis and seed papers, from a real session on 2026-06-11. — Real run, 2026-06-11. 60.9 s wall time, $0.0864 in model cost, 50 papers retrieved, 7 ranked topics out.

Each program routes into the next step. A topic becomes a project, a project generates strategies, every strategy gets an adversarial review before you ever see it.

How it works

Point it at a corpus. Ask. Get back the work product.

With the cost and the reading list attached.

outsample session · measured medians, june 2026

❯ ask "what drives drift bursts in index futures"

answer in ~30 s · $0.05 · 12 citations, grounding counts attached

❯ disagree "momentum decay vs persistence"

paper-vs-paper map in ~25 s · $0.04 · the axis named, or an honest bail

❯ suggest --focus "short-horizon microstructure"

7 research programs in ~60 s · $0.09 · thesis, questions, seed papers

❯ strategies --adversarial

3 strategies in 3 to 4 min · $0.18 · flaws flagged before you see them

❯ audit --paper 1291

methodology audit in ~30 s · $0.06 · lookahead risk, scored and located

❯

Every answer shows what it read and what it spent. (The product is a web app; the session above is its vocabulary.)

Outsample Ask-the-corpus run. The drift-burst question answered with a multi-paragraph synthesis and 12 paper citations, from a real session on 2026-06-11. — Real run, 2026-06-11. The ask above, in the product. 34.28 s wall time, $0.0501 in model cost, 12 citations retrieved and used.

Adversarial review

Every strategy gets a hostile review before you see it.

Outsample Strategy generator with Devil’s Advocate critique. The full dashboard: three strategy cards, each with categorized adversarial findings and a verdict. — Real run, 2026-06-11. 234.31 s wall time, $0.1802 in model cost, 3 strategies generated, 21 adversarial findings.

Not a disclaimer. A structured critique across lookahead, overfitting, regime dependence, costs, scale, implementation, and statistical validity. Each finding carries a severity and a concrete mitigation. The run above produced 21 findings across three strategies, and one verdict was needs-revision. The product will tell you to abandon its own idea.

HIGH, LOOKAHEAD

The t-statistic is described as computed over a 'rolling 1-minute accumulation window' of 5-second returns. In a real-time system, the statistic at decision time T uses bars [T-60s, T-55s, ..., T-5s, T]. If the implementation includes the bar closing AT T (i.e., the last 5-second bar whose close equals the trigger timestamp), the signal uses the price move that just completed the burst, which is fine. But the description says 'rolling 5-second returns over a 1-minute window' without specifying whether the window is closed at T or at T-5s. Any implementation that includes a partially-formed 5-second bar or uses the T bar's close price to both detect the burst AND as the entry reference price creates a subtle lookahead: the entry is at the price that already reflects the burst bottom, which is unreachable in practice because detection latency guarantees the fill is at a worse price. The volume condition ('cumulative volume during the burst window') similarly must use only volume up to T-1 bars, not the bar being detected.

mitigation: Explicitly define the t-statistic window as [T-65s, T-5s] (last complete 5-second bar before submission), and stamp volume accumulation up to the same endpoint. Document in code as 'signal computed at bar close T-5s; order submitted at T.' Test that the signal timestamp index is strictly less than the execution timestamp.
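The mitigation above can be sketched in a few lines. This is a minimal illustration, not Outsample's code or the strategy's actual implementation: the function name `burst_tstat`, the `(close_timestamp, close_price)` bar layout, and the choice of a plain mean-over-standard-error t-statistic are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from math import sqrt
from statistics import mean, stdev

BAR = timedelta(seconds=5)  # assumed bar length, taken from the finding


def burst_tstat(bar_closes, decision_time, n_returns=12):
    """t-statistic of 5-second returns over the window [T-65s, T-5s].

    bar_closes: list of (close_timestamp, close_price), sorted ascending.
    Only bars that closed at or before T-5s are used, so the bar closing
    at T (or any partially formed bar) never reaches the signal.
    Returns (signal_timestamp, t_statistic).
    """
    cutoff = decision_time - BAR
    usable = [(ts, px) for ts, px in bar_closes if ts <= cutoff]
    window = usable[-(n_returns + 1):]  # 13 closes give 12 returns
    if len(window) < n_returns + 1:
        raise ValueError("not enough complete bars before the cutoff")

    signal_ts = window[-1][0]
    # The check the mitigation asks for: signal strictly precedes execution.
    assert signal_ts < decision_time, "signal must predate order submission"

    rets = [b[1] / a[1] - 1.0 for a, b in zip(window, window[1:])]
    sd = stdev(rets)
    if sd == 0:
        return signal_ts, 0.0
    return signal_ts, mean(rets) / (sd / sqrt(len(rets)))
```

A useful regression test is to change the price of the bar that closes at T and confirm the returned statistic does not move. Volume accumulation would need the same cutoff.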

MEDIUM, COSTS

At 15:30 entry, ES is liquid (tight spread, ~0.25-point bid-ask), so entry slippage is modest: 1 tick ($12.50) is a reasonable estimate. The exit at 16:00 is more problematic: the strategy exits at 'MOC equivalent for futures.' ES futures do not have a formal MOC order type; the CME cash-settle reference (SP) settles at 15:15 ET, not 16:00. The ES futures pit close is 16:15 ET. The strategy seems to assume 16:00 ET exit, which in practice requires a market order during the final cash equity MOC imbalance window—a period of elevated slippage (1–3 ticks in normal conditions, 5+ ticks on high-volume close days). For NQ futures the spread is similar but notional per tick is higher ($20/tick vs $12.50 for ES), making cost drag more significant. At 2 ES + 1 NQ per trade, round-trip costs ≈ $60–$120 per day traded.

mitigation: Use actual 15:58–16:02 ET ES VWAP from historical data as the exit price proxy instead of the 16:00 last trade price. Compare the per-trade P&L distribution pre- and post-cost to confirm positive expectancy survives.
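As a rough sketch of that mitigation, the exit proxy and the pre- and post-cost comparison might look like the following. The helper names `window_vwap` and `expectancy`, the `(timestamp, price, size)` trade layout, and the assumption that timestamps are already in ET are all hypothetical. The comparison here is reduced to mean P&L per trade; a fuller check would look at the whole distribution.

```python
from datetime import datetime, time


def window_vwap(trades, start=time(15, 58), end=time(16, 2)):
    """Volume-weighted average price of trades inside [start, end].

    trades: iterable of (timestamp, price, size); timestamps are assumed
    to be naive datetimes already expressed in ET.
    """
    notional = 0.0
    volume = 0
    for ts, price, size in trades:
        if start <= ts.time() <= end:
            notional += price * size
            volume += size
    if volume == 0:
        raise ValueError("no trades in the exit window")
    return notional / volume


def expectancy(pnls, cost_per_trade):
    """Mean per-trade P&L before and after a flat round-trip cost."""
    if not pnls:
        raise ValueError("no trades to evaluate")
    gross = sum(pnls) / len(pnls)
    net = sum(p - cost_per_trade for p in pnls) / len(pnls)
    return gross, net
```

Running `expectancy` once at each end of the $60 to $120 range quoted in the finding shows whether the edge survives the pessimistic case, not just the optimistic one.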

And when the evidence is not there, it says so.

Asked where the literature disagrees on momentum decay, it retrieved 295 candidate papers, gated them to 50, and found exactly two that engage each other directly. The answer says that out loud:

"No other paper pairs in the corpus engage directly enough."

You get that sentence instead of a confident guess. That is the whole point.

Outsample Disagreement run on cross-sectional momentum. The answer body names the corpus coverage limit and surfaces the single real disagreement pair, from a real session on 2026-06-11. — Real run, 2026-06-11. Cross-sectional momentum, quick mode. 295 retrieved, gated to 50, 2 cited, 1 pair. 23.98 s, $0.0407 on the first run (shown here reloaded from cache, $0).

Why not a generalist

Elicit and Consensus find you papers. General chatbots talk about them. Outsample is vertical to quantitative finance and returns the work product: the strategy with its flaws already flagged, the research program scoped, the disagreement mapped to the papers that disagree.

It is also honest about scope. Each answer reasons across the most relevant retrieved work, not your whole corpus, and shows you exactly which papers it used. If you want a tool that claims to read everything, this is not it.

166 themes with thirty or more papers each, including market microstructure (347), causal inference (431), portfolio optimization (257), and behavioral finance (289).

Check your niche

Pricing

Free

25 papers, 10 questions a month, 3 full synthesis runs. No card.

Pro

$39 / month. Annual is two months free.

Unlimited queries and synthesis, 1,000 papers, $15 of model credit included monthly. MCP access for your agents.

Team

$129 / month

Five seats, 10,000 papers, $60 of credit included.

Model usage beyond included credit is prepaid, itemized, and previewed before every heavy run. Every query shows what it cost, and nothing can bill past the credit you bought. Full details, including the booster math: /pricing

From the builder

Before this, my research process was a folder of SSRN PDFs and a rough reading plan. Ten or twenty pages on a good day, and half of that was background reading just to understand a paper's premise. When weeks of that produced one strategy, and the strategy failed, I accepted that reading hundreds of papers one at a time was never going to find my ideas.

So I built the tool I wanted: I generate research programs from my corpus, expand the ones worth expanding, trace which papers connect, and find both the central papers and the niche ones I would have missed.

Honestly: it runs on a model and on the corpus you give it. I can't promise you a six-figure strategy. It makes the search faster and wider. The judgment is still yours.

Mario

Outsample. Because in-sample results lie.

Start free. No card. 3 full synthesis runs included, and the product will tell you if your question is not answerable from the corpus.

See the methodology