Formulaic Factor Optimization
The closed-loop execution and evaluation engine that bridges LLM-generated factor formulas with real market performance signals — built on Qlib backtesting infrastructure.
FFO (Formulaic Factor Optimization) is the execution backbone of AlphaBench. It wraps the Microsoft Qlib backtesting framework and exposes a clean, model-agnostic API that any LLM agent can call to validate, execute, and get feedback on alpha factor formulas.
When an LLM generates a factor expression such as `Div(Sub($close, Ref($close, 5)), Ref($close, 5))`, FFO parses it, checks syntax, runs it against real market data (CSI300 or SP500), and returns structured performance metrics back to the model. This forms the closed-loop feedback essential for iterative factor improvement.
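For intuition, the example expression is a 5-day momentum factor: the fractional price change over the last five bars. A minimal pandas sketch of the same computation (the series here is illustrative toy data, not FFO's internal representation):

```python
import pandas as pd

# Illustrative close-price series for one instrument
close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.6, 10.8, 11.0])

# Qlib DSL: Div(Sub($close, Ref($close, 5)), Ref($close, 5))
# Ref($close, 5) is the close price 5 bars ago, so the whole
# expression is the 5-period fractional return.
ref5 = close.shift(5)
factor = (close - ref5) / ref5

# First 5 entries are NaN (no price 5 bars back yet)
print(factor.tolist())
```

The first five values are NaN because `Ref($close, 5)` is undefined until five bars of history exist; Qlib handles this the same way via its warm-up period.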
FFO Execution Pipeline
FFO handles every step from raw formula string to structured performance feedback, allowing LLMs to focus entirely on factor ideation and refinement.
- **Formula validation:** Automatically parses Qlib DSL expressions and checks structural validity before any computation occurs. Returns structured error messages — operator mismatches, undefined functions, type errors — that LLMs can act on directly for self-repair.
- **Performance metrics:** Computes IC (Information Coefficient), RankIC, ICIR, and annualized returns for each factor on real market data spanning 2020–2025, covering both CSI300 (China A-share) and SP500 (US equities) universes.
- **Structured feedback:** Provides structured backtesting responses — metric values, error traces, and comparative rankings against existing pool factors — enabling LLMs to perform closed-loop refinement across Chain-of-Experience, Tree-of-Thought, and EA paradigms.
- **Cross-market support:** Seamlessly switches between market configurations — CSI300 (China) and SP500 (US) — without changing the factor formula or API call, enabling direct cross-market generalization studies.
- **Cost tracking:** Records per-call token usage and estimated API costs throughout the pipeline. This powers AlphaBench's cost-aware evaluation axis, allowing fair comparison of models under real-world deployment budgets.
- **Unified API:** Exposes a unified interface compatible with all three AlphaBench searching paradigms (CoE, ToT, EA), making it trivial to swap LLM backends, adjust market settings, or integrate FFO into custom research pipelines.
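To illustrate the kind of structured, actionable error the validation step returns, here is a toy sketch. The error class, its fields, and the parenthesis-only check are illustrative stand-ins, not FFO's actual schema or parser:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FactorSyntaxError:
    """Illustrative structured error, modeled on the fields FFO reports."""
    error_type: str    # e.g. "operator_mismatch", "undefined_function"
    context: str       # the offending sub-expression
    hint: str          # suggested fix the LLM can act on

def check_parentheses(formula: str) -> Optional[FactorSyntaxError]:
    """Toy structural check: balanced parentheses only."""
    depth = 0
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return FactorSyntaxError(
                    error_type="operator_mismatch",
                    context=formula[max(0, i - 10): i + 1],
                    hint="Unmatched ')': remove it or add a matching '('.",
                )
    if depth > 0:
        return FactorSyntaxError(
            error_type="operator_mismatch",
            context=formula[-10:],
            hint=f"{depth} unclosed '(': add the missing ')'.",
        )
    return None

# Missing the final closing paren of Div(...)
err = check_parentheses("Div(Sub($close, Ref($close, 5)), Ref($close, 5)")
print(err.error_type if err else "valid")  # → operator_mismatch
```

Because the error names the failure type and points at the offending span, the LLM can repair the formula without a human in the loop.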
Every FFO call returns a structured result containing the following fields, which are passed directly into LLM prompts as feedback context.
| Field | Type | Description |
|---|---|---|
| ic | float | Information Coefficient — Pearson correlation between factor values and next-period returns. Range [−1, 1]; higher is better. |
| rank_ic | float | Rank IC — Spearman rank correlation; more robust to outliers than Pearson IC. Primary signal quality measure in AlphaBench. |
| icir | float | IC Information Ratio — mean IC divided by standard deviation of IC across periods. Measures signal stability over time. |
| annualized_return | float | Long-short portfolio annualized return based on top/bottom decile factor sort on each rebalancing date. |
| valid | bool | Whether the formula string passed syntax validation and executed without runtime errors on the market data. |
| error_msg | str | null | Structured error message returned when valid=false. Includes error type, operator context, and suggested fix hints. |
| tokens_used | int | Total tokens consumed in the LLM call that produced this factor (prompt + completion). Used for cost-aware benchmarking. |
| pool_rank | int | null | Rank of this factor among the current factor pool by RankIC. Provided during searching tasks to guide comparative refinement. |
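The correlation-based metrics in the table can be sketched from first principles. A minimal numpy version, assuming factor values and next-period returns are already aligned per date (function names are illustrative, and ties in the rank step are not handled):

```python
import numpy as np

def pearson_ic(factor: np.ndarray, fwd_ret: np.ndarray) -> float:
    """IC: Pearson correlation between factor values and next-period returns."""
    return float(np.corrcoef(factor, fwd_ret)[0, 1])

def rank_ic(factor: np.ndarray, fwd_ret: np.ndarray) -> float:
    """RankIC: Spearman correlation, i.e. Pearson correlation of the ranks.
    (argsort-of-argsort ranking; assumes no tied values)."""
    ranks = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson_ic(ranks(factor), ranks(fwd_ret))

def icir(period_ics: np.ndarray) -> float:
    """ICIR: mean IC divided by the standard deviation of IC across periods."""
    return float(np.mean(period_ics) / np.std(period_ics))

# Toy cross-section: a weakly predictive factor
rng = np.random.default_rng(0)
f = rng.normal(size=300)
r = 0.1 * f + rng.normal(size=300)
print(round(pearson_ic(f, r), 3), round(rank_ic(f, r), 3))
```

In practice these correlations are computed cross-sectionally on each rebalancing date and then averaged, which is also what the ICIR's per-period IC series refers to.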
FFO can be used independently as a standalone factor backtesting utility, or embedded as the evaluation engine inside any LLM-driven factor search loop.
```python
from ffo import FFOClient

# Initialize FFO with your market configuration
ffo = FFOClient(
    market="csi300",            # "csi300" or "sp500"
    start_date="2020-01-01",
    end_date="2025-01-01",
)

# Evaluate a single factor formula
result = ffo.evaluate(
    formula="Div(Sub($close, Ref($close, 5)), Ref($close, 5))",
    tokens_used=412,
)

# Inspect results
print(result.valid)              # True
print(result.rank_ic)            # 0.043
print(result.icir)               # 0.81
print(result.annualized_return)  # 0.127

# Use structured feedback in your LLM prompt
feedback = ffo.format_feedback(result)
print(feedback)
# → "RankIC: 0.043 | ICIR: 0.81 | Annualized Return: 12.7%"
# → "Pool rank: 12 / 30 — above median. Consider …"
```
All three searching paradigms (CoE, ToT, EA) call `ffo.evaluate()` identically; FFO's unified interface is what makes cross-paradigm comparison fair and reproducible.
All evaluations are seeded and logged. Any factor result can be reproduced exactly from the saved formula string and config.
FFO places no constraints on the upstream LLM. Any model that emits a valid formula string — via any framework — integrates immediately.
FFO ships as a standalone package. It can power any formulaic DSL evaluation task — symbolic regression, feature engineering, or constraint optimization.
FFO is open-source and available as part of the AlphaBench codebase. Start evaluating your LLM-generated alpha factors today.