Formulaic Factor Optimization
The closed-loop execution and evaluation engine that bridges LLM-generated factor formulas with real market performance signals — built on Qlib backtesting infrastructure.
FFO (Formulaic Factor Optimization) is the execution backbone of AlphaBench. It wraps the Microsoft Qlib backtesting framework and exposes a clean, model-agnostic API that any LLM agent can call to validate, execute, and get feedback on alpha factor formulas.
When an LLM generates a factor expression such as `Div(Sub($close, Ref($close, 5)), Ref($close, 5))`, FFO parses it, checks syntax, runs it against real market data (CSI300 or SP500), and returns structured performance metrics back to the model. This forms the closed-loop feedback essential for iterative factor improvement.
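For intuition, the example expression is a 5-day momentum factor: the fractional price change over the last five bars. A minimal pandas sketch of the same computation (the series here is illustrative toy data, not FFO's internal representation):

```python
import pandas as pd

# Illustrative close-price series for one instrument
close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.6, 10.8, 11.0])

# Qlib DSL: Div(Sub($close, Ref($close, 5)), Ref($close, 5))
# Ref($close, 5) is the close price 5 bars ago, so the whole
# expression is the 5-period fractional return.
ref5 = close.shift(5)
factor = (close - ref5) / ref5

# First 5 entries are NaN (no price 5 bars back yet)
print(factor.tolist())
```

The first five values are NaN because `Ref($close, 5)` is undefined until five bars of history exist; Qlib handles this the same way via its warm-up period.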
FFO Execution Pipeline
FFO handles every step from raw formula string to structured performance feedback, allowing LLMs to focus entirely on factor ideation and refinement.
- **Formula validation:** Automatically parses Qlib DSL expressions and checks structural validity before any computation occurs. Returns structured error messages — operator mismatches, undefined functions, type errors — that LLMs can act on directly for self-repair.
- **Performance metrics:** Computes IC (Information Coefficient), RankIC, ICIR, and annualized returns for each factor on real market data spanning 2020–2025, covering both CSI300 (China A-share) and SP500 (US equities) universes.
- **Structured feedback:** Provides structured backtesting responses — metric values, error traces, and comparative rankings against existing pool factors — enabling LLMs to perform closed-loop refinement across Chain-of-Experience, Tree-of-Thought, and EA paradigms.
- **Cross-market support:** Seamlessly switches between market configurations — CSI300 (China) and SP500 (US) — without changing the factor formula or API call, enabling direct cross-market generalization studies.
- **Cost tracking:** Records per-call token usage and estimated API costs throughout the pipeline. This powers AlphaBench's cost-aware evaluation axis, allowing fair comparison of models under real-world deployment budgets.
- **Unified API:** Exposes a unified interface compatible with all three AlphaBench searching paradigms (CoE, ToT, EA), making it trivial to swap LLM backends, adjust market settings, or integrate FFO into custom research pipelines.
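To illustrate the kind of structured, actionable error the validation step returns, here is a toy sketch. The error class, its fields, and the parenthesis-only check are illustrative stand-ins, not FFO's actual schema or parser:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FactorSyntaxError:
    """Illustrative structured error, modeled on the fields FFO reports."""
    error_type: str    # e.g. "operator_mismatch", "undefined_function"
    context: str       # the offending sub-expression
    hint: str          # suggested fix the LLM can act on

def check_parentheses(formula: str) -> Optional[FactorSyntaxError]:
    """Toy structural check: balanced parentheses only."""
    depth = 0
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return FactorSyntaxError(
                    error_type="operator_mismatch",
                    context=formula[max(0, i - 10): i + 1],
                    hint="Unmatched ')': remove it or add a matching '('.",
                )
    if depth > 0:
        return FactorSyntaxError(
            error_type="operator_mismatch",
            context=formula[-10:],
            hint=f"{depth} unclosed '(': add the missing ')'.",
        )
    return None

# Missing the final closing paren of Div(...)
err = check_parentheses("Div(Sub($close, Ref($close, 5)), Ref($close, 5)")
print(err.error_type if err else "valid")  # → operator_mismatch
```

Because the error names the failure type and points at the offending span, the LLM can repair the formula without a human in the loop.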
Every FFO call returns a structured result containing the following fields, which are passed directly into LLM prompts as feedback context.
| Field | Type | Description |
|---|---|---|
| ic | float | Information Coefficient — Pearson correlation between factor values and next-period returns. Range [−1, 1]; higher is better. |
| rank_ic | float | Rank IC — Spearman rank correlation; more robust to outliers than Pearson IC. Primary signal quality measure in AlphaBench. |
| icir | float | IC Information Ratio — mean IC divided by standard deviation of IC across periods. Measures signal stability over time. |
| annualized_return | float | Long-short portfolio annualized return based on top/bottom decile factor sort on each rebalancing date. |
| valid | bool | Whether the formula string passed syntax validation and executed without runtime errors on the market data. |
| error_msg | str | null | Structured error message returned when valid=false. Includes error type, operator context, and suggested fix hints. |
| tokens_used | int | Total tokens consumed in the LLM call that produced this factor (prompt + completion). Used for cost-aware benchmarking. |
| pool_rank | int | null | Rank of this factor among the current factor pool by RankIC. Provided during searching tasks to guide comparative refinement. |
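The correlation-based metrics in the table can be sketched from first principles. A minimal numpy version, assuming factor values and next-period returns are already aligned per date (function names are illustrative, and ties in the rank step are not handled):

```python
import numpy as np

def pearson_ic(factor: np.ndarray, fwd_ret: np.ndarray) -> float:
    """IC: Pearson correlation between factor values and next-period returns."""
    return float(np.corrcoef(factor, fwd_ret)[0, 1])

def rank_ic(factor: np.ndarray, fwd_ret: np.ndarray) -> float:
    """RankIC: Spearman correlation, i.e. Pearson correlation of the ranks.
    (argsort-of-argsort ranking; assumes no tied values)."""
    ranks = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson_ic(ranks(factor), ranks(fwd_ret))

def icir(period_ics: np.ndarray) -> float:
    """ICIR: mean IC divided by the standard deviation of IC across periods."""
    return float(np.mean(period_ics) / np.std(period_ics))

# Toy cross-section: a weakly predictive factor
rng = np.random.default_rng(0)
f = rng.normal(size=300)
r = 0.1 * f + rng.normal(size=300)
print(round(pearson_ic(f, r), 3), round(rank_ic(f, r), 3))
```

In practice these correlations are computed cross-sectionally on each rebalancing date and then averaged, which is also what the ICIR's per-period IC series refers to.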
FFO can be used independently as a standalone factor backtesting utility, or embedded as the evaluation engine inside any LLM-driven factor search loop.
```python
from ffo import FFOClient

# Initialize FFO with your market configuration
ffo = FFOClient(
    market="csi300",            # "csi300" or "sp500"
    start_date="2020-01-01",
    end_date="2025-01-01",
)

# Evaluate a single factor formula
result = ffo.evaluate(
    formula="Div(Sub($close, Ref($close, 5)), Ref($close, 5))",
    tokens_used=412,
)

# Inspect results
print(result.valid)              # True
print(result.rank_ic)            # 0.043
print(result.icir)               # 0.81
print(result.annualized_return)  # 0.127

# Use structured feedback in your LLM prompt
feedback = ffo.format_feedback(result)
print(feedback)
# → "RankIC: 0.043 | ICIR: 0.81 | Annualized Return: 12.7%"
# → "Pool rank: 12 / 30 — above median. Consider …"
```
All three searching paradigms (CoE, ToT, EA) call `ffo.evaluate()` identically; FFO's unified interface is what makes cross-paradigm comparison fair and reproducible.
All evaluations are seeded and logged. Any factor result can be reproduced exactly from the saved formula string and config.
FFO places no constraints on the upstream LLM. Any model that emits a valid formula string — via any framework — integrates immediately.
FFO ships as a standalone package. It can power any formulaic DSL evaluation task — symbolic regression, feature engineering, or constraint optimization.
FFO is open-source and available as part of the AlphaBench codebase. Start evaluating your LLM-generated alpha factors today.