ML × LLM × FINANCIAL DATA
Bringing modern AI to financial research
AI SuperInvestor is an AI research project investigating how machine learning and large language models can process public datasets, identify trends, generate research reports, and deepen our understanding of market dynamics.
Backed by leading investors


Research agenda
Three open questions drive the work
Markets are noisy, non-stationary, and described largely in natural language. Each property poses a distinct challenge for machine learning, and each is a research track of its own.
Signal & Trend Detection
How well can supervised learning mine meaningful patterns from noisy public market data? We benchmark gradient boosting, LSTMs, and transformer architectures on trend-identification tasks across equities, rates, and macro series.
Modeling Non-Stationary Markets
Financial environments shift constantly. We study regime detection, concept drift, and adaptive modeling, measuring how quickly analytical models degrade and what it takes to keep them honest.
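One way to measure how quickly a model degrades is to feed its out-of-sample error stream into a change-detection test that flags a sustained rise in mean error. The sketch below uses the Page-Hinkley test. The choice of test, the function name `page_hinkley`, and the `delta` and `threshold` defaults are illustrative assumptions, not the project's published method. Both parameters need tuning to the scale of the error being monitored.

```python
from typing import Iterable, Optional


def page_hinkley(
    errors: Iterable[float], delta: float = 0.005, threshold: float = 1.0
) -> Optional[int]:
    """Page-Hinkley test for an upward shift in mean error.

    Returns the zero-based index at which drift is flagged, or None.
    `delta` (tolerated drift) and `threshold` (alarm level) are
    illustrative defaults, not calibrated values.
    """
    mean = 0.0     # running mean of the errors seen so far
    cum = 0.0      # cumulative deviation of errors above the running mean
    cum_min = 0.0  # smallest cumulative deviation observed
    for t, x in enumerate(errors, start=1):
        mean += (x - mean) / t       # incremental mean update
        cum += x - mean - delta      # grows only when errors exceed the mean by more than delta
        cum_min = min(cum_min, cum)
        if cum - cum_min > threshold:
            return t - 1             # drift detected at this observation
    return None
```

For example, a model whose error jumps from 0.1 to 0.5 halfway through a 100-step stream is flagged a few steps after the jump, while a constant error stream never triggers an alarm.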
Financial Language Understanding
Filings, transcripts, and news encode market context in text. We evaluate how large language models extract, summarize, and reason over financial documents, and where they hallucinate.
What we're building
Analytical tools for open financial research
In the spirit of open platforms like Qlib, FinGPT, and OpenBB, the project develops reusable software for studying markets, built around public data and reproducible methodology.
Data Pipeline
An ingestion and normalization layer that cleans public datasets such as SEC filings, market prices, macro indicators, and news feeds, and aligns them into research-ready form.
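Aligning series that update at different frequencies is a core part of making data research-ready. A common approach is an "as-of" join: each target date takes the latest value that was available on or before that date, so no row can see the future. The following is a minimal standard-library sketch. The function name `asof_align` is hypothetical. The sketch assumes each series is keyed by the date a value became public (its release date, not the period it describes) and is sorted by that date.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Sequence, Tuple


def asof_align(
    targets: Sequence[date], series: Sequence[Tuple[date, float]]
) -> List[Optional[float]]:
    """For each target date, return the latest value published on or before it.

    `series` must be sorted by publication date. Targets that precede the
    first observation get None rather than a back-filled value, which would
    leak future information.
    """
    pub_dates = [d for d, _ in series]
    aligned: List[Optional[float]] = []
    for t in targets:
        i = bisect_right(pub_dates, t)  # count of observations published on or before t
        aligned.append(series[i - 1][1] if i > 0 else None)
    return aligned
```

For instance, a monthly macro indicator aligned to daily price dates repeats each reading until the next release and stays empty before the first one.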
Model Benchmark Suite
A reproducible harness for comparing ML architectures on financial analysis tasks, with walk-forward validation and honest out-of-sample reporting.
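Walk-forward validation can be sketched as a splitter that only ever evaluates a model on data later than anything it was trained on. The version below uses a fixed-length rolling training window. The function name and parameters are hypothetical. An expanding window, where the training start stays at zero, is an equally common choice, and scikit-learn's `TimeSeriesSplit` provides a ready-made splitter of that kind.

```python
from typing import Iterator, Optional, Tuple


def walk_forward_splits(
    n_samples: int, train_size: int, test_size: int, step: Optional[int] = None
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for rolling walk-forward validation.

    Every test window starts immediately after its training window, so the
    model is always scored on strictly later observations.
    """
    step = step or test_size  # default: consecutive, non-overlapping test windows
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += step
```

Reporting the metric for each test window separately, instead of only a single pooled score, makes it easier to see whether performance decays over time.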
Report Generator
An LLM pipeline that converts quantitative findings into structured research reports, with every claim traceable back to the underlying public data.
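One way to enforce the rule that every claim is traceable is to make source references part of the claim's data structure and to reject any report containing a claim that cites nothing, or that cites a record missing from the data catalog. The following is a minimal sketch. `SourceRef`, `Claim`, and `untraceable_claims` are hypothetical names, and the dataset and record identifiers in the comments are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class SourceRef:
    dataset: str    # hypothetical dataset name, e.g. "prices"
    record_id: str  # key of the record the claim relies on


@dataclass
class Claim:
    text: str
    sources: List[SourceRef] = field(default_factory=list)


def untraceable_claims(claims: List[Claim], catalog: Set[SourceRef]) -> List[Claim]:
    """Return claims that cite no source or cite a source absent from the catalog."""
    return [
        c for c in claims
        if not c.sources or any(s not in catalog for s in c.sources)
    ]
```

A report generator built this way could refuse to emit a report while this list is non-empty, which turns traceability from a convention into a check.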
Market Dynamics Lab
Interactive analytics for studying how information propagates through markets: correlation structure, regime shifts, and cross-asset behavior over time.
Methodology
From public data to research insight
Source & Prepare
Public datasets, including filings, prices, macro indicators, and news, are collected, cleaned, and normalized. Data quality work is treated as first-class research, not plumbing.
Model
Statistical methods, gradient boosting, deep learning, and LLMs are applied to extract structure: patterns, correlations, anomalies, and regime changes.
Synthesize
Quantitative findings are turned into structured, readable research reports by LLM pipelines, with full traceability from every claim to its source data.
Evaluate & Iterate
Outputs are benchmarked out-of-sample, stress-tested, and peer-reviewed. Negative results are documented, because knowing what doesn't work is half the research.
Principles & scope
How the research is conducted
Public data only
All analysis is conducted on publicly available datasets. No privileged information, no proprietary feeds requiring special access.
Research, not advice
Activities are limited to research, software development, and data analytics. Nothing produced is investment advice, and no assets are managed or traded.
Honest evaluation
Out-of-sample testing, walk-forward validation, and published negative results. Overfit findings help no one.
Human + machine
Models are tools for understanding, not oracles. Every automated finding passes through human review before it becomes a research claim.
Scope of activities. AI SuperInvestor is an AI research project. Activities are limited to research, software development, and data analytics. The project does not provide investment advice, manage assets, execute trades, or offer financial services of any kind. Nothing on this site should be construed as a recommendation to buy or sell any security.