# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
Alpha Lab is a quantitative research experiment framework built on the qshare library. It takes a notebook-centric approach to exploring trading strategies and ML models. The codebase is organized around two prediction tasks:

- **cta_1d**: CTA (Commodity Trading Advisor) futures 1-day return prediction
- **stock_15m**: Stock 15-minute forward return prediction using high-frequency features
## Directory Structure

```
alpha_lab/
├── common/                 # Shared utilities
│   ├── __init__.py
│   ├── paths.py            # Path management
│   └── plotting.py         # Common plotting functions
│
├── cta_1d/                 # CTA 1-day return prediction
│   ├── __init__.py         # Re-exports from src/
│   ├── config.yaml         # Task configuration
│   ├── src/                # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py       # CTA1DLoader
│   │   ├── train.py        # Training functions
│   │   ├── backtest.py     # Backtest functions
│   │   └── labels.py       # Label blending utilities
│   └── *.ipynb             # Experiment notebooks
│
├── stock_15m/              # Stock 15-minute return prediction
│   ├── __init__.py         # Re-exports from src/
│   ├── config.yaml         # Task configuration
│   ├── src/                # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py       # Stock15mLoader
│   │   └── train.py        # Training functions
│   └── *.ipynb             # Experiment notebooks
│
└── results/                # Output directory (gitignored)
```
## Common Commands

### Development Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Create environment configuration
cp .env.template .env
# Edit .env with your DolphinDB host and data paths
```
### Running Experiments

```bash
# Start Jupyter for interactive experiments
jupyter notebook

# Train CTA model from config
python -m cta_1d.train --config cta_1d/config.yaml --output results/cta_1d/exp01

# Train Stock 15m model
python -m stock_15m.train --config stock_15m/config.yaml --output results/stock_15m/exp01

# Run CTA backtest
python -m cta_1d.backtest \
    --model results/cta_1d/exp01/model.json \
    --dt-range 2023-01-01 2023-12-31 \
    --output results/cta_1d/backtest_01
```
### Python API Usage

```python
# CTA 1D workflow
from cta_1d import CTA1DLoader, train_model, TrainConfig

loader = CTA1DLoader(return_type='o2c_twap1min', normalization='dual')
dataset = loader.load(dt_range=['2020-01-01', '2023-12-31'])

config = TrainConfig(dt_range=['2020-01-01', '2023-12-31'], feature_sets=['alpha158'])
model, metrics = train_model(config, output_dir='results/exp01')

# Stock 15m workflow
from stock_15m import Stock15mLoader, train_model, TrainConfig

loader = Stock15mLoader(normalization_mode='dual')
dataset = loader.load(
    dt_range=['2020-01-01', '2023-12-31'],
    feature_path='/data/parquet/stock_1min_alpha158',
    kline_path='/data/parquet/stock_1min_kline',
)
```
## Architecture

### Module Organization
All implementation code lives in `src/` subdirectories:

- `cta_1d/src/`: CTA-specific implementations
  - `loader.py`: `CTA1DLoader` class
  - `train.py`: `train_model`, `TrainConfig`
  - `backtest.py`: `run_backtest`, `BacktestConfig`
  - `labels.py`: Label blending utilities
- `stock_15m/src/`: Stock-specific implementations
  - `loader.py`: `Stock15mLoader` class
  - `train.py`: `train_model`, `TrainConfig`

Root `__init__.py` files re-export public APIs for backward compatibility:

```python
from cta_1d import CTA1DLoader  # Imports from cta_1d.src
```
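The re-export layer might look like the following sketch, using only names documented here; the actual `cta_1d/__init__.py` may differ:

```python
# Hypothetical cta_1d/__init__.py — names taken from this doc, layout assumed.
from .src.loader import CTA1DLoader
from .src.train import train_model, TrainConfig
from .src.backtest import run_backtest, BacktestConfig

__all__ = ['CTA1DLoader', 'train_model', 'TrainConfig',
           'run_backtest', 'BacktestConfig']
```

This keeps `from cta_1d import CTA1DLoader` working for older notebooks even though the implementation has moved into `src/`.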
### Data Flow

Both tasks follow a consistent pattern:

1. **Loaders** (`src/loader.py`): Fetch data from DolphinDB (CTA) or Parquet files (Stock), apply normalization, compute sample weights, and return a `pl_Dataset`
2. **Training** (`src/train.py`): XGBoost with early stopping; outputs model JSON + metrics
3. **Backtest** (`src/backtest.py`): CTA-only; uses `qshare.eval.cta.backtest.CTABacktester` for strategy simulation
### Key Classes

- `CTA1DLoader`: Loads alpha158/hffactor features from DolphinDB; supports 5 normalization modes (`zscore`, `cs_zscore`, `rolling_20`, `rolling_60`, `dual`)
- `Stock15mLoader`: Loads Alpha158 on 1-min data; computes 15-min forward returns; normalization modes: `industry`, `cs_zscore`, `dual`
- `pl_Dataset`: From `qshare.data`; provides `.with_segments()`, `.split()`, `.to_numpy()` methods
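A hypothetical sketch of how these classes might fit together; the `.with_segments()`/`.split()`/`.to_numpy()` call shapes below are illustrative guesses, not the documented qshare signatures:

```python
from cta_1d import CTA1DLoader

loader = CTA1DLoader(return_type='o2c_twap1min', normalization='dual')
dataset = loader.load(dt_range=['2020-01-01', '2023-12-31'])

# Tag train/test segments, then materialize arrays for XGBoost.
# (Argument shapes here are assumptions.)
dataset = dataset.with_segments(train=['2020-01-01', '2022-06-30'],
                                test=['2022-07-01', '2023-12-31'])
train_ds, test_ds = dataset.split()
X_train, y_train = train_ds.to_numpy()
```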
### Normalization Modes

**CTA 1D** (dual blending):

- `zscore`: Fit-time mean/std normalization
- `cs_zscore`: Cross-sectional z-score per datetime
- `rolling_20`/`rolling_60`: Rolling window normalization
- `dual`: Weighted blend (default weights: `[0.2, 0.1, 0.3, 0.4]`)
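An illustrative sketch (not the repo's actual code) of how a `dual` blend could combine the four normalizations, assuming the default weights `[0.2, 0.1, 0.3, 0.4]` map to `[zscore, cs_zscore, rolling_20, rolling_60]` in that order:

```python
import numpy as np

def dual_blend(zscore, cs_zscore, rolling_20, rolling_60,
               weights=(0.2, 0.1, 0.3, 0.4)):
    """Weighted sum of the four normalized feature arrays (weights sum to 1)."""
    parts = (zscore, cs_zscore, rolling_20, rolling_60)
    return sum(w * np.asarray(p) for w, p in zip(weights, parts))

# Identical inputs pass through unchanged because the weights sum to 1.
x = np.array([1.5, -0.5, 0.0])
blended = dual_blend(x, x, x, x)
```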
**Stock 15m**:

- `industry`: Industry-neutralized returns
- `cs_zscore`: Cross-sectional z-score
- `dual`: 80% industry-neutral + 20% cs_zscore
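The stock-side `dual` mode is a simpler fixed 80/20 mix; a minimal sketch (again an illustration, not the repo's code):

```python
def stock_dual_blend(industry_neutral, cs_zscore):
    """80% industry-neutralized + 20% cross-sectional z-score, per the doc."""
    return 0.8 * industry_neutral + 0.2 * cs_zscore

val = stock_dual_blend(0.5, -1.0)  # 0.8*0.5 + 0.2*(-1.0)
```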
## Experiment Tracking

Manual tracking in `results/{task}/README.md`:

```markdown
## 2025-01-15: Baseline XGB
- Notebook: `cta_1d/03_baseline_xgb.ipynb` (cells 1-50)
- Config: eta=0.5, lambda=0.1
- Train IC: 0.042
- Test IC: 0.038
- Notes: Dual normalization, 4 trades/day
```
## Dependencies on qshare

The codebase relies heavily on the qshare library (already installed in the venv):

- `qshare.data.pl_Dataset`: Dataset container with Polars backend
- `qshare.io.ddb`: DolphinDB session management
- `qshare.io.polars`: Parquet loading utilities
- `qshare.algo.polars`: Industry neutralization, cross-sectional z-score
- `qshare.eval.cta.backtest`: CTA backtesting framework
- `qshare.config.research.cta`: Predefined column lists (`HFFACTOR_COLS`)
## Configuration Files

YAML configs define data ranges, model hyperparameters, and output settings:

```yaml
data:
  dt_range: ['2020-01-01', '2023-12-31']
  feature_sets: [alpha158, hffactor]
  normalization: dual
model:
  type: xgb
  params: {eta: 0.05, max_depth: 6}
```

Load with `python -m cta_1d.train --config config.yaml`, or parse directly with `yaml.safe_load()`.
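For direct parsing, such a config comes back as plain nested dicts via `yaml.safe_load` (PyYAML, assumed available in the venv):

```python
import yaml

raw = """
data:
  dt_range: ['2020-01-01', '2023-12-31']
  feature_sets: [alpha158, hffactor]
  normalization: dual
model:
  type: xgb
  params: {eta: 0.05, max_depth: 6}
"""

cfg = yaml.safe_load(raw)
eta = cfg['model']['params']['eta']          # plain float, 0.05
features = cfg['data']['feature_sets']       # list of strings
```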