# CTA 1-Day Return Prediction

Experiments for predicting 1-day returns of CTA (Commodity Trading Advisor) futures.

## Data

- **Features**: alpha158, hffactor
- **Labels**: Return indicators (o2c_twap1min, o2o_twap1min, etc.)
- **Normalization**: dual (a weighted blend of zscore, cs_zscore, rolling_20 and rolling_60)

## Notebooks

| Notebook | Purpose |
|----------|---------|
| `01_data_check.ipynb` | Load and validate CTA data |
| `02_label_analysis.ipynb` | Explore label distributions and blending |
| `03_baseline_xgb.ipynb` | Train baseline XGBoost model |
| `04_blend_comparison.ipynb` | Compare different normalization blends |

## Blend Configurations

Label blending combines four normalization methods:

- **zscore**: Normalization with a mean/std estimated once at fit time
- **cs_zscore**: Cross-sectional z-score, computed across instruments at each datetime
- **rolling_20**: Normalization over a 20-day rolling window
- **rolling_60**: Normalization over a 60-day rolling window

Predefined weights (from `qshare.config.research.cta.labels`), listed in the order [zscore, cs_zscore, rolling_20, rolling_60]; each preset sums to 1:

- `equal`: [0.25, 0.25, 0.25, 0.25]
- `zscore_heavy`: [0.5, 0.2, 0.15, 0.15]
- `rolling_heavy`: [0.1, 0.1, 0.3, 0.5]
- `cs_heavy`: [0.2, 0.5, 0.15, 0.15]
- `short_term`: [0.1, 0.1, 0.4, 0.4]
- `long_term`: [0.4, 0.2, 0.2, 0.2]

Default: [0.2, 0.1, 0.3, 0.4]

## Processors Module

The `cta_1d.src.processors` module provides Polars-based data processors that replicate Qlib's preprocessing pipeline.

### Available Processors

| Processor | Description |
|-----------|-------------|
| `DiffProcessor` | Adds diff features with a configurable period |
| `FlagMarketInjector` | Adds `market_0` and `market_1` columns based on instrument codes |
| `FlagSTInjector` | Creates an `IsST` column from ST flags |
| `ColumnRemover` | Removes specified columns |
| `FlagToOnehot` | Converts one-hot industry flags into a single index column |
| `IndusNtrlInjector` | Industry neutralization per datetime |
| `RobustZScoreNorm` | Robust z-score normalization using median/MAD |
| `Fillna` | Fills NaN values with a specified value |

### RobustZScoreNorm with Pre-fitted Parameters

The `RobustZScoreNorm` processor supports loading pre-fitted parameters from Qlib's `proc_list.proc`:

```python
from cta_1d.src.processors import RobustZScoreNorm

# Method 1: Load from a saved version (recommended)
processor = RobustZScoreNorm.from_version("csiallx_feature2_ntrla_flag_pnlnorm")

# Method 2: Pass the parameters directly.
# mean_array / std_array are the pre-fitted per-feature arrays, e.g. loaded
# from the mean_train.npy / std_train.npy files described under Parameter Extraction.
processor = RobustZScoreNorm(
    feature_cols=['KMID', 'KLEN', ...],
    use_qlib_params=True,
    qlib_mean=mean_array,
    qlib_std=std_array,
)

# Apply normalization (df is a Polars DataFrame containing feature_cols)
df = processor.process(df)
```

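For reference, this numpy sketch shows what the pre-fitted arrays hold, following our reading of Qlib's `RobustZScoreNorm`: the per-feature median serves as the "mean", 1.4826 × MAD serves as the "std" (the factor makes MAD a consistent estimate of the standard deviation for normal data), and normalized values are clipped to [-3, 3] when `clip_outlier` is on. `fit_robust_zscore` and `apply_robust_zscore` are hypothetical helpers, not the module's API, and the `EPS` value is an assumption.

```python
from __future__ import annotations

import numpy as np

MAD_TO_STD = 1.4826  # rescales MAD to a standard-deviation estimate for normal data
EPS = 1e-12  # guards against division by zero for constant features


def fit_robust_zscore(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-feature median and scaled MAD over the fit period; x has shape (n_rows, n_features)."""
    median = np.nanmedian(x, axis=0)
    mad = np.nanmedian(np.abs(x - median), axis=0)
    return median, (mad + EPS) * MAD_TO_STD


def apply_robust_zscore(
    x: np.ndarray, mean: np.ndarray, std: np.ndarray, clip_outlier: bool = True
) -> np.ndarray:
    """Normalize with pre-fitted parameters, optionally clipping to [-3, 3]."""
    z = (x - mean) / std
    return np.clip(z, -3, 3) if clip_outlier else z
```

Because the median and MAD ignore extreme values, the single outlier in a column barely moves the fitted parameters; it simply ends up clipped at 3.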
### Parameter Extraction

Extract the parameters from a Qlib `proc_list.proc` file:

```bash
python stock_1d/d033/alpha158_beta/scripts/extract_qlib_params.py \
    --proc-list /path/to/proc_list.proc \
    --version my_version
```

Output structure:

```
data/robust_zscore_params/{version}/
├── mean_train.npy    # Pre-fitted mean (330,)
├── std_train.npy     # Pre-fitted std (330,)
└── metadata.json     # Feature columns and metadata
```

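A minimal loader for one of these directories, using only numpy and the standard library. The `feature_cols` key read from `metadata.json` is a hypothetical name; check what `extract_qlib_params.py` actually writes before relying on it.

```python
from __future__ import annotations

import json
from pathlib import Path

import numpy as np


def load_robust_zscore_params(version: str, root: str | Path = "data/robust_zscore_params"):
    """Load the mean/std arrays and metadata saved under {root}/{version}/."""
    base = Path(root) / version
    mean = np.load(base / "mean_train.npy")
    std = np.load(base / "std_train.npy")
    metadata = json.loads((base / "metadata.json").read_text())
    if mean.shape != std.shape:
        raise ValueError(f"mean {mean.shape} and std {std.shape} shapes differ")
    # Hypothetical key: adjust to whatever the extraction script actually writes.
    cols = metadata.get("feature_cols")
    if cols is not None and len(cols) != mean.shape[0]:
        raise ValueError("metadata feature count does not match parameter arrays")
    return mean, std, metadata
```

The returned arrays can then be passed as `qlib_mean` / `qlib_std` to `RobustZScoreNorm` (Method 2 above).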
### Pipeline Helper Functions

```python
from cta_1d.src.processors import create_processor_pipeline, get_final_feature_columns

# feature_cols, ALPHA158_COLS and MARKET_EXT_COLS are assumed to be lists of
# column names defined elsewhere in the project.

# Create a pipeline from processor configs
pipeline = create_processor_pipeline([
    {'type': 'Diff', 'columns': ['turnover', 'free_turnover']},
    {'type': 'RobustZScoreNorm', 'feature_cols': feature_cols},
    {'type': 'Fillna', 'value': 0},
])

# Get the final feature columns after industry neutralization
final_cols = get_final_feature_columns(
    alpha158_cols=ALPHA158_COLS,
    market_ext_cols=MARKET_EXT_COLS,
)
```
