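Add GT MTF profile/calibration and the 04 matching/simulation/live-trading pipeline.
Includes GT trade-point analysis from 3-minute to daily candles (03c), a fix to leg fill ordering, a 90%-of-total-assets validation loop, a walk-forward Go/No-Go simulation, monitor and live_trader, and reference documents. Co-authored-by: Cursor <cursoragent@cursor.com>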
deepcoin/matching/gt_comparison.py
@@ -0,0 +1,383 @@
"""
Comparison report: ground truth (450 trade points) vs. rule fires and simulation results.
"""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd

from config import MATCH_GT_TOLERANCE_MIN
from deepcoin.ground_truth.ground_truth import load_ground_truth
from deepcoin.matching.select_rules import (
    _rule_metrics,
    _split_train_valid_holdout,
    gt_overlap_report,
)
from deepcoin.paths import (
    MATCHING_FIRE_OUTCOMES,
    MATCHING_GT_COMPARISON_HTML,
    MATCHING_GT_COMPARISON_JSON,
    MATCHING_MATCHED_RULES,
    MATCHING_SIMULATION_JSON,
    resolve_ground_truth_file,
)


def _precision_near_gt(
    fire_ts: pd.Series,
    gt_ts: pd.Series,
    tolerance: pd.Timedelta,
) -> dict[str, Any]:
    """
    Fraction of fires that fall within ±tolerance of a GT trade point (precision proxy).

    Args:
        fire_ts: Rule fire timestamps.
        gt_ts: GT timestamps.
        tolerance: Allowed timedelta.

    Returns:
        Dict with near_count, fire_count, precision.
    """
    if fire_ts.empty:
        return {"near_count": 0, "fire_count": 0, "precision": 0.0}
    gt_sorted = gt_ts.sort_values()
    near = 0
    # Count fires whose nearest GT timestamp lies within the tolerance.
    for fts in fire_ts:
        if (gt_sorted - fts).abs().min() <= tolerance:
            near += 1
    n = len(fire_ts)
    return {
        "near_count": near,
        "fire_count": n,
        "precision": round(near / n, 4) if n else 0.0,
    }


def _matched_pairs(
    fires: pd.DataFrame,
    gt_df: pd.DataFrame,
    rule_id: str,
    tolerance: pd.Timedelta,
) -> pd.DataFrame:
    """
    Pair each GT trade point with the nearest fire of the same rule and side, with both returns.

    Args:
        fires: fire_outcomes.
        gt_df: GT trades DataFrame.
        rule_id: Rule ID.
        tolerance: Matching tolerance.

    Returns:
        DataFrame of matched rows.
    """
    sub = fires[fires["rule_id"] == rule_id].copy()
    # An empty gt_df has no "action" column, so return early instead of raising KeyError.
    if sub.empty or gt_df.empty:
        return pd.DataFrame()
    side = sub["side"].iloc[0]
    g = gt_df[gt_df["action"] == side].copy()
    g["ts"] = pd.to_datetime(g["dt"])
    sub["ts"] = pd.to_datetime(sub["dt"])
    rows: list[dict[str, Any]] = []
    for _, gt_row in g.iterrows():
        gts = pd.Timestamp(gt_row["ts"])
        delta = (sub["ts"] - gts).abs()
        # Skip GT points with no fire inside the tolerance window.
        if delta.empty or delta.min() > tolerance:
            continue
        idx = delta.idxmin()
        fr = sub.loc[idx]
        rows.append(
            {
                "side": side,
                "rule_id": rule_id,
                "gt_dt": str(gt_row["dt"]),
                "fire_dt": str(fr["dt"]),
                "delta_min": round(delta.min().total_seconds() / 60, 2),
                "gt_forward_pct": float(gt_row.get("forward_return_pct") or 0),
                "sim_leg_gt_pct": float(fr["forward_ret_pct"]),
                "split": fr.get("split"),
            }
        )
    return pd.DataFrame(rows)


def build_gt_comparison_report(
    outcomes_path: Path | None = None,
    matched_path: Path | None = None,
    gt_path: Path | None = None,
    sim_path: Path | None = None,
    tolerance_min: int = MATCH_GT_TOLERANCE_MIN,
) -> dict[str, Any]:
    """
    Build the GT vs. fires/simulation comparison dict.

    Args:
        outcomes_path: fire_outcomes.csv.
        matched_path: matched_rules.json.
        gt_path: ground_truth_trades.json.
        sim_path: simulation_report.json.
        tolerance_min: GT matching tolerance (minutes).

    Returns:
        gt_comparison_report dict.
    """
    op = outcomes_path or MATCHING_FIRE_OUTCOMES
    mp = matched_path or MATCHING_MATCHED_RULES
    if not op.is_file():
        raise FileNotFoundError(f"fire_outcomes not found: {op}")

    outcomes = pd.read_csv(op)
    outcomes["ts"] = pd.to_datetime(outcomes["dt"])
    outcomes["split"] = _split_train_valid_holdout(outcomes)
    matched: dict[str, Any] = {}
    if mp.is_file():
        matched = json.loads(mp.read_text(encoding="utf-8"))

    sim_report: dict[str, Any] = {}
    sp = sim_path or MATCHING_SIMULATION_JSON
    if sp.is_file():
        sim_report = json.loads(sp.read_text(encoding="utf-8"))

    gt_data = load_ground_truth(gt_path or resolve_ground_truth_file()) or {}
    gt_trades = gt_data.get("trades") or []
    gt_df = pd.DataFrame(gt_trades)
    tol = pd.Timedelta(minutes=tolerance_min)

    gt_baseline: dict[str, Any] = {
        "total": len(gt_df),
        "buy": int((gt_df["action"] == "buy").sum()) if not gt_df.empty else 0,
        "sell": int((gt_df["action"] == "sell").sum()) if not gt_df.empty else 0,
    }
    # The loop below overwrites the "buy"/"sell" counts with per-side stats dicts;
    # the per-side count is kept under the "count" key.
    for side in ("buy", "sell"):
        sub = gt_df[gt_df["action"] == side] if not gt_df.empty else pd.DataFrame()
        if sub.empty or "forward_return_pct" not in sub.columns:
            gt_baseline[side] = {}
            continue
        r = sub["forward_return_pct"].astype(float)
        gt_baseline[side] = {
            "mean_forward_pct": round(float(r.mean()), 4),
            "median_forward_pct": round(float(r.median()), 4),
            "win_rate": round(float((r > 0).mean()), 4),
            "count": int(len(r)),
        }

    all_fires = outcomes.copy()
    if "rule_id" not in all_fires.columns:
        all_fires["rule_id"] = "all"
    overlap_all = gt_overlap_report(
        all_fires.drop_duplicates(subset=["dt", "side"]),
        gt_trades,
        tolerance_min=tolerance_min,
    )

    per_rule: list[dict[str, Any]] = []
    pair_stats: list[dict[str, Any]] = []
    for rid in sorted(outcomes["rule_id"].unique()):
        sub = outcomes[outcomes["rule_id"] == rid]
        side = str(sub["side"].iloc[0])
        # An empty gt_df has no "action" column, so skip the filter in that case.
        gt_side = gt_df[gt_df["action"] == side] if not gt_df.empty else gt_df
        gt_ts = pd.to_datetime(gt_side["dt"]) if not gt_side.empty else pd.Series(dtype="datetime64[ns]")
        fire_ts = sub["ts"]
        ov = gt_overlap_report(sub, gt_trades, tolerance_min=tolerance_min)
        prec = _precision_near_gt(fire_ts, gt_ts, tol)
        m_all = _rule_metrics(sub)
        m_hold = _rule_metrics(sub[sub["split"] == "holdout"])

        pairs = _matched_pairs(outcomes, gt_df, rid, tol)
        pair_row: dict[str, Any] = {"rule_id": rid, "side": side, "pair_count": len(pairs)}
        # Correlation needs at least two matched pairs.
        if len(pairs) >= 2:
            corr = pairs["gt_forward_pct"].corr(pairs["sim_leg_gt_pct"])
            pair_row["corr_gt_vs_sim"] = round(float(corr), 4) if pd.notna(corr) else None
            pair_row["mean_abs_diff_pct"] = round(
                float((pairs["gt_forward_pct"] - pairs["sim_leg_gt_pct"]).abs().mean()),
                4,
            )
            pair_row["mean_delta_min"] = round(float(pairs["delta_min"].mean()), 2)
        pair_stats.append(pair_row)

        # Split this rule's fires into those near a GT point and those far from any.
        near_mask = []
        for fts in fire_ts:
            near_mask.append(
                not gt_ts.empty and (gt_ts - fts).abs().min() <= tol
            )
        sub_near = sub.loc[near_mask] if near_mask else sub.iloc[0:0]
        sub_far = sub.loc[[not x for x in near_mask]] if near_mask else sub

        per_rule.append(
            {
                "rule_id": rid,
                "side": side,
                "fire_count": int(len(sub)),
                "gt_recall": ov.get(side, {}).get("recall", 0),
                "gt_matched": ov.get(side, {}).get("matched", 0),
                "gt_count": ov.get(side, {}).get("gt_count", 0),
                "precision_near_gt": prec["precision"],
                "fires_near_gt": prec["near_count"],
                "sim_ev_all_pct": m_all.get("ev_pct"),
                "sim_ev_near_gt_pct": _rule_metrics(sub_near).get("ev_pct") if len(sub_near) else None,
                "sim_ev_far_gt_pct": _rule_metrics(sub_far).get("ev_pct") if len(sub_far) else None,
                "sim_win_rate": m_all.get("win_rate"),
                "sim_profit_factor": m_all.get("profit_factor"),
                "holdout_ev_pct": m_hold.get("ev_pct"),
                "holdout_count": m_hold.get("count"),
            }
        )

    monitor_ids = [r["rule_id"] for r in matched.get("monitor_rules", [])]
    monitor_summary = [r for r in per_rule if r["rule_id"] in monitor_ids]

    go = sim_report.get("go_no_go", {})

    return {
        "tolerance_min": tolerance_min,
        "label_mode": matched.get("label_mode"),
        "gt_baseline": gt_baseline,
        "gt_overlap_all_fires_dedup": overlap_all,
        "gt_overlap_matched_json": matched.get("gt_overlap"),
        "per_rule": per_rule,
        "pair_alignment": pair_stats,
        "monitor_rules": monitor_summary,
        "simulation_go_no_go": {
            "go": go.get("go"),
            "checks": go.get("checks", []),
            "live_cap_taken_ratio": go.get("live_cap_taken_ratio"),
        },
        "notes": [
            "gt_overlap_matched_json: based on the fires of all rules combined at step 04 selection (before removing duplicate dt).",
            "per_rule.gt_recall: GT trade-point coverage from this rule's fires alone.",
            "precision_near_gt: share of fires within GT ± tolerance (lower means more noise).",
            "gt_forward_pct vs sim_leg_gt_pct: the leg_gt label and the GT JSON forward_return_pct may be defined differently.",
        ],
    }


def write_gt_comparison_html(report: dict[str, Any], out_path: Path) -> Path:
    """
    Save gt_comparison_report.html.

    Args:
        report: Result of build_gt_comparison_report.
        out_path: HTML output path.

    Returns:
        out_path.
    """
    def _rows(items: list[dict], cols: list[str]) -> str:
        lines = []
        for it in items:
            cells = "".join(f"<td>{it.get(c, '')}</td>" for c in cols)
            lines.append(f"<tr>{cells}</tr>")
        return "\n".join(lines)

    pr_cols = [
        "rule_id", "side", "fire_count", "gt_recall", "precision_near_gt",
        "sim_ev_all_pct", "sim_ev_near_gt_pct", "sim_ev_far_gt_pct", "holdout_ev_pct",
    ]
    go = report.get("simulation_go_no_go", {})
    go_flag = "GO" if go.get("go") else "NO-GO"
    gb = report.get("gt_baseline", {})
    # gb["buy"] / gb["sell"] hold per-side stats dicts, so the summary line reads their "count".
    html = f"""<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8"/>
<title>GT vs Simulation Comparison</title>
<style>
body {{ font-family: "Malgun Gothic", Arial, sans-serif; margin: 24px; max-width: 1100px; }}
table {{ border-collapse: collapse; width: 100%; margin: 12px 0; font-size: 0.9rem; }}
th, td {{ border: 1px solid #ccc; padding: 6px 8px; text-align: right; }}
th {{ background: #e2e8f0; text-align: center; }}
td:first-child, th:first-child {{ text-align: left; }}
h2 {{ margin-top: 28px; }}
.warn {{ color: #b45309; }}
</style></head><body>
<h1>Ground Truth vs. Rules and Simulation</h1>
<p>Tolerance: ±{report.get('tolerance_min')} min · Label: {report.get('label_mode')}</p>
<p><strong>Simulation Go/No-Go: {go_flag}</strong></p>

<h2>GT baseline (forward_return_pct)</h2>
<p>Total {gb.get('total')} (buy {gb.get('buy', {}).get('count', 0)} / sell {gb.get('sell', {}).get('count', 0)})</p>
<table>
<thead><tr><th>Category</th><th>Count</th><th>Mean forward %</th><th>Median</th><th>Win rate</th></tr></thead>
<tbody>
<tr><td>Buy GT</td><td>{gb.get('buy', {}).get('count', '')}</td>
<td>{gb.get('buy', {}).get('mean_forward_pct', '')}</td>
<td>{gb.get('buy', {}).get('median_forward_pct', '')}</td>
<td>{gb.get('buy', {}).get('win_rate', '')}</td></tr>
<tr><td>Sell GT</td><td>{gb.get('sell', {}).get('count', '')}</td>
<td>{gb.get('sell', {}).get('mean_forward_pct', '')}</td>
<td>{gb.get('sell', {}).get('median_forward_pct', '')}</td>
<td>{gb.get('sell', {}).get('win_rate', '')}</td></tr>
</tbody></table>

<h2>Per-rule GT recall / precision / EV</h2>
<table>
<thead><tr>{''.join(f'<th>{c}</th>' for c in pr_cols)}</tr></thead>
<tbody>{_rows(report.get('per_rule', []), pr_cols)}</tbody>
</table>

<h2>monitor_rules (live monitoring and simulation targets)</h2>
<table>
<thead><tr>{''.join(f'<th>{c}</th>' for c in pr_cols)}</tr></thead>
<tbody>{_rows(report.get('monitor_rules', []), pr_cols)}</tbody>
</table>

<h2>GT–fire return alignment (±{report.get('tolerance_min')} min)</h2>
<table>
<thead><tr><th>rule</th><th>side</th><th>pairs</th><th>corr</th><th>mean|diff|%</th><th>mean Δmin</th></tr></thead>
<tbody>
{''.join(
    f"<tr><td>{p['rule_id']}</td><td>{p['side']}</td><td>{p['pair_count']}</td>"
    f"<td>{p.get('corr_gt_vs_sim','')}</td><td>{p.get('mean_abs_diff_pct','')}</td>"
    f"<td>{p.get('mean_delta_min','')}</td></tr>"
    for p in report.get('pair_alignment', [])
)}
</tbody></table>

<h2>Simulation validation (monitor)</h2>
<pre>{json.dumps(go, ensure_ascii=False, indent=2)}</pre>

<h2>Notes</h2>
<ul>
{''.join(f'<li>{n}</li>' for n in report.get('notes', []))}
</ul>
</body></html>"""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(html, encoding="utf-8")
    return out_path


def run_gt_comparison_report(
    outcomes_path: Path | None = None,
    matched_path: Path | None = None,
) -> dict[str, Any]:
    """
    Build and save the GT comparison report.

    Args:
        outcomes_path: fire_outcomes.csv.
        matched_path: matched_rules.json.

    Returns:
        report dict.
    """
    report = build_gt_comparison_report(outcomes_path, matched_path)
    MATCHING_GT_COMPARISON_JSON.parent.mkdir(parents=True, exist_ok=True)
    MATCHING_GT_COMPARISON_JSON.write_text(
        json.dumps(report, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    write_gt_comparison_html(report, MATCHING_GT_COMPARISON_HTML)
    print(f"[GT comparison] saved: {MATCHING_GT_COMPARISON_JSON}")
    print(f"[GT comparison] saved: {MATCHING_GT_COMPARISON_HTML}")
    for m in report.get("monitor_rules", []):
        print(
            f"  {m['rule_id']}: recall={m['gt_recall']:.1%} prec={m['precision_near_gt']:.1%} "
            f"fires={m['fire_count']} EV={m['sim_ev_all_pct']}% holdout={m['holdout_ev_pct']}%"
        )
    go = report.get("simulation_go_no_go", {})
    print(f"[GT comparison] simulation result: {'GO' if go.get('go') else 'NO-GO'}")
    return report
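
Review note: `_precision_near_gt` and the `near_mask` loop rescan every GT timestamp for each fire. The sketch below shows one possible vectorized alternative using a sorted lookup. It assumes tz-naive timestamps in both series. The names `nearest_gt_gap` and `precision_near_gt_vectorized` are hypothetical and not part of this commit.

```python
import numpy as np
import pandas as pd


def nearest_gt_gap(fire_ts: pd.Series, gt_ts: pd.Series) -> pd.Series:
    """Absolute gap from each fire to its nearest GT timestamp (hypothetical helper)."""
    fires = pd.to_datetime(fire_ts).to_numpy(dtype="datetime64[ns]")
    gt = np.sort(pd.to_datetime(gt_ts).to_numpy(dtype="datetime64[ns]"))
    # Insertion position of each fire in the sorted GT array.
    pos = np.searchsorted(gt, fires)
    # Candidate neighbours: the GT point just before and just after each fire.
    left = gt[np.clip(pos - 1, 0, len(gt) - 1)]
    right = gt[np.clip(pos, 0, len(gt) - 1)]
    gap = np.minimum(np.abs(fires - left), np.abs(fires - right))
    return pd.Series(gap, index=fire_ts.index)


def precision_near_gt_vectorized(
    fire_ts: pd.Series,
    gt_ts: pd.Series,
    tolerance: pd.Timedelta,
) -> dict:
    """Same output keys as _precision_near_gt, computed without a Python loop."""
    n = len(fire_ts)
    if n == 0 or len(gt_ts) == 0:
        return {"near_count": 0, "fire_count": n, "precision": 0.0}
    near = int((nearest_gt_gap(fire_ts, gt_ts) <= tolerance).sum())
    return {"near_count": near, "fire_count": n, "precision": round(near / n, 4)}


if __name__ == "__main__":
    fires = pd.Series(pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:10", "2024-01-01 01:00"]
    ))
    gt = pd.Series(pd.to_datetime(["2024-01-01 00:03", "2024-01-01 00:58"]))
    print(precision_near_gt_vectorized(fires, gt, pd.Timedelta(minutes=5)))
```

The boolean mask `nearest_gt_gap(...) <= tol` could also replace the `near_mask` list when splitting fires into near and far subsets, since it keeps the index of `fire_ts`.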