Add final BallFilter, train/valid scripts, train-derived sum filters

- final_BallFilter: CSV history loader, TRAIN_ALLOW for 6-sum and week diff,
  fix filterOneDigitPattern ball overwrite bug, drop socket call
- final_filter_params: build sum6 and abs_sum_diff from rounds 1-800
- filter_model re-exports BallFilter; train/valid evaluate pass-through counts
- final_filterTest aligned with 1_FilterTest_25 plus optional MC survivors
- README and scripts/run_with_ncue.sh for ncue workflow

Made-with: Cursor
2026-04-08 19:29:10 +09:00
parent 013206ef67
commit 52e8495148
8 changed files with 4639 additions and 725 deletions

358
README.md

@@ -1,343 +1,45 @@
# Execution order
# deeplottery
## final_BallFilter · `final_filterTest.py` (miniconda **ncue**)
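## Data splits
The thresholds are generated by `tools/compute_final_filter_params.py` from the distribution over the training split (rounds 1-800), and the results are written to `final_filter_params.py`.
| Split | Rounds |
|------|------|
| Train | `lotto_history.txt` rounds 1-800 |
| Validation | 801-1000 |
| Test | 1001 onward |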
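
The split boundaries in the table can be sketched as a small helper. This is a minimal illustration, assuming each history row has the form `no,b1,...,b6,bn` (the column layout the scripts in this commit assign) and that round numbers start at 1; `split_name` and `split_history` are hypothetical names, not functions from the repository.

```python
import csv
import io

# Split boundaries taken from the table above; the test split is open-ended.
TRAIN_NO = (1, 800)
VALID_NO = (801, 1000)


def split_name(no):
    """Map a round number (assumed >= 1) to its split name."""
    if TRAIN_NO[0] <= no <= TRAIN_NO[1]:
        return "train"
    if VALID_NO[0] <= no <= VALID_NO[1]:
        return "valid"
    return "test"


def split_history(csv_text):
    """Group rows of 'no,b1,...,b6,bn' by split: {split: [(no, balls), ...]}."""
    splits = {"train": [], "valid": [], "test": []}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        no = int(row[0])
        balls = sorted(int(x) for x in row[1:7])  # six main balls, bonus dropped
        splits[split_name(no)].append((no, balls))
    return splits


# Synthetic rows for illustration only (not real draws).
demo = (
    "1,1,2,3,4,5,6,7\n"
    "800,2,3,4,5,6,7,8\n"
    "801,3,4,5,6,7,8,9\n"
    "1000,4,5,6,7,8,9,10\n"
    "1001,5,6,7,8,9,10,11\n"
)
splits = split_history(demo)
print({name: [no for no, _ in rows] for name, rows in splits.items()})
```

Keeping the boundaries in one place makes it harder for a threshold-fitting step to read validation or test rounds by accident.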
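## Key files
- **`final_BallFilter.py`**: filter logic (based on `BallFilter_25`; loads `lotto_history.txt` as CSV; `socket` removed).
- **`final_filter_params.py`**: allow-sets for the **six-ball sum** and the **difference from the previous week's sum**, aggregated from the training split only (rounds 1-800).
- **`filter_model.py`**: re-exports `BallFilter` via `from final_BallFilter import BallFilter`.
- **`train.py` / `valid.py`**: count, per split, the rounds in which the six winning numbers passed every filter.
- **`final_filterTest.py`**: the same analysis as `1_FilterTest_25.py`, plus an optional Monte Carlo (MC) survivor estimate.
## Running (miniconda **ncue**)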
```bash
conda activate ncue
python tools/compute_final_filter_params.py
python train.py
python valid.py
python final_filterTest.py
# Monte Carlo approximation of the number of surviving combinations for a given round
python final_filterTest.py --mc-no 900 --mc-samples 12000
```
If the conda path is hard to use, you can run the same commands through the project's `scripts/run_with_ncue.sh`.
The same environment via the shell script:
```bash
./scripts/run_with_ncue.sh tools/compute_final_filter_params.py
./scripts/run_with_ncue.sh final_filterTest.py
./scripts/run_with_ncue.sh train.py
./scripts/run_with_ncue.sh valid.py
```
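* Run FilterFeature.py.
* Reads lotto_history.json and generates the all_filter_[1-100].[cluster,csv,feature] files.
## Design summary
- The **six-ball sum / previous-week sum difference** filters are fitted to the training-split distribution through `final_filter_params.TRAIN_ALLOW`.
- The remaining rules (statistics, multiples, play-slip patterns, pairs/triples, and so on) keep the same fixed rules as `BallFilter_25` so that the filter does not become overly loose.
- Fixed a bug in `filterOneDigitPattern` where the `ball` argument was overwritten by an example array.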
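
One way to read "fitted to the training-split distribution" is to allow only the values actually observed in rounds 1-800. The sketch below takes that reading as an assumption: the real contents and shape of `TRAIN_ALLOW` are not shown in this commit, the keys `sum6` and `abs_sum_diff` are borrowed from the commit message, and `build_train_allow` and `passes` are hypothetical helpers.

```python
def build_train_allow(history, train_start=1, train_end=800):
    """Collect allow-sets from training rounds only.

    history: list of (round_no, six_balls). Assumes the training rounds are
    consecutive, so neighbouring entries are one week apart.
    """
    train = sorted(
        (no, balls) for no, balls in history if train_start <= no <= train_end
    )
    sums = [sum(balls) for _, balls in train]
    # Absolute difference between each round's sum and the previous round's sum.
    diffs = [abs(cur - prev) for prev, cur in zip(sums, sums[1:])]
    return {"sum6": set(sums), "abs_sum_diff": set(diffs)}


def passes(ball, prev_ball, allow):
    """True if both the sum and the week-over-week sum difference were seen in training."""
    s, p = sum(ball), sum(prev_ball)
    return s in allow["sum6"] and abs(s - p) in allow["abs_sum_diff"]


# Synthetic history for illustration only (not real draws).
history = [
    (1, [1, 2, 3, 4, 5, 6]),
    (2, [2, 3, 4, 5, 6, 7]),
    (3, [10, 11, 12, 13, 14, 15]),
    (801, [40, 41, 42, 43, 44, 45]),  # outside the training split, ignored
]
allow = build_train_allow(history)
print(sorted(allow["sum6"]), sorted(allow["abs_sum_diff"]))
print(passes([2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6], allow))
```

Because the sets are built from rounds 1-800 only, validation and test rounds can contain sums that were never observed in training, which is what `valid.py` is meant to measure.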
* Run FilterFeatureCluster.py.
* Only 1-10 is allowed for the first number.
* For every cluster across all random_state values, count the wins and generate ./resources/cluster_win_info.csv.
* Generated files
* filtertest_1.csv: the number of clusters within each random_state.
* filtertest_2.csv: the total number of wins per cluster count within each random_state.
* filtertest_3.csv: only the first winning number per cluster count within each random_state.
## Note
* Identify the random_state and cluster numbers to run
* Select them using filtertest_2.csv and answer_pattern_analsys.xlsx.
* Update the cluster_info.json file
* Register the random_state and cluster numbers to run in JSON form.
* Run Util_filegen.py
* Generates the sh and bat files to run on the m1, amd, and intel machines.
* Only the following two parts of the Python file need to be modified.
* m1_file_max, amd_file_max, intel_file_max = 8,12,7
* m1_proc_limit, amd_proc_limit, intel_proc_limit = 124,125,110
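* Run the sh and bat files on each machine
## Operating on a ruleset (threshold configuration)
The main thresholds of `filter_model.BallFilter` (sum, mean, sum of the first three, sum of the last three, gaps, and so on) have been externalized into a **JSON ruleset**.
Experiments and tuning can now be automated by changing only the ruleset file, with no code changes.
- **Default ruleset path**: `resources/rulesets/default.json`
- **Caveats/limits**: the lottery is inherently random (independence/uniformity hypothesis), and a ruleset is a filter for reducing the number of combinations to buy. **It makes no claim of guaranteed wins or prediction.**
### Example: checking performance on valid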
```bash
python scripts/eval_filters.py \
--data valid \
--resources resources \
--ruleset resources/rulesets/default.json \
--start-no 801 --end-no 1000 \
--survivors-samples 0
```
### Example including the survivors (number of surviving combinations) approximation
```bash
python scripts/eval_filters.py \
--data valid \
--resources resources \
--ruleset resources/rulesets/default.json \
--start-no 801 --end-no 1000 \
--survivors-samples 3000
```
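## Auto-tuning → ruleset generation → batch evaluation pipeline
### 1) Train-based auto-tuning (generating candidate rulesets)
The script below tunes the thresholds by random search **on the train split only**, then
saves `Balanced.json` and `Coverage-First.json` under `resources/rulesets/`.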
```bash
python scripts/tune_ruleset.py \
--resources resources \
--base-ruleset resources/rulesets/default.json \
--out-dir resources/rulesets \
--train-start 21 --train-end 800 \
--hit-rate-min 0.01 \
--iters 200 \
--mc-samples 40000
```
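- **Coverage-First**: prioritizes minimizing survivors (the number of surviving combinations)
- **Balanced**: reduces survivors while also taking the hit rate into account
> Note: survivors is approximated by **pooled Monte Carlo** instead of enumerating all 8,145,060 combinations, so it carries sampling error.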
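
The size of that sampling error can be sketched with a plain Monte Carlo estimate and its binomial standard error. This is a simplified illustration, not the pooled variant the scripts use; `estimate_survivors` and `demo_filter` are hypothetical, and the toy sum rule stands in for the real filter chain.

```python
import math
import random

TOTAL_COMBINATIONS = math.comb(45, 6)  # 8,145,060 six-number combinations


def estimate_survivors(passes, n_samples=8000, seed=0):
    """Estimate how many of all combinations pass `passes`.

    Returns (estimated survivor count, one standard error of that count).
    """
    rng = random.Random(seed)
    numbers = list(range(1, 46))
    hits = 0
    for _ in range(n_samples):
        ball = sorted(rng.sample(numbers, 6))  # uniform random combination
        if passes(ball):
            hits += 1
    p = hits / n_samples
    std_err = math.sqrt(p * (1 - p) / n_samples)  # binomial standard error
    return TOTAL_COMBINATIONS * p, TOTAL_COMBINATIONS * std_err


def demo_filter(ball):
    """Toy stand-in for the real filters: keep combinations with a mid-range sum."""
    return 100 <= sum(ball) <= 170


est, err = estimate_survivors(demo_filter, n_samples=2000, seed=1)
print(f"about {est:,.0f} survivors, +/- {err:,.0f} (1 standard error)")
```

The standard error shrinks with the square root of the sample count, so quadrupling `n_samples` roughly halves the uncertainty.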
### 2) Batch evaluation of rulesets on the valid/train split
```bash
python scripts/eval_rulesets.py \
--resources resources \
--rulesets-dir resources/rulesets \
--data valid \
--start-no 801 --end-no 1000 \
--survivors-samples 0
```
# Query
```SQL
##### #####
with source_count as (
select source, count(*) as source_count
from cluster_info
where priority not in (99)
and source in (1,3)
group by 1
),
ball_count as (
# 1) random_state, cluster
select source, random_state, cluster, ball_cnt
from (
SELECT source, random_state, cluster, count(*)
as ball_cnt
from recommend_ball
where no=1136
and b1 > 0
group by 1,2,3
union all
SELECT source, random_state, cluster, 0 as ball_cnt
from recommend_ball
where no=1136
and b1 = 0
group by 1,2,3
) lj
),
source_rc_cluster_list as (
select ci.source, ci.random_state, ci.cluster, ci.cluster_count, ci.win_count, ci.priority, rc.source_count, bc.ball_cnt
from cluster_info ci
left join source_count rc on ci.source = rc.source
left join ball_count bc on ci.source = bc.source and ci.random_state = bc.random_state and ci.cluster = bc.cluster
where ci.priority not in (99)
and ci.source in (1,3)
),
source_process as (
select source, "done" as type, count(*) as cnt from source_rc_cluster_list
where ball_cnt is not NULL
group by 1,2
union all
select source, "yet" as type, count(*) as cnt from source_rc_cluster_list
where ball_cnt is NULL
group by 1,2
)
select source, type, cnt,
case when source=1 then concat(round(100.0 * cnt / (select source_count from source_count where source=1),2), '%')
when source=3 then concat(round(100.0 * cnt / (select source_count from source_count where source=3),2), '%')
end as rate from source_process order by 1,2
;
### ###
SELECT ci.source, ci.random_state, ci.cluster, lj.cnt
from cluster_info ci
left join (select source, random_state, cluster, count(*) as cnt from recommend_ball rb where no=1136 group by 1,2,3) lj on ci.source=lj.source and ci.random_state=lj.random_state and ci.cluster=lj.cluster
where priority not in (99)
and lj.cnt is null
order by 1,2,3
;
##### cluster #####
with raw_data as (
select rb.source, ci.priority, rb.random_state, rb.cluster, ci.cluster_count, ci.win_count, b1, count(*) as ball_cnt
from recommend_ball rb left join cluster_info ci on rb.source=ci.source and rb.random_state = ci.random_state and rb.cluster = ci.cluster
where no=1136
group by 1,2,3,4,5,6,7
),
all_cluster as (
select source, priority, random_state, cluster, ball_cnt
from raw_data
where (
(source = 1 and priority in (1,2)) or
(source = 3 and priority in (1,2))
)
group by 1,2,3,4
),
valid_total_cluster as (
select source, priority, random_state, cluster, ball_cnt
from raw_data
where (
(source = 1 and priority = 1 and
ball_cnt BETWEEN 50 and 80
) or
(source = 1 and priority = 2 and (
win_count = 12 and ball_cnt BETWEEN 50 and 80)
) or
(source = 3 and priority = 1 and
(ball_cnt BETWEEN 1 and 30 or ball_cnt BETWEEN 50 and 100)
) or
(source = 3 and priority = 2 and (
win_count=13 and (ball_cnt BETWEEN 1 and 30 or ball_cnt BETWEEN 50 and 100))
) or
(source = 1 and
((win_count between 5 and 10) and ball_cnt BETWEEN 1 and 20)
)
)
group by 1,2,3,4
),
valid_none_0_cluster as (
select source, priority, random_state, cluster, ball_cnt
from raw_data
where b1 <> 0 AND
(
(source = 1 and priority = 1 and
ball_cnt BETWEEN 50 and 80
) or
(source = 1 and priority = 2 and (
win_count = 12 and ball_cnt BETWEEN 50 and 80)
) or
(source = 3 and priority = 1 and
(ball_cnt BETWEEN 1 and 30 or ball_cnt BETWEEN 50 and 100)
) or
(source = 3 and priority = 2 and (
win_count=13 and (ball_cnt BETWEEN 1 and 30 or ball_cnt BETWEEN 50 and 100))
) or
(source = 1 and
((win_count between 5 and 10) and ball_cnt BETWEEN 1 and 20)
)
)
group by 1,2,3,4
)
#
select 1 as col, count(*) from all_cluster
union all
#
select 2 as col, count(*) from valid_total_cluster
union all
# 0
select 3 as col, count(*) from valid_none_0_cluster
;
##### #####
select b1,b2,b3,b4,b5,b6,count(*) as ball_cnt
from recommend_ball
where no=1136
and b1>0
group by 1,2,3,4,5,6
order by 7 desc;
##### #####
with priority as (
select source, random_state, cluster, cluster_count, win_count, priority
from cluster_info
where priority not in (99)
),
recommend as (
select source, random_state, cluster, b1,b2,b3,b4,b5,b6
from recommend_ball
where b1 > 0
and no=1136
),
recommend_count as (
select source, random_state, cluster, count(*) as ball_cnt
from recommend_ball
where b1 > 0
and no=1136
group by 1,2,3
),
raw_data as (
select r.source, r.random_state, r.cluster, p.cluster_count, p.win_count, p.priority, r.b1,r.b2,r.b3,r.b4,r.b5,r.b6, rc.ball_cnt
from recommend r
left join priority p on r.source=p.source and r.random_state=p.random_state and r.cluster=p.cluster
left join recommend_count rc on r.source=rc.source and r.random_state=rc.random_state and r.cluster=rc.cluster
),
candidate as (
select source, random_state, cluster, cluster_count, win_count, priority, b1,b2,b3,b4,b5,b6, ball_cnt
from raw_data
where (
(source = 0 and b1=7)
or (source = 1 and priority=-1 and ball_cnt<=140 and (
b1 not in (13, 19, 28)
and b2 not in (13, 19, 28)
and b3 not in (13, 19, 28)
and b4 not in (13, 19, 28)
and b5 not in (13, 19, 28)
and b6 not in (13, 19, 28)
)
)
or (source = 3 and priority=-1 and ball_cnt<=150 and (
b1 not in (13, 19, 28)
and b2 not in (13, 19, 28)
and b3 not in (13, 19, 28)
and b4 not in (13, 19, 28)
and b5 not in (13, 19, 28)
and b6 not in (13, 19, 28)
)
)
)
)
#select source, random_state,cluster,b1,b2,b3,b4,b5,b6 from candidate order by 4,5,6,7,8,9;
, duplication as (
# 34
select source, random_state, cluster, cluster_count, win_count, priority, b1,b2,b3,b4,b5,b6, ball_cnt
from (
select source, random_state, cluster, cluster_count, win_count, priority, b1,b2,b3,b4,b5,b6, ball_cnt,
ROW_NUMBER() OVER(PARTITION BY b1,b2,b3,b4,b5,b6 ORDER BY b1,b2,b3,b4,b5,b6) AS rnk
from candidate
) a
where rnk=1
order by source,random_state,cluster,b1,b2,b3,b4,b5,b6
)
select count(*) as cnt from duplication;
#select source, priority, random_state, cluster, win_count, count(*) as cnt from duplication group by 1,2,3;
#select b1, count(*) as ball_cnt from duplication group by 1
#select b6, count(*) as ball_cnt from duplication group by 1
#select source,random_state,cluster,b1,b2,b3,b4,b5,b6 from duplication order by 4,5,6,7,8,9;
```
The lottery is close to random; the filters in this repository are **heuristics for reducing the number of combinations to buy** and do not guarantee a win.

8
filter_model.py Normal file

@@ -0,0 +1,8 @@
"""
The lotto filter logic is implemented in `final_BallFilter.BallFilter`.
This module re-exports it so that the training and validation scripts use the same class.
"""
from final_BallFilter import BallFilter

__all__ = ["BallFilter"]

File diff suppressed because it is too large Load Diff


@@ -1,50 +1,38 @@
# -*- coding: utf-8 -*-
"""
Filter pass-through analysis (whether the winning numbers pass the filters) by split: train (1-800) / validation (801-1000) / test (1001 onward).
Same flow as 1_FilterTest_25.py, but uses final_BallFilter.BallFilter instead of BallFilter.
Run: `python final_filterTest.py` in the miniconda environment ncue (see README).
Plays the same role as `1_FilterTest_25.py` and uses `final_BallFilter.BallFilter` + `lotto_history.txt`.
"""
from __future__ import annotations
import datetime
import argparse
import itertools
import os
import random
import time
import datetime
import pandas as pd
from final_BallFilter import BallFilter
# Splits as defined in PROMPT.txt
TRAIN_NO = (1, 800)
VALID_NO = (801, 1000)
TEST_NO = (1001, 10**9)
class FilterTest:
def __init__(self, resources_path: str):
lotto_json = os.path.join(resources_path, "lotto_history.json")
self.ballFilter = BallFilter(lotto_json)
ballFilter = None
def find_filter_method(self, df_ball, filter_ball=None, no_min=None, no_max=None):
"""Check only rounds no_min..no_max, in reverse order (None means all rounds)."""
def __init__(self, resources_path):
lotto_path = os.path.join(resources_path, "lotto_history.txt")
self.ballFilter = BallFilter(lotto_path)
def find_filter_method(self, df_ball, filter_ball=None):
win_count = 0
no_filter_ball = {}
printLog = True
filter_dic = {}
filter_dic_len = {}
filter_dic_1 = {}
filter_dic_2 = {}
idx_list = list(range(len(df_ball) - 1, 19, -1))
for i in idx_list:
no = int(df_ball["no"].iloc[i])
if no_min is not None and no < no_min:
continue
if no_max is not None and no > no_max:
continue
for i in range(len(df_ball) - 1, 19, -1):
no = df_ball["no"].iloc[i]
answer = df_ball[df_ball["no"] == no].values.tolist()[0]
answer = answer[1:7]
answer = sorted(int(x) for x in answer[1:7])
filter_type = self.ballFilter.filter(ball=answer, no=no, until_end=True, df=df_ball)
filter_type = list(filter_type)
@@ -53,13 +41,20 @@ class FilterTest:
if size == 0:
win_count += 1
no_filter_ball[no] = answer
print("\t", no)
elif size == 1:
key = filter_type[0]
filter_dic_1[key] = filter_dic_1.get(key, 0) + 1
if printLog:
print("\t", no, filter_type)
elif size == 2:
key = ",".join(filter_type)
filter_dic_2[key] = filter_dic_2.get(key, 0) + 1
if printLog:
print("\t", no, filter_type)
else:
if printLog:
print("\t", no, filter_type)
if size not in filter_dic_len:
filter_dic_len[size] = []
filter_dic_len[size].append(filter_type)
@@ -67,46 +62,106 @@ class FilterTest:
for f_t in filter_type:
filter_dic[f_t] = filter_dic.get(f_t, 0) + 1
print("\n\t[Rounds {}~{}] rounds not caught by any filter (winning combination passed)".format(no_min, no_max))
print("\tcount: {:,} (passed)".format(len(no_filter_ball)))
for no in sorted(no_filter_ball.keys()):
print("\n\t[Fewest filters first, for optimization]")
sorted_filter_dic_len = sorted(filter_dic_len.keys())
for filter_count in sorted_filter_dic_len:
for filter_type in filter_dic_len[filter_count]:
print("\t\t>{} > {}".format(filter_count, filter_type))
print("\n\t[Caught by exactly one filter]")
sorted_filter_dic_1 = sorted(filter_dic_1.items(), key=lambda x: x[1], reverse=True)
for i in range(len(sorted_filter_dic_1)):
print("\t\t>", sorted_filter_dic_1[i][0], "->", sorted_filter_dic_1[i][1])
print("\n\t[Caught by two filters]")
sorted_filter_dic_2 = sorted(filter_dic_2.items(), key=lambda x: x[1], reverse=True)
for i in range(len(sorted_filter_dic_2)):
print("\t\t>", sorted_filter_dic_2[i][0], "->", sorted_filter_dic_2[i][1])
print("\n\t[Count caught per filter type]")
sorted_filter_dic = sorted(filter_dic.items(), key=lambda x: x[1], reverse=True)
for i in range(len(sorted_filter_dic)):
print("\t\t>", sorted_filter_dic[i][0], "->", sorted_filter_dic[i][1])
print("\n\t# Winning rounds not caught by any filter")
print("\tcount: {:,} / total: {:,}".format(len(no_filter_ball), len(df_ball)))
for no in no_filter_ball:
print("\t\t>", no, no_filter_ball[no])
print("\tcount: {:,} / total: {:,}".format(len(no_filter_ball), len(df_ball)))
return win_count, no_filter_ball
return win_count
def report_split(self, df_ball, name: str, lo: int, hi: int):
print("\n" + "=" * 60)
print(" {} | rounds {} ~ {}".format(name, lo, hi))
print("=" * 60)
t0 = time.time()
wc, _ = self.find_filter_method(df_ball, no_min=lo, no_max=hi)
elapsed = datetime.timedelta(seconds=time.time() - t0)
span = hi - lo + 1
rate = (wc / span * 100) if span else 0
print("\telapsed: {}".format(elapsed))
print("\trounds passed: {} / {} ({:.2f}%)".format(wc, span, rate))
if lo >= TRAIN_NO[0] and hi <= TRAIN_NO[1]:
need = max(1, span // 100)
print("\t(note) at a minimum of 1 per 100 rounds, roughly {} or more meets the target".format(need))
if lo >= VALID_NO[0] and hi <= VALID_NO[1]:
print("\t(note) at least 3 in the 200-round validation split meets the example requirement")
return wc
def find_final_candidates(self, no, df_ball, filter_ball=None):
final_candidates = []
generation_balls = list(range(1, 46))
nCr = list(itertools.combinations(generation_balls, 6))
for idx, ball in enumerate(nCr):
if idx % 1000000 == 0:
print(" - {} processed...".format(idx))
if filter_ball is not None and 0 < len(set(ball) & set(filter_ball)):
continue
filter_type = self.ballFilter.filter(ball=list(ball), no=no, until_end=False, df=df_ball)
if filter_type:
continue
final_candidates.append(ball)
return final_candidates
def check_filter_method(self, df_ball, p_win_count, filter_ball=None):
win_count = 0
for i in range(len(df_ball) - 1, 0, -1):
no = df_ball["no"].iloc[i]
answer = df_ball[df_ball["no"] == no].values.tolist()[0]
answer = sorted(int(x) for x in answer[1:7])
if filter_ball is not None and len(set(answer) & set(filter_ball)):
continue
filter_type = self.ballFilter.extract_final_candidates(answer, no=no, until_end=True, df=df_ball)
if len(filter_type) == 0:
win_count += 1
print("\t\t>{}. {}".format(no, answer))
print("\n\t> {} / {} p_win_count, {} total".format(win_count, p_win_count, len(df_ball) - 1))
def estimate_survivors_mc(self, no, df_ball, n_samples=8000, seed=0):
"""Estimate the survival ratio from random combinations instead of enumerating all 8.14 million, and return the approximate number of survivors."""
rng = random.Random(seed)
generation_balls = list(range(1, 46))
total = 8145060
hits = 0
for _ in range(n_samples):
ball = sorted(rng.sample(generation_balls, 6))
fts = self.ballFilter.filter(ball=ball, no=no, until_end=False, df=df_ball)
if not fts:
hits += 1
est = int(round(total * (hits / n_samples)))
return est, hits, n_samples
if __name__ == "__main__":
resources_path = os.path.join(os.path.dirname(__file__), "resources")
csv_path = os.path.join(resources_path, "lotto_history.txt")
df_ball = pd.read_csv(csv_path, header=None)
parser = argparse.ArgumentParser()
parser.add_argument("--resources", default="resources")
parser.add_argument("--mc-no", type=int, default=None, help="round number for the MC survivor estimate")
parser.add_argument("--mc-samples", type=int, default=8000)
args = parser.parse_args()
resources_path = args.resources
lottoHistoryFileName = os.path.join(resources_path, "lotto_history.txt")
df_ball = pd.read_csv(lottoHistoryFileName, header=None)
df_ball.columns = ["no", "b1", "b2", "b3", "b4", "b5", "b6", "bn"]
ft = FilterTest(resources_path)
filterTest = FilterTest(resources_path)
ft.report_split(df_ball, "TRAIN", TRAIN_NO[0], TRAIN_NO[1])
ft.report_split(df_ball, "VALID", VALID_NO[0], min(VALID_NO[1], int(df_ball["no"].max())))
if int(df_ball["no"].max()) >= TEST_NO[0]:
ft.report_split(
df_ball,
"TEST",
TEST_NO[0],
int(df_ball["no"].max()),
)
print("STEP #1. Extract filter methods")
start = time.time()
win_count = filterTest.find_filter_method(df_ball)
process_time = datetime.timedelta(seconds=time.time() - start)
print("process_time: ", process_time)
if args.mc_no is not None:
est, h, n = filterTest.estimate_survivors_mc(args.mc_no, df_ball, n_samples=args.mc_samples)
print(f"MC survivor estimate (round {args.mc_no}): about {est} (sample passed {h}/{n})")

File diff suppressed because one or more lines are too long


@@ -1,17 +1,9 @@
#!/usr/bin/env bash
# Run Python with the given arguments in the miniconda environment ncue: ./scripts/run_with_ncue.sh final_filterTest.py
set -euo pipefail
ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$ROOT"
for base in "${MINICONDA_HOME:-}" "$HOME/miniconda3" "$HOME/miniforge3" "$HOME/anaconda3" "$HOME/mambaforge"; do
[ -n "$base" ] || continue
c="$base/bin/conda"
if [ -x "$c" ]; then
exec "$c" run -n ncue -- python "$@"
fi
done
if [ -n "${CONDA_EXE:-}" ] && [ -x "$CONDA_EXE" ]; then
exec "$CONDA_EXE" run -n ncue -- python "$@"
fi
echo "Could not find the conda ncue environment. In a terminal: conda activate ncue && python \"\$@\"" >&2
exit 1
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
export PATH="${HOME}/miniconda3/bin:${HOME}/anaconda3/bin:/opt/anaconda3/bin:${PATH}"
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate ncue
cd "${REPO_ROOT}"
exec python "$@"

50
train.py Normal file

@@ -0,0 +1,50 @@
"""
Training split (rounds 1-800): counts the rounds in which the winning numbers passed every filter.
Statistics such as "the last N weeks" only become meaningful after at least 20 rounds, so by default only rounds 21-800 are evaluated.
"""
import argparse
import os

import pandas as pd

from final_BallFilter import BallFilter


def load_history(resources_path: str) -> pd.DataFrame:
    path = os.path.join(resources_path, "lotto_history.txt")
    df = pd.read_csv(path, header=None)
    df.columns = ["no", "b1", "b2", "b3", "b4", "b5", "b6", "bn"]
    return df


def run_train(resources_path: str, start_no: int, end_no: int) -> tuple[int, int, list[int]]:
    df = load_history(resources_path)
    hist_path = os.path.join(resources_path, "lotto_history.txt")
    bf = BallFilter(hist_path)
    wins = 0
    total = 0
    win_nos: list[int] = []
    for no in range(start_no, end_no + 1):
        sub = df[df["no"] == no]
        if sub.empty:
            # Round missing from the history file: skip it.
            continue
        answer = sorted(int(x) for x in sub.iloc[0][1:7].tolist())
        fts = bf.extract_final_candidates(answer, no=no, until_end=True, df=df)
        total += 1
        if len(fts) == 0:
            # No filter fired, so the winning numbers passed.
            wins += 1
            win_nos.append(no)
    return wins, total, win_nos


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--resources", default=os.path.join(os.path.dirname(__file__), "resources"))
    p.add_argument("--start-no", type=int, default=21)
    p.add_argument("--end-no", type=int, default=800)
    args = p.parse_args()
    w, t, nos = run_train(args.resources, args.start_no, args.end_no)
    rate = w / t if t else 0.0
    print(f"Training split, winning numbers passed: {w} / {t} ({rate:.4f})")
    print(f"Rounds passed: {nos}")

49
valid.py Normal file

@@ -0,0 +1,49 @@
"""
Validation split (rounds 801-1000): only checks the filters (no tuning is done on this split).
"""
import argparse
import os

import pandas as pd

from final_BallFilter import BallFilter


def load_history(resources_path: str) -> pd.DataFrame:
    path = os.path.join(resources_path, "lotto_history.txt")
    df = pd.read_csv(path, header=None)
    df.columns = ["no", "b1", "b2", "b3", "b4", "b5", "b6", "bn"]
    return df


def run_valid(resources_path: str, start_no: int, end_no: int) -> tuple[int, int, list[int]]:
    df = load_history(resources_path)
    hist_path = os.path.join(resources_path, "lotto_history.txt")
    bf = BallFilter(hist_path)
    wins = 0
    total = 0
    win_nos: list[int] = []
    for no in range(start_no, end_no + 1):
        sub = df[df["no"] == no]
        if sub.empty:
            # Round missing from the history file: skip it.
            continue
        answer = sorted(int(x) for x in sub.iloc[0][1:7].tolist())
        fts = bf.extract_final_candidates(answer, no=no, until_end=True, df=df)
        total += 1
        if len(fts) == 0:
            # No filter fired, so the winning numbers passed.
            wins += 1
            win_nos.append(no)
    return wins, total, win_nos


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--resources", default=os.path.join(os.path.dirname(__file__), "resources"))
    p.add_argument("--start-no", type=int, default=801)
    p.add_argument("--end-no", type=int, default=1000)
    args = p.parse_args()
    w, t, nos = run_valid(args.resources, args.start_no, args.end_no)
    rate = w / t if t else 0.0
    print(f"Validation split, winning numbers passed: {w} / {t} ({rate:.4f})")
    print(f"Rounds passed: {nos}")