Web STT: speaker diarization via pyannote; whisper_stt snapshot validation

- Add app/diarize.py: merge faster-whisper segments with pyannote speaker turns (speakers labeled A/B/C)
- Wire /api/jobs and /api/transcribe; job API returns speaker_diarization, diarize_skip_reason
- UI: meta line shows diarization applied/skipped; hint for models path
- requirements.txt: pyannote.audio; README APP_DIARIZE / APP_PYANNOTE_MODEL_DIR
- whisper_stt.py: validate config.yaml before loading pipeline
- requirements-whisper-stt.txt: minor doc updates if any
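
One plausible way to merge faster-whisper segments with pyannote output is sketched below. It is not taken from app/diarize.py: it assumes that "(A/B/C)" refers to speaker letters, that a transcript segment gets the speaker whose turns overlap it the most, and that letters are handed out in order of first appearance. `Segment`, `overlap`, and `assign_speakers` are hypothetical names, and the `(start, end, label)` turn tuples stand in for whatever the pyannote pipeline actually returns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    """A transcript segment (hypothetical shape, times in seconds)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment],
    turns: List[Tuple[float, float, str]],
) -> List[Segment]:
    """Label each segment with the speaker that overlaps it the longest.

    Raw diarization labels are mapped to A, B, C, ... in order of first use.
    Assumes at most 26 distinct speakers.
    """
    letters: Dict[str, str] = {}
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, raw_label in turns:
            shared = overlap(seg.start, seg.end, t_start, t_end)
            if shared > 0:
                totals[raw_label] = totals.get(raw_label, 0.0) + shared
        if not totals:
            continue  # no speaker turn covers this segment; leave it unlabeled
        best = max(totals, key=lambda label: totals[label])
        if best not in letters:
            letters[best] = chr(ord("A") + len(letters))
        seg.speaker = letters[best]
    return segments
```

Segments that no turn covers keep `speaker=None` in this sketch, so a caller can still render them without a speaker prefix.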

Made-with: Cursor
dosangyoon
2026-03-23 13:09:31 +09:00
parent c90230053a
commit 2e503d1a56
7 changed files with 285 additions and 8 deletions
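
The commit message names two job API fields, `speaker_diarization` and `diarize_skip_reason`, and two settings, `APP_DIARIZE` and `APP_PYANNOTE_MODEL_DIR`. The sketch below shows how a pre-flight check could fill those fields before the pipeline is loaded. Everything apart from those four names, `config.yaml`, and the `models/pyannote-diarization-3.1` path is an assumption: the function name `diarization_preflight`, the accepted "off" values, the default of enabled, and the wording of the skip reasons are all hypothetical, and the real code may only set `speaker_diarization` to true after the pipeline has run.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Assumed default; the UI hint mentions this path but the real default may differ.
DEFAULT_MODEL_DIR = "models/pyannote-diarization-3.1"
# Assumed spellings for "disabled"; not taken from the README.
OFF_VALUES = {"0", "false", "no", "off"}


def diarization_preflight(env: Mapping[str, str]) -> Dict[str, Optional[object]]:
    """Decide whether diarization can be attempted and report why not.

    Returns the two fields the job API exposes. A truthy
    diarize_skip_reason means the step is skipped.
    """
    enabled = env.get("APP_DIARIZE", "1").strip().lower() not in OFF_VALUES
    if not enabled:
        return {"speaker_diarization": False, "diarize_skip_reason": "disabled"}

    model_dir = Path(env.get("APP_PYANNOTE_MODEL_DIR", DEFAULT_MODEL_DIR))
    config = model_dir / "config.yaml"
    if not config.is_file():
        # Validate the snapshot before trying to load anything heavy.
        return {
            "speaker_diarization": False,
            "diarize_skip_reason": f"missing {config}",
        }

    return {"speaker_diarization": True, "diarize_skip_reason": None}


if __name__ == "__main__":
    print(diarization_preflight(os.environ))
```

With a payload shaped like this, a client only needs to check `speaker_diarization === true` first and fall back to a truthy `diarize_skip_reason`.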

@@ -314,7 +314,8 @@
  <div class="hint">
  - 허용: mp3, m4a, wav, mp4, aac, ogg, flac, webm<br />
- - 첫 실행 시 Whisper 모델 다운로드로 시간이 걸릴 수 있습니다.
+ - 첫 실행 시 Whisper 모델 다운로드로 시간이 걸릴 수 있습니다.<br />
+ - 완료 후 pyannote로 화자 구분을 시도합니다 (<code>models/pyannote-diarization-3.1</code> 필요).
  </div>
  <div class="progress">
@@ -613,7 +614,10 @@
  const lang = body.detected_language ? `${body.detected_language}` : "-";
  const prob = typeof body.language_probability === "number" ? body.language_probability.toFixed(3) : "-";
  const dur = typeof body.duration_sec === "number" ? `${body.duration_sec.toFixed(1)}s` : "-";
- metaEl.textContent = `감지 언어: ${lang} (p=${prob}), 오디오 길이: ${dur}`;
+ let diarizeMeta = "";
+ if (body.speaker_diarization === true) diarizeMeta = " · 화자 구분: 적용";
+ else if (body.diarize_skip_reason) diarizeMeta = " · 화자 구분: 생략";
+ metaEl.textContent = `감지 언어: ${lang} (p=${prob}), 오디오 길이: ${dur}${diarizeMeta}`;
  if (startedAt) {
  timingEl.textContent = `${((performance.now() - startedAt) / 1000).toFixed(2)}s`;