Web STT: speaker diarization via pyannote; whisper_stt snapshot validation

- Add app/diarize.py: merge faster-whisper segments with pyannote speaker turns (A/B/C)
- Wire diarization into /api/jobs and /api/transcribe; the job API returns speaker_diarization and diarize_skip_reason
- UI: the meta line shows whether diarization was applied or skipped; add a hint about the models path
- requirements.txt: add pyannote.audio; README: document APP_DIARIZE / APP_PYANNOTE_MODEL_DIR
- whisper_stt.py: validate config.yaml before loading the pipeline
- requirements-whisper-stt.txt: minor doc updates, if any

Made-with: Cursor
2026-03-23 13:09:31 +09:00
parent c90230053a
commit 2e503d1a56
7 changed files with 285 additions and 8 deletions
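The commit message does not show how app/diarize.py merges the two outputs. A common approach, sketched here under stated assumptions, is to give each faster-whisper segment the pyannote speaker whose turns overlap it for the longest total time. All names (`Segment`, `assign_speakers`, `letter_labels`, `snapshot_skip_reason`, `job_result`) are hypothetical and not taken from this commit. Turns are assumed to be plain `(start, end, label)` tuples, such as one might collect from pyannote's `Annotation.itertracks(yield_label=True)`. Reading "(A/B/C)" as letter speaker labels is also an assumption. Only the `speaker_diarization` / `diarize_skip_reason` keys and the `config.yaml` check come from the commit.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Hypothetical turn format: (start_sec, end_sec, raw_label), e.g. as collected from
# pyannote's Annotation.itertracks(yield_label=True). Plain tuples keep this dependency-free.
Turn = Tuple[float, float, str]


@dataclass
class Segment:
    """Minimal stand-in for a faster-whisper segment (start, end, text)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds shared by two intervals; 0.0 when they do not intersect."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def letter_labels(turns: List[Turn]) -> Dict[str, str]:
    """Assumption: map raw labels to A, B, C, ... in order of first appearance (<= 26 speakers)."""
    mapping: Dict[str, str] = {}
    for _, _, raw in sorted(turns, key=lambda t: t[0]):
        if raw not in mapping:
            mapping[raw] = chr(ord("A") + len(mapping))
    return mapping


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the longest in total."""
    labels = letter_labels(turns)
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, raw in turns:
            shared = overlap(seg.start, seg.end, t_start, t_end)
            if shared > 0:
                totals[raw] = totals.get(raw, 0.0) + shared
        # Segments that fall entirely outside every turn stay unlabeled (None).
        seg.speaker = labels[max(totals, key=totals.__getitem__)] if totals else None
    return segments


def snapshot_skip_reason(model_dir: str) -> Optional[str]:
    """Hypothetical pre-check: return a skip reason if the local snapshot lacks config.yaml."""
    if not (Path(model_dir) / "config.yaml").is_file():
        return f"config.yaml not found in {model_dir}"
    return None


def job_result(segments: List[Segment], skip_reason: Optional[str]) -> dict:
    """Hypothetical result shape carrying the two fields the UI meta line reads."""
    return {
        "speaker_diarization": skip_reason is None,
        "diarize_skip_reason": skip_reason,
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text, "speaker": s.speaker}
            for s in segments
        ],
    }
```

In a job handler, one would call `snapshot_skip_reason("models/pyannote-diarization-3.1")` first. The pipeline and `assign_speakers` would run only when that call returns `None`, and the reason would then be passed to `job_result`. The UI marks diarization as applied when `speaker_diarization` is `true` and as skipped when `diarize_skip_reason` is set. Summing overlap picks the majority speaker for a segment that spans a speaker change. Splitting segments at word timestamps would be a finer-grained alternative.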
--- a/app/static/index.html
+++ b/app/static/index.html
@@ -314,7 +314,8 @@

          <div class="hint">
            - 허용: mp3, m4a, wav, mp4, aac, ogg, flac, webm<br />
-            - 첫 실행 시 Whisper 모델 다운로드로 시간이 걸릴 수 있습니다.
+            - 첫 실행 시 Whisper 모델 다운로드로 시간이 걸릴 수 있습니다.<br />
+            - 완료 후 pyannote로 화자 구분을 시도합니다 (<code>models/pyannote-diarization-3.1</code> 필요).
          </div>

          <div class="progress">
@@ -613,7 +614,10 @@
        const lang = body.detected_language ? `${body.detected_language}` : "-";
        const prob = typeof body.language_probability === "number" ? body.language_probability.toFixed(3) : "-";
        const dur = typeof body.duration_sec === "number" ? `${body.duration_sec.toFixed(1)}s` : "-";
-        metaEl.textContent = `감지 언어: ${lang} (p=${prob}), 오디오 길이: ${dur}`;
+        let diarizeMeta = "";
+        if (body.speaker_diarization === true) diarizeMeta = " · 화자 구분: 적용";
+        else if (body.diarize_skip_reason) diarizeMeta = " · 화자 구분: 생략";
+        metaEl.textContent = `감지 언어: ${lang} (p=${prob}), 오디오 길이: ${dur}${diarizeMeta}`;

        if (startedAt) {
          timingEl.textContent = `${((performance.now() - startedAt) / 1000).toFixed(2)}s`;