dsyoon/stt

dosangyoon 1c25bed926 Add whisper_stt CLI with default diarization, Ubuntu README, editor config

- Replace test.py with whisper_stt.py: OpenAI Whisper + default speaker diarization
  via local ./models/pyannote-diarization-3.1; --no-diarize for plain text
- Add requirements-whisper-stt.txt (whisper, pyannote, huggingface_hub, imageio-ffmpeg)
- README: stt conda env, Ubuntu/macOS ffmpeg, CLI usage
- .vscode: Python interpreter /opt/anaconda3/envs/stt; .cursor rule for stt env
- .gitignore: exclude downloaded pyannote snapshot under models/

Made-with: Cursor

2026-03-23 11:34:46 +09:00

Path	Last commit	Date
.cursor/rules	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00
.vscode	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00
app	Initial commit after re-install	2026-02-25 19:07:56 +09:00
resources/uploads	Initial commit after re-install	2026-02-25 19:07:56 +09:00
sql	Initial commit after re-install	2026-02-25 19:07:56 +09:00
.gitignore	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00
environment.yml	Initial commit after re-install	2026-02-25 19:07:56 +09:00
README.md	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00
requirements-whisper-stt.txt	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00
requirements.txt	Add OpenAI Whisper test script with ffmpeg fallback via imageio-ffmpeg	2026-03-23 10:43:30 +09:00
run.sh	Initial commit after re-install	2026-02-25 19:07:56 +09:00
whisper_stt.py	Add whisper_stt CLI with default diarization, Ubuntu README, editor config	2026-03-23 11:34:46 +09:00

README.md

Web STT (mp3/m4a upload → text transcription)

Components

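Backend: FastAPI (upload, validation, STT execution)
STT engine: faster-whisper (Whisper models)
Frontend: a single HTML page (select file → transcribe → display/download the result)
Optional CLI: whisper_stt.py, local transcription based on OpenAI Whisper (default: speaker diarization using the local ./models/pyannote-diarization-3.1)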
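
Speaker diarization in the optional CLI means combining two timelines: Whisper's transcript segments and the diarization model's speaker turns. The sketch below shows one common way to do that, assigning each segment to the speaker turn it overlaps most. It is not taken from whisper_stt.py; the names Span, overlap, and assign_speakers are hypothetical, and the "UNKNOWN" label is an assumption for segments with no overlapping turn.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text, or a speaker id such as "SPEAKER_00"


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the intersection of two spans (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(segments: list[Span], turns: list[Span]) -> list[tuple[str, str]]:
    """Pair each transcript segment with the speaker turn it overlaps most."""
    result = []
    for seg in segments:
        # Pick the turn with the largest overlap; None when there are no turns.
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        if best is None or overlap(seg, best) == 0.0:
            speaker = "UNKNOWN"  # assumed placeholder label
        else:
            speaker = best.label
        result.append((speaker, seg.label))
    return result
```

A segment that straddles two turns goes entirely to the turn with the larger overlap, which keeps the output simple at the cost of occasionally mislabeling a few words near a speaker change.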

How it works (pseudocode)

UI:
  onSelect(file):
    validate client-side (extension)
    enable "Transcribe" button

  onClickTranscribe():
    POST /api/transcribe (multipart/form-data, file, options)
    show progress (uploading / processing)
    render returned text + segments
    allow download as .txt

API:
  POST /api/transcribe:
    if no file -> 400
    validate mime/ext in allowed audio types -> 415 if not
    save to temp file
    run STT(model, language, vad_filter, beam_size, ...)
    cleanup temp file (also on error)
    return { text, segments[], detected_language, duration_sec }
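
The validation step of the API can be sketched as a small pure function that maps an upload to an HTTP status code. This is a minimal sketch, not the code in app/main.py: the function name check_upload is hypothetical, the extension set is taken from the title (mp3/m4a), and the MIME type list is an assumption about what browsers typically send for those files.

```python
from pathlib import Path
from typing import Optional

# Extensions named in the README title.
ALLOWED_EXTENSIONS = {".mp3", ".m4a"}
# Assumed MIME types; the real allow-list may differ.
ALLOWED_MIME_TYPES = {"audio/mpeg", "audio/mp4", "audio/x-m4a"}


def check_upload(filename: Optional[str], content_type: Optional[str]) -> int:
    """Return the HTTP status the pseudocode would produce for this upload."""
    if not filename:
        return 400  # no file was sent
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return 415  # unsupported extension
    if content_type and content_type not in ALLOWED_MIME_TYPES:
        return 415  # declared type is not an allowed audio type
    return 200  # OK to save to a temp file and run STT
```

A missing content type is accepted here because some clients omit it; a stricter server could reject that case as well.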

Prerequisites

Ubuntu (22.04 / 24.04, etc.)

The packages below are used for audio decoding and for building some Python packages.

sudo apt update
sudo apt install -y ffmpeg build-essential

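ffmpeg: required for faster-whisper and Whisper to read mp3/m4a and similar formats. (Installing it with apt is the simplest option.)
build-essential: helps when a dependency has to be built from source or needs a wheel build.

Optional (when using a GPU with faster-whisper, etc.):

Install the NVIDIA driver and CUDA following the NVIDIA documentation.
This repository defaults to CPU. For GPU use, adjust APP_WHISPER_DEVICE, APP_WHISPER_COMPUTE_TYPE, and related settings to match your environment.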

macOS

brew install ffmpeg

The pip package imageio-ffmpeg alone can serve as a fallback for the CLI, but installing a system ffmpeg for both the server and the tools is recommended.
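
The system-first, imageio-ffmpeg-second lookup described above could look like the following. This is a sketch under assumptions, not the logic in whisper_stt.py: the function name find_ffmpeg and the injectable which parameter are hypothetical, while shutil.which and imageio_ffmpeg.get_ffmpeg_exe are the real library calls.

```python
import shutil
from typing import Callable, Optional


def find_ffmpeg(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return a path to an ffmpeg binary, or None if none can be found."""
    # 1) Prefer the system ffmpeg (apt / brew) found on PATH.
    path = which("ffmpeg")
    if path:
        return path
    # 2) Fall back to the binary bundled with the imageio-ffmpeg pip package.
    try:
        import imageio_ffmpeg
    except ImportError:
        return None
    try:
        return imageio_ffmpeg.get_ffmpeg_exe()
    except RuntimeError:
        # imageio-ffmpeg is installed but could not locate a binary.
        return None
```

Calling find_ffmpeg() at startup and failing with a clear message when it returns None is usually friendlier than letting the first decode attempt fail.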

Python environment (Conda recommended)

This project recommends using the conda environment stt (Python 3.11). (For Cursor/VS Code, the interpreter path is set in .vscode/settings.json.)

1) Create `stt` and install the web server dependencies

conda create -n stt python=3.11 -y
conda activate stt
pip install -r requirements.txt

2) (Optional) Local transcription CLI: `whisper_stt.py`

conda activate stt
pip install -r requirements-whisper-stt.txt

Hugging Face hf CLI: after pip install huggingface_hub, run hf auth login and hf download … (for the pyannote speaker diarization model, etc.).
Speaker diarization (on by default): a pyannote snapshot must exist in ./models/pyannote-diarization-3.1. If it is missing, the script prints hf download instructions and exits. To get the model: accept the terms for pyannote/speaker-diarization-3.1, then run hf auth login and hf download … --local-dir ./models/pyannote-diarization-3.1. To use a different path, pass --diarize-model-dir or set WHISPER_DIARIZE_MODEL_DIR.
Disabling speaker diarization: python whisper_stt.py input.m4a output.txt --no-diarize (saves only the plain Whisper transcript)

python whisper_stt.py input.m4a output.txt
python whisper_stt.py input.m4a output.txt --no-diarize
python whisper_stt.py input.m4a output.txt --diarize-model-dir /other/path/pyannote-diarization-3.1
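
How the diarization model directory is chosen and checked can be sketched as follows. This is an illustration, not the code in whisper_stt.py: the function names are hypothetical, and the precedence (command-line flag, then WHISPER_DIARIZE_MODEL_DIR, then the default path) is an assumption, since the README only says that either can be used.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_MODEL_DIR = "./models/pyannote-diarization-3.1"


def resolve_model_dir(cli_value: Optional[str],
                      env: Mapping[str, str] = os.environ) -> Path:
    """Pick the model directory: CLI flag, then env var, then default (assumed order)."""
    return Path(cli_value or env.get("WHISPER_DIARIZE_MODEL_DIR") or DEFAULT_MODEL_DIR)


def check_model_dir(path: Path) -> Optional[str]:
    """Return None if the snapshot looks usable, otherwise a hint for the user."""
    if path.is_dir() and any(path.iterdir()):
        return None
    return (
        f"Diarization model not found at {path}. "
        "Accept the pyannote/speaker-diarization-3.1 terms, run `hf auth login`, "
        "download the snapshot into that directory with `hf download`, "
        "or pass --no-diarize."
    )
```

A CLI would typically print the returned hint and exit with a non-zero status when check_model_dir returns a message.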

Alternative: `environment.yml` (environment name `ncue`, includes `ffmpeg` from conda)

conda env create -f environment.yml
conda activate ncue

The pip dependencies are installed through requirements.txt. Use this route only if your team already uses ncue.

Running the server

conda activate stt   # or ncue
uvicorn app.main:app --reload --host 127.0.0.1 --port 8025

Open http://127.0.0.1:8025 in a browser.
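
Besides the browser UI, the endpoint from the pseudocode can be called from a script. The sketch below builds a multipart/form-data request with only the standard library. It assumes the upload field is named file, as in the pseudocode; the helper names build_multipart and transcribe_request are hypothetical, and the response shape is whatever the server actually returns.

```python
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_request(filename: str, data: bytes,
                       base_url: str = "http://127.0.0.1:8025") -> urllib.request.Request:
    """Build (but do not send) a POST /api/transcribe request for the given audio bytes."""
    body, header = build_multipart("file", filename, data, "audio/mp4")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/transcribe",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )


# Usage, with the server running (not executed here):
#   import json
#   with open("input.m4a", "rb") as fh:
#       req = transcribe_request("input.m4a", fh.read())
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["text"])
```

The helper does not escape quotes in the file name, so it is only suitable for simple names; a library such as requests handles those cases.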

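Options and environment variables

Model: default small (a balance of accuracy and speed). Change it with APP_WHISPER_MODEL=base|small|medium|large-v3, etc.
Device: default CPU. On Apple Silicon, Metal support is limited with faster-whisper alone, so the CPU default is recommended.
Other: for APP_WHISPER_DEVICE, APP_WHISPER_COMPUTE_TYPE, upload size, and so on, see app/main.py and the .env example.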
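
Reading these variables with their defaults might look like the sketch below. It is not the code in app/main.py: the WhisperSettings class and load_settings function are hypothetical, the small and cpu defaults come from the text above, and the int8 default for the compute type is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WhisperSettings:
    model: str         # e.g. base, small, medium, large-v3
    device: str        # e.g. cpu, cuda
    compute_type: str  # e.g. int8, float16


def load_settings(env: Mapping[str, str] = os.environ) -> WhisperSettings:
    """Read the APP_WHISPER_* variables, falling back to CPU-friendly defaults."""
    return WhisperSettings(
        model=env.get("APP_WHISPER_MODEL", "small"),
        device=env.get("APP_WHISPER_DEVICE", "cpu"),
        # Assumed default; check app/main.py for the real one.
        compute_type=env.get("APP_WHISPER_COMPUTE_TYPE", "int8"),
    )
```

With faster-whisper, values like these would typically be passed on as WhisperModel(settings.model, device=settings.device, compute_type=settings.compute_type).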

Platform summary

Item	Ubuntu	macOS
`ffmpeg`	`sudo apt install ffmpeg`	`brew install ffmpeg`
Python	Conda `stt` recommended	Same
Web STT	`pip install -r requirements.txt`	Same
`whisper_stt.py`	`pip install -r requirements-whisper-stt.txt`	Same
