interface / externals / tts_stt / voice_server.py
FastAPI service that runs on a Jetson / Raspberry Pi and exposes Whisper STT and Piper TTS to the agent.
Endpoints
| Method · Path | Behaviour |
|---|---|
| POST /stt | Accepts an UploadFile (audio/wav). WhisperModel large-v3-turbo on CUDA float16. Beam size 2, vad_filter=False, condition_on_previous_text=False. Returns {"text": "..."}. |
| POST /tts | Accepts {"text": "..."}. Subprocess call to Piper en_US-amy-medium.onnx (CUDA). Streams audio/wav. |
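
A minimal client-side sketch of the two requests, using only the standard library. The base URL, the `build_*` helper names, and the multipart field name `file` are assumptions for illustration; the source gives only the paths and payload shapes.

```python
import json
import urllib.request
import uuid

# Assumption: host and port are not stated in this document.
BASE_URL = "http://localhost:8000"


def build_tts_request(text, base_url=BASE_URL):
    """Build POST /tts with a JSON body of the form {"text": "..."}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_stt_request(wav_bytes, base_url=BASE_URL, field="file"):
    """Build POST /stt with the WAV sent as a multipart/form-data upload.

    Assumption: the form field name ("file") must match the name of the
    UploadFile parameter on the server, which is not shown here.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/stt",
        data=head + wav_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing either request to `urllib.request.urlopen` sends it. The /stt response body is JSON with a `text` key, and the /tts response body is WAV audio.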
Models & paths
- Whisper: large-v3-turbo (16 kHz mono optimised).
- Piper binary: venv-isolated under /home/vasilis/Data/AI/voice_server/voice_env/bin/piper.
- Piper voice: models/en_US-amy-medium.onnx, relative to the working dir.
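
The paths above could be combined into a Piper command line roughly as follows. This is a sketch, not the server's actual code: `piper_command` is a hypothetical helper, and the `--model`, `--output_file` and `--cuda` flags are assumed from Piper's command-line help, so check `piper --help` on the installed version.

```python
from pathlib import Path

# Paths as listed above.
PIPER_BIN = Path("/home/vasilis/Data/AI/voice_server/voice_env/bin/piper")
PIPER_VOICE = Path("models/en_US-amy-medium.onnx")  # relative to the working dir


def piper_command(output_wav, workdir=None, use_cuda=True):
    """Return the argv list for one Piper synthesis run.

    The voice path is resolved against `workdir` (default: current
    directory), because the model path is relative to the working dir.
    The text to speak is expected on Piper's stdin.
    """
    voice = (Path(workdir) if workdir else Path.cwd()) / PIPER_VOICE
    cmd = [str(PIPER_BIN), "--model", str(voice), "--output_file", str(output_wav)]
    if use_cuda:
        cmd.append("--cuda")  # assumed flag name for GPU inference
    return cmd
```

Resolving the voice path explicitly avoids a silent failure when the service is started from a different directory.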
Concurrency
asyncio for HTTP handling (FastAPI); Piper runs through subprocess.Popen, which is a blocking call.
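
Because the Piper call blocks, one common way to keep the event loop responsive is to run it in a worker thread. The sketch below shows that pattern with `asyncio.to_thread` (Python 3.9+). It is not necessarily what voice_server.py does; `run_blocking`, `run_in_thread` and `STAND_IN_CMD` are hypothetical names, and the stand-in command only upper-cases its input so the example runs without Piper installed.

```python
import asyncio
import subprocess
import sys

# Stand-in for the Piper command: reads stdin, writes it upper-cased to stdout.
STAND_IN_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_blocking(cmd, stdin_bytes):
    """Run a command to completion, feeding stdin and returning stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdout, _ = proc.communicate(stdin_bytes)  # blocks until the process exits
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
    return stdout


async def run_in_thread(cmd, stdin_bytes):
    """Await the blocking call from a worker thread, leaving the loop free."""
    return await asyncio.to_thread(run_blocking, cmd, stdin_bytes)
```

Inside an async handler this would be awaited as `await run_in_thread(cmd, text.encode())`. Outside one, `asyncio.run(run_in_thread(STAND_IN_CMD, b"hello"))` exercises the same path.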
Dependencies
- faster_whisper (WhisperModel)
- CUDA (Jetson)
- Piper binary in isolated venv
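
For orientation, the Whisper settings listed for POST /stt map onto faster_whisper roughly as below. This is a sketch under assumptions: `load_model` and `transcribe` are hypothetical helper names, and the real handler may read the upload differently. The import is deferred so the module loads on machines without faster_whisper or CUDA.

```python
import io

# Decoding options as listed for POST /stt.
STT_OPTIONS = {
    "beam_size": 2,
    "vad_filter": False,
    "condition_on_previous_text": False,
}


def load_model():
    """Load large-v3-turbo on the GPU in float16 (requires faster_whisper and CUDA)."""
    from faster_whisper import WhisperModel

    return WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")


def transcribe(model, wav_bytes):
    """Transcribe WAV bytes and return the documented {"text": "..."} shape."""
    # transcribe() yields segments lazily; each segment's text usually
    # starts with a space, so join them as-is and strip the ends.
    segments, _info = model.transcribe(io.BytesIO(wav_bytes), **STT_OPTIONS)
    text = "".join(segment.text for segment in segments).strip()
    return {"text": text}
```

Loading the model once at startup and reusing it across requests avoids paying the load cost on every call.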