SU-ASR: Streaming ASR Service

A minimal, file-driven Server-Sent Events (SSE) service for real-time Automatic Speech Recognition (ASR) using NVIDIA NeMo's cache-aware streaming models.

Features

  • Real-time Streaming: Processes audio in chunks and streams partial transcriptions via SSE.
  • Cache-Aware: Utilizes NeMo's CacheAwareStreamingAudioBuffer for high-accuracy streaming inference.
  • Minimalist Design: Built with FastAPI and split into two small files, main.py and utils.py.
  • Model Caching: Downloads the nvidia/nemotron-3.5-asr-streaming-0.6b model on the first run and caches it locally, so later startups skip the download.
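
The chunked, cache-carrying loop described in the features above can be sketched in plain Python. This is an illustration only, not the code in utils.py: iter_chunks, stream_transcribe, and fake_step are hypothetical names, and fake_step stands in for the NeMo model by simply counting samples.

```python
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

# A step takes (chunk, previous cache) and returns (partial text, new cache).
Step = Callable[[Sequence[float], Optional[Any]], Tuple[str, Any]]


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size slices; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def stream_transcribe(samples: Sequence[float], chunk_size: int, step: Step) -> Iterator[str]:
    """Run `step` on each chunk, threading its cache through, and yield partial text."""
    cache = None
    for chunk in iter_chunks(samples, chunk_size):
        text, cache = step(chunk, cache)
        yield text


def fake_step(chunk: Sequence[float], cache: Optional[int]) -> Tuple[str, int]:
    # Hypothetical stand-in for the model: the "cache" is just a running sample count.
    seen = (cache or 0) + len(chunk)
    return f"{seen} samples decoded", seen


partials = list(stream_transcribe([0.0] * 10, 4, fake_step))
print(partials)
```

The point of the design is that each step sees only the new chunk plus the state carried over from earlier chunks, so earlier audio is never reprocessed.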

Prerequisites

  • Python 3.10+
  • NVIDIA GPU (recommended for low-latency streaming); CPU-only inference also works.

Installation

pip install -e .

Usage

1. Start the Server

The server will automatically download the model on the first run and cache it locally.

python main.py
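
The download-once behaviour mentioned above follows a common pattern, sketched here with the standard library only. The names ensure_cached and fake_download, and the file name model.nemo, are hypothetical; the actual caching in utils.py may be handled by NeMo itself.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_cached(cache_dir: Path, name: str, download: Callable[[Path], None]) -> Path:
    """Return the cached file path, calling `download` only if the file is missing."""
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(target)
    return target


calls: List[str] = []


def fake_download(target: Path) -> None:
    # Hypothetical stand-in for the real model download.
    calls.append(target.name)
    target.write_bytes(b"model weights placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_cached(Path(tmp), "model.nemo", fake_download)   # downloads
    second = ensure_cached(Path(tmp), "model.nemo", fake_download)  # served from cache
    download_count = len(calls)
```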

2. Client Integration

Send a .wav audio file to the /asr/stream endpoint as a multipart POST request, using the form field file. The response is an SSE stream of JSON objects carrying partial and final transcriptions.
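
For orientation, an SSE message on the wire is a data: line followed by a blank line. The sketch below shows how one JSON payload could be framed that way. The function name sse_event and the field names text and is_final are assumptions for illustration; the payload actually emitted by main.py may differ.

```python
import json


def sse_event(payload: dict) -> str:
    """Serialize one payload as an SSE message: a `data:` line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


# The field names "text" and "is_final" are illustrative assumptions.
partial = sse_event({"text": "hello", "is_final": False})
final = sse_event({"text": "hello world", "is_final": True})
```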

Example JavaScript Client:

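const formData = new FormData();
formData.append('file', audioBlob); // audioBlob: a Blob or File holding .wav audio

const response = await fetch('http://localhost:8000/asr/stream', { method: 'POST', body: formData });
const reader = response.body.getReader();
const decoder = new TextDecoder();
// Read the SSE stream chunk by chunk until the server closes it.
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true }));
}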
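
A Python client can consume the same stream. The sketch below covers only the parsing step, under the assumption that each event is a data: line holding one JSON object; parse_sse is a hypothetical helper, and the sample payloads are made up.

```python
import json
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE message found in `lines`."""
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the message; multi-line data is joined with "\n".
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other SSE fields (event:, id:, retry:) and comments are ignored here.


sample = ['data: {"text": "hel"}', "", 'data: {"text": "hello"}', ""]
events = list(parse_sse(sample))
```

In practice the lines would come from an HTTP client that iterates over the response body line by line while the request is still open.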

Project Structure

  • main.py: FastAPI application entry point and SSE endpoint logic.
  • utils.py: NeMo model initialization, cache management, and chunked inference logic.
  • pyproject.toml: Project dependencies and build configuration.

References

This implementation is based on the official NVIDIA NeMo cache-aware streaming example: NeMo Cache-Aware Streaming Inference Script