Python 82.2%
Nix 17.8%

DenisSud 8ff53d9151 fresh with nemo model		2026-06-13 17:01:40 +03:00
.gitignore	fresh with nemo model	2026-06-13 17:01:40 +03:00
.python-version	fresh with nemo model	2026-06-13 17:01:40 +03:00
devenv.lock	fresh with nemo model	2026-06-13 17:01:40 +03:00
devenv.nix	fresh with nemo model	2026-06-13 17:01:40 +03:00
devenv.yaml	fresh with nemo model	2026-06-13 17:01:40 +03:00
main.py	fresh with nemo model	2026-06-13 17:01:40 +03:00
pyproject.toml	fresh with nemo model	2026-06-13 17:01:40 +03:00
README.md	fresh with nemo model	2026-06-13 17:01:40 +03:00
utils.py	fresh with nemo model	2026-06-13 17:01:40 +03:00

SU-ASR: Streaming ASR Service

A minimal Server-Sent Events (SSE) service for real-time Automatic Speech Recognition (ASR): it accepts an uploaded audio file and streams transcriptions back using NVIDIA NeMo's cache-aware streaming models.

Features

Real-time Streaming: Processes audio in chunks and streams partial transcriptions via SSE.
Cache-Aware: Utilizes NeMo's CacheAwareStreamingAudioBuffer for high-accuracy streaming inference.
Minimalist Design: Built with FastAPI and split into two composable files, main.py and utils.py.
Model Caching: Automatically downloads the nvidia/nemotron-3.5-asr-streaming-0.6b model on first use and caches it, so later startups skip the download.
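The chunk-and-stream flow described above can be sketched in plain Python. This is a simplified illustration, not the code in utils.py or main.py: `transcribe_chunk` is a hypothetical stand-in for the model step, the `text` and `is_final` field names are assumptions, and the sketch does not model the encoder cache that NeMo's cache-aware buffer carries between chunks.

```python
import json
from typing import Callable, Iterator, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size slices; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def sse_event(payload: dict) -> str:
    """Serialize a payload as one SSE message: a 'data:' line plus a blank line."""
    return f"data: {json.dumps(payload)}\n\n"


def stream_transcripts(
    samples: Sequence[float],
    chunk_size: int,
    transcribe_chunk: Callable[[Sequence[float]], str],
) -> Iterator[str]:
    """Feed chunks to a caller-supplied recognizer and yield SSE messages.

    transcribe_chunk is hypothetical; the payload keys are assumed names.
    """
    chunks = list(iter_chunks(samples, chunk_size))
    for index, chunk in enumerate(chunks):
        text = transcribe_chunk(chunk)
        # Mark only the last chunk's result as final.
        yield sse_event({"text": text, "is_final": index == len(chunks) - 1})
```

A generator like this could, for instance, be handed to a streaming HTTP response with the `text/event-stream` media type, so each partial result reaches the client as soon as its chunk is processed.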

Prerequisites

Python 3.10+
NVIDIA GPU (recommended for low-latency streaming); CPU-only execution is also supported.

Installation

pip install -e .

Usage

1. Start the Server

The server will automatically download the model on the first run and cache it locally.

python main.py
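The download-on-first-run behaviour follows a common "fetch once, reuse afterwards" pattern. The repository's actual caching logic is not shown in this README, so the following is only a generic sketch of that pattern: `ensure_cached` and its `fetch` argument are hypothetical names, and the real service may rely on NeMo's or Hugging Face's own cache instead.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(cache_path: Path, fetch: Callable[[], bytes]) -> Path:
    """Return cache_path, calling fetch() only when the file is missing."""
    if cache_path.exists():
        return cache_path
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp_path = cache_path.with_name(cache_path.name + ".part")
    tmp_path.write_bytes(fetch())
    tmp_path.replace(cache_path)
    return cache_path
```

With this shape, the first call pays the download cost and every later call returns immediately.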

2. Client Integration

Send an audio file (.wav) in a POST request to the /asr/stream endpoint. The response is an SSE stream whose events carry JSON objects with partial and final transcriptions.

Example JavaScript Client:

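// audioBlob is assumed to be a Blob or File holding .wav audio.
const formData = new FormData();
formData.append('file', audioBlob);

const response = await fetch('http://localhost:8000/asr/stream', { method: 'POST', body: formData });
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Raw SSE text; parse the "data:" lines to get the JSON payloads.
  console.log(decoder.decode(value, { stream: true }));
}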
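For a Python client, the SSE framing can be decoded with a small line-based parser. This is a minimal sketch that assumes each event's `data:` field holds one JSON object; `iter_sse_json` is a hypothetical helper, and the payload's field names are not specified in this README. As a leniency that goes beyond the SSE format, it also emits a final message that lacks a trailing blank line.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE message found in lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current message.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Tolerate a stream that ends without a trailing blank line.
        yield json.loads("\n".join(data_parts))
```

The `lines` argument can be any iterable of text lines, for example the output of `Response.iter_lines(decode_unicode=True)` from the `requests` library on a response opened with `stream=True`.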

Project Structure

main.py: FastAPI application entry point and SSE endpoint logic.
utils.py: NeMo model initialization, cache management, and chunked inference logic.
pyproject.toml: Project dependencies and build configuration.

References

This implementation is based on the official NVIDIA NeMo cache-aware streaming example: NeMo Cache-Aware Streaming Inference Script