Speech Releases

v2

2.1 - May 21, 2026

Feature

ISpeechToTextProvider.Error event — providers can surface non-fatal errors (e.g. transient network failures between chunked requests in continuous mode) without aborting the RecognizeAsync enumerator. CloudSpeechToText subscribes and forwards to the service-level ISpeechToTextService.Error event automatically. Azure / OpenAI / ElevenLabs providers updated to use it instead of throwing out of the enumerator

BREAKING Chore

ISpeechToTextProvider now requires implementers to declare the event EventHandler<SpeechRecognitionError>? Error. Existing custom providers must add the declaration; one-shot providers can declare it and never raise it
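
For custom provider authors, the migration might look roughly like the sketch below. The interface and records here are local stand-ins written for illustration, not the library's definitions: only the Error event declaration is taken from these notes, while the RecognizeAsync signature and the EchoProvider class are assumptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Usage: drain the enumerator of a one-shot provider.
var provider = new EchoProvider();
await foreach (var result in provider.RecognizeAsync(Stream.Null))
    Console.WriteLine($"{result.Text} (final: {result.IsFinal})"); // hello (final: True)

// --- Local stand-ins, NOT the library's types ---
public record SpeechRecognitionResult(string Text, bool IsFinal);
public record SpeechRecognitionError(string Message, Exception? Exception = null);

public interface ISpeechToTextProvider
{
    // Assumed signature: the notes name the method but do not list its parameters.
    IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(Stream audio, CancellationToken ct = default);

    // Declaration as quoted in the 2.1 breaking change.
    event EventHandler<SpeechRecognitionError>? Error;
}

// Hypothetical one-shot provider: it declares the event but never raises it.
public sealed class EchoProvider : ISpeechToTextProvider
{
#pragma warning disable CS0067 // event is required by the interface but intentionally unused
    public event EventHandler<SpeechRecognitionError>? Error;
#pragma warning restore CS0067

    public async IAsyncEnumerable<SpeechRecognitionResult> RecognizeAsync(
        Stream audio, [EnumeratorCancellation] CancellationToken ct = default)
    {
        await Task.Yield(); // placeholder for the real network call
        yield return new SpeechRecognitionResult("hello", true);
    }
}
```

The pragma only silences the compiler's "event is never used" warning; a provider that does hit transient failures would raise the event instead of throwing from the enumerator.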

Feature

ElevenLabs Scribe speech-to-text provider — AddElevenLabsSpeech() now registers both STT and TTS, or use the new AddElevenLabsSpeechToText() helper. Buffers captured PCM, wraps it in a WAV container, and posts a single request to /v1/speech-to-text; yields one final SpeechRecognitionResult per session
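
To illustrate the "wraps in a WAV container" step, here is a minimal sketch of prepending a canonical 44-byte RIFF/WAVE header to raw PCM. WavWriter is a hypothetical helper, not the provider's code, and the 16 kHz, 16-bit, mono parameters are assumed from the IAudioSource format described in 1.0.

```csharp
using System;
using System.IO;
using System.Text;

// Usage: one second of silence at 16 kHz, 16-bit, mono.
byte[] pcm = new byte[16000 * 2];
byte[] wav = WavWriter.Wrap(pcm, sampleRate: 16000, channels: 1, bitsPerSample: 16);
Console.WriteLine(wav.Length); // 32044 (44-byte header + 32000 bytes of PCM)

public static class WavWriter
{
    // Builds an uncompressed PCM WAV file in memory. BinaryWriter emits
    // little-endian integers, which is what the RIFF format expects.
    public static byte[] Wrap(byte[] pcm, int sampleRate, short channels, short bitsPerSample)
    {
        short blockAlign = (short)(channels * bitsPerSample / 8); // bytes per sample frame
        int byteRate = sampleRate * blockAlign;                    // bytes per second

        using var ms = new MemoryStream(44 + pcm.Length);
        using var w = new BinaryWriter(ms, Encoding.ASCII);
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(36 + pcm.Length);                 // size of everything after this field
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                              // fmt chunk size for plain PCM
        w.Write((short)1);                        // audio format 1 = PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(pcm.Length);                      // data chunk size
        w.Write(pcm);
        w.Flush();
        return ms.ToArray();
    }
}
```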

Feature

ElevenLabsConfig.SpeechToTextModel property (default scribe_v1) — configurable Scribe model id

BREAKING Chore

Renamed ElevenLabsConfig.ModelId → ElevenLabsConfig.TextToSpeechModel to disambiguate from the new SpeechToTextModel property

Fix

KeywordHeard no longer re-fires for the same final transcription within a 3-second window — eliminates duplicate keyword events caused by trailing-audio carry-over between recognition tasks (iOS SFSpeechRecognizer re-arm, Android SpeechRecognizer restart). Applied uniformly to Apple, Android, Browser, Windows, and CloudSpeechToText
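
A time-window suppression of this kind can be sketched as follows. KeywordDedupGate is a hypothetical class, and comparing the whole final transcription case-insensitively is an assumption; the notes only state the 3-second window.

```csharp
#nullable enable
using System;

// Usage with explicit timestamps so the behaviour is deterministic.
var gate = new KeywordDedupGate(TimeSpan.FromSeconds(3));
var t0 = new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero);
Console.WriteLine(gate.ShouldFire("hey shiny turn on", t0));               // True
Console.WriteLine(gate.ShouldFire("hey shiny turn on", t0.AddSeconds(1))); // False
Console.WriteLine(gate.ShouldFire("hey shiny turn on", t0.AddSeconds(5))); // True

public sealed class KeywordDedupGate
{
    readonly TimeSpan window;
    string? lastText;
    DateTimeOffset lastFired;

    public KeywordDedupGate(TimeSpan window) => this.window = window;

    // Returns false when the same final text was already accepted inside the window.
    public bool ShouldFire(string finalText, DateTimeOffset now)
    {
        bool duplicate = lastText != null
            && string.Equals(lastText, finalText, StringComparison.OrdinalIgnoreCase)
            && now - lastFired < window;
        if (duplicate) return false;

        lastText = finalText;
        lastFired = now;
        return true;
    }
}
```

In this sketch a suppressed duplicate does not extend the window; whether the library slides the window on repeats is not stated in these notes.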

Feature

ITextToSpeechService.AudioLevelChanged event and IsPlayerAnalysisSupported flag — normalized 0.0–1.0 RMS level for driving VU-meter UI during speech playback

Feature

IAudioPlayer.AudioLevelChanged event and IsPlayerAnalysisSupported flag — same RMS signal raised during generic audio playback (e.g. cloud TTS audio streams)

Enhancement iOS

Apple native TTS now routes AVSpeechSynthesizer through AVAudioEngine + AVAudioPlayerNode with a player-node tap, enabling AudioLevelChanged for built-in iOS / macOS / Mac Catalyst voices. Engine is created lazily on first speak and kept warm across utterances

Enhancement Android

Android native TTS taps UtteranceProgressListener.OnAudioAvailable to compute RMS from PCM bytes without rerouting playback
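
As a rough illustration of the RMS computation, the sketch below turns a PCM buffer into a normalized 0.0–1.0 level. AudioLevel.Rms16 is a hypothetical helper, and 16-bit little-endian samples are an assumption; the actual buffer format depends on what the synthesizer reports.

```csharp
using System;
using System.Buffers.Binary;

// Usage: two full-scale negative samples, then silence.
byte[] fullScale = { 0x00, 0x80, 0x00, 0x80 }; // -32768, -32768
Console.WriteLine(AudioLevel.Rms16(fullScale));   // 1
Console.WriteLine(AudioLevel.Rms16(new byte[4])); // 0

public static class AudioLevel
{
    // Root-mean-square of 16-bit little-endian PCM, normalized to 0.0..1.0.
    public static double Rms16(ReadOnlySpan<byte> pcm)
    {
        int samples = pcm.Length / 2; // a trailing odd byte is ignored
        if (samples == 0) return 0.0;

        double sumSquares = 0;
        for (int i = 0; i < samples; i++)
        {
            short s = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(i * 2, 2));
            double v = s / 32768.0; // map to -1.0..1.0
            sumSquares += v * v;
        }
        return Math.Min(1.0, Math.Sqrt(sumSquares / samples));
    }
}
```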

Enhancement Android

AndroidAudioPlayer attaches Android.Media.Audiofx.Visualizer to the MediaPlayer audio session for cloud TTS / generic playback metering (no RECORD_AUDIO permission needed for per-session capture; MODIFY_AUDIO_SETTINGS recommended)

Enhancement iOS

AppleAudioPlayer enables AVAudioPlayer.MeteringEnabled and polls AveragePower for VU metering during cloud / generic audio playback
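
AVAudioPlayer reports average power in decibels (0 dB is full scale, about -160 dB is silence), so a polled value needs converting before it can feed a 0.0–1.0 meter. One common conversion is sketched below; Metering.DecibelsToLinear is a hypothetical helper, and whether the library uses this exact mapping is an assumption.

```csharp
using System;

// Usage: full scale, a mid-range reading, and the silence floor.
Console.WriteLine(Metering.DecibelsToLinear(0f));    // 1
Console.WriteLine(Metering.DecibelsToLinear(-20f));  // roughly 0.1
Console.WriteLine(Metering.DecibelsToLinear(-160f)); // 0

public static class Metering
{
    const float FloorDb = -160f; // documented minimum of AVAudioPlayer metering

    // Converts a dBFS reading to a linear amplitude in 0.0..1.0.
    public static float DecibelsToLinear(float db)
    {
        if (float.IsNaN(db) || db <= FloorDb) return 0f;
        if (db >= 0f) return 1f;
        return MathF.Pow(10f, db / 20f); // amplitude ratio = 10^(dB / 20)
    }
}
```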

Chore

CloudTextToSpeech forwards AudioLevelChanged and IsPlayerAnalysisSupported from the underlying IAudioPlayer — Azure / OpenAI / ElevenLabs / custom providers get VU metering for free

Enhancement iOS

CarPlay compatible — iOS audio session uses PlayAndRecord with AllowBluetooth, AllowBluetoothA2dp, and DefaultToSpeaker so audio automatically routes through the car’s microphone and speakers when CarPlay is active

2.0 - May 13, 2026

BREAKING Chore

ISpeechToTextService redesigned from IAsyncEnumerable-based to event-based Start/Stop model — ContinuousRecognize() and ListenUntilSilence() removed from the interface

Feature

Start(SpeechRecognitionOptions?) / Stop() methods — long-lived listening sessions with explicit lifecycle control; Start() throws if already listening, while Stop() is a safe no-op when not listening

Feature

ResultReceived event — fires for every recognition result (partial and final) with full SpeechRecognitionResult including Text, IsFinal, and Confidence; multiple subscribers supported

Feature

KeywordHeard event — fires when a keyword from SpeechRecognitionOptions.Keywords is detected in a final result using case-insensitive whole-word matching

Feature

Error event — fires on recognition errors with SpeechRecognitionError containing Message and optional Exception

Feature

SpeechRecognitionError record — new type for error reporting via the Error event

Feature

SpeechRecognitionOptions.Keywords property (string[]?) — built-in keyword detection at the platform level; keywords are matched with compiled regex on final results
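
Case-insensitive whole-word matching with a compiled regex could be built along these lines. KeywordMatcher is a hypothetical helper and the exact pattern the library compiles is not given in these notes; this is one plausible construction using word boundaries.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Usage: "LIGHTS" matches as a whole word; "stop" inside "unstoppable" does not.
var matcher = KeywordMatcher.Build(new[] { "lights", "stop" });
Console.WriteLine(KeywordMatcher.FirstMatch(matcher, "Turn the LIGHTS on") ?? "(none)"); // LIGHTS
Console.WriteLine(KeywordMatcher.FirstMatch(matcher, "unstoppable") ?? "(none)");        // (none)

public static class KeywordMatcher
{
    // Builds one alternation regex, escaping each keyword so punctuation is literal.
    public static Regex Build(string[] keywords)
    {
        string alternatives = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        return new Regex($@"\b(?:{alternatives})\b",
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    }

    // Returns the matched text as it appeared in the transcription, or null.
    public static string? FirstMatch(Regex regex, string finalText)
    {
        Match m = regex.Match(finalText);
        return m.Success ? m.Value : null;
    }
}
```

Building the regex once per Start() call and reusing it for every final result is what makes the Compiled option worthwhile.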

BREAKING Chore

ListenWithWakeWord() extension method removed — replaced by StatementAfterKeyword()

BREAKING Chore

ListenForKeyword() extension method removed — replaced by WaitListenForKeywords() and ListenForKeywords()

Feature

ListenUntilSilence() extension method — starts listening, waits for the first final result, then stops; replaces the former interface method
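
The start, wait for a final result, stop pattern can be layered over the event model with a TaskCompletionSource, as in the sketch below. Everything here is a local stand-in: the interface is trimmed to what the example needs (the real Start() accepts SpeechRecognitionOptions?), the handler type of ResultReceived is assumed, and ListenUntilFirstFinal and FakeSpeechToText are hypothetical names rather than the library's implementation.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Usage against an in-memory fake that lets the example drive results by hand.
var stt = new FakeSpeechToText();
Task<string> pending = stt.ListenUntilFirstFinal();
stt.Emit("turn on", isFinal: false);           // partial result, ignored
stt.Emit("turn on the lights", isFinal: true); // final result completes the task
Console.WriteLine(await pending);   // turn on the lights
Console.WriteLine(stt.IsListening); // False

// --- Local stand-ins, NOT the library's types ---
public record SpeechRecognitionResult(string Text, bool IsFinal);

public interface ISpeechToTextService
{
    bool IsListening { get; }
    void Start();
    void Stop();
    event EventHandler<SpeechRecognitionResult>? ResultReceived; // assumed handler type
}

public static class SpeechExtensionsSketch
{
    public static async Task<string> ListenUntilFirstFinal(
        this ISpeechToTextService stt, CancellationToken ct = default)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

        void OnResult(object? sender, SpeechRecognitionResult r)
        {
            if (r.IsFinal) tcs.TrySetResult(r.Text);
        }

        stt.ResultReceived += OnResult; // subscribe before starting so nothing is missed
        using var registration = ct.Register(() => tcs.TrySetCanceled(ct));
        try
        {
            stt.Start();
            return await tcs.Task;
        }
        finally
        {
            stt.ResultReceived -= OnResult; // always unhook and stop, even on cancellation
            stt.Stop();
        }
    }
}

public sealed class FakeSpeechToText : ISpeechToTextService
{
    public bool IsListening { get; private set; }
    public event EventHandler<SpeechRecognitionResult>? ResultReceived;

    public void Start()
    {
        if (IsListening) throw new InvalidOperationException("Already listening");
        IsListening = true;
    }

    public void Stop() => IsListening = false; // no-op when not listening

    public void Emit(string text, bool isFinal) =>
        ResultReceived?.Invoke(this, new SpeechRecognitionResult(text, isFinal));
}
```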

Feature

StatementAfterKeyword(string[]) extension method — waits for a keyword to be heard, then returns the next final statement (replaces ListenWithWakeWord)

Feature

WaitListenForKeywords(string[], TimeSpan?) extension method — returns the first keyword heard with optional timeout

Feature

ListenForKeywords(string[]) extension method — yields keywords continuously as IAsyncEnumerable<string>

Enhancement

All extension methods handle Start/Stop/event wiring automatically — no manual lifecycle management needed for simple scenarios

Enhancement

Multiple classes can now subscribe to speech recognition events simultaneously — eliminates the single-consumer limitation of IAsyncEnumerable

Enhancement

Cloud provider (CloudSpeechToText) adapted to consume ISpeechToTextProvider.RecognizeAsync() internally on a background task and raise events — ISpeechToTextProvider interface unchanged

v1

1.2.1 - May 11, 2026

Feature

OpenAI cloud provider — AddOpenAiSpeech() registers OpenAI STT (Whisper / GPT-4o Transcribe) and TTS (GPT-4o Mini TTS) with configurable model and voice selection

Feature

OpenAI TTS supports 10 built-in voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer

Feature

OpenAI STT/TTS follows the same cloud provider pattern as Azure and ElevenLabs — platform IAudioSource and IAudioPlayer handle audio I/O

Feature

Microsoft.Extensions.AI adapter — AddShinySpeechClients() exposes any registered cloud provider as ISpeechToTextClient and ITextToSpeechClient from Microsoft.Extensions.AI

Feature

AddShinySpeechToTextClient() / AddShinyTextToSpeechClient() for registering M.E.AI adapters individually

Feature

M.E.AI streaming support — GetStreamingTextAsync() emits SessionOpen, TextUpdating, TextUpdated, SessionClose update kinds mapped from SpeechRecognitionResult.IsFinal

Feature

M.E.AI TTS streaming — GetStreamingAudioAsync() emits SessionOpen, AudioUpdated, SessionClose update kinds with audio as DataContent

Feature

M.E.AI options mapping — SpeechToTextOptions.SpeechLanguage → CultureInfo, TextToSpeechOptions voice/speed/pitch/volume mapped to Shiny equivalents

1.2 - May 6, 2026

Feature WASM

Browser IAudioSource implementation — raw PCM microphone capture via the Web Audio API (getUserMedia + ScriptProcessorNode), downsampled to 16kHz 16-bit mono, matching the output format of Android, iOS, and Windows

Enhancement WASM

Cloud STT providers (Azure, custom) now work in the browser — IAudioSource provides the raw audio stream that CloudSpeechToText requires

1.1.2 - May 6, 2026

Enhancement

Cloud provider extensions (AddAzureSpeech, AddElevenLabsTextToSpeech, AddCloudSpeechToText, AddCloudTextToSpeech) now automatically register IAudioSource and IAudioPlayer — manual AddAudioSource() / AddAudioPlayer() calls are no longer required

1.1.1 - May 5, 2026

Feature

ISpeechToTextService.IsListening property — indicates whether speech recognition is currently active, analogous to ITextToSpeechService.IsSpeaking

1.1 - May 4, 2026

Feature

ListenWithWakeWord() extension method — “Hey Siri” style wake word activation that continuously listens for a wake phrase, then captures the spoken command after it until silence

Feature

ListenForKeyword() extension method — listens continuously until one of the specified keywords is detected (case-insensitive, whole-word matching), returns the matched keyword

Feature

Wake word supports pause-then-speak — if the user says the wake phrase and pauses before speaking, the method waits for the next utterance as the command

Feature

Both methods are extension methods on ISpeechToTextService composing over ContinuousRecognize — no platform-specific code changes required

Feature

Sample apps updated with Wake Word and Keyword listening modes (MAUI + Blazor)

1.0 - May 2, 2026

Feature

ISpeechToTextService interface — platform-native speech recognition with permission management, continuous streaming, and listen-until-silence modes

Feature

ITextToSpeechService interface — platform-native text-to-speech with voice selection, speech rate, pitch, and volume control

Feature

IAudioSource interface — raw PCM audio capture from the device microphone (16kHz, 16-bit, mono)

Feature

IAudioPlayer interface — MP3 audio stream playback with play/stop control

Feature

SpeechRecognitionOptions — configurable culture, silence timeout, and on-device preference for STT

Feature

TextToSpeechOptions — configurable culture, voice, speech rate, pitch, and volume for TTS

Feature

ContinuousRecognize() — streaming recognition results via IAsyncEnumerable<SpeechRecognitionResult> with partial and final results

Feature

ListenUntilSilence() — simple dictation mode that returns the final transcription after silence is detected

Feature

GetVoicesAsync() — enumerate available TTS voices with optional culture filtering

Feature

AddSpeechServices() — single extension method to register all core services (STT, TTS, AudioSource, AudioPlayer)

Feature Android

Android STT implementation using SpeechRecognizer with streaming partial results

Feature Android

Android TTS implementation using Android.Speech.Tts.TextToSpeech

Feature Android

Android audio capture via AudioRecord with 16kHz PCM streaming

Feature Android

Android audio playback via MediaPlayer

Feature iOS

iOS STT implementation using SFSpeechRecognizer with SFSpeechAudioBufferRecognitionRequest

Feature iOS

iOS TTS implementation using AVSpeechSynthesizer

Feature iOS

iOS audio capture via AVAudioEngine with PCM tap

Feature iOS

iOS audio playback via AVAudioPlayer

Feature

Cloud provider abstraction — ISpeechToTextProvider and ITextToSpeechProvider interfaces for pluggable cloud backends

Feature

CloudSpeechToText and CloudTextToSpeech — bridge classes that combine platform audio with cloud provider APIs

Feature

AddCloudSpeechToText<T>() and AddCloudTextToSpeech<T>() — generic DI registration for custom cloud providers

Feature

Azure AI Speech provider — AddAzureSpeech() registers Azure STT and/or TTS with subscription key and region

Feature

Azure TTS with SSML prosody control — speech rate, pitch, and volume mapped to SSML elements

Feature

ElevenLabs TTS provider — AddElevenLabsTextToSpeech() registers ElevenLabs cloud TTS with configurable voice and model

Feature

PipeStream utility — thread-safe producer-consumer stream using System.IO.Pipelines for bridging audio capture with cloud providers

Feature WASM

Browser/WebAssembly support — STT and TTS via Web Speech API, auto-detected at runtime via OperatingSystem.IsBrowser()

Feature WASM

Browser STT implementation using SpeechRecognition API with streaming partial and final results

Feature WASM

Browser TTS implementation using SpeechSynthesis API with voice selection, rate, pitch, and volume control

Feature WASM

Browser audio playback via HTML5 Audio element with base64 data URL conversion

Feature

Blazor WebAssembly sample app demonstrating STT, TTS, and voice listing