Added Cartesia Ink 2 as the default English Cartesia STT path, while keeping Ink Whisper available as a separate multilingual option with its own realtime protocol.
Expanded Faster Whisper model choices to Tiny, Base, Small, Medium, Large V3, and Large V3 Turbo, with model-specific cache/download handling and clearer UI descriptions.
Improved cloud STT vocabulary biasing with larger provider-specific budgets, clearer truncation behavior, and an explicit policy for terms that already have local correction variants.
Added an optional Follow Cursor Screen setting so the dock can move to the cursor's monitor when recording starts.
Increased recording-stop patience for slower speech sessions so finalization has more time to complete cleanly.
Prompts, Context & Vocabulary
Added the missing Context picker to Smart Dictation prompt settings, using the same prompt-to-Context association already used by Smart Actions.
Added AI-assisted vocabulary variation generation inside the dictionary editor, so users can draft likely misheard forms before explicitly saving them.
Simplified vocabulary guidance around the in-editor generation button and removed the obsolete vocabulary tutorial path and bundled vocabulary-prompt default.
Synced shipped prompt and Context assets with first-run defaults, onboarding fallback labels, default hotkeys, and prompt documentation.
Desktop, Settings & Onboarding
Added reusable Settings help tooltips with hover, keyboard focus, click-to-pin, outside-click close, and Escape close behavior.
Cleaned Settings copy and layout around License Activation, About contact cards, active-application prompt assignments, AI provider overrides, and Backspace cancel guidance for hold-style dictation.
Reworked the AI tutorial around prompt visibility screenshots, in-place demo hotkey editing, a shared two-column demo layout, and modal dismissal that avoids accidental backdrop exits.
Added the Smart Dictation Context modal UI and aligned Smart Actions and Smart Dictation modal label/icon treatment.
Tightened dock startup and reconnect behavior so startup expansion, Electron window shape, backend readiness, and collapsed-pill state stay in sync.
Fixed a dock startup expansion regression that could leave the dock open after readiness and added lifecycle coverage for reload and reconnect cases.
Performance & Release
Added Electron-main caches for prompts, STT models, and settings so popup menus can paint from known data while refreshing in the background.
Reduced renderer startup and Settings work by route-splitting styles, lazy-loading window surfaces, keeping only active/recent Settings sections mounted, and fetching prompt editor file content on demand.
Removed duplicated onboarding launcher cards, the general Twemoji parser dependency, stale dock startup timing logs, and unnecessary popup/window shadows.
Updated documentation for the cache handles, Settings loading behavior, tutorial launcher ownership, flag rendering, and startup logging expectations.
Repaired changelog source coverage for the release range and regenerated landing-page changelog data for the normal `00_build_release_lite_full.bat` patch-bump flow.
1.0.6
Speech & Dictation
Reworked Windows microphone capture around native shared-mode WASAPI, with a smaller logical device picker, faster default-mic startup, raw/event capture, DC filtering, stronger mono handling, and the same 24 kHz STT queue contract.
Improved Press & Hold finalization with clearer `release-drain` behavior, better Faster Whisper tail capture, restored Parakeet release flushing, and safer handling of quiet short phrases.
Added a lightweight default Parakeet model option plus an optional full-precision model, and changed live Parakeet batching so empty timed output keeps captured audio pending instead of dropping it or retrying the same buffer forever.
Removed the legacy MedASR engine from backend routing, build specs, dependencies, UI metadata, onboarding copy, tests, and documentation.
Tightened STT model switching so overlapping requests do not double-unload processors or start extra replacements.
Updated speech-engine metadata, badges, and resource labels so local CPU engines and cloud engines are described more accurately.
AI & Providers
Split LLM request orchestration into clearer owners, including a dedicated payload tracer, structured prompt payload sections, and a smaller LLM processor.
Improved provider error handling so Anthropic token budgets, streaming errors, empty responses, and secret redaction produce clearer backend diagnostics and safer user-facing messages.
Hardened OpenAI, Qwen, and MiniMax OAuth persistence so sign-in and refresh success are only reported after encrypted session storage succeeds.
Made AI model-list refresh more honest: failed fetches report visible error codes, reachable providers with zero models are cached as real results, and OAuth cache refresh failures are logged.
Kept saved API-key status visible even when OAuth is active, while preserving OAuth as the active credential.
Fixed Ask Mode so the preview hotkey always opens AI Preview, even when the active Smart Action already has preview enabled.
Prompts, Vocabulary & Workflow
Preserved valid Windows prompt and Context filename characters such as `+`, `&`, and emoji instead of silently deleting them.
Made prompt and Context creation reject duplicate names before writing placeholder files.
Tightened prompt manager behavior around file watching, prompt listing, linked Context resolution, rename safety, and prompt-backed LLM payloads.
Shared one backend `.txt` filename helper across prompts, Context files, and vocabulary dictionaries.
Hardened vocabulary dictionary loading and settings persistence so malformed enabled-dictionary names are rejected and dictionary files cannot drift from saved settings after a failed save.
Added optional recording-history audio sidecars so users who enable it can replay the original captured audio without forcing WAV accumulation for everyone.
Desktop & UI
Reduced dock startup work by lazy-loading non-dock routes and centralizing renderer IPC subscriptions so shared app events no longer create excess Electron listeners.
Reworked dock expand/collapse behavior into one lifecycle owner, improving shape synchronization and reducing brittle hover/animation state.
Restored preview notifications after the startup optimization so failed API calls, empty speech, and other preview toasts have a mounted listener.
Cleaned the Smart Dictation app picker by hiding obvious Windows/AppX helper entries and folding duplicate Start Menu, Desktop, and registry variants into one visible row.
Avoided heavier window-title capture when title context is disabled, so normal external app titles are not kept in raw context snapshots.
Kept high-frequency capture and Parakeet diagnostics out of normal terminal output while preserving opt-in debug categories.
Reliability, Settings & Licensing
Made settings persistence publish to memory only after disk writes succeed, centralized user-data path ownership, and tightened validation for vocabulary terms and AI quick lists.
Reduced settings-save side effects so background work only starts when vocabulary or Windows startup settings actually need it.
Made the Python-to-Electron bridge less likely to freeze during file, model, history, vocabulary, OAuth, or long-running LLM work.
Fixed backend readiness reporting so startup does not claim the app is ready just because the speech processor thread launched.
Improved clipboard paste cleanup, hotkey edge-case handling, and recording history path safety.
Made license activation/authorization treat secure-storage persistence as part of the state transition, and added timeout hardening for Dodo licensing calls.
Maintenance & Release
Added report-first backend audit checkpoints and applied fixes across audio manager, model manager, LLM providers, OpenAI OAuth, user messages, bridge handling, hotkeys, settings, vocabulary, STT manager, recording history, API-key manager, Dodo, and logging.
Updated backend packaging specs for new modules and removed obsolete MedASR packaging entries.
Added UTF-8 and PowerShell 7 repository defaults to reduce Windows text-encoding damage.
Cleaned stale documentation clutter and moved loose task/research material into the current docs structure.
Made same-version release redeploys skip changelog validation/upload/verification while keeping bumped or custom versions strict.
Prepared the 1.0.6 changelog source block and generated landing-page changelog data for the normal `00_build_release_lite_full.bat` patch-bump flow.
1.0.5
AI
Added xAI/Grok as an API-key AI provider with live language-model discovery, setup links, image-capability metadata, and refreshed default quick-list ordering.
Split LLM requests into clearly labeled sections so prompts, clipboard text, dictation transcripts, system context, and app context stay distinct.
Added workflow-specific system prompts for Smart Actions, Smart Dictation, AI Voice Command, and preview responses, while keeping normal paste output plain by default.
Prompts
Changed the dock prompt menu so Smart Actions and Smart Dictation appear side by side in one wider popup.
Kept each prompt surface's selection, overrides, hotkeys, preview toggle, and app-binding controls independent inside the shared menu.
Speech
Made recording cancellation stricter so delayed STT output after cancel is dropped instead of reaching history, preview, paste, or the target application.
Desktop
Kept the dock in its spinner-only startup placeholder until identity settings are hydrated, avoiding a brief fallback READY/red-dot shell during backend-first startup.
Improved LLM terminal tracing with one redacted request payload block that preserves section order and line breaks for debugging.
Licensing
Added Dodo stablecoin checkout support for lifetime purchases while leaving subscription checkout on the previous subscription-safe payload.
Support & Release
Prepared the 1.0.5 app version and landing-page changelog data before the build.
Fixed backend packaging specs so the new LLM system-prompt module is included in Lite, Full, and PyTorch frozen builds.
Documented release changelog preparation in `AGENTS.md` as part of the standard build procedure.
1.0.4
Features
Added per-prompt usage controls so a prompt can appear in Smart Actions, Smart Dictation, or both.
Added shared prompt-usage rules across Smart Actions, Smart Dictation, the dock prompt menu, dedicated action hotkeys, and Smart Dictation routing.
Added richer audio-device and recording-attempt diagnostics to bug reports without storing raw audio or transcript text.
Added session caching for the installed-app catalog while keeping manual refresh available for Smart Dictation app assignment.
Fixes
Fixed missing final words when releasing Press & Hold by draining final audio and provider responses more carefully across Kyutai, Deepgram, and Speechmatics.
Fixed Speechmatics stream shutdown while keeping Standard and Enhanced mapped to the intended provider models.
Fixed microphone selection after device changes by refreshing device metadata before recording starts.
Fixed prompt compatibility for existing users by treating missing `prompt_usage` settings as visible in both prompt surfaces.
Fixed the Lite + Full release wrapper so release notes are checked before version or upload questions appear.
Fixed release prep for older release-range commits with missing HCR metadata without rewriting pushed history.
Improvements
Improved speech-engine comparison badges, scores, and resource labels so local CPU engines and cloud engines are described more accurately.
Improved notification routing by moving full messages into the transcription preview overlay while keeping the dock focused on recording, timing, and compact status.
Improved Smart Dictation app assignment so the installed-app picker feels faster and visually steadier.
Improved fresh-install AI defaults so new users start on the current recommended fast OpenAI model.
Improved prompt wording by renaming user-facing "System Info" copy to "Context" while keeping existing internal files and settings compatible.
Improved prompt, configuration, hotkey, STT, Speechmatics, build/release, and style documentation for the current behavior.
1.0.3
Dictation
Made Press & Hold paste feedback faster by returning success as soon as text is pasted, while clipboard restoration continues safely in the background.
Fixed a Parakeet quick-release case where clearly captured speech could be discarded before the first sentence update, causing a false "No speech captured" result.
Kept the empty-speech warning readable while shortening only the successful "Pasted!" feedback.
Improved the light reformat prompt so noisy transcripts with mumbling, stutters, repetitions, and self-corrections are cleaned more naturally.
Support & Updates
Added recent recording-attempt diagnostics to bug reports, including stream start state, chunk counts, audio-energy summaries, speech-like activity, and transcript length without storing raw audio or transcript text.
Added the current recognized audio device list to bug reports, including microphone and WASAPI loopback entries, so support can see what the app detected before or after a failed recording.
Made startup update checks persist in the About panel even when Settings was closed, added a visible About-tab update indicator, and added a timeout fallback so update checks do not stay stuck forever.
Moved Report an Issue to the top of About, made it more prominent, and enlarged the bug-report modal for longer descriptions.
Appearance
Removed the disabled None visualizer option so recording always has an active visual effect.
Migrated old saved none visualizer settings back to the default ripple visualizer.
Prompts
Added a bundled TLDR prompt.
Localization
Corrected repeated missing French accents in touched provider, bug-report, preview, AI-error, licensing, and visualizer copy.
Fixed
Fixed the Lite + Full release wrapper upload preflight so patch, minor, and major version bumps validate against the complete target version before builds start.
Added a release-source guard so upload-enabled builds can fail before building when the top changelog entry does not declare the HCRs for the commits being released.
Documentation
Clarified the HCR workflow so each commit has one current plain-English recap and release changelogs are synthesized from those recaps.
Updated audio, STT, configuration, IPC, bug-report, update, usage, and release documentation for the current behavior.
1.0.2
Speech
Improved cloud speech-to-text quality and sentence handling for live dictation.
Tuned Gladia Solaria realtime transcription for cleaner final text.
Simplified Cartesia realtime transcription back to the provider-supported websocket path after live testing showed punctuation workarounds were not reliable enough.
Added repeatable cloud STT benchmark reporting so provider quality changes can be compared against real transcripts.
AI
Added MiniMax OAuth support to the packaged backend build path.
Kept OpenAI OAuth reconnect and provider handling aligned with packaged release checks.
Fixed
Fixed backend restart reconnect behavior in development so a manually restarted backend can receive startup STT load permission again.
Fixed release packaging guards so all backend modules are included in Lite, Full, and PyTorch frozen builds.
Fixed the Lite + Full release wrapper so Windows command labels resolve reliably and upload-only release metadata problems are caught before backend and installer builds start.
Documentation
Expanded release, STT provider, and benchmark documentation for the current cloud ASR behavior.
Added release-build notes for batch line endings and PyInstaller hidden-import checks.
1.0.1
Added
Initial public release
Floating Windows dock with global hotkeys for dictation, AI commands, Ask Mode, and prompt-backed actions
Hands-free streaming dictation and instant Press & Hold capture modes
Windows system-sound capture via WASAPI loopback, alongside microphone device selection
Live transcription preview window with adjustable font size and auto-hide timing
Persistent recording history for dictation sessions and AI voice commands
Speech
Local STT engines: Parakeet V3, MedASR, Faster Whisper, and Kyutai Moshi