Changelog

Releases are listed newest first.

1.0.7

Speech & Dictation

  • Added Cartesia Ink 2 as the default English Cartesia STT path, while keeping Ink Whisper available as a separate multilingual option with its own realtime protocol.
  • Expanded Faster Whisper model choices to Tiny, Base, Small, Medium, Large V3, and Large V3 Turbo, with model-specific cache/download handling and clearer UI descriptions.
  • Improved cloud STT vocabulary biasing with larger provider-specific budgets, clearer truncation behavior, and an explicit policy for terms that already have local correction variants.
  • Added an optional Follow Cursor Screen setting so the dock can move to the cursor's monitor when recording starts.
  • Increased recording-stop patience for slower speech sessions so finalization has more time to complete cleanly.
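
The vocabulary-biasing item above describes provider-specific budgets with explicit truncation. A minimal sketch of that idea follows; the provider keys, budget sizes, and the rule that terms with local correction variants are sent last are all assumptions for illustration, not the app's actual values or policy.

```python
from dataclasses import dataclass

# Hypothetical per-provider budgets (maximum total characters of biasing terms).
PROVIDER_BUDGETS = {"provider_a": 40, "provider_b": 20}


@dataclass
class BiasResult:
    sent: list      # terms that fit in the budget, in priority order
    dropped: list   # terms left out because the budget was exhausted


def select_bias_terms(terms, provider, has_local_variants=frozenset()):
    """Pick biasing terms for a provider without exceeding its budget.

    Assumed policy: terms that already have local correction variants go
    last, because local correction can still repair them if they are dropped.
    """
    budget = PROVIDER_BUDGETS[provider]
    # Stable sort keeps the original order within each priority group.
    ordered = sorted(terms, key=lambda term: term in has_local_variants)
    sent, dropped, used = [], [], 0
    for term in ordered:
        if used + len(term) <= budget:
            sent.append(term)
            used += len(term)
        else:
            dropped.append(term)
    return BiasResult(sent, dropped)
```

Returning the dropped terms alongside the sent ones is one way to make truncation visible to the caller instead of silent.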

Prompts, Context & Vocabulary

  • Added the missing Context picker to Smart Dictation prompt settings, using the same prompt-to-Context association already used by Smart Actions.
  • Added AI-assisted vocabulary variation generation inside the dictionary editor, so users can draft likely misheard forms before explicitly saving them.
  • Simplified vocabulary guidance around the in-editor generation button and removed the obsolete vocabulary tutorial path and bundled vocabulary-prompt default.
  • Synced shipped prompt and Context assets with first-run defaults, onboarding fallback labels, default hotkeys, and prompt documentation.

Desktop, Settings & Onboarding

  • Added reusable Settings help tooltips with hover, keyboard focus, click-to-pin, outside-click close, and Escape close behavior.
  • Cleaned Settings copy and layout around License Activation, About contact cards, active-application prompt assignments, AI provider overrides, and Backspace cancel guidance for hold-style dictation.
  • Reworked the AI tutorial around prompt visibility screenshots, in-place demo hotkey editing, a shared two-column demo layout, and modal dismissal that avoids accidental backdrop exits.
  • Added the Smart Dictation Context modal UI and aligned Smart Actions and Smart Dictation modal label/icon treatment.
  • Tightened dock startup and reconnect behavior so startup expansion, Electron window shape, backend readiness, and collapsed-pill state stay in sync.
  • Fixed a dock startup expansion regression that could leave the dock open after readiness and added lifecycle coverage for reload and reconnect cases.

Performance & Release

  • Added Electron-main caches for prompts, STT models, and settings so popup menus can paint from known data while refreshing in the background.
  • Reduced renderer startup and Settings work by route-splitting styles, lazy-loading window surfaces, keeping only active/recent Settings sections mounted, and fetching prompt editor file content on demand.
  • Removed duplicated onboarding launcher cards, the general Twemoji parser dependency, stale dock startup timing logs, and unnecessary popup/window shadows.
  • Updated documentation for the cache handles, Settings loading behavior, tutorial launcher ownership, flag rendering, and startup logging expectations.
  • Repaired changelog source coverage for the release range and regenerated landing-page changelog data for the normal `00_build_release_lite_full.bat` patch-bump flow.
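
The cache item in this list (paint from known data, refresh in the background) resembles the pattern often called stale-while-revalidate. The sketch below shows that pattern in Python for consistency with the other examples, even though the real caches live in the Electron main process; the class and method names are hypothetical.

```python
import threading


class StaleWhileRevalidateCache:
    """Serve the last known value immediately and refresh it in the background."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable that returns fresh data
        self._value = None
        self._has_value = False
        self._lock = threading.Lock()

    def get(self):
        """Return (value, refresh_thread).

        The first call blocks on the fetch. Later calls return the cached
        value right away and start a background refresh thread.
        """
        with self._lock:
            cached, has_value = self._value, self._has_value
        if not has_value:
            self._refresh()
            return self._value, None
        worker = threading.Thread(target=self._refresh, daemon=True)
        worker.start()
        return cached, worker

    def _refresh(self):
        fresh = self._fetch()
        with self._lock:
            self._value, self._has_value = fresh, True
```

The returned thread handle is only there so a caller (or a test) can wait for the refresh; a UI would normally ignore it and repaint when new data arrives.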

1.0.6

Speech & Dictation

  • Reworked Windows microphone capture around native shared-mode WASAPI, with a smaller logical device picker, faster default-mic startup, raw/event capture, DC filtering, stronger mono handling, and the same 24 kHz STT queue contract.
  • Improved Press & Hold finalization with clearer `release-drain` behavior, better Faster Whisper tail capture, restored Parakeet release flushing, and safer handling of quiet short phrases.
  • Added a lightweight default Parakeet model option plus an optional full-precision model, and changed live Parakeet batching so empty timed output keeps captured audio pending instead of dropping it or retrying the same buffer forever.
  • Removed the legacy MedASR engine from backend routing, build specs, dependencies, UI metadata, onboarding copy, tests, and documentation.
  • Tightened STT model switching so overlapping requests do not double-unload processors or start extra replacements.
  • Updated speech-engine metadata, badges, and resource labels so local CPU engines and cloud engines are described more accurately.
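
For readers unfamiliar with the DC filtering mentioned in the capture rework above, a common approach is a one-pole DC blocker. The sketch below is a textbook version, not necessarily the filter the app uses; the function name and the coefficient value are assumptions.

```python
def remove_dc(samples, r=0.995):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    A constant (DC) offset in the input decays toward zero in the output,
    while faster-changing audio content passes through mostly unchanged.
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Values of `r` closer to 1 remove less low-frequency content but take longer to settle after an offset appears.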

AI & Providers

  • Split LLM request orchestration into clearer owners, including a dedicated payload tracer, structured prompt payload sections, and a smaller LLM processor.
  • Improved provider error handling so Anthropic token budgets, streaming errors, empty responses, and secret redaction produce clearer backend diagnostics and safer user-facing messages.
  • Hardened OpenAI, Qwen, and MiniMax OAuth persistence so sign-in and refresh success are only reported after encrypted session storage succeeds.
  • Made AI model-list refresh more transparent: failed fetches report visible error codes, reachable providers with zero models are cached as real results, and OAuth cache refresh failures are logged.
  • Kept saved API-key status visible even when OAuth is active, while preserving OAuth as the active credential.
  • Fixed Ask Mode so the preview hotkey always opens AI Preview, even when the active Smart Action already has preview enabled.

Prompts, Vocabulary & Workflow

  • Preserved valid Windows prompt and Context filename characters such as `+`, `&`, and emoji instead of silently deleting them.
  • Made prompt and Context creation reject duplicate names before writing placeholder files.
  • Tightened prompt manager behavior around file watching, prompt listing, linked Context resolution, rename safety, and prompt-backed LLM payloads.
  • Shared one backend `.txt` filename helper across prompts, Context files, and vocabulary dictionaries.
  • Hardened vocabulary dictionary loading and settings persistence so malformed enabled-dictionary names are rejected and dictionary files cannot drift from saved settings after a failed save.
  • Added optional recording-history audio sidecars so users who enable it can replay the original captured audio without forcing WAV accumulation for everyone.
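
The filename items in this list (keep valid characters such as `+`, `&`, and emoji; reject duplicates before writing) could be handled by one shared helper along these lines. This is a sketch under assumptions: the function name is hypothetical, and rejecting invalid names rather than rewriting them is one possible policy.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves regardless of extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{i}" for prefix in ("COM", "LPT") for i in range(1, 10)
}


def to_txt_filename(name, existing=()):
    """Return a `.txt` file name for `name`, or raise ValueError.

    Valid characters are kept as typed; nothing is silently deleted.
    """
    # Windows ignores trailing dots and spaces, so drop them up front.
    stem = name.strip().rstrip(". ")
    if not stem or _INVALID.search(stem) or stem.upper() in _RESERVED:
        raise ValueError(f"invalid name: {name!r}")
    filename = stem + ".txt"
    # Windows file systems are case-insensitive by default, so compare folded.
    if filename.casefold() in {entry.casefold() for entry in existing}:
        raise ValueError(f"duplicate name: {filename!r}")
    return filename
```

Because the duplicate check happens before any file is created, a rejected name leaves no placeholder file behind.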

Desktop & UI

  • Reduced dock startup work by lazy-loading non-dock routes and centralizing renderer IPC subscriptions so shared app events no longer create excess Electron listeners.
  • Reworked dock expand/collapse behavior into one lifecycle owner, improving shape synchronization and reducing brittle hover/animation state.
  • Restored preview notifications after the startup optimization so failed API calls, empty speech, and other preview toasts have a mounted listener.
  • Cleaned the Smart Dictation app picker by hiding obvious Windows/AppX helper entries and folding duplicate Start Menu, Desktop, and registry variants into one visible row.
  • Avoided heavier window-title capture when title context is disabled, so normal external app titles are not kept in raw context snapshots.
  • Kept high-frequency capture and Parakeet diagnostics out of normal terminal output while preserving opt-in debug categories.

Reliability, Settings & Licensing

  • Made settings persistence publish to memory only after disk writes succeed, centralized user-data path ownership, and tightened validation for vocabulary terms and AI quick lists.
  • Reduced settings-save side effects so background work only starts when vocabulary or Windows startup settings actually need it.
  • Made the Python-to-Electron bridge less likely to freeze during file, model, history, vocabulary, OAuth, or long-running LLM work.
  • Fixed backend readiness reporting so startup does not claim the app is ready just because the speech processor thread launched.
  • Improved clipboard paste cleanup, hotkey edge-case handling, and recording history path safety.
  • Made license activation/authorization treat secure-storage persistence as part of the state transition, and added timeout hardening for Dodo licensing calls.
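
The "publish to memory only after disk writes succeed" rule from the settings item above can be sketched with a temp-file write followed by an atomic rename. The class name and JSON format are assumptions; only the ordering (disk first, memory second) comes from the notes.

```python
import json
import os
import tempfile


class SettingsStore:
    """Keep the in-memory settings in step with what is actually on disk."""

    def __init__(self, path):
        self._path = path
        self._current = {}

    @property
    def current(self):
        return dict(self._current)

    def save(self, new_settings):
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(new_settings, handle)
                handle.flush()
                os.fsync(handle.fileno())
            # Swap the finished file into place in one step.
            os.replace(tmp_path, self._path)
        except BaseException:
            # Failed write: remove the partial temp file, leave memory untouched.
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
        # Only now does the in-memory copy change.
        self._current = dict(new_settings)
```

If serialization or the write fails, both the file on disk and `current` still hold the previous settings, so the two cannot drift apart.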

Maintenance & Release

  • Added report-first backend audit checkpoints and applied fixes across audio manager, model manager, LLM providers, OpenAI OAuth, user messages, bridge handling, hotkeys, settings, vocabulary, STT manager, recording history, API-key manager, Dodo, and logging.
  • Updated backend packaging specs for new modules and removed obsolete MedASR packaging entries.
  • Added UTF-8 and PowerShell 7 repository defaults to reduce Windows text-encoding damage.
  • Cleaned stale documentation clutter and moved loose task/research material into the current docs structure.
  • Made same-version release redeploys skip changelog validation/upload/verification while keeping bumped or custom versions strict.
  • Prepared the 1.0.6 changelog source block and generated landing-page changelog data for the normal `00_build_release_lite_full.bat` patch-bump flow.

1.0.5

AI

  • Added xAI/Grok as an API-key AI provider with live language-model discovery, setup links, image-capability metadata, and refreshed default quick-list ordering.
  • Split LLM requests into clearly labeled sections so prompts, clipboard text, dictation transcripts, system context, and app context stay distinct.
  • Added workflow-specific system prompts for Smart Actions, Smart Dictation, AI Voice Command, and preview responses, while keeping normal paste output plain by default.
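
To make the "clearly labeled sections" idea above concrete, here is a minimal sketch of a payload builder. The bracketed label format and the section names in the usage note are hypothetical; the notes only state that the sections stay distinct.

```python
def build_payload(sections):
    """Join non-empty sections under explicit labels, preserving their order.

    `sections` is an ordered iterable of (label, body) pairs.
    """
    blocks = []
    for label, body in sections:
        body = (body or "").strip()
        if body:  # skip sections with nothing to send
            blocks.append(f"[{label}]\n{body}")
    return "\n\n".join(blocks)
```

For example, `build_payload([("PROMPT", "Fix grammar."), ("CLIPBOARD", ""), ("TRANSCRIPT", "helo world")])` yields a prompt block followed by a transcript block, with the empty clipboard section omitted.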

Prompts

  • Changed the dock prompt menu so Smart Actions and Smart Dictation appear side by side in one wider popup.
  • Kept each prompt surface's selection, overrides, hotkeys, preview toggle, and app-binding controls independent inside the shared menu.

Speech

  • Made recording cancellation stricter so delayed STT output after cancel is dropped instead of reaching history, preview, paste, or the target application.
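
One common way to drop late results after a cancel is a generation token: every start or cancel bumps a counter, and output is accepted only if it carries the current value. The sketch below illustrates that approach; it is an assumption about technique, and the class and method names are hypothetical.

```python
import threading


class RecordingSession:
    """Accept STT output only for the recording that is still active."""

    def __init__(self):
        self._lock = threading.Lock()
        self._generation = 0
        self.delivered = []

    def start(self):
        """Begin a recording and return the token its results must carry."""
        with self._lock:
            self._generation += 1
            return self._generation

    def cancel(self):
        """Invalidate every token handed out so far."""
        with self._lock:
            self._generation += 1

    def deliver(self, token, text):
        """Record `text` if `token` is current; return False for stale output."""
        with self._lock:
            if token != self._generation:
                return False
            self.delivered.append(text)
            return True
```

Checking the token at the single delivery point covers history, preview, and paste at once, since stale text never gets past `deliver`.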

Desktop

  • Kept the dock in its spinner-only startup placeholder until identity settings are hydrated, avoiding a brief fallback READY/red-dot shell during backend-first startup.
  • Improved LLM terminal tracing with one redacted request payload block that preserves section order and line breaks for debugging.

Licensing

  • Added Dodo stablecoin checkout support for lifetime purchases while leaving subscription checkout on the previous subscription-safe payload.

Support & Release

  • Prepared the 1.0.5 app version and landing-page changelog data before the build.
  • Fixed backend packaging specs so the new LLM system-prompt module is included in Lite, Full, and PyTorch frozen builds.
  • Documented release changelog preparation in `AGENTS.md` as part of the standard build procedure.

1.0.4

Features

  • Added per-prompt usage controls so a prompt can appear in Smart Actions, Smart Dictation, or both.
  • Added shared prompt-usage rules across Smart Actions, Smart Dictation, the dock prompt menu, dedicated action hotkeys, and Smart Dictation routing.
  • Added richer audio-device and recording-attempt diagnostics to bug reports without storing raw audio or transcript text.
  • Added session caching for the installed-app catalog while keeping manual refresh available for Smart Dictation app assignment.

Fixes

  • Fixed missing final words when releasing Press & Hold by draining final audio and provider responses more carefully across Kyutai, Deepgram, and Speechmatics.
  • Fixed Speechmatics stream shutdown while keeping Standard and Enhanced mapped to the intended provider models.
  • Fixed microphone selection after device changes by refreshing device metadata before recording starts.
  • Fixed prompt compatibility for existing users by treating missing `prompt_usage` settings as visible in both prompt surfaces.
  • Fixed the Lite + Full release wrapper so release notes are checked before version or upload questions appear.
  • Fixed release prep for older release-range commits with missing HCR metadata without rewriting pushed history.
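
The `prompt_usage` compatibility fix above amounts to a permissive default. A minimal sketch, assuming `prompt_usage` maps a prompt name to a list of surface identifiers (the mapping shape and the identifiers are hypothetical):

```python
ALL_SURFACES = ("smart_actions", "smart_dictation")  # hypothetical identifiers


def surfaces_for(prompt_name, settings):
    """Return the set of prompt surfaces where a prompt should appear.

    A missing `prompt_usage` block, or a missing entry for this prompt,
    means the prompt stays visible on both surfaces.
    """
    usage = settings.get("prompt_usage") or {}
    entry = usage.get(prompt_name)
    if entry is None:
        return set(ALL_SURFACES)
    # Ignore unknown surface names instead of failing on them.
    return set(entry) & set(ALL_SURFACES)
```

Defaulting to "visible everywhere" means settings files written before 1.0.4 behave exactly as they did before the usage controls existed.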

Improvements

  • Improved speech-engine comparison badges, scores, and resource labels so local CPU engines and cloud engines are described more accurately.
  • Improved notification routing by moving full messages into the transcription preview overlay while keeping the dock focused on recording, timing, and compact status.
  • Improved Smart Dictation app assignment so the installed-app picker feels faster and visually steadier.
  • Improved fresh-install AI defaults so new users start on the current recommended fast OpenAI model.
  • Improved prompt wording by renaming user-facing "System Info" copy to "Context" while keeping existing internal files and settings compatible.
  • Improved prompt, configuration, hotkey, STT, Speechmatics, build/release, and style documentation for the current behavior.

1.0.3

Dictation

  • Made Press & Hold paste feedback faster by returning success as soon as text is pasted, while clipboard restoration continues safely in the background.
  • Fixed a Parakeet quick-release case where clearly captured speech could be discarded before the first sentence update, causing a false "No speech captured" result.
  • Kept the empty-speech warning readable while shortening only the successful "Pasted!" feedback.
  • Improved the light reformat prompt so noisy transcripts with mumbling, stutters, repetitions, and self-corrections are cleaned more naturally.

Support & Updates

  • Added recent recording-attempt diagnostics to bug reports, including stream start state, chunk counts, audio-energy summaries, speech-like activity, and transcript length without storing raw audio or transcript text.
  • Added the current recognized audio device list to bug reports, including microphone and WASAPI loopback entries, so support can see what the app detected before or after a failed recording.
  • Made startup update checks persist in the About panel even when Settings is closed, added a visible About-tab update indicator, and added a timeout fallback so update checks do not stay stuck forever.
  • Moved Report an Issue to the top of About, made it more prominent, and enlarged the bug-report modal for longer descriptions.

Appearance

  • Removed the disabled None visualizer option so recording always has an active visual effect.
  • Migrated previously saved `none` visualizer settings to the default ripple visualizer.
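
The visualizer migration above is a small one-way settings rewrite. A sketch, assuming the setting is stored under a `visualizer` key with lowercase string values (both are assumptions):

```python
def migrate_visualizer(settings, default="ripple"):
    """Return a copy of `settings` with a retired `none` visualizer replaced."""
    migrated = dict(settings)
    if str(migrated.get("visualizer", default)).lower() == "none":
        migrated["visualizer"] = default
    return migrated
```

Running the function again on already-migrated settings changes nothing, so it is safe to apply on every startup.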

Prompts

  • Added a bundled TLDR prompt.

Localization

  • Corrected recurring missing French accents in the provider, bug-report, preview, AI-error, licensing, and visualizer copy touched by this release.

Fixed

  • Fixed the Lite + Full release wrapper upload preflight so patch, minor, and major version bumps validate against the complete target version before builds start.
  • Added a release-source guard so upload-enabled builds can fail before building when the top changelog entry does not declare the HCRs for the commits being released.
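
The preflight fix above validates against the complete target version, which means computing the bumped version before any build starts. The sketch below shows one way to do that; the function names and the exact comparison against the top changelog entry are assumptions, not the wrapper's real logic.

```python
def bump_version(current, kind):
    """Return the version that a patch, minor, or major bump would produce."""
    major, minor, patch = (int(part) for part in current.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {kind!r}")


def preflight(current, kind, top_changelog_version):
    """Fail before building if the changelog does not describe the target."""
    target = bump_version(current, kind)
    if top_changelog_version != target:
        raise RuntimeError(
            f"top changelog entry is {top_changelog_version}, expected {target}"
        )
    return target
```

Running this check first means a stale changelog stops the release in seconds rather than after the backend and installer builds have finished.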

Documentation

  • Clarified the HCR workflow so each commit has one current plain-English recap and release changelogs are synthesized from those recaps.
  • Updated audio, STT, configuration, IPC, bug-report, update, usage, and release documentation for the current behavior.

1.0.2

Speech

  • Improved cloud speech-to-text quality and sentence handling for live dictation.
  • Tuned Gladia Solaria realtime transcription for cleaner final text.
  • Simplified Cartesia realtime transcription back to the provider-supported websocket path after live testing showed punctuation workarounds were not reliable enough.
  • Added repeatable cloud STT benchmark reporting so provider quality changes can be compared against real transcripts.

AI

  • Added MiniMax OAuth support to the packaged backend build path.
  • Kept OpenAI OAuth reconnect and provider handling aligned with packaged release checks.

Fixed

  • Fixed backend restart reconnect behavior in development so a manually restarted backend can receive startup STT load permission again.
  • Fixed release packaging guards so all backend modules are included in Lite, Full, and PyTorch frozen builds.
  • Fixed the Lite + Full release wrapper so Windows command labels resolve reliably and upload-only release metadata problems are caught before backend and installer builds start.

Documentation

  • Expanded release, STT provider, and benchmark documentation for the current cloud ASR behavior.
  • Added release-build notes for batch line endings and PyInstaller hidden-import checks.

1.0.1

Added

  • Initial public release
  • Floating Windows dock with global hotkeys for dictation, AI commands, Ask Mode, and prompt-backed actions
  • Hands-free streaming dictation and instant Press & Hold capture modes
  • Windows system-sound capture via WASAPI loopback, alongside microphone device selection
  • Live transcription preview window with adjustable font size and auto-hide timing
  • Persistent recording history for dictation sessions and AI voice commands

Speech

  • Local STT engines: Parakeet V3, MedASR, Faster Whisper, and Kyutai Moshi
  • Cloud STT providers: AssemblyAI, Deepgram, Cartesia, Gradium, Gladia, Speechmatics, and Groq Whisper
  • Hot engine switching without restarting the app
  • Vocabulary correction with editable dictionaries and file-watch reloads
  • Dock waveform and bar visualizers driven by live RMS audio data
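
For context on the RMS data that drives the visualizers, root mean square is the square root of the mean of the squared samples in a chunk. A minimal sketch, with a hypothetical function name and plain Python lists standing in for audio buffers:

```python
import math


def rms(samples):
    """Root-mean-square level of one chunk of audio samples."""
    if not samples:
        return 0.0  # treat an empty chunk as silence
    return math.sqrt(sum(sample * sample for sample in samples) / len(samples))
```

RMS tracks perceived loudness more closely than peak amplitude, which is why it is a common input for level meters and waveform animations.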

AI

  • AI provider support for OpenAI, Anthropic, Google, DeepSeek, OpenRouter, Cerebras, Groq, Alibaba, Zhipu, Perplexity, Mistral, MiniMax, and Moonshot
  • Local AI runtimes via Ollama and LM Studio
  • OpenAI OAuth sign-in for ChatGPT subscriptions, alongside standard API-key auth
  • AI Quick Lists for speed, intelligence, local models, and custom model shortcuts
  • AI Voice Command hold mode for spoken instructions over clipboard text or images
  • Provider-side web search through OpenAI OAuth/Codex and Perplexity
  • Image understanding for clipboard screenshots and photos with multimodal models
  • Image generation via Replicate, with results copied back to the clipboard
  • Rich AI Response Preview window with markdown, code highlighting, tables, math, and Mermaid diagrams

Workflow

  • Smart Actions with active prompt selection, dedicated action hotkeys, linked system-info files, and per-prompt provider and model overrides
  • Smart Dictation on release, so Press & Hold transcripts can be rewritten before paste
  • Per-prompt AI Preview toggles for actions that should open the preview window automatically
  • Prompt and system-info libraries stored as editable text files, with live file watching and restore-default support
  • Bundled prompt pack covering rewriting, translation, email replies, image extraction, link cleanup, YouTube summaries, and vocabulary-term creation

Desktop

  • Dock mode with collapse and expand behavior, always-on-top control, top or bottom placement, and opacity settings
  • Theme customization with dark and light mode, curated backgrounds, accent colors, border-radius styles, and visualizer presets
  • Clipboard-preserving burst paste flow for dictation and AI output
  • Automatic updater delivery through Electron updater, Cloudflare R2, and the MachinesFluent Worker
  • GPU unload and reload controls for supported local models