Dictation
Turn a photo of a textbook page into a spoken school dictation — free for Windows
Free · Windows 10/11 · Python / PySide6

🌐 Try in your browser

No installation — open it on your phone, snap a photo of the textbook page, and the app reads it aloud as a dictation. Works on iOS, Android and desktop. Uses Microsoft Edge neural voices online.

Download for Windows

Full: bundles Tesseract OCR with Russian, English and Thai data, Piper TTS and two Russian neural voices. Russian and English work offline; Thai speech uses the online Edge engine.
Lite: smaller package. Tesseract must be installed system-wide, and Piper voices download on first use.
Signed with a code-signing certificate.

About

Dictation is a free desktop application for Windows built around a single use case: a parent or teacher photographs a page from a textbook, and the app reads it aloud as a school dictation. The text is broken into sentences, then into shorter parts, and finally into individual words, with configurable pauses at every level, so a child can write at their own pace.

It works in Russian, English and Thai. Thai is written without spaces between words, so the app segments it into real words (via pythainlp) and dictates them one by one — and automatically stretches the pauses, since Thai handwriting takes longer.
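A minimal sketch of the Thai handling described above, assuming pythainlp's `word_tokenize` as the segmenter. The function names and the 1.5 stretch factor are illustrative assumptions; the app's actual code and pause multiplier are not documented here.

```python
from typing import Callable, List, Optional


def segment_thai(
    text: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Split spaceless Thai text into words, dropping whitespace-only tokens."""
    if tokenizer is None:
        # Imported lazily so the sketch loads even when pythainlp is absent.
        from pythainlp.tokenize import word_tokenize
        tokenizer = word_tokenize
    return [token for token in tokenizer(text) if token.strip()]


def stretched_pause(base_seconds: float, is_thai: bool, factor: float = 1.5) -> float:
    """Lengthen the pause for Thai text; the factor is an assumed value."""
    return base_seconds * factor if is_thai else base_seconds
```

With pythainlp installed, `segment_thai(thai_text)` returns the word list to dictate one by one, and `stretched_pause(slider_value, is_thai=True)` gives the longer silence between them.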

The Full edition bundles Tesseract OCR (Russian, English, Thai) and Piper neural text-to-speech with two Russian voices, so Russian and English work with no internet. Thai speech uses Microsoft's online Edge voices.

Features

  • Add pages by file, paste from clipboard, drag-and-drop or live webcam capture
  • Multi-page support — thumbnail strip at the bottom, per-page crop area
  • Rubber-band crop selection — focus OCR on the relevant text block
  • OCR via bundled Tesseract 5 — Russian, English and Thai, with automatic script detection
  • Thai word segmentation (pythainlp) — spaceless Thai is split into real words for dictation
  • Spell-check with red underline and right-click fixes (Russian + English, Hunspell)
  • Image preprocessing (deskew, threshold) for cleaner OCR on photos
  • Recognized text appears in an editable panel — fix typos before dictating
  • Cascade dictation: whole sentence → parts → individual words (short sentences repeated whole)
  • Adjustable pause slider up to 10 seconds — auto-extended for Thai handwriting
  • Streaming synthesis — playback starts on the first fragment, no waiting
  • Change voice or speech rate live, without interrupting playback
  • Three TTS engines: Edge TTS (Microsoft neural voices incl. Thai, online), Piper (offline neural), Windows SAPI
  • Trilingual UI — Russian, English and Thai, switchable on the fly
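
As an illustration of script detection for the three supported languages, the sketch below guesses the dominant script of already recognized text from Unicode block ranges, for example to pick a matching voice. This is an assumed approach for text in the editable panel; how the app detects the script of a page image before OCR is not described here, and `detect_script` is a hypothetical name.

```python
from collections import Counter

# Inclusive Unicode block ranges for the non-Latin scripts.
SCRIPT_RANGES = {
    "tha": (0x0E00, 0x0E7F),  # Thai block
    "rus": (0x0400, 0x04FF),  # Cyrillic block
}


def detect_script(text: str, default: str = "eng") -> str:
    """Return 'rus', 'eng' or 'tha' for the script with the most letters."""
    counts = Counter()
    for ch in text:
        if ch.isascii() and ch.isalpha():
            counts["eng"] += 1
            continue
        code_point = ord(ch)
        for lang, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[lang] += 1
                break
    if not counts:
        # No letters at all (digits, punctuation): fall back to the default.
        return default
    return counts.most_common(1)[0][0]
```

The language codes match Tesseract's traineddata names (`rus`, `eng`, `tha`), so the result could also be passed on as an OCR language hint.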

How it works

Take a photo of the textbook page with your phone and paste it into the app (or open it from a file, or use the built-in webcam capture). Drag a rectangle over the text you want to dictate. The crop is remembered per page, so a single dictation can span several pages of a textbook.
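
One way to picture the per-page crop is a small record per page that stores the selected rectangle. The `Page` class below is a hypothetical sketch, not the app's data model; it normalises a rubber-band drag made in any direction and clamps it to the image bounds.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # left, top, right, bottom in pixels


@dataclass
class Page:
    image_path: str
    width: int
    height: int
    crop: Optional[Rect] = None  # None means "use the whole page"

    def set_crop(self, x0: int, y0: int, x1: int, y1: int) -> None:
        """Store a drag selection, normalised and clamped to the image."""
        left, right = sorted((x0, x1))
        top, bottom = sorted((y0, y1))
        left, right = max(0, left), min(self.width, right)
        top, bottom = max(0, top), min(self.height, bottom)
        self.crop = (left, top, right, bottom)

    def ocr_region(self) -> Rect:
        """Region to send to OCR: the crop if set, otherwise the full page."""
        if self.crop is None:
            return (0, 0, self.width, self.height)
        return self.crop
```

A multi-page dictation is then a list of `Page` objects, each contributing the text from its own `ocr_region()`.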

Press Recognize and the bundled Tesseract OCR converts the image to text in the right-hand panel. You can edit anything that came out wrong before starting the dictation.
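
For readers using the Lite edition with a system-wide Tesseract, the recognition step can be approximated from the command line. The sketch below assumes the `tesseract` executable is on `PATH` with the `rus`, `eng` and `tha` language data installed; the helper names are hypothetical, and the app's own integration with Tesseract may differ.

```python
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    langs: Sequence[str] = ("rus", "eng", "tha"),
) -> List[str]:
    """Build a Tesseract CLI call that prints recognized text to stdout."""
    # "stdout" as the output base prints the text instead of writing a file;
    # languages are joined with "+" for multi-language recognition.
    return ["tesseract", image_path, "stdout", "-l", "+".join(langs)]


def recognize(image_path: str, langs: Sequence[str] = ("rus", "eng", "tha")) -> str:
    """Run Tesseract and return the recognized text (raises on failure)."""
    result = subprocess.run(
        build_tesseract_command(image_path, langs),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `recognize("page.png")` returns plain text that can then be corrected by hand, as in the app's editable panel.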

Press Start (or F5) and the cascade begins. For every sentence, the app speaks the full sentence and pauses, then speaks each comma-delimited part with a shorter pause, then speaks each word individually with the shortest pause. Use the slider to make the pauses longer for younger children and shorter for older ones. The slider changes only the silence between fragments, not the speech rate.
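
The cascade can be sketched as a generator that yields each fragment together with the pause that follows it. The pause ratios (0.6 and 0.3 of the slider value), the three-word threshold for "short" sentences, and the simple punctuation-based sentence splitter are all assumptions made for illustration; the app's real values and splitting rules are not documented here.

```python
import re
from typing import Iterator, List, Tuple

Fragment = Tuple[str, float]  # (text to speak, pause in seconds afterwards)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, ? or an ellipsis followed by space."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [part for part in parts if part]


def cascade(
    text: str,
    base_pause: float = 4.0,        # slider value, in seconds
    short_sentence_words: int = 3,  # assumed threshold for "short"
) -> Iterator[Fragment]:
    """Yield sentence, then comma-delimited parts, then single words."""
    for sentence in split_sentences(text):
        words = sentence.split()
        yield sentence, base_pause
        if len(words) <= short_sentence_words:
            # Short sentence: repeat it whole instead of breaking it down.
            yield sentence, base_pause
            continue
        parts = [p.strip() for p in sentence.split(",") if p.strip()]
        if len(parts) > 1:
            for part in parts:
                yield part, base_pause * 0.6
        for word in words:
            cleaned = word.strip(",.;:!?…")
            if cleaned:
                yield cleaned, base_pause * 0.3
```

A player loop would speak each fragment with the chosen TTS engine and then sleep for the paired pause, so changing `base_pause` rescales every silence without touching the speech rate.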

System Requirements

| Requirement | Full edition | Lite edition |
|---|---|---|
| Operating system | Windows 10 or Windows 11 (64-bit) | Windows 10 or Windows 11 (64-bit) |
| Tesseract OCR (Russian, English, Thai) | Bundled | Must be installed system-wide |
| Piper TTS + Russian voices | Bundled (irina + ruslan) | Downloads on first use (~60 MB / voice) |
| Thai text-to-speech | Online Edge voices (Premwadee / Niwat); needs an internet connection | Online Edge voices (Premwadee / Niwat); needs an internet connection |
| Internet connection | Only for Edge TTS (incl. all Thai speech) | For Edge TTS, Thai speech, and first voice download |
| Microphone / webcam | Webcam optional, for capturing textbook pages without a phone | Webcam optional, for capturing textbook pages without a phone |