Whisper Direct

WhisperDirect is a high-accuracy speech-to-text and summarization app that works with your own API key.

by koji ozono

App Details

Version: 1.1.20
Rating: N/A
Size: 33 MB
Genre: Productivity, Utilities
Last updated: October 17, 2025
Release date: July 16, 2025

App Store Description

WhisperDirect is a high-accuracy speech-to-text and summarization app that works with your own API key.
No subscription is required: you pay only OpenAI’s usage fees for what you actually use, which can make it more cost-effective than subscription-based apps.

Pricing & Trial
• Free trial: 5 sessions included
• After the trial: one-time in-app purchase unlocks unlimited use of current features
• API usage billed directly by OpenAI (the app does not charge for API usage)

Cost Guide
• With $5, you can transcribe about 14 hours of audio
• Whisper API ≈ $0.006 per minute (≈ $0.36 per hour)
• OpenAI API pricing → https://openai.com/ja-JP/api/pricing/
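
The figures in the cost guide follow from simple arithmetic. Below is a minimal sketch that reproduces them, assuming the $0.006 per minute rate quoted above still applies. The function names are hypothetical and are not part of the app or of any OpenAI library.

```python
# Rate quoted in this listing; OpenAI may change it, so treat it as an assumption.
WHISPER_USD_PER_MINUTE = 0.006


def transcription_cost_usd(audio_minutes: float,
                           rate_per_minute: float = WHISPER_USD_PER_MINUTE) -> float:
    """Estimated API cost in USD for transcribing the given minutes of audio."""
    return audio_minutes * rate_per_minute


def hours_for_budget(budget_usd: float,
                     rate_per_minute: float = WHISPER_USD_PER_MINUTE) -> float:
    """Hours of audio that a budget covers at the given per-minute rate."""
    return budget_usd / (rate_per_minute * 60)


if __name__ == "__main__":
    print(f"One hour of audio: ${transcription_cost_usd(60):.2f}")
    print(f"Hours covered by $5: {hours_for_budget(5):.1f}")
```

At the quoted rate, one hour costs about $0.36 and $5 covers roughly 13.9 hours, which matches the "about 14 hours" estimate.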

Models for summaries and meeting minutes
Choose from compact, low-cost models:
• GPT-4.1-nano
• GPT-4.1-mini
• GPT-5-nano
• GPT-5-mini
Even long texts (1,000–2,000 words) can usually be processed for just a few cents per run.

Main Features
• Record with the microphone button and instantly convert to text
• Import audio files, including directly from the share sheet
• Import video files (audio extracted and compressed automatically)
• Playback-synced highlighting of transcript segments
• Insert timeline markers (configurable in 5-second steps)
• Generate summaries and meeting minutes (prompts editable in Settings)
• OCR transcription from images (supports multiple images, all processed locally with no extra API cost)
• Export audio, text, summaries, minutes, or subtitles (VTT / SRT)
• Automatically post transcripts/summaries/minutes to Slack
• Estimate costs in Settings (based on audio length and character count)
• Other customization options (LLM model, timeline interval, prompts, etc.)
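
To illustrate the subtitle export mentioned in the list above, here is a rough sketch of how timed transcript segments can be written out as SRT or VTT. It is not the app's actual implementation. The `Segment` type and all function names are hypothetical, and the sketch assumes each segment carries start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment with times in seconds."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, decimal_sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        timing = (f"{format_timestamp(seg.start, '.')} --> "
                  f"{format_timestamp(seg.end, '.')}")
        lines.extend([timing, seg.text.strip(), ""])
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello"), Segment(2.5, 5.0, "Welcome to the meeting")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

The two formats differ mainly in the header line, the cue numbering, and the decimal separator in timestamps, so one timestamp helper can serve both.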

Supported formats
Audio: mp3, m4a, aac, wav, flac, ogg, opus, wma, amr, mpga, webm, aiff, caf
Video: mp4, mov, m4v, webm, mkv, avi, mpeg, mpg

Notes
• An API key (such as an OpenAI API key) is required
• Pricing and available models may change according to OpenAI’s offerings
