Local LLM: MITHRIL
App Store Description
Run quantized large language models directly on your iPhone. No cloud, no internet required.
Access state-of-the-art quantized AI models optimized for mobile hardware. Download GGUF-format models that compress billion-parameter networks into mobile-friendly sizes while maintaining performance.
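As a back-of-the-envelope illustration (not taken from the app), a quantized model's file size can be estimated as parameter count times bits per weight; the helper function and the ~10% metadata overhead below are assumptions for the sketch:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float,
                 overhead: float = 1.1) -> float:
    """Rough GGUF file-size estimate: parameters x bits per weight,
    plus ~10% for embeddings, norms, and metadata (an assumed factor)."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

# A 3B-parameter model at Q4 (~4.5 effective bits/weight) vs. full FP16
q4 = gguf_size_gb(3e9, 4.5)
fp16 = gguf_size_gb(3e9, 16)
print(f"Q4: {q4:.1f} GB  FP16: {fp16:.1f} GB")
```

This is why a 4-bit quantization shrinks a multi-gigabyte FP16 checkpoint to roughly a quarter of its size, bringing billion-parameter models within phone RAM budgets.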
COMPLETE MODEL SUITE
• Llama 3.2 1B/3B (Meta) - Q4/Q8 quantization
• Gemma 3 270M/2B/9B (Google) - IQ4_NL optimization
• Qwen 2.5 0.5B-7B (Alibaba) - Multiple quantization levels
• LLaVA 1.5/1.6 (Vision) - Multimodal image understanding
• Direct integration with Hugging Face model repository
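For illustration, GGUF files on the Hugging Face Hub are served from its `resolve` endpoint; the repository and file names below are examples of community quantizations, not necessarily the ones the app ships:

```python
def hf_gguf_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a file hosted on the
    Hugging Face Hub (the Hub's `resolve/<revision>` URL pattern)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Example community quantization repo (illustrative, not the app's source)
url = hf_gguf_url("bartowski/Llama-3.2-1B-Instruct-GGUF",
                  "Llama-3.2-1B-Instruct-Q4_K_M.gguf")
print(url)
```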
TECHNICAL FEATURES
• GGML/llama.cpp inference engine
• Metal GPU acceleration on Apple Silicon
• Dynamic context window management (2K-8K tokens)
• Retrieval-Augmented Generation (RAG) with embeddings
• Real-time streaming with token/second metrics
• SQLite conversation storage with vector search
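A minimal sketch of how RAG over a SQLite store can work: embeddings saved as blobs and scored by brute-force cosine similarity at query time. The tiny hand-made vectors stand in for a real embedding model, and the schema is an assumption, not the app's actual one:

```python
import sqlite3, array, math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")

def add_chunk(text, emb):
    # Store the embedding as a packed float32 blob alongside the text.
    db.execute("INSERT INTO chunks (text, emb) VALUES (?, ?)",
               (text, array.array("f", emb).tobytes()))

def retrieve(query_emb, k=1):
    # Brute-force scan: unpack each blob and rank by cosine similarity.
    rows = db.execute("SELECT text, emb FROM chunks").fetchall()
    scored = [(cosine(query_emb, array.array("f", emb)), text)
              for text, emb in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings" stand in for a real embedding model.
add_chunk("GGUF stores quantized weights.", [1.0, 0.0, 0.0])
add_chunk("Metal accelerates inference on Apple GPUs.", [0.0, 1.0, 0.0])
print(retrieve([0.9, 0.1, 0.0]))
```

The retrieved chunk would then be prepended to the prompt before local inference, which is the essence of retrieval-augmented generation.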
SYSTEM REQUIREMENTS
Models run efficiently when the model's file size is at or below available RAM. A minimum of 6GB RAM is recommended for larger models; iPhone 15 Pro/Pro Max performs best. iOS 26 is required for the Apple foundation model.
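The "file size ≤ available RAM" rule of thumb can be sketched as a simple check; the per-device RAM figures and the fixed 2 GB reserve for iOS and the app itself are assumptions for illustration:

```python
DEVICE_RAM_GB = {           # approximate physical RAM (assumed figures)
    "iPhone 15": 6,
    "iPhone 15 Pro": 8,
}

def fits(model_file_gb: float, device: str, os_reserve_gb: float = 2.0) -> bool:
    """Apply the 'file size <= available RAM' rule of thumb, leaving
    a fixed reserve for iOS and the app itself (an assumed margin)."""
    return model_file_gb <= DEVICE_RAM_GB[device] - os_reserve_gb

print(fits(1.9, "iPhone 15"))      # a ~2 GB Q4 model
print(fits(6.6, "iPhone 15 Pro"))  # a full FP16 3B model
```

Under these assumptions a ~2 GB Q4 model fits comfortably on a base iPhone 15, while an unquantized FP16 checkpoint of the same network would not fit even on a Pro.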
Zero telemetry. Zero data transmission. Pure local AI computing.