Local LLM: MITHRIL
App Store Description
Run quantized large language models directly on your iPhone. No cloud, no internet required.
Access state-of-the-art quantized AI models optimized for mobile hardware. Download GGUF-format models that compress billion-parameter networks into mobile-friendly sizes while maintaining performance.
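As a back-of-the-envelope illustration (not taken from the app), a quantized model's file size can be estimated as parameter count times bits per weight; the helper function and the ~10% metadata overhead below are assumptions for the sketch:

```python
def gguf_size_gb(n_params: float, bits_per_weight: float,
                 overhead: float = 1.1) -> float:
    """Rough GGUF file-size estimate: parameters x bits per weight,
    plus ~10% for embeddings, norms, and metadata (an assumed factor)."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

# A 3B-parameter model at Q4 (~4.5 effective bits/weight) vs. full FP16
q4 = gguf_size_gb(3e9, 4.5)
fp16 = gguf_size_gb(3e9, 16)
print(f"Q4: {q4:.1f} GB  FP16: {fp16:.1f} GB")
```

This is why a 4-bit quantization shrinks a multi-gigabyte FP16 checkpoint to roughly a quarter of its size, bringing billion-parameter models within phone RAM budgets.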
COMPLETE MODEL SUITE
• Llama 3.2 1B/3B (Meta) - Q4/Q8 quantization
• Gemma 3 270M/2B/9B (Google) - IQ4_NL optimization
• Qwen 2.5 0.5B-7B (Alibaba) - Multiple quantization levels
• LLaVA 1.5/1.6 (Vision) - Multimodal image understanding
• Direct integration with Hugging Face model repository
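For illustration, GGUF files on the Hugging Face Hub are served from its `resolve` endpoint; the repository and file names below are examples of community quantizations, not necessarily the ones the app ships:

```python
def hf_gguf_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct download URL for a file hosted on the
    Hugging Face Hub (the Hub's `resolve/<revision>` URL pattern)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Example community quantization repo (illustrative, not the app's source)
url = hf_gguf_url("bartowski/Llama-3.2-1B-Instruct-GGUF",
                  "Llama-3.2-1B-Instruct-Q4_K_M.gguf")
print(url)
```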
TECHNICAL FEATURES
• GGML/llama.cpp inference engine
• Metal GPU acceleration on Apple Silicon
• Dynamic context window management (2K-8K tokens)
• Retrieval-Augmented Generation (RAG) with embeddings
• Real-time streaming with token/second metrics
• SQLite conversation storage with vector search
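A minimal sketch of how RAG over a SQLite store can work: embeddings saved as blobs and scored by brute-force cosine similarity at query time. The tiny hand-made vectors stand in for a real embedding model, and the schema is an assumption, not the app's actual one:

```python
import sqlite3, array, math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, emb BLOB)")

def add_chunk(text, emb):
    # Store the embedding as a packed float32 blob alongside the text.
    db.execute("INSERT INTO chunks (text, emb) VALUES (?, ?)",
               (text, array.array("f", emb).tobytes()))

def retrieve(query_emb, k=1):
    # Brute-force scan: unpack each blob and rank by cosine similarity.
    rows = db.execute("SELECT text, emb FROM chunks").fetchall()
    scored = [(cosine(query_emb, array.array("f", emb)), text)
              for text, emb in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-dimensional "embeddings" stand in for a real embedding model.
add_chunk("GGUF stores quantized weights.", [1.0, 0.0, 0.0])
add_chunk("Metal accelerates inference on Apple GPUs.", [0.0, 1.0, 0.0])
print(retrieve([0.9, 0.1, 0.0]))
```

The retrieved chunk would then be prepended to the prompt before local inference, which is the essence of retrieval-augmented generation.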
SYSTEM REQUIREMENTS
Models run efficiently when the model's file size is at or below available RAM. A minimum of 6GB RAM is recommended for larger models; iPhone 15 Pro/Pro Max performs best. iOS 26 is required for the Apple foundation model.
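The "file size ≤ available RAM" rule of thumb can be sketched as a simple check; the per-device RAM figures and the fixed 2 GB reserve for iOS and the app itself are assumptions for illustration:

```python
DEVICE_RAM_GB = {           # approximate physical RAM (assumed figures)
    "iPhone 15": 6,
    "iPhone 15 Pro": 8,
}

def fits(model_file_gb: float, device: str, os_reserve_gb: float = 2.0) -> bool:
    """Apply the 'file size <= available RAM' rule of thumb, leaving
    a fixed reserve for iOS and the app itself (an assumed margin)."""
    return model_file_gb <= DEVICE_RAM_GB[device] - os_reserve_gb

print(fits(1.9, "iPhone 15"))      # a ~2 GB Q4 model
print(fits(6.6, "iPhone 15 Pro"))  # a full FP16 3B model
```

Under these assumptions a ~2 GB Q4 model fits comfortably on a base iPhone 15, while an unquantized FP16 checkpoint of the same network would not fit even on a Pro.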
Zero telemetry. Zero data transmission. Pure local AI computing.