ANEMLL

Run LLMs on the Apple Neural Engine.
An open-source library and pipeline taking models from HuggingFace to on-device inference.

(pronounced like "animal")

Anemll v0.3.5 Beta

The flagship library for porting Large Language Models to the Apple Neural Engine (ANE). Convert models directly from HuggingFace to CoreML, optimized for ANE tensor processing. Includes Swift and Python inference, iOS/macOS/visionOS sample apps, and a full conversion pipeline — all targeting low-power, on-device, fully private AI.

Supported models: Gemma 3 (270M–4B) · LLaMA 3.1/3.2 (1B–8B) · Qwen 3 (0.6B–1.7B) · Qwen 2.5 · DeepSeek R1 (distill) · DeepHermes (distill)
ANE-First

Optimized for Apple Neural Engine tensor processing. Specialized model splitting for iOS (1GB) and macOS (2GB) constraints.
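
The splitting idea can be sketched in a few lines. This is an illustrative greedy packer, not ANEMLL's actual splitter: group consecutive layers into chunks whose summed weight size stays under a per-platform limit (roughly 1 GB on iOS, 2 GB on macOS); function and variable names are hypothetical.

```python
# Hypothetical sketch: greedily pack transformer layers into chunks whose
# total weight size fits a per-platform budget (~1 GB iOS, ~2 GB macOS).

def split_into_chunks(layer_sizes_bytes, limit_bytes):
    """Group consecutive layer sizes into chunks that each fit the limit."""
    chunks, current, current_size = [], [], 0
    for i, size in enumerate(layer_sizes_bytes):
        if size > limit_bytes:
            raise ValueError(f"layer {i} alone exceeds the chunk limit")
        if current_size + size > limit_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

GB = 1 << 30
# 32 layers of ~400 MB each: 2 fit per 1 GB chunk, 5 per 2 GB chunk.
layers = [400 * (1 << 20)] * 32
print(len(split_into_chunks(layers, 1 * GB)))  # 16 chunks under a 1 GB limit
print(len(split_into_chunks(layers, 2 * GB)))  # 7 chunks under a 2 GB limit
```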

HF → CoreML

Single-shot conversion from HuggingFace weights to CoreML. Auto-download, FP16, LUT quantization, monolithic and chunked modes.
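
LUT quantization replaces each FP weight with a small index into a table of 2^bits representative values. The toy sketch below uses a uniform table for clarity; it illustrates the general technique, not ANEMLL's exact implementation (real pipelines typically fit the table to the weight distribution).

```python
# Toy LUT quantization sketch (uniform table, for illustration only).

def lut_quantize(weights, bits=4):
    lo, hi = min(weights), max(weights)
    n = 2 ** bits
    step = (hi - lo) / (n - 1)
    table = [lo + i * step for i in range(n)]          # 2**bits entries
    indices = [min(n - 1, round((w - lo) / step)) for w in weights]
    return table, indices

def lut_dequantize(table, indices):
    return [table[i] for i in indices]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
table, idx = lut_quantize(weights, bits=4)
restored = lut_dequantize(table, idx)
# Each restored value is within half a table step of the original.
```

With 4-bit indices, storage per weight drops from 16 bits (FP16) to 4 bits plus a shared 16-entry table.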

Swift + Python

Reference inference in both Swift CLI and Python. IOSurface-backed buffers, serial prediction, ring buffer patterns for ANE stability.
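
The ring-buffer idea behind stable inference is to keep fixed-capacity buffers that are overwritten in place once full, so buffer shapes and allocations never change between predictions. A minimal sketch, with hypothetical names:

```python
# Illustrative ring buffer: fixed allocation reused every step, oldest
# entries overwritten once capacity is reached. Names are hypothetical.

class RingBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity  # allocated once, never resized
        self.write_pos = 0
        self.count = 0

    def push(self, item):
        self.data[self.write_pos] = item
        self.write_pos = (self.write_pos + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def window(self):
        """Return items oldest-to-newest without reallocating the store."""
        start = (self.write_pos - self.count) % self.capacity
        return [self.data[(start + i) % self.capacity]
                for i in range(self.count)]

rb = RingBuffer(4)
for t in range(6):
    rb.push(t)
print(rb.window())  # [2, 3, 4, 5]: the two oldest entries were overwritten
```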

ANEMLL Chat

Redesigned iOS/macOS/visionOS app with voice input, AirDrop sharing, Markdown rendering, and thinking mode. On TestFlight now.

Companion Tool

anemll-profile

ANE CostModel profiler for CoreML models. Analyze the compute plan without Xcode — see which ops land on ANE vs CPU vs GPU, find bottlenecks, and verify hardware compatibility. Per-op runtime breakdown, GFLOP/s, bandwidth, and CPU/GPU fallback reasons. Ideal for AI agents and automated pipelines — CLI-first, no GUI required.
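
The kind of summary such a profiler produces can be sketched as follows. The per-op record format here is invented for illustration and is not anemll-profile's actual output schema: tally which device each op dispatches to and surface the fallbacks.

```python
# Hypothetical sketch: summarize a compute plan given per-op records of
# (op name, dispatched device, estimated cost). Record format is invented.
from collections import Counter

def summarize_plan(ops):
    devices = Counter(device for _, device, _ in ops)
    fallbacks = [(name, device) for name, device, _ in ops
                 if device != "ANE"]
    cost_by_device = Counter()
    for _, device, cost in ops:
        cost_by_device[device] += cost
    return devices, fallbacks, cost_by_device

plan = [
    ("matmul_0", "ANE", 8.0),
    ("layernorm_0", "ANE", 0.5),
    ("topk_0", "CPU", 0.2),   # ops unsupported on ANE fall back to CPU/GPU
    ("matmul_1", "ANE", 8.0),
]
devices, fallbacks, cost = summarize_plan(plan)
print(devices)    # Counter({'ANE': 3, 'CPU': 1})
print(fallbacks)  # [('topk_0', 'CPU')]
```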

$ brew install anemll/tap/anemll-profile
$ brew upgrade anemll/tap/anemll-profile
ANE Hardware

anemll-bench

Apple Neural Engine bandwidth and performance benchmarks. Measure real ANE vs GPU vs CPU throughput across Apple Silicon devices — understand what the hardware can actually deliver.
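
The principle behind a bandwidth benchmark is simple: move a large buffer repeatedly, time the fastest pass, and divide bytes by seconds. The toy probe below only exercises CPU memory bandwidth via in-memory copies; it is not anemll-bench's method, which measures real ANE/GPU/CPU throughput.

```python
# Toy CPU bandwidth probe: time large in-memory copies, report GB/s.
import time

def copy_bandwidth_gbs(size_bytes=256 * 1024 * 1024, repeats=5):
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)          # one full read + write of the buffer
        best = min(best, time.perf_counter() - start)
        del dst
    return (size_bytes / best) / 1e9

print(f"{copy_bandwidth_gbs():.1f} GB/s")
```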


TestFlight Apps

ANEMLL Chat

iOS · macOS · visionOS

On-device LLM chat powered by Apple Neural Engine. Voice input, AirDrop model sharing, Markdown rendering, thinking mode. Runs models locally with full privacy. Full source included in the Anemll project.

Join TestFlight

ANEMLL Claw

iOS · macOS · tvOS

AI-powered coding assistant for iOS. Intelligent codebase navigation, code generation, and development tools — all on your iPhone.

Flash-MoE Ecosystem

Run massive Mixture-of-Experts models on Apple devices — Mac, iPhone, and more. A family of projects that make the impossible possible.

The core Flash-MoE research repo — testing different MoE algorithms, memory management strategies, and expert routing on Apple Silicon.


Flash-MoE sidecar slot-bank runtime for large GGUF MoE models on Apple Silicon — llama.cpp fork.


Flash-MoE iOS — first 400B model running on iPhone. Based on @Alexintosh's work. Adds macOS compat, iOS memory entitlements, fanout I/O, and pread chunking.


Auto-research tuned MLX fork optimized for Flash-MoE experiments — fast MoE inference and algorithm prototyping on Apple's ML stack.


More Projects

Experiments with MLX and RDMA — pushing the boundaries of distributed ML on Apple hardware.


AI-powered iOS coding assistant — intelligent codebase navigation and development tools for mobile.


Visual development agent for macOS — fast UI iterations via iPhone Mirroring, screen capture, and automated interaction loops.


Thunderbolt block floating point experiments — high-bandwidth ML interconnects.


Get Involved