Run LLMs on the Apple Neural Engine.
Open-source library and pipeline taking HuggingFace models to on-device inference.
(pronounced like "animal")
The flagship library for porting Large Language Models to the Apple Neural Engine (ANE). Convert models directly from HuggingFace to CoreML, optimized for ANE tensor processing. Includes Swift and Python inference, iOS/macOS/visionOS sample apps, and a full conversion pipeline — all targeting low-power, on-device, fully private AI.
Optimized for Apple Neural Engine tensor processing. Specialized model splitting for iOS (1GB) and macOS (2GB) constraints.
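The 1GB/2GB figures refer to per-file size limits on compiled models, so large networks must be split into chunks that each fit the budget. A minimal sketch of the greedy chunking idea (the function name and strategy here are illustrative, not Anemll's actual splitter):

```python
def split_into_chunks(layer_sizes, budget_bytes):
    """Greedily group consecutive layers into chunks that each stay
    under budget_bytes. Illustrative only -- real splitting must also
    respect graph boundaries and I/O shapes between chunks."""
    chunks, current, current_size = [], [], 0
    for i, size in enumerate(layer_sizes):
        if size > budget_bytes:
            raise ValueError(f"layer {i} ({size} bytes) exceeds the budget alone")
        if current_size + size > budget_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

For example, seven 300 MB layers under a 1 GB iOS budget pack three layers per chunk, leaving a final chunk of one.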
Single-shot conversion from HuggingFace weights to CoreML. Auto-download, FP16, LUT quantization, monolithic and chunked modes.
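LUT (lookup-table) quantization replaces each weight with a small index into a shared table of representative values. The sketch below uses uniformly spaced levels for simplicity; production palettization (e.g. coremltools' k-means-based `palettize_weights`) chooses the table to minimize error, and this function is illustrative rather than Anemll's implementation:

```python
def lut_quantize(weights, nbits=4):
    """Map each weight to the nearest of 2**nbits uniformly spaced levels.
    Returns (indices, lut); dequantize as lut[i] for each index i.
    Assumes weights are not all identical (step would be zero)."""
    levels = 1 << nbits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1)
    lut = [lo + step * k for k in range(levels)]
    indices = [min(round((w - lo) / step), levels - 1) for w in weights]
    return indices, lut
```

With 4-bit indices the per-weight storage drops from 16 bits (FP16) to 4 bits plus one shared 16-entry table.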
Reference inference in both Swift CLI and Python. IOSurface-backed buffers, serial prediction, ring buffer patterns for ANE stability.
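The ring-buffer pattern keeps a fixed-size allocation and overwrites the oldest entries in place, which avoids reallocating buffers mid-generation. A minimal sketch of the idea (in Anemll the real buffers are IOSurface-backed MLMultiArrays driven from Swift; this pure-Python class only illustrates the indexing):

```python
class RingKVCache:
    """Fixed-capacity cache: once full, new entries overwrite the oldest.
    A fixed allocation is friendlier to ANE buffer management than
    growing a cache every token."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = [None] * capacity
        self.count = 0  # total entries ever written

    def append(self, kv):
        self.buf[self.count % self.capacity] = kv
        self.count += 1

    def window(self):
        """Return cached entries in oldest-to-newest order."""
        if self.count <= self.capacity:
            return self.buf[:self.count]
        start = self.count % self.capacity
        return self.buf[start:] + self.buf[:start]
```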
Redesigned iOS/macOS/visionOS app with voice input, AirDrop sharing, Markdown rendering, and thinking mode. On TestFlight now.
ANE CostModel profiler for CoreML models. Analyze the compute plan without Xcode — see which ops land on ANE vs CPU vs GPU, find bottlenecks, and verify hardware compatibility. Per-op runtime breakdown, GFLOP/s, bandwidth, and CPU/GPU fallback reasons. Ideal for AI agents and automated pipelines — CLI-first, no GUI required.
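At its core, a compute-plan report boils down to tallying which device each operation landed on and why some fell back. The sketch below works on a hypothetical list-of-dicts schema (the field names are assumptions for illustration, not the profiler's actual output format):

```python
from collections import Counter

def summarize_compute_plan(ops):
    """Tally where each op is scheduled and collect fallback reasons.
    `ops`: list of dicts like {"name": str, "device": "ANE"|"GPU"|"CPU",
    "fallback_reason": optional str} -- a stand-in for real
    compute-plan data, not the profiler's actual schema."""
    by_device = Counter(op["device"] for op in ops)
    fallbacks = {op["name"]: op["fallback_reason"]
                 for op in ops if op.get("fallback_reason")}
    return by_device, fallbacks
```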
Apple Neural Engine bandwidth and performance benchmarks. Measure real ANE vs GPU vs CPU throughput across Apple Silicon devices — understand what the hardware can actually deliver.
On the App Store (Beta)
On-device LLM chat powered by Apple Neural Engine. Voice input, AirDrop model sharing, Markdown rendering, thinking mode. Runs models locally with full privacy. Full source included in the Anemll project.
Join TestFlight
AI-powered coding assistant for iOS. Intelligent codebase navigation, code generation, and development tools — all on your iPhone.
Trending Now
Flash-MoE sidecar slot-bank runtime for large GGUF MoE models on Apple Silicon. A llama.cpp fork enabling massive Mixture-of-Experts models to run efficiently on devices with limited memory — experts stream from SSD while dense weights stay resident.
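The slot-bank idea is that only a bounded number of expert weight sets are resident at once; a routed-to expert that isn't loaded is streamed in, evicting a stale one. A minimal LRU sketch under assumed names (`ExpertSlotBank` and the eviction policy are illustrative, not Flash-MoE's actual runtime):

```python
from collections import OrderedDict

class ExpertSlotBank:
    """Keep at most `slots` experts in memory; evict the least
    recently used. load_fn(expert_id) stands in for reading expert
    weights from SSD."""
    def __init__(self, slots, load_fn):
        self.slots = slots
        self.load_fn = load_fn
        self.cache = OrderedDict()  # expert_id -> weights, LRU order

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        if len(self.cache) >= self.slots:
            self.cache.popitem(last=False)  # evict LRU expert
        weights = self.load_fn(expert_id)
        self.cache[expert_id] = weights
        return weights
```

Because MoE routing is sparse, a small slot bank can serve most tokens from cache while only occasionally paying SSD latency for a cold expert.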
Run massive Mixture-of-Experts models on Apple devices — Mac, iPhone, and more. A family of projects that make the impossible possible.
The core Flash-MoE research repo — testing different MoE algorithms, memory management strategies, and expert routing on Apple Silicon.
Flash-MoE sidecar slot-bank runtime for large GGUF MoE models on Apple Silicon — llama.cpp fork.
Flash-MoE iOS — first 400B model running on iPhone. Based on @Alexintosh's work. Adds macOS compat, iOS memory entitlements, fanout I/O, and pread chunking.
Auto-research tuned MLX fork optimized for Flash-MoE experiments — fast MoE inference and algorithm prototyping on Apple's ML stack.
Open Source
Experiments with MLX and RDMA — pushing the boundaries of distributed ML on Apple hardware.
AI-powered iOS coding assistant — intelligent codebase navigation and development tools for mobile.
Visual development agent for macOS — fast UI iterations via iPhone Mirroring, screen capture, and automated interaction loops.
Thunderbolt block floating point experiments — high-bandwidth ML interconnects.