About ANEMLL
ANEMLL is focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE). Our mission is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures.
ANEMLL enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security. This is critical for autonomous applications where models run directly on the device without requiring an internet connection.
Project Goals
- Provide a flexible, easy-to-use library/framework for porting LLMs to the ANE directly from Hugging Face models
- Provide on-device examples for iOS and macOS applications in Swift or C/C++
- Enable seamless integration for privacy-focused edge computing
Core Components
Model Conversion
Convert LLMs directly from Hugging Face to Core ML format, optimized for the Apple Neural Engine. Currently supports LLaMA-family models, including DeepSeek distilled variants.
ANE Optimization
Specialized model splitting and optimization to meet per-model size constraints on iOS (1 GB) and macOS (2 GB), with efficient handling of the embeddings, feed-forward network, and LM head components.
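Conceptually, splitting walks the model's layers in order and packs them into sequential chunks whose serialized weights stay under the platform limit. A minimal sketch of that idea (the function name and layer sizes below are illustrative assumptions, not ANEMLL's actual API):

```python
# Hypothetical sketch: greedily pack consecutive layers into chunks
# that each stay within a platform-specific size budget.

def split_by_size(layer_sizes_mb, limit_mb):
    """Group consecutive layer sizes (in MB) into chunks of at most limit_mb."""
    chunks, current, current_size = [], [], 0
    for i, size in enumerate(layer_sizes_mb):
        # Start a new chunk once adding this layer would exceed the limit.
        if current and current_size + size > limit_mb:
            chunks.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Example: eight 300 MB transformer blocks under the 1 GB iOS limit
print(split_by_size([300] * 8, 1024))  # → [[0, 1, 2], [3, 4, 5], [6, 7]]
```

A greedy sequential pass like this preserves layer order, which matters because each chunk is executed in sequence during inference.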
Inference Tools
Ready-to-use chat interfaces for testing and deployment, featuring conversation history management, context window handling, and performance monitoring.
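Context-window handling in a chat interface typically means dropping the oldest turns once the conversation exceeds the model's token budget. A minimal sketch of that pattern (the names and whitespace-based token counting are illustrative assumptions, not ANEMLL's actual interface):

```python
# Hypothetical sketch: keep only the most recent messages that fit
# within the model's context-window token budget.

def trim_history(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Return the newest messages whose combined token count fits max_tokens."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                       # older messages no longer fit
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = [
    "hello there",
    "hi how are you",
    "tell me about the apple neural engine",
]
print(trim_history(history, 10))
```

Real implementations would count tokens with the model's own tokenizer rather than splitting on whitespace, but the trimming logic is the same.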
System Requirements
- macOS Sequoia with Apple Neural Engine
- Minimum 16 GB RAM
- Python 3.9
Current Release
Alpha Release 0.1.1 featuring:
- Single-shot model conversion
- Simplified model configuration
- Automated Hugging Face distribution preparation
- Enhanced chat interfaces with improved error handling
- Improved LLaMA model with prefill optimization