About ANEMLL
ANEMLL is focused on accelerating the porting of Large Language Models (LLMs) to tensor processors, starting with the Apple Neural Engine (ANE). Our mission is to provide a fully open-source pipeline from model conversion to inference for common LLM architectures.
ANEMLL enables seamless integration and on-device inference for low-power applications on edge devices, ensuring maximum privacy and security. This is critical for autonomous applications where models run directly on the device without requiring an internet connection.
Project Goals
- Provide a flexible, easy-to-use library/framework for porting LLMs to the ANE directly from Hugging Face models
- Provide on-device examples for iOS and macOS applications in Swift or C/C++
- Enable seamless integration for privacy-focused edge computing
Core Components
Model Conversion
Convert LLMs directly from Hugging Face to Core ML format, optimized for the Apple Neural Engine. Currently supports LLaMA-family models, including DeepSeek distilled variants.
ANE Optimization
Specialized model splitting and optimization to meet per-model size constraints on iOS (1 GB) and macOS (2 GB), with efficient handling of the embeddings, feed-forward network, and LM head components.
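Conceptually, splitting walks the model's layers in order and packs them into sequential chunks whose serialized weights stay under the platform limit. A minimal sketch of that idea (the function name and layer sizes below are illustrative assumptions, not ANEMLL's actual API):

```python
# Hypothetical sketch: greedily pack consecutive layers into chunks
# that each stay within a platform-specific size budget.

def split_by_size(layer_sizes_mb, limit_mb):
    """Group consecutive layer sizes (in MB) into chunks of at most limit_mb."""
    chunks, current, current_size = [], [], 0
    for i, size in enumerate(layer_sizes_mb):
        # Start a new chunk once adding this layer would exceed the limit.
        if current and current_size + size > limit_mb:
            chunks.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Example: eight 300 MB transformer blocks under the 1 GB iOS limit
print(split_by_size([300] * 8, 1024))  # → [[0, 1, 2], [3, 4, 5], [6, 7]]
```

A greedy sequential pass like this preserves layer order, which matters because each chunk is executed in sequence during inference.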
Inference Tools
Ready-to-use chat interfaces for testing and deployment, featuring conversation history management, context window handling, and performance monitoring.
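Context-window handling in a chat interface typically means dropping the oldest turns once the conversation exceeds the model's token budget. A minimal sketch of that pattern (the names and whitespace-based token counting are illustrative assumptions, not ANEMLL's actual interface):

```python
# Hypothetical sketch: keep only the most recent messages that fit
# within the model's context-window token budget.

def trim_history(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Return the newest messages whose combined token count fits max_tokens."""
    kept, total = [], 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break                       # older messages no longer fit
        kept.append(msg)
        total += cost
    return list(reversed(kept))         # restore chronological order

history = [
    "hello there",
    "hi how are you",
    "tell me about the apple neural engine",
]
print(trim_history(history, 10))
```

Real implementations would count tokens with the model's own tokenizer rather than splitting on whitespace, but the trimming logic is the same.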
System Requirements
- macOS Sequoia with Apple Neural Engine
- Minimum 16 GB RAM
- Python 3.9
Current Release
Alpha Release 0.1.1 featuring:
- Single-shot model conversion
- Simplified model configuration
- Automated Hugging Face distribution preparation
- Enhanced chat interfaces with improved error handling
- Improved LLaMA model with prefill optimization