CUDA for Apple Neural Engine

Stop letting ANE go unused. Compile any model, target ANE directly, and get up to 10x better power efficiency than CPU inference.

Why momo-kiji?

The Problem

The Apple Neural Engine ships in virtually every modern Apple device. Yet most developers ignore it. Why?

  • ✗ CoreML is limited and locked
  • ✗ No direct ANE access
  • ✗ Can't compile your own models
  • ✗ Efficiency left on the table

The Solution

momo-kiji brings ANE into the open. Compile any model, target ANE directly.

  • ✓ Direct ANE compilation
  • ✓ Bring your own models
  • ✓ Up to 10x better power efficiency
  • ✓ Open source, MIT licensed

Features

🎯

Direct ANE

Bypass CoreML. Compile directly to ANE.

⚡

10x Efficiency

Specialized hardware acceleration on every ANE-equipped Apple device.

📱

macOS & iOS

Target both platforms with a single toolchain.

🔄

Multi-Format

ONNX, PyTorch, TensorFlow input support.
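All three formats would go through the single `compile` entry point shown under Quick Start. A sketch of what that might look like (the `.pt` and SavedModel file names, and whether a directory is accepted as input, are assumptions, not confirmed behavior):

```shell
# Same compiler front end, different input formats (file names illustrative)
momo-kiji compile model.onnx   --target ane --output model_ane.mlmodel
momo-kiji compile model.pt     --target ane --output model_ane.mlmodel
momo-kiji compile saved_model/ --target ane --output model_ane.mlmodel
```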

📊

Auto Quantization

Automatic INT8 and FP16 quantization.
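To give a feel for what INT8 quantization does, here is the core arithmetic in a pure-Python sketch: map a float range onto the signed 8-bit range [-128, 127] with a scale and zero point, then recover approximate floats on the way back. This only illustrates the math; the actual pass in momo-kiji operates on model weights, not Python lists.

```python
# Affine INT8 quantization: real value v maps to round(v / scale) + zero_point,
# clamped to [-128, 127]. Dequantization inverts the mapping approximately.

def quantize_int8(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale for constant input
    zero_point = round(-128 - lo / scale)     # aligns lo with -128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)
# each restored value lands within one quantization step of the original
```

The round trip loses at most one quantization step per value, which is why INT8 works well for weights whose dynamic range is narrow.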

🛠

Python API

Simple, intuitive Python interface.

Quick Start

# Install
pip install momo-kiji

# Compile a model
momo-kiji compile model.onnx \
  --target ane \
  --output model_ane.mlmodel

# Use in your app
import momo_kiji as mk
model = mk.load("model_ane.mlmodel")
output = model.predict(input_data)
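If the compiled model returns raw scores (an assumption — the output format depends on the model you compiled), class probabilities can be recovered with a plain softmax. A self-contained pure-Python sketch:

```python
import math

def softmax(logits):
    # subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# e.g. post-process a classifier's raw output from model.predict(...)
probs = softmax([2.0, 1.0, 0.1])
best = probs.index(max(probs))  # index of the most likely class -> 0
```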

Ready to compile for ANE?

Start with the documentation or jump into the GitHub repository.