Unlocking the Black Box: Researchers Crack Apple's Neural Engine for AI Training

A new breakthrough bypasses Apple's restrictions to enable hyper-efficient local model development on Mac silicon.

For years, the Apple Neural Engine (ANE) has been a specialized, inference-only component of Mac and iPhone chips—a high-speed lane for running AI, but a dead end for building it. Because Apple provides no public documentation for training, developers have been forced to rely on power-hungry GPUs for even small fine-tuning tasks. That wall has finally been breached by an independent researcher who found a way to run training loops directly on the hardware.

Bypassing the CoreML Gatekeeper

The project, led by researcher maderix, sidesteps Apple's official CoreML framework entirely. Instead of using supported tools, the methodology leverages undocumented private APIs such as _ANEClient to compile programs in memory using Model Intermediate Language (MIL). By feeding data through shared memory buffers, the system can perform the matrix multiplications and activation functions that neural networks require.

While Apple markets the M4 ANE at 38 TOPS, a figure quoted at INT8 precision, this research measures real-world throughput of roughly 19 TFLOPS at FP16 precision. That performance is achieved by baking weights directly into the compiled programs as constants. It represents a fundamental shift in how we perceive Apple Silicon and what its hidden hardware can actually do.
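To make the "weights baked in as constants" idea concrete, here is a minimal sketch in plain Python. It is an analogy only: the real project emits MIL and compiles it through private ANE APIs, whereas here a closure plays the role of the compiled program and a plain list stands in for a shared memory buffer. The function names are illustrative inventions, not APIs from the project.

```python
def compile_program(weights):
    # "Compilation": the weights are captured as constants inside the
    # returned program; only activations flow in through the input buffer
    # at run time, mirroring how the compiled ANE program is described.
    def program(input_buffer):
        # A single matrix-vector multiply, the core op of the compiled graph.
        return [sum(w * x for w, x in zip(row, input_buffer)) for row in weights]
    return program

W = [[1.0, 2.0], [3.0, 4.0]]   # constants baked in at "compile" time
prog = compile_program(W)
print(prog([1.0, 1.0]))        # -> [3.0, 7.0]
```

The design trade-off this illustrates: baking weights into the program makes each forward pass fast, but any weight update requires recompiling the program.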

Why Efficiency Matters for Local AI

The implications for power consumption are staggering. The M4 Neural Engine delivers roughly 6.6 TFLOPS per watt, making it over 80 times more efficient than an enterprise-grade NVIDIA A100 GPU for specific tasks. This efficiency allows for always-on local learning and fine-tuning without draining a laptop's battery or requiring a dedicated server.

However, the approach remains a hybrid experiment for now. While the ANE handles the heavy lifting of the forward and backward passes, weight gradients are still calculated on the CPU using Apple's Accelerate libraries. This creates a potential bottleneck, but it marks the first time the ANE's true capabilities have been exposed beyond Apple's strict software guardrails.
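The hybrid split described above can be sketched with a toy single-layer example. This is not the project's code: NumPy stands in for both the ANE and Accelerate, and the function names (`ane_forward`, `ane_backward`, `cpu_weight_grad`) are hypothetical labels showing which device handles which step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights in FP16, matching the precision the article reports for the ANE.
W = rng.standard_normal((4, 3)).astype(np.float16)

def ane_forward(x):
    # Forward pass (a matmul) — the kind of op the compiled ANE program runs.
    return x @ W

def ane_backward(grad_out):
    # Backward pass w.r.t. the input, also expressible as a matmul on the ANE.
    return grad_out @ W.T

def cpu_weight_grad(x, grad_out):
    # Weight gradient dL/dW = x^T @ grad_out, computed on the CPU
    # (the article notes this step uses Apple's Accelerate libraries).
    return x.astype(np.float32).T @ grad_out.astype(np.float32)

x = rng.standard_normal((8, 4)).astype(np.float16)
y = ane_forward(x)            # activations, shape (8, 3)
grad_out = np.ones_like(y)    # stand-in loss gradient
gx = ane_backward(grad_out)   # input gradient, shape (8, 4)
gW = cpu_weight_grad(x, grad_out)  # weight gradient, shape (4, 3)
print(y.shape, gx.shape, gW.shape)
```

The bottleneck the article mentions is visible here: the weight-gradient matmul runs on the slower, more power-hungry device, so it caps overall training throughput even when the ANE steps are fast.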

