The article (see below), titled
“Free, offline ChatGPT on your phone? Technically possible, basically useless”, describes how running OpenAI’s gpt-oss model on smartphones is technically feasible, but current smartphones aren’t powerful enough to make it practical.
Right now, gpt-oss on mobile is bottlenecked because it runs purely on the CPU (and sometimes the GPU) without tapping into specialized neural processing hardware.
So, I asked ChatGPT 5 whether there'd be an improvement if the model were adapted to run on an NPU, particularly one optimized for event-based, sparse processing like Akida. Here's the response I received.
You can now run ChatGPT offline with OpenAI’s gpt-oss model, even on a phone. But is it practical? Here's what works and what really doesn’t.
www.androidauthority.com
ChatGPT 5's Response:
Yes, Akida could absolutely be a technology to consider if the goal is to make something like gpt-oss actually usable on a mobile phone, and the limitations described in that article are almost a perfect setup for why a neuromorphic processor would matter.
Here’s why:
Why Akida Fits This Problem
The main bottlenecks in running a 20B-parameter LLM on a phone (as described in the article) are:
- RAM requirements – Needs 12–24 GB just to load the model (see the rough math after this list).
- Compute requirements – Mobile CPUs and GPUs can’t sustain a high token-generation rate without overheating and throttling.
- Power draw – 100% CPU load drains the battery extremely fast.
- Heat – The phone becomes uncomfortably hot after just a few minutes.
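To put that RAM figure in context, here is a rough back-of-envelope sketch. The bit-widths and the ~20% runtime overhead are assumptions for illustration, not measured values for gpt-oss:

```python
# Rough weight-memory estimate for a ~20B-parameter model.
# Bit-widths and the 20% overhead factor are assumptions, not measurements.
PARAMS = 20e9  # ~20 billion parameters

def model_size_gb(params, bits_per_weight, overhead=1.2):
    """Weight memory in GB, padded ~20% for KV cache and runtime buffers."""
    return params * bits_per_weight / 8 / 1e9 * overhead

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{model_size_gb(PARAMS, bits):.0f} GB")

# FP16: ~48 GB, 8-bit: ~24 GB, 4-bit: ~12 GB,
# which is roughly where the article's 12-24 GB range comes from.
```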
Akida (or any high-efficiency neuromorphic processor) is built for:
- Event-driven, sparse compute — only processes relevant “spikes” of information instead of hammering every computation step like a CPU/GPU.
- Ultra-low power draw — operates in milliwatts for inference.
- On-chip learning & inference — avoids constant memory transfers that waste time and energy.
- Massive parallelism — can run edge AI workloads in real time without overheating.
That means for the same LLM inference task, a neuromorphic setup could:
- Reduce energy consumption by orders of magnitude.
- Maintain higher sustained token rates without thermal throttling.
- Extend usable session time from minutes to hours on a battery.
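A crude illustration of why the power draw matters (every figure below is an assumed round number, not a measurement of any specific phone or chip):

```python
# Back-of-envelope battery-life comparison for sustained on-device inference.
# All values are illustrative assumptions.
battery_wh = 18.0     # roughly a 4,700 mAh battery at 3.85 V
cpu_power_w = 8.0     # assumed sustained draw with CPU cores pinned
accel_power_w = 0.5   # assumed draw for a milliwatt-class accelerator plus memory traffic

def hours(battery_wh, power_w):
    return battery_wh / power_w

print(f"CPU-only: ~{hours(battery_wh, cpu_power_w):.1f} h of generation")
print(f"Low-power accelerator: ~{hours(battery_wh, accel_power_w):.0f} h of generation")

# CPU-only: ~2.2 h (and thermal throttling makes it worse in practice)
# Low-power accelerator: ~36 h (ignoring screen, radios, etc.)
```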
Why Not Just Use Mobile NPUs Instead?
Modern NPUs (like Qualcomm Hexagon or Apple Neural Engine) do accelerate AI workloads, but:
- They’re matrix-math optimised, not spiking-event optimised.
- They often require proprietary SDKs and retraining models in special formats.
- They still hit thermal limits in sustained generative AI workloads.
Neuromorphic processors like Akida:
- Can run quantised, sparse, and pruned models natively without expensive data shuffling.
- Could theoretically run a distilled / specialised LLM that fits in a fraction of the RAM and still delivers conversational quality.
Would It Make gpt-oss Work on Phones?
Realistically:
- You’re not going to run the full 20B gpt-oss on Akida right now — it’s still too large.
- But with model compression (distillation + sparsity), you could run a smaller, reasoning-optimised model that feels far more responsive than the brute-force CPU approach.
- Instead of the “2–3 tokens per second” in the article, you might see 10× or more throughput without cooking the phone.
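In terms of how a reply actually feels, the difference is easy to see with simple arithmetic. The 2–3 tokens per second figure is from the article; the 10× rate and the reply length are assumptions:

```python
# What token rate means for time-to-complete-reply.
# The 10x rate and the 250-token reply length are assumptions.
response_tokens = 250  # a typical few-paragraph answer

for label, tok_per_s in [("CPU-only (per the article)", 2.5), ("hypothetical 10x", 25.0)]:
    print(f"{label}: ~{response_tokens / tok_per_s:.0f} s per reply")

# CPU-only (per the article): ~100 s per reply
# hypothetical 10x: ~10 s per reply
```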
Implications for BrainChip
If OpenAI or others seriously want gpt-oss to run locally:
- They’ll need either dedicated AI accelerators in consumer devices or a move toward neuromorphic architectures.
- Qualcomm, Apple and Google already have NPUs, but a neuromorphic approach could leapfrog them in performance per watt.
- If BrainChip could prove Akida can handle a slimmed-down LLM at useful speeds, it becomes a licensable IP block for smartphone SoCs.
- This would be a huge market — billions of devices, not just niche edge AI.