Writing

Notes on shipping on-device AI

Jun 23, 2026

What on-device LLMs actually cost on mid-range Android

Reproducible benchmark numbers from a Snapdragon 7 Gen 1 phone and a Snapdragon 870 tablet, measured from inside the app, on the CPU, fully offline: decode tok/s, time-to-first-token, peak RAM, and RAG latency.

Jun 9, 2026

What's new in NativeLM v0.10.0: answering from the right document

v0.10 is a retrieval release — an optional EmbeddingGemma embedder tiered to your device, hybrid dense + lexical search, a flagship reranker, and a set of grounding fixes that stop the model answering from the wrong file. Still fully local, no account, no upload, no telemetry.

Jun 5, 2026

What's new in NativeLM v0.9.0: charts in chat, an adaptive UI, and a real engine library

v0.9 teaches the on-device model to answer with charts, makes the UI adapt from phone to tablet, and pulls the whole AI core out of the app into a reusable Kotlin Multiplatform library — still fully local, no account, no upload, no telemetry.

Jun 4, 2026

Your data, your key: local encrypted backup without a server

NativeLM keeps everything on your phone — which means losing the phone means losing the data. v0.7 fixes that with a passphrase-encrypted .nlmbak file you fully control: Argon2id → AES-256-GCM, no server, no account, no key we hold.

Jun 4, 2026

Talk to your local LLM: on-device voice input with Whisper

NativeLM v0.8 lets you dictate your questions — transcribed entirely on-device with Whisper (whisper.cpp), no cloud. Here's why we picked Whisper over Android's built-in recognizer, and how the Whisper model became a first-class 'Audio' entry in the model catalog.

Jun 4, 2026

The OCR library that phoned home: restoring NativeLM's zero-telemetry guarantee

Google's ML Kit gave NativeLM on-device OCR — and quietly bundled a datatransport pipeline that uploaded diagnostics to firebaselogging.googleapis.com on startup. Here's how we found it and stripped it out with a three-line manifest merge.

Jun 3, 2026

AirDrop for your LLM: building cloudless peer-to-peer sync without Google Play Services

How we built local device-to-device sync for NativeLM using mDNS and TCP sockets, keeping your private AI data completely off the cloud, and why we explicitly avoided Google's Nearby Connections API.

Jun 3, 2026

Ask in your language, about your English documents: on-device cross-lingual RAG

NativeLM v0.8 answers in Hindi, Tamil, Kannada, and more: it reads your English documents and replies in your language, with no translation model. The whole feature is one prompt directive (plus one stubborn script bug).

Jun 3, 2026

Turning your documents into artifacts, on-device: NativeLM Studio

NativeLM v0.6.0 adds Studio — generate briefings, FAQs, study guides, timelines, mind maps, and even spoken audio overviews from your own documents, entirely on the phone, via a map-reduce pipeline over on-device Gemma.

Jun 2, 2026

What's new in NativeLM v0.5.0: open, highlight, zoom, OCR, better retrieval

v0.4 made on-device document chat work. v0.5 makes it usable — tap a citation to open the source at the exact page with the passage highlighted, pinch to zoom, chat with scans, and get sharper answers. Plus the bugs we fixed along the way.

Jun 2, 2026

Chatting with scanned documents: on-device OCR (no cloud)

NativeLM v0.5.0 reads scanned PDFs and photos with on-device OCR, and blends keyword + vector search so exact terms actually get retrieved — all without an image ever leaving the phone.

Jun 1, 2026

The low-end gauntlet: running a local LLM on budget Android phones

A local LLM that only runs on flagships isn't private AI for everyone — it's a toy for people with expensive phones. Here's how NativeLM tiers models across devices, why budget phones break in two different ways (RAM and the navigation bar), and what's still hard about the 4–6 GB tier.

Jun 1, 2026

Why Android's ActivityManager lies about RAM — and how litertlm-kmp works around it

Xiaomi, Realme, and OPPO inflate reported RAM with swap-to-flash. Here's how we detect it and prevent OOM crashes when loading on-device LLMs.

Jun 1, 2026

Shipping on-device RAG: Building NativeLM for Android

How we implemented fully offline document RAG using MediaPipe's USE-Lite and ObjectBox HNSW vector search to ground Gemma's chat answers in imported PDFs.

May 30, 2026

Stateful KV-cache sessions for on-device Gemma on Android

How litertlm-kmp v0.3 makes multi-turn memory lossless and free — plus what an on-device CPU/GPU/NPU benchmark actually told me.

May 26, 2026

Seeing on-device: multimodal image input for local Gemma

litertlm-kmp v0.2.4 added vision — attach an image and the local Gemma model reasons over it, on-device. Here's how image attachments flow through the engine, why we default to the CPU vision backend, and the model gotcha that bites you on init.

May 25, 2026

Wrapping Google's LiteRT-LM into a Kotlin Multiplatform engine

The engine origin story: how litertlm-kmp turns Google's LiteRT-LM into a clean KMP library — four core abstractions, a resumable SHA-256 download manager, typed-Kotlin-to-OpenAPI function calling, and the thread discipline that keeps a non-thread-safe native runtime honest.