Writing
Notes on shipping on-device AI
Jun 1, 2026
Shipping on-device RAG: Building NativeLM for Android
How we implemented fully offline document RAG using MediaPipe's USE-Lite and ObjectBox HNSW vector search to ground Gemma's chat answers in imported PDFs.
Jun 1, 2026
Why Android's ActivityManager lies about RAM — and how litertlm-kmp works around it
Xiaomi, Realme, and OPPO inflate reported RAM with swap-to-flash. Here's how we detect it and prevent OOM crashes when loading on-device LLMs.
May 30, 2026
Stateful KV-cache sessions for on-device Gemma on Android
How litertlm-kmp v0.3 makes multi-turn memory lossless and free — plus what an on-device CPU/GPU/NPU benchmark actually told me.