Jun 1, 2026
Why Android's ActivityManager lies about RAM — and how litertlm-kmp works around it
Xiaomi, Realme, and OPPO inflate reported RAM with swap-to-flash. Here's how we detect it and prevent OOM crashes when loading on-device LLMs.
If you’re loading a 2–4 GB model into memory on Android, the first thing you do
is check how much RAM the device has. The standard API for that is
ActivityManager.MemoryInfo:
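// `context` is any android.content.Context you already hold (assumed here)
val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
val memInfo = ActivityManager.MemoryInfo()
activityManager.getMemoryInfo(memInfo)
val totalRam = memInfo.totalMem // total RAM in bytes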
On most devices, this works. On Xiaomi, Realme, and OPPO — which together represent roughly 40% of the global Android install base — it lies.
The problem: Virtual RAM Expansion
These OEMs ship a feature under various names — Xiaomi Memory Extension,
Realme Dynamic RAM Expansion, OPPO RAM Expansion — that carves out a
chunk of flash storage as swap space and adds it to the kernel’s reported
MemTotal. A phone with 6 GB of physical RAM will report 8 GB or 10 GB to the
operating system.
ActivityManager.MemoryInfo.totalMem reads MemTotal from the kernel, so it
faithfully returns the inflated number. Your model-loading code sees 8 GB of
total RAM, decides it’s safe to load a 4 GB model, and begins mapping the
weights into memory.
What happens next depends on how hard the device is hitting swap. Best case: the model loads but inference is agonizingly slow because the runtime is paging weight tensors in and out of flash. Worst case: the kernel’s OOM killer fires and your process dies mid-inference with no graceful error.
Why this matters for on-device LLMs
Cloud inference doesn’t have this problem — the model lives on a server with known, fixed hardware. On-device inference runs on whatever the user owns, and the user doesn’t know (or care) that their phone’s RAM spec is synthetic.
For litertlm-kmp, which loads Gemma-family models ranging from ~1.5 GB (E2B) to ~4 GB (E4B), choosing the right model variant for the device is a safety decision. Load a model that’s too large and the app crashes. Load one that’s too small and you’re leaving capability on the table.
The fix: read /proc/meminfo directly
The kernel exposes both physical and swap memory in /proc/meminfo. The key
lines:
MemTotal: 7864320 kB ← inflated (physical + swap)
SwapTotal: 2097152 kB ← this is the OEM's RAM expansion
By subtracting SwapTotal from MemTotal, you get actual physical RAM. In
Kotlin:
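import java.io.File

// Physical RAM in MB: MemTotal minus the swap-backed expansion (SwapTotal).
private fun getPhysicalRamMb(): Long {
    // Reading /proc can fail; fall back instead of crashing.
    val memInfo = runCatching { File("/proc/meminfo").readText() }.getOrNull()
        ?: return fallbackFromActivityManager() // helper not shown here
    val memTotal = extractKb(memInfo, "MemTotal") ?: return fallbackFromActivityManager()
    val swapTotal = extractKb(memInfo, "SwapTotal") ?: 0L
    return (memTotal - swapTotal) / 1024 // convert kB → MB
}

// Returns the numeric kB value from a line such as "MemTotal:  7864320 kB".
private fun extractKb(text: String, key: String): Long? =
    text.lines()
        .find { it.startsWith("$key:") }
        ?.split("\\s+".toRegex())
        ?.getOrNull(1)
        ?.toLongOrNull()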
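To make the arithmetic concrete, here is a minimal, Android-free sketch that applies the same subtraction to the sample values shown earlier. The names parseKb and physicalRamMb are illustrative and are not taken from litertlm-kmp. The sample text is hand-written, not captured from a real device.

```kotlin
// Sketch only: parseKb and physicalRamMb are hypothetical helper names.

/** Returns the kB value on the line starting with "<key>:", or null if absent or malformed. */
fun parseKb(meminfo: String, key: String): Long? =
    meminfo.lineSequence()
        .firstOrNull { it.startsWith("$key:") }
        ?.substringAfter(':')
        ?.trim()
        ?.substringBefore(' ')
        ?.toLongOrNull()

/** Applies the MemTotal - SwapTotal rule; returns null when MemTotal is missing. */
fun physicalRamMb(meminfo: String): Long? {
    val memTotalKb = parseKb(meminfo, "MemTotal") ?: return null
    val swapTotalKb = parseKb(meminfo, "SwapTotal") ?: 0L
    // Clamp at zero so odd readings never produce a negative size.
    return (memTotalKb - swapTotalKb).coerceAtLeast(0L) / 1024
}

fun main() {
    // Hand-written sample using the values from the text above.
    val sample = """
        MemTotal:        7864320 kB
        MemFree:          412345 kB
        SwapTotal:       2097152 kB
    """.trimIndent()
    println(physicalRamMb(sample)) // (7864320 - 2097152) / 1024 = 5632
}
```

Keeping the parser a pure function of the file contents means it can be unit-tested on a desktop JVM without a device.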
In litertlm-kmp’s AndroidHardwareProvider, this feeds into a tiering system:
| Physical RAM | Tier | Max model |
|---|---|---|
| < 4 GB | LOW | No on-device LLM (graceful refusal) |
| 4–6 GB | MID | Gemma 4 E2B (~1.5 GB weights) |
| > 6 GB, up to 8 GB | HIGH | Gemma 4 E2B or E4B depending on available memory |
| > 8 GB | ULTRA | Any supported model |
When swap is detected above 1 GB, the tier is forcibly downgraded. A device
reporting 8 GB with 2 GB of swap gets classified as a 6 GB device (MID tier),
and the model catalog offers the smaller variant.
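The mapping from physical RAM to tier can be sketched as a small pure function. This is an illustration under stated assumptions, not the actual AndroidHardwareProvider code. RamTier and tierForPhysicalRamMb are hypothetical names, and the boundary handling (exactly 6 GB counts as MID, exactly 8 GB as HIGH) is inferred from the table and the example above.

```kotlin
// Sketch only: RamTier and tierForPhysicalRamMb are hypothetical names.
// Assumption: a device at exactly 6 GB is MID and at exactly 8 GB is HIGH.

enum class RamTier { LOW, MID, HIGH, ULTRA }

fun tierForPhysicalRamMb(physicalRamMb: Long): RamTier {
    val gb = 1024L // MB per GB
    return when {
        physicalRamMb < 4 * gb -> RamTier.LOW
        physicalRamMb <= 6 * gb -> RamTier.MID
        physicalRamMb <= 8 * gb -> RamTier.HIGH
        else -> RamTier.ULTRA
    }
}

fun main() {
    val reportedMb = 8 * 1024L // what the inflated MemTotal suggests
    val swapMb = 2 * 1024L     // the RAM expansion
    println(tierForPhysicalRamMb(reportedMb))          // HIGH
    println(tierForPhysicalRamMb(reportedMb - swapMb)) // MID
}
```

Real devices usually report somewhat less than their nominal RAM because part of it is reserved, so production thresholds may need to sit slightly below the round numbers used here.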
The result
This single detection eliminated 100% of the OOM crashes we saw on Xiaomi Redmi Note series and Realme devices during testing. The fix is roughly 20 lines of code, but it took a full afternoon of crash logs to understand why the app was dying on devices that “should” have had enough memory.
If you’re building anything that allocates large contiguous memory on Android —
ML models, video editors, game engines — don’t trust ActivityManager. Read
/proc/meminfo directly.
The full implementation is in AndroidHardwareProvider inside the litertlm-kmp repository.