Jun 1, 2026

Why Android's ActivityManager lies about RAM — and how litertlm-kmp works around it

Xiaomi, Realme, and OPPO inflate reported RAM with swap-to-flash. Here's how we detect it and prevent OOM crashes when loading on-device LLMs.

on-device-llm · android · kotlin-multiplatform · gemma · oem

If you’re loading a 2–4 GB model into memory on Android, the first thing you do is check how much RAM is available. The standard API for that is ActivityManager.MemoryInfo:

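// `context` is any Android Context (Activity, Application, ...)
val activityManager = context.getSystemService(Context.ACTIVITY_SERVICE) as ActivityManager
val memInfo = ActivityManager.MemoryInfo()
activityManager.getMemoryInfo(memInfo)
val totalRam = memInfo.totalMem // total RAM in bytes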

On most devices, this works. On Xiaomi, Realme, and OPPO — which together represent roughly 40% of the global Android install base — it lies.

The problem: Virtual RAM Expansion

These OEMs ship a feature under various names — Xiaomi Memory Extension, Realme Dynamic RAM Expansion, OPPO RAM Expansion — that carves out a chunk of flash storage as swap space and adds it to the kernel’s reported MemTotal. A phone with 6 GB of physical RAM will report 8 GB or 10 GB to the operating system.

ActivityManager.MemoryInfo.totalMem reads MemTotal from the kernel, so it faithfully returns the inflated number. Your model-loading code sees 8 GB of total RAM, decides it’s safe to load a 4 GB model, and begins mapping the weights into memory.

What happens next depends on how hard the device is hitting swap. Best case: the model loads but inference is agonizingly slow because the runtime is paging weight tensors in and out of flash. Worst case: the kernel’s OOM killer fires and your process dies mid-inference with no graceful error.

Why this matters for on-device LLMs

Cloud inference doesn’t have this problem — the model lives on a server with known, fixed hardware. On-device inference runs on whatever the user owns, and the user doesn’t know (or care) that their phone’s RAM spec is synthetic.

For litertlm-kmp, which loads Gemma-family models ranging from ~1.5 GB (E2B) to ~4 GB (E4B), choosing the right model variant for the device is a safety decision. Load a model that’s too large and the app crashes. Load one that’s too small and you’re leaving capability on the table.

The fix: read /proc/meminfo directly

The kernel exposes both physical and swap memory in /proc/meminfo. The key lines:

MemTotal:        7864320 kB    ← inflated (physical + swap)
SwapTotal:       2097152 kB    ← this is the OEM's RAM expansion

By subtracting SwapTotal from MemTotal, you get actual physical RAM. In Kotlin:

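import java.io.File

// fallbackFromActivityManager() is not shown here; it must also return megabytes.
private fun getPhysicalRamMb(): Long {
    // readText() throws if the file cannot be read, so fall back instead of crashing.
    val memInfo = runCatching { File("/proc/meminfo").readText() }.getOrNull()
        ?: return fallbackFromActivityManager()
    val memTotal = extractKb(memInfo, "MemTotal") ?: return fallbackFromActivityManager()
    val swapTotal = extractKb(memInfo, "SwapTotal") ?: 0L
    return (memTotal - swapTotal) / 1024 // convert kB → MB
}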

private fun extractKb(text: String, key: String): Long? =
    text.lines()
        .find { it.startsWith("$key:") }
        ?.split("\\s+".toRegex())
        ?.getOrNull(1)
        ?.toLongOrNull()
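
To sanity-check the parsing off-device, the same subtraction can be run against the two sample lines shown earlier. This is a minimal sketch rather than the library's code: `parseMeminfoKb`, `physicalRamMb`, and `sample` are hypothetical names, and it assumes, as described above, that MemTotal already includes the swap-backed expansion.

```kotlin
// Hypothetical helpers for illustration. They take the meminfo text as a
// parameter so they can run in a plain JVM unit test without a device.
fun parseMeminfoKb(text: String, key: String): Long? =
    text.lineSequence()
        .firstOrNull { it.startsWith("$key:") }
        ?.trim()
        ?.split(Regex("\\s+"))
        ?.getOrNull(1)       // second token is the numeric value in kB
        ?.toLongOrNull()

// Returns null when MemTotal is missing so the caller can choose a fallback.
fun physicalRamMb(meminfo: String): Long? {
    val memTotalKb = parseMeminfoKb(meminfo, "MemTotal") ?: return null
    val swapTotalKb = parseMeminfoKb(meminfo, "SwapTotal") ?: 0L
    // Assumption taken from the article: MemTotal is inflated by SwapTotal.
    return (memTotalKb - swapTotalKb).coerceAtLeast(0L) / 1024
}

// The two sample lines from above, without the annotations.
val sample = """
    MemTotal:        7864320 kB
    SwapTotal:       2097152 kB
""".trimIndent()

fun main() {
    println(physicalRamMb(sample)) // 5632
}
```

With the sample values, 7864320 kB minus 2097152 kB leaves 5767168 kB, or 5632 MB (5.5 GB) of physical RAM.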

In litertlm-kmp’s AndroidHardwareProvider, this feeds into a tiering system:

Physical RAM   Tier    Max model
< 4 GB         LOW     No on-device LLM (graceful refusal)
4–6 GB         MID     Gemma 4 E2B (~1.5 GB weights)
6–8 GB         HIGH    Gemma 4 E2B or E4B depending on available memory
> 8 GB         ULTRA   Any supported model

When swap is detected above 1 GB, the tier is forcibly downgraded. A device reporting 8 GB with 2 GB of swap gets classified as a 6 GB device (MID tier), and the model catalog offers the smaller variant.
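
The tier mapping can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in AndroidHardwareProvider: `RamTier` and `tierFor` are hypothetical names, and the handling of exact boundary values (4, 6, and 8 GB) is inferred from the table and the 6 GB example above.

```kotlin
// Hypothetical tier enum mirroring the table above.
enum class RamTier { LOW, MID, HIGH, ULTRA }

private const val MB_PER_GB = 1024L

// Maps physical RAM (MB, after subtracting swap) to a tier.
// Assumed boundary handling: exactly 4 GB and 6 GB are MID, exactly 8 GB is HIGH.
fun tierFor(physicalRamMb: Long): RamTier = when {
    physicalRamMb < 4 * MB_PER_GB -> RamTier.LOW
    physicalRamMb <= 6 * MB_PER_GB -> RamTier.MID
    physicalRamMb <= 8 * MB_PER_GB -> RamTier.HIGH
    else -> RamTier.ULTRA
}

fun main() {
    val reportedMb = 8 * MB_PER_GB // what the device claims
    val swapMb = 2 * MB_PER_GB     // the OEM's RAM expansion
    println(tierFor(reportedMb))          // HIGH
    println(tierFor(reportedMb - swapMb)) // MID
}
```

Keeping the mapping free of Android dependencies means the boundaries can be unit-tested on a desktop JVM, which matters when a one-tier mistake decides whether a 4 GB model gets loaded.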

The result

This single detection eliminated 100% of the OOM crashes we saw on Xiaomi Redmi Note series and Realme devices during testing. The fix is roughly 20 lines of code, but it took a full afternoon of crash logs to understand why the app was dying on devices that “should” have had enough memory.

If you’re building anything that allocates large contiguous memory on Android — ML models, video editors, game engines — don’t trust ActivityManager. Read /proc/meminfo directly.

The full implementation is in AndroidHardwareProvider inside the litertlm-kmp repository.