Ask in your language, about your English documents: on-device cross-lingual RAG

NativeLM v0.8 answers in Hindi, Tamil, Kannada and more: it reads your English documents and replies in your language, with no translation model. The whole feature is one prompt directive (plus one stubborn script bug).

For a huge share of NativeLM’s intended users, English is the document language but not the thinking language. A user in India often has English contracts, reports, and notes — but wants the answer in Hindi, Tamil, or Bengali. So v0.8 adds multilingual answers, and the headline capability is cross-lingual RAG: your documents stay English, your answer comes back in your language.

The surprising part is how little machinery it took.

No translation model. None.

The obvious architecture is a translation layer — run the answer through something like NLLB. We didn’t, because on-device that path is brutal: a ~600 MB model download and 2–5 seconds of translation latency per response, on top of generation. It would make the feature too heavy to justify.

Instead we lean on something Gemma already does well: it generates natively in many languages. So the entire mechanism is a single directive appended to the prompt:

append("Output language: ").append(localeDescriptor(locale)).append('.')

Gemma reads the retrieved English context and simply writes the answer in the requested language. No translation step, no second model, no extra latency beyond normal generation. We shipped 10 languages (English + 9 Indian) this way — the same prompt-only approach we’d already proven in our Astro app, now ported into the engine.
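
To make the shape of that concrete, here is a minimal sketch of a prompt builder that ends with the directive. Only the last `append` chain comes from the snippet above; the `buildPrompt` name, the instruction wording, the context numbering, and the trimmed-down `localeDescriptor` table are assumptions for illustration, not the engine's actual code.

```kotlin
// Hypothetical, trimmed-down descriptor table; the real one lives in the engine.
fun localeDescriptor(locale: String): String = when (locale) {
    "hi" -> "Hindi"
    "ta" -> "Tamil"
    "bn" -> "Bengali"
    else -> "English"
}

// Assumed prompt layout: instruction, retrieved English context, question, directive.
fun buildPrompt(context: List<String>, question: String, locale: String): String =
    StringBuilder().apply {
        append("Answer the question using only the context below.\n\n")
        context.forEachIndexed { i, chunk ->
            append("[").append(i + 1).append("] ").append(chunk).append('\n')
        }
        append("\nQuestion: ").append(question).append('\n')
        // The single line that carries the whole feature.
        append("Output language: ").append(localeDescriptor(locale)).append('.')
    }.toString()

fun main() {
    val prompt = buildPrompt(
        context = listOf("The lease term is 11 months starting 1 April."),
        question = "किराये की अवधि कितनी है?",
        locale = "hi",
    )
    println(prompt)
}
```

The retrieved chunks stay in English and the question can be in any language; the directive is the only part that names the output language.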

The one bug that almost everyone ships

There’s a trap, and it’s the kind you only find by testing the long tail of scripts. For well-covered languages the directive just works. But for weak-coverage scripts like Kannada and Punjabi, Gemma silently falls back to Devanagari (Hindi) script — producing text that’s phonetically right but in the wrong alphabet. To a Kannada reader, that’s gibberish wearing a familiar accent.

The fix is to stop being polite in the prompt and get emphatic:

"kn" -> "Kannada. CRITICAL: output STRICTLY in Kannada script (ಕನ್ನಡ ಲಿಪಿ). NEVER use Devanagari / Hindi script — that would be wrong"
"pa" -> "Punjabi. CRITICAL: output STRICTLY in Gurmukhi script (ਗੁਰਮੁਖੀ ਲਿਪੀ). NEVER use Devanagari / Hindi script — that would be wrong"

Naming the correct script in that script, plus an explicit “never use Devanagari,” reliably pins the model to the right alphabet. It’s a small thing that decides whether the feature works at all for those users.
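
A fallback like this is easy to miss by eye if you don't read the script, so it helps to check for it mechanically. The sketch below is one possible test helper, not something the post describes shipping: it uses the JDK's `Character.UnicodeScript` to measure what share of an answer's letters are in the expected script. The `expectedScript` table, the function names, and the 0.8 threshold are all assumptions.

```kotlin
import java.lang.Character.UnicodeScript

// Hypothetical table: the script we expect the answer to use for each language tag.
val expectedScript: Map<String, UnicodeScript> = mapOf(
    "hi" to UnicodeScript.DEVANAGARI,
    "kn" to UnicodeScript.KANNADA,
    "pa" to UnicodeScript.GURMUKHI,
)

// Fraction of the letters in `text` that belong to `script`; 0.0 if there are no letters.
fun scriptShare(text: String, script: UnicodeScript): Double {
    var letters = 0
    var matching = 0
    var i = 0
    while (i < text.length) {
        val cp = text.codePointAt(i)
        if (Character.isLetter(cp)) {
            letters++
            if (UnicodeScript.of(cp) == script) matching++
        }
        i += Character.charCount(cp) // step over surrogate pairs correctly
    }
    return if (letters == 0) 0.0 else matching.toDouble() / letters
}

// True when the answer is mostly NOT in the expected script. The threshold is arbitrary.
fun isWrongScript(tag: String, answer: String, minShare: Double = 0.8): Boolean {
    val script = expectedScript[tag] ?: return false // no expectation recorded for this tag
    return scriptShare(answer, script) < minShare
}

fun main() {
    println(isWrongScript("kn", "ಕನ್ನಡ ಲಿಪಿ")) // Kannada text for a Kannada request
    println(isWrongScript("kn", "नमस्ते"))      // Devanagari text for a Kannada request
}
```

Answers that quote English names or terms from the source documents will contain Latin letters, so the threshold would need tuning against real outputs before a check like this could gate anything.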

Where it lives: one language layer, three products

We didn’t bury this in the app. The Language enum and the localeDescriptor() table (with its strict-script lines) live down in the engine (:lib), so NativeLM, the kids’ product Curio, and the Astro app all reuse one battle-tested language layer. The script bug, once solved, stays solved everywhere.
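
As a rough picture of what such a shared layer could look like, here is a sketch of a `Language` enum with a descriptor function built on top of it. The field names, the `fromTag` helper, the subset of languages, and the shortened wording of the strict-script clause are assumptions; the post only confirms that an enum and a `localeDescriptor()` table exist in the engine.

```kotlin
// Hypothetical shape of the shared language layer; only a few languages are listed.
enum class Language(
    val tag: String,
    val englishName: String,
    val nativeName: String,           // e.g. for a picker that shows each language in its own script
    val strictScript: String? = null, // set only for scripts the model tends to get wrong
) {
    ENGLISH("en", "English", "English"),
    HINDI("hi", "Hindi", "हिन्दी"),
    TAMIL("ta", "Tamil", "தமிழ்"),
    KANNADA("kn", "Kannada", "ಕನ್ನಡ", strictScript = "Kannada script (ಕನ್ನಡ ಲಿಪಿ)"),
    PUNJABI("pa", "Punjabi", "ਪੰਜਾਬੀ", strictScript = "Gurmukhi script (ਗੁਰਮੁਖੀ ਲਿਪੀ)");

    companion object {
        // Unknown or missing tags fall back to English.
        fun fromTag(tag: String?): Language =
            values().firstOrNull { it.tag == tag } ?: ENGLISH
    }
}

// Builds the text that follows "Output language: " in the prompt.
fun localeDescriptor(language: Language): String {
    val strict = language.strictScript ?: return language.englishName
    return "${language.englishName}. CRITICAL: output STRICTLY in $strict. " +
        "NEVER use Devanagari / Hindi script"
}

fun main() {
    for (language in Language.values()) {
        println("${language.nativeName}: ${localeDescriptor(language)}")
    }
}
```

Keeping the strict-script wording next to the language definition means every app that depends on the module picks up the fix without carrying its own copy of the prompt text.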

On the NativeLM side, the first version of the feature (v1) applies the directive to the two prompts that matter most: the chat turn (both grounded RAG answers and ungrounded chat) and the auto-generated conversation title. You set the language in Settings (a sheet of cards, each shown in its own native script) and can flip it from a compact chip in the chat top bar; both are bound to one stored preference.

Honest scope

A few deliberate boundaries:

Studio artifacts are a follow-up. The map-reduce Studio generators have parsers that key off English structural markers (### date for timelines, Alex: / Sam: for podcast turns, the Mind Map outline). Naively forcing the output language would break those parsers, so Studio needs per-artifact care — out of scope for the half-day v1 win.
This is the AI’s output language, not full UI localization. The buttons and labels stay English for now. NativeLM’s UI is simple; the differentiator is the AI answering in your language, not the menu text. Localizing app strings is a separate, lower-urgency effort.
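
To see why the Studio parsers are fragile here, consider a sketch of a podcast-turn parser that keys off the English speaker labels. The regex, the `Turn` type, and the sample scripts are assumptions made up for illustration; the real parsers are not shown in this post.

```kotlin
data class Turn(val speaker: String, val text: String)

// Assumed shape of a podcast-turn parser: it only recognises the English speaker labels.
private val turnPattern = Regex("""^(Alex|Sam):\s*(.+)$""")

fun parseTurns(script: String): List<Turn> =
    script.lines().mapNotNull { line ->
        turnPattern.find(line.trim())?.let { m -> Turn(m.groupValues[1], m.groupValues[2]) }
    }

fun main() {
    val english = "Alex: Welcome back.\nSam: Thanks, glad to be here."
    // If the model also writes the speaker labels in Hindi, no line matches the pattern.
    val localized = "एलेक्स: फिर से स्वागत है।\nसैम: धन्यवाद।"
    println(parseTurns(english).size)
    println(parseTurns(localized).size)
}
```

Any generator whose output is parsed this way needs its structural markers to stay stable even when the content language changes, which is why each artifact has to be handled individually.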

That’s the whole feature: a one-line directive, one stubborn script fix, and a shared engine module — turning NativeLM into an on-device assistant that reads English and answers in your language, with nothing leaving the phone. It pairs naturally with on-device voice input: speak Hindi, get a Hindi answer about your English documents.

NativeLM v0.8 multilingual answers are live. The entire engine is open-source (AGPL-3.0).
