Playing IF Games with Natural Language

November 14, 2025

I've been experimenting with integrating LLMs into text-adventure and interactive fiction games in a couple of different ways. Most of those experiments aren't ready yet (and may take a long time to produce anything useful, if they ever do), but one area I wanted to share is introducing an LLM at the parser layer of the engine or the VM. From my minimal experience with the world of IF (beyond just playing some games), the traditional parser is both the charm of these games and possibly the barrier to them. You have to learn the syntax and figure out which verbs the game understands (some are common defaults, some are custom to the game), but above all you have to phrase things in a very specific way: "take key" works but "grab it" does not, even though both mean the same thing.

But the game really just needs a normalized command like "take key" or "go north", and doesn't (or shouldn't) care how the player phrased it. In other words, interpreting what the player meant is an I/O problem, not a game logic problem. That's where this idea came from: what if an LLM handled that interpretation?

This experiment is built on top of Andrew Plotkin's CheapGlk and Glulxe. I forked both and added a thin layer in the Glk library that intercepts user input and sends it to an OpenAI-compatible API for interpretation, then passes the result back to the game as if the player had typed it. The layer sends the last few lines of game output for context, and the model tries to figure out what the player probably meant and returns a "normalized" command. (That's harder than it sounds, since LLMs don't actually "think", but in practice it works most of the time.) The VM doesn't change, the game files don't change; nothing changes except an additional step that parses out the player's intent before input reaches the game.

Here's a demo playing Tom Devereaux's "Try Again":

Gameplay example with modified Glk/Glulxe to use LLM-based user input parsing

When you type something like "let's go through that metal door", the Glk library sends it to the LLM with context about the current scene (the bunker description, the objects you can see), gets back "go through metal door", and passes that to the game.
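The request the layer sends follows the standard chat completions shape; the prompt wording below is illustrative, not the exact prompt the fork uses:

```json
{
  "model": "google/gemini-2.5-flash",
  "messages": [
    {
      "role": "system",
      "content": "Translate the player's input into a terse IF parser command. Recent game output:\n\nYou are in a bunker. A metal door leads north."
    },
    {
      "role": "user",
      "content": "let's go through that metal door"
    }
  ]
}
```

The model's reply ("go through metal door") is what gets handed to the VM.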

The system reads room descriptions and object lists to understand context. When you say "take it" after opening a box with a pistol inside, it knows you mean "take pistol". When you say "read them" in front of a filing cabinet with multiple entries, it starts with the first one. Say "and the next one" and it continues the sequence. This means existing games work without modification.

Almost nothing changed in Glulxe itself. I added OpenSSL to the build configuration and linked against the modified CheapGlk. The entire LLM integration lives in the Glk library, which seemed reasonable to me since Glk's job is to handle input and output.
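The build change amounts to something like the following; the variable names here are illustrative and depend on the actual Makefiles in the forks:

```makefile
# CheapGlk side: pull in OpenSSL for the HTTPS call to the API.
LIBS += -lssl -lcrypto

# Glulxe side: link against the modified CheapGlk.
GLKLIB = -L../cheapglk -lcheapglk $(LIBS)
```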

One interesting side effect is that because some LLMs are multilingual, you can play in different languages. The game still receives English commands but you could type in Swedish or transliterated Arabic and it would work the same way.

I wanted to answer a few questions with this: Can LLM interpretation make IF more accessible without destroying what makes it interesting? Does it preserve the puzzle-solving nature or does it just become "tell the AI what you want and it figures it out"? Can spatial reasoning be extracted from text descriptions well enough to understand what "go through that metal door" means in context? I think some answers will take more time to figure out but at least I can try playing some games now and see how it works.

There are some obvious issues though. Interpretation isn't always perfect or consistent. LLM calls add latency and API costs can become high if you're playing for hours. You're dependent on external service availability unless you run a local model. Model quality varies a lot too. I've had good results with google/gemini-2.5-flash but smaller models need significant finetuning to be useful (though I think this is probably where this idea should go next, a specialized finetuned SLM for parsing user input specifically for text-adventure games).

Although this is just a proof of concept, I tried to handle some edge cases. Some games ask for raw string input, like your character's name; for those you can wrap your input in brackets to skip LLM processing. And when you try to take something that's fixed in place, the layer passes the command through and lets the game respond naturally.

The experiment is done entirely by adjusting Andrew Plotkin's CheapGlk to call the external LLM provider. After building both CheapGlk and Glulxe, you add a configuration file at ~/.glk_llm.conf with the API endpoint, key, model, and a few other settings. You can use OpenAI, OpenRouter, a local Ollama instance, or anything else that supports the OpenAI chat completions format.
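A config file along these lines is what I mean; the key names here are illustrative, so check the fork's README for the actual ones:

```ini
# ~/.glk_llm.conf (field names illustrative)
endpoint = https://openrouter.ai/api/v1/chat/completions
api_key  = <your key>
model    = google/gemini-2.5-flash
```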

Is it actually fun? I have to play some more before I can answer that, but it probably depends on what you want from IF. If you enjoy the puzzle of figuring out the parser itself, this removes that layer entirely. If you find the parser frustrating and just want to explore the story, then based on my limited test runs with a few games, it helps a lot. The interpretation adds a different kind of uncertainty, though: sometimes the LLM makes unexpected choices about what you meant, which can be amusing or annoying depending on the situation.

I'm treating this as a prototype to explore these questions, not as a real feature or contribution. The code is minimal and possibly rough in places and there are definitely improvements to be made. But it's functional enough to play through games and see how it feels.

If you want to try it yourself, check out the repos: Glulxe fork and CheapGlk fork. Build instructions and configuration details are in the READMEs.

Update (2025-11-17)