I built a machine that writes a spy novel. The hard part was not the writing.

I have spent a while building a system that writes a complete, full-length spy novel from a single paragraph of premise. Four finished books now, each around a hundred thousand words. People assume the interesting problem is the prose. It is not. Modern models write good sentences. The interesting problem is everything a novelist holds in their head and a model does not: who knows what, who is lying to whom, what has not been revealed yet, and what year it is on any given page.

The whole thing is open source: github.com/devsandip/ludLLM.

I used it to generate two alternative versions of the upcoming movie Alpha. You can read them on my site: Alpha and Alpha V2. These are full-length spy novels, mind you.

Before you proceed, let me get two things out of the way.

1. Does it write a thriller worthy of le Carré, Forsyth, Ludlum, or Clancy? The answer is an emphatic no. There are still glaring blind spots, which you will spot within a few chapters of these generated novels. For instance, the AI relies heavily on unnecessary internal monologue and over-explanation (expect to see the phrase “which was” a lot). Current models do not yet grasp subtext or the golden rule of “show, don’t tell.” The prose also leans on repetitive literary imagery, and you will frequently see specific words appearing in dense clusters, a dead giveaway of the probabilistic nature of LLMs.

2. So, what is this actually good for? I built LudLLM to help non-technical users see exactly how awesome (and how not awesome) current AI models are at generating long-form content. Right now, the most voluminous content generated by LLMs is code, which non-technical folks aren’t equipped to read or evaluate. Hence, a spy novel. This project, along with the two alternative versions of Alpha it produced, is designed to give you a readable, relatable benchmark to evaluate AI capabilities for yourself.

That being said, you should treat these novels as the absolute floor of what an LLM can write today. We could have achieved much better results by throwing more compute at it, fine-tuning the model exclusively on a handful of master thriller authors (rather than the entire internet), or increasing the generation and critique passes to a hundred. LudLLM even lets you gate every generation stage, meaning you can step in, review, and provide human input to course-correct later chapters. So while these novels are a fascinating experiment, they are by no means the ceiling of what AI can do.

A spy novel is, underneath the chases, an enormous bookkeeping exercise. The whole genre runs on the gap between what the reader knows and what the characters know. The reader sees the handler is lying. The hero does not, not for another twenty chapters. If the system ever forgets that gap, even once, the book is ruined. A single sentence where a character casually knows a thing they should not know yet, and the spell breaks.

So the project became less about generating text and more about building the scaffolding that lets a model write a long story without tripping over its own secrets. Here is how it works, without the engineering jargon.

A plot is really a graph

The first move was to stop thinking of the story as words and start thinking of it as a structure. Strip a spy plot down and you get a set of people, a set of facts, and a web of relationships between them: who knows this fact, who believes a lie about it, who learns the truth and in which chapter. That is a graph. Boxes and lines.

A plot drawn as a graph: people and facts, with lines for who knows what and when.

Once you draw the story this way, the central trick of the genre becomes something you can actually track. A secret is just a fact with timing attached: the reader is allowed to know it in chapter two, the hero is not allowed to know it until chapter thirty-one, and one character has been carrying a planted lie about it the whole time. The model never gets to decide any of that on a whim. The structure decides, and the model fills in the prose to match.

This is the part I am quietly proud of. The novel is stored as data first and words second. The words are a view of the data, never the other way around.
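
To make "data first, words second" concrete, here is a minimal sketch of what such a graph could look like. It is not LudLLM's actual schema: the class names (Fact, KnowledgeEdge, PlotGraph), the field names, and the sample fact are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fact:
    """A node: one fact, plus the chapter from which the reader may know it."""
    fact_id: str
    text: str
    reader_learns_in: int


@dataclass(frozen=True)
class KnowledgeEdge:
    """An edge: how one character relates to one fact over time."""
    character: str
    fact_id: str
    learns_truth_in: Optional[int] = None  # None: never learns the truth
    planted_lie: Optional[str] = None      # false version they carry instead


@dataclass
class PlotGraph:
    facts: Dict[str, Fact] = field(default_factory=dict)
    edges: List[KnowledgeEdge] = field(default_factory=list)

    def reader_knows(self, fact_id: str, chapter: int) -> bool:
        return self.facts[fact_id].reader_learns_in <= chapter

    def knows_truth(self, character: str, fact_id: str, chapter: int) -> bool:
        """True if the character has learned the truth by this chapter."""
        for edge in self.edges:
            if edge.character == character and edge.fact_id == fact_id:
                return (edge.learns_truth_in is not None
                        and edge.learns_truth_in <= chapter)
        return False  # no edge: the character has no link to this fact


# Illustrative data: reader knows in chapter 2, hero not until chapter 31,
# and one character carries a planted lie the whole time.
graph = PlotGraph()
graph.facts["handler_lies"] = Fact(
    "handler_lies", "The handler is lying to the hero.", reader_learns_in=2
)
graph.edges.append(KnowledgeEdge("hero", "handler_lies", learns_truth_in=31))
graph.edges.append(
    KnowledgeEdge("analyst", "handler_lies", planted_lie="The handler is loyal.")
)
```

With a structure like this, "may the hero know this in chapter 30?" is a lookup (`graph.knows_truth("hero", "handler_lies", 30)` is False), not something the model gets to improvise.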

An assembly line, not one big ask

You cannot ask a model to “write me a spy novel” and get anything good. It has no plan, so it wanders. Instead the system builds the book the way you would build anything complicated: in stages, each one handing its work to the next, with a human checkpoint between every step.

The pipeline: a premise becomes a world, a cast, a structure, an outline, then chapters, then the finishing touches.

It starts from the premise and builds outward. First the world and the central secret. Then the cast. Then the act structure and where each reveal lands. Then a chapter-by-chapter outline. Only then does it write prose, one chapter at a time. I read and approve each stage before the next one runs, and if I change something early, everything downstream is rebuilt to match. It is a planning tool that happens to end in a novel, more than it is a writing tool.
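
The "approve each stage, rebuild everything downstream on change" behaviour can be sketched in a few lines. This is an assumption-laden toy, not the real pipeline: the stage names are taken from the caption above, and `generate` and `approve` are hypothetical callables standing in for the model call and the human checkpoint.

```python
from typing import Any, Callable, Dict, List

# Stage order as described in the text; the identifiers are illustrative.
STAGES: List[str] = ["world", "cast", "structure", "outline", "chapters", "finishing"]


def downstream_of(stage: str) -> List[str]:
    """Stages that come after `stage` and must be rebuilt if it changes."""
    return STAGES[STAGES.index(stage) + 1:]


def run_pipeline(
    state: Dict[str, Any],
    generate: Callable[[str, Dict[str, Any]], Any],
    approve: Callable[[str, Any], bool],
) -> Dict[str, Any]:
    """Fill in missing stages in order; stop at the first rejected draft."""
    state = dict(state)  # do not mutate the caller's copy
    for stage in STAGES:
        if stage in state:
            continue  # already approved on an earlier run
        draft = generate(stage, state)
        if not approve(stage, draft):
            break  # human checkpoint said no: nothing later runs
        state[stage] = draft
    return state


def edit_stage(state: Dict[str, Any], stage: str, new_value: Any) -> Dict[str, Any]:
    """Apply a human edit and drop every downstream result so it is rebuilt."""
    stale = set(downstream_of(stage))
    updated = {key: value for key, value in state.items() if key not in stale}
    updated[stage] = new_value
    return updated
```

Calling `edit_stage(book, "cast", ...)` and then `run_pipeline` again keeps the approved world, uses the edited cast, and regenerates the structure, outline, chapters, and finishing touches.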

There is one more piece I lean on. At every stage, a second model checks the first one’s work. Crucially, it is a different model, from a different company, so it is not grading its own homework. It scores the draft, points out the weakest part, and suggests a fix. It advises me. It never overrules me. The taste stays human; the second model just makes sure I am never rubber-stamping something lazy.
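
As a rough sketch of that arrangement: the critique is just a small record (score, weakest part, suggested fix), and the only hard rule is that writer and critic come from different families. The names here (Critique, require_different_family, review) and the family labels are hypothetical, not LudLLM's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Critique:
    """Advisory output of the second model; it never changes the draft."""
    score: float
    weakest_part: str
    suggested_fix: str


def require_different_family(writer_family: str, critic_family: str) -> None:
    """Refuse to let a model family grade its own homework."""
    if writer_family.strip().lower() == critic_family.strip().lower():
        raise ValueError(
            "critic must come from a different model family than the writer"
        )


def review(
    draft: str,
    critic: Callable[[str], Critique],
    writer_family: str,
    critic_family: str,
) -> Critique:
    """Run the critic on a draft and hand the critique to the human."""
    require_different_family(writer_family, critic_family)
    return critic(draft)  # the human reads this and makes the actual decision
```

Note that `review` only returns advice. Approval stays a separate, human step.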

Keeping the secrets, and keeping the dates straight

Two specific failures haunted me, so each got its own guardrail.

The first is the early leak: a character saying something they should not know yet. The fix is almost embarrassingly simple. When the system writes a chapter from a character’s point of view, it only hands the model the facts that character is actually allowed to know at that moment. The forbidden facts are simply not in the room. You cannot blurt out a secret you were never told. After the chapter is written, a checker reads it back looking for any secret that slipped through, and a fresh pair of eyes (that second model again) catches the subtler cases where something was implied rather than stated.
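
Both halves of that guardrail can be sketched briefly. This is a simplification under stated assumptions: the Secret record, its `giveaways` phrases, and the substring matching are invented for illustration, and the real checker may work quite differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Secret:
    fact_id: str
    text: str
    giveaways: Tuple[str, ...]   # phrases that would reveal the secret
    learned_in: Dict[str, int]   # character -> chapter they learn it


def facts_in_the_room(secrets: List[Secret], character: str, chapter: int) -> List[Secret]:
    """Only the facts this character is allowed to know at this chapter."""
    return [
        s for s in secrets
        if character in s.learned_in and s.learned_in[character] <= chapter
    ]


def find_leaks(chapter_text: str, secrets: List[Secret], character: str, chapter: int) -> List[str]:
    """Ids of forbidden secrets whose giveaway phrases appear in the text."""
    allowed = {s.fact_id for s in facts_in_the_room(secrets, character, chapter)}
    lowered = chapter_text.lower()
    return [
        s.fact_id for s in secrets
        if s.fact_id not in allowed
        and any(phrase.lower() in lowered for phrase in s.giveaways)
    ]


# Illustrative data only.
secrets = [
    Secret("handler_mole", "The handler works for the other side.",
           ("double agent", "works for the other side"), {"hero": 31}),
    Secret("dead_drop", "The dead drop is under the bridge.",
           ("under the bridge",), {"hero": 3}),
]
```

The writing prompt would be built from `facts_in_the_room(...)` alone, and `find_leaks(...)` is the crude read-back pass. A phrase match like this only catches stated leaks, which is why the subtler, implied ones still need the second model.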

The second failure is the anachronism. These books jump between the present day and events two decades earlier, and it is dangerously easy to put a piece of modern technology, or a place under a name it did not have yet, into a scene set in the past. So there is a plain, mechanical check that scans for words that do not belong in the period, in either direction, before a human ever has to notice. It is not clever. It does not need to be. It just refuses to let the obvious mistakes through.
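
A mechanical check of this kind might look like the following sketch. The rule table and function name are assumptions made for illustration; the three entries are examples only (the iPhone arrived in 2007, and Bombay was officially renamed Mumbai in 1995), and a real list would be much longer and tuned to the book.

```python
import re
from typing import Dict, List, Optional, Tuple

# term -> (earliest year it may appear, latest year it may appear).
# None means no bound on that side. Entries are illustrative.
PERIOD_RULES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "iphone": (2007, None),
    "mumbai": (1995, None),
    "bombay": (None, 1995),
}


def find_anachronisms(
    text: str,
    scene_year: int,
    rules: Dict[str, Tuple[Optional[int], Optional[int]]] = PERIOD_RULES,
) -> List[str]:
    """Return the terms in `text` that do not belong in `scene_year`."""
    hits = []
    for term, (earliest, latest) in rules.items():
        # Whole-word, case-insensitive match.
        if not re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            continue
        too_early = earliest is not None and scene_year < earliest
        too_late = latest is not None and scene_year > latest
        if too_early or too_late:
            hits.append(term)
    return hits
```

For a scene set in 1993, `find_anachronisms("She checked her iPhone on the train to Mumbai.", 1993)` flags both terms; the same sentence in a present-day scene passes, while "Bombay" would be flagged there instead.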

Two clocks at once

That jump between time periods deserves its own mention, because it is where most of this bookkeeping earns its keep. Several of these novels braid two timelines: a present-day hunt and a founding story from twenty years before, cut together so the past lands at the exact moment it hurts the most.

The same chapters in reading order on top and in the order events actually happened below, with lines showing the braid.

The order you read the chapters in is not the order the events happened. So the system keeps both orders at once. Every character’s knowledge is tracked against when things actually happened, not when you read about them. That means a flashback chapter shows you exactly what a person knew back then, not what they figured out later. Without that, flashbacks quietly contaminate themselves with future knowledge, and nobody can quite say why the book feels off.
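
Here is a small sketch of keeping both orders at once. It assumes each chapter carries a reading position and a point on the in-world timeline, and that each "character learns fact" event is stamped with in-world time. The names (Chapter, Learning, story_time, known_in) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chapter:
    reading_position: int  # where it sits in the book
    story_time: int        # when its events happen (any sortable number)


@dataclass(frozen=True)
class Learning:
    character: str
    fact_id: str
    story_time: int        # when, in the story's world, they learned it


def chronological(chapters: List[Chapter]) -> List[Chapter]:
    """Chapters in the order the events actually happened."""
    return sorted(chapters, key=lambda c: (c.story_time, c.reading_position))


def known_in(chapter: Chapter, character: str, learnings: List[Learning]) -> List[str]:
    """Facts the character knows during this chapter, judged by story time."""
    return sorted(
        item.fact_id for item in learnings
        if item.character == character and item.story_time <= chapter.story_time
    )


# Illustrative braid: chapters 2 and 4 are flashbacks.
chapters = [Chapter(1, 30), Chapter(2, 10), Chapter(3, 40), Chapter(4, 20)]
learnings = [
    Learning("hero", "network_exists", 10),
    Learning("hero", "founder_betrayed", 40),
]
```

Chapter 4 is read after chapter 3, where the hero learns of the betrayal, but it happens earlier in story time, so `known_in(chapters[3], "hero", learnings)` returns only `["network_exists"]`. Filtering by reading position instead would have leaked the betrayal into the flashback.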

It invents most of the cast

One thing that surprises people: you do not hand it a cast. You give it a premise and the one secret at the heart of the book. It works out who has to exist for that story to function. If the plot needs an analyst who first spotted the threat, a financial middleman to move money quietly, a mole inside the agency, it creates them, gives each one a name, a history, and a private set of beliefs (including the lies they have been fed), and wires them into the graph. A two-sentence idea walks in. A full ensemble, each person knowing a different slice of the truth, walks out.

You can poke at the result

Because the whole book is data underneath, I could build an interactive map of it. You drag a slider along the chapters and watch knowledge spread through the cast: a dot lights up the moment a character learns something, the reader’s own line sits on top, and you can literally see the gap between what you know and what the hero knows widen and close.

An interactive map of one finished novel: each row is a character, each dot is the chapter where they learned a fact.

It turned out to be the clearest possible proof that the bookkeeping works. The suspense of a spy novel, the thing that is usually invisible and felt, is sitting right there as a picture.
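
The gap the map shows can be reduced to a single number per chapter: how many facts the reader already knows that the hero does not. A minimal sketch, assuming each fact is mapped to the chapter where the reader and the hero respectively learn it (the function name and data shape are made up here, not the map's actual data format):

```python
from typing import Dict, List


def knowledge_gap(
    reader_learns: Dict[str, int],
    hero_learns: Dict[str, int],
    n_chapters: int,
) -> List[int]:
    """Per chapter: number of facts the reader knows that the hero does not."""
    gaps = []
    for chapter in range(1, n_chapters + 1):
        reader = {fact for fact, ch in reader_learns.items() if ch <= chapter}
        hero = {fact for fact, ch in hero_learns.items() if ch <= chapter}
        gaps.append(len(reader - hero))
    return gaps


# Illustrative: the reader is two chapters ahead of the hero on both facts.
gap = knowledge_gap({"a": 1, "b": 2}, {"a": 3, "b": 4}, n_chapters=4)
# gap == [1, 2, 1, 0]: the gap widens, then closes.
```

Plotted along the chapter slider, that list is the suspense curve in its crudest form.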

Try it yourself

The engine is open source at github.com/devsandip/ludLLM. There are two ways in.

The easy way, if you use Claude Code, is the plugin. You never type a pipeline command; you just talk to it, and it runs the stages, shows you each result, and waits for your approval before moving on.

/plugin marketplace add devsandip/ludLLM
/plugin install ludllm@ludllm
/ludllm:ludllm-setup          # one time: checks your setup and keys
/ludllm:write-spy-novel       # walks you through writing a book

The hands-on way is to run it locally. This first run uses stand-in models, so it needs no API keys and costs nothing. It builds the bundled example book and lets you open the interactive map.

git clone https://github.com/devsandip/ludLLM
cd ludLLM
uv sync                              # installs it (needs uv, the Python tool)
uv run ludllm demo ./out             # builds the sample book on mock models, no keys
uv run ludllm show ./out/book_state.json   # prints what it produced

When you want it to write with real models, add your keys and pull in the model extras. The last command here builds and opens the interactive map for a sample run:

uv sync --extra models
cp .env.example .env                 # then put your keys in .env
uv run ludllm viz runs/alpha --open  # build and open the interactive map for a sample

One rule the setup enforces: the model that critiques is always a different family from the model that writes, so nothing grades its own work. The full design is in the repo’s docs.

What it is really about

I set out to make a model write a novel. What I actually built was a system for holding a very large, very fragile structure of secrets steady while a model fills it in sentence by sentence. The lesson, which I keep relearning in different contexts, is that the model is rarely the hard part. The hard part is the structure you put around it: the stages, the checks, the single source of truth, the refusal to let it know things it should not. Get that right and the writing takes care of itself.

You can read the plots, download the books, and play with the interactive story map on the LudLLM project page. The engine is open source at github.com/devsandip/ludLLM.