I yelled at Claude until it built a Unity game

In this post, I explore vibecoding for game development.

Me, blissfully vibecoding

As I've been learning Unity and using it to develop games like Build + Brawl, I've found the Cursor IDE's AI coding tools to be quite helpful. Its composer feature is great for writing snippets where I know roughly what I want to do – for instance, "get the aspect ratio of the current screen" – but don't know the specific Unity idioms or API calls to do it. Instead of having to dig through Google results, I can ask the composer to help, which queries a large language model (LLM) like Claude and inserts (usually correct) code directly into the file.

But using AI tools to augment my human intuition felt so primitive, with every calorie-burning thought melting away the spherical dream body that WALL-E promised me. Could I instead achieve pure vibecoding for game development, and rant at the computer until something cool shows up on the screen?

The scene, unseen

AI-hostile architecture (via Unity)

I found one key blocker to my vibecoding goal: the Unity editor was foolishly designed for the outmoded eyes and appendages of human beings. Very roughly, a project in Unity consists of scripts, written in divine code, and scenes, rendered in the editor's profane GUI. Scripts define the behavior of objects, but the objects themselves are defined within the scene. An enemy that pursues a player and then explodes might be represented in the scene as an object with both "ChasesPlayer" and "BlowsUp" scripts attached. Each of these two behaviors would be obvious from its own code, but nothing in the code would indicate that they are combined in a single enemy. That fact lives only in the scene.
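To make that split concrete, here is a minimal sketch in Python rather than Unity's C#. The class names ChasesPlayer and BlowsUp come from the example above; SceneObject, describe, and everything else are hypothetical stand-ins, not Unity's actual API.

```python
from dataclasses import dataclass, field


class ChasesPlayer:
    """Script: knows how to pursue the player, and nothing else."""

    def describe(self) -> str:
        return "pursues the player"


class BlowsUp:
    """Script: knows how to explode, and nothing else."""

    def describe(self) -> str:
        return "explodes"


@dataclass
class SceneObject:
    """Hypothetical stand-in for an object in the scene.

    The list of attached components lives here, not in the scripts.
    """
    name: str
    components: list = field(default_factory=list)


# Only the scene data records that both behaviors belong to one enemy.
enemy = SceneObject("Enemy", [ChasesPlayer(), BlowsUp()])
behaviors = [c.describe() for c in enemy.components]
print(f"{enemy.name}: {' and then '.join(behaviors)}")
```

Reading either class tells you nothing about the other; an agent that can only see the scripts never sees the last three lines.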

Cursor's composer could read and edit my scripts, but had no way to read or edit the scene. My omnipotent AI savior was reduced to cheap parlor tricks, helping me tweak behaviors but remaining entirely unaware of the big picture. I needed (a) a way to serialize the scene contents as text, so they could be consumed by LLMs; and (b) a way for the LLM's text outputs to manifest changes to the scene itself.

Enter (and exit) MCP

A client (i.e., an AI agent) interacts with MCP servers that use external tools (via Docker)

The wise (and Anthropic-sponsored) way to do this is with a Model Context Protocol (MCP) server. An MCP server exposes endpoints that AI agents like the Cursor composer can query to gather information and make changes to external systems. One could run an MCP server that sits between the Unity editor and the agent, passing editor state to the agent and agent-requested actions to the editor. In fact, this is exactly what Jack Richards built in his UnityMCP project.

I tried to be wise, but as I started to build what amounted to a UnityMCP knockoff, I came to a few opinions:

  1. In an ideal world, I wouldn't need to run a middleman server between the Unity editor and my AI agent. I could simply open the editor and get cracking.
  2. Agent models – at least, Cursor's composer agent with Claude 3.7 – seem really good at working with files. They have built-in file system tools, and there's a lot of training data in existence showing how to interact with file systems. Meanwhile, barring time travel, the model would definitely not have seen my non-existent MCP server in its training data.
  3. It's hard to define an API for scene understanding without either dumping the entirety of the scene into the model context, which feels like information overload, or designing a query language for specific pieces of information, which feels like the kind of thing I shouldn't be doing.
  4. It's fun to try something new!
These opinions led me to a non-MCP approach.

Enter vibe-gamedev

vibe-gamedev is a package within Unity that interacts with agents via JSON

To bridge the gap between the Unity editor and my vibecoding servantbuddy, I wrote a Unity package called vibe-gamedev. Rather than use MCP, vibe-gamedev serializes the open scene as JSON files that the composer agent can peruse and edit, and automatically deserializes any edits back into the scene. For example, if our explosive enemy has a "damage" property of 10, an entry like {"damage": 10} would be written in the enemy's JSON file. If the agent edited that file and changed the 10 to 100, the "damage" property of the enemy in the scene would be changed to 100 as well.
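The round trip in the "damage" example can be sketched roughly as follows. This is an illustration in Python, not the package's actual code: a plain dict stands in for the enemy in the scene, and the file name Enemy.json and the serialize/deserialize helpers are assumptions made up for this sketch.

```python
import json
import tempfile
from pathlib import Path


def serialize(obj: dict, path: Path) -> None:
    """Write an object's properties to its JSON file."""
    path.write_text(json.dumps(obj, indent=2))


def deserialize(obj: dict, path: Path) -> None:
    """Read the JSON file back and apply any edited values to the object."""
    obj.update(json.loads(path.read_text()))


# A plain dict stands in for the enemy in the scene (hypothetical layout).
enemy = {"damage": 10}

with tempfile.TemporaryDirectory() as tmp:
    enemy_file = Path(tmp) / "Enemy.json"
    serialize(enemy, enemy_file)

    # Simulate the agent editing the file: change the 10 to 100.
    edited = json.loads(enemy_file.read_text())
    edited["damage"] = 100
    enemy_file.write_text(json.dumps(edited, indent=2))

    deserialize(enemy, enemy_file)

print(enemy)  # {'damage': 100}
```

The agent never touches the scene object directly; it only edits text, and the deserialize step carries the change across.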

vibe-gamedev seemed to satisfy my unwise opinions:
  1. No middleman server was required. The package runs entirely within the Unity editor, listening to editor changes (to trigger serializations) and JSON file changes (to trigger deserializations).
  2. The interface between agent and editor consisted entirely of JSON files. LLMs could grep and ls to the content of their cold mechanical hearts, just like they were trained to do.
  3. I did have to make some decisions about the layout of the JSON files, but the agent was free to peruse them in any way it decided. It could search for certain keywords, list the filesystem to any depth, and read in whichever JSON files would be most helpful.
  4. It was fun to build!
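
As a rough picture of what that file-based browsing looks like, the sketch below lists a small tree of JSON files and searches it for a keyword, the way an agent might with ls and grep. The directory layout, file names, and the find_files_mentioning helper are all hypothetical; the real layout of vibe-gamedev's files may differ.

```python
import json
import tempfile
from pathlib import Path


def find_files_mentioning(root: Path, keyword: str) -> list[str]:
    """Return relative paths of JSON files whose text contains the keyword (a crude grep)."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.json")
        if keyword in p.read_text()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one JSON file per scene object.
    (root / "Enemies").mkdir()
    (root / "Enemies" / "Bomber.json").write_text(json.dumps({"damage": 10}))
    (root / "Player.json").write_text(json.dumps({"speed": 5}))

    # The "ls" step: list every JSON file at any depth.
    all_files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.json"))
    # The "grep" step: narrow down to files that mention a property.
    matches = find_files_mentioning(root, "damage")

print(all_files)  # ['Enemies/Bomber.json', 'Player.json']
print(matches)    # ['Enemies/Bomber.json']
```

No query language is needed: the agent decides how deep to list and which files to open, and only those files end up in its context.
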
With vibe-gamedev up and running, no limits remained on my almighty AI. It was finally ready to develop my magnum opus.

Look on my Snake implementation, ye mighty, and despair


Preorder on Steam for only $69.99

I kicked my feet up, cracked my knuckles, and created a new Unity project. With nothing but the default scene, I gave my first instructions to Cursor's composer:

"I want to build the game Snake in Unity. Specifically, I want a player character to collect a food pellet that moves to random locations after being captured. Each time it's eaten, the player's tail grows longer. The game ends when the player collides with its own tail."

The composer spits out a few scripts and JSON files, and confidently assures me that the game is ready. I don't bother to check any of its work. I'm feeling the vibes. I try to test the scene in Unity. It immediately fails. The main game script is looking for objects with a tag that doesn't exist. I turn back to the composer:

"I'm getting this error when I test the scene: ..."

The composer calmly makes some JSON file changes, adding the appropriate tag to the appropriate objects. I'm a bit offended by its lack of an apology, but the vibes are still flowing. I run the scene again, and again it fails, but with two new errors now. I don't stop to contemplate what the errors might mean; I simply slam them back into the composer chat:

"I'm getting two errors now. Error 1: ... Error 2: ..."

Turns out it never attached the "FoodController" component to the FoodPellet object. I don't know what a "FoodController" is, but it's not my job to know. It's my job to vibe. Some more JSON changes are made, I test the scene, and I receive no errors this time. Instead, I'm greeted by a log message saying "Game Over!" every frame. Back to the composer:

"When I run, it just says "Game Over!" repeatedly"

It identifies the issue as a premature collision check between the snake and its tail. I take its word for it, accept all code changes, and run the scene again. No more errors, no more "Game Over!" log. In fact, there's nothing at all except the blank scene background.

Here, I commit a vibecoding faux pas and look into the issue myself. I quickly realize none of the objects in the scene have sprites attached, so they are effectively invisible. My vibes temporarily non-copacetic, I add some basic sprites to them myself. I test the scene again, and find my snake's green square head motionless at the center of the screen. I inform the composer:

"The player doesn't move at all. It's just a green square at the center of the screen."

The composer realizes it never actually defined any movement logic. It writes some. I test the scene, and find everything is working surprisingly well: I have a moving snake with a tail that grows as it collects randomly-placed food. Now we're rolling.

"There's a weird gap between the snake's head and first tail segment." I tell the composer, and it fixes it. "The food is spawning off-screen." I tell the composer, and it adds tighter bounds and adjusts the camera size. "The snake is boringly slow." I tell the composer to double its speed, and it does. The vibes are immaculate.

All told, it took about ten minutes of harassing Claude to build a fully-functional Snake game. Besides pasting errors into the composer chat, I only intervened once (to add sprites). I never debugged anything or even looked at the generated code. Looking now, it's strange and poorly architected, and I expect it would start to collapse in a project of greater complexity.

AI worship aside, I was actually pretty surprised how well this all worked. If I developed bare-bones Snake clones for a living, I'd be a bit nervous. If I were anyone else, I'd be excited. I'm optimistic about a future of game development that consists more of "what do I want to build?" and less of "how do I actually build it?" If you want to vibe your way to the top of the Steam bestsellers, you can download vibe-gamedev now!
