Balancing games while staying (somewhat) sane

In this post, I'm going to explain a framework for balancing games and how I applied this framework to my game Build + Brawl.



I recently published a free browser game, developed in Unity, called Build + Brawl. The player must battle increasingly-difficult waves of enemies using upgrades distributed throughout the game. Making the game fun required a careful balancing act: enemies needed to grow more dangerous, but not infuriatingly so; player upgrades needed to feel powerful, but not trivialize other options; the difficulty curve needed to be excitingly unpredictable, but not too spiky.

Every value that was set during development, from the strength of an upgrade to an enemy's movement speed, was a dial I could turn to achieve balance. But games are complicated, chaotic systems, and the impact of turning each dial was not clear. Maybe improving a particular upgrade is irrelevant because very few players choose it. Maybe making an enemy 2% faster means they can now reach the player before dying, causing massive damage.

For Build + Brawl, I needed to find some configuration of these hundreds of dials that made the game fun. It felt (and, honestly, still feels) like an impossible task. Eventually, I boiled the problem down to:
  • The game presents the player with a number of different paths they can follow: they can pick upgrade A or B; they can choose to focus their attacks on enemy 1 or 2; they can deftly avoid enemies or crash into them.
  • Each path is configured by a set of dials: the attack speed bestowed by upgrade A; the speed of enemy 1; the damage dealt per second by enemies.
  • The configuration of dials determines the outcome of following each path: if upgrade A is stronger than B, players will win more when choosing it; if enemy 1 is more deadly than enemy 2, players will win more when focusing fire on it; if enemies do a lot of damage very quickly, evasive players will win more than clumsy ones.
  • The distribution of outcomes experienced by players determines whether they have fun: a player might be bored by a game where winning is guaranteed, but frustrated by one where only a few paths can lead to victory.
Put another way:

In a well-balanced game, dials are tuned so that a player's possible paths produce outcomes matching a target distribution.

I used this intuition as the framework for balancing Build + Brawl.

Diving into the framework

Dial


A dial is any in-game property that can be tuned to impact the player experience. These impacts might not always be clear, and may be dependent on the values of other dials. Not every configurable value needs to be considered a dial: as designer Timothy Cain suggests, you could "put a pin in" some values, and take them as fixed while tuning everything else.

Dials also do not always need to be numerical values. For example, changing an enemy's behavior from "move towards the player" to "move randomly" would of course affect the balance of a game. There's a blurry line between a dial and a game mechanic, and any mechanic that you might change when balancing could be considered a dial containing categorical, rather than numerical, choices.

Path


A path is the combination of a player's in-game choices and real-world competencies. Your goal when balancing is to introduce some sort of balance between different paths. A player's path in a turn-based RPG might include their character's class and their personal willingness to carefully optimize their equipment. The paths of a fighting game might be the different playable characters, crossed with the different fast-twitch reaction times a player might have.

Outcome


An outcome is the result of a player's chosen path, given the configuration of the dials. For example, "player health lost" might be the outcome of a level in a roguelike game, or "score achieved" might be the outcome in a shoot-em-up game. An ideal outcome is:

Measurable

You can quantify the outcome for a particular path, and compare the outcome between paths and dial configurations. The outcome can meaningfully change between different paths and configurations.

Continuous

Changes to dials produce a somewhat-smooth change in outcome, rather than large, discrete jumps or no changes at all. A scalar outcome that measures "player health lost" would be preferable to a binary one that measures "did player die." This allows you to see the relative importance of different tweaks to your dials.

Singular

The outcome encapsulates every metric that you care about. While there may be many potential metrics in your game -- the score achieved, the time to completion, the number of lives remaining -- trying to balance all of these at once introduces complexity. It also introduces unclear tradeoffs: if a proposed buff to a path increases its score but reduces its lives remaining, is it actually a buff? Optimization problems are much easier when there is a single target to optimize.

Chunkable

You should be able to measure the outcome not only across the entirety of a play session, but within smaller "chunks" of the game, like individual levels. This allows for finer-grained control of the difficulty curve throughout the game. If you can measure "player health lost" on a per-chunk basis, you can make certain chunks harder by adding more enemies or reducing player upgrades, creating a more interesting difficulty curve. Otherwise, you might estimate that the player loses a certain amount of health across an entire play session, but have no idea whether that health is lost too gradually, leading to boredom, or too suddenly, leading to frustration.

Estimable

For a given path under a given configuration of dials, you have an efficient way to estimate the outcome it would achieve. Ideally, you have some function, in code or in a spreadsheet, that takes as input a path and a dial configuration and returns an estimated outcome. This makes it much faster to compare paths and test tweaks to dials.
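As a rough illustration of what such a function could look like, here is a minimal Python sketch. The names (Dials, Path, estimate_health_lost) and the combat model are hypothetical, chosen only to show the shape "path + dial configuration in, estimated outcome out"; they are not the estimator used for Build + Brawl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dials:
    """Hypothetical dial configuration for one chunk of the game."""
    enemy_count: int
    enemy_health: float
    enemy_dps: float  # damage each living enemy deals per second


@dataclass(frozen=True)
class Path:
    """Hypothetical path, reduced to the damage per second it produces."""
    player_dps: float


def estimate_health_lost(path: Path, dials: Dials) -> float:
    """Toy model (an assumption): the player kills enemies one at a time,
    and every enemy that is still alive damages the player the whole time."""
    seconds_per_kill = dials.enemy_health / path.player_dps
    health_lost = 0.0
    for alive in range(dials.enemy_count, 0, -1):
        health_lost += alive * dials.enemy_dps * seconds_per_kill
    return health_lost


# Compare two paths under the same dial configuration.
dials = Dials(enemy_count=3, enemy_health=10, enemy_dps=1)
weak = estimate_health_lost(Path(player_dps=5), dials)
strong = estimate_health_lost(Path(player_dps=10), dials)
```

Even a crude model like this makes it cheap to re-run every path after each dial tweak, which is the point of an estimable outcome.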

Target distribution


The target distribution is how you want the outcome metric to be balanced across different paths. This will set the tone for how impactful a player's choices and skills are, how many ways a player can win your game, and how difficult your game feels.

One (bad) target distribution would be a constant one: every path has the same outcome. This means no path is better or worse than another -- by some definition, "perfectly balanced" -- but also that the player's choices and skills are entirely irrelevant to the outcome of the game.

A harder game might have a distribution skewed towards failure: most paths result in a negative outcome, while only a few result in a positive one. This would mean that most of what the player tries will fail, and that they will have to work hard to find a successful path -- if they stick around long enough to do so.

Picking the target distribution requires a fair amount of intuition about the underlying distribution of paths. What skill level do you expect most players to have? If most will be low-skill, then your target distribution should probably skew towards success so that paths in the hands of these players can succeed. Are there certain upgrade paths that are so non-obvious that players will never choose them? If these are the only winning paths, your game is unlikely to be fun for most. Accounting for these intuitions is challenging but might substantially change the shape of your distribution.

You will also need an intuition about what outcome distribution will be "fun" for your players. Do they want to search for a needle of success in a haystack of failures? Do they want to try a bunch of different options and usually have them work well? How often do they expect to win? How much time will they give your game? I don't know of a way to decide this outside of gut checks and playtesting.
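Whatever target you settle on, it helps to reduce the estimated outcomes to a few numbers you can compare against it. The sketch below is one hypothetical way to do that in Python, assuming the outcome is "health lost" and that a path wins if it loses less than the player's starting health. The function name and the sample numbers are made up for illustration.

```python
from statistics import median


def summarize_outcomes(health_lost_by_path, starting_health):
    """Summarize estimated health lost across paths.

    Assumption: a path "wins" if it loses less than the starting health.
    """
    outcomes = sorted(health_lost_by_path.values())
    winners = [lost for lost in outcomes if lost < starting_health]
    return {
        "win_fraction": len(winners) / len(outcomes),
        "median_health_lost": median(outcomes),
        "best": outcomes[0],
        "worst": outcomes[-1],
    }


# Made-up outcomes for four paths, with 100 starting health.
summary = summarize_outcomes(
    {"path_a": 20, "path_b": 50, "path_c": 120, "path_d": 90},
    starting_health=100,
)
```

A win fraction near 1.0 suggests a forgiving game, while a value near 0.0 suggests the needle-in-a-haystack experience described above. Which one is "right" is still a judgment call.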

Putting the pieces together

Revisiting that initial intuition: in a well-balanced game, dials are tuned so that a player's possible paths produce outcomes matching a target distribution. The key questions a designer must answer within this framework are then:
  • What are the dials I want to tune?
  • What paths might a player employ?
  • What outcome can I estimate from these paths?
  • How do I want these outcomes to be distributed across paths?
The answers to these questions will vary dramatically from game to game and audience to audience. This framework will not magically reveal exactly how to balance your game. Rather, my hope is that it can turn a nebulous, overwhelming challenge -- "make the game balanced" -- into a more concrete, actionable one. 

How I used this framework in Build + Brawl

We have our nice, clean framework. Now, let's see how it played out in practice.

Dials

A snippet from the spreadsheet where I set the enemy attribute dials.

The dials I tuned fell into four buckets:
  • Enemy attributes, like health, speed, and quantity. I set these attributes in a spreadsheet, and would translate them into Unity ScriptableObjects to keep the in-game enemy attributes aligned with the spreadsheet.
  • Upgrade attributes, like the attack speed granted by a particular upgrade or the frequency at which a certain upgrade drops. These were set directly in Unity ScriptableObjects, as the branching tree structure of the upgrades didn't lend itself well to a spreadsheet.
  • Structure attributes, like the number of walls granted per upgrade or the damage dealt by a turret. These were set in a spreadsheet and translated into Unity ScriptableObjects.
  • The number of upgrades offered to the player in each stage. This was set in a spreadsheet.

Paths

Structure and upgrade options offered to the player.

The paths through Build + Brawl consist of three parts:
  1. The weapon upgrades the player chooses. There are over 200,000 possible weapon upgrade paths, which I reduced to around 10,000 paths by filtering out the rarer, more powerful upgrades.
  2. The structures the player acquires, and how the player places them on the map. This was challenging to define because there is a nearly infinite number of possible arrangements, and structures are highly synergistic. For instance, walls can be very effective at routing enemies towards damage-dealing turrets, but are basically useless if placed haphazardly. I ended up assuming that players always used structures efficiently, routing enemies to turrets through mazes of walls. This was certainly not true, but I expected players to get pretty efficient after a few play sessions.
  3. The player's skill in aiming at enemies while avoiding damage. From playtests, I found most players quickly approached a similar level of skill in this, so I decided not to complicate balancing by accounting for different skill levels.
Ultimately, each path mapped to a different common-upgrade-only path, and assumed that the player used half of their levels to upgrade their weapon and half to acquire structures that would be efficiently placed.
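To make the idea of shrinking the path space concrete, here is a small Python sketch that enumerates upgrade sequences from a pool and filters out rarer upgrades first. It is a simplification under stated assumptions: the upgrade names and rarities are invented, and the pool is flat, whereas the real game uses a branching upgrade tree stored in Unity ScriptableObjects.

```python
from itertools import product

# Hypothetical upgrade pool: name -> rarity.
UPGRADES = {
    "fast_reload": "common",
    "big_bullets": "common",
    "piercing": "common",
    "chain_lightning": "rare",
}


def enumerate_paths(upgrades, picks, allowed_rarities=("common",)):
    """Return every ordered sequence of `picks` upgrades drawn from the
    upgrades whose rarity is allowed (repeats permitted)."""
    pool = sorted(
        name for name, rarity in upgrades.items() if rarity in allowed_rarities
    )
    return list(product(pool, repeat=picks))


common_only = enumerate_paths(UPGRADES, picks=3)
everything = enumerate_paths(UPGRADES, picks=3, allowed_rarities=("common", "rare"))
```

Dropping a single rare upgrade takes this toy example from 64 paths to 27, and the saving grows quickly with the number of picks, which is why filtering is an effective way to keep the path count manageable.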

Outcome

The goal of Build + Brawl is to defeat the enemies in all twenty stages without running out of health, so the outcome metric I decided to use was the amount of player health lost before killing all enemies. But was it...

Measurable?

Yes; I can easily see how much health a player lost.

Continuous?

Yes; a player can lose more or less health, and this will be somewhat smooth with respect to, say, the damage output by enemies.

Singular?

Yes; to beat the game, all enemies have to be killed before all of the player's health is lost. There weren't secondary metrics like a score or timer I was interested in balancing.

Chunkable?

Yes; I could chunk health loss per stage by looking at the player's potential upgrade paths up to that stage and the enemies that appear at that stage.

Estimable?

Yes-ish. The broad idea to estimate player health lost was:
  1. Estimate the damage per second (DPS) the player would deal using each path on each stage.
  2. Determine how much damage the player would take on each stage, given the DPS they can deal.
The journey to a distribution of player damage taken.

For part (1), I downloaded the CSV containing the attributes of enemies and the number of upgrades available to the player at each of those stages. I passed this, along with the upgrade ScriptableObjects, into a Unity editor function that determines the possible upgrade paths at each stage, estimates the DPS a player with that weapon would achieve on that stage, and outputs a histogram of DPS values across paths for each stage.
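The histogram step can be sketched in a few lines of Python. This is not the Unity editor function itself, only an illustration of bucketing per-path DPS estimates. The bucket width and the sample DPS values are arbitrary assumptions.

```python
from collections import Counter


def dps_histogram(dps_values, bucket_width):
    """Count how many paths fall into each DPS bucket.

    Keys are the lower edge of each bucket, e.g. 10 covers [10, 20).
    """
    counts = Counter(
        int(dps // bucket_width) * bucket_width for dps in dps_values
    )
    return dict(sorted(counts.items()))


# Made-up DPS estimates for five paths on one stage.
stage_histogram = dps_histogram([3, 7, 12, 18, 19], bucket_width=10)
```

Working with buckets instead of individual paths keeps the later spreadsheet step small: each bucket needs one calculation, no matter how many paths landed in it.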

For part (2), I pulled the histogram back into the spreadsheet. For each "bucket" in the histogram, I calculated a metric called "time to clear" for each enemy type. The "time to clear" for a given DPS value is the time it would take to defeat all enemies of that type when the player has that DPS. It assumes all enemies spawn at once, then the player kills enemies of one type first, then the next type, and so on. The damage the player takes from enemies is then a function of time to clear: if the time to clear is longer than the time it would take for an enemy to arrive at the player, the player would take damage proportional to the difference, the quantity of the enemies, and the damage each deals per second.
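The calculation described above can be sketched in Python as follows. The field names, the constant of proportionality (taken as 1), and the choice to accumulate clear times across enemy types in kill order are all assumptions made for illustration. The original calculation lived in a spreadsheet and may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnemyType:
    """Hypothetical per-stage enemy attributes."""
    name: str
    quantity: int
    health: float
    damage_per_second: float
    seconds_to_reach_player: float


def estimate_damage_taken(player_dps, enemy_types):
    """Estimate damage taken on a stage for a given player DPS.

    Assumes all enemies spawn at once and the player clears one enemy
    type at a time, in the order given.
    """
    elapsed = 0.0  # running time to clear, in seconds
    total_damage = 0.0
    for enemy in enemy_types:
        elapsed += enemy.quantity * enemy.health / player_dps
        # Damage only accrues once the enemies have reached the player.
        overrun = max(0.0, elapsed - enemy.seconds_to_reach_player)
        total_damage += overrun * enemy.quantity * enemy.damage_per_second
    return total_damage


# Made-up stage: five grunts, then one brute.
stage = [
    EnemyType("grunt", quantity=5, health=20, damage_per_second=1, seconds_to_reach_player=8),
    EnemyType("brute", quantity=1, health=50, damage_per_second=4, seconds_to_reach_player=12),
]
low_dps_damage = estimate_damage_taken(10, stage)
high_dps_damage = estimate_damage_taken(100, stage)
```

With 10 DPS, the grunts take 10 seconds to clear (2 seconds past their arrival) and the brute is cleared at 15 seconds (3 seconds past its arrival), so the estimate is 2 × 5 × 1 + 3 × 1 × 4 = 22 damage. With 100 DPS, everything dies before reaching the player and the estimate is 0.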

Target distribution

I used two types of charts to assess the outcome distribution:

Histograms for the distribution of player health lost on different stages.

The first was a histogram sparkline for the distribution of player health lost in each stage, across paths. This gave a view of the difference in effectiveness between different paths. 

The median and max trendlines for player health lost each stage.

The second was a line chart of player health lost over stages, for both the median DPS and the best DPS. This roughly illustrated the difficulty curve so that I could make sure it had the positive trendline and per-stage variability I was looking for.
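The data behind a chart like this could be produced along the following lines. This Python sketch takes per-stage DPS estimates and any function that maps a stage and a DPS value to health lost. Here, toy_health_lost is a deliberately simple, made-up stand-in for a real estimator.

```python
from statistics import median


def difficulty_curve(dps_by_stage, health_lost_at):
    """For each stage, estimate health lost at the median and best DPS."""
    curve = []
    for stage, dps_values in enumerate(dps_by_stage, start=1):
        curve.append({
            "stage": stage,
            "median": health_lost_at(stage, median(dps_values)),
            "best": health_lost_at(stage, max(dps_values)),
        })
    return curve


def toy_health_lost(stage, dps):
    """Made-up estimator: later stages hurt more, higher DPS hurts less."""
    return max(0.0, stage * 30 - dps)


# Made-up DPS estimates across paths for two stages.
curve = difficulty_curve([[10, 20, 60], [20, 40, 90]], toy_health_lost)
```

Plotting the "median" and "best" series against "stage" gives the two trendlines: one for a typical path and one for the strongest path.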

I'm still not sure exactly what distributions I wanted these charts to show. I settled on a normal-ish distribution for path outcomes; a spiky, climbing difficulty curve for the median path across stages; and a fairly flat difficulty curve for the best path across stages.

More than checking whether the distributions matched an exact goal, I found these helpful in understanding potential problem areas. For example, stage 17 is clearly skewed towards a few particularly strong strategies, which could be problematic. However, the expected damage taken for that stage is 0, so it's not too worrisome. I can combine that with the qualitative knowledge I have about that stage -- namely, that it's a single high-health enemy rather than many small enemies, so there are a few upgrade paths that will be especially powerful -- and decide whether it needs any tweaks.

What I learned

  • The two hardest parts of balancing were (1) building a useful outcome estimator and (2) determining whether the outcome distribution delivered the experience I wanted.
  • Regardless of the framework you apply, balancing is still largely a game of intuition. You need intuition to pick reasonable initial values; to choose an accurate but not overly-complex outcome estimation method; to determine whether an outcome distribution will be fun; and so on.
  • It was a mistake to not have a more formal estimation of the DPS of different structure selections and placements. Players tried a wide range of placement strategies, and many that were cool but not optimal were effectively infeasible. Meanwhile, certain structure upgrades were wildly overpowered in certain configurations and trivialized much of the game.
  • I should have solidified the game's mechanics and core gameplay loop before going too deep into balancing. I initially balanced around a set of mechanics that ended up changing dramatically before the final release, causing me to throw out a lot of the balance work.
  • I should have had a "test bench" of diverse stages for making sure the outcome estimator aligned well with the actual gameplay outcome.
  • I wished I had more insight into what kinds of paths fell on which side of the outcome distribution. A game where the feasibility of a path is essentially random is less fun than one where strategic paths succeed and ill-conceived ones fail, but I had no real way to compare feasible vs infeasible paths at scale.
Above all else, balancing is hard! I think it will always be daunting, but my goal is to tackle it more formally each time.
