Luck
To add excitement and variety, many game developers include dice rolls in their games as a source of randomness. Dice rolls, the argument goes, add an element of chance that keeps the game from becoming strictly deterministic, forcing players to adapt to good and bad fortune. While some rolls are objectively better than others, potentially handing one player the upper hand through luck alone, developers claim that things will "average out" in the long run, with a given player eventually experiencing just as much good luck as bad luck.

[Figure: Most outcomes are near the average, with equal amounts of "good luck" (area above the green line) and "bad luck" (area below the red line).]
Luck should average out
With the effect of luck averaging out, the player with the better strategy (e.g., the player who obtained better modifiers on their rolls) should still be able to reliably perform better. However, players and developers alike often do not realize just how many rolls are necessary before the effect of strategy can be reliably detected above and beyond the effect of luck.

Forums are full of players debating which build has the better average, which is often plain to see with some math. For many players, that is all they need to concern themselves with: they have done the math and determined which build is most effective. The question for the designer, however, is whether players can expect to see that difference within a single game or session. As it turns out, many of these modifiers are so small compared to the massive variance of a pass-fail check that it takes surprisingly long for luck to "average out".
An example: Goofus and Gallant
For the following example, I'll use Dungeons & Dragons, since that's a system most gamers are likely familiar with. D&D uses a 20-sided die (1d20) to check for success or failure, and by adjusting the roll needed, the probability of success can be tuned from 0% to 100% in 5% increments. (In future posts I hope to examine other kinds of checks, like those used in 2d6- or 3d6-based RPGs and wargames.)

Consider two similar level-1 characters, Goofus and Gallant. Gallant, being a smart player, has chosen the Weapon Expertise feat, giving him +1 to-hit. Goofus copied all of Gallant's choices but instead took the Coordinated Explosion feat, because he's some kind of dingus. The result is two otherwise identical characters, one with a to-hit modifier that is +1 better than the other's. So we expect that, in an average session, Gallant should hit 5% more often than Goofus. But how many rolls do we need before we reliably see Gallant outperforming Goofus?
For now, let's assume a base accuracy of 50%. So Goofus hits if he rolls an 11 or better on a 20-sided die (50% accuracy), and Gallant hits on a roll of 10 or better (55% accuracy). We'll return to this assumption later and see how it influences the results.
I used the statistical software package R to simulate the outcomes of sessions involving 1 to 500 rolls, running 10,000 different D&D sessions for each number of rolls. (Using R for this stuff is easy and fun!) This lets us examine the proportion of sessions in which Gallant outperforms Goofus and vice versa. So, how many rolls are needed before Gallant reliably outperforms Goofus?
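If you'd like to tinker with this yourself, a sketch along the following lines reproduces the basic approach (the function name and defaults are just for illustration, not my exact script):

```r
# Simulate many sessions of n_rolls d20 attacks for each character and record
# how often Gallant (hits on 10+) ends up with more hits than Goofus (hits on 11+).
simulate_sessions <- function(n_rolls, n_sessions = 10000,
                              goofus_target = 11, gallant_target = 10) {
  outcomes <- replicate(n_sessions, {
    goofus_hits  <- sum(sample(1:20, n_rolls, replace = TRUE) >= goofus_target)
    gallant_hits <- sum(sample(1:20, n_rolls, replace = TRUE) >= gallant_target)
    sign(gallant_hits - goofus_hits)   # +1: Gallant ahead, 0: tie, -1: Goofus ahead
  })
  c(gallant = mean(outcomes ==  1),
    tie     = mean(outcomes ==  0),
    goofus  = mean(outcomes == -1))
}

simulate_sessions(20)   # proportions of 20-roll sessions won by each character (or tied)
```

Each call returns the share of simulated sessions in which Gallant came out ahead, the two tied, or Goofus came out ahead.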
[Figure: Goofus hits on 11; Gallant hits on 10 thanks to his +1 bonus.]
One intuitive guess would be that you need 20 rolls, since that 5% bonus is 1 in 20. It turns out, however, that even at 20 trials, Gallant only has a 56% probability of outperforming Goofus.
Seeing Gallant reliably outperform Goofus (in 75% of sessions) requires more than a hundred rolls. Even then, Goofus will still surpass him about 20% of the time. Against the wild swings of fortune produced by a 50% success rate, it's difficult for the modifier to make a reliable difference.
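You don't even need the simulation to check these figures: each character's hit count is binomial, so the chance that Gallant strictly out-hits Goofus in an n-roll session can be computed exactly. One way to sketch it:

```r
# P(Gallant's hits > Goofus's hits) when each makes n independent attack rolls.
p_gallant_wins <- function(n, p_goofus = 0.50, p_gallant = 0.55) {
  k <- 0:n
  # Sum over k of P(Goofus = k) * P(Gallant > k)
  sum(dbinom(k, n, p_goofus) * (1 - pbinom(k, n, p_gallant)))
}

p_gallant_wins(20)                           # roughly the 56% figure above
sapply(c(20, 50, 100, 200), p_gallant_wins)  # watch how slowly this climbs
```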
Reducing luck through a more reliable base rate
It turns out these probabilities depend a lot on the base probability of success. When the base probability is close to 50%, combat is "swingy" -- the number of successes is centered at 50% of the number of rolls, but it is quite likely to land rather above or rather below that expected value. We call this spread around the expected value variance. When the base probability is closer to 0% or 100%, the variance shrinks, and the number of successes tends to stay closer to the expected value.

This time, let's assume a base accuracy of 85%. Now Goofus hits on 4 or better (85%), and Gallant hits on 3 or better (90%). How many rolls are now necessary to see Gallant reliably outperform Goofus?
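Rerunning the same kind of simulation with the new targets is a one-liner if you kept the sketch from above (again, illustrative rather than my exact script):

```r
# Same comparison, but Goofus now hits on 4+ (85%) and Gallant on 3+ (90%).
# Binomial variance is n*p*(1-p): about 0.25n per character at a 50% base rate,
# versus 0.1275n and 0.09n here, so hit counts hug their expected values more tightly.
simulate_sessions(20, goofus_target = 4, gallant_target = 3)
```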
This time, things are more stable. For very small numbers of rolls, they're more likely to tie than before. More importantly, the probability of Gallant outperforming Goofus increases more rapidly than before, because successes are less variable at this probability.
Comparing the two graphs, we see the advantages of a higher base rate. For sessions involving fewer than 10 rolls, Goofus is rather less likely to outperform Gallant -- they'll tie, if anything. For sessions involving more than 10 rolls, the difference between Goofus and Gallant also shows up more reliably when the base rate is high. Keep in mind that we haven't increased the size of the difference between Goofus and Gallant, which is still just a +1 bonus. Instead, by choosing a more reliable base rate, we've reduced the influence of luck somewhat. In either case, however, it still takes at least 10 rolls before Gallant outperforms Goofus in even half of sessions. If you're running a competitive strategy game, you'd probably want a more pronounced difference than that!
In conclusion
To sum it all up, the issue is that players and developers expect luck to "average out", but they may not realize how many rolls are needed for this to happen. It's one thing to do the math and determine which build has the better expected value; it's another to actually observe that benefit in the typical session. It's my opinion that developers should seek to make these bonuses as reliable and noticeable as possible, but your mileage may vary. This may be more important for certain games & groups than others, after all.
My advice is to center your probabilities of success closer to 100% than to 50%. When the base probability is high, combat is less variable, and it doesn't take as long for luck to average out. Bonuses are thus more reliably noticed in the course of play, letting players see and enjoy the payoff of their strategic decisions.
Less variable checks also have the advantage of allowing players to make more involved plans, since individual actions are less likely to fail. And when an action does fail, the failure is more surprising and dramatic than it would be if failure were commonplace. Finally, reduced variability allows the party to feel agentic and decisive, rather than being buffeted about by the whims of outrageous fortune.
Another option is to reduce the variance by dividing outcomes into more fine-grained categories than "success" and "failure", such as "partial success". Some tabletop systems already do this, and even D&D softens the gap between success and failure by letting some powerful abilities do half damage on a miss, again making combat less variable. Obsidian Entertainment's upcoming RPG Pillars of Eternity plans to replace most "misses" with "grazing attacks" that do half damage instead of none, further reducing the role of chance -- a design decision we'll examine in greater detail in next week's post.
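As a quick back-of-the-envelope check on that claim, here's a toy comparison of a classic miss (zero damage) against a half-damage graze, using made-up numbers purely for illustration:

```r
# An attack deals 10 damage and hits 50% of the time; a miss deals either
# 0 (classic) or 5 (half-damage graze). Compare the spread of damage per attack.
n <- 100000
hit <- runif(n) < 0.5
classic <- ifelse(hit, 10, 0)
graze   <- ifelse(hit, 10, 5)
c(sd_classic = sd(classic), sd_graze = sd(graze))   # the graze rule halves the spread
```

Halving the gap between the best and worst outcome halves the spread of damage per attack -- exactly the kind of taming of variance we've been talking about.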