A psychologist's thoughts on how and why we play games

Sunday, October 23, 2016

Outrageous Fortune: 2. The variability of rare drops

Growing up, I played a lot of role-playing games for the Super Nintendo. One trope of late-game design for role-playing games are rare drops -- highly desirable items that have a low probability of appearing after a battle. These items are generally included as a way to let players kill an awful lot of time as they roll the dice again and again trying to get the desired item.

For example, in Final Fantasy 4, there is an item called the "pink tail" that grants you the best armor in the game. It has a 1/64 chance of dropping when you fight a particular monster. In Earthbound, the "Sword of Kings" has a 1/128 chance of dropping when you fight a particular monster. In Pokemon, there are "shiny" versions of normal Pokemon that have a very small chance of appearing (in the newest games, the chance is something like 1/4096).

Watching people try to get these items reveals an interesting misunderstanding about how probability works. Intuitively, it makes sense that if the item has a 1/64 chance of dropping, then by the time you've fought the monster 64 times, you should have a pretty good chance of having the item.

Although it's true that the average number of required pulls is 64, there's still a substantial role of chance. This is a recurring theme in gaming and probability -- yes, we know what the average experience is, but the amount of variability around that can be quite large. (See also my old post on how often a +5% chance to hit actually converts into more hits.)

It turns out that if your desired item has a drop rate of 1/64, then after 64 pulls, there's only a 63.6% chance that you have the item.

To understand the variability around the drop chance, we have to use the negative binomial distribution. The binomial distribution takes in a number of attempts and a probability of success to tell us how many successes to expect. The negative binomial inverts this: it takes in the number of desired successes and the probability of success to tell us how many attempts we'll need to make.

In R, we can model this with the function pnbinom(), which gives us the cumulative density function. This tells us what proportion of players will get a success by X number of pulls.
Let's look at some examples.

Earthbound: The Sword of Kings

The Sword of Kings has a drop rate of 1/128 when the player fights a Starman Super. In a normal game, a player probably fights five or ten Starman Supers. How many players can be expected to find a Sword of Kings in the course of normal play? Some players will decide they want a Sword of Kings and will hang out in the dungeon fighting Starman Supers ("Starmen Super"?) until they find one. How long will most players have to grind to get this item?

By the time the player finds a Sword of Kings, the party is probably so overleveled from killing dozens of Starman Supers that they don't really need the Sword anyway. (From http://starmendotnet.tumblr.com/post/96897409329/sijbrenschenkels-finally-found-sword-of)

We use pnbinom() to get the cumulative probabilities. We'll use the dplyr package too because I like piping and being able to use filter() later for a nice table.

eb <- data.frame(x = 1:800) %>% 
  mutate(p = pnbinom(x, size = 1, prob = 1/128))

filter(eb, x %in% c(1, 10, 50, 100, 128, 200, 256, 400))

with(eb, plot(x, p, type = 'l',
              xlab = "Starman Supers defeated",
              ylab = "Probability of at least one drop",

              main = "Grinding for a Sword of Kings"))

A lucky 8% of all players will get the item in their first ten fights, a number that might be found in the course of normal play. 21% of players still won't have gotten one after two hundred combats, and 4% of players won't have gotten the Sword of Kings even after fighting four hundred Starman Supers!

Final Fantasy IV: The Pink Tail

There's a monster in the last dungeon of Final Fantasy IV. When you encounter monsters in a particular room, there is a 1/64 chance that you will find the "Pink Puff" monster. Every Pink Puff you kill has a 1/64 chance of dropping a pink tail.

ff4 <- data.frame(x = 1:400) %>%
  mutate(p = pnbinom(x, size = 1, prob = 1/64))

filter(ff4, x %in% c(1, 5, 10, 50, 64, 100, 200))

Just to find the Pink Puff monster is a major endeavor. A lucky 3% of players will find a Pink Puff on their first combat, and about 16% of players will run into one in the course of normal play (10 combats). But 20% of players won't have found one even after a hundred combats, and 4% of players won't have found a Pink Puff even after two hundred combats.

Finding a pack of Pink Puffs is a 1/64 chance, and that's just the start of it.

After you find and kill the Pink Puff, it still has to drop the pink tail, which is a 1/64 chance per Pink Puff. So 20% of players won't find a pink tail even after killing a hundred Pink Puffs. Consider next that one finds, on average, one group of Pink Puffs per 64 combats, and Pink Puffs come in groups of five. You could run through more than a thousand fights in order to find 100 Pink Puffs and still not get a pink tail. Ridiculous!

Here's a guy on the IGN forums saying "I've been trying for a week and I haven't gotten one."

Shiny Pokemon

A "shiny" pokemon is a rarer version of any other pokemon. In the newest pokemon game, any wild pokemon has a 1/4096 chance of being shiny.

This is so rare that we'll put things into log scale so that we're calling pnbinom() 100 times rather than 22000 times.

pkmn <- data.frame(x = seq(.1, 10, by = .1)) %>% 
  mutate(p = pnbinom(exp(x), size = 1, prob = 1/4096))
with(pkmn, plot(exp(x), p, typ = 'l'))

filter(pkmn, exp(x) <= 500) %>% tail(1)
filter(pkmn, exp(x) <= 2000) %>% tail(1)
filter(pkmn, exp(x) <= 10000) %>% tail(1)

11% of players will find one shiny pokemon within 500 encounters. About 39% will find one within 2000 encounters. 9% of players will grind through ten thousand encounters and still not find one.

There's a video on youtube of a kid playing three or four gameboys at once until he finds a shiny pokemon after about 26000 encounters. (I think this was in one of the earlier pokemon games where the encounter rate was about 1/8000.) There seems to be a whole genre of YouTube streamers showing off their shiny pokemon that they had gone through thousands of encounters looking for.

World of Warcraft

I don't know anything about World of Warcraft, but googling for its idea of a rare drop turns up a sword with a 1 in 1,500 chance of dropping as a quest reward.

wow <- data.frame(x = 1:5e3) %>% 
  mutate(p = pnbinom(x, size = 1, prob = 1/1500))
with(wow, plot(x, p, type = 'l'))

filter(wow, round(p, 3) == .5)
filter(wow, round(p, 3) == .8)
filter(wow, x == 2500)

Suppose your dedicated fanbase decides to try grinding for this item. Players can do 1000 quests and only half of them will get this sword. Among players running 2500 quests, 19% of players still won't have gotten one.

My Thoughts

In general, I feel like rare drops aren't worth grinding for. There's a lot of chance involved in the negative binomial function, and you could get very lucky or very unlucky.

Sometimes single-player RPGs seem to include them as a somewhat cynical way to keep kids busy when they have too much time to kill. In this case, players seem to know they're going to have to grind a long time, but they may not realize just how long they could grind and still not get it. The draw for these items seems to be more about the spectacle of the rarity than about the actual utility of the item.

In massively multiplayer games, it seems like drops are made so rare that players aren't really expected to grind for them. An item with a 1/1500 drop chance isn't something any one player can hope to get, even if they are deliberately trying to farm the item. Thus, rare items in MMOs are more like Willy Wonka Golden Tickets that a few lucky players get, rather than something that one determined player works to get. One player could try a thousand times and still not get the item, but across a hundred thousand players trying once, a few will get it, and that's enough to keep the in-game economy interesting.

My preference is that chance is something that encourages a shift in strategy rather than something that strictly makes your character better or worse. Maybe the player is guaranteed a nice item, but that nice item can be one of three things, each encouraging a different playstyle.

Still, it's fun sometimes to get something unexpected and get a boost from it. Imagine being one of the ~8% of players who gets a Sword of Kings by chance. Another approach is to provide a later way to ensure getting the desired item. In the roguelike Dungeon Crawl, some desirable items can be found early by luck, but they are also guaranteed to appear later in the game by skill.

Anyway, don't bet on getting that rare drop -- you may find yourself grinding much longer than you'd thought.

Tuesday, October 18, 2016

Publishing the Null Shows Sensitivity to Data

Some months ago, a paper argued for the validity of an unusual measurement of aggression. According to this paper, the number of pins a participant sticks into a paper voodoo doll representing their child seems to be a valid proxy for aggressive parenting.

Normally, I might be suspicious of such a paper because the measurement sounds kind of farfetched. Some of my friends in aggression research scoffed at the research, calling bullshit. But I felt I could trust the research.

Why? The first author has published null results before.

I cannot stress enough how much an author's published null results encourages my trust of a published significant result. With some authors, the moment you read the methods section, you know what the results section will say. When every paper supports the lab's theory, one is left wondering whether there are null results hiding in the wings. One starts to worry that the tested hypotheses are never in danger of falsification.

"Attached are ten stickers you can use to harm the child.
You can stick these onto the child to get out your bad feelings.
You could think of this like sticking pins into a Voodoo doll."

In the case of the voodoo doll paper, the first author is Randy McCarthy. Years ago, I became aware of Dr. McCarthy when he carefully tried to replicate the finding that heat-related word primes influence hostile perceptions (DeWall & Bushman, 2009) and reported null results (McCarthy, 2014).

The voodoo doll paper from McCarthy and colleagues is also a replication attempt of sorts. The measure was first presented by DeWall et al. (2013); McCarthy et al. perform conceptual replications testing the measure's validity. On the whole, the replication and extension is quite enthusiastic about the measure. And that means all the more to me given my hunch that McCarthy started this project by saying "I'm not sure I trust this voodoo doll task..."

Similar commendable frankness can be seen in work from Michael McCullough's lab. In 2012, McCullough et al. reported that religious thoughts influence male's stereotypically-male behavior. Iin 2014, one of McCullough's grad students published that she couldn't replicate the 2012 result (Hone & McCullough, 2014).

I see it as something like a Receiver Operating Characteristic curve. If the classifier has only ever given positive responses, that's probably not a very useful classifier -- you can't tell if there's any specificity to the classifier. A classifier that gives a mixture of positive and negative responses is much more likely to be useful.

A researcher that publishes a null now and again is a researcher I trust to call the results as they are.

[Conflict of interest statement: In the spirit of full disclosure, Randy McCarthy once gave me a small Amazon gift card for delivering a lecture to the Bayesian Interest Group at Northern Illinois University.]