Orbus Gameworks

Metrics and Dynamic Difficulty in Ritual's SiN Episodes (Part 1)

Aug 26, 01:07 PM

This will be at least a two-part series, as there’s a lot of material I have to cover, and I’m a little busy getting ready for PAX this weekend.

I had an email discussion the other day with Ken Harward, formerly Lead Programmer and then Studio Director at what was Ritual Entertainment, in Dallas, TX. The discussion began because I saw a comment in a post on Corvus Elrod’s blog talking about how SiN Episodes had extensive metrics tracking for dynamic difficulty adjustment. I said, “Hey, Ken was a lead on that game, I’ll ask him.” Yes, it is nice to know lots of awesome people in the game industry.

Ken’s response was lengthy, and in reading it, it seemed obvious that the metrics work Ritual did on SiN Episodes was very important, yet unsung. Although they gave a talk on it at GDC, you don’t hear Ritual mentioned in the same breath as Bungie or Valve when it comes to FPS metrics. This probably has to do with SiN Episodes being a low-selling, fairly obscure title, even though it was one of the first original third-party games to be released on Steam.

So I wanted to dedicate some space here on the blog to getting the story of their amazing work out onto the Internet, for all to see. Most of this is me reposting what Ken already wrote in an email, so many thanks to him.

Ken worked on the dynamic difficulty (hereafter abbreviated DD) system for SiN Episodes (hereafter abbreviated SiN) with the help of Aaron Cole. Ken says, “I honestly believe we wrote for Sin Episodes the most sophisticated dynamic measuring system ever, at least for a shipping game.”

The system consisted of four parts: Statistics, Advisors, the DecisionMaker, and Gameplay Variables.

Statistics

The raw metrics collection was pretty simple. The engine for the game was event-driven to begin with, so they created a system where data files would contain the names of messages to look for, along with aggregation instructions. For example, there might be “a ShotgunNPCDamageTakenPerSecond statistic in a data file. The data file would say, hey, every time there is a ShotgunNPC damage event, accumulate it, and make it a ratio against time elapsed.”

The game ended up collecting hundreds of statistics this way, stored locally on the player’s computer.
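To make the data-driven idea concrete, here is a minimal sketch of how such a statistic might be declared and accumulated. This is not Ritual's code: the `STAT_DEFINITIONS` layout, the `ShotgunNPCDamage` event name, the `ShotgunNPCHits` statistic, and the aggregation names are all assumptions made up for illustration. Only the `ShotgunNPCDamageTakenPerSecond` name comes from Ken's description.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a data file: statistic name -> event to watch + aggregation.
STAT_DEFINITIONS = {
    "ShotgunNPCDamageTakenPerSecond": {"event": "ShotgunNPCDamage", "aggregate": "sum_per_second"},
    "ShotgunNPCHits": {"event": "ShotgunNPCDamage", "aggregate": "count"},
}


@dataclass
class Statistic:
    name: str
    event: str
    aggregate: str
    total: float = 0.0
    count: int = 0

    def on_event(self, amount: float) -> None:
        # Accumulate every matching event; the aggregation is applied when read.
        self.total += amount
        self.count += 1

    def value(self, elapsed_seconds: float) -> float:
        if self.aggregate == "count":
            return float(self.count)
        if self.aggregate == "sum_per_second":
            # Ratio of accumulated amount against time elapsed.
            return self.total / elapsed_seconds if elapsed_seconds > 0 else 0.0
        raise ValueError(f"unknown aggregation: {self.aggregate}")


class StatisticsCollector:
    def __init__(self, definitions):
        self.stats = [
            Statistic(name, d["event"], d["aggregate"]) for name, d in definitions.items()
        ]

    def handle(self, event_name: str, amount: float) -> None:
        # Called by the (assumed) event system; events nobody watches are ignored.
        for stat in self.stats:
            if stat.event == event_name:
                stat.on_event(amount)

    def snapshot(self, elapsed_seconds: float) -> dict:
        return {s.name: s.value(elapsed_seconds) for s in self.stats}
```

The point of the design is that adding a new statistic means adding a line of data, not writing new code.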

Advisors

In the DD system, Advisors were agents that monitored statistics. Advisors would be assigned statistics and then given a target range to shoot for. “A PlayerHealth Advisor had some goal range that the player’s health should be in, for example. Each Advisor monitors many statistics (not just one) and as the player’s health gets out of range (for example, they’re not being damaged enough) then the Advisor makes recommendations on how to fix this.” The target ranges were actually chosen by players at the start of the game! There were two sliders: how much challenge the player wanted, and how much help the player wanted. Ritual added the second slider because, in testing, some players were positively allergic to the idea of the game actually helping them.

Again, the Advisor monitored many statistics to make recommendations about its key statistic. For example, if the player’s health was too high, it did not simply advise bullets to do more damage. “That would be lame,” says Harward, “and that’s what usually dynamic difficulty systems do. Instead, the Advisor had many recommendations for the situation. He might recommend stronger AI to come out, more AI to come out, the AI to hide better, the AI to throw more grenades, etc.”

There were lots of Advisors monitoring lots of statistics, with the goal of making recommendations to keep the statistics in line with their target areas.
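Here is a rough sketch of what an Advisor might look like. It is heavily simplified and entirely hypothetical: it watches a single key statistic (the real Advisors monitored many), and the statistic name `PlayerHealthPercent`, the numeric target range, and the `(variable, delta)` recommendation format are assumptions of mine, not details from Ken's email.

```python
from dataclasses import dataclass, field


@dataclass
class Advisor:
    name: str
    key_stat: str
    target_low: float
    target_high: float
    # Recommendations are (gameplay variable name, adjustment) pairs; names are made up.
    when_too_high: list = field(default_factory=list)
    when_too_low: list = field(default_factory=list)

    def unhappiness(self, stats: dict) -> float:
        # Distance of the key statistic from the target range; 0 means content.
        value = stats[self.key_stat]
        if value > self.target_high:
            return value - self.target_high
        if value < self.target_low:
            return self.target_low - value
        return 0.0

    def recommendations(self, stats: dict) -> list:
        # Offer several different fixes for the same problem, not just "more damage".
        value = stats[self.key_stat]
        if value > self.target_high:
            return list(self.when_too_high)
        if value < self.target_low:
            return list(self.when_too_low)
        return []


health_advisor = Advisor(
    name="PlayerHealth",
    key_stat="PlayerHealthPercent",
    target_low=40,
    target_high=70,
    when_too_high=[("EnemyCount", 1), ("EnemyGrenadeRate", 1), ("EnemyCoverUsage", 1)],
    when_too_low=[("EnemyDamageScale", -1), ("EnemyCount", -1)],
)
```

In this sketch the two sliders from the start of the game would simply set `target_low` and `target_high` for each Advisor.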

Harward says of the Advisors that “most dynamic difficulty systems that I’ve researched were really built to make the game harder. Ours was built specifically so that my mother could finish the game. I promised myself that I would build a game that my mother could finish. She’s never played a first-person shooter in her life. But the game would figure that out, and would continually scale things down. Eventually it would get to the point, if she were bad enough, that the enemies would do 0 damage. But, as soon as she started to live a bit too long, then the enemies would start to do a little damage.”

DecisionMaker

The DecisionMaker was a single entity that would poll the Advisors, normally every 2 minutes, about their mood. They’d report how they felt and give recommendations. Each Advisor had many different recommendations, based on how it perceived the game experience. These recommendations could conflict (one Advisor wanting the enemies to hide more, another wanting them to charge in). The DecisionMaker would pick two recommendations out of all those on offer, weighted by the Advisor’s mood and each recommendation’s success rate. The two recommendations would cause two Gameplay Variables to be adjusted, “and the player would react/respond, and the stats might change, and the Advisors might adjust, and presto you have a complete feedback loop. Because there was such a variety of recommendations, even as developers we didn’t know exactly what would happen if we started to play better.”
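The weighted pick could be sketched roughly as follows. Ken didn't spell out how mood and success rate were combined, so multiplying them into a single weight is my assumption, as are the function name and the recommendation strings. The sketch draws without replacement so the same recommendation is never picked twice in one round.

```python
import random


def pick_recommendations(candidates, rng, how_many=2):
    """Pick up to `how_many` distinct recommendations by weighted random draw.

    candidates: list of (recommendation, advisor_mood, success_rate) tuples,
    where a larger mood value means a more agitated Advisor (an assumption).
    """
    # Combine mood and success rate into one weight; drop anything with no weight.
    pool = [(rec, mood * success) for rec, mood, success in candidates if mood * success > 0]
    picked = []
    while pool and len(picked) < how_many:
        weights = [weight for _, weight in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        # Remove the winner so the next draw cannot pick it again.
        picked.append(pool.pop(index)[0])
    return picked


candidates = [
    ("more_enemies", 2.0, 0.5),
    ("hide_better", 1.0, 0.5),
    ("charge_in", 0.0, 0.9),  # this Advisor is content, so its advice carries no weight
    ("more_grenades", 3.0, 0.2),
]
chosen = pick_recommendations(candidates, random.Random())
```

Because the draw is random rather than "take the top two," conflicting recommendations can both get their turn over time, which fits Ken's remark that even the developers couldn't predict exactly what would change.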

Gameplay Variables

The Gameplay Variables stored the current settings representing all the different ways in which the game could change. References to these variables were scattered throughout the code, so as the variables were adjusted up and down, the game changed with them. At any given moment, your game would have its own set of values for these variables, and that set would represent you. In other words, it is highly unlikely that any two players ever had exactly the same values for all the variables at any given point in time. The variables were like a unique DNA that changed, two at a time, every two minutes.
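A store like this can be very small. The sketch below is a guess at the general shape, not the shipped implementation: the variable names, the ranges, and the clamping behavior are all assumptions. The floor of 0 on the hypothetical `EnemyDamageScale` mirrors Ken's comment that enemies could eventually do 0 damage.

```python
class GameplayVariables:
    def __init__(self, ranges):
        # ranges: name -> (minimum, maximum, starting value)
        self._ranges = {name: (lo, hi) for name, (lo, hi, _) in ranges.items()}
        self._values = {name: start for name, (_, _, start) in ranges.items()}

    def get(self, name):
        # Game code would read its tuning values from here instead of hard-coding them.
        return self._values[name]

    def adjust(self, name, delta):
        # Apply a recommendation, keeping the value inside its allowed range.
        lo, hi = self._ranges[name]
        self._values[name] = max(lo, min(hi, self._values[name] + delta))
        return self._values[name]

    def fingerprint(self):
        # The full set of current values: the player's "DNA" at this moment.
        return tuple(sorted(self._values.items()))


variables = GameplayVariables({
    "EnemyDamageScale": (0.0, 2.0, 1.0),
    "EnemyCount": (1, 8, 4),
})
```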

Measuring Success

Many DD systems just push around numbers until they meet certain targets, but often the numbers can be right while the desired overall effect is wrong. “It was important to me to be able to say ‘is this working.’ There’s no point in having the AI throwing grenades if it doesn’t accomplish what you want.” Their system would remember when an Advisor’s recommendation was implemented. “If the Advisor became happier after that recommendation was picked, it was assumed that the recommendation may have helped,” and the recommendation would be weighted a bit more heavily towards being picked again. According to Harward, over time, the system would converge on a point where it really did know what the helpful recommendations were.
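The bookkeeping behind this could look something like the sketch below. Ken only described recommendations being weighted more heavily when the Advisor became happier afterwards, so the decay on failure, the step size, and the function name are assumptions on my part.

```python
def update_success_rate(rate, unhappiness_before, unhappiness_after, step=0.1):
    """Nudge a recommendation's success rate (kept between 0 and 1)."""
    if unhappiness_after < unhappiness_before:
        # The Advisor became happier: assume the recommendation may have helped.
        rate += step * (1.0 - rate)
    else:
        # Assumption: no improvement slowly erodes confidence in the recommendation.
        rate -= step * rate
    return rate


# A recommendation that keeps helping drifts toward 1 without ever exceeding it.
rate = 0.5
for _ in range(5):
    rate = update_success_rate(rate, unhappiness_before=20, unhappiness_after=10)
```

Fed back into the DecisionMaker's weights, a rate like this is what lets helpful recommendations get picked more often over time.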

According to Harward, “enough simple things together generated fairly robust results.”

For the next installment, we’ll be looking into some of the metrics collection that Ritual did, along with some of Ken’s conclusions from that. We’ll also look a bit at the players’ reception of the dynamic difficulty system.

— Darius Kazemi

Comment

  1. Being the individual who originally brought up SiN: Episodes on Corvus’ blog this has been a fascinating read.

    Thanks.

    Justin Keverne · Aug 26, 06:07 PM · #

  2. This sounds a lot like my original concept for my MQP (never implemented, sadly) for a fuzzy logic based game management system – the Decision Maker in my engine would have been called the “Fun Meister” and would use fuzzy linguistic variables (Advisors in this system) to measure how much “fun” a player was having.

    — Mike Caprio · Aug 26, 08:43 PM · #

  3. Neat post. I never picked up this game, but I’m more tempted to now more than ever… I wonder how this system compares to Valve’s much touted system for Left 4 Dead? Although, I guess fair comparisons can’t be made until after Left 4 Dead is finished.

    Max Battcher · Aug 27, 04:16 AM · #
