Orbus Gameworks

Aggregation, Part Two

Jul 12, 12:08 PM

So, last time I wrote about aggregation, I told you a little bit about our general aggregation philosophy. Now I’m going to address the age-old question: what the hell do you aggregate, and when?

Less Is More

While the first temptation when collecting gameplay data is to hoard everything, we’ve already shown that at some point, sooner rather than later, you’re going to run out of space. Which is why we’re aggregating in the first place.

But the important thing to note is that while aggregation is compression, it’s lossy compression. That is to say, you’re not putting all your data into a zip file and then uncompressing it later. You’re actively but selectively throwing out data.

The selection process is very important. Let’s illustrate this with an example. Suppose we have a cartoon-style brawl, where our players are attempting to knock each other out with weapons of extreme comedic value. The following table contains the records of an event that’s fired off every time someone hits somebody else with a weapon. It records the time, who did the hitting, who the target was, what weapon was used, whether the weapon’s secondary mode was being used, and whether the hit resulted in a knockout.

Timestamp           Game  Player   Target   Weapon       Secondary?  Knockout?
7/12/2006 10:28:00  1     Tyrone   Osbie    Banjo        0           0
7/12/2006 10:39:31  1     Katje    Roger    Banjo        0           1
7/12/2006 10:49:27  1     Tyrone   Katje    Harmonica    0           1
7/12/2006 10:58:48  1     Roger    Jessica  Harmonica    0           0
7/12/2006 11:11:29  1     Roger    Katje    Banjo        1           1
7/12/2006 11:11:55  1     Jessica  Tyrone   Harmonica    0           1
7/12/2006 11:16:14  2     Katje    Tyrone   Pie          0           0
7/12/2006 11:17:49  2     Pirate   Tyrone   Banana       0           0
7/12/2006 11:25:27  2     Tyrone   Osbie    Banjo        1           0
7/12/2006 11:26:36  2     Pirate   Pirate   Harmonica    0           0
7/12/2006 11:37:33  2     Roger    Katje    Banjo        0           1
7/12/2006 11:48:29  2     Jessica  Pirate   Banana       0           1
7/12/2006 11:55:41  2     Katje    Roger    Toilet Bowl  1           1
7/12/2006 12:07:04  2     Osbie    Roger    Pie          1           0

When discussing aggregation of this data set, there are a few steps you need to go through.

First, figure out who the customers of this data set are. Obviously the game designers care about the data because it’ll help them with balance. Let’s also say you have a leaderboard system, and your community managers care about the data as well. So let’s say those are the two customers.

Next you have to sit down with each type of customer and go over when they need the data, when it becomes deprecated, and what data they would like to stick around for a long time.

You sit down with your lead designer. She says, “We really like this hit data. Probably the most important thing to us is how often each weapon is used, how often the secondary mode is used, and the effectiveness of each weapon.”
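To make the designer's request concrete, here is a rough sketch of a per-weapon summary over the game 1 rows from the table above. The table name `hits`, the column names, and the choice of "knockouts per hit" as the measure of effectiveness are all assumptions for illustration; the designer might well define effectiveness differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema mirroring the columns of the hit-event table.
conn.execute(
    "CREATE TABLE hits (ts TEXT, game INTEGER, player TEXT, target TEXT,"
    " weapon TEXT, secondary INTEGER, knockout INTEGER)"
)
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("7/12/2006 10:28:00", 1, "Tyrone", "Osbie", "Banjo", 0, 0),
        ("7/12/2006 10:39:31", 1, "Katje", "Roger", "Banjo", 0, 1),
        ("7/12/2006 10:49:27", 1, "Tyrone", "Katje", "Harmonica", 0, 1),
        ("7/12/2006 10:58:48", 1, "Roger", "Jessica", "Harmonica", 0, 0),
        ("7/12/2006 11:11:29", 1, "Roger", "Katje", "Banjo", 1, 1),
        ("7/12/2006 11:11:55", 1, "Jessica", "Tyrone", "Harmonica", 0, 1),
    ],
)

# One row per weapon: no player names survive, which is what
# "depersonalized" means here.
weapon_stats = conn.execute(
    """
    SELECT weapon,
           COUNT(*)       AS hits,            -- how often the weapon is used
           SUM(secondary) AS secondary_hits,  -- how often in secondary mode
           AVG(knockout)  AS knockout_rate    -- assumed effectiveness measure
    FROM hits
    GROUP BY weapon
    ORDER BY weapon
    """
).fetchall()

for row in weapon_stats:
    print(row)
```

Notice that the player and target columns never appear in the output, so this view and a per-player view throw away different halves of the same events.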

Your community manager says, “We care about knowing the effectiveness of each player, how they did in each game, who their favorite targets are, and what their favorite weapon is.”

Hmm. So your designer wants depersonalized weapon data, and your community manager wants personal information. Keep in mind, though, that we’re trying to figure out rules for how to compress old data. So you need to follow up with the question: “When is data old enough that you don’t care about it anymore? When is data old enough that you would settle for a summary rather than exact data?”

Your lead designer says that she doesn’t need any stats older than six months. And the community manager says that they need all data for all time, so there’s a continuous leaderboard record.

At this point, you have a solution. Since your designer doesn’t need anything older than six months, you don’t have to preserve weapon effectiveness when you aggregate data past that age. You just need what the community manager wants kept forever: player effectiveness, game performance, favored targets, and favored weapons. So you might end up with something like this:

Game  Player   FavTarget  FavWeapon  Knockouts Received  Knockouts Inflicted
1     Tyrone   Osbie      Banjo      1                   1
1     Katje    Roger      Banjo      2                   1
1     Roger    Jessica    Harmonica  1                   1
1     Jessica  Tyrone     Harmonica  0                   1
2     Tyrone   Osbie      Banjo      0                   0
2     Katje    Tyrone     Pie        1                   1
2     Roger    Katje      Banjo      1                   1
2     Jessica  Pirate     Banana     0                   1
2     Osbie    Roger      Pie        0                   0
2     Pirate   Tyrone     Banana     1                   0

Granted, this doesn’t look like much compression, but that’s because I didn’t include many data points in the original table. Even if there were thousands of events per game, you would still end up with an aggregate table for that game where the number of rows equals the number of players (essentially, game and player are what the GROUP BY clause in our SQL would contain).
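Here is one way the per-game, per-player rollup could be sketched in plain Python. The tuple layout, the function name `aggregate`, and two rules are assumptions, since the post doesn't spell them out: ties for favorite target or weapon go to whichever was seen first, and only players who landed at least one hit get a row.

```python
from collections import Counter, defaultdict

# (game, player, target, weapon, secondary, knockout); timestamps omitted.
HITS = [
    (1, "Tyrone", "Osbie", "Banjo", 0, 0),
    (1, "Katje", "Roger", "Banjo", 0, 1),
    (1, "Tyrone", "Katje", "Harmonica", 0, 1),
    (1, "Roger", "Jessica", "Harmonica", 0, 0),
    (1, "Roger", "Katje", "Banjo", 1, 1),
    (1, "Jessica", "Tyrone", "Harmonica", 0, 1),
    (2, "Katje", "Tyrone", "Pie", 0, 0),
    (2, "Pirate", "Tyrone", "Banana", 0, 0),
    (2, "Tyrone", "Osbie", "Banjo", 1, 0),
    (2, "Pirate", "Pirate", "Harmonica", 0, 0),
    (2, "Roger", "Katje", "Banjo", 0, 1),
    (2, "Jessica", "Pirate", "Banana", 0, 1),
    (2, "Katje", "Roger", "Toilet Bowl", 1, 1),
    (2, "Osbie", "Roger", "Pie", 1, 0),
]


def aggregate(hits):
    """Collapse hit events into one row per (game, attacking player)."""
    targets = defaultdict(Counter)  # (game, player) -> hits per target
    weapons = defaultdict(Counter)  # (game, player) -> hits per weapon
    inflicted = Counter()           # (game, player) -> knockouts dealt
    received = Counter()            # (game, player) -> knockouts taken
    for game, player, target, weapon, _secondary, knockout in hits:
        targets[(game, player)][target] += 1
        weapons[(game, player)][weapon] += 1
        inflicted[(game, player)] += knockout
        received[(game, target)] += knockout

    rows = []
    for key in targets:  # the equivalent of GROUP BY game, player
        game, player = key
        rows.append((
            game,
            player,
            targets[key].most_common(1)[0][0],  # ties: first seen wins
            weapons[key].most_common(1)[0][0],
            received[key],
            inflicted[key],
        ))
    return rows


for row in aggregate(HITS):
    print(row)
```

However many events a game produces, the output has at most one row per player per game. The secondary-mode flag is read and discarded, which is exactly the lossy part.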

The important thing to remember when you’re aggregating metrics is that you have to talk to the customers of the metrics and determine the best compromise to fit all their needs.

— Darius Kazemi

Aggregation, Part One

Jul 9, 04:24 PM

One of the trickiest problems that you can run into when building a metrics system is how you’re going to handle aggregation: that is, the process of taking data that meets certain criteria, and compressing it.

In general, most large-scale metrics systems need aggregation. On a big MMO, metrics databases will grow by hundreds of megabytes a day. Let’s imagine a database that is growing by 200 MB/day. And let’s say we don’t want the database to get much bigger than 50 GB, due to limitations of our hardware. This means we only have about 8 months of data recording we can do before we hit our limit. At this point we have two options: we can either prune the oldest data by deleting it, or we can aggregate the oldest data.
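The eight-month figure is simple arithmetic. As a quick sanity check, assuming decimal units (1 GB = 1000 MB) and an average month of about 30.4 days, both of which are my assumptions rather than anything stated above:

```python
# Figures from the paragraph above.
GROWTH_MB_PER_DAY = 200
LIMIT_GB = 50


def days_until_full(limit_gb: float, growth_mb_per_day: float) -> float:
    """Days of recording before the database reaches its size limit."""
    return limit_gb * 1000 / growth_mb_per_day  # assumes 1 GB = 1000 MB


days = days_until_full(LIMIT_GB, GROWTH_MB_PER_DAY)
months = days / 30.4  # assumed average month length in days
print(f"{days:.0f} days, about {months:.1f} months")
```

Plugging in your own growth rate and ceiling tells you roughly how long you have before pruning or aggregation becomes unavoidable.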

At Orbus, we like to approach aggregation by defining a cutoff point where data is considered old, and then setting up rules to deal with that data. For example, let’s say we have a simple event called EXP Earned. This event is stored in a table that looks like

Date        Player   Level  EXP
2007-01-01  Bob      3      50
2007-01-12  Alice    6      112
2007-01-21  Bob      3      82
2007-01-06  Bob      4      99
2007-01-24  Charlie  6      111
2007-02-05  Charlie  6      120

So we set up a rule that says, “For any EXP Earned events that are more than 6 months old, take the events and divide them into one-month chunks. Throw out the individual players: I just want to know the average EXP earned, by level, along with the aggregated sample size.”

The result is an aggregate table that looks like

Date     Level  AvgEXP  Size
2007-01  3      66.0    2
2007-01  4      99.0    1
2007-01  6      111.5   2
2007-02  6      120.0   1
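As a rough sketch, the rule could be written as a single GROUP BY query. This uses Python's built-in sqlite3 module only to keep the example self-contained. The table name `exp_earned`, the column names, and the cutoff date are hypothetical, and a real system would compute the cutoff as six months before the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the EXP Earned event table.
conn.execute(
    "CREATE TABLE exp_earned (date TEXT, player TEXT, level INTEGER, exp INTEGER)"
)
conn.executemany(
    "INSERT INTO exp_earned VALUES (?, ?, ?, ?)",
    [
        ("2007-01-01", "Bob", 3, 50),
        ("2007-01-12", "Alice", 6, 112),
        ("2007-01-21", "Bob", 3, 82),
        ("2007-01-06", "Bob", 4, 99),
        ("2007-01-24", "Charlie", 6, 111),
        ("2007-02-05", "Charlie", 6, 120),
    ],
)

CUTOFF = "2007-08-01"  # hypothetical: rows dated before this count as old

aggregated = conn.execute(
    """
    SELECT substr(date, 1, 7) AS month,  -- 'YYYY-MM' one-month chunk
           level,
           AVG(exp)  AS avg_exp,
           COUNT(*)  AS size             -- sample size behind each average
    FROM exp_earned
    WHERE date < ?                       -- ISO dates compare correctly as text
    GROUP BY month, level
    ORDER BY month, level
    """,
    (CUTOFF,),
).fetchall()

for row in aggregated:
    print(row)
```

Keeping the sample size next to each average is what lets you combine chunks later: a correct average across several months has to be weighted by Size, and you can't do that from AvgEXP alone.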

The important thing is to design your aggregation so that while you’re essentially throwing away a ton of data, you’re keeping around the good stuff forever. Later this week, I’ll write about how you can figure out what exactly you’re going to aggregate.

— Darius Kazemi
