The DDR Elo Rating System

 

What is Elo?

An Elo rating system is a method for calculating the relative skill levels of players in zero-sum games (games with two sides, in which one side's gain is the other's loss). Elo ratings are still used today in many forms. A rating exchange starts from the difference between a player's rating and the opponent's rating. For instance, if a player rated 200 plays a player rated 1000, the gap is large and the point exchange is skewed to account for it. If the 200 player loses, the 1000 player takes only a small number of points. If the 200 player pulls off an upset win, they take a larger number of points from the 1000 player. This discourages players from “farming” weaker opponents, as the points earned per match diminish as the rating gap grows and can eventually round down to nothing.

Chess, the game Elo was originally created for, uses it for FIDE ratings.

Expected score

A player's expected score is their probability of winning plus half their probability of drawing, since a draw counts as half a win and half a loss. Expected score can be calculated using the following formula, where Ra and Rb are the ratings of player A and player B, and Ea is the expected score of player A:

Ea = 1 / (1 + 10^((Rb - Ra) / 400))

The expected score varies with the difference between the players' ratings. Let’s look at a player rated 1200 and another player rated 1300. Plugging in 1200 for Ra and 1300 for Rb, we can see that the expected score, or probability of a win if draws are ignored, for player A is roughly .36, while the expected score for player B is roughly .64. The two players' expected scores always add up to 1.
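The worked example above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard base-10 logistic form with a scale of 400 described in this section; the function name `expected_score` is illustrative and not taken from the actual system.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Assumes the base-10 logistic curve with a scale of 400.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


e_a = expected_score(1200, 1300)
e_b = expected_score(1300, 1200)

print(f"Player A: {e_a:.2f}")        # Player A: 0.36
print(f"Player B: {e_b:.2f}")        # Player B: 0.64
print(f"Total:    {e_a + e_b:.2f}")  # Total:    1.00
```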

The exact form of this formula is mostly arbitrary. Some Elo models use a normal distribution for their skill curve, while many others have switched to a logistic distribution, as it better represents lower rated players. In our case, we use a base-10 logistic distribution for the expected score of players. The 400 is also arbitrary and can be adjusted to better fit the player base. With these constants, every 400 points of rating difference multiplies the higher rated player's odds of winning by 10: a player rated 400 points above their opponent has an expected score of about .91.
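To see how the base and the 400 shape the curve, here is a small sketch that exposes both as parameters. The parameter names `base` and `scale` are assumptions made for illustration; the source only states that base 10 and 400 are used.

```python
def expected_score(rating_a, rating_b, base=10.0, scale=400.0):
    """Logistic expected score with an adjustable base and scale."""
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))


# Expected score of the higher rated player as the rating gap grows.
for diff in (0, 100, 200, 400, 800):
    e = expected_score(1000 + diff, 1000)
    print(f"+{diff:>3} points: expected score {e:.3f}")

# A smaller scale makes the same rating gap count for more.
steep = expected_score(1400, 1000, scale=200.0)
print(f"+400 points with scale=200: {steep:.3f}")
```

With the default constants, a 400-point advantage works out to 10/11, or 10-to-1 odds. Halving the scale to 200 turns the same gap into 100-to-1 odds.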

Score exchange

When a match is complete, there are three possible results: a win, a loss, or a draw. A win is scored as 1, a loss as 0, and a draw as .5. Using the previous example, with Player A rated 1200 and Player B rated 1300, we can calculate how many points are exchanged between the players using the following formula, where R'a is player A's new rating, Ra is player A's initial rating, K is the K-factor, Sa is player A's actual score for the match, and Ea is player A's expected score:

R'a = Ra + K(Sa - Ea)

In the event that Player A wins, we plug in the numbers like so: 1200 + K(1 - .36). Assuming a K value of 32, which is explained in the K-factor section below, Player A’s new rating would be roughly 1220 and Player B’s new rating would be roughly 1280.

In the event that Player B wins, we get 1200 + K(0 - .36). Assuming K=32, Player A’s new rating would be roughly 1188 and Player B’s new rating would be roughly 1312.
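Both outcomes above, plus a draw, can be reproduced with a short sketch. It assumes the standard update rule (new rating = old rating + K times the difference between actual and expected score) and a single K shared by both players; `update_ratings` is a hypothetical helper, and rounding to whole numbers is only done for display.

```python
def expected_score(rating_a, rating_b):
    """Base-10 logistic expected score with a scale of 400."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    With a shared K, whatever A gains is exactly what B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


for label, score_a in (("A wins", 1.0), ("Draw", 0.5), ("B wins", 0.0)):
    new_a, new_b = update_ratings(1200, 1300, score_a)
    print(f"{label}: A = {round(new_a)}, B = {round(new_b)}")
```

Note that a draw still moves points toward the lower rated player, because Player A was expected to score less than .5.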

K-factor

The K-factor is a value chosen by the administrators of the system that sets the maximum possible rating adjustment per game. As the previous example shows, the number of points exchanged is the K-factor times the difference between the actual outcome and the expected outcome. With K=32, this means that the number of points exchanged is between 0 and 32. The K-factor must be chosen carefully: a value that is too large causes major swings in player ratings, while a value that is too small makes the system slow to respond to changes in players' abilities.
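A quick way to get a feel for K is to compare the points exchanged for the same match under different values. The sketch below is illustrative only; `points_exchanged` is a hypothetical helper, and the K values 16, 32 and 64 are simply sample settings.

```python
def expected_score(rating_a, rating_b):
    """Base-10 logistic expected score with a scale of 400."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def points_exchanged(winner_rating, loser_rating, k):
    """Points the winner takes from the loser; always between 0 and k."""
    return k * (1.0 - expected_score(winner_rating, loser_rating))


for k in (16, 32, 64):
    upset = points_exchanged(1200, 1300, k)     # lower rated player wins
    favorite = points_exchanged(1300, 1200, k)  # higher rated player wins
    print(f"K={k}: upset win {upset:.1f}, favorite win {favorite:.1f}")

# A 1000-rated player beating a 200-rated player earns well under
# half a point at K=32, which rounds to nothing.
farmed = points_exchanged(1000, 200, 32)
print(f"1000 beats 200 at K=32: {farmed:.2f} points")
```

Doubling K doubles every exchange, which is why a large K makes ratings swing quickly and a small K makes them slow to react.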

In some systems, the K-factor depends on the player's rating. This keeps higher rated players from causing rating inflation. It also lets lower rated players move up the ladder more quickly, with the pace tapering off as they climb. If two players with different K-factors play each other, each player's rating change is calculated individually using their own K-factor, and the higher rated, lower K-factor player essentially forfeits some of their points to the system compared to what they would receive with the same K-factor as the lower rated player. This encourages higher rated players to keep playing other higher rated players. For example:

K-factor    Rating
32          <2100
24          2100-2400
16          >2400
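The tiered table above could be sketched as follows. The function names are hypothetical, and treating both 2100 and 2400 as part of the middle tier is an assumption based on how the ranges are written. The example shows that when the two players have different K-factors, the points gained by one no longer equal the points lost by the other.

```python
def expected_score(rating_a, rating_b):
    """Base-10 logistic expected score with a scale of 400."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def k_factor(rating):
    """Tiered K-factor from the example table (boundaries assumed inclusive)."""
    if rating < 2100:
        return 32
    if rating <= 2400:
        return 24
    return 16


def update_with_tiers(rating_a, rating_b, score_a):
    """Each player's change is computed separately with their own K."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k_factor(rating_a) * (score_a - e_a)
    new_b = rating_b + k_factor(rating_b) * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# A 2050 player (K=32) against a 2150 player (K=24).
for label, score_a in (("Lower rated wins", 1.0), ("Higher rated wins", 0.0)):
    new_a, new_b = update_with_tiers(2050, 2150, score_a)
    print(f"{label}: 2050 -> {new_a:.1f}, 2150 -> {new_b:.1f}")
```

When the higher rated player wins here, they gain fewer points than the lower rated player loses, which is the forfeiting effect described above.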

For Project Mars War, a flat K-factor of 64 has been implemented as the number of tournament plays is still relatively low.

Player rating is not the only factor that can be used to vary the K-factor. Others could include whether the match is a tournament match, the player's age, how long the player has been playing ranked matches, and so on. These are not implemented here but can be added later.

Life 4

Life 4 ranks have also been incorporated into the system. They are used primarily to determine a player's initial rating, according to the following table:

Life 4 Rank    Initial Rating
Copper         200
Bronze         400
Silver         600
Gold           800
Platinum       1000
Diamond        1200
Cobalt         1400
Pearl          1600
Amethyst       1800
Emerald        2000
Onyx           2200

Ideally, this wouldn't be necessary, as the system is somewhat self-correcting, but given the small number of tournament matches so far, it is a decent way to kickstart the data. Currently, ratings are not adjusted as Life 4 ranks change, but this may be implemented in the future.
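The Life 4 table lends itself to a simple lookup. The sketch below is one possible way to seed a new player; the names `LIFE4_INITIAL_RATINGS` and `initial_rating` are illustrative, and the source does not say how players without a Life 4 rank are handled, so an unknown rank simply raises an error here.

```python
# Initial ratings taken directly from the Life 4 table.
LIFE4_INITIAL_RATINGS = {
    "Copper": 200,
    "Bronze": 400,
    "Silver": 600,
    "Gold": 800,
    "Platinum": 1000,
    "Diamond": 1200,
    "Cobalt": 1400,
    "Pearl": 1600,
    "Amethyst": 1800,
    "Emerald": 2000,
    "Onyx": 2200,
}


def initial_rating(life4_rank: str) -> int:
    """Look up a starting rating, ignoring case and surrounding spaces.

    Raises KeyError for a rank that is not in the table.
    """
    return LIFE4_INITIAL_RATINGS[life4_rank.strip().capitalize()]


print(initial_rating("Gold"))      # 800
print(initial_rating("platinum"))  # 1000
```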