Ranked Elo Overhaul

Re: Ranked Elo Overhaul

Postby SERVAT » Fri Apr 13, 2018 5:11 pm

I myself don’t think ELO is the best way to measure skill in party games. Have you considered implementing other ranking systems, such as trueskill? You could look it up and think it’s bad or hard to implement, but at least give it a shot.
SERVAT
Lookout
Posts: 96
Joined: Fri Apr 29, 2016 5:17 am

Re: Ranked Elo Overhaul

Postby Flake » Fri Apr 13, 2018 5:31 pm

SERVAT wrote:I myself don’t think ELO is the best way to measure skill in party games. Have you considered implementing other ranking systems, such as trueskill? You could look it up and think it’s bad or hard to implement, but at least give it a shot.

I'll have a look at this and I'll see if I can come up with something; thanks for the suggestion :)
Flake
Benefactor
Posts: 987
Joined: Tue Oct 28, 2014 8:34 am
Location: England, UK

Re: Ranked Elo Overhaul

Postby wozearly » Fri Apr 13, 2018 8:48 pm

SERVAT wrote:I myself don’t think ELO is the best way to measure skill in party games. Have you considered implementing other ranking systems, such as trueskill? You could look it up and think it’s bad or hard to implement, but at least give it a shot.


TrueSkill was designed by Microsoft for a specific purpose, and for games with binary win/loss outcomes (e.g. only one team wins, but one team always wins), so it would need serious surgery to work for ToS as a) the number of teams in play can vary and b) it's possible for more than one nominally opposing team to secure a win (e.g. NE + Town).

There are other major complications with transferring its model to ToS:

1) It assumes that each "team" fundamentally has an equal chance of winning from the start, before player skill is taken into account. This is not a valid proposition for ToS (i.e. the goal of balancing is not to see 33% winrates for Town, Mafia and NK respectively).

2) TrueSkill's equivalent of placement games requires the player to be pitched against players who are known, to a high degree of system confidence, to be average. That doesn't really play nicely with ToS' decision to partially reset ELO each season.

3) TrueSkill's matchmaking system is built around variable queueing times, as it heavily prioritises getting players of similar rank into the same game over ensuring a game begins within a certain timeframe. ToS does the reverse, ensuring games begin within (pretty much) a maximum 3-minute period provided there are enough players to launch the game.

4) TrueSkill's calculation is designed for a maximum of 8 players evenly distributed across teams (e.g. 8 free for all, or 8 split into 4 teams of 2). Greater numbers become computationally intensive. ToS has greater numbers, uneven numbers within the teams and, in some cases, a varying number of teams in play (e.g. Legacy Ranked's Any role could alter the number of factions in play).


On the plus side, what TrueSkill was aiming to achieve is broadly the same as what Flake's suggestion is trying to achieve:

1) Attempting to better reflect the handicapped nature of faction winrates (i.e. even the best players are never going to get 50%+ winrates with NK roles)
2) Attempting to introduce a faster placement system to offset ELO's tendency for slow movements up and down the scale
3) Attempting to introduce a level of "confidence" in someone's ranking, potentially feeding into larger changes in ELO for wins and losses while the system tries to work out where best to place you
4) Using confidence as a mechanism for giving a minor but significant uptick to someone who has consistently performed at high levels for longer (i.e. a player at 2,100 ELO after 100 games is less proven at that level than a player at 2,100 ELO after 200 games), as opposed to rewarding this via direct ELO gain, which could open the door to ELO being grindable.

IMO, the one bit which could be helpfully lifted from TrueSkill relates to point 4, which is its "conservative skill estimate". Paraphrased into ToS terms, this means your stated ELO would be the lower limit of your calculated ELO uncertainty range. In principle, it means that it's 95% likely that your skill is at least this ELO.
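For reference, TrueSkill tracks a mean (mu) and an uncertainty (sigma) for each player, and its published description uses mu - 3*sigma as the conservative estimate. A minimal Python sketch of that idea is below. The function name and the veteran's sigma of 1.0 are made up for illustration, and 3 sigma is a stricter bound than the 95% figure used in this post.

```python
def conservative_estimate(mu: float, sigma: float, k: float = 3.0) -> float:
    """Lower bound of a skill belief: the mean minus k standard deviations."""
    return mu - k * sigma


# TrueSkill's default belief for a brand-new player: mu = 25, sigma = 25/3.
new_player = conservative_estimate(25.0, 25.0 / 3.0)

# Same mean after many games, with a hypothetical, much smaller sigma.
veteran = conservative_estimate(25.0, 1.0)

print(round(new_player, 6), veteran)
```

Two players with the same mean end up with different stated ratings purely because the system is more or less sure about them, which is the behaviour being borrowed here.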

What this would do is create a numerical "penalty" behind Flake's significance levels. This would be an alternative way of answering rick's hypothetical question of whether we should give the same ELO to someone with a winrate of 75% over 200 games as we would to someone with a winrate of 75% over 500 games.

In Flake's current model, the answer is "Yes, but the one with more games has more bragging rights due to a higher significance level".
In this approach, the answer is "No, uncertainty penalties apply, so the person with over 500 games would have a higher stated ELO. These penalties become pretty minor at higher numbers of games played, but would typically still be enough to slightly separate the ELO of two players with the same winrate but a few hundred games difference until we're into 1,000+ games played."

For the non-statsy people, the example below may illustrate conservative skill estimation more effectively:

Player A
200 matches played this season.
Calculated ELO: 2,000 ELO (this is the ELO you currently see in ToS)
Uncertainty factor: +/- 6.93% (estimate based on 95% confidence interval for a 200 sample of an infinite population)
Uncertainty penalty: - 6.93% (the lowest limit of the uncertainty range)
Conservative skill estimate: 1,861 ELO ... 2000 * (1-0.0693) ... ie, 93.07% of the calculated ELO score

Player B
500 matches played this season
Calculated ELO: 2,000 ELO
Uncertainty factor: +/- 4.38% (estimate based on 95% confidence interval for a 500 sample of an infinite population)
Uncertainty penalty: - 4.38%
Conservative skill estimate: 1,912 ELO ... ie, 95.62% of calculated ELO score

Example uncertainty thresholds using this model (all numbers rounded for simplicity):

10 games = +/- 31%
20 games = +/- 22%
50 games = +/- 14%
100 games = +/- 10%
200 games = +/- 7%
500 games = +/- 4%
1000 games = +/- 3%
2000 games = +/- 2%
5000 games = +/- 1%
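
The percentages above appear to match the standard margin of error for a proportion, z * sqrt(p * (1 - p) / n), with z = 1.96 and the worst case p = 0.5. That formula is an assumption worked back from the numbers, not something stated in the post. Under that assumption, this Python sketch reproduces the table and the Player A and Player B figures; the function names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def margin_of_error(games: int, p: float = 0.5, z: float = Z_95) -> float:
    """Margin of error for a proportion p observed over `games` samples."""
    return z * math.sqrt(p * (1.0 - p) / games)


def conservative_elo(calculated_elo: float, games: int) -> float:
    """Scale the calculated ELO down by the uncertainty penalty."""
    return calculated_elo * (1.0 - margin_of_error(games))


# Rebuild the threshold table, rounded to whole percentages.
for games in (10, 20, 50, 100, 200, 500, 1000, 2000, 5000):
    print(f"{games:>5} games = +/- {margin_of_error(games):.0%}")

player_a = conservative_elo(2000, 200)  # 200 matches played
player_b = conservative_elo(2000, 500)  # 500 matches played
print(round(player_a), round(player_b))
```

Because the penalty shrinks with the square root of games played, going from 200 to 500 games matters far more than going from 2,000 to 5,000.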

This exact model is probably unduly impactful for ToS, and too computationally demanding to track on a game-by-game basis, but the same principle could be applied with fixed modifiers at confidence thresholds; most likely on a flatter curve, so that the effects are less fierce (e.g. taking lower quartile values rather than the minimum value as a starting point). It would also need some different language to that which I've used, because no-one likes to hear the word "penalty" in the same sentence as "applied to your ELO". I've just used it for clarity in explaining the mechanic.
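
One way to picture "fixed modifiers at confidence thresholds" on a flatter curve is sketched below in Python. Everything specific in it is an assumption: the tier boundaries, the reading of "lower quartile" as the 25th-percentile normal quantile (about 0.674 in place of 1.96), and the function names. The modifiers are computed once per tier, so nothing needs recalculating game by game beyond a lookup.

```python
import bisect
import math

Z_LOWER_QUARTILE = 0.674  # approx. normal quantile for the 25th percentile
THRESHOLDS = (10, 20, 50, 100, 200, 500, 1000)  # hypothetical tier boundaries

# Computed once: each tier's modifier is fixed at its threshold's value.
MODIFIERS = tuple(Z_LOWER_QUARTILE * math.sqrt(0.25 / n) for n in THRESHOLDS)


def fixed_modifier(games: int) -> float:
    """Penalty for the highest threshold reached (first tier if below all)."""
    tier = max(bisect.bisect_right(THRESHOLDS, games) - 1, 0)
    return MODIFIERS[tier]


def stated_elo(calculated_elo: float, games: int) -> int:
    """Calculated ELO with the tier's fixed modifier applied."""
    return round(calculated_elo * (1.0 - fixed_modifier(games)))


print(stated_elo(2000, 200), stated_elo(2000, 499), stated_elo(2000, 500))
```

A player's stated ELO then only steps up when they cross a threshold, and the steps are much smaller than in the 95% version.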

Positives of this are that it does produce an ELO "reward" for more games being played (and your ELO therefore being more accurate), but one that is applied separately from the ELO calculation and becomes less and less impactful over time, which avoids the issue of ELO being grindable.
wozearly
Veteran
Posts: 494
Joined: Wed Dec 28, 2016 6:48 am

Re: Ranked Elo Overhaul

Postby wozearly » Fri Apr 13, 2018 9:09 pm

Jackparrot wrote:One thing I notice is how you can play with people of a wide variety of skill levels. E.g. you are in Silver, yet you can play with really low players (which drags down your ELO) or high players (which boosts your ELO). What I think about this is that the system should make you play within your rank (e.g. Silver) instead of three, to show more skill.

You state that the players who have a lot of games are not at the top, rickms. This is bad, as it means that the most skilled players are not necessarily distributed at the proper ELO.


Just to pitch in on your points, Jackparrot:

1) If you were forced to play within your rank, you would need at least 15 people of your rank currently queuing in order to get a game. Given there are only around 60 people in Master rank, that effectively means once you hit the top levels you're prevented from playing. ToS' matchmaking uses the 3-minute window to try to group people as suitably as possible based on how close their ELO levels are to each other. That you don't play only with people in/just around your own ELO can only be fixed by substantially increasing the player base and waiting for them to distribute across the tiers.

Given the issues Coven Ranked had with getting off the ground due to low volumes, I'd say the current solution is the best we have, even if it's not ideal.

2) I presume you meant to say "players who have won a lot of games are not at the top"? Winrates are a bit tricky, because your average winrate is likely to be impacted by the factions you roll and the quality of your allies and opposition. For argument's sake, say that 50% winrates are average across all factions and all games. If you're a Master player getting 50% winrates against Gold opponents, your ELO should be reducing. If you're a Gold player getting 50% winrates against Masters, you should see your ELO rise. Because of point 1, you can't just use winrates as a reference for where people "should" be - however, as with games played, you'd expect winrates to correlate to some extent with people's ELO tiers.
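
The Master-versus-Gold point in 2) can be checked with the textbook Elo update. This is a Python sketch of standard Elo, not ToS's actual formula (which this thread doesn't spell out). The ratings of 2,100 and 1,700 and the K-factor of 32 are hypothetical, and a 50% winrate is modelled as a score of 0.5 per game.

```python
def expected_score(rating: float, opponent_rating: float) -> float:
    """Standard Elo expectation: probability-like score against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def rating_change(rating: float, opponent_rating: float,
                  actual_score: float, k: float = 32.0) -> float:
    """Elo adjustment: K times (what happened minus what was expected)."""
    return k * (actual_score - expected_score(rating, opponent_rating))


MASTER, GOLD = 2100.0, 1700.0  # hypothetical ratings for the two tiers

master_delta = rating_change(MASTER, GOLD, 0.5)  # Master breaking even vs Gold
gold_delta = rating_change(GOLD, MASTER, 0.5)    # Gold breaking even vs Master

print(round(master_delta, 2), round(gold_delta, 2))
```

The Master is expected to win far more than half the time, so breaking even costs them rating, while the same 50% record earns rating for the Gold player.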
wozearly
Veteran
Posts: 494
Joined: Wed Dec 28, 2016 6:48 am

Re: Ranked Elo Overhaul

Postby Flake » Thu Jun 07, 2018 9:23 pm

wozearly wrote:IMO, the one bit which could be helpfully lifted from TrueSkill relates to point 4, which is its "conservative skill estimate". [...]

Apologies for the late response. I think this is an interesting idea, though I find it hard to see how it would be implemented exactly.
Last edited by Flake on Fri Nov 09, 2018 10:40 am, edited 1 time in total.
Flake
Benefactor
Posts: 987
Joined: Tue Oct 28, 2014 8:34 am
Location: England, UK

Re: Ranked Elo Overhaul

Postby Flake » Fri Nov 09, 2018 10:32 am

Let's try and get some more suggestions flowing for this. Bump.
Flake
Benefactor
Posts: 987
Joined: Tue Oct 28, 2014 8:34 am
Location: England, UK

Re: Ranked Elo Overhaul

Postby StrahmDude » Sun Nov 11, 2018 7:48 pm

Suggestion to change how the average ELO of a team is calculated (I don't think this has been suggested before, based on my reading of all other replies).

Different roles have different effects on the game. A Jailor can easily make or break a game, a good Mayor and TP who at least understand what is going on matter a lot, and the Godfather is an extremely important role for Mafia. For this reason, the TE and OE cannot be normal averages; they must be weighted averages. Each player's elo should be multiplied by a number, for example 0.7-1.5, and then averaged normally. The more influential the role, the higher the number, as the effect of that player's elo is magnified. This evens out because TE and OE are subtracted from one another. It means that a low-elo Jailor on your team will cut down your elo loss. The spread of elos matters, but who holds which elo matters more.

P.S. I do second the idea that if you die N1 (but only if there is no Medium or Ret in the game), the game shouldn't count as either a loss or a win for you.
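
A rough Python sketch of the weighting idea is below. The role weights are invented for illustration (only the 0.7-1.5 range comes from the post), and the sketch divides by the sum of the weights instead of the head count, which is a deliberate swap: dividing by head count would inflate a team's average whenever its weights sum to more than its size.

```python
# Hypothetical role weights inside the suggested 0.7-1.5 range.
ROLE_WEIGHTS = {"Jailor": 1.5, "Godfather": 1.4, "Mayor": 1.3}
DEFAULT_WEIGHT = 1.0  # assumed weight for any role not listed above


def weighted_team_elo(team: list[tuple[str, float]]) -> float:
    """Weighted average elo of a team given as (role, elo) pairs."""
    weights = [ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT) for role, _ in team]
    total = sum(w * elo for w, (_, elo) in zip(weights, team))
    return total / sum(weights)


def plain_team_elo(team: list[tuple[str, float]]) -> float:
    """Unweighted average, for comparison."""
    return sum(elo for _, elo in team) / len(team)


town = [("Jailor", 1400.0), ("Mayor", 2000.0), ("Medium", 2000.0)]
print(round(plain_team_elo(town)), round(weighted_team_elo(town)))
```

With a low-elo Jailor the weighted figure comes out below the plain average, so a loss with that team would be judged against a weaker line-up.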
StrahmDude
Doctor
Posts: 187
Joined: Sun Feb 15, 2015 9:10 pm
