Matchmaking is the most impactful system in any competitive multiplayer game, and the least appreciated when it works. When you get a close, competitive match that goes to the wire, that's not luck. That's a sophisticated algorithm considering your skill, uncertainty, latency, play style, and a dozen other factors to find the best possible opponents. When you get stomped or stomp someone else, that's the algorithm failing. Understanding how matchmaking works is essential for building multiplayer games that feel fair.
Elo: The Foundation
The Elo rating system was invented by physicist Arpad Elo in the 1960s for chess. Its core idea is elegant: every player has a numerical rating, and the outcome of a match adjusts both players' ratings. Win against a higher-rated player, gain more points. Lose against a lower-rated player, lose more points. Over time, ratings converge to reflect true skill.
// Simplified Elo calculation: returns Player A's new rating.
// scoreA is 1 for a win, 0.5 for a draw, 0 for a loss; K caps the swing per match.
function calculateElo(ratingA: number, ratingB: number, scoreA: number, K: number = 32): number {
  // Expected score for A: the win probability implied by the rating gap
  const expectedA = 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
  const newRatingA = ratingA + K * (scoreA - expectedA);
  return Math.round(newRatingA);
}
// Player A (1500) beats Player B (1600)
calculateElo(1500, 1600, 1); // → 1520 (gained 20)
calculateElo(1600, 1500, 0); // → 1580 (lost 20)
Elo works well for 1v1 games with simple outcomes (win, loss, or draw). It's still used in chess, many fighting games, and as the base layer for more sophisticated systems. Its main limitations: it treats skill as a single number, and it has no notion of how confident it is in that number, so a brand-new player and a thousand-game veteran are updated the same way.
Glicko-2: Accounting for Uncertainty
Mark Glickman's Glicko-2 system extends Elo with two additional parameters: rating deviation (RD) and rating volatility. RD represents how uncertain the system is about a player's rating. A new player has high RD (the system doesn't know their skill yet). An active player has low RD (lots of data). Volatility captures how erratically a player's performance varies.
This matters because a player who hasn't played in three months should have a higher RD: the system becomes less sure of their rating over time. When that player returns and wins, their rating changes more dramatically (because the system is uncertain). An active player with low RD has smaller rating changes (because the system is confident).
Glicko-2 is used by many online games and competitive platforms. It produces more accurate ratings than Elo, especially for players with irregular play patterns.
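The RD growth during inactivity can be sketched in a few lines. In Glicko-2, each rating period without games applies φ' = sqrt(φ² + σ²), where φ is the RD on Glicko-2's internal scale and σ is the volatility. The function name inflateRd and the 350 cap are assumptions for illustration (350 is the conventional RD for an unrated player), and the full Glicko-2 update after a played match is considerably more involved than this.

```typescript
// Glicko-2 works on an internal scale; RD on the familiar scale is phi * 173.7178.
const GLICKO2_SCALE = 173.7178;

/**
 * Returns the rating deviation after `idlePeriods` rating periods with no games.
 * Each idle period applies phi' = sqrt(phi^2 + sigma^2), so the closed form is
 * sqrt(phi^2 + idlePeriods * sigma^2).
 * `maxRd` is an assumed cap so a long-absent player never looks less known than a new one.
 */
function inflateRd(rd: number, volatility: number, idlePeriods: number, maxRd: number = 350): number {
  const phi = rd / GLICKO2_SCALE;
  const inflatedPhi = Math.sqrt(phi * phi + idlePeriods * volatility * volatility);
  return Math.min(inflatedPhi * GLICKO2_SCALE, maxRd);
}

// 0.06 is a commonly suggested default volatility.
const activeRd = inflateRd(50, 0.06, 0);     // no idle periods: RD stays at about 50
const returningRd = inflateRd(50, 0.06, 12); // twelve idle periods: RD has grown
```

How long a rating period lasts is a tuning decision. Shorter periods make RD grow faster for absent players.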
TrueSkill: Microsoft's Team Matchmaking
Microsoft's TrueSkill (and its successor, TrueSkill 2) was designed specifically for multiplayer games with teams and with more than two competing sides. Unlike Elo, which handles 1v1, TrueSkill can infer individual skill from team outcomes. If your team of four beats another team of four, TrueSkill updates each player's individual rating based on the team result and on how uncertain it is about each player, so a newcomer's rating moves more than a veteran's.
TrueSkill uses a Bayesian model with two parameters per player: mu (μ), the estimated skill, and sigma (σ), the uncertainty around that estimate. A player's conservative rating is typically μ - 3σ: under the model's Gaussian assumption, the system is about 99.9% confident the player is at least this skilled.
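As a small illustration of the conservative rating, the sketch below computes μ - 3σ for two players. The SkillEstimate type and conservativeRating function are hypothetical names, not part of any TrueSkill library. The newcomer uses TrueSkill's default starting values (μ = 25, σ = 25/3).

```typescript
interface SkillEstimate {
  mu: number;    // estimated skill
  sigma: number; // uncertainty about that estimate
}

// Conservative rating: a skill level the player very likely exceeds.
function conservativeRating(skill: SkillEstimate, k: number = 3): number {
  return skill.mu - k * skill.sigma;
}

const newcomer: SkillEstimate = { mu: 25, sigma: 25 / 3 }; // default starting values
const veteran: SkillEstimate = { mu: 24, sigma: 1 };       // slightly lower mu, far more certain

const newcomerRating = conservativeRating(newcomer); // about 0
const veteranRating = conservativeRating(veteran);   // 21
```

The veteran ranks higher despite a lower μ, which is why leaderboards often display the conservative value: a player has to prove their skill over many matches before climbing.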
TrueSkill 2 added factors like individual performance within a team match (kills, deaths, objectives in a shooter), making it better at separating a carried player from a carry. It's used across Xbox and many Microsoft-published games.
The SBMM Debate: Skill-Based Matchmaking
Skill-Based Matchmaking (SBMM) is the application of rating systems to public, unranked matches. It's one of the most controversial topics in gaming. Players complain about it constantly, but they'd complain even more without it.
The core tension: SBMM creates fair matches, but fair matches mean you can't relax. If you're a skilled player, every match is sweaty. You never get the power fantasy of dominating weaker players. Casual sessions feel competitive. Many players experience this as exhausting.
Without SBMM, matches are random. New players get destroyed by veterans and quit. The skill gap in most games is enormous: the difference between a 50th-percentile and a 95th-percentile player is often 5x in measurable metrics. Random matchmaking exposes this gap constantly.
The modern solution: engagement-optimized matchmaking (EOMM). Rather than optimizing purely for skill balance, EOMM optimizes for player retention. It might give a struggling player an easier match to prevent them from quitting, or challenge a dominant player to keep them engaged. This is sophisticated but controversial, because it means the matchmaker is manipulating the player experience for business metrics.
Practical Matchmaking Architecture
Building a matchmaking system involves more than just a rating algorithm. A production matchmaker considers:
- Skill proximity: Match players of similar skill (primary criterion).
- Latency: Prioritize low-ping matches. A fair match at 200ms ping is still a bad experience.
- Queue time: As queue time increases, widen the skill range. Players would rather play an imperfect match than wait 10 minutes.
- Party balance: A 5-stack (premade group) shouldn't face 5 solo players. Parties coordinate better, creating an unfair advantage.
- Role fill: In games with roles, ensure each team has the necessary composition.
- Recency: Avoid matching the same players repeatedly in a short time.
These factors often conflict. The perfect skill match might have terrible latency. The lowest-latency match might have a huge skill gap. Matchmaking is a multi-variable optimization problem with no perfect solution, only tradeoffs.
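One common way to handle these tradeoffs is to treat some criteria as hard limits and fold the rest into a single cost. The sketch below covers only skill, latency, and queue time for a 1v1 pairing. The Ticket shape, the function names, and every constant are assumptions chosen for illustration, not recommended values.

```typescript
interface Ticket {
  playerId: string;
  rating: number;
  pingMs: number;      // measured latency to the candidate server
  waitSeconds: number; // time spent in queue so far
}

// Illustrative tuning constants (assumptions, not recommendations).
const BASE_SKILL_GAP = 50;    // allowed rating gap at the moment of queueing
const GAP_GROWTH_PER_SEC = 5; // how fast the window widens while waiting
const MAX_SKILL_GAP = 500;    // hard ceiling on the window
const MAX_PING_MS = 120;      // reject pairings above this latency
const PING_WEIGHT = 0.5;      // rating points of cost per millisecond of ping

// The acceptable skill gap widens linearly with queue time, up to a ceiling.
function allowedSkillGap(waitSeconds: number): number {
  return Math.min(BASE_SKILL_GAP + GAP_GROWTH_PER_SEC * waitSeconds, MAX_SKILL_GAP);
}

// Returns a cost (lower is better), or null when the pairing breaks a hard limit.
function pairingCost(a: Ticket, b: Ticket): number | null {
  const gap = Math.abs(a.rating - b.rating);
  const worstPing = Math.max(a.pingMs, b.pingMs);
  // Both players must have waited long enough to accept this gap.
  const allowedGap = Math.min(allowedSkillGap(a.waitSeconds), allowedSkillGap(b.waitSeconds));
  if (gap > allowedGap || worstPing > MAX_PING_MS) return null;
  return gap + PING_WEIGHT * worstPing;
}

const alice: Ticket = { playerId: "alice", rating: 1500, pingMs: 30, waitSeconds: 0 };
const bob: Ticket = { playerId: "bob", rating: 1700, pingMs: 40, waitSeconds: 0 };

const costAtStart = pairingCost(alice, bob); // null: a 200-point gap exceeds the initial window
const costAfterWait = pairingCost(
  { ...alice, waitSeconds: 30 },
  { ...bob, waitSeconds: 30 },
); // 220: the window has widened to 200, and ping adds 0.5 * 40
```

Party balance, role fill, and recency would enter the same way, either as extra rejection rules or as additional weighted terms in the cost.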
Measuring Matchmaking Quality
How do you know if your matchmaker is working? Track these metrics:
- Win rate distribution: Should be centered around 50% for most players. A skewed distribution means the matchmaker is systematically over/under-matching someone.
- Predicted vs actual win rate: If the matchmaker predicts a 60/40 split, did it actually pan out? Calibration accuracy shows whether your rating system reflects reality.
- Queue time distribution: P50 and P95 queue times. Long tails in queue time mean some players are having a terrible experience.
- Post-match sentiment: If players rage-quit, leave immediately, or report the match, the matchmaker may have failed.
- Repeat play rate: After a match, does the player queue again? This is the ultimate quality signal.
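Two of these metrics are simple to compute from match logs. The sketch below shows nearest-rank percentiles for queue times and a bucketed calibration check for predicted versus actual win rate. The MatchRecord shape and all function names are hypothetical, and a real pipeline would likely run this in an analytics store rather than in application code.

```typescript
// Nearest-rank percentile; p is in (0, 100] and values must be non-empty.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((x, y) => x - y);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

interface MatchRecord {
  predictedWinProb: number; // matchmaker's pre-match estimate for team A, 0..1
  teamAWon: boolean;
}

interface CalibrationBucket {
  predicted: number; // mean predicted win probability in this bucket
  actual: number;    // observed win rate in this bucket
  matches: number;
}

// Groups matches by predicted probability and compares prediction with outcome.
function calibration(records: MatchRecord[], bucketCount: number = 10): CalibrationBucket[] {
  const sums = Array.from({ length: bucketCount }, () => ({ predicted: 0, wins: 0, matches: 0 }));
  for (const r of records) {
    const index = Math.min(Math.floor(r.predictedWinProb * bucketCount), bucketCount - 1);
    sums[index].predicted += r.predictedWinProb;
    sums[index].wins += r.teamAWon ? 1 : 0;
    sums[index].matches += 1;
  }
  return sums
    .filter((s) => s.matches > 0)
    .map((s) => ({ predicted: s.predicted / s.matches, actual: s.wins / s.matches, matches: s.matches }));
}

const queueSeconds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 600];
const p50 = percentile(queueSeconds, 50); // 50
const p95 = percentile(queueSeconds, 95); // 600: one long wait dominates the tail
```

In a well-calibrated system, each bucket's actual value sits close to its predicted value. A bucket where the matchmaker predicted 65% but team A won 80% of the time suggests the ratings understate the real skill gap.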
Build dashboards for these metrics from day one. You can't improve what you can't measure, and matchmaking needs constant tuning as your player base grows and skill distributions shift.
