Data science extracts knowledge and insight from various forms of data. Magic is in its infancy when it comes to utilizing data in a formalized manner. Typically, players subjectively look at deck lists, draft picks and results, and try to piece together their best judgement about formats. Contrast this with Checkers (which has now been solved by computers), Chess and Poker, where Machine Learning has allowed machines to compete with players, and the best players to improve their play by using machines.
It’s rare to get good data for Magic. You might get a draft viewer, a set of results, and some decklists, but it’s hard to control for play skill, matchups and data sets. Enter the MOCS, with four complete drafts, complete with full results over three Swiss rounds, and decklists. Though some may disagree, it’s reasonable to assume the players were of roughly the same play skill.
The Aether Revolt and Kaladesh draft format has been exciting, challenging and contentious. Pros and commentators have spirited arguments about the best colors, correct picks, land counts and draft strategy. Using data science, we can study the drafts and results and look for interesting things. The methodology is to look at the cards played, relate them back to win percentages, and then look for trends. The average win percentage for an average card should be 50%, so anything above 50% is good and anything below 50% is bad, with the amount above/below indicating better/worse.
Previously, all scraping/parsing/large-scale data analysis of limited card win percentages has run into one of two problems.
- Scraping games played neglects the percentage of the time a card is stuck in hand, and thus discounts opportunity cost (e.g. your 7-drop bomb will always overperform here)
- Looking at decklists doesn’t tell you which cards were the overperformers. You need a gigantic sample to cut through the noise, or you might be giving credit to the wrong cards.
This analysis seeks to address these issues by evaluating the deck as a whole, assuming that over three rounds (6 to 9 games) each card plays a roughly equal role in how well the deck does. This should solve the Ulamog-sitting-in-hand problem, or all the times a player lost without casting Inspired Charge. It also captures opportunity cost – for example, Shock shows worse in this analysis because you have to take Shock high, and it may not be a high-impact card, so it can take the rest of your deck down with it. With those assumptions in mind, let’s look at the data.
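As a minimal sketch of this approach (the deck data here is a toy example, not the MOCS data set), each deck's full match record is credited equally to every card in it, and the per-card totals are then aggregated:

```python
from collections import defaultdict

def card_win_rates(decks):
    """Each deck is (cards, match_wins, matches_played).
    Every copy of a card is credited with the deck's full record."""
    wins = defaultdict(int)   # weighted match wins per card
    total = defaultdict(int)  # weighted matches played per card
    for cards, match_wins, matches_played in decks:
        for card in cards:    # duplicates count once per copy
            wins[card] += match_wins
            total[card] += matches_played
    return {card: wins[card] / total[card] for card in total}

# Toy example: two decks over a 3-round Swiss.
decks = [
    (["Shock", "Aether Swooper", "Aether Swooper"], 3, 3),  # a 3-0 deck
    (["Shock", "Riparian Tiger"], 1, 3),                    # a 1-2 deck
]
rates = card_win_rates(decks)
# Shock appears in both decks: (3 + 1) / (3 + 3) ≈ 0.67
```

Note the sharing of credit: a card in a 3-0 deck gets a 100% rate whether or not it was ever drawn, which is exactly the trade-off described above.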
The first question is: which set is stronger? Kaladesh was a high-powered, fast set, and the numbers back this up. While Kaladesh accounted for 33% of the cards played, the average Kaladesh card had a 50.2% win percentage; Aether Revolt made up 67% of the cards played, but with an average win percentage of 49.7%. So players with a higher percentage of Kaladesh cards did better than those who relied more on Aether Revolt.
Drafting the proper colors is key. When the best colors are open you want to be in them, but when the best colors are overdrafted, you can find gold elsewhere. In the first draft, Red and Green were underdrafted and performed well, and in the second draft, even with Red and Green heavily drafted, they still performed well. Across the two drafts, here were the wins per color per deck:
This can be deceptive. Two of the 3-0 “Green” decks, namely Lucas Blohon’s first deck and Simon Nielsen’s second deck, had few green cards. When we look at the win percentage per color, a different story emerges:
| Color | Cards Played | Win % |
|---|---|---|
Looking at the individual colors, the average red card had the best win percentage, followed by blue. Green was heavily played, and as mentioned above, decks with fewer green cards did much better than decks heavy in green (which may have been more one-dimensional). Black didn’t do very well – even with nearly the fewest cards played (109, versus 108 for the least-played color), the average card did poorly. And white was at the bottom again. Interesting that the average artifact is above 50% (note that for this analysis, an artifact like Welder Automaton was counted as a red card). More on card types later. Here is the analysis of the gold cards:
| Color | Cards Played | Win % |
|---|---|---|
But what about color by set – is a color better or worse depending on the set? Let’s see:
| Color | Aether Win % | Kaladesh Win % | Total Win % | Kaladesh Delta |
|---|---|---|---|---|
The four columns show the win percentage of an average card of that color in Aether Revolt, in Kaladesh, in total, and then the delta from Kaladesh (the table is sorted by this). This helps as you decide what you can bank on for pack 3 (aka “the blue pack”). Looking at the primary colors, Red is the only color in Aether Revolt above 50%, clearly the strongest color. Interestingly, White is the second-best color in Aether Revolt at 49%, with Blue, Green and Black close behind. Artifacts are strong in Aether Revolt, far above 50%. But then comes pack 3. Red and Blue (long thought of as the two weakest colors in pure Kaladesh) shoot up 10 full points. Green shoots up 8 points on the back of its strong cards, to over 50%. Black improves, but doesn’t get to 50%. And White gets WORSE, by 15 percentage points. Artifacts get worse by 7 points, at a tough time, just as people are trying to pick up artifacts to complete “artifacts matter” decks. The lesson is that if you’re going to go White, be sure you can get the quality cards in pack 1, and don’t rely on Kaladesh. For the Temur colors you can lean more on the last pack to pick up quality. But be sure to get your quality artifacts in the first pack 🙂 For completeness, here is the same data for the gold cards:
What about strategies – how did they do?
Vehicles were solid – though the common vehicles have just a 50% win ratio, the uncommon and rare vehicles dominated. It’s interesting to see Kaladesh strategies like artifacts-matter and energy continue to do well. Counters-matter (i.e. Black-Green cards like Foundry Hornet) is below water. But most surprisingly, Revolt and Improvise didn’t get there. Revolt was too hard to set up and the payoff wasn’t large enough, and Improvise distorted decks. Interestingly, the white Revolt cards were the best, whereas the blue Improvise cards were the worst.
What about rarity? Generally people perceive broken rares as the key, but this analysis showed it’s the uncommons that won out (players may have overvalued rares that were weaker than they thought):
| Win % | Card % |
|---|---|
Note that a typical pack is about 7% rare, 21% uncommon and 71% common (1 rare, 3 uncommons and 10 commons out of 14 playable cards), so a higher percentage of uncommons and rares played makes sense.
What about casting cost?
| CC | Win % | Avg Per Deck |
|---|---|---|
Interesting – it’s the LSV theory of Magic: two-drops and seven-drops 🙂 But the big takeaway is the strength of the 2- and 4-drops (the early play, and the bread and butter). It’s interesting that five-drops don’t fall off, but six-drops do. And the one-drops just aren’t strong enough. The seven-drops are the over-the-top bombs, but also cards like Gearseeker Serpent. You can see the mana curves – 2s and 3s are key, but having some number of 4+ plays also matters.
Speaking of mana curve, what about land count? We’ve been told 16 lands are the key to the format, but the data suggests otherwise. For this analysis, Renegade Map and Attune with Aether are counted as lands. With that in mind, 16-land decks (of which there were 10) won only 40% of their matches, whereas 17-land decks (22 in total) won 55%. In the first draft, more players braved 16 lands, but in the second draft, they wised up.
What about card types?
Interesting that artifacts and powerful sorceries had a higher win percentage, and enchantments a lower one. Some of this was the weakness of white in general. Instants performed better in the first draft, where having decklists may have caused some overplaying around them; for the second draft people adjusted and didn’t try to overthink and play around everything.
What about the best commons? Given the card counts, it’s hard to come up with a good measurement unless a card was played frequently. Limiting to cards that showed up at least 7 times yields the following results:
| Cards Played | Win % | Card | Notes |
|---|---|---|---|
| 7 | 81 | Audacious Infiltrator | Powerful, evasive 2-drop |
| 9 | 67 | Aether Swooper | Powerful, evasive 2-drop |
| 12 | 64 | Chandra’s Revolution | 4-damage red removal, upside |
| 8 | 63 | Destructive Tampering | Game-winning, modal |
| 10 | 60 | Leave in the Dust | Subtle but robust |
| 8 | 58 | Riparian Tiger | Great with energy sources |
| 8 | 58 | Frontline Rebel | Controversial card |
| 9 | 56 | Highspire Infusion | Energy + trick + finisher |
| 8 | 54 | Ghirapur Osprey | Overperforming flyer |
| 11 | 52 | Aether Poisoner | Versus green, artifacts matter |
| 7 | 52 | Cruel Finality | Scry helps |
| 12 | 50 | Aether Chaser | Highly sought after |
| 12 | 50 | Dawnfeather Eagle | Great finisher, but a white 5-drop |
| 9 | 48 | Druid of the Cowl | Controversial; rated higher than other green commons |
| 7 | 48 | Countless Gears Renegade | Lacked the staying power |
| 11 | 45 | Hinterland Drake | Minor drawback, no synergy |
| 12 | 44 | Prey Upon | Situational, blow-back potential |
| 9 | 44 | Sweatworks Brawler | While red dominated, improvise didn’t |
| 10 | 43 | Shock | May not cut it in a sea of green |
| 10 | 43 | Caught in the Brights | Crewing is a large downside |
| 7 | 43 | Peema Outrider | Double green casting cost hurts |
| 7 | 43 | Scrounging Bandar | Revolt didn’t pan out |
| 9 | 41 | Daring Demolition | Double black casting cost hurts |
| 9 | 41 | Aether Herder | Not far from Peema Outrider |
| 8 | 38 | Lifecraft Cavalry | Revolt didn’t pan out |
| 8 | 38 | Silkweaver Elite | Revolt didn’t pan out |
| 7 | 38 | Shipwreck Moray | Energy payoff didn’t cut it |
| 8 | 33 | Bastion Inventor | Blue improvise didn’t pan out |
There is more data to show and discuss, down to the other commons, etc. But you can see the intrigue of data science – how you can find interesting things in data sets. Imagine mining Magic Online data to compare with these results 🙂
Paul’s Take: Setting aside small-sample-size concerns for now, this is an impressive data-driven methodology. The key is to update prior assumptions where appropriate and to seek more data where your intuition and the data don’t add up. As always, there is signal and noise, but there is so much greenfield opportunity for applying a more scientific approach to Limited – I’m excited.
Please let us know what you think.
In terms of the methodology for figuring out a rough win percentage: if a card only shows up in one deck it’s hard, but for cards that show up in multiple decks you can see a trend. Let me show an example of a card that vastly overperformed – Audacious Infiltrator – according to the stats, the best white common:
| Player | Match Wins | Count in Deck | Weight | Max Weight |
|---|---|---|---|---|
You can see five people played with it. Two went 3-0, two went 2-1 and one went 1-2. But Josh had three copies, so his 3-0 carries triple weight. So across everyone who had the card, the weighted record is 17 wins out of 21 matches, an 81% win percentage. Do we know if they all drew the card on turn 2 and it was the difference? No, but the trend is there. And while Josh had a Copter, he needed other cards to win when he didn’t draw it, or when it was destroyed.
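The weighting can be sketched in a few lines (player names other than Josh aren't in the text, so the records are just listed as tuples; the weight for each deck is copies × matches played):

```python
# (match_wins, copies_in_deck) for the five Audacious Infiltrator decks;
# each player played 3 Swiss rounds, and Josh's 3-0 deck ran three copies.
records = [(3, 3), (3, 1), (2, 1), (2, 1), (1, 1)]

weighted_wins = sum(wins * copies for wins, copies in records)  # 17
weighted_matches = sum(3 * copies for _, copies in records)     # 21
win_pct = round(100 * weighted_wins / weighted_matches)         # 17/21 -> 81
```

Weighting by copies is what triples Josh's 3-0: his deck contributes 9 of the 21 possible match wins.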
Let’s look at a card on the other side – Caught in the Brights. I watched this card underperform as people could simply crew vehicles with the trapped creatures, including Urase, who beat Carvalho by using two Caught creatures to crew a Dreadnought. It also loses value to bounce, blink, Disenchant effects, etc. Here is the data:
| Player | Match Wins | Count in Deck | Weight | Max Weight |
|---|---|---|---|---|
| LEE SHI TIAN | 0 | 1 | 0 | 3 |
Here you see a worse rate – a 43% win percentage, as the card underperformed in the first draft and was only slightly above water in the second. Based on this, we can conclude that those with the Infiltrator did better than those without it (even with some players overlapping).
I was skeptical about the sample size, but the data looks along the lines I would expect. For example, the Aether creatures are rated properly (in my opinion at least):
| Card | Win % | Cards Played |
|---|---|---|
Although some might disagree with the Implements (where the sample size was much smaller):
| Card | Win % | Cards Played |
|---|---|---|
| Implement of Examination | 55 | 3 |
| Implement of Combustion | 47 | 5 |
| Implement of Ferocity | 44 | 6 |
| Implement of Malice | 42 | 4 |
| Implement of Improvement | 17 | 2 |
Most people rank the green implement #1, but it didn’t do that well (though only 6 were played)…