Football is a simple game. Nothing matters more than goals: either scoring more or conceding fewer. In this tactical analysis, we’ll focus on the former, looking at Europe’s top goal-scorers and analysing their shooting using machine learning and statistics.
Matchday 33 of La Liga 2018/19 has just concluded and we’re halfway through the final quarter of the season in all of the top five leagues in the world. At this moment, it’s worth looking at the top-scorers across the top five leagues and analysing the changes in the goal-scoring scenario. The usual suspect – Lionel Messi – is right there at the top, but Cristiano Ronaldo has not been at his goal-scoring best since his departure to Juventus. In Serie A alone, Fabio Quagliarella and Duván Zapata are ahead of him with 22 and 20 goals respectively, while the Portuguese is on 19 despite having played similar minutes.
A short hop to the north of the Italian peninsula brings us to Germany, where Robert Lewandowski is reigning supreme. Although he stumbled a little at the beginning of the season, he’s hit top gear again, having scored 21 goals. It’s worth noting, though, that his finishing is in clear decline this season: his expected goals differential (-10.39) is his worst since his Borussia Dortmund days.
In France, however, a star is on the rise. Kylian Mbappé has scored a record 30 goals, tearing teams apart with pace and finishing, and single-handedly carrying his team’s offence during the time when Edinson Cavani and Neymar were both out injured.
Although there are some major changes in the goal-scoring landscape, with new players on the rise and a few on the decline, has the art of scoring goals itself changed? Can we look at all players, irrespective of league or age, and unearth a common trend or quality that separates the best from the rest? In this article, we’ve tried to do just that, employing a machine learning algorithm to tell us what makes a true goal-scorer. The topic is vast, as a dozen factors decide whether a shot becomes a goal (with a fair bit of luck added as well), so we’ll look at the most crucial of those factors – shot location.
Note: For the purpose of this article, we’ll focus on only four players – Duván Zapata, Lionel Messi, Kylian Mbappé, and Robert Lewandowski. We’re only considering the players with the highest number of goals in their respective domestic leagues. Where there is a similar number of goals, we’re picking the player with the better minutes-to-goals ratio.
What is K-means?
To look at the favoured shot location of each of the abovementioned players, we’ll use a flat clustering algorithm called K-means. To give a very brief explanation, it’s an unsupervised machine learning algorithm which takes a dataset and tries to sort it into k different clusters, where k is the number of clusters that the scientist wants (say, 2, 10 etc). It starts with k centroids, calculates the Euclidean distance between every data point and every centroid, and assigns each point to its nearest centroid. Each centroid is then moved to the mean position of the points assigned to it, and the two steps repeat until the clusters stop changing. If all this were to be explained in simple terms, what K-means does is simply identify clusters or groups in a given dataset on the basis of a set of factors.
Let’s take a random player – Cristiano Ronaldo, for illustration purposes. Here’s what his shot-map for the 2018/19 Serie A looks like to date.
Here’s a simplified version of it with no adjustment for xG.
At this stage, you might already have a fairly decent idea based on intuition as to where the centroids might lie just by looking at the shot-map. For a random value of k, say k=2, the two clusters would probably be the shots on the left and shots on the right. Let’s check that by running the algorithm for k = 2.
As you can see, it checks out. The shots are clustered into two groups, red and blue. Also, note that the Xs are the centroids, i.e. the mean positions of the two clusters.
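The assign-and-update loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used to produce the plots in this article: the `kmeans` function, its parameters and the six shot coordinates are all hypothetical, and a real analysis would more likely rely on a library implementation.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Plain-Python K-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k points picked at random from the data.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean position of its points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical shot locations (x, y): three on the left, three on the right.
shots = [(-8.0, 12.0), (-7.0, 14.0), (-9.0, 11.0), (8.0, 13.0), (7.5, 12.0), (9.0, 15.0)]
centroids, labels = kmeans(shots, k=2)
print(labels)
print(sorted(centroids))
```

With k = 2 the made-up shots split into a left group and a right group, and each centroid sits at the mean position of its group, which is what the Xs on the plot represent.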
Setting the k
As we checked, our algorithm works out fairly well for two clusters. However, we need to set an optimal value for k, one which tells us the most about the shot clusters.
To do this we use the Elbow method. There are a few other methods and none of them is clearly superior to the others, so it’s mostly a matter of choice. The Elbow method clusters the dataset for a range of k values and calculates the sum of squared distances (SSD) between each point and its centroid for each k. We then plot the SSDs against k, and the curve should look like an arm. The elbow of the arm, the point after which adding more clusters barely reduces the SSD, is the k we’re looking for.
Here, k turns out to be four. Henceforth, we’ll use k = 4 for the rest of the players.
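For readers who want to try the Elbow method themselves, here is a minimal sketch in plain Python. Several things in it are assumptions made for illustration: the shot coordinates are synthetic (four tight groups of five points), the seeding is a deterministic farthest-first rule rather than random initialisation, and `pick_elbow` with its 20% threshold is just one possible heuristic. In the article the elbow is read off the plot by eye.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_seeds(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then keep adding the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds

def kmeans_ssd(points, k, n_iter=100):
    """Run K-means for one k and return the sum of squared distances (SSD)."""
    centroids = farthest_first_seeds(points, k)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pick_elbow(ssd, min_gain=0.2):
    """Heuristic: the first k after which one more cluster cuts the SSD by
    less than min_gain (20% by default). The threshold is arbitrary."""
    ks = sorted(ssd)
    for k, k_next in zip(ks, ks[1:]):
        if ssd[k] - ssd[k_next] < min_gain * ssd[k]:
            return k
    return ks[-1]

# Synthetic data: four tight groups of five hypothetical shot locations each.
centres = [(-12, 6), (12, 6), (-8, 24), (8, 24)]
offsets = [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]
shots = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]

ssd = {k: kmeans_ssd(shots, k) for k in range(1, 7)}
for k in sorted(ssd):
    print(k, round(ssd[k], 2))
print("elbow at k =", pick_elbow(ssd))
```

On this toy data the SSD falls steeply up to k = 4 and then flattens out, so the heuristic settles on four clusters. Real shot data is far noisier, which is why eyeballing the plot is often the more honest approach.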
Putting it all together
Now that we’ve found our optimal k and gained an idea of what to expect, it’s time to apply the algorithm to our set of four players and see what insight we can derive about their shooting from it.
Lionel Messi has been in sensational form this season. He’s at his most complete goal-scoring form in recent years, having scored 33 goals and provided 13 assists in La Liga alone. Although he’s scored a lot of direct free-kicks, we’re not going to analyse those and will focus only on his open-play shots. Here’s what his shot-map, along with the clusters and centroids, looks like this season.
As we can see from the location of the centroids, Messi shoots a lot from zone 14 and the inside-left of the 18-yard box. On the right side, his major shots come from either outside the box or from a spot slightly further forward than its counterpart on the left side. It’s worth noting that these shots on the right side don’t have the same volume as his shots on the left side, which shows a clear preference for the left.
Most of his shots come from the 18-yard box, and this is where he’s scored most of his goals as well. He likes to cut inside on his preferred left foot and either curl it into the far corner or pull it back towards the near corner, completely wrong-footing the goalkeeper.
He also shoots a lot from within the 18-yard box, mostly from the left. These shots are a testament to the Alba-Messi connection, which is an important tactic for Barcelona. A very common movement employed by the current Barcelona team sees Messi cut inside from the right side of the midfield, which triggers a counter-movement by Alba, who looks to run into space behind the opposition full-back. Messi then delivers an inch-perfect lofted pass to Alba, who either squares the ball to Suárez or cuts it back from the byline to Messi himself – who usually continues his diagonal run inside the box.
This season, Valverde has mostly lined Barcelona up in a 4-3-3, which sees Messi start on the right wing. However, he has a free role and can do a range of things, from dropping deeper into midfield to playing in a second striker role beside Suárez. This role has worked out very well for him, as seen from the number of shots he takes from outside of the 6-yard box.
Suárez’s runs usually drag away the defenders and Messi takes advantage of the defensive team’s backward momentum by running laterally into space instead of towards the goal like a traditional number 9. His teammates know that he will be occupying these spaces and their job is to cut the ball back so that Messi can finish those chances.
Zapata has been a revelation for Atalanta this season. Having joined Atalanta from Sampdoria in 2018, the striker has netted 19 non-penalty goals this season. Atalanta have exclusively used a three-man defence and Zapata usually plays as a lone striker up front or paired with Josip Iličić in a 3-4-1-2.
His shot map reflects Atalanta’s style of play. Atalanta don’t use long balls too often and prefer short passes to keep possession instead. They’re 17th in the league in terms of long balls attempted per match. However, once inside the final third, Atalanta depend upon crosses to Zapata to score.
The Colombian striker is dominant in the air and can win headers as well as hold up the ball well for his team-mates. Atalanta are also adept at overloading the opposition box and then using second balls to score from close range. This means that Zapata doesn’t need to enter the 6-yard box much, instead staying in a deeper area on the edge of the box to read the game behind him and then finish those second balls which land close to him.
As is noticeable from the clusters, most of his shots come from the inside-left position and from the centre of the 18-yard box, around the penalty spot. This couldn’t be further from the next striker who, although a classic number 9 like Zapata, plays in a very different system and hence has very different goal-scoring tendencies.
Zapata is also tactically versatile. He can play as the central striker as well as in a two-striker system without any effect on his attacking output. Indeed, he plays better with Iličić on the pitch as they can then be 2v2 against the opposition centre-backs.
Robert Lewandowski has been the sole striker for Bayern Munich for the past five years. He’s a Bundesliga legend, currently ranked fifth amongst the Bundesliga’s all-time highest scorers. Although his shot volume may have stayed the same, with him taking 4.6 shots per 90, his finishing has dropped, as mentioned before. Nonetheless, the lack of competition in the Bundesliga as well as his own technical ability and positional awareness mean that he’s still the highest scorer of this season.
Lewandowski is the king of the six-yard box. He’s the perfect Niko Kovač-type striker and this has resulted in him being one of the first names on the team-sheet this season. Bayern Munich have lined up in a 4-2-3-1 or a 4-1-4-1, both of which have Lewandowski as the sole striker up front. Lewandowski’s increased shots from the six-yard box are a direct result of Bayern Munich’s wing-oriented offence as well as the onus on the winger to create and deliver the final ball.
As is evident from the clusters, he takes a lot of his shots from inside the 6-yard box. Most of these are open-goal tap-ins scored after a sharp cut-across from the winger. His predatory nature also leads him to finish moves inside the box rather than coming deep and helping to build up play.
Apart from that, he’s one of the most complete strikers in Europe right now, as he can shoot with either foot, take long shots from outside of the box with great accuracy and time his movement inside the box to stun opposition defenders. He’s adept at using quick changes of pace and ‘ghosting’ movements between the central defenders to shrug off his markers and convert headers from incoming crosses.
When Edinson Cavani and Neymar were both injured, Thomas Tuchel turned to none other than the French World Cup winner Kylian Mbappé to solve his offensive predicament. And the youngster has not disappointed. He’s scored 30 goals and has seven assists to his name this season. Although he’d begun the season and his career as a winger, he’s been playing as a striker in PSG’s 3-5-2 alongside Ángel Di María.
His current role is that of a hybrid winger-goalscorer, more commonly known as a wide forward. He mostly plays on the left side of the pitch and uses his acceleration and pace to beat defenders and cut inside from the by-line or attack spaces on the transition.
Although the competition in Ligue 1 leaves a lot to be desired, Mbappé has made the most out of it. It’s not uncommon for teams in Ligue 1 to lose the ball in the middle third, either due to PSG’s pressing or simply due to the sheer quality of PSG’s midfield.
Mbappé exploits this by moving out wide on the right side when PSG are out of possession and then curving his run to get played in by a through-ball. His most favoured shooting locations are on the right side just outside of the box as well as inside of the 18-yard box. Most of these shots are a result of offensive transitions or counter-attacks. As there’s a lot of space ahead, Mbappé can shoot from distance and still convert his chances. This is evident in how he’s outperformed his xG by 5.62 this season.
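For anyone unfamiliar with the xG differential quoted above, it is simply goals scored minus expected goals. The short sketch below works backwards from the figures in this article. The function names are hypothetical, and the implied xG is only valid if the +5.62 differential was measured against the same 30 goals; it is not a reported figure.

```python
def xg_differential(goals, expected_goals):
    """Goals scored minus expected goals; positive means over-performance."""
    return goals - expected_goals

def implied_xg(goals, differential):
    """Work backwards from a goal tally and a differential to the implied xG."""
    return goals - differential

# Figures quoted in this article: 30 goals and a differential of +5.62.
# The implied xG is derived here under that assumption, not taken from a data source.
mbappe_xg = implied_xg(30, 5.62)
print(round(mbappe_xg, 2))
print(round(xg_differential(30, mbappe_xg), 2))
```

A negative value, such as the -10.39 quoted for Lewandowski earlier, means a player has scored fewer goals than the quality of his chances would suggest.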
K-means clustering is very useful for grouping large datasets and finding patterns within them. It does have some significant limitations: because it relies purely on Euclidean distance to the centroids, it tends to favour roughly round, similarly sized clusters even when no such clusters exist in the dataset. However, for limited datasets such as shot locations, it’s enough to reveal underlying shooting behaviours, and it’s useful for analysing opponents as well as individual players.