Wednesday, 27 August 2014

How To Find A Betting Edge Using Expected Goal Models

This soccer betting article explains how to build expected goal models. By understanding shot data and how to calculate expected goals, bettors can identify whether a team has been lucky or unlucky over a season, creating an opportunity to find an edge.

The underdeveloped state of soccer analysis is partly down to the lack of accessible data and partly down to the difficulty of describing a fluid sport in which set plays, so common in sports such as baseball, are largely absent.
The gradual release of data has begun to address the former problem, while much of the new analysis has concentrated on the few set plays that do exist in soccer.

Penalty kicks are the obvious example of a rigorously enforced soccer set play, and many will be aware that a top-flight player facing a keeper of similar standard will have a long-term conversion rate of around 78% from a penalty kick.
The logical extension to knowing that a penalty has a goal expectation of 0.78 of a goal is to attempt to apply the same analysis to all shots and headers.

A player may be applauded for a speculative shot from 40 yards. However, it might manage the expectations of the fans and temper the optimism of the player if both knew that such attempts result in a goal only about once in every 100 attempts.

By attaching probabilistic outcomes to key events in a sport, we can begin to develop useful models to better describe previous games and possibly predict the likely outcome of future events.

We’ll use one of the opening games of the 2014/15 season, Arsenal vs. Crystal Palace, to run through the process.

Data Collection
Models unfortunately require data, and although shot locations for various leagues can be found at Statszone, Squawka and Whoscored, the data isn’t available in a spreadsheet-friendly format.

Both Statszone and Squawka can be used to collect accurate shot location coordinates. For example, the perpendicular distance from the goal line combined with the horizontal distance from the centre of the goal can be sourced using freeware such as mobilefish.
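As a rough sketch of what can be done with the two coordinates once they are collected, the snippet below turns them into a straight-line distance to the centre of the goal and the angle the goal mouth presents to the shooter. The function name and the use of the angle as an extra descriptor are assumptions for illustration; the article itself only relies on the two raw coordinates.

```python
import math

GOAL_WIDTH_YARDS = 8.0  # a full-size goal is 8 yards wide


def shot_geometry(yards_from_goal_line, yards_from_goal_centre):
    """Return (distance to the goal centre, angle in degrees between the two posts).

    Assumes yards_from_goal_line is greater than zero.
    """
    distance = math.hypot(yards_from_goal_line, yards_from_goal_centre)
    half_goal = GOAL_WIDTH_YARDS / 2
    # Angle between the lines running from the shot location to each post.
    angle = (math.atan2(yards_from_goal_centre + half_goal, yards_from_goal_line)
             - math.atan2(yards_from_goal_centre - half_goal, yards_from_goal_line))
    return distance, math.degrees(angle)


# A penalty is taken 12 yards out, level with the centre of the goal.
penalty_distance, penalty_angle = shot_geometry(12, 0)
print(round(penalty_distance, 2), round(penalty_angle, 1))
```

Any location read off a shot chart can be passed in the same way, provided both values are in yards.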

Dividing the attacking third into shooting zones and bundling together goal attempts, rather than using individual shooting positions, is also a viable alternative.

Goal Expectation
The closer to the goal a shot is taken the more likely it is to produce a goal. Similarly, shots are generally more potent than headers from the same pitch location.

Dividing the pitch into shooting zones and collecting sufficient data to calculate representative conversion rates for both shots and headers within each zone is one route to calculating goal expectations.

For example, shots taken from inside the central portion of the six yard box are converted at rates of nearly 50%, compared to less than 10% for shots from just outside the penalty area.
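The zone approach amounts to a simple count of goals divided by attempts for each zone and attempt type. The sketch below shows the bookkeeping. The zone names and the handful of records are made up purely to show the mechanics, and a real model would need far more attempts per zone.

```python
from collections import defaultdict

# Hypothetical records, NOT real data: (zone, attempt type, goal scored?)
attempts = [
    ("six_yard_central", "shot", True),
    ("six_yard_central", "shot", False),
    ("six_yard_central", "header", False),
    ("outside_box", "shot", False),
    ("outside_box", "shot", False),
    ("outside_box", "shot", True),
    ("outside_box", "shot", False),
]


def conversion_rates(records):
    """Return {(zone, attempt type): goals / attempts} for every combination seen."""
    tallies = defaultdict(lambda: [0, 0])  # key -> [goals, attempts]
    for zone, kind, scored in records:
        tallies[(zone, kind)][1] += 1
        if scored:
            tallies[(zone, kind)][0] += 1
    return {key: goals / total for key, (goals, total) in tallies.items()}


rates = conversion_rates(attempts)
for key, rate in sorted(rates.items()):
    print(key, round(rate, 2))
```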

A more formal approach uses logistic regression, which is particularly useful when estimating the likelihood of events either occurring or not, such as whether or not shots result in a goal.

The coordinates of each shot location are used as the independent variables and the actual outcomes of historical attempts on goal as the dependent variable. The regression produces an equation that estimates the probability of a goal being scored from an attempt at any pitch location.
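A minimal sketch of such a regression is shown below, written in plain Python so that it needs no extra libraries. Everything about the data is an assumption: the attempts are synthetic, generated from a made-up relationship, and the fit uses a fixed number of gradient steps rather than a convergence check, so the numbers it prints say nothing about real conversion rates. In practice a statistics package would normally do the fitting.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, outcomes, learning_rate=0.5, epochs=500):
    """Fit [intercept, b_1, ..., b_k] by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, outcomes):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            gradient[0] += y - p
            for j, v in enumerate(x, start=1):
                gradient[j] += (y - p) * v
        weights = [w + learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Synthetic attempts, NOT real match data.
rng = random.Random(0)
rows, outcomes = [], []
for _ in range(600):
    depth = rng.uniform(1, 35)     # yards from the goal line
    offset = rng.uniform(0, 20)    # yards from the centre of the goal
    x = (depth / 10, offset / 10)  # rescaled so the gradient steps stay stable
    true_p = sigmoid(-1.5 * x[0] - 0.5 * x[1])  # made-up "true" relationship
    rows.append(x)
    outcomes.append(1 if rng.random() < true_p else 0)

weights = fit_logistic(rows, outcomes)


def goal_probability(depth, offset):
    """Estimated chance of a goal from a location given in yards."""
    return sigmoid(weights[0] + weights[1] * depth / 10 + weights[2] * offset / 10)


print(round(goal_probability(4, 0), 3), round(goal_probability(25, 0), 3))
```

Swapping the outcome column from goals to, say, blocked attempts would reuse the same fitting function unchanged.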

This method can be further extended to calculate the probability of attempts being either blocked, off target or on target, simply by changing the dependent variable from goals to, for example, blocked shots.

The limiting factor in either approach is the need for relatively large amounts of initial detailed data to create a robust model.

Match Simulation
Knowledge of the goal expectation for individual attempts enables matches to be replayed by simulating the possible outcomes of each attempt taken or faced during an actual game.

For example, the generic success rate for penalties is 78%. By generating a random number between 0 and 1, the spot kick can be considered successful if the random number falls below 0.78 and unsuccessful if it falls between 0.78 and 1.

The larger the value of the goal expectation, the greater the chance that the random number will fall within the successful range and a virtual goal will be scored. This can be extended to include every actual attempt in a match, with the individual successes summed together to simulate a goal total for each team, resulting in a virtual score line.
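The random-number test can be sketched in a few lines. The function names are illustrative only, and each attempt is treated as independent of the others, which is an assumption of this kind of simulation.

```python
import random


def attempt_is_goal(goal_expectation, rng):
    """A virtual goal is scored when the random draw falls below the goal expectation."""
    return rng.random() < goal_expectation


def simulate_team_goals(goal_expectations, rng):
    """Sum the virtual goals from every attempt a team made in one replay of a match."""
    return sum(attempt_is_goal(p, rng) for p in goal_expectations)


rng = random.Random(2014)

# Over many virtual penalties, the share scored settles close to 0.78.
n_penalties = 100_000
scored = sum(attempt_is_goal(0.78, rng) for _ in range(n_penalties))
print(round(scored / n_penalties, 3))

# One replay of a side with three attempts (hypothetical goal expectations).
print(simulate_team_goals([0.30, 0.10, 0.05], rng))
```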

Repeating this process, typically for 10,000 “games”, and counting the score lines that produce a home win, a draw or an away win adds context to the actual result. It helps answer the question: did the actual winners create and prevent sufficient shots, once location is accounted for, to fully deserve their win?
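A sketch of the repeated replay follows. The two lists of goal expectations are hypothetical placeholders rather than values from any real match, and the fixed seed is there only to make the run repeatable.

```python
import random
from collections import Counter


def simulate_matches(home_expectations, away_expectations, n_games=10_000, seed=1):
    """Replay a match n_games times and count how often each score line occurs."""
    rng = random.Random(seed)
    score_lines = Counter()
    for _ in range(n_games):
        home_goals = sum(rng.random() < p for p in home_expectations)
        away_goals = sum(rng.random() < p for p in away_expectations)
        score_lines[(home_goals, away_goals)] += 1
    return score_lines


def outcome_shares(score_lines):
    """Collapse score-line counts into home win, draw and away win shares."""
    total = sum(score_lines.values())
    home = sum(n for (h, a), n in score_lines.items() if h > a)
    draw = sum(n for (h, a), n in score_lines.items() if h == a)
    return {"home": home / total, "draw": draw / total,
            "away": (total - home - draw) / total}


# Hypothetical goal expectations for each side's attempts.
home_expectations = [0.30, 0.10, 0.05]
away_expectations = [0.20, 0.05]

score_lines = simulate_matches(home_expectations, away_expectations)
shares = outcome_shares(score_lines)
print(score_lines.most_common(3))
print(shares)
```

Keeping the full score-line counts, rather than only the three outcome shares, makes it easy to list the most common results as well.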

Example: Arsenal 2 Crystal Palace 1
Arsenal opened their season at home to Crystal Palace, out shooting Palace 14-4 and winning the game 2-1 with a last minute Aaron Ramsey goal. Palace’s four efforts are shown below on the screen grab from Statszone.



Their best opportunity came from a 34th minute corner kick which found the head of Brede Hangeland. His header (number 2 on the screen grab) originated inside the six yard box, about 4 yards from goal and close to the near post, a further 4 yards from the centre of the goal.

Based on a logistic regression analysis of historical headers, this type of chance is typically converted nearly 14% of the time, giving it a goal expectation of 0.14.

Palace’s further three efforts, each a shot, were less dangerous. Jason Puncheon had two shots with a goal expectation of 0.10 (shot 4) and 0.01 (shot 3), respectively. Marouane Chamakh’s long-range effort (shot 1) was the least likely to score, leading to a goal on 0.3% of occasions for a goal expectation of 0.003.

Added together, Palace’s four attempts had a cumulative goal expectation of around 0.25 of a goal. As a general guide, chances of this quality would produce a goal roughly once every four matches.
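The arithmetic behind that total can be checked directly from the four goal expectations quoted above. The dictionary labels are just shorthand for this sketch.

```python
# Goal expectations quoted in the text for Palace's four attempts.
palace_goal_expectations = {
    "Hangeland header (2)": 0.14,
    "Puncheon shot (4)": 0.10,
    "Puncheon shot (3)": 0.01,
    "Chamakh shot (1)": 0.003,
}

total_expectation = sum(palace_goal_expectations.values())
matches_per_goal = 1 / total_expectation

print(round(total_expectation, 3))  # 0.253
print(round(matches_per_goal, 2))   # 3.95
```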

These probabilities can be used to simulate the most likely outcomes from the four Crystal Palace goal attempts.
The regression-derived conversion rate for each of Palace’s four chances is shown in column C. A random number between 0 and 1 is generated in column D using the =RAND() spreadsheet function, and if that value falls below the value in column C, a goal is considered scored.

In this single iteration, Hangeland’s header was the only attempt to produce a goal.



This is repeated for both Palace’s four attempts and Arsenal’s 14 to generate 10,000 match results based around the quality and quantity of the chances created on the day by each side.
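For a small number of attempts, the simulated frequencies can be cross-checked against an exact calculation. The sketch below is an alternative to simulation, not the spreadsheet method described here: it builds the probability of each goal count for Palace from the four goal expectations quoted earlier, treating the attempts as independent. Arsenal’s 14 individual values are not given in the article, so only Palace’s side is shown.

```python
def goal_count_distribution(goal_expectations):
    """Return [P(0 goals), P(1 goal), ...], treating attempts as independent."""
    distribution = [1.0]  # before any attempt, zero goals is certain
    for p in goal_expectations:
        updated = [0.0] * (len(distribution) + 1)
        for goals, probability in enumerate(distribution):
            updated[goals] += probability * (1 - p)  # this attempt is missed
            updated[goals + 1] += probability * p    # this attempt is scored
        distribution = updated
    return distribution


palace = [0.14, 0.10, 0.01, 0.003]
palace_distribution = goal_count_distribution(palace)

for goals, probability in enumerate(palace_distribution):
    print(goals, round(probability, 4))
```

On these figures Palace fail to score in roughly 76% of replays, so a long run of simulations should land close to that share.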

Most Common Score Lines in 10,000 Simulations of Arsenal vs. Crystal Palace

Arsenal Score   Palace Score   Frequency in 10,000 Trials
1               0              28%
2               0              23%
0               0              13%
3               0              9%
1               1              8%
2               1              7%
0               1              4%

The table above shows the seven most common score lines from these shot based simulations. Palace was most likely to remain scoreless from their four attempts, while Arsenal, despite superior shot numbers, also shot frequently from distance or from wide positions.

To see Arsenal’s shot chart visit either Statszone or Squawka.

In total, 73% of the 10,000 simulations resulted in a score that gave Arsenal a win, 21% were draws and just 6% wins for Crystal Palace. So an Arsenal win was consistent with the frequency and quality of attempts made by each team.

Conclusions
This approach looks to extend the use of shot data to try and identify sides that may have been either fortunate or unfortunate in their match day results.

A side can only create or prevent chances consistent with their ability, but they are less in control of when those chances turn into goals. They cannot guarantee that goals will arrive when they are most needed.

Therefore a team may create excellent shooting chances, but that may not be reflected in similarly excellent results, especially over the short term. Whilst an instant change of luck from bad to good shouldn’t be expected, extreme outcomes are often followed by less extreme ones in the future.

In short, this is a tool for identifying teams that are overrated but lucky, or underrated and unlucky, and that may experience a more usual connection between shots and results in the future. That is valuable for any bettor looking to find an edge over the market.



Friday, 1 August 2014

Using The Possession Stat In Soccer To Make Betting Profits

Possession has become one of the most contentious of the newly available range of soccer statistics that are regularly quoted in live match analysis. It is generally assumed that enjoying more possession is a positive, but does this assumption hold water, and to what degree should bettors use possession stats in handicapping a team’s ability to win games?

Taking Possession of the facts
The success of sides that take a possession based approach – such as Barcelona at club level and Spain at international tournaments – coupled with the aesthetically pleasing brand of passing football practiced by these teams, has made domination of possession an aspiration for many teams.

In addition, Liverpool’s Brendan Rodgers’ often repeated quote that “if you can dominate the game with the ball, you have a 79% chance of winning” is also taken as an endorsement of possession based football.

However, if we take a more measured view of recent competitions, the success enjoyed by a short passing, possession orientated approach is much more mixed than advocates of this approach imply.

Holders Spain bowed out of the 2014 World Cup at the group stage, despite having over 60% of the possession in defeats to first the Netherlands and then Chile. Their success at Euro 2012 also owed much to a semi-final penalty shootout win against Portugal, a side that had enjoyed only a minority share of possession throughout the tournament.

Barcelona conquered Europe in 2008/09 and 2010/11 with tournament possession figures in the mid to high 60% range and pass numbers averaging around 700 per game. But in between came a win for Inter Milan in 2009/10, with 45% overall possession in the competition, barely a third in the final and just 400 passes per game, and Chelsea followed in 2011/12 with 47% overall possession.

Bucking the Possession trends
Numerous high profile head to head meetings have fallen to the side that shuns possession, most notably Chelsea’s aggregate win over Barcelona in 2011/12 gained with just 20% of the ball and Real Madrid’s 5-0 aggregate trouncing of a Pep Guardiola led Bayern Munich at the UCL semi-final stage this year with less than 30% overall possession.

Each of these examples is cherry picked and small in sample size, but they do appear to contradict Rodgers’ precise assumption, and the assertion that all a side needs to do to stand an excellent chance to win a match is to keep the ball for longer than their opponents.

Possession as a composite of other key stats
The major flaw with raw possession as a useful indicator is that it is a secondary statistic that has been derived from a composite of many other primary stats that are much more fundamental to a team’s success.
Teams get possession by being good at making tackles or interceptions, they keep it by passing well and they use their possession to create chances. They score goals by converting these chances and they win games by scoring more goals than they allow.

So we have a chain with match result at one end and time of possession at the other, but in between are statistics that are ability dependent, and often score line dependent, which are better indicators of match success. Possession in itself is not a strong indication of how well a side has performed. Of more importance is what a side is able to achieve with the possession it has.

Highlighting shortcomings with Possession
Swansea, in their early Premier League seasons, typified the confusion that simple possession figures can cause. The Welsh side were unconvincingly compared to Barcelona, primarily because their possession figures rivalled those of the Catalan side.

However, Barcelona possessed the ball high up the field in their opponents’ final third and combined quick, intricate passing sequences, known as tiki-taka, to create chances for world class players. Swansea’s use of possession was much more conservative. It was rooted firmly in their own half of the pitch, often comprised square or backward passes, and was designed as a defensive tactic centred around ball retention to prevent the opposition from scoring.

In the same 2011/12 season that Swansea were using the third best possession statistic in the EPL to create 472 chances, Barcelona were topping La Liga’s possession charts and creating 626 opportunities. The four sides immediately surrounding Swansea in the EPL possession table used similar levels of ball retention to create an average of 681 chances over the season, around 44% more. At the end of the season, Swansea finished a worthy, but mediocre 11th.
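The size of those gaps is easy to verify from the three chance totals quoted above. The variable names in this sketch are illustrative only.

```python
# Chances created in 2011/12, as quoted in the text.
swansea_chances = 472
epl_neighbours_average = 681  # average of the four sides around Swansea for possession
barcelona_chances = 626

neighbours_uplift = epl_neighbours_average / swansea_chances - 1
barcelona_uplift = barcelona_chances / swansea_chances - 1

print(f"EPL neighbours: {neighbours_uplift:.0%} more chances than Swansea")  # 44%
print(f"Barcelona: {barcelona_uplift:.0%} more chances than Swansea")        # 33%
```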

Swansea’s choice to use possession in a defensive, rather than attacking, fashion was a result of their limitations as a recently promoted team. They protected the starting position of a game in much the same way as more talented teams sometimes use possession to protect a lead.

Stoke City had travelled a similar route in 2008/09, with an inverted approach, this time by conceding possession and relying on chance creation from set pieces and long throws to generate their goal scoring opportunities, whilst making scoring difficult for their opponents with a packed defensive third.

The Mourinho approach to Possession
More illustrious sides, notably those managed by Jose Mourinho, also appear willing to sacrifice the ball in return for defensive solidity and the chance of scoring from counter attacks or set plays, especially when faced with technically superior opponents.

A game is sometimes won by an opponent’s mistake. Errors, in Mourinho’s view, are more likely to be made by the team in possession of the ball. He is therefore happy to give opponents the opportunity to present his teams with a gift, while at the same time attempting to ensure that the game remains stalemated.

Mourinho’s Chelsea demonstrated this tactical aspect of possession when his side went to Anfield as underdogs in 2014, enjoyed just 27% of possession, but defeated Liverpool 2-0, with goals from a Gerrard mistake and a swift counter attack, effectively killing the Reds’ title ambitions.

Although happy to possess the ball and dictate the course of the game against inferior sides, Mourinho is prepared to be adaptable when faced with a sterner challenge. The likely possession statistics for the Liverpool-Chelsea meeting were virtually determined before a ball was kicked, and therefore, should have been anticipated by bettors.

Pragmatism the Pulis way
Playing without the ball can often produce unattractive football and negative media reaction, but in many cases sides revert to these tactics to give themselves the best chance of achieving a favourable result.

Stoke City under Tony Pulis were technically inferior to many of their Premier League opponents, so an overtly defensive stance, both at home and away, coupled with long balls into the opponents’ half followed by brief bouts of pressing, was a pragmatic approach to maximize their chance of gaining points. Virtually every one of their 56 victories achieved under Pulis came with less than 50% of the possession.


Possession is only of minor interest. The real value lies in which primary events took place during those bouts of possession or non-possession. How many chances did a side create or concede and, just as importantly, were the chances of high quality or not? Since these types of statistics can be easily gathered and their correlation to goals and ultimately match result is strong, the need to quote possession diminishes and the temptation to try to draw conclusions merely from possession should be avoided.

True measures of team ability
The true indication of a team’s ability is how efficiently they use their possession in creating chances and how well they can stifle their opponent’s use of the ball when they have it. A side may out-possess an opponent and even create more chances, but those scoring opportunities may be of poorer quality, due to poor shot location and potentially higher levels of defensive pressure.

Chelsea’s counter attack goal at Anfield and Ronaldo’s strike on the break for Real Madrid in the UCL second leg in Munich both easily beat an exposed goalkeeper. These goals from high quality counter attacks aren’t typical of all such chances, but using shot location data, it appears that chances created on the counter attack are converted at significantly higher rates than similar open play shots, which often involve slowly crafted build ups and allow the defence to become organised.

Therefore, it is wiser to attempt to draw conclusions from fundamental statistics such as shots, placed in their proper context, rather than assuming that dominant possession automatically indicates a superior side and any loss was unfortunate.

Low possession teams often have alternative ways of scoring. For example, in La Liga, the home of possession football, lower possession teams tend to score proportionally more of their goals from set pieces and counter attacks than do sides that have followed the Barcelona model.

This year’s champions, Atletico Madrid, scored a third of their goals from set pieces and counters, whilst enjoying 49% possession. That figure fell to 45% in the UEFA Champions League, where over half of their goals were either from counter attacks or set pieces. In the final they came within three minutes of defeating their cross city rivals, Real Madrid.

Possession figures tell you little about how effective a side may be, although possession based teams may also be good teams, creating the illusion of a widespread correlation. They merely help to inform you about how a side tactically chose, or was compelled, to play the match.