Saturday, 7 September 2013

Barry George case: new insights on the evidence


Our new paper* "When ‘neutral’ evidence still has probative value: implications from the Barry George Case" (published in the journal Science and Justice) casts doubt on the reasoning in the 2007 Appeal Court judgement that led to the quashing of Barry George's conviction for the fatal shooting of TV celebrity Jill Dando.

The paper examines the transcript of the Appeal in the context of new probabilistic research about the probative value of evidence. George's successful appeal was based primarily on the argument that the prosecution's evidence about a particle of firearm discharge residue (FDR) discovered in George's coat pocket was presented in a way that may have misled the jury. Specifically, the jury in the original trial had heard that the FDR evidence was very unlikely to have been found if Barry George had not fired the gun that killed Jill Dando. Most people would interpret such an assertion as strong evidence in favour of the prosecution case. However, after the trial the same forensic expert concluded that the FDR evidence was just as unlikely to have been discovered if Barry George had fired the gun. In such a scenario the evidence is considered to be ‘neutral’ - favouring neither the prosecution nor the defence. Hence, the appeal court considered the verdict unsafe and the conviction was quashed. Following the appeal ruling, the FDR evidence was excluded from George's retrial and he was acquitted. However, our paper shows that the FDR evidence may not have been neutral after all.

Formally, the probative value of evidence is captured by a simple probability formula called the likelihood ratio (LR). The LR is the probability of finding the evidence if the prosecution hypothesis is true divided by the probability of finding the evidence if the defence hypothesis is true. Intuitively, if the LR is greater than one then the evidence supports the prosecution hypothesis; if the LR is less than one it supports the defence hypothesis; and if the LR is equal to one (as in the case of the FDR evidence here) then the evidence favours neither and so is 'neutral'. Accordingly, the LR is a commonly recommended method for forensic scientists to use in order to explain the probative value of evidence. However, the new research in the paper shows that the prosecution and defence hypotheses have to be formulated in a certain way in order for the LR to 'work' as expected. Otherwise it is possible, for example, to have evidence whose LR is equal to one but which still has significant probative value. Our review of the appeal transcript shows that the relevant prosecution and defence hypotheses were not properly formulated and, if one were to follow the arguments recorded in the Appeal judgement verbatim, then, contrary to the Appeal's conclusion, the FDR evidence may not have been neutral but may actually have supported the prosecution**.
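The LR calculation described above can be sketched in a few lines (a purely illustrative example; the probabilities used are hypothetical, not figures from the case):

```python
# Illustrative sketch of the likelihood ratio (LR) for evidence E under
# a prosecution hypothesis Hp and a defence hypothesis Hd.

def likelihood_ratio(p_e_given_hp: float, p_e_given_hd: float) -> float:
    """LR = P(E | Hp) / P(E | Hd)."""
    return p_e_given_hp / p_e_given_hd

# In the original trial the single FDR particle was presented as very
# unlikely if George had NOT fired the gun; by the time of the appeal the
# expert judged it equally unlikely either way, giving LR = 1 ('neutral').
print(likelihood_ratio(0.01, 0.01))   # 1.0 -> 'neutral'
print(likelihood_ratio(0.01, 0.001))  # 10.0 -> supports the prosecution
```

The paper's point is that an LR of one only guarantees neutrality when the two hypotheses are formulated as mutually exclusive and exhaustive alternatives.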

*Full details: Fenton, N. E., D. Berger, D. Lagnado, M. Neil and A. Hsu, (2013). "When ‘neutral’ evidence still has probative value (with implications from the Barry George Case)", Science and Justice, http://dx.doi.org/10.1016/j.scijus.2013.07.002 published online 19 August 2013. For those who do not have full access to the journal, a pre-publication draft of the article can be found here.

** Although the FDR evidence may have been probative after all, we are not in a position to comment on the overall case against Barry George, which others have argued was not particularly strong. Also, it could be argued that even though the FDR evidence was not 'neutral' as assumed in the Appeal, its probative value may not have been as strongly favourable to the prosecution as implied in the original trial; this may have been sufficient in itself to cast doubt on the safety of the conviction.

Wednesday, 7 August 2013

The problem with predicting football results - you cannot rely on the data


Bloomberg Sports have published their predictions for the forthcoming Premiership season (****see update below for actual results) in the form of the predicted end of season table. Here are some key snippets from their press release:
The table indicates that this season will be a three horse race between Chelsea, Manchester City and Manchester United .... The Bloomberg Sports forecast expects Arsenal to claim the final Champions League place ahead of North London rivals Tottenham Hotspur.... At the bottom of the table, all three newly promoted teams are expected to face the drop...
There is just one problem with this set of 'predictions'. The final table - with very minor adjustments - essentially replicates last season's final positions.  The top seven remain the same (with the only positional changes being Chelsea and Man Utd switch positions 1 and 3, and Liverpool and Everton switch positions 6 and 7). And the bottom three are the three promoted teams so they also 'retain' their positions.

Bloomberg say they are using "mathematically-derived predictions" based on "vast amounts of objective data". But herein lies the problem. As we argue in our book, relying on data alone is the classical statistical approach to this kind of prediction. And classical statistics is great at 'predicting the past'. The problem is that we actually want to predict the future, not the past!

My PhD student Anthony Constantinou and I have been applying Bayesian networks and related methods to the problem of football prediction for a number of years. The great thing about Bayesian networks is that they enable you to combine the standard statistical data (most obviously historical and recent match results) with subjective factors. And it is the incorporation of the subjective (expert) factors that is the key to improved prediction that 'classical' statisticians just do not seem to get.
 
This combination of data and expert judgement has enabled us to get more accurate predictions than any other published system and has even enabled us to 'beat the bookies' consistently (based on a simple betting strategy) despite the bookies' built-in profit margin. Unlike Bloomberg (and others) we have made our methods, models and results very public (a list of published papers in scholarly journals is below). In fact for the last two years Anthony has posted the predictions for all matches the day before they take place on his website pi-football. The prediction for each match is summarised as a very simple set of probabilities, namely the probability of a home win, draw and away win. Good betting opportunities occur when one of the probabilities is significantly higher than the equivalent probability from the bookies' odds.
Example: Suppose Liverpool are playing at home to Stoke. Because of the historical data the bookies would regard Liverpool as strong favourites. They would typically rate the chances of Stoke winning to be very low - say 10% (which in odds terms equates to '9 to 1 against'). They add their 'mark-up' and publish odds of, say, 8 to 1 against a Stoke win (which in probability terms is 1/9 or 11%). But suppose there are specific factors that lead our model to predict that the probability of a Stoke win is 20%. Then the model is saying that the bookmakers' odds - even given their mark-up - have significantly underestimated the probability of a Stoke win. Although our model still only gives Stoke a 20% chance of winning it is worth placing a bet. Imagine 10 match scenarios like this. If our predictions are correct then you will win on 2 of the 10 occasions. Assuming you bet £1 each time you will end up spending £10 and getting £18 back - a very healthy 80% profit margin.
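The betting arithmetic in this example can be sketched as follows (figures are the hypothetical ones from the example above; `expected_profit_per_bet` is just an illustrative helper, not part of the published models):

```python
# Sketch of the value-betting arithmetic: the bookmaker offers 8/1
# against a Stoke win, while our model puts the true probability at 20%.

def expected_profit_per_bet(model_prob: float, odds_against: float,
                            stake: float = 1.0) -> float:
    # A win returns stake * odds_against in profit; a loss forfeits the stake.
    return model_prob * odds_against * stake - (1 - model_prob) * stake

ev = expected_profit_per_bet(0.20, 8)  # 0.2 * 8 - 0.8 * 1
print(ev)  # 0.8 -> an expected 80% return per £1 staked

# Over 10 such bets, winning 2: outlay £10, returns 2 * £9 = £18.
```
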
Thanks to Alex on the Spurs-list for the tip-off on the Bloomberg report.

****Update: The actual results for the 2013-14 season were very different from the Bloomberg predictions. The title was a two-horse race between Man City and Liverpool with the rest far behind. Liverpool had been predicted to come 6th and would have won the title but for a late collapse. Man Utd finished 7th. Only one of the newly promoted clubs (Cardiff) was relegated.

References:
  • Constantinou, A., N. E. Fenton and M. Neil (2013) "Profiting from an Inefficient Association Football Gambling Market: Prediction, Risk and Uncertainty Using Bayesian Networks". Knowledge-Based Systems. http://dx.doi.org/10.1016/j.knosys.2013.05.008
  • Constantinou, A. C. and N. E. Fenton (2013). "Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries." Journal of Quantitative Analysis in Sports 9(1): 37-50. http://dx.doi.org/10.1515/jqas-2012-0036
  • Constantinou, A., N. E. Fenton and M. Neil (2012). "pi-football: A Bayesian network model for forecasting Association Football match outcomes." Knowledge-Based Systems, 36, 322-339. http://dx.doi.org/10.1016/j.knosys.2012.07.008
  • Constantinou, A. and N. E. Fenton (2012). "Solving the problem of inadequate scoring rules for assessing probabilistic football forecasting models." Journal of Quantitative Analysis in Sports, 8(1), Article 1. http://dx.doi.org/10.1515/1559-0410.1418

Friday, 5 July 2013

Flaky DNA (the prosecutor's fallacy yet again and much more to be worried about)

In August 2012 David Butler (who had been jailed for the 2005 murder of Anne Marie Foy) was freed when it was discovered that the DNA evidence - which had essentially been the only evidence against him - was flaky in more senses than one. Tiny traces of DNA, whose profile matched that of Butler, were discovered under Foy's fingernails. A sample of Butler's DNA had been previously stored in a database (the police had mistakenly assumed it belonged to the person who burgled his mother's house). It was a search of this database that revealed his DNA matched that under Foy's fingernails.

The reports here and here give a good overview of the case, focusing on the critical observation that some people - such as Butler - have especially dry skin, making them extremely likely to shed tiny amounts of DNA wherever they go. This means that Butler - a cab driver - could have easily transferred his cells simply by handling money that was later passed on to either the victim or the real attacker. A more recent US case - described here - also provides an example of how easily DNA can be innocently 'transferred' to a crime scene and mistakenly assumed to belong to the person who committed the crime.

The reporting of these cases highlights just one important scenario under which the probative value of DNA evidence can be massively exaggerated, namely the fact that there are multiple opportunities for DNA to be 'transferred'. This means that DNA found at a crime scene or on a victim could have come from multiple innocent sources.

But there are many other, less well understood, scenarios under which the probative value of DNA evidence can be massively exaggerated, and the Butler case actually highlights several of them:

  1. Incorrectly reporting the probabilistic impact: In reporting the impact of the DNA evidence it appears (based on the Telegraph report) that the prosecuting QC has yet again committed the prosecutor's fallacy. The statement that there is "a one billion-to-one chance that the DNA belongs to anyone else" is wrong (just as it was here, here and here). In fact, if the DNA profile was indeed such that it is found in one in a billion people, then it is likely to be shared with about six other (unknown and unrelated) people in the world. In the absence of any other evidence against the defendant there is therefore about a 6/7 chance that it belongs to 'anyone' else.
  2. The impact of a database search: Finding the matching DNA as a result of a database search, rather than as a result of testing a suspect on the basis of some other evidence, completely changes the impact of the evidence. This is especially devastating when there is so-called 'low-template DNA' - where the random match probabilities are nothing like as low as 1 in a billion. Let's suppose the DNA at the crime scene is such that it is found in one in every 10,000 people. Then even in a fairly small database - say of 5,000 individuals' DNA samples - there is a good chance (about 40%) that we will find a match to the crime scene DNA. Suppose we find a 'match'. Have we 'got our man'? Almost certainly not. In the UK alone we would expect about 6,000 people to have the matching DNA. In some cases low-template DNA profiles have a match probability of 1 in 100. In such situations a database match tells us nothing at all. If we charged the first matching person we found we would almost certainly have the wrong person.
  3. The potential for errors in DNA analysis and testing. It is not just the potential for 'innocent transfer' that we have to consider when we think about 'human error'.  Brian McKeown, chief scientist representative from LGC Forensics says:
         "..the science is flawless and must not be ignored. If you do it right you get the right result."
     Yet LGC have themselves committed high-profile critical DNA testing errors, such as those reported here and here. When their scientists report the probabilistic impact of DNA matches they never incorporate the very real probability of errors that can be introduced at numerous stages in the process. As we explained here (and we will be reporting much more extensively on this in upcoming papers) when sensible allowance is made for human error, the DNA 'statistics' become very different.
  4. The critical importance of absence of DNA evidence: If a person - especially one who easily sheds DNA - really did rape and strangle the victim, then the fact that only tiny traces of DNA matching theirs are discovered on the victim is actually two pieces of evidence. One is made explicit - that the DNA matches - and it supports the prosecution case. But the other - that no substantive DNA from the defendant was found - is typically ignored; and it may provide very strong support for the defence case. This 'evidence' of 'relative absence of DNA evidence' has been a key (previously ignored) factor in cases I have recently been involved in, so hopefully soon I will be able to reveal more about its impact.
  5. The entire theoretical basis for DNA 'match probabilities' and sampling is itself extremely flaky. This is something I am currently looking at with colleagues and will be writing about soon.
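The database-search arithmetic in point 2 can be checked directly (a sketch using the illustrative figures from that point, not a formal analysis of any case):

```python
# Point 2's figures: a random match probability of 1 in 10,000 and a
# database of 5,000 profiles.

match_prob = 1 / 10_000
db_size = 5_000

# Probability of at least one match somewhere in the database:
p_at_least_one = 1 - (1 - match_prob) ** db_size
print(round(p_at_least_one, 3))  # 0.393 -> roughly a 40% chance

# Expected number of matching people in a UK-sized population (~60 million):
uk_population = 60_000_000
print(uk_population * match_prob)  # 6000.0
```

So a 'hit' in even a modest database is unsurprising, and thousands of innocent people would share the profile nationally.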
Unlike some others, I am not suggesting the imminent demise of DNA in the courtroom. However, I am convinced that a far more critical approach to both the presentation and evaluation of DNA evidence is urgently required to avoid future miscarriages of justice. And I am convinced that many - as yet undiscovered - errors in DNA analysis mean that innocent people are in jail and guilty people are at large.


Thursday, 11 April 2013

Bayesian networks plagiarism

If, as they say, imitation is the sincerest form of flattery, then we are privileged to have discovered (thanks to a tip-off by Philip Leicester) that our work on Bayesian network idioms - first published in Neil M, Fenton NE, Nielsen L, ''Building large-scale Bayesian Networks'', The Knowledge Engineering Review, 15(3), 257-284, 2000, and covered extensively in Chapter 7 of our book - has been republished, almost verbatim, in the following publication:
Milan Tuba and Dusan Bulatovic, "Design of an Intruder Detection System Based on Bayesian Networks", WSEAS Transactions on Computers, 5(9), pp 799-809, May 2009. ISSN: 1109-2750
The whole of Section 3 ("Some design aspects of large Bayesian networks") - which constitutes 6 of the paper's 10 pages - is lifted from our 2000 paper. Our work was partly inspired by the work of Laskey and Mahoney. The authors reference that work but, of course, not ours - confirming that the plagiarism was deliberate.

Milan Tuba and Dusan Bulatovic are at the Megatrend University of Belgrade (which we understand is a small private university) and we had not come across them before now. The journal WSEAS Transactions on Computers seems to be an example of one of the dubious journals exposed in this week's New York Times article. Curiously enough, after a colleague distributed that article yesterday I was going to write back to him saying that I disagreed with its rather elitist tone, which suggests that the peer review process of the 'reputable scientific journals' is somehow unimpeachable. In reality there is no consensus on which journals are 'reputable', and even the refereeing of those widely considered the best is increasingly erratic and at times bordering on corrupt (inevitable when it relies exclusively on volunteer academics). But at least I would hope that any 'reputable' journal would still be alert to the kind of plagiarism we now see here.

This is not the first time our work has been very blatantly plagiarised. Interestingly, on a previous occasion it was in a book published by Wiley Finance (who I am sure are widely considered one of the most reputable publishers). The book was 'written' by a guy who had been our PhD student for a short time at City University before he vanished without notice or explanation. The book contained large chunks of our work (none of which the 'author' had contributed to, as it predated his time as a PhD student with us) without any attribution. We informed Wiley of this, and proved to them that a) the author's qualifications as stated in the book were bogus and b) the endorsements on the back cover were fraudulent, yet they did nothing about it.

Thursday, 28 February 2013

What chance the next roll of the die is a 3?

In response to my posting yesterday a colleague posed the following question:
The die has rolled 3 3 3 3 3 3 3 in the past. What are the chances of 1 2 4 5 6 being rolled next? The mathematician will say: P(k)=1/6 for each number, forget that short-term evidence. What will the probability expert say? And the statistician? And the philosopher? 
I have provided a detailed solution to this problem here.

In summary, it is based on a Bayesian network in which (except for the 'statistician') everything comes down to the prior each expert assumes for each P(k) - the probability that the die shows the number k.
  • The mathematician's prior is that each P(k) is exactly 1/6.
  •  One type of probability expert (including certain types of Bayesians) will argue that, in the absence of any prior knowledge of the die, the probability distribution for each P(k) is uniform over the interval 0-1 (meaning any value is just as likely as any other).
  • Another probability expert (including most Bayesians) will argue that the prior should be based on dice they have previously seen. They believe most dice are essentially 'fair' but there could be biases due to either imperfections or deliberate tampering. Such an expert might therefore specify the prior distribution for P(k) to be a narrow bell curve centred on 1/6.
  •  A philosopher might consider any of the above but might also reject the notion that 1,2,3,4,5,6 are the only outcomes possible.
Anyway, when we enter the evidence of seven 3's in 7 rolls, the Bayesian calculations (performed using AgenaRisk) result in an updated posterior distribution for each of the P(k)s.

The mathematician's posterior for each P(k) is unchanged: i.e. each P(k) is still 1/6, so there is still just a probability of 1/6 that the next roll will be a 3.

For the probability expert with the uniform priors, the posterior for P(3) is now a distribution with mean 0.618. The other probabilities are all reduced accordingly to distributions with mean about 0.079. So in this case the probability of rolling a 3 next time is about 0.618, whereas each of the other numbers has a probability of about 0.079.

For the probability expert with the bell curve priors, the posterior for P(3) is now a distribution with mean 0.33. The other probabilities are all reduced accordingly to distributions with mean about 0.13. So in this case the probability of rolling a 3 next time is about 0.33, whereas each of the other numbers has a probability of about 0.13.
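For the uniform-prior expert there is also a closed-form check via the conjugate Dirichlet distribution (a sketch of the standard conjugate analysis; the AgenaRisk model computes the same update numerically, which is why its figures differ slightly from these):

```python
# A Dirichlet(1,...,1) prior over the six outcome probabilities is
# equivalent to a uniform prior on each P(k). Updating it with seven
# observed 3s gives posterior means (alpha_k / sum of alphas):

alpha = [1] * 6   # uniform prior over the probability simplex
alpha[2] += 7     # seven 3s observed (index 2 is the face '3')
total = sum(alpha)
posterior_means = [a / total for a in alpha]

print(round(posterior_means[2], 3))  # 0.615 -> probability of a 3 next roll
print(round(posterior_means[0], 3))  # 0.077 -> each of the other faces
```

These means (8/13 and 1/13) closely match the 0.618 and 0.079 reported above.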

And what about the statistician? Well, a classical statistician cannot give any prior distributions, so the above approach does not work for him. What he might do is propose a 'null' hypothesis that the die is 'fair' and use the observed data to accept or reject this hypothesis at some arbitrary 'p-value' (he would reject the null hypothesis in this case at the standard p=0.01 value). But that does not provide much help in answering the question. He could try a straight frequency approach, in which case the probability of a three is 1 (since we observed 7 threes out of 7 rolls) and the probability of any other number is 0.

Anyway, the detailed solution showing the model and results is here. The model itself - which will run in AgenaRisk - is here.

Wednesday, 27 February 2013

"No such thing as probability" in the Law?

David Spiegelhalter has posted an important article about a recent English Court of Appeal judgement in which the judge essentially suggests that it is unacceptable to use probabilities to express uncertainty about unknown events. Some choice quotes David provides from the judgement include:
"..and to express the probability of some event having happened in percentage terms is illusory.
....The chances of something happening in the future may be expressed in terms of percentage. ... But you cannot properly say that there is a 25 per cent chance that something has happened... Either it has or it has not. "
What is interesting about this is that the judge has used almost the same words that we said (in Chapter 1 of our book Risk Assessment and Decision Analysis with Bayesian Networks) we had heard from several lawyers. One of the quotes we gave there from an eminent lawyer was:
“Look the guy either did it or he didn’t do it. If he did then he is 100% guilty and if he didn’t then he is 0% guilty; so giving the chances of guilt as a probability somewhere in between makes no sense and has no place in the law”. 
Of course, as we show in the book (Chapter 1 is freely available for download) you can actually prove that this kind of assertion is flawed, in the sense that it inevitably leads to irrational decision-making.

The key point is that there can be as much uncertainty about an event that has yet to happen (e.g. whether or not your friend Naomi will roll a six on a die) as about one that has happened (e.g. whether or not Naomi did roll a six on the die). It all depends on what information you have about the event that has happened. If you did not actually see the die rolled in the second case, your uncertainty about the outcome is no different from what it was before it was rolled, even though Naomi knows for certain whether or not it was a six (so for her the probability really is either 1 or 0). As you discover information about the event that has happened (for example, if a reliable friend tells you that an even number was rolled) then your uncertainty changes (in this case from 1/6 to 1/3). And that is exactly what is supposed to happen in a court of law where, typically, nobody (other than the defendant) knows whether the defendant committed the crime; it is up to the jury to revise their belief in the probability of guilt as they see evidence during the trial.
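The die example can be verified by simple enumeration (a minimal sketch of the conditioning step described above):

```python
# Your probability that Naomi rolled a six, before and after learning
# (from a reliable friend) that the roll was an even number.

outcomes = [1, 2, 3, 4, 5, 6]

# Before any information: all six outcomes equally likely.
p_six = len([o for o in outcomes if o == 6]) / len(outcomes)
print(round(p_six, 3))  # 0.167 -> 1/6

# After learning the roll was even: condition on {2, 4, 6}.
even = [o for o in outcomes if o % 2 == 0]
p_six_given_even = len([o for o in even if o == 6]) / len(even)
print(round(p_six_given_even, 3))  # 0.333 -> 1/3
```

The event itself never changes; only your information about it does, which is why your probability moves from 1/6 to 1/3 while Naomi's stays at 0 or 1.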

David Spiegelhalter points out that the judge is not just 'banning' Bayesian reasoning, but also banning the Sherlock Holmes approach to evidence. But it is even worse, because the judge is essentially banning the entire legal rationale for presenting evidence (which is ultimately about helping the jury to determine the probability that the defendant committed the crime).

p.s. There are other aspects of the case which are troubling, notably the assumption that there were just three possible potential causes of the fire (other as yet unknown/unknowable potential causes would have non-zero prior probabilities). However, the judge got some things right, including his line of reasoning about the relative likelihood of two unlikely events (the arcing and the smoking), which demonstrated that, if these are exhaustive, the smoking was the most likely cause.

Tuesday, 15 January 2013

Who is the appropriate expert here: a DNA specialist or a probability specialist?

An interesting issue about expert evidence has arisen in a case on which I am providing input. It can be summarized as follows:

If a DNA expert makes an incorrect probabilistic inference (such as a logical or computational error) arising from a DNA probability, is it appropriate for a probability expert to point out the error or is only a DNA expert qualified to point out the error?

According to many lawyers only a DNA expert is qualified. I believe this is fundamentally wrong, as the following real (but anonymized) example demonstrates:

A partial DNA sample found at the scene of the crime (containing only two clearly identifiable components) matches the defendant's DNA. The DNA expert (who we will refer to as expert A) concludes:
"the probability this DNA comes from anybody other than the defendant is very unlikely". 
A probability expert (expert B) believes that the DNA expert's conclusion may be highly misleading. Expert B asks an independent DNA expert (expert C) to check the DNA evidence and provide a match probability. Expert C confirms a two-component match and asserts that the match probability is about 1 in 100, i.e. the probability of finding such a match in a person not involved is 1 in 100. Expert B uses this information to explain why expert A's statement was misleading as follows:
Expert A is making a statement about the probability of the defendant not being the source of the DNA, when all that expert A can actually conclude is that the probability of getting such a DNA match if the defendant is not the source is 1 in 100. If the term 'very unlikely' is a surrogate for the more precise "1 in 100 probability" then the expert is making the transposed conditional error (prosecutors fallacy). Specifically one cannot make any conclusions about the (posterior) probability of the defendant being or not being the source without knowing something about the prior probability - i.e. without knowing how many other people could have left the DNA sample at the scene. If, for example, there are 1000 people who have not been ruled out then about 10 of these would have the matching partial DNA. In that case "the probability this DNA comes from anybody other than the defendant is about 90%" - which is very different from expert A's conclusion. 
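Expert B's arithmetic can be sketched as follows (numbers as in the anonymized example: a 1 in 100 match probability and 1,000 people not ruled out):

```python
# Why a 1-in-100 match probability does not mean the defendant is the source.

match_prob = 1 / 100
pool = 1000  # people, besides the defendant, who have not been ruled out

# Expected number of OTHER people in the pool sharing the partial profile:
expected_other_matches = pool * match_prob  # ~10 others

# Among the ~11 matching people, only one is the defendant, so the
# (posterior) probability the DNA comes from someone else is about:
p_someone_else = expected_other_matches / (expected_other_matches + 1)
print(round(p_someone_else, 3))  # 0.909 -> about 90%
```

The conclusion flips entirely with the prior: the smaller the pool of alternative sources, the more probative the match, which is exactly why expert A cannot reach a conclusion without that information.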
A lawyer rejects expert B's contribution because it is "outside his area of expertise", stating the following:
Expert B is not an expert on DNA - as is proved by the fact that he had to ask another DNA expert (C) to come up with the relevant random match probability - and so is not qualified to comment on DNA evidence. Only a DNA expert can comment on the likelihood that the DNA comes from the defendant.

But expert B is NOT commenting on the DNA evidence. Expert B is commenting on a logically incorrect - and unnecessarily vague - probabilistic inference made by a person who happens to be a DNA expert. In fact the only person here who is venturing outside their area of expertise is expert A, because he/she has made an assumption about something on which he/she has no information or expertise - namely the number of people who could potentially have been at the scene of the crime.

The logical extension of the lawyer's argument would be to reject all logical, mathematical and statistical analysis about a problem X if it is not presented by a person who is an expert in problem X.