## Tuesday, 17 April 2018

### Explaining Bayesian Networks through a football management problem

Today's Significance Magazine (the magazine of the Royal Statistical Society and the American Statistical Association) has published an article by Anthony Constantinou and Norman Fenton that explains, through the use of an example from football management, the kind of assumptions required to build useful Bayesian networks (BNs) for complex decision-making. The article highlights the need to fuse data with expert knowledge, and describes the challenges in doing so. It also explains why, for fully optimised decision-making, extended versions of BNs, called Bayesian decision networks, are required.

The published pdf (open access) is also available here and here.

Full article details:

Constantinou, A., & Fenton, N. E., "Things to know about Bayesian networks", Significance, 15(2), 19-23, April 2018, https://doi.org/10.1111/j.1740-9713.2018.0

## Wednesday, 14 March 2018

### Evidence-based decision making turns knowledge into power

A nice 2-page article about our BAYES-KNOWLEDGE project is in the latest issue of EU Research Magazine Beyond the Horizon. A pdf version is here.

## Tuesday, 6 March 2018

### Two coins: one fair one biased

Alexander Bogomolny tweeted this problem:

If there is no reason to assume in advance that either coin is more likely to be the coin tossed once (i.e. the first coin) then all the (correct) solutions show that the first coin is more likely to be biased with a probability of 9/17 (=0.52941). Here is an explicit Bayesian network solution for the problem:

The above figure shows the result after entering the 'evidence' (i.e. one Head on the coin tossed once and two Heads on the coin tossed three times). The tables displayed are the conditional probability tables associated with the variables.

This model took just a couple of minutes to build in AgenaRisk and requires absolutely no manual calculations, as the Binomial distribution is one of many pre-defined functions. The model (which can be run in the free version of AgenaRisk) is here. The nice thing about this solution compared to the others is that it is much more easily extendible. It also shows the reasoning very clearly.
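The posterior can also be checked by hand with Bayes' theorem. The sketch below is not the AgenaRisk model; since the original tweeted problem is not reproduced here, it assumes the biased coin lands heads with probability 2/3, a value that reproduces the 9/17 figure, and the `binom_pmf` helper is our own.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_biased = Fraction(2, 3)  # assumed bias (not stated in this post)
p_fair = Fraction(1, 2)

# Evidence: 1 head on the coin tossed once, 2 heads on the coin tossed 3 times
like_first_biased = binom_pmf(1, 1, p_biased) * binom_pmf(2, 3, p_fair)
like_second_biased = binom_pmf(1, 1, p_fair) * binom_pmf(2, 3, p_biased)

# Equal priors on which coin is biased, so the posterior is the normalised likelihood
posterior = like_first_biased / (like_first_biased + like_second_biased)
print(posterior)  # → 9/17
```

Using `Fraction` keeps the arithmetic exact, so the answer comes out as 9/17 rather than 0.52941...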


## Monday, 12 February 2018

### An Improved Method for Solving Hybrid Influence Diagrams

Most decisions are made in the face of uncertain factors and outcomes. In a typical decision problem, uncertainties involve both continuous factors (e.g. amount of profit) and discrete factors (e.g. presence of a small number of risk events). Tools such as decision trees and influence diagrams are used to cope with uncertainty regarding decisions, but most implementations of these tools can only deal with discrete or discretized factors and ignore continuous factors and their distributions.

A paper just published in the International Journal of Approximate Reasoning presents a novel method that overcomes a number of these limitations. The method is able to solve decision problems with both discrete and continuous factors in a fully automated way. It requires that the decision problem is modelled as a Hybrid Influence Diagram, an extension of influence diagrams containing both discrete and continuous nodes, and solves it using a state-of-the-art inference algorithm called Dynamic Discretization. The optimal policies calculated by the method are presented as a simplified decision tree.
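The paper's Dynamic Discretization algorithm is not reproduced here, but a toy example (all numbers and names invented) shows the kind of problem being solved: a choice between two projects where a discrete risk event and a continuous, normally distributed profit both feed into expected utility. A naive fixed-grid discretization, the kind of approach the new method improves upon, looks like this:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def expected_utility(mu, sigma, utility, lo, hi, bins=2000):
    # crude fixed-grid (midpoint-rule) discretization of a continuous profit node
    step = (hi - lo) / bins
    return sum(
        normal_pdf(lo + (i + 0.5) * step, mu, sigma)
        * utility(lo + (i + 0.5) * step) * step
        for i in range(bins)
    )

def u(x):
    # a risk-averse utility curve (assumed purely for illustration)
    return 1 - exp(-x / 100)

p_risk = 0.2   # discrete factor: a risk event that wipes 80 off the profit
projects = {   # continuous factor: profit ~ Normal(mu, sigma) per decision
    "safe":  (50, 10),
    "risky": (100, 60),
}

scores = {}
for decision, (mu, sigma) in projects.items():
    scores[decision] = ((1 - p_risk) * expected_utility(mu, sigma, u, -400, 500)
                        + p_risk * expected_utility(mu - 80, sigma, u, -400, 500))

best = max(scores, key=scores.get)
print(best, scores)
```

The fixed grid has to be chosen in advance and wide enough to cover every scenario; Dynamic Discretization, by contrast, refines the grid automatically where the distributions need it.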

The full reference is:


Yet, B., Neil, M., Fenton, N., Dementiev, E., & Constantinou, A. (2018). "An Improved Method for Solving Hybrid Influence Diagrams". International Journal of Approximate Reasoning. DOI: 10.1016/j.ijar.2018.01.006 Preprint (open access) available here.

**UPDATE (22 Feb 2018): The full published version of the paper is available online for free for 50 days here: https://authors.elsevier.com/c/1Wc6D,KD6ZG8y-**

Acknowledgements: Part of this work was performed under the auspices of EU project ERC-2013-AdG339182-BAYES_KNOWLEDGE

## Saturday, 10 February 2018

### Decision-making under uncertainty: computing "Value of Information"

Information gathering is a crucial part of decision making under uncertainty. Whether to collect additional information or not, and how much to invest for such information are vital questions for successful decision making. For example, before making a treatment decision, a physician has to evaluate the benefits and risks of additional imaging or laboratory tests and decide whether to ask for them. Value of Information (VoI) is a quantitative decision analysis technique for answering such questions based on a decision model. It is used to prioritise the parts of a decision model where additional information is expected to be useful for decision making.

However, computing VoI in decision models is challenging especially when the problem involves both discrete and continuous variables. A new paper in the IEEE Access journal illustrates a simple and practical approach that can calculate VoI using Influence Diagram models that contain both discrete and continuous variables. The proposed method can be applied to a wide variety of decision problems as most decisions can be modelled as an influence diagram, and many decision modelling tools, including Decision Trees and Markov models, can be converted to an influence diagram.
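The paper handles hybrid (discrete plus continuous) models via Dynamic Discretization; the fully discrete sketch below, with invented numbers for a treatment decision, just illustrates what value of perfect information means: the gap between deciding after learning the uncertain state and deciding before.

```python
# Invented illustration: a treatment decision under an uncertain disease state
p_state = {"present": 0.3, "absent": 0.7}
payoff = {
    "treat":    {"present": 80, "absent": 60},
    "no_treat": {"present": 20, "absent": 100},
}

# Best expected payoff when we must decide before learning the state
ev_no_info = max(
    sum(p_state[s] * payoff[a][s] for s in p_state) for a in payoff
)

# Expected payoff if a perfect test revealed the state before deciding
ev_perfect_info = sum(
    p_state[s] * max(payoff[a][s] for a in payoff) for s in p_state
)

# EVPI: an upper bound on what any such test is worth
evpi = ev_perfect_info - ev_no_info
print(ev_no_info, ev_perfect_info, evpi)  # ≈ 76, 94, 18
```

With these numbers the physician would pay up to 18 (in payoff units) for a perfect test; a real VoI analysis does the analogous calculation over an influence diagram with continuous variables.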

The full reference is:

Yet, B., Constantinou, A., Fenton, N., & Neil, M. (2018). Expected Value of Partial Perfect Information in Hybrid Models using Dynamic Discretization. IEEE Access. DOI: 10.1109/ACCESS.2018.2799527

Acknowledgements: Part of this work was performed under the auspices of EU project ERC-2013-AdG339182-BAYES_KNOWLEDGE, EPSRC project EP/P009964/1: PAMBAYESIAN, and ICRAF Contract No SD4/2012/214 issued to Agena.

## Wednesday, 7 February 2018

### Lawnmower v terrorist risk: the saga continues

Kim Kardashian's tweet comparing risk from lawnmowers v terrorists triggered the award and debate.

Yesterday Significance Magazine (the magazine of the Royal Statistical Society and the American Statistical Association) published an article “Lawnmowers versus Terrorists” with the strapline:

"The Royal Statistical Society's first 'International Statistic of the Year' sparked plenty of online discussion. Here, Norman Fenton and Martin Neil argue against the choice of winner, while Nick Thieme writes in support."

Our case, titled "A highly misleading view of risk", was an edited version of a paper previously publicised in a blog post that itself followed up on original concerns raised by Nassim Nicholas Taleb about the RSS citation and the way it had been publicised. The 'opposing' case made by Nick Thieme was essentially a critique of our paper.

We have today published a response to Nick’s critique.

Links:

- Norman Fenton, Martin Neil, Nick Thieme, "Lawnmowers versus terrorists", Significance Magazine, Volume 15, Issue 1, February 2018, Pages 12–13
- Norman Fenton and Martin Neil, response to Nick's critique: http://dx.doi.org/10.13140/RG.2.2.30958.72002
- Are lawnmowers a greater risk than terrorists?
- On lawnmowers and terrorists again: the danger of using historical data alone for decision-making

## Monday, 5 February 2018

### Revisiting a Classic Probability Puzzle: the Two Envelopes Problem

Many people have heard about the Monty Hall problem. A similar (but less well known and more mathematically interesting) problem is the **two envelopes problem**, which Wikipedia describes as follows:

"You are given two indistinguishable envelopes, each containing money; one contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?"

The problem has been around in various forms since 1953 and has been extensively discussed (see, for example, Gerville-Réache for a comprehensive analysis and set of references), although I was not aware of this until recently.

We actually gave this problem (using boxes instead of envelopes) as an exercise in the supplementary material for our book, after Prof John Barrow of the University of Cambridge first alerted us to it. The 'standard solution' (as in the Monty Hall problem) says that you should always switch. This is based on the following argument:

If the envelope you choose contains $100 then there is an evens chance the other envelope contains $50 and an evens chance it contains $200. If you do not switch you have won $100. If you do switch you are just as likely to decrease the amount you win as increase it. However, if you win, the amount increases by $100, while if you lose it only decreases by $50. So your expected gain is positive (rather than neutral). Formally, if the envelope contains X then the expected amount in the other envelope is 5/4 times X (i.e. 25% more).

In fact (as pointed out by a reader, Hugh Panton), the problem with the above argument is that it applies equally to the 'other envelope', thereby suggesting we have a genuine paradox. It turns out that the above argument only really works if you actually open the first envelope (which was explicitly not allowed in the problem statement) and discover it contains X. As Gerville-Réache shows, if the first envelope is not opened, the only probabilistic reasoning that does not use supplementary information leads to estimating the expected amount in each envelope as infinite. Bayesian reasoning can be used to show that there is no benefit in switching, but that is not what I want to describe here.
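Spelled out, the flawed expectation behind 'always switch' is:

```latex
E[\text{other}] = \frac{1}{2}(2X) + \frac{1}{2}\left(\frac{X}{2}\right) = \frac{5X}{4}
```

where X is the (unopened) amount in the chosen envelope.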

What I found interesting is that I could not find, in any of the discussions about the problem, a solution for the case where we assume there is a **finite maximum prize**, even if we allow that maximum to be as large as we like. With this assumption it turns out that we can prove (without dispute) that there is no benefit to be gained whether you stick or switch. See this short paper for the details:

Fenton, N. E., "Revisiting a Classic Probability Puzzle: the Two Envelopes Problem", 2018, DOI: 10.13140/RG.2.2.24641.04960
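A quick Monte Carlo sanity check of the finite-maximum claim. The prior here is our own assumption (smaller amount uniform on 1..M/2, with M the assumed maximum prize), not anything from the paper; since the envelope is never opened, sticking and switching should have the same long-run average.

```python
import random

random.seed(0)
M = 1000            # assumed finite maximum prize
trials = 200_000

stick_total = switch_total = 0
for _ in range(trials):
    a = random.randint(1, M // 2)   # smaller amount; the pair is (a, 2a)
    pair = [a, 2 * a]
    random.shuffle(pair)            # which envelope you happened to pick
    stick_total += pair[0]          # keep the chosen envelope
    switch_total += pair[1]         # take the other one

print(stick_total / trials, switch_total / trials)  # the two averages agree
```

Any prior with a finite maximum gives the same result; the "5/4" argument only gets traction when the expectation is allowed to be infinite.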

## Friday, 19 January 2018

### Criminally Incompetent Academic Misinterpretation of Criminal Data - and how the Media Pushed the Fake News

The research paper behind the headlines, which accuses a widely used recidivism-prediction algorithm of racial bias, was written by the world-famous computer scientist Hany Farid (along with a student, Julia Dressel).

But the real story here is that the paper’s accusation of racial bias (specifically that the algorithm is biased against black people) is based on a fundamental misunderstanding of causation and statistics. The algorithm is no more ‘biased’ against black people than it is biased against white single parents, old people, people living in Beattyville Kentucky, or women called ‘Amber’. In fact, as we show in this brief article, if you choose any factor that correlates with poverty you will inevitably replicate the statistical ‘bias’ claimed in the paper. And if you accept the validity of the claims in the paper then you must also accept, for example, that a charity which uses poverty as a factor to identify and help homeless people is being racist because it is biased against white people (and also, interestingly, Indian Americans).
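The point about proxies can be made concrete with a toy calculation (all numbers invented, not taken from the paper): a predictor that flags people as high-risk using only a poverty indicator, never race, will still show different false-positive rates across two groups whenever the groups differ in their poverty rates.

```python
def false_positive_rate(p_poor, p_reoffend_poor=0.5, p_reoffend_not_poor=0.2):
    """FPR of a race-blind rule that flags exactly the poor as high-risk.

    A false positive is a flagged person who does not reoffend.
    All probabilities are invented for illustration.
    """
    flagged_no_reoffend = p_poor * (1 - p_reoffend_poor)
    unflagged_no_reoffend = (1 - p_poor) * (1 - p_reoffend_not_poor)
    return flagged_no_reoffend / (flagged_no_reoffend + unflagged_no_reoffend)

# Two groups identical except for their (invented) poverty rates
fpr_group_a = false_positive_rate(p_poor=0.4)
fpr_group_b = false_positive_rate(p_poor=0.2)
print(round(fpr_group_a, 3), round(fpr_group_b, 3))  # ≈ 0.294 vs 0.135
```

The rule never sees race, yet group A's false-positive rate is more than double group B's, which is exactly the statistic the paper treats as evidence of racial bias.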

The fact that the article was published and that none of the media running the story realise that they are pushing fake news is what is most important here. Depressingly, many similar research studies involving the same kind of misinterpretation of statistics result in popular media articles that push a false narrative of one kind or another.

Our article (5 pages): Fenton, N.E., & Neil, M. (2018). "Criminally Incompetent Academic Misinterpretation of Criminal Data - and how the Media Pushed the Fake News" http://dx.doi.org/10.13140/RG.2.2.32052.55680 Also available here.

The research paper: Dressel, J. & Farid, H. The accuracy, fairness, and limits of predicting recidivism. Sci. Adv. 4, eaao5580 (2018).


## Thursday, 11 January 2018

### On lawnmowers and terrorists again: the danger of using historical data alone for decision-making

The short paper and blog posting we did last week generated a lot of interest, especially after Nassim Nicholas Taleb retweeted it. An edited version (along with a response from a representative of the Royal Statistical Society) is going to appear in the February issue of Significance magazine (which is the magazine of the RSS and the American Statistical Association). In the meantime we have produced another short paper that explores further problems with the 'lawnmower versus terrorist risk' statistics, in particular the inevitable limitations and dangers of relying on historical data alone for risk assessment:

Fenton, N.E., & Neil, M. (2018). "Is decision-making using historical data alone more dangerous than lawnmowers?", Open Access Report DOI:10.13140/RG.2.2.20914.71363. Also available here.

## Wednesday, 3 January 2018

### Are lawnmowers a greater risk than terrorists?

Kim Kardashian, whose tweet comparing the threats of lawnmowers and terrorists led to RSS acclaim.

Fenton, N.E., & Neil, M. (2018). "Are lawnmowers a greater risk than terrorists?" Open Access Report DOI: 10.13140/RG.2.2.34461.00486/1

As you can see from the tweet by Taleb, this use of statistics for risk assessment was not universally welcomed.

See update to this story here.
