Tuesday, 24 March 2015

The problem with big data and machine learning


The advent of ‘big data’, coupled with fancy statistical machine learning techniques, is increasingly seducing people to believe that new insights and better predictions can be achieved in a wide range of important applications, without relying on the input of domain experts. The applications range from learning how to retain customers through to learning what makes people susceptible to particular diseases. I have written before about the dangers of this kind of 'learning' from data alone (no matter how 'big' the data is).

Contrary to the narrative being sold by the big data community, if you want accurate predictions and improved decision-making then, invariably, you need to incorporate human knowledge and judgment. This enables you to build rational causal models based on 'smart' data. The main objections to using human knowledge (that it is subjective and difficult to acquire) are, of course, key drivers of the big data movement. But this movement underestimates the typically very high costs of collecting, managing and analysing big data. So the sub-optimal outputs you get from pure machine learning do not even come cheap.

To clarify the dangers of relying on big data and machine learning, and to show how smart data and causal modelling (using Bayesian networks) give you better results, I have collected together the following short stories and examples:
The whole subject of 'smart data' rather than 'big data' is also the focus of the research project BAYES-KNOWLEDGE.
