Big Data

There is a lot of interest, and indeed funding, around Big Data. Big Data is a catchy – and suitably vague – phrase used to draw attention to the increasingly large amounts of data available to policy makers, natural scientists and social researchers. Definitions of Big Data are still up for grabs, but many centre on the three V’s: velocity, volume and variety. Going further, some stress that Big Data are generated in networked systems, so that data can be shared across contexts and updated in ‘real time’. One further aspect of Big Data is that, while we can set up systems to get at the data we need, a lot of data is also generated for other purposes, or indeed with no clear purpose in mind. Discussion of Big Data often slips over into the Internet of Things – another vague term, but one that points to the myriad ways in which we are getting constant streams of data through sensors, monitoring devices and tracking of network activity.

The concept of Big Data is naturally being pushed by hardware and software companies, but it is getting a sympathetic following in government too. It is a big idea in academia and is a priority for research funding [1].

We can be cynical about the concept of Big Data, but there is a genuine fascination with the way that technology is changing how we investigate and control our environment. Many of the examples of the application of Big Data seem benign enough – for example, monitoring water levels so that early warnings can be given about floods or drought; waste bins fitted with sensors so that councils know which bins need emptying and when; car movements tracked so that planners can get immediate feedback on changes to road layouts. However, people start getting worried about other contexts. For example, music analytics allow the industry to spot emerging musicians and genres but may do so at the expense of originality; political parties are increasingly adept at tailoring their appeal to different constituencies based on data about those they are targeting; our online movements are tracked in order to build up profiles of us as consumers. Each example of data collection might not be sinister in itself, but taken together they contribute to our increasing sense of being manipulated. Going further, data are gathered every day in ways which most of us would find intrusive if we knew how much was being collected and by whom. A small-scale example of this is provided by the blogger Siraj Datoo, writing about sensors fitted to recycling bins:

http://qz.com/112873/this-recycling-bin-is-following-you/

So what stand should we take on Big Data? One perspective is to realise that many of the issues to do with Big Data are not new – we know how to raise questions about privacy, access rights, security, use and misuse, and we need to raise these again. We should also be aware of past attempts to harness large sets of data. For example, a forerunner of Big Data was so-called ‘freakonomics’ (Levitt and Dubner [2]), which crossed over from academia to the book-buying public. Freakonomics explored associations within large sets of data and in some cases played around with the data to see what came up – one well-publicised example was Donohue and Levitt [3], who claimed an association between legalised abortion (since 1973) in the USA and a drop in crime 18 years later. A lot of the examples put forward in Freakonomics were interesting, but there was always a question mark, as there is today, about drawing conclusions from ‘associations’ between data rather than offering an explanation or an indication of causality; in other words, just because there appears to be a relationship between A and B (they go up or down together) does not mean that A causes B or vice versa. It is we who interpret the data, and the decisions based on data are political ones – for example, the monitoring of water levels is in itself very useful, but monitoring will not make the water appear or tell you who gets the water and who does not.
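The point about association and causality can be made concrete with a toy simulation. The sketch below is purely illustrative and uses made-up numbers: it assumes a hypothetical hidden factor that drives both A and B, so the two move together even though neither influences the other. The names (hidden, a, b) and the noise levels are assumptions chosen for the example, not anything taken from the studies cited above.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(n=5000, seed=0):
    """Generate A and B that both depend on a hidden factor, never on each other."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    a = [h + rng.gauss(0, 0.5) for h in hidden]  # A = hidden factor + own noise
    b = [h + rng.gauss(0, 0.5) for h in hidden]  # B = hidden factor + own noise
    return hidden, a, b


hidden, a, b = simulate()

# A and B look strongly related when compared directly...
r_ab = pearson(a, b)

# ...but once the hidden factor is subtracted out, what is left of A and B
# is just their independent noise, and the relationship disappears.
a_rest = [x - h for x, h in zip(a, hidden)]
b_rest = [y - h for y, h in zip(b, hidden)]
r_rest = pearson(a_rest, b_rest)

print(f"correlation of A and B: {r_ab:.2f}")
print(f"correlation after removing the hidden factor: {r_rest:.2f}")
```

In real data the hidden factor is usually not known, let alone measured, which is why a strong association on its own cannot settle the question of what causes what.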

We need reminding not to suspend our professional expertise, still less our common sense, when faced with masses of data (or, more correctly, algorithms which have organised the data in what seem meaningful ways). A well-reported case here is airline pilots’ overreliance on autopilot and a supposed difficulty in taking decisions when faced with real-life emergencies; a more everyday example is the motorist who follows the Sat Nav even when it is heading north and they want to go south, or who blindly follows directions into a flooded road. Going back to our earlier example, a political party or music label will only get so far by following trends, no matter how up to the minute their monitoring is. They also have to have a feel for the enterprise they are engaged in and in some way seek to set the agenda.

If you are looking to be optimistic about Big Data, try the NGO Ushahidi: http://www.ushahidi.com

For a sympathetic review go to: http://www.ssireview.org/articles/entry/open_source_for_humanitarian_action

Or try the United Nations’ organisation Global Pulse: http://www.unglobalpulse.org/research

I am sure both have their critics but they really are helpful for seeing that technology and political and social action are not incompatible.

[1] The Big Data Family is born – David Willetts MP announces the ESRC Big Data Network http://www.esrc.ac.uk/news-and-events/press-releases/28673/the-big-data-family-is-born-david-willetts-mp-announces-the-esrc-big-data-network.aspx

[2] Levitt, S. and Dubner, S. (2005) Freakonomics: A rogue economist explores the hidden side of everything, New York: William Morrow/HarperCollins.

[3] Donohue, J. and Levitt, S. (2001) ‘The impact of legalized abortion on crime’, The Quarterly Journal of Economics, 116, 2, 379–420.