Social scientists are seeing new red flags in their field’s predicted big-data future, finding computerised analyses not just vulnerable to bias but perhaps fundamentally limited in their predictive value.
Concern is rising after an experiment in which 160 academic research teams, organised by Princeton University sociologists, tried machine-learning methods to predict the life pathways of disadvantaged children.
“The best predictions were not very accurate and were only slightly better” than those developed in traditional models using far fewer data inputs, the Princeton team reported in PNAS.
That result is a major warning sign for the rapid spread of such approaches in the social sciences, said Filiz Garip, a professor of sociology at Cornell University who was not part of the Princeton study.
At Cornell, for instance, between a third and half of graduate students in the social sciences are already taking classes in machine learning, said Professor Garip, who assessed the Princeton experiment in a commentary.
“Everybody feels like they need to learn this, they need to gain these skills, to find any kind of job,” she said in an interview. Yet so far, as the Princeton study showed, “we’re not gaining a whole lot by using these methods”, she said.
The findings come as social scientists are already on the defensive over indications that using large databases and sophisticated computer programmes to guide political and legal decisions may be reinforcing human biases.
Long-recognised examples include predictive algorithms that identify black defendants as posing a greater risk of future crime because their community histories often show disproportionate levels of police attention.
Advocates of such data-driven assessments have argued that problems within algorithms can be identified and eliminated, thereby making them less biased than decisions that rely on humans alone.
The Princeton study, meanwhile, raises the question of whether the teaching of basic skills and perspectives in the social sciences may be getting pushed aside by an overriding desire to amass and analyse the vast troves of data that can be found on almost any human these days.
Such volumes of data may be adding more noise than insight, outstripping the capacity of social scientists to meaningfully understand what each individual piece of data contributes to the answers they need, Professor Garip said.
For the Princeton study, the participating research teams were given extensive data on each of 4,200 families with a child who was born in a large US city around the year 2000, derived largely from visits, assessments and questionnaires over the following years with the child, parents, caregivers and teachers.
Given that information for those children up to age 9, the teams were asked to predict various outcomes for the child and family at age 15, including the child's school grades and the parents' job success.
The teams broadly failed to create computer-aided models that painted any better picture of how societal conditions affect people's lives than traditional social science analyses using far less subject data, the Princeton team wrote.
The Princeton authors, led by sociology professors Matthew Salganik and Sara McLanahan, said they expect their social science colleagues will, in coming years, keep improving their methods of big data computer analysis.
Further experimentation, they said, should also help their field better understand what types of societal problems may justify scientists pursuing individual-level predictions, rather than being content with broader understandings of how policies affect people.
Professor Garip said she agreed with such perspectives. But in the meantime, she cautioned, large numbers of younger social scientists and their universities may be betting too heavily on data-intensive training.
“We have to be careful,” she said, “of jumping on this trend or hype.”