Social science research - where being right less than half the time is fine

Key topics:
- Social science studies show ~50% replication success rate
- SCORE project tests replicability, reproducibility, robustness
- Key issue: overconfidence in published research findings

By Justin Fox

Let’s say you do a job that involves making predictions about human behavior: you manage money, you sell things, you write opinion columns. Just less than half of your predictions turn out to be more or less right, about 10% are completely wrong, and with the rest it’s hard to say for sure. Would a success rate like that make you good at your job?

This ran through my mind as I perused the findings of the Center for Open Science’s huge Systematizing Confidence in Open Research and Evidence (SCORE) project, which were published in a series of articles in the journal Nature at the beginning of this month and are available outside the paywall, along with other papers and supporting data, at the center’s website. In a study that used new data sets to attempt to replicate the findings of 164 randomly selected articles published in social science journals, 49.3% of the replications “had statistically significant findings with the same pattern as the original finding,” 9.7% showed an “opposing pattern” and 40.4% showed no statistically significant effect.

Others seemed to interpret this result as an indication of failure. “Across the Social Sciences, Half of Research Doesn’t Replicate,” was the headline of an article in Science.
At Forbes it was “Only About Half Of Social Science Results Can Be Replicated, Finds New Study.” In their new book The Credibility Crisis in Science: Tweakers, Fraudsters, and the Manipulation of Empirical Results, social scientists Thomas Plümper and Eric Neumayer term the 47% replication rate found in a 2015 analysis of psychology papers “measly.”

Successful replication of findings doesn’t necessarily mean those findings are right, and failure to replicate doesn’t necessarily mean they’re wrong, but both are pretty strong signals. There are fields (air traffic control, bridge design and construction, mushroom foraging) where being right less than 50% of the time would be a disaster. There is famously one where a 30% success rate makes you a star. In disciplines trying to get a handle on complex and hard-to-measure behavioral phenomena, anything close to 100% success is implausible, and it’s not at all clear what replication rate would represent satisfactory performance. A sub-50% rate is definitely a signal that consumers of such research shouldn’t make too much of the results of a single study, but it doesn’t mean the whole enterprise is suspect.

“I am not sure that there is an optimal rate,” Brian Nosek, the co-founder and executive director of the Center for Open Science, said in an email when I asked him about this:

“Surely, the answer varies depending on how important it is that our expectation is correct. For example, if we roll out an education intervention in schools, I’d want high confidence that the intervention is based on replicable findings. But, if the study findings only suggest an interesting possibility that deserves further research, I don’t need much confidence in their replicability (yet).”

The replication study was just one part of the SCORE project.
There were also examinations of reproducibility (rerunning a study’s analysis on the same data) and robustness (applying different analytical techniques to the same data). Here are the overall success percentages for six social science disciplines across all three approaches.

[Chart: success percentages by discipline for replicability, reproducibility and robustness]

Some of these percentages are based on only a few papers (six for business and robustness, eight for education and reproducibility, 13 for education and replicability) and have huge margins of error, so I wouldn’t make too much of the differences between disciplines. But it is interesting that no discipline stands out as clearly better or worse than the others across all three measures. Also interesting is that psychology, the social science field that has generated the most bad publicity about research fraud and replication failures over the past decade and a half, has a repeatability record similar to that of economics and political science.

It was a couple of those psychology scandals and an influential paper titled “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant” (by Joseph Simmons, Leif Nelson and Uri Simonsohn, who have gone on to expose much suspect research on their Data Colada blog) that led Nosek, a psychology professor at the University of Virginia, and Jeff Spies, who was a graduate student at the time, to launch the Center for Open Science in 2013. The 2015 analysis of psychology papers mentioned above was an early project. There was also an examination of preclinical cancer research, published in 2021, in which “46% of effects replicated successfully on more criteria than they failed”; it’s not just the social sciences that land in this zone.

The center has led several significant efforts to increase research transparency and reliability.
The SCORE project, funded by the US Defense Advanced Research Projects Agency, was aimed mainly at assessing different ways of predicting whether research findings would replicate. Along with the reproducibility and robustness studies, researchers tested aggregated human opinions and several machine-learning methods. The aggregated humans were pretty effective, with a prediction market correctly predicting three-quarters of replications and a “structured group deliberation protocol” two-thirds, but the machine-learning methods didn’t have much success. The SCORE project did not make use of large language models, which weren’t available when work started in 2019. With advances in AI and the new information provided by SCORE, the machines will likely do better in the future.

As a frequent consumer and disseminator of social science research, I find that the replication findings reinforce lessons that I have slowly been learning over the years about the hazards involved in relying on it. “To me the issue needing to be solved is overconfidence,” Nosek argued. “We tend to act as if published findings are replicable without actually assessing whether they are.” The early years of this century were a heyday of “studies say” journalism, along with bestselling books and much-viewed TED talks by the researchers themselves, much of which has not aged well. Part of the problem was researchers cutting corners to generate findings conclusive enough to be published, or supportive of a particular worldview, and if efforts like those of the Center for Open Science can reduce the corner-cutting, that’s great. But we should also be wary of overestimating how reliable such research can possibly be.

© 2026 Bloomberg L.P.
