What's wrong with social science and how to fix it

[+] unishark|5 years ago|reply

> Publishing your own weak papers is one thing, but citing other people's weak papers? This seemed implausible...

This is practically required by reviewers and editors. If you wade into a topic area, you need to review the field and explain where you fit in, even though you know full well many of those key citations are garbage. You basically need to pay homage to the "ground breakers" who claimed that turf first, even if they did it via fraud. They got there first, got cited by others, and so are now the establishment you are operating under.

And making a negative reference to them is not a trivial alternative. For one thing, you need to be certain, not just deeply suspicious of the paper, which just adds work and taking a stand may bring a fight with reviewers that hurts you anyway.

[+] selectionbias|5 years ago|reply

Citing a paper needn't be a celebration of it, you can cite a paper to say "these guys find X but..."

[+] goatinaboat|5 years ago|reply

You basically need to pay homage to the "ground breakers" who claimed that turf first, even if they did it via fraud.

Even referring to it as “science” is fraudulent. Testable theories and repeatable outcomes, anyone? Time this whole field was defunded.

[+] ims|5 years ago|reply

There were some stunning claims being made on Twitter last month based on a recently published study. Instantly skeptical, I dug into the methodology section and found this gem:

"It should be noted that the results cannot be estimated using a physician fixed effect due to a numeric overflow problem in Stata 15 which cannot be overcome without changing the assumptions of the logit model."

... The sad part was they didn't even choose a reasonable model in the first place.

[+] selectionbias|5 years ago|reply

(Ignore my previous reply I found it myself). To be fair to the authors, it is not their primary specification, that was a linear probability model. The logit model is just a robustness check to make sure the linearity assumption isn't driving the results.

[+] bitxbit|5 years ago|reply

I mean they couldn’t just do it in Matlab? or Python? So incredibly lazy.

[+] selectionbias|5 years ago|reply

Which paper was this?

[+] xrisk|5 years ago|reply

Is there a similar study done on the physical sciences? I’m getting a bit of holier-than-thou feeling from this article.

Edit: from all this talk of reproducibility, I wonder what percentage of cutting edge ML research is reproducible (either from lack of public training sets / not enough compute)

[+] unishark|5 years ago|reply

There are definitely studies criticizing ML publications similarly. As a kind of statistics (but often without the rigor), screwups make ML methods appear better than they really are. Hence the literature is packed with screwups.

Other CS subfields that get a lot of criticism are "network science" and bioinformatics.

[+] garden_hermit|5 years ago|reply

There are tons of replication issues across the sciences, they are just most salient in the Social Sciences because the topic is just really hard to study well.

Clinical trials can often be flawed, even if the stats are fine, just in how they sample. For example, women are often excluded from trials due to hormonal changes, but how drugs impact women is really important! Participants are also typically drawn from specific locations, and so may not be representative of people with different diets, lifestyles, and environmental factors.

Physics has its own controversies, though not always directly related to replication. For example, Harry Collins recounts the social factors involved in the discovery of gravitational waves: https://blogs.sciencemag.org/books/2017/03/28/harry-collins-...

[+] CuriouslyC|5 years ago|reply

Biological sciences are more often than not just as difficult to reproduce, mostly due to the difficulty of controlling living organisms, the somewhat random nature of the outcome, and p-hacking.

[+] rmbeard|5 years ago|reply

This one comes close https://www.nature.com/news/1-500-scientists-lift-the-lid-on...

[+] thu2111|5 years ago|reply

He mentions that epidemiology has actually more severe problems than economics. Having read some epi papers I understand why. Not sure if you'd count that as a physical or social science though: at least theoretically it's biologically based, but in reality the data it works with is mostly social and demographic.

[+] not2b|5 years ago|reply

This guy overstates his case somewhat. Consider:

"If the original study says an intervention raises math scores by .5 standard deviations and the replication finds that the effect is .2 standard deviations (though still significant), that is considered a success that vindicates the original study!"

Why the exclamation point here? The replication study isn't magically more accurate than the original study. If the original paper finds an 0.5 standard deviation effect and the replication study finds an 0.2 standard deviation effect, that increases our confidence that a real effect was measured, but there's no reason to believe that the replication study is more accurate than the original study. Maybe the true effect is less than measured, but maybe not. So yes, it should be considered a success.

[+] vharuck|5 years ago|reply

When I advise decision makers on reading statistics (in my case, state-wide health data), I urge them to focus on effect size and only use significance as a filter. Two reasons:

1. Effect size is the most important thing. The point of the study is (usually) to guide decisions. Sticking with the article's example, let's say combining both studies shows the increase is likely 0.35 standard deviations. Is the intervention still worth the cost? Is it still the best option?

2. If there's enough data (e.g., an observational study) or a good chance of omitted variables, there's going to be a "statistically significant" difference. No matter what's measured. I would bet my life's savings there's a statistically significant difference in profits of New York businesses depending on whether the owner's named Jim or Bob. A replication of the experiment with all Jim and Bob businesses in another state would also guarantee significance. So it's a coin toss whether the second study would "successfully replicate" the same direction of effect.

[+] itsdrewmiller|5 years ago|reply

I think his point here is that the effect in replication is closer to 0 than to the original claim. It might be more obvious if he chose an order of magnitude difference as an example - going from the dominant factor to technically-not-nothing might be replication but it's not vindication.

[+] stewbrew|5 years ago|reply

The "social sciences" include a lot. Wrt Sociology, I'd say one problem is the overemphasis on quantitative methods - they try to be as serious as the big boys.

The best sociological research I've read was qualitative though. Questionable replicability is of course built-in in this type of research but the research dealt with relevant questions. Most quantitative sociology seems rather irrelevant to me.

Another problem is of course that most quantitative sociologists don't have a clue what they are doing. They don't even know the basics of statistics and then use some statistical methods they don't understand. It's some kind of overcompensation, I think. Although, psychologists are even worse in this respect. It's really fun to watch an psychologists torturing SAS.

I write this as someone who was originally trained as sociologist and over the years turned into a data scientist.

[+] xrisk|5 years ago|reply

I’m really interested in what you feel about the potential applications of CS/ML to sociology. Or if you might have any resources that talk about that.

I ask because I’m enrolled in a research program in “computational humanities”. My initial feeling towards the program is that it’s kind of a sham.

Computational Humanities seems to be as computational as an accountant using Excel for their work. Not that I particularly mind, I’m not very interested in the computational aspect at all.

[+] crazygringo|5 years ago|reply

I've tried to understand this (obviously quite angry/ranty) article and cannot actually figure out what data it has.

It seems to not be based on actual replication results, but predicted replication results? But then the first chart isn't even predictions from the market, but just the author's predictions?

The author clearly has a real hatred for practices in the social sciences. But I don't see any actual proof of the magnitude of the problem, the article is mostly just a ton of the author's opinions.

Is there any actual "meat" here than I'm missing? Or is all this just opinions based on further opinions?

[+] leftyted|5 years ago|reply

It's based on this: https://www.replicationmarkets.com, which is linked in the first paragraph of the article.

Per https://www.replicationmarkets.com/index.php/rules, volunteers are predicting whether 3000 social science papers are replicable. According to the rules, of those 3000 papers, ~5% will be resolved (i.e. attempts will be made to replicate). According to the article, 175 will be resolved. It's unclear to me who exactly will do that work but I would guess it's people behind replications markets dot com (they are funded by DARPA). The rules say that no one knows ahead of time which papers will be resolved so I assume the ~5% (or 175) will be chosen by random.

The data in the article seems to be based on what the forecasters predicted, not which papers actually replicated (that work hasn't been done yet...or at least hasn't been made public). The author of the article is assuming that the forecasters are accurate. To back up this assumption, he cites previous studies showing that markets are good at this kind of thing.

The tone is ranty but, by participating in the markets, the author is putting his money where his mouth is.

[+] brandmeyer|5 years ago|reply

I think you're right. Take a look at the before/after curves for "this is what the predictions look like after the papers".

The before curves are gaussian+ distributed and pessimistic, but the after curves are all distinctly bimodal (or worse). This suggests that some population of the participants were broadly pessimized by their surveys and another population was broadly optimized by their surveys.

This could instead be a measurement of how people's trust in science is predicated on how well it matches their own prior beliefs.

+ A sharper eye shows they aren't quite bimodal in the prior belief. Even in those cases, the separation between the modes gets much wider.

[+] jonnycomputer|5 years ago|reply

No, you are exactly right.

[+] mhh__|5 years ago|reply

I try not to look down on social science, for the most part data is data as long as you can reason about how it was collected and who by.

The only thing that worries me a little (or a lot sometimes) is that there doesn't seem to be much "bone" for the meat to hang off of - that is, in physics, if your theory doesn't match experiment it's wrong whereas in social science you're never going to have a (mathematical) theory like that so you have to start (in effect) guessing. The data is really muddy, but thanks to recent (good) political developments whatever conclusions can be drawn from it may not be right in their eyes. For example, (apparently) merely commenting on the variation hypothesis can get you fired [https://en.wikipedia.org/wiki/Variability_hypothesis#Contemp...].

[+] kovac|5 years ago|reply

Requiring social science theories to have a mathematical founding might be a little too much to ask the social scientists because unlike Physicists, their command of mathematics is far from adequate to do any serious exploration.

I majored in Mathematics but out of curiosity I took some Psychology modules when I was in university. What I found baffling was their lack attention to details. They just seem to have an intuitive model of their subject and they were just reinforcing that intuition while overlooking any details that could have challenged it. Coming from a field where every symbol, punctuation matters, I realised to Psychologists exact details of a curve don't seem to matter much as long as the general trend made sense.

Someone who really impressed me was Dan Ariely who is a behavirol economist. Even though I didn't see any mathematics in his lectures, I loved his approach to the field. I'd be quite happy if more of social science took a similar approach even if they didn't back it up with rigorous mathematics.

[+] efavdb|5 years ago|reply

I read one guess that 2/3 of the published results in social science are wrong. Suppose you tried to develop a deeper theory of these things and derived consequences from these “results” as one does in math and physics. If your corollary depends on 4 prior results, each with a 1/3 chance of actually being true — assuming no logical errors on your part — then the chance your result really is correct is (1/3)^4 <0.01. With results like this it’s not going to be easy to get much depth that holds water.

[+] stefan_|5 years ago|reply

[deleted]

[+] jessriedel|5 years ago|reply

Summary tweet thread by the author: https://mobile.twitter.com/AlvaroDeMenard/status/13043994376...

[+] throwawaysea|5 years ago|reply

Threadreader version: https://threadreaderapp.com/thread/1304399437641461760.html

He mentions changing the threshold for significance as a possible tweak but the issue is something more fundamental. Humans have flaws - like political biases or a tendency to favor one’s own hypotheses (confirmation bias). Humans also operate within systems that have incentives that can motivate them away from truth seeking (publication bias). All this exacerbated the fundamental problem that statistical techniques are easy to manipulate. Virtually all academic (university) studies, in their published format, simply lack the necessary information, controls, and processes a reader would need to easily detect flawed statistical claims. Instead a reader has to blindly trust - assuming that data was not selectively included/excluded or that the parameters of the experiment were rigorously (neutrally) chosen or whatever else. There is no incentive for the academic world to correct for this - there isn’t for example, a financial consequence for a decision based on bad statistics, as a private company might face.

[+] throwawaysea|5 years ago|reply

I am glad this topic is getting attention. There is significant bias in academia in social science even outside flaws in statistical techniques. The field has been weaponized to build foundational support for political stances and blind institutional trust granted to academia is enabling it. This author mentions the implicit association test (IAT) as an example of a social science farce that is well known to be a farce, and notes that most social science work is undertaken in good faith.

However the damage has been done and it doesn’t matter if MOST work is done in good faith if the bad work has big impact. As an example, IATs have been used to make claims about unconscious biases and form the academic basis of books like “White Fragility” by Robin DiAngelo. Quillette wrote about problems with White Fragility and IAT as early as 2018 (https://quillette.com/2018/08/24/the-problem-with-white-frag...), and others continue to write about it even recently in 2020 (https://newdiscourses.com/2020/06/flaws-white-fragility-theo...). However few people are exposed to these critical analyses, and the flaws in the scientific/statistical underpinnings have not mattered, and they have not stopped books like White Fragility from circulating by the millions.

We need a drastic rethink of academia, the incentives within it, and the controls that regulate it to stop the problem. Until then, it’s simply not worth taking fields like social science seriously.

[+] mNovak|5 years ago|reply

Does anyone have links to the Replication Prediction Market results mentioned in the article? That sounds super interesting.

As an amusing nudge, I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict) and release that as a tool for authors to do some introspection on their experimental design (assuming they're not maliciously publishing junk).

[+] throwawaysea|5 years ago|reply

Here’s the paper, published only recently: https://royalsocietypublishing.org/doi/10.1098/rsos.200566

> I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict)

I am betting any such ML system could be gamed and addressing the issue would ultimately still need humans in the loop. For example, what if I am selective with my data, beyond the visibility of ML evaluating the final published paper? I don’t think this is “laughably easy” to predict. It may be easy to spot telltale signs today that predict replicability, but as soon as those markers are understood, I imagine authors will simply squeeze papers through the cracks in a different way.

Another issue is this bit from the author on Twitter:

> Just because it replicates doesn't mean it's good. A replication of a badly designed study is still badly designed. There are tons of papers doing correlational analyses yet drawing causal conclusions, and many of them will successfully replicate. Doesn't mean they're justified.

[+] thu2111|5 years ago|reply

IIRC from prior discussions of this, a lot of the accuracy of the markets comes from people just applying common sense - like, if a really surprising claim that people should really have noticed before now comes with a huge effect size, it's probably false. ML can't judge that because it doesn't have the ability to do basic sanity checks on claims like that. It takes a sceptical human with life experience to do that.

[+] Kednicma|5 years ago|reply

> Even if all the statistical, p-hacking, publication bias, etc. issues were fixed, we'd still be left with a ton of ad-hoc hypotheses based, at best, on (WEIRD) folk intuitions.

This is the quiet part which most social scientists, particularly psychologists, don't want to discuss or admit: WEIRD [0] selection bias massively distorts which effects are inherent to humans and which are socially learned. You'll hear people today crowing about how Big Five [1] is globally reproducible, but never explaining why, and never questioning whether personality traits are shaped by society; it's hard not to look at them as we look today at Freudians and Jungians, arrogantly wrong about how people think.

[0] https://en.wikipedia.org/wiki/Psychology#WEIRD_bias

[1] https://en.wikipedia.org/wiki/Big_Five_personality_traits

[+] teorema|5 years ago|reply

I'm not sure that psychologists really even make the distinction between "what is socially learned" and what is "inherent to humans" to be honest. I want to say no one really denies traits are influenced by social factors, but I'm sure you could find some citation to the contrary somewhere.

The Big Five are pretty reproducible in part or in whole, but it's strawman to say psychologists are "never questioning whether personality traits are shaped by society." That's not just not true, nor is it even clear what that question means. Go to Google Scholar and search for "Big Five" and terms like "measurement invariance" or "cultural" or "social" or "societies" and take a look.

The Big Five are meant to be descriptive, the "why" is a different issue. (Just to explain it a different way, let's say you do unsupervised learning of cat images, and find over and over and over and over and over again over decades and different databases that the algorithms always return the same 5 types of cats, plus or minus a little. Wouldn't you make a note of it if you were interested in visual types of cats?) And it's important to remember that some consensus around the Big Five wasn't really until the 90s (even today I'm not even sure there's "consensus" around the Big Five).

I agree that there's a problem with selection of participants, but the only way to do that is to increase participation of the scientific community worldwide. And there are whole fields (cultural psychology) dedicated to the problems surrounding this issue.

The Freudian comparison is also worth commenting on in two respects: first, Freudians got in trouble for not pursuing falsifiable empirical research, which is simply not the case for the things you're talking about. Second, everyone loves to hate on Freud, but the basic tenets of unconscious versus conscious processes that sometimes conflict are still a bedrock of neurobehavioral research, including two-system theories ("fast and slow"), which won someone a Nobel prize and is a darling of cognitive researchers. There are legitimate discussions to be have about the utility of two-system theories but those discussions are far more sophisticated than the criticisms I think you're referring to.

[+] barry-cotter|5 years ago|reply

The Big Five is a much less impressive accomplishment than you’d think for how much people talk about it.

https://carcinisation.com/2020/07/04/the-ongoing-accomplishm...

> The interesting thing about the Five Factor Model is what it gets away with, in terms of being considered a theory, even though it is not causal, and makes no predictions. What counts as a “replication” of the Five Factor Model, as in Soto (2019), is the following: a correlation is found between one or more factors of the Five Factor Model and some other construct, and that correlation is found again in another sample, regardless of the size of the correlation. In almost all cases, and in 100% of Soto (2019)’s measures, the construct compared to a Big Five factor is derived from an online survey instrument.

> What counts as a “consequential life outcome” is also fascinating. In most cases, the life outcome constructs are vague abstractions measured with survey instruments, much like the Big Five themselves. For instance, the life outcome “Inspiration” is measured with the Inspiration Scale, which asks the subject in four ways how often and how deeply inspired they are. Amazingly, this scale correlates a little bit with Extraversion and with Open-mindedness. Do these personality traits “predict” the life outcome of inspiration? Is “Inspiration” as instrumentalized here meaningfully different from the Big Five constructs, such that this correlation is meaningful?

[+] konjin|5 years ago|reply

The people who use Hanlon's razor to explain away malice are both incompetent and malicious. Only someone who is an idiot would ever think to use 'I'm very stupid' as an excuse or explanation why they did something very damaging. If you are smart enough to realize you are incompetent after the fact you were smart enough to realize it before the fact, and that means you were malicious in not recusing yourself.

[+] garden_hermit|5 years ago|reply

I guess I fall under the field of "Progress Studies" though I think I'm much less concerned with the replication crisis than most.

Most new social science research is wrong. But the research that survives over time will have a higher likelihood of being true. This is because a) it is more likely to have been replicated, b) its more likely to have been incorporated into prevailing theory, or even better, have survived a shift in theory, and c) is more likely to have informed practical applications or policy, with noticeable effect.

Physics and other hard sciences have a quick turnaround from publication to "established knowledge". But good social science is Lindy. So skip all the Malcolm Gladwell books and fancy psych findings, and prioritize findings that are still in use after 10 or 20 years.

[+] karaterobot|5 years ago|reply

> This is because a) it is more likely to have been replicated, b) its more likely to have been incorporated into prevailing theory, or even better, have survived a shift in theory

Not if this article is to be believed! He claims that studies that could not be replicated are about as likely to be cited as studies which are. That implies the problem may instead get worse and worse, the structure more and more shaky as time goes on.

[+] scythe|5 years ago|reply

It's common to see this topic: what's "wrong" with social science. But there are always some things wrong with every science. If nothing was wrong, there wouldn't be any science left to do.

Social science asks more of us than any other science. Physics demands that we respect electricity and don't increase the infrared opacity of the atmosphere. Chemistry requires that we not emit sulfur and nitrogen compounds into the air. But social sciences will not rarely call for the restructuring of the whole society.

This is the "problem" with social science, or more properly, with the relationship between the social sciences and the society at large. When we call for "scientific" politics, it is a relatively small ask from the natural sciences, but it is a revolution -- even the social scientists themselves use this word -- when the social sciences are included in the list (Economics is no different). Psychology, as usual, falls somewhere in between.

So the relationship between the social scientists and the politicians may never be as cordial as the relationship between the natural sciences and the politicians. The "physics envy", where social scientists lament that they do not receive the kind of deference that natural scientists do, will have to be tempered by the understanding that the cost of such deference differs widely.

(All of this is ignoring that physics had a 200-year head start)

[+] teorema|5 years ago|reply

Social scientists turn the microscope on themselves also. When the microscope turns elsewhere you see similar patterns to differing extents (cf. recent articles on reanalysis of fMRI data, pharmacology replication rates, Theranos or hydroxychloroquine).

Meta-science has always been the gift of social science. This will all eventually funnel down elsewhere, just like meta-analysis.

But you're right, in that social science hits very close to home, more so than other sciences. Imagine that it suddenly worked very very well, and someone in the field of neuropsychology could manipulate behavior just like you might a lightbulb. Isn't that what critics are really asking for?

[+] konjin|5 years ago|reply

>Physics demands that we respect electricity and don't increase the infrared opacity of the atmosphere.

Physics does no such thing. It tells us that increasing the heat retained in the atmosphere increases the planets surface temperature. It is a descriptive science. Not a prescriptive one. Wanting to have industrial civilization possible in the next century is why you don't increase the infrared opacity of the atmosphere. But that is a value judgment far outside the scope of physics, and one social sciences claim is theirs by right of ... something.

The metaphors people use to think about the natural world are terrible, or as Carl Sagan put it Demon-Haunted.

The reason why physics, and other hard sciences, are so useful and respected is that you can switch dependent and independent variables around with a lot of success.

If I have the ideal gas law:

PV = nRT

Then I can rearrange it and be fairly confident it still works.

P = nRT/V

If you are an engineer this is a godsend. You want to set a hard value for P but can only directly control V or T? Try the second equation! You have a chance at succeeding without having to spend decades building machines that blow up and kill everyone around them!

Politicians see that and are jealous. Surely if those lame eggheads can get things to work like that we can too. So the social sciences give you equations as well. After a bunch of statistics we see that:

time spent in school = a*wealth - c

We can't control wealth, but we can control how long people spend in school:

wealth = (time spend in school + c)/a

So if we force everyone to stay in school until they are 50 everyone will have 20 million dollars in their bank accounts.

And to anyone who asks how this works, politicians say: Why are you against science and hate poor people?

[+] tomrod|5 years ago|reply

> Stupidity: they can't tell which papers will replicate even though it's quite easy.

I am not familiar with this work. What exactly makes a paper predictably replicatible?

[+] itsdrewmiller|5 years ago|reply

There is a footnote for that claim - https://journals.sagepub.com/doi/full/10.1177/25152459209196... - but basically "things that are hard to believe" and "things that barely passed statistical analysis" (high p-values).

[+] emmelaich|5 years ago|reply

Of malice vs stupidity, I'm pretty certain it's stupidity. Or more precisely, self-delusion.

The story of Millikan's oil drop experiment replications and also James Randi's (and CSICOP's) battle with pseudo-scientists convince me of this.

[+] luckylion|5 years ago|reply

There's probably a mix of both. At some point, most people probably realize that there's something fundamentally wrong - but by then, they're a few decades in and too much of their career and personal life depends on it being true, so openly changing course is very daunting. When your identity and career depends on something wrong being considered true, you have no incentive to point out problems, and every incentive to mislead.

[+] DrNuke|5 years ago|reply

Shameless plug with the ten relevant problems I scooped from a very recent literature review: interculturalism, introspection, truth, authenticity, human enhancement, critical thinking, technocracy, privilege, ethics, higher education. Link to free intro: https://www.tenproblems.com/2020/08/01/ten-problems-for-soci...

[+] jonnycomputer|5 years ago|reply

That's a lot to say before even the replication results are actually out.

[+] tkelemen|5 years ago|reply

It's not science.

[+] golemiprague|5 years ago|reply

[deleted]

[+] solinent|5 years ago|reply

The epistemal value of epidemiological studies is very low. Nothing can be done to fix it.

101 comments