In the previous essay I discussed how to assess experts. In addition to arguing from the views of experts, people also make arguments based on studies (and experiments). Using a study in an argument is reasonable, but making a good argument from a study requires being able to rationally assess it.
Not surprisingly, people often select the studies they believe based on fallacious reasoning. One erroneous approach is to favor a study simply because it agrees with what one already believes. This gets things backwards: inferring that a study is right because I believe its results reverses the proper order. It should first be established that the study is plausible; only then is it reasonable for me to believe it.
Another erroneous approach is to accept a study as correct because one wants it to be so. For example, a liberal might accept a study that claims to prove that liberals are smarter and more generous than conservatives. This sort of “reasoning” is the classic fallacy of wishful thinking. Wishing that something is true (or false) does not prove that the claim is true (or false).
Sometimes people attempt DIY “studies” by appealing to their own anecdotes. For example, someone might claim that poor people are lazy based on an experience with some poor person. While anecdotes can be interesting, to take an anecdote as evidence is to fall victim to the classic fallacy of anecdotal evidence: a single experience cannot support a general claim.
While fully assessing a study requires expertise in the relevant field, non-experts can still make rational evaluations. The following provides a concise guide to evaluating studies and experiments.
In normal talk, people often lump studies and experiments together. While this is fine for informal purposes, the distinction is important. A properly done controlled cause-to-effect experiment is the gold standard of research, although it is not always a viable option.
The objective of such an experiment is to determine the effect of a cause, and this is done by the following general method. First, a random sample is selected from the population. Second, the sample is split into two groups: the experimental group and the control group. The two groups need to be as alike as possible; the more alike they are, the better the experiment.
The experimental group is then exposed to the causal agent while the control group is not. Ideally, that should be the only difference between the groups. The experiment then runs its course, and the results are examined to determine if there is a statistically significant difference between the two. If there is such a difference, then it is reasonable to infer that the causal factor brought about the difference.
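As a rough sketch (with made-up IDs standing in for the members of a real population), the sampling and splitting steps might look like this in Python:

```python
import random

# Hypothetical stand-ins for the members of the population.
population = list(range(1000))

rng = random.Random(42)               # fixed seed so the sketch is reproducible
sample = rng.sample(population, 100)  # step 1: draw a random sample
rng.shuffle(sample)                   # step 2: shuffle, then split into two groups
experimental, control = sample[:50], sample[50:]

# Random assignment makes the two groups alike on average, which is the
# point of the "as alike as possible" requirement.
print(len(experimental), len(control))  # prints: 50 50
```

Random assignment does not guarantee the groups are identical in any one run, only that there is no systematic difference between them.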
Assuming that the experiment was conducted properly, whether the results are statistically significant depends on the size of the sample and the size of the difference between the control group and the experimental group. The idea is that experiments with smaller samples are less able to reliably capture effects. As such, when considering whether an experiment shows a causal connection, it is important to know the size of the sample. After all, the difference between the experimental and control groups might be large yet still not significant. For example, imagine an experiment that involves 10 people: 5 get a diet drug (the experimental group) while 5 do not (the control group). Suppose that those in the experimental group lose 30% more weight than those in the control group. While this might seem impressive, it is not statistically significant: with a sample this small, the difference could be due entirely to chance.
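To make the example concrete, here is a rough sketch of an exact permutation test in Python, using made-up weight-loss numbers chosen so the experimental group loses 30% more on average. The test asks: if the group labels had been handed out purely by chance, how often would the difference between the groups be at least as large as the one observed?

```python
from itertools import combinations

# Made-up weight-loss results (tenths of a kg) for a 10-person trial.
control      = [20, 30, 15, 25, 20]   # mean 22.0 (group sum 110)
experimental = [35, 20, 30, 25, 33]   # mean 28.6 (group sum 143): 30% more

pooled = control + experimental
total_sum = sum(pooled)                        # 253
observed = sum(experimental) - sum(control)    # 33

# Exact one-sided permutation test: over all 252 ways to pick which 5 of
# the 10 results get labeled "experimental", how often is the difference
# in group sums at least as large as the one actually observed?
at_least_as_large = 0
labelings = list(combinations(range(10), 5))
for group in labelings:
    s = sum(pooled[i] for i in group)
    if s - (total_sum - s) >= observed:        # same as comparing group means
        at_least_as_large += 1

p_value = at_least_as_large / len(labelings)
print(f"p-value: {p_value:.3f}")   # prints: p-value: 0.087
```

Even though a 30% difference sounds large, a difference at least that big shows up in almost 9% of the chance labelings, so at the conventional 0.05 threshold the result would not count as statistically significant. With larger groups, the same relative difference would be much harder to produce by chance alone.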
While the experiment is the gold standard, there are cases in which it would be impractical, impossible or unethical to conduct an experiment. For example, exposing people to pathogens to test their effects would be immoral. In such cases studies are used rather than experiments.
One type of study is the Nonexperimental Cause-to-Effect Study. Like the experiment, it is intended to determine the effect of a cause. The difference between the experiment and this sort of study is that those conducting the study do not expose the experimental group to the suspected cause. Rather, those selected for the experimental group were exposed to the suspected cause by their own actions or by circumstances. For example, a study of this sort might include people who were exposed to pathogens by accident. A control group is then matched to the experimental group and, as with the experiment, the more alike the groups are, the better the study.
After the study has run its course, the results are compared to see if there is a statistically significant difference between the two groups. As with the experiment, merely having a large difference between the groups need not be statistically significant.
Since a study of this sort relies on an experimental group that was exposed to the suspected cause by its members' own actions or by circumstances, the study is weaker (less reliable) than the cause-to-effect experiment. After all, in the study the researchers must work with whatever groups they can find rather than constructing them as in a proper experiment.
In some cases, what is known is the effect and what is not known is the cause. For example, we might know that there is a new illness but not know what is causing it. In these cases, a Nonexperimental Effect-to-Cause Study can be used to try to sort things out.
Since this is a study rather than an experiment, those in the experimental group were not exposed to the suspected cause by those conducting the study. In fact, the cause is not known, so those in the experimental group are those showing the effect.
Since this is an effect-to-cause study, the effect is known, but the cause must be determined. This is done by running the study and determining whether some suspected causal factor occurs at a statistically significant rate among those showing the effect. If such a factor is found, then it can be tentatively taken as a causal factor, one that will probably require additional study. As with the other methods, the statistical significance of the results depends on the size of the study, which is why a study of adequate size is important.
Of the three methods, the effect-to-cause study is the weakest (least reliable). One reason for this is that those showing the effect might be different in important ways from the rest of the population. For example, a study that links cancer of the mouth to chewing tobacco faces the complication that those who chew tobacco are often ex-smokers. As such, smoking might be the actual cause rather than the chewing. Sorting this out would require a further study of chewers who are not ex-smokers.
It is also worth referring back to my essay on experts: when assessing a study, it is important to consider the quality of the experts conducting it. If those conducting the study are biased, lack expertise, and so on, then the study is less credible. If those conducting it are proper experts, then that increases the credibility of the study.
As a final point, there is also a reasonable concern about psychological effects. If an experiment or study involves people, what people think can influence the results. For example, if an experiment is conducted and one group knows it is getting pain medicine, the people might be influenced to think they are feeling less pain. To counter this, the common approach is a blind study/experiment in which the participants do not know which group they are in, often by the use of placebos. For example, an experiment with pain medicine would include “sugar pills” for those in the control group.
Those conducting the experiment can also be subject to psychological influences, especially if they have a stake in the outcome. As such, there are studies/experiments in which those conducting the research do not know which group is which until the end. When neither the researchers nor the participants know which group is which, it is a double-blind experiment/study.
Overall, here are some key questions to ask when assessing a study:
- Was the study/experiment properly conducted?
- Was the sample size large enough?
- Were the results statistically significant?
- Were those conducting the study/experiment experts?
- Was the study/experiment blind or double-blind (when people were involved)?