The Current Status of Observational Studies as Scientific Evidence: A Critical Appraisal.

By Steven Bratman, MD, MPH

Last Update: June 22, 2010


Purely observational studies have long been considered problematic as a source of cause/effect conclusions and, for this reason, have been placed below experimental studies in the hierarchy of evidence. However, in many important areas of medicine and public health, experimental studies (specifically randomized controlled trials) are impractical to perform. In consequence, observational studies are often the primary source of public health recommendations. This practice has recently been called into question by the results of several large randomized controlled trials, which not only failed to confirm predictions based on observational studies but in some cases inverted them, transforming expectation of benefit into a discovery of harm. In this article, I interrogate the use of epidemiologic evidence as a source of “evidence-based” public health advice.


In the late 1990s, while evaluating and synthesizing evidence-based information on alternative medicine, my collaborators and I came to view purely epidemiologic research with great skepticism. This attitude derived in part from the history of natural supplements. A vast array of such supplements had come into popular use based on inferences drawn from large population studies, but when tested in randomized controlled trials (RCTs), most proved ineffective. The subsequent findings of the Women’s Health Initiative regarding hormone replacement therapy further strengthened our impression that the results of large population studies cannot be relied upon as medical evidence for the efficacy of a treatment. Therefore, in the electronic database we created, supplement recommendations grounded only in observational studies were classified as “Category C: Lacking Any Reliable Supporting Evidence.” The only category occupying a lower position was “Category D: Meaningful Evidence Against Efficacy.”

However, when I entered a Master of Public Health program in 2008, I found that observational study results were routinely used as sufficient evidence for issuing prescriptions to society at large. Initiatives such as Healthy People 2010 and the Dietary Guidelines for Americans, for example, consist in large part of recommendations based on observational studies alone. I found this startling and disturbing; it seemed that much of official public health policy was based on the type of bad science used to justify ineffective alternative medicine treatments. 

Nonetheless, the fact that such studies are taken so seriously in the field of public health suggested that perhaps they had more scientific value than I had previously assumed. I, therefore, began to interrogate the issue at a deeper level. This paper is the result.

When it comes to human disease and health, observational studies are generally called “epidemiological” studies. In this paper, we shall use the two terms somewhat interchangeably. The related term “population study” refers to a large observational study.

Introduction: From John Stuart Mill to the Women’s Health Initiative

The superiority of experiment over mere observation was established in John Stuart Mill’s 1843 work, A System of Logic, Ratiocinative and Inductive. The modern randomized controlled trial (RCT), a form of experiment, is grounded in Mill’s reasoning, as refined by the work of the statistician RA Fisher and the epidemiologist Bradford Hill. Fisher used mathematical arguments to show that randomization is necessary to allow valid calculations regarding statistical significance. Bradford Hill, on the other hand, emphasized the value of random assignment to eliminate systematic selection bias.

Bradford Hill is much better known, however, for his work in observational studies, where the canonical “Bradford Hill criteria” are widely utilized to draw causal conclusions in non-experimental settings. The Framingham Heart Study was one of the first of the large population studies to which these principles were systematically applied. Interpretation of the Framingham findings led to indictments of cigarettes and high blood pressure that have subsequently been accepted as correct. Many other recommendations came out of Framingham as well, such as that it is important to avoid eating eggs, to reduce salt intake, and to avoid saturated fat. These conclusions, however, have proved less robust. Beginning in the 1990s, writer Gary Taubes published a series of award-winning articles in the journal Science skewering the evidence used to support these recommendations. The egg recommendation has by now fallen; those on salt and saturated fat are in the process of being quietly withdrawn.

At the same time, women’s health groups such as the Women’s Health Collective had begun to demand randomized controlled trials of hormone replacement therapy (HRT). In those days, HRT was widely prescribed to healthy women on the theory that it reduces risk of cardiovascular disease. However, while the FDA ordinarily requires positive results in several large randomized, double-blind, placebo-controlled trials prior to drug approval, HRT had been approved for cardiovascular disease prevention based on observational studies alone. It would seem, members of the collective pointed out, that a drug given to healthy people properly requires a higher rather than a lower level of evidence than a drug given to people with clear disease. These complaints finally led to the initiation of several large RCTs. In 1998, the results of the first of these became available. Although the data from the Heart and Estrogen/Progestin Replacement Study (HERS) indicated that HRT does not work, influential epidemiologists continued to defend the use of HRT on the basis of observational evidence alone.

Then came the results of the Women’s Health Initiative (WHI). This mega-RCT showed that epidemiologists had not only gotten it wrong; they had gotten it exactly backwards. Rather than protecting women from cardiovascular disease, it turned out that use of HRT increases cardiovascular risk. The WHI at last provided a shock sufficient to provoke a vigorous debate on the value of evidence drawn from observational studies.

This field is still in flux. The current state of this dialogue is the subject of this paper. 


