Scientific Inquiry — Conceiving and Testing Hypotheses

Abstract

Scientists seek to understand how the world works by the creative act of generating a possible explanation for some phenomenon and then by testing that possible explanation to determine whether it is valid. Transforming a creative insight into a hypothesis suitable for rigorous testing is an essential skill that any scientist must develop. Hypotheses are tested in a scientific study by measuring the natural world in some way and then comparing those measurements to determine whether they support or refute the hypothesis. A well-designed study reduces the possibility that chance or scientific bias accounts for the results of the study. The influence of chance can be reduced by having at least 30 test subjects and 30 controls in a study. Bias is reduced by diligent and honest effort to make the members of the study group so similar to the members of the control group that nothing distinguishes one from the other except the factor being studied. Prospective cohort studies are powerful but expensive tools for studying common diseases or injuries. Retrospective case-control studies are ideal for studying rare entities. Retrospective studies tend to be inexpensive but often lack desirable details that were not recorded at the time of the initial case investigation. For forensic practitioners to honestly call themselves scientists they should perform original scientific research to substantiate hypotheses and advance knowledge in forensic disciplines. Useful research in forensic pathology can be performed with no more investment than time and effort, but financial support is available from several sources.

Keywords

Forensic pathology; Forensic sciences

Introduction

“What is truth?” This simple question lies at the heart of mankind's most profound attempts to understand the world in which we live and our place in that world. Artists seek to reveal truth in one way, scientists in another. People often think of art and science as distinct, even mutually exclusive, disciplines, as shown by the common habit of describing a person or task as “right brained” or “left brained.” Such an absolute separation of creativity and logic suggests that few people recognize how scientific inquiry is actually accomplished. This article discusses the nature of proper scientific inquiry, which requires first a creative solution to a puzzling dilemma and then logical and critical evaluation of that proposed solution. These principles apply to any scientific discipline, but in light of the 2009 call by the National Research Council and the National Academy of Sciences for scientific rigor in the forensic sciences, they are particularly germane to practitioners of any forensic science (1).

Discussion

Creativity in science

Scientists do not seek knowledge per se. Scientists seek understanding, an understanding of how the world works. Once a scientist understands how or why something occurs, that understanding is added to the framework of knowledge. Working from this enlarged framework extends our reach as we make further inquiries into the workings of the world. This is what Isaac Newton meant when he said, “If I have seen further, it is only by standing on the shoulders of giants” (2).

The creative aspect of scientific inquiry is perfectly displayed by efforts to understand what happens when something burns. Five hundred years ago everyone knew that wood burned and iron rusted, but no one understood the chemical process involved in combustion. The alchemist Johann Joachim Becher proposed that an inflammable substance existed in combustible matter and that burning occurred as this substance, later called phlogiston, was consumed. Roughly a century passed before careful observation by several men showed that burned metals did not lose weight, as the phlogiston theory would predict, but rather gained weight. This increased mass came from a previously unrecognized gas, oxygen, whose role in combustion Antoine-Laurent Lavoisier established in a series of experiments (3).

The creative act of scientists is to think of an explanation for how or why something occurs in nature (4). For this reason we should not use hindsight to criticize Becher for proposing the theory of phlogiston. No one prior to Becher had offered any explanation for why things burned. Becher took a creative leap into a void and thought of an explanation that made sense to him. Equally important, Becher then shared this proposed explanation, or hypothesis, with others. Becher's hypothesis happened to be wrong, but what of it? Many hypotheses are wrong. Before Becher there was no hypothesis for why things burned, and without a hypothesis to test no progress was made in understanding the process of oxidation. Becher was right to imagine a hypothesis that not only explained an observation in the natural world but could also be tested by experimental observation.

Hypotheses and hypothesis testing

A flash of insight can lead to a hypothesis about how the world works. Forming that insight into a hypothesis suitable for rigorous testing is an essential skill that all scientists must develop. Hypotheses are tested by measuring the natural world in some way so that observations can be compared to each other. The observations being compared may be made before and after a specific event of interest, as when a substance was weighed before and after combustion in the phlogiston experiments. The observations may also be made concurrently, as in a rack of test tubes exposed to differing reagents. Measurement may be categorical (one either has cancer or not) or continuous (mass, age, height, etc. can take many possible values) (5). Different study designs and statistical analyses are appropriate depending upon factors such as the number of observations or number of variables, but in all cases it is necessary to generate a hypothesis that can be tested by measurement and comparison.
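As a minimal illustration of the two kinds of measurement, the sketch below summarizes a categorical variable by counts and proportions and a continuous variable by its mean and sample standard deviation. The records and the field names `has_cancer` and `age_years` are hypothetical, invented only for this example.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical records (illustrative values, not data from any study):
# one categorical and one continuous measurement per subject.
subjects = [
    {"has_cancer": True, "age_years": 64.0},
    {"has_cancer": False, "age_years": 51.0},
    {"has_cancer": False, "age_years": 47.0},
    {"has_cancer": True, "age_years": 70.0},
]


def summarize_categorical(values):
    """Return {category: (count, proportion)} for categorical data."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}


def summarize_continuous(values):
    """Return (mean, sample standard deviation) for continuous data."""
    return mean(values), stdev(values)


cancer = summarize_categorical([s["has_cancer"] for s in subjects])
age_mean, age_sd = summarize_continuous([s["age_years"] for s in subjects])
print(cancer)                      # {True: (2, 0.5), False: (2, 0.5)}
print(age_mean, round(age_sd, 2))  # 58.0 10.8
```

Categorical outcomes are usually compared as proportions, continuous ones as means, which is one reason the choice of measurement shapes the choice of statistical analysis.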

Let us return to phlogiston for an example of how to construct a hypothesis that can be tested. Becher supposed that phlogiston was odorless, tasteless, colorless, and otherwise undetectable. It is hard to conceive of a test that could measure such a substance, and thus the hypothesis below would be a poor hypothesis.

H: Phlogiston is an undetectable substance that is present in substances that burn.

The hypothesis to test the phlogiston theory instead depended upon the theory's prediction that phlogiston was consumed as a substance burned and that burning ended when all the phlogiston was gone. Therefore, one would expect a substance that burned to weigh less after it burned than it had weighed before it burned. Thus the proper hypothesis to test the phlogiston theory was

H₀: After a substance burns it will weigh less than it weighed before it was burned.

This hypothesis, called the null hypothesis (Hnull or H₀), is the hypothesis that is presumed to be true unless the evidence shows otherwise. Truth is hard to establish, however. (It is for this reason that defendants are declared “not guilty” rather than “innocent.”) It is far easier to disprove something than it is to prove a thing true. For this reason scientists do not try to prove the null hypothesis directly. Instead they state an alternate hypothesis (H_a or H_alt), a hypothesis that contradicts the null hypothesis, and ask whether the evidence is strong enough to reject the null hypothesis in its favor. The alternate hypothesis to test the phlogiston theory is

H_a: After a substance burns it will weigh more than it weighed before it was burned.

Remember that the null hypothesis is presumed true until the evidence shows otherwise. The experiment or study is performed to learn whether the observations favor the alternate hypothesis strongly enough to reject the null hypothesis. If they do not, the scientist is left with no reason to reject the presumed truth of the null hypothesis. In the case of phlogiston, however, the alternate hypothesis turned out to be correct, and so the null hypothesis was rejected, along with the phlogiston theory as an explanation for combustion.
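The before-and-after weighing can be framed as a simple one-sided test. The sketch below uses an exact sign test, a technique chosen here for illustration and not necessarily what the historical investigators did: each burned sample is scored only by whether it gained weight, and the calculation uses the boundary case of the null hypothesis, in which a sample is as likely to gain weight as to lose it. The function name `sign_test_gain` and the sample counts are hypothetical.

```python
from math import comb


def sign_test_gain(n_samples: int, n_gained: int) -> float:
    """One-sided exact sign test for weight gain after burning.

    Each of n_samples burned samples is scored as "gained" or "lost" weight
    (samples with no change are assumed to have been dropped beforehand).
    Under the boundary case of the null hypothesis, each sample gains weight
    with probability 0.5, so the number gaining is Binomial(n_samples, 0.5).
    Returns P(at least n_gained of n_samples gain weight).
    """
    if not 0 <= n_gained <= n_samples:
        raise ValueError("n_gained must be between 0 and n_samples")
    tail = sum(comb(n_samples, k) for k in range(n_gained, n_samples + 1))
    return tail / 2 ** n_samples


# Hypothetical outcome: all 10 of 10 burned samples gained weight.
p = sign_test_gain(10, 10)
print(p)  # 0.0009765625 (= 1/1024)
```

A p value this small would give the investigator good reason to reject the null hypothesis that burning reduces weight.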

The concepts of null and alternate hypotheses are confusing when first encountered, much like those biochemical pathways in which inhibiting an inhibitory pathway results in stimulation. Nevertheless, it is necessary to understand these concepts in order to generate testable hypotheses.

Study outcomes

Scientists attempting to establish the truth of the phlogiston theory were unable to support the theory, and so the phlogiston theory was refuted. This is one possible outcome of a study. The other outcome is that the results of a study show that the theory may be correct. Five possible explanations exist for a scientific study whose outcome is in keeping with the hypothesis being tested. These explanations are:

Chance

Bias

Confounding

Fraud

Causation

It may seem premature to consider outcomes before considering study design, but knowing potential pitfalls will help in designing a study to avoid those pitfalls.

Chance can always explain results. If one throws a die once, the chance of rolling a 5 is 1 in 6. The chance of rolling two 5's in a row is 1 in 36 (1/6 × 1/6). The chance of rolling four 5's in a row is 1 in 1296, an unlikely but possible occurrence. Because of chance, statistical analysis is necessary to estimate the likelihood that chance alone could have led to the outcome observed in a scientific study. A customary cutoff in scientific writing is p < 0.05, that is, if chance alone were at work, a result at least as extreme as the one reported would be expected less than 1 time in 20. A p value of 0.001 (1 in 1000) makes chance an even less likely explanation for a result, but however unlikely, it is always possible that the outcome occurred by chance alone. By itself a low p value does not make for excellent science or important results, but without a low p value chance cannot reasonably be excluded as an explanation for the results.
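The die arithmetic can be checked both exactly and by simulation. In the sketch below `streak_probability` multiplies the 1-in-6 chance once per throw, and `simulate_streak` estimates the same quantity by throwing a simulated die many times; both function names and the fixed random seed are illustrative choices. The simulation makes the paragraph's point: a rare streak still turns up occasionally when chance alone is at work.

```python
import random
from fractions import Fraction


def streak_probability(rolls: int, sides: int = 6) -> Fraction:
    """Exact probability that `rolls` throws of a fair die all show one chosen face."""
    return Fraction(1, sides) ** rolls


def simulate_streak(face: int, rolls: int, trials: int = 100_000,
                    sides: int = 6, seed: int = 1) -> float:
    """Estimate the same probability from repeated simulated throws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A trial counts only if every throw shows the chosen face.
        if all(rng.randint(1, sides) == face for _ in range(rolls)):
            hits += 1
    return hits / trials


for n in (1, 2, 4):
    print(n, streak_probability(n))      # 1 1/6, then 2 1/36, then 4 1/1296
print(simulate_streak(face=5, rolls=2))  # close to 1/36, about 0.028
```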

As the example of throwing a die shows, an unusual outcome is more likely to arise by chance when there are few trials, whether the trials are throws of a die or the subjects in a study. Formulae exist for determining the proper number of subjects for a study, but the central limit theorem provides a handy benchmark of at least 30 subjects for a study in order to make chance an unlikely explanation for an outcome (5–6). Having more subjects in a study not only reduces the likelihood of a result occurring by chance; it also narrows the standard error of the mean (the standard deviation divided by the square root of the number of subjects), so that estimates become more precise.
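Two of these points can be made concrete with a short sketch. `standard_error` shows how precision improves with more subjects, and `n_per_group` applies one common normal-approximation sample-size formula for comparing two means. The formula assumes two independent groups of equal size, a standard deviation known in advance, and a two-sided test; it is offered as one example of the formulae mentioned above, not as the method the cited texts prescribe.

```python
from math import ceil, sqrt
from statistics import NormalDist


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / sqrt(n)


def n_per_group(delta: float, sd: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed per group to compare two means.

    Normal-approximation formula, assuming two independent groups of equal
    size, a common standard deviation `sd` known in advance, a two-sided test
    at significance level `alpha`, and a true difference in means `delta`.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


print(standard_error(10.0, 25), standard_error(10.0, 100))  # 2.0 1.0
print(n_per_group(delta=0.5, sd=1.0))                       # 63
```

Quadrupling the number of subjects halves the standard error. The formula also shows that the number required depends on the size of the difference one hopes to detect, which is why 30 subjects serves as a benchmark rather than a universal answer.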

Bias in science refers to the failure to treat all subjects in a study equally. Control groups, placebos, and blinding of scientists and subjects to every detail of the study are all necessary to try to prevent bias. Of course, someone has to design the study, and that person must know what procedures are being used to choose the study and control groups. To minimize the possibility of bias the study and control groups should be as alike as possible except for the factor that is the subject of the study. Members of the control group are commonly matched to the members of the study group by age and sex, but depending on the subject of the study it might be appropriate to match by other variables such as body mass, education attained, race, exercise regimen, smoking habits, presence of hypertension, history of drug abuse, history of being arrested, etc. The list is endless. As a rule of thumb, except for the factor being studied, a researcher should strive to make the members of the study group so similar to the control group that not even an attorney could reasonably claim that one group had some advantage that the other group lacked.
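The matching described above can be sketched as a simple greedy procedure. This is one approach among several, assumed here for illustration: it matches exactly on sex and on age within a hypothetical five-year window, and the `Subject` record and `match_controls` function are illustrative names, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    subject_id: str
    age: int
    sex: str


def match_controls(cases, candidates, max_age_gap=5):
    """Greedy 1:1 matching: same sex, age within max_age_gap years.

    Each candidate control is used at most once, and the closest eligible
    age is preferred. Cases with no acceptable control are returned
    separately rather than forced into a poor match.
    """
    available = list(candidates)
    matched, unmatched = [], []
    for case in cases:
        eligible = [c for c in available
                    if c.sex == case.sex and abs(c.age - case.age) <= max_age_gap]
        if not eligible:
            unmatched.append(case)
            continue
        best = min(eligible, key=lambda c: abs(c.age - case.age))
        available.remove(best)
        matched.append((case, best))
    return matched, unmatched


# Hypothetical subjects for illustration only.
cases = [Subject("C1", 45, "F"), Subject("C2", 70, "M")]
controls = [Subject("K1", 47, "F"), Subject("K2", 44, "F"), Subject("K3", 60, "M")]
pairs, leftover = match_controls(cases, controls)
print([(c.subject_id, k.subject_id) for c, k in pairs])  # [('C1', 'K2')]
print([c.subject_id for c in leftover])                  # ['C2']
```

Greedy matching depends on the order in which cases are processed; optimal matching methods that minimize the total age difference over all pairs exist but add complexity. Additional matching variables, such as smoking habits, would be further conditions in the `eligible` filter.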

Confounding is easier to explain by example than to define formally; in brief, it occurs when a third factor, associated with both an apparent cause and an outcome, creates a misleading association between them. One could easily show that individuals who develop lung cancer have a lighter in their pocket or purse more often than individuals who do not develop lung cancer. One might then falsely conclude that lighters cause lung cancer. Lighters do not cause cancer. Instead, carrying a lighter is associated with the true culprit, smoking. Smoking is the confounding factor: because smokers are more likely both to carry lighters and to develop lung cancer, smoking creates an apparent link between the two, and an investigator who overlooks smoking mistakes the lighter for the cause. This example seems ridiculously obvious because we have some understanding of the etiology of lung cancer, but when studying a process that is not yet understood it is much easier to be led astray by a confounding factor.
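A standard way to check for a suspected confounder is to stratify on it and compare the crude association with the association within each stratum. The sketch below does this for the lighter example using invented counts, constructed so that lung cancer risk depends only on smoking; the Mantel-Haenszel estimate pools the stratum-specific odds ratios. Function names and counts are hypothetical.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return Fraction(a * d, b * c)


def mantel_haenszel_or(strata):
    """Odds ratio pooled across strata, each given as (a, b, c, d)."""
    num = sum(Fraction(a * d, a + b + c + d) for a, b, c, d in strata)
    den = sum(Fraction(b * c, a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts. Exposure = carries a lighter; outcome = lung cancer.
# Cancer risk depends only on smoking: 10% in smokers, 1% in nonsmokers.
smokers = (80, 720, 20, 180)
nonsmokers = (1, 99, 9, 891)
crude = tuple(s + n for s, n in zip(smokers, nonsmokers))

print(float(odds_ratio(*crude)))                      # about 3.65: lighters look harmful
print(odds_ratio(*smokers), odds_ratio(*nonsmokers))  # 1 1: no effect within strata
print(mantel_haenszel_or([smokers, nonsmokers]))      # 1
```

Within each smoking stratum the odds ratio is 1, so the crude association between lighters and lung cancer is explained entirely by smoking.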

Fraud is unusual, but the stakes for researchers and research grants are high, and fraud does sometimes occur. Announcements of fraud tend to come a few years after announcements of spectacular results that seem too good to be true. Because research involves replication of experiments performed elsewhere, the process of science tends to expose fraud, especially for claims too good to be true.

Once chance, bias, confounding, and fraud have been excluded as explanations for the results of a study, the only explanation left is causation, that is, the hypothesis is correct. One study is insufficient to convince everyone of the truth of a theory, but repeated studies and time may validate a theory.

Study designs for hypothesis testing

Many types of scientific studies exist, each with an appropriate means of statistical analysis. A thorough review of study types is beyond the scope of this article; interested readers may refer to Dawson and Trapp for a discussion of varying types of studies used in medical research presented in a way that makes sense to physicians (as opposed to statisticians) (6). Here discussion will be limited to the advantages and disadvantages of two study types: prospective cohort studies and retrospective case-control studies. (Descriptive studies are common in the forensic pathology literature. Descriptive studies have a place, which is the description of a previously unrecognized entity for which insufficient understanding exists to hazard a hypothesis. Descriptive studies do not test a hypothesis, however, and they will not be considered further in this discussion.)

Prospective cohort studies are powerful tools for studying common diseases. The Framingham Heart Study is a famous cohort study that has generated much information on cardiovascular disease (7). The great strength of prospective studies is the ability to decide exactly what variables to study before the study begins. Those variables are then measured and recorded with meticulous care. New factors can be added to the study as the study progresses. One need never regret lack of information in a prospective study, but for all these advantages, prospective studies have two significant disadvantages. Prospective studies like the Framingham Heart Study collect data in a longitudinal fashion, that is, over months, years, or even decades. Therefore prospective studies are extremely expensive, and that expense makes prospective studies relatively rare compared to other study designs. In addition, prospective studies are poor ways to study rare entities. In forensic pathology one could design a prospective study of gunshot injuries or injuries sustained in automobile wrecks, perhaps incorporating new technology such as whole body imaging. Such a study might even be reasonably inexpensive if, for example, the hypothesis concerns efficiency before and after introduction of whole body scanning. In contrast, a forensic-based prospective study of previously undiagnosed malignancy would be frustrating. Forensic pathologists see undiagnosed malignancies now and then, but so infrequently that enough cases could take years to accumulate.
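The difficulty with rare entities can be made concrete with a little arithmetic. Assuming, hypothetically, that qualifying cases arrive at random (a Poisson process) at a steady rate, the sketch estimates how long a prospective study would need to accrue the 30-subject benchmark discussed earlier. The rate of 3 cases per year and the function names are illustrative assumptions, not figures from any office.

```python
from math import exp


def prob_at_least(target: int, cases_per_year: float, years: float) -> float:
    """P(at least `target` cases accrue), assuming Poisson arrivals.

    Uses P(X >= target) = 1 - sum over k < target of e^(-mu) mu^k / k!,
    where mu = cases_per_year * years. Suitable for modest mu; e^(-mu)
    underflows to zero for mu above roughly 700.
    """
    mu = cases_per_year * years
    term = exp(-mu)           # P(X = 0)
    cdf = 0.0
    for k in range(target):
        cdf += term           # add P(X = k)
        term *= mu / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


def expected_years(target: int, cases_per_year: float) -> float:
    """Average time needed to accrue `target` cases."""
    return target / cases_per_year


# Hypothetical rate: 3 qualifying cases per year in one office.
print(expected_years(30, 3))  # 10.0
for years in (5, 10, 15):
    print(years, round(prob_at_least(30, 3, years), 2))
```

Even on average a decade passes before 30 cases accrue at this hypothetical rate, and the study might still fall short, which is why a retrospective design is usually more practical for rare entities.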

The retrospective case-control study is an ideal choice for rare entities. Case-control studies are relatively inexpensive but often lack details that the researcher might wish to have, making it necessary to drop cases from the study or use more complicated statistical analyses. A detailed database of previous cases is necessary for finding cases and controls for a retrospective study unless the researcher has the time and energy to review cases file by file to find enough subjects for the study. Provided a database of old cases is available, forensic pathology is well suited to case-control studies. Whenever a forensic pathologist has an unusual case for which little or no information can be found in the medical literature, then that pathologist is standing on fertile ground for a case-control study based on the unusual entity. All that is needed is some careful thought, a hypothesis to test, and enough cases and well-matched controls to test the hypothesis.
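Once cases and controls have been assembled, a case-control study typically summarizes the association with an odds ratio. The sketch below computes an odds ratio with a Woolf (log-scale) confidence interval; the counts and the function name `odds_ratio_ci` are hypothetical. Because this is an unmatched analysis it ignores any case-control matching; matched designs are usually analyzed with methods that respect the pairing, such as conditional logistic regression or, for 1:1 pairs, an odds ratio based on discordant pairs.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a = cases exposed, b = controls exposed,
    c = cases unexposed, d = controls unexposed.
    This unmatched analysis ignores any case-control matching.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("every cell must be positive for the Woolf interval")
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(odds ratio)
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for 95%
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 20 cases and 40 controls.
est, lo, hi = odds_ratio_ci(a=12, b=10, c=8, d=30)
print(round(est, 2), round(lo, 2), round(hi, 2))
```

A confidence interval whose lower bound stays above 1 suggests that chance alone is an unlikely explanation for the association, though bias and confounding still have to be considered separately.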

Conclusion

Scientific inquiries are always made by reaching beyond the limits of our knowledge to explore a new concept. In this way the scientific process is similar to the process by which artists work. Artists and scientists both work within an established field, be it biochemistry or painting, using techniques that they have mastered to either contribute to or rebel against their chosen field (8). Michelangelo and Pasteur extended their fields by contributing to them in unprecedented ways. In contrast, Barry J. Marshall and J. Robin Warren's work on Helicobacter pylori as a cause for gastric ulcers (9) was no less iconoclastic than the paintings of the Impressionists as they rebelled against the conventional wisdom of the French Royal Academy of Art.

Concerning forensic pathology in particular, the National Research Council and the National Academy of Sciences pointed out in their 2009 report Strengthening Forensic Science in the United States that forensic science needs original research to validate its science (1). Without original research forensic science does not rightfully deserve to call itself science. So many areas of forensic pathology remain unexplored that finding research projects is as easy as wondering “Could it be …” when an unusual case comes along. Useful research can be done with no more investment than time and effort, but financial support is also available. Modest sums are available from the American Academy of Forensic Sciences (Acorn and Lucas grants) (10) and from the Research committee of the Pathology/Biology division of the Academy. Larger sums are currently available from the National Institute of Justice (11). Any research grant requires a research proposal to the granting institution. The proposal must include the hypothesis, some background to justify the hypothesis, the proposed means of testing the hypothesis, a timeline for the study, and the estimated costs for the project. Rarely do pathologists utilize all the financial resources mentioned in this paragraph in a given year. If we remain curious as we practice forensic pathology, seeking truth through reflection and careful observation, then we can enlarge the foundation of knowledge in forensic pathology before we pass the practice on to the next generation. Better still, we will have created a tradition of research that will benefit our discipline for years to come.

References

1. National Research Council, National Academy of Sciences. Strengthening Forensic Science in the United States: A Path Forward. Washington, DC: National Academies Press; 2009. Available from: http://www.nap.edu/catalog.php?record_id=12589

2. Isaac Newton [Internet]. Wikiquote. Available from: http://en.wikiquote.org/wiki/Isaac_Newton

3. History of chemistry [Internet]. Historyworld; c2001. Available from: http://www.historyworld.net/wrldhis/PlainTextHistories.asp?ParagraphID=kqg

4. Bronowski. The Creative Process. Sci Am. 1958 Sep; 199(3): 59–64.

5. Howell D.C. Statistical Methods for Psychology, 7th ed. Belmont, California: Wadsworth Publishing; 2009.

6. Dawson, Trapp R.G. Basic and Clinical Biostatistics, 4th ed. New York, New York: Lange Medical Books/McGraw-Hill Medical Publishing Division; 2004.

7. Framingham heart study [Internet]. Framingham, MA: Framingham Heart Study; c2011. Available from: http://www.framinghamheartstudy.org/

8. Stent G.S. Prematurity and uniqueness in scientific discovery. Sci Am. 1972 Dec; 227(6): 84–93.

9. Press release: the 2005 Nobel prize in physiology or medicine [Internet]. Nobelprize.org. Available from: http://www.nobelprize.org/nobel_prizes/medicine/laureates/2005/press.html

10. Forensic Sciences Foundation [Internet]. Colorado Springs, CO: The Forensic Sciences Foundation; c2006–2011. Available from: http://www.forensicsciencesfoundation.org/grants/lucas_grants.htm

11. Funding [Internet]. Washington: National Institute of Justice; c2011. Available from: http://www.nij.gov/funding/