Reflection

Abstract

Doing surgical research is simple. Just round up one hundred rats and divide them into two equal groups. In one group, hit each rat in the head with a ball-peen hammer. In the control group don't hit any rats. Wait exactly three days and then count the rats in each group who have developed a headache or died, compute p-values, and write the paper. Hell, it's so easy.

R.M. Zollinger (1903–1992)

Reflecting on what you learned during your career is a benefit of retirement. Could reflection on reaching the end of the trail yield pragmatic suggestions for younger colleagues who still toil in the important academic niche called surgical infection science?

I address here mainly our trainees entering the research year(s) and all instructors or anxious assistant professors chomping at the bit, eager to crank out publications, hopeful to achieve promotion, and as they say in minor league baseball, “get to the big dance.” Many elders in our tribe will pay little heed to my suggestions offered here. We should always keep in mind what seasoned New Zealand shepherds believe, “The older sheep are always the hardest to shear.”

To begin, contrary to Zollie's words in the epigraph above (he uttered them during a cocktail party in my chairman's home, followed by the unique RMZ cackle), I assert that performing worthwhile surgical infection research is too often anything but simple, because a menu of possible mental mistakes can complicate our attempts to generate meaningful research products. These mistakes are mostly self-inflicted, accidental wounds. I base that conclusion on observations made during a career in academia and on the probably relevant fact that I was a fully trained scientist nearly a decade before I ever smelled Betadine, carried a pager, or walked into a surgical intensive care unit. A reasonable conjecture is that the miscues most commonly made in surgical infection research are definable intellectual flubs. Oddly, some of these seem contagious. In principle, errors in thinking are completely preventable, an optimistic idea that obliges us to ask how our prevention efforts might be strengthened.

I first encountered Dick Simmons at Minnesota 40 years ago, and one of his memorable initial comments to me was, “You will discover that a big part of doing surgery well is knowing what not to do.” This advice was not only absolutely accurate, but I believe that it also applies to the enterprise of conducting research—broadly speaking, there must exist generic “what-not-to-dos” (WNTDs) in every scientific research arena. Here, then, is an opening suggestion for younger colleagues: At the front end of your career, methodically identify relevant WNTDs in your particular area of scientific activity. It is an interesting twist that failing to make such identifications is itself the prime WNTD.

Let's put aside discussing problems that may arise when planning or physically executing experiments. A larger task is thinking about how to think. All of us must be self-critical in five domains: How to best exploit curiosity; how to reason, speak, and write with clarity; how to regard observed phenomena with just the right degree of skepticism; how to appraise hypotheses severely; and how to dodge common mental pitfalls when estimating from samples what probably happens in larger concealed universes that either have been imagined or else actually exist in reality. All of these are germane whether raw data tumble out of animal experiments, thought experiments, bench experiments, clinical trials, computer simulations, or “studies” of raw information squeezed from medical records of patients. In sum, sneaky intellectual traps await unmotivated or uninformed surgical infection scholars. Moreover, many traps are subtle, especially if one does not appreciate how they are disguised.

Any surgical infection researcher pursuing worthwhile work products must have some structured understanding of what epidemiology is really about. This goal is both important and attainable. It is important because ramshackle reasoning and flubs in data manipulation or analysis often trace back to superficial regard for, or even ignorance of, epidemiology concepts that are time-tested and nearly axiomatic. The understanding you need is not a facility for glibly spewing fashionable buzzwords. Nor is it merely the ability to enter numbers into some slick statistics software. All of us must clearly understand both descriptive statistics and the separate techniques of inferential statistics used for hypothesis testing, a type of mental kung fu. Yet there is much more to epidemiology than formulaic “fiddling with statistics.” That puzzling canard gives life support to a widespread misconception. It also completely misses a generalizable point made by the eminent scholar Judea Pearl: Inferential statistical reasoning and the search for causes are separate, parallel enterprises [1].

Epidemiology is a large and mature field. Reassuringly, you need to master only a modest assortment of the tools, tricks, and techniques commonly used in standard analytic epidemiology. In my opinion, sufficient mastery of the relevant areas of modern epidemiology is nearly impossible without either (a) formal academic training in the field or (b) personal digestion of material learned by studying appropriate textbooks. Most of our readers fit snugly into category (b). Accordingly, here are my suggestions regarding eight textbooks you may find highly useful.

Reliable “introductions to epidemiology” have been written by Gerstman [2] and by Rothman [3]. I favor the latter, but your mileage might vary. As a self-study program progresses and scientific perspective gradually broadens, you should come to recognize a familiar slogan (“Epidemiologists prove what causes diseases”) as misleading advertising. The slogan is muddled by semantic fog. It is also literally false because of a tenet that has been well recognized for more than a century: It is impossible to prove that any hypothesis represents air-tight, logical Truth. In contrast, we can firmly establish that a given hypothesis is untenable when reproducible and sound empiric evidence contradicts it. Accordingly, only those scientific hypotheses that survive attempted refutation by sufficiently tough and cunning tests are “accepted,” and even then only provisionally [4]. Hypotheses that cannot survive intentionally severe challenges are either discarded or modified.

What do analytic epidemiologists actually do that fits with this picture? They estimate, most often after blunt and sharp dissection of observational data, which putative mechanisms can best serve as explanatory (i.e., causal) hypotheses that pertain to disorders, diseases, or conditions in particular populations. Three front-rank topics in the standard epidemiology catechism have always been Measurement Error, Bias, and Confounding. To use a tired cliché, “There are many moving parts here,” but your time and effort will not be wasted by carefully studying in particular the curious phenomenon of confounding. Ignoring this priority in any biologic science is a WNTD.

Confounding is a natural property of many observational data sets in the biologic sciences. It is essentially always encountered when healthcare data are analyzed, unless, of course, one is dealing with a properly designed and executed randomized, controlled experiment. But remember this stark fact: Interpreting data that come from non-randomized, non-controlled experiences requires skill, and those are the predominant kinds of “experiments” performed by epidemiology practitioners. Here is a simple instruction: Gain a sound comprehension of confounding as early as possible in your surgical infection science career. Learn how to detect it, how to distinguish it from effect modification, and how to uncouple its influences from crude significant associations of variables in observational data sets. Such comprehension will, for example, let you efficiently digest the first-rate discussions Rosenbaum has provided [5]. These explain with much clarity exactly how purely observational data can nonetheless be reliable, and perhaps highly valuable, in some types of scientific research.
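The detect-and-uncouple step is easier to see with numbers. The minimal Python sketch below uses invented counts that come from no study. A third variable Z is associated with both exposure X and the outcome. The sketch compares three quantities: the crude odds ratio, the stratum-specific odds ratios, and the Mantel-Haenszel summary odds ratio. The class and function names are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    """Counts for one 2x2 table: a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases."""
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


def collapse(strata: list[Table2x2]) -> Table2x2:
    """Pool the strata into one crude table, ignoring the stratifying variable."""
    return Table2x2(
        a=sum(t.a for t in strata),
        b=sum(t.b for t in strata),
        c=sum(t.c for t in strata),
        d=sum(t.d for t in strata),
    )


def mantel_haenszel_or(strata: list[Table2x2]) -> float:
    """Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(t.a * t.d / t.n for t in strata)
    den = sum(t.b * t.c / t.n for t in strata)
    return num / den


# Hypothetical counts, invented for illustration only: exposure X is common
# in a high-risk stratum of Z and rare in a low-risk stratum of Z.
strata = [
    Table2x2(a=40, b=40, c=10, d=10),  # Z = 1 (high baseline risk, 50%)
    Table2x2(a=2, b=18, c=8, d=72),    # Z = 0 (low baseline risk, 10%)
]

crude = collapse(strata)
print("crude OR:", round(crude.odds_ratio(), 2))
print("stratum ORs:", [round(t.odds_ratio(), 2) for t in strata])
print("Mantel-Haenszel OR:", round(mantel_haenszel_or(strata), 2))
```

With these made-up counts, the crude odds ratio is about 3.3. Each stratum gives 1.0, and so does the Mantel-Haenszel summary, so Z produces the entire crude association. A common rule of thumb treats two patterns differently. Stratum-specific ratios that agree with each other but differ from the crude ratio suggest confounding. Stratum-specific ratios that differ materially from each other suggest effect modification, and then a single pooled summary can mislead. A real analysis would also need confidence intervals and a formal homogeneity test, which this sketch omits.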

It is always a WNTD to interpret observational data sets blithely as if they represent the output of randomized controlled experiments in which ceteris paribus was by intentional design very likely valid. Yet, across diverse fields, thousands of journal articles appear yearly in which hints are planted in plain sight that non-random associations discovered in observational data are ipso facto “smoking guns” that constitute credible evidence of cause-effect linkages. Be alert for this common practice when reading the literature. When a statistically significant (i.e., non-random) association between variables X and Y in some observational data set is strong (e.g., computed odds ratio is large and p-value is small), avoid the urge to embrace a fallacious conclusion [3] that strong X-Y association just has to be telegraphing the presence of causal linkage between X and Y.

Dozens of epidemiology texts populate a broad spectrum, ranging from mediocre cookbooks to genuinely useful encyclopedic works. Woodward's 700-page masterpiece [6] is scholarly and practical; it's on my bookshelf, and maybe it should be on yours, too. I have enjoyed reading and re-reading Savitz's treatise, Interpreting Epidemiologic Evidence [7]. It contains wise advice from a seasoned epidemiology guru, interspersed with his always pithy and practical insights. I highly recommend this beautifully written book. Get your hands on a copy; it's one of a kind, and I'm not kidding. Finkelstein [8] has written another hidden gem. One of the pioneers in the subject area of “statistics and the law,” he was on the Columbia Law School faculty for many years. I have never read a textbook that matches its concise, careful, and penetrating explanations of nearly a dozen concepts that form the backbone of honest epidemiologic reasoning.

The best comprehensive epidemiology text for every surgical infection scientist could well be that authored by Rothman et al. The elegant third edition [9] is packed densely with fruits of scholarship and contains clarifying asides, real-world examples, and original ideas from experts. To cite a crisp illustration of an indispensable insight: Its meticulous description of Rothman's famous (and widely accepted) causal pie model of multi-causality, which jumps off the page at the very beginning of the book and hits you precisely between the eyes, is either magic or merely amazing—I cannot decide which qualifier is more apt. Nearly two-thirds of the chapters in the book could be relevant to common reasoning issues that might stump even seasoned surgical infection investigators. My advice: Get the book immediately, keep it handy, and study it.

Consider finally the hefty tome authored by Deborah Mayo [10], a philosopher of science with extensive experience. One of her career-long research areas has been the serious examination of how learning from scientific experiments occurs. She has developed and refined, nearly single-handedly, the epistemology of a modern philosophical branch known as Error Statistics. Before you open Mayo's brilliant textbook and dive in, it is advisable to first digest one preparatory article, an in-depth piece she authored with David Cox [11]. Patiently dig through it, read all of it with your number two pencil and pad handy, and prepare to be enlightened. This is neither light nor recreational after-dinner reading, so eschew that glass of port. Forgo the Cuban cigar, put down the bong, turn off the jazz on your iPod.

While exploring Mayo's innovative book, you will discover, among much else, the most cogent overview so far published of Neyman-Pearson (N-P) statistical hypothesis testing. It covers what the N-P rigmarole actually entails, exactly why the test was invented, and how it has been progressively mangled and frequently misconstrued by presumably intelligent researchers in various fields. You will also find an articulate distillation of the Duhem-Quine Problem (how many in our readership know why it is important?). There is much to learn from her engaging historical recapitulation of the infamous, damaging feud in frequentist statistics. That feud lingered for years after 1928 and alienated followers of Ronald Fisher from many folks of the N-P persuasion. It also sowed confusion in the scientific literature that has persisted right into the 21st century. Finally, she reminds readers of the logical fallacies that can poison interpretations of the tests we use to scrutinize scientific hypotheses. She describes the most pernicious of them and explains the genesis of each.
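For readers who have never seen the N-P recipe written out, the minimal Python sketch below shows its simplest textbook case: a one-sided z-test for a normal mean with known σ. The design values (μ0 = 0, σ = 1, n = 25, α = 0.05, and an alternative of μ = 0.5) are hypothetical. What matters is the order of operations. The analyst fixes α and the rejection region before any data arrive, then computes power against a specified alternative. This is a generic illustration, not code or notation taken from Mayo's book.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_value(mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Cutoff for the sample mean in a one-sided test of H0: mu <= mu0 vs H1: mu > mu0.
    Rejecting H0 when the sample mean exceeds it has probability alpha when mu = mu0."""
    se = sigma / n ** 0.5
    return mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se


def rejection_prob(mu: float, cutoff: float, sigma: float, n: int) -> float:
    """P(sample mean > cutoff) when the true mean is mu."""
    se = sigma / n ** 0.5
    return 1 - STD_NORMAL.cdf((cutoff - mu) / se)


# Hypothetical design, fixed before any data are seen.
mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
cutoff = critical_value(mu0, sigma, n, alpha)
type_i = rejection_prob(mu0, cutoff, sigma, n)  # equals alpha by construction
power = rejection_prob(0.5, cutoff, sigma, n)   # power against mu = 0.5

print(f"reject H0 if sample mean > {cutoff:.3f}")
print(f"type I error = {type_i:.3f}, power at mu = 0.5: {power:.2f}")
```

With these values, the rejection cutoff for the sample mean is about 0.329, and the power against μ = 0.5 is about 0.80. Note what the procedure does not produce: a probability that H0 is true.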

After reading her book, I gained respect for Mayo's conception of the property she has named Test Severity. Hopefully, you will, too. This synthetic concept provides a warrant for rationally appraising scientific hypothesis test results. Such pre-treatment of grist that is headed for the inductive learning mill underpins a paradigm cleverly termed Piecemeal Learning. In sum, Professor Mayo is both a genius and an accomplished writer whose style is provocative but not dogmatic. She explains arcane points from the philosophy of science in plain words, and with frank concision. The result is a narrative that has the rhythm and ring of truth. Ignoring what Mayo has to say about how learning occurs in science would surely constitute a very serious mistake deserving a ranking at the top of your WNTD list. Her Internet blog is a scholarly forum that is fascinating and always informative. The URL for the web site is https://errorstatistics.com/.
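Severity is easiest to grasp in the simple normal-mean setting often used to introduce it. The minimal sketch below assumes a one-sided z-test of H0: μ ≤ 0 versus H1: μ > 0, with known σ = 1 and n = 25. The hypothetical observed mean of 0.4 rejects H0 at α = 0.05. Suppose that, after this rejection, we claim “μ > μ1.” The sketch computes the severity of that claim as the probability of a sample mean no larger than the one actually observed, if μ were equal to μ1. This is my own Python rendering of that single case, not code from Mayo. Severity assessments for other tests and other kinds of claims take different forms.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def severity_mu_greater(mu1: float, xbar: float, sigma: float, n: int) -> float:
    """Severity of the claim 'mu > mu1' after a one-sided z-test of
    H0: mu <= mu0 vs H1: mu > mu0 has rejected H0 with observed mean xbar.
    Computed as P(sample mean <= xbar; mu = mu1). A high value means that,
    had mu been only mu1, so large a sample mean would have been improbable."""
    se = sigma / n ** 0.5
    return STD_NORMAL.cdf((xbar - mu1) / se)


# Hypothetical data: sigma = 1, n = 25 (standard error 0.2), and an observed
# mean of 0.4, which rejects H0: mu <= 0 at alpha = 0.05.
xbar, sigma, n = 0.4, 1.0, 25
for mu1 in (0.0, 0.2, 0.4, 0.6):
    sev = severity_mu_greater(mu1, xbar, sigma, n)
    print(f"SEV(mu > {mu1:.1f}) = {sev:.3f}")
```

The same significant result warrants “μ > 0” with high severity (about 0.98) and “μ > 0.2” reasonably well (about 0.84). It warrants “μ > 0.6” only poorly (about 0.16). A bare “p < 0.05” hides exactly this distinction.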

We cannot avoid admitting that an old elephant is still clumping around our living room, and this elephant really stinks: Be aware of the widespread presence in peer-reviewed literature of “research studies” that have used knee-jerk testing of provably silly null hypotheses. It is hardly a secret that null hypothesis statistical testing (NHST) has been hijacked: It is being crudely applied willy-nilly to torture all types of observational data. As all of us surely realize, statistical contrasts of data hypothetically arising from two compared sources (e.g., a two-sample test of proportions) make it easy to obtain a significance test result that “proves” a non-zero difference between the sources. Samples only need to be large enough, however trivially small and practically meaningless that difference may be. Particularly grating is the currently popular hocus-pocus of electronically sifting reams of data extracted from medical records and using NHST to make “Big Data Discoveries.” These become widely disseminated on the Internet, by national television news programs, and in the lay press. To steal Mark Twain's wisecrack, “One does not know whether to laugh or cry.”
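The large-sample point can be checked directly. The minimal Python sketch below applies a pooled two-sample z-test of proportions to the same hypothetical pair of rates, 10.0% versus 10.1%, at three sample sizes. The rates, the sample sizes, and the function name are illustrative assumptions. The test itself is the standard pooled normal-approximation version.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z-test of H0: p1 == p2. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical event rates of 10.0% vs 10.1%, a difference few would call
# clinically important; only the number of patients per group changes.
for n in (2_000, 200_000, 2_000_000):
    x1, x2 = n * 100 // 1000, n * 101 // 1000  # integer event counts
    z, p = two_proportion_z_test(x1, n, x2, n)
    print(f"n per group = {n:>9,}: z = {z:5.2f}, p = {p:.4f}")
```

At 2,000 patients per group, the p-value is roughly 0.9. At 2,000,000 per group, it falls below 0.001. The difference never changed, only the sample size did. One commonly recommended remedy is to report the estimated difference with a confidence interval and then ask whether a difference of that size matters clinically.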

NHST is so simple and fast that it may be the most frequently chosen kind of statistical hanky-panky for rapidly producing scientific-sounding manuscripts. Some of these are beauties that come from even our “Best Universities,” the authors somehow getting their “work” accepted by peer-reviewed journals and national meeting program committees alike. Does such bottom-feeding activity somehow allow the perpetrators to feel like actual scientists? This kind of pseudo-epidemiologic activity generates noise that does not advance or refine authentic scientific knowledge. A closely related practice is drawing bogus inferences from “significant correlations” in regression plots. Silver examines it in his new book, The Signal and the Noise [12]. The highlight of its fifth chapter (“Desperately Seeking Signal”) is a detailed discussion of bizarre earthquake prediction schemes. These include one described in the infamous “Toad Paper,” which appeared in a prestigious, peer-reviewed journal. Read it for yourself (J Zoology 2010;700:1–9) and ponder how it was ever accepted for publication. Here, then, is sound advice to follow during your entire career in surgical infection science: Generating analogs of the Toad Paper is a definite WNTD.

An Oxford-trained statistician has published lamentations and sound recommendations in an obscure journal [13] that we probably never read. Each of the following is lame ammunition for arguing that we need not read it carefully and then engage in some quiet reflection: it was not written last month, its author is not an American, it did not appear in Annals of Surgery, and it did not come from a distinguished surgical infection aficionado.

References

1. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics. Chichester: John Wiley & Sons, 2016.

2. Gerstman BB. Epidemiology Kept Simple. Philadelphia: John Wiley & Sons, 2003.

3. Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press, 2012.

4. Popper KR. Objective Knowledge: An Evolutionary Approach. New York: Oxford University Press, 1995.

5. Rosenbaum PR. Observation and Experiment: An Introduction to Causal Inference. Cambridge: Harvard University Press, 2017.

6. Woodward M. Epidemiology: Study Design and Data Analysis. New York: Chapman and Hall, 1999.

7. Savitz DA. Interpreting Epidemiologic Evidence: Strategies for Study Design and Analysis. New York: Oxford University Press, 2003.

8. Finkelstein MO. Basic Concepts of Probability and Statistics in the Law. New York: Springer, 2009.

9. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams and Wilkins, 2008.

10. Mayo DG. Error and the Growth of Experimental Knowledge. Chicago: University of Chicago Press, 1996.

11. Mayo DG, Cox DR. Frequentist Statistics as a Theory of Inductive Inference. In: 2nd Lehmann Symposium – Optimality. Institute of Mathematical Statistics, Lecture Notes–Monograph Series. 2006, pp 1–28.

12. Silver N. The Signal and the Noise. London: Penguin Press, 2012.

13. Läärä E. Statistics: Reasoning on uncertainty and insignificance of testing null. Ann Zool Fennici 2009;46:138–157.