Abstract
Thousands of exoplanets are known to orbit nearby stars. Plans for the next generation of space-based and ground-based telescopes are fueling the anticipation that a precious few habitable planets can be identified in the coming decade. Even more highly anticipated is the chance to find signs of life on these habitable planets by way of biosignature gases. But which gases should we search for? Although a few biosignature gases are prominent in Earth's atmospheric spectrum (O2, CH4, N2O), others have been considered as being produced at or able to accumulate to higher levels on exo-Earths (e.g., dimethyl sulfide and CH3Cl). Life on Earth produces thousands of different gases (although most in very small quantities). Some might be produced and/or accumulate in an exo-Earth atmosphere to high levels, depending on the exo-Earth ecology and surface and atmospheric chemistry.
To maximize our chances of recognizing biosignature gases, we promote the concept that all stable and potentially volatile molecules should initially be considered as viable biosignature gases. We present a new approach to the subject of biosignature gases by systematically constructing lists of volatile molecules in different categories. An exhaustive list up to six non-H atoms is presented, totaling about 14,000 molecules. About 2500 of these are CNOPSH compounds. An approach for extending the list to larger molecules is described. We further show that about one-fourth of CNOPSH molecules (again, up to N = 6 non-H atoms) are known to be produced by life on Earth. The list can be used to study classes of chemicals that might be potential biosignature gases, considering their accumulation and possible false positives on exoplanets with atmospheres and surface environments different from Earth's. The list can also be used for terrestrial biochemistry applications, some examples of which are provided. We provide an online community usage database to serve as a registry for volatile molecules including biogenic compounds. Key Words: Astrobiology—Atmospheric gases—Biosignatures—Exoplanets. Astrobiology 16, 465–485.
1. Introduction: Motivation
1.1. A brief background to biosignature gases
For more than half a century, researchers have considered the possibility of inferring the presence of life on planets other than Earth. In fact, the concept of oxygen as an atmospheric biosignature gas was mentioned over 80 years ago (Jeans, 1930). Early work that remains part of the paradigm for exoplanet life-detection focused on gases severely out of thermodynamic equilibrium, arguing that only life could maintain thermodynamic disequilibrium in the atmosphere. Specifically, the detection of oxygen (O2) and methane (CH4) (Lederberg, 1965; Lovelock, 1965) was suggested as the most robust atmospheric evidence that Earth supported life. O2 and its photolytic product ozone (O3) as well as CH4 have been extensively studied as biosignature gases in their own right (e.g., Léger et al., 1996; Schindler and Kasting, 2000; Des Marais et al., 2002; Segura et al., 2003; Kaltenegger et al., 2007). The history of O2 as a biosignature gas and the pros and cons of a thermodynamic disequilibrium as a sign of life are critically reviewed in Seager and Bains (2015). The subject of O2 false positives has been considered with growing interest (Schindler and Kasting, 2000; Selsis et al., 2002; Léger et al., 2011; Hu et al., 2012; Domagal-Goldman et al., 2014; Tian et al., 2014; Wordsworth and Pierrehumbert, 2014; Harman et al., 2015; Luger and Barnes, 2015). Part of the exoplanet community hopes that future-generation telescopes will obtain high-enough quality spectra to provide the planetary environmental context with which to enable sufficient confidence in the identification of O2 as a biosignature gas.
Gases (other than O2 or CH4) that are produced by life on Earth have also been studied, including nitrous oxide (N2O) (Des Marais et al., 2002), dimethyldisulfide (DMDS) (Pilcher, 2003), methyl chloride (CH3Cl) (Segura et al., 2005), and dimethyl sulfide (DMS) and other sulfur gases (Domagal-Goldman et al., 2011). The planetary environment influences the destruction of gases, including surface chemistry and especially the host star EUV radiation that drives photochemistry in the planetary atmosphere (e.g., Segura et al., 2003; Tian et al., 2014). Yet other than differing stellar radiation environments, most biosignature gas research to date focuses on Earth-like planets—that is, planets with Earth-like atmospheres and Earth-like bioflux gas sources.
The notion of habitability is anchored in the concept of surface temperature suitable for life, specifically for the existence of liquid water on the surface. Because the planetary atmosphere masses and compositions (and hence a planet's greenhouse effect that controls the surface temperature) are unknown and expected to be varied (Elkins-Tanton and Seager, 2008), habitability is planet-specific (Seager, 2013). A wide diversity of planet types is expected, with all masses, sizes, and orbits apparently in existence, as far as observations can ascertain (Howard, 2013; Winn and Fabrycky, 2015). Planets orbiting well interior (Abe et al., 2011; Zsom et al., 2013) or exterior (Pierrehumbert and Gaidos, 2011) to an Earth-like planet's habitable zone boundaries (Kopparapu et al., 2013) must be considered. The planet diversity also could well extend to the surface sources and sinks of gases, especially the redox state of the planetary surface. A wide variety of habitable planets have been hypothesized in this regard, from water worlds (Kuchner, 2003; Léger et al., 2004), to planets with hydrogen-rich atmospheres (Pierrehumbert and Gaidos, 2011; Seager et al., 2013a), to Venus-like worlds (Schaefer and Fegley, 2011) and planets with increased volcanism (Kaltenegger and Sasselov, 2010; Hu et al., 2013). The extreme and far UV radiation that drives atmospheric photochemistry will vary depending on host star type and age (Guinan et al., 2003; Shkolnik and Barman, 2014). Last but not least, there should be a diversity of net bioflux emission levels on an exoplanet, but estimating them is beyond current capabilities (Seager et al., 2013b). This physical and chemical diversity will affect which molecules accumulate in a planet's atmosphere and what false-positive gas sources might occur.
The diversity of known exoplanets and the anticipation of diversity of habitable planets therefore further motivate our investigation of alternative volatile biomolecules that might be significant in the atmosphere of a planet other than Earth.
1.2. Gases produced by life (on Earth)
In addition to the diversity of exoplanets, we must recognize the diversity of molecules produced by life on Earth.
The chemicals produced by life on Earth number in the hundreds of thousands [estimated from plant natural products (Gunatilaka, 2012), microbial natural products (Sanchez et al., 2012), and marine natural products (Fusetani, 2012)]. But only a subset of hundreds of these are volatile enough to enter Earth's atmosphere at more than trace concentrations. Only a few of the gases produced by life on Earth, namely O2 (and O3), CH4, and N2O, have been detected in Earthshine and spacecraft observations of the spatially unresolved “Earth as an exoplanet” (e.g., Christensen and Pearl, 1997; Turnbull et al., 2006; Palle et al., 2009; Robinson et al., 2011). These observations show us what might be possible from space telescopes capable of finding and characterizing Earth-like planets orbiting Sun-like stars in reflected light observations (e.g., Seager et al., 2015; Stapelfeldt et al., 2015).
It is interesting to recognize that life produces all the gases in Earth's atmosphere (specifically the troposphere) present at the parts-per-trillion level by volume or higher (see Appendix Table A1), with the exception of the noble gases. Out of 47 known volatiles in Earth's atmosphere observed at the parts-per-trillion level, 42 are known to be biogenic, though not exclusively so. If we exclude the 10 entirely anthropogenically produced chlorofluorocarbons, the numbers reduce to 37 and 32, respectively. Most of Earth's atmospheric gases are, of course, not unique to life; moreover, life is not the dominant source of atmospheric gases for most cases. Some atmospheric gases are already a basic atmospheric constituent (e.g., N2, CO2, and H2O). Many are produced by geological processes (e.g., CH4 and H2S). However, the relative rate of production of a gas by life is a function of both geological and biological production rates, and both are specific to the planet. On other worlds, biology could be the dominant source of any gas.
The gases produced by life on Earth, when organized into two broad categories, create a conundrum. The gases expected to be produced in abundance by life are ones that are also rife with false positives. These are by-product gases from biological energy extraction from chemical potential energy gradients (combinations of chemicals that are out of thermodynamic equilibrium) that are common and geochemically produced. Such chemical potential energy gradients are widely exploited by life on Earth, but geochemistry has the same gases to work with as life does. So while in some environments life is needed to catalyze the reaction of the disequilibrium chemicals, in others the same reactions will be spontaneously occurring. An example is methane. Methane is a by-product of methanogenesis, but it is also released from vents at mid-ocean ridges because of hydrothermal, abiotic chemistry.
The second broad category of gases is the class of biosignature gases produced for secondary or unknown reasons, such as stress or signaling. Such gases are often organism- and mechanism-specific and hence are expected to be produced in small quantities. But because they are so specialized they will in most cases likely not have geochemical sources that could lead to a false positive in the search for life. Some of these specialized gases are produced in amounts sufficient to affect overall atmospheric chemistry (such as isoprene or DMS) and to possibly be detected remotely on exoplanets (such as methyl chloride; Segura et al., 2005). Which gases are produced in large amounts and which are produced only as trace gases is determined by the functional needs of the organism. The maximum productivity is always constrained by resource limitations, diffusion limits, and energy availability. In the absence of knowledge of the function served by a gas in an alien ecology, the choice of which gas is made in large amounts can appear arbitrary.
For completeness, we mention two other classes of biosignature-related gases. One class is biosignature gases produced as by-product gases from energy-requiring metabolic reactions for biomass building. On Earth these are reactions that capture environmental carbon (and to a lesser extent other elements) in biomass. An example is photosynthesis, which produces O2. The other category is photochemical or chemical reaction by-products of biosignature gases, such as O3 as produced from O2. For a more detailed discussion of the above-described classes of biosignature gases, see Seager et al. (2013b).
We choose to focus on gases emitted by life, rather than solid products or features. For example, while the vegetation “red edge” (for a description, see the review by Arnold, 2008) and other spectral features due to pigments in vegetation or bacteria have been studied as biosignatures in reflected light of Earthshine, their signals are weak (partly due to limited surface coverage) and diminished by clouds (Montañés-Rodríguez et al., 2006). For other planets, surface biosignatures are another area of research (spectra, cover, signal strength) and are not within the scope of this paper. Similarly, technological signatures as signs of life on exoplanets are not within the scope of this paper.
1.3. A new approach
All life on Earth makes gas products, and basic chemistry suggests the same will be true of any other plausible biochemistry. The question is, what products? Given the multitude of gases produced by life on Earth and the very different planetary atmosphere, surface, and stellar radiation environments anticipated on exo-Earths, we are motivated to propose a new approach. The first step is to come up with a list of all molecules that are stable and potentially volatile, not just the molecules produced by life on Earth. The second step is to consider each molecular gas and its viability as a biosignature gas on exo-Earths within different exoplanetary atmosphere and surface environments and based on the strength and wavelength range of its spectroscopic signature. This second step must also include an assessment of false positives and whether or not other gases and their spectral features might support a false-positive scenario. The third step, which lies in the future, is to use future space-based telescopes to search for these candidate biosignature gases on yet-to-be-discovered exo-Earths.
This paper describes the first step: generation of a list of molecules for biosignature gases. As an illustration of the utility of the list beyond exoplanet research, we comment on our findings on the fraction of chemicals in the list that are produced by terrestrial life.
2. Methods
We construct a list of molecules that are stable and likely volatile as pure compounds at standard temperature and pressure (STP). Materials made of small molecules are likely to be volatile, meaning more likely to be in gas form in a planetary atmosphere. Our list is therefore constructed starting with small molecules. We present a list of molecules up to N = 6 non-H atoms. In Section 4 we cover limitations for our method choices, including STP and proxy for volatility, and in addition describe how to extend the list to all molecules that might be volatile and stable.
To construct the list we divide molecules into three categories, for practical purposes.
The first category is that of molecules that only contain the elements C, N, O, P, S, and H, “the CNOPSH List” (see Section 2.1). The second category is defined as any molecule not falling into our first category that also does not contain a halogen atom, “the Extended CNOPSH List” (see Section 2.2). The third category is halogenated compounds, which, although they fall into either the first or second categories, are considered on their own due to the extensive numbers of such compounds, “the Halogenated Compounds List” (see Section 2.3). A summary of the approaches that went into construction of the list of small molecules for all three of our categories is illustrated in Fig. 1.

Fig. 1. Illustration of the functional approach to building the list for all small, stable, volatile compounds, for molecules with N ≤ 6 non-H atoms. Four types of sources of molecules are shown, one in each segment of the figure. The top left shows the compounds that came out of combinatorial construction (for CNOPSH and halogenated compounds), with the total number listed outside the diagram. The top right shows the molecules from database trawling. The bottom right shows compounds obtained from searching the chemical literature (mostly for the P-block and metal-containing compounds within the Extended CNOPSH List), and the bottom left shows the chemical compounds from a targeted search for chemicals made by life, especially low-molecular-weight compounds. The numbers inside the diagram give the number of molecules found in the respective searches and show the overlap. For example (top right), 416 molecules were found in chemical databases only, and 82 were found in both databases and the chemical literature search.
2.1. The CNOPSH List
To construct the list of CNOPSH compounds, we took two approaches. The first started with a combinatorics approach, and the second was a supplemental database search (described in Section 2.2.2).
2.1.1. Combinatorics approach for CNOPSH
The first approach in the list construction of CNOPSH compounds is to consider molecules built from the six principal biogenic elements, carbon (C), nitrogen (N), oxygen (O), phosphorus (P), sulfur (S) and hydrogen (H). We call this the CNOPSH category of molecules. We wrote a chemical combinatorics algorithm to generate all possible molecular formulae with up to and including six non-H atoms, a cutoff that is explained below.
We describe a molecule by its non-H atoms because it enables a simple implementation of the structural definition of a molecule as a set of non-H atoms connected to each other by at least one bond each. Any valency not involved in CNOPS bonding is assumed to be populated with an H atom. Each CNOPS combination can have a different number of H atoms populating the open valencies, depending on the arrangement and order (i.e., single, double, or triple) of bonds present among the non-H atoms. In other words, several molecular formulae are generated from each CNOPS combination because different bonding patterns are possible between the atoms, with all unfilled valences filled with H atoms. [For example, C2O could be C2H6O (ethanol or dimethyl ether), or C2H4O (acetaldehyde or ethylene oxide).] We are interested in volatile molecules, so we emphasize that the atomic weight of H is so relatively small that addition of H atoms does not make a substantial difference to the molecular weight (a physical property related to volatility).
We chose the cutoff value of N = 6 non-H atoms for purely pragmatic reasons. The number of possible molecules is an exponential function of the number of non-H atoms (Bains and Seager, 2012). As we had to manually curate all the nonhalogenated molecules for stability and volatility, and manually search the literature for life production, a cutoff of N = 6 was imposed to keep the curation practical. The advantage of adopting the N = 6 cutoff is that a substantial fraction of the molecules will be volatile. As the molecules get larger, fewer and fewer will be volatile, so predicting volatility becomes more and more important (see Section 4.1.1).
We could include H atoms in our structural definition and limit ourselves to N = 6 atoms including H. However, this would exclude many simple molecules known to be important for life, such as ethane (C2H6), and would bias the approach toward generation of more oxidized molecules. Alternatively, we could consider all molecules with a large number of atoms (e.g., N = 12), which would include ethane. However, this would include an unmanageably large number of molecules in total (possibly as many as 3 × 10^7; Bains and Seager, 2012).
In summary, while the list description of molecules “with up to N = 6 non-H atoms” is an awkward formulation, it is a compromise that avoids artificial biases in generation of our list, includes molecules already studied as biosignature gases, and ensures the practicality of having to research each molecule by manual literature search.
In the combinatorics code, CNOPS atoms were put together in every possible combination. To each of these combinations, the code added the maximum number of H atoms, assuming the lowest-order bonding between atoms, that is, assuming the highest number of valencies were left open for each atom. The computer code then proceeded to consider lower numbers of available valencies with fewer H atoms added. Numerically, the algorithm for the number of H atoms added can be summarized by $N_H = N_V - 2(a - 1) - 2n$, where $a$ is the number of non-hydrogen atoms; $n$ is an integer, $n = 0, 1, 2, 3, \ldots$, such that $N_H \geq 0$; and $N_V$ is the number of available valence slots for hydrogen atoms (assessed by summing C = 4, N = 3, O = 2, P = 3, S = 2). We note that any missing oxidation states were later included during the chemical database search (see Section 2.2.2).
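The hydrogen-count rule above can be illustrated with a short Python sketch. It is not the code used for this work; the function names `hydrogen_counts` and `formulae` are hypothetical, and the sketch only enumerates candidate formulae, without the later database checks for whether a formula corresponds to a real molecule.

```python
from itertools import combinations_with_replacement

# Valence slots per non-H atom, as listed in the text.
VALENCE = {"C": 4, "N": 3, "O": 2, "P": 3, "S": 2}


def hydrogen_counts(atoms):
    """Allowed H counts N_H = N_V - 2(a - 1) - 2n for n = 0, 1, 2, ... with N_H >= 0."""
    n_v = sum(VALENCE[x] for x in atoms)  # N_V: total valence slots
    a = len(atoms)                        # a: number of non-H atoms
    max_h = n_v - 2 * (a - 1)             # n = 0: all single bonds, no rings
    return list(range(max_h, -1, -2))     # each step of n removes two H atoms


def formulae(max_atoms=6):
    """Yield (non-H composition, N_H) for every CNOPS multiset up to max_atoms."""
    for a in range(1, max_atoms + 1):
        for combo in combinations_with_replacement("CNOPS", a):
            for n_h in hydrogen_counts(combo):
                yield combo, n_h


# The C2O example from the text: C2H6O, C2H4O, and more unsaturated formulae.
print(hydrogen_counts(("C", "C", "O")))  # [6, 4, 2, 0]
```

For the composition C2O the sketch gives 6 and 4 H atoms as its first two values, matching the C2H6O and C2H4O formulae mentioned earlier; the lower values (2 and 0) are formal candidates that would still need to be checked against real structures.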
Not all the molecular formulae will represent realistic chemical structures (i.e., real molecules), and some molecular formulae represent more than one chemical structure. To identify which molecular formulae in the list actually describe one (or more) real chemical structures, we queried the ChemSpider database to identify chemical structures matching each formula. This query process ruled out some formulae as not representing real molecules and identified other formulae as representing more than one real molecule. We eliminated radicals or molecular fragments from the data set by confirming the reality of molecular structures in the PubChem database, which is focused on experimental study of organic chemicals and hence is more likely to be populated by molecules that actually exist. This filtered out about 40% of the ChemSpider hits (radicals or molecular fragments). After the above filtering, the combinatorially generated CNOPSH list contained about 1300 molecules. After the database and chemical supplier searches (see Section 2.2.2), the number totaled about 2500.
2.2. The Extended CNOPSH List
We call our list of compounds beyond strictly CNOPSH the Extended CNOPSH List. This list includes both organics and inorganics.
2.2.1. Manual search for the Extended CNOPSH List
The number of inorganic and organic, stable, volatile molecules that add to the CNOPSH List is much smaller than those in the CNOPSH List itself; hence we proceeded with a manual search. The first approach was to explicitly search the ChemSpider database for any molecule with a P-block element, up to a molecular mass of 400 Da. There is a high probability that anything with a molecular mass larger than 400 Da will not be significantly volatile. More specifically, the search was for compounds likely to be stable to hydrolysis containing at least one atom from the nonradioactive, nonhalogen P‐block elements other than CNOPS (i.e., B, Al, Ga, In, Tl, Si, Ge, Sn, Pb, As, Sb, Bi, Se, Te) or an element from Group IIB (Zn, Cd, or Hg). Note that these compounds could include the elements CNOPS as well. Compounds containing other metals were also considered, but there are very few such stable volatile compounds; while included for completeness, their stability to hydrolysis has not been systematically tested. The Extended CNOPSH List totals about 1500 and after a stability check is reduced to 862. By stable we mean the intrinsic stability of chemical bonds in molecules as well as hydrolytic stability (reactivity with water). Stability is further addressed in Section 2.4 and Section 4.1.1.
2.2.2. Database trawling for both the CNOPSH List and the Extended CNOPSH List
The second approach in constructing the list of CNOPSH molecules and the extended list of molecules (and also the list of halogenated compounds) that are stable and volatile was to trawl several different chemical databases (see Table A2) of physical properties of molecules (including silicon, germanium, and organometallic compounds) for any compound with a measured boiling point below 150°C. The boiling point serves as our proxy for volatility. A cutoff temperature is chosen to limit numbers; the choice of 150°C is conservatively inclusive, as at STP any compound with a boiling point above 150°C will have very low vapor pressure and hence will likely be nonvolatile (Boublík et al., 1973). A relatively small subset of the molecules from the database trawling approach overlaps with the combinatorics approach (described above in Section 2.1.1), but most molecules were new to the list. In other words, these volatile molecules could include atoms other than CNOPSH, including silicon (and halogens, which are included in the list described in Section 2.3).
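The boiling-point cutoff can be sketched as a simple filter. The record layout, the function name `is_volatile_candidate`, and the example entries are assumptions made for illustration only (the boiling points are approximate literature values); the actual database formats used in this work are not described here.

```python
BP_CUTOFF_C = 150.0  # boiling-point cutoff from the text, in degrees Celsius


def is_volatile_candidate(bp_c):
    """True if a measured boiling point exists and lies below the cutoff."""
    return bp_c is not None and bp_c < BP_CUTOFF_C


# Hypothetical records; boiling points are approximate.
records = [
    {"name": "methane", "bp_c": -161.5},
    {"name": "ethanol", "bp_c": 78.4},
    {"name": "glycerol", "bp_c": 290.0},
    {"name": "no measured value", "bp_c": None},
]

kept = [r["name"] for r in records if is_volatile_candidate(r["bp_c"])]
print(kept)  # ['methane', 'ethanol']
```

Entries without a measured boiling point are dropped in this sketch; a fuller version would need a separate path for predicted values.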
We also compiled a list of compounds provided by chemical suppliers (see Table A2) of molecular weight 150 or less. These compounds by definition are stable but are not necessarily volatile.
2.3. The Halogenated Compounds List
Halogenated compounds are treated separately for expediency in terms of search efficiency, because of their extensive numbers. Halogenated compounds fall into both organic and inorganic compounds. Inorganic compounds containing halogens were collected as part of the process for the Extended CNOPSH List, described in Section 2.2. Note, however, that the majority of inorganic halogenated compounds are not considered as stable for our purposes, as they are very reactive, particularly to water, and so are implausible atmospheric components. (There are a few exceptions, notably fluorinated compounds discussed below.)
The organic compounds containing halogens (halocarbons) are extensive in number. We constructed lists of members of 10 classes of organic chemicals and then exhaustively substituted the C-H bonds with C-halogen bonds (using Cl, Br, I, and F). For up to N = 6 non-H atoms, this resulted in a total of 10,463 compounds. Out of the total halogenated compounds, 3876 contained F.
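To illustrate what exhaustive substitution means, the following Python sketch enumerates the halogenated derivatives of the simplest case, methane. It is an illustration under stated assumptions, not the procedure used to obtain the totals above: the function name `halomethanes` is hypothetical, and because the four positions on a single carbon are equivalent, unordered combinations suffice. Larger skeletons would require handling of molecular symmetry and stereochemistry, which this sketch ignores.

```python
from itertools import combinations_with_replacement

SUBSTITUENTS = ["H", "F", "Cl", "Br", "I"]


def halomethanes():
    """Every distinct substituent pattern on one carbon, excluding CH4 itself."""
    patterns = combinations_with_replacement(SUBSTITUENTS, 4)
    return [p for p in patterns if any(s != "H" for s in p)]


compounds = halomethanes()
with_fluorine = [p for p in compounds if "F" in p]
print(len(compounds), len(with_fluorine))  # 69 35
```

Even a single carbon atom yields 69 halogenated derivatives, 35 of which contain fluorine, which gives a sense of why the halocarbon list grows so quickly with molecular size.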
2.4. Assessment for stability and volatility
Only stable molecules can accumulate in a planetary atmosphere and are potential biosignature gases. By stability we mean compounds that are stable on the order of days as pure entities at Earth's surface temperatures and pressures (STP) and are likely to be stable to reaction with water. To assess for stability, we take two approaches. First we consider stability of molecules by analogous groups (such as amines, esters, acid chlorides, etc., which are defined by their chemical reactivity), so we do not have to empirically determine the functional stability of every molecule. (Thus, for example, trimethylamine, ethyldimethylamine, methyldiethylamine, etc. can be treated as one group with respect to stability.)
Next we assess stability of analogous groups of molecules at two levels: the stability against the reaction with water and the intrinsic stability of a pure compound. Reactivity with water is critical because most biochemical reactions take place in water (e.g., in the cytoplasm of the cell) and the immediate environment may also be water-based. Reactivity with water is determined by a literature search. Intrinsic stability of a class of molecules is evaluated by checking if its physical properties have been measured (e.g., a particular melting or boiling point or IR spectrum experimentally measured). [Note that stability and volatility in hypothetical exoplanetary atmospheres is an application, not a property of the list, and must be assessed on a case-by-case basis and is an extensive endeavor (Section 4.2.1).]
There is no reliable way to predict volatility from theory. By volatility we mean that the partial pressure over a pure compound at STP is a substantial fraction of 1 atm. Boiling point is a convenient single estimator for volatility. We use measured boiling points when available. Accurate prediction of volatility and boiling points from theory is not possible, so where measured boiling points are not available we use the estimated boiling point as provided by the chemical software EpiSuite (Stein and Brown, 1994).
As previously mentioned (Section 2.2.2), a cutoff temperature choice is made to limit numbers; the choice of 150°C is conservatively inclusive as at STP any compound with a boiling point above 150°C will have very low vapor pressure and hence will likely be nonvolatile. We also chose 150°C because of its relationship to stability; if a molecule is unstable, it will also not exist to be volatile. This is the temperature at which a wide variety of organic molecules become unstable to hydrolysis (Bains et al., 2015) and hence the temperature at which life based on carbon chemistry in water becomes implausible (Kashefi and Lovley, 2003; Cowan, 2004). At temperatures significantly above 150°C, organic matter is degraded wholesale on a timescale of hours or minutes (Katritzky et al., 1996). (Note that non-carbon-based life is beyond the scope of this paper.)
2.5. Assessment for production by life on Earth
To assess what fraction of the list of small, stable, and volatile molecules are produced by life on Earth, we took two separate approaches: a database approach and a manual literature search. The first approach was to automatically query two databases of molecules produced by life, ZINC (Irwin and Shoichet, 2005) and the Dictionary of Natural Products (Buckingham, 1993). These databases include references for the organism name from which a given molecule was isolated. We did not use the extensive UNPD database (Gu et al., 2013), because it does not easily include a method to go from molecule of interest to biological source information.
The manual literature approach involved review of summary papers, chemical classes produced by life, and a targeted search of the literature for specific chemicals, especially inorganic volatiles (because they are rarely incorporated into the databases). References used are listed in the tables, with examples provided in Tables B1, B2, and B3.
3. Results
3.1. A list of stable, volatile molecules up to N = 6 non-H atoms
The main result of this work is the list of molecules that are stable and volatile at STP up to N = 6 non-H atoms. The list is divided into our three categories: CNOPSH molecules, an extended list of CNOPSH molecules, and halogenated compounds. Sample lists are provided in the Appendix in Table B1, Table B2, and Table B3, respectively. Full lists are available at the URL noted in Section 5.
The tables contain the following information. The IUPAC name is included as a standard and unique reference name. The chemical structure information is provided in SMILES string format. The molecular weight is a convenient description of a molecule for additional filters or searches. The boiling point is our proxy for volatility. The boiling point, a relatively simple physical property, is not known for many molecules (for some molecules it has been predicted, indicated by a p = predicted vs. an e = experimental). A column to indicate whether or not the molecule is produced by life is followed by the reference for production by life.
The tables, as expected, contain a large number of organic molecules, a relatively small number of P-block element–containing molecules, very few volatiles that contain metals, and a large number of halogenated compounds. The sample tables provide illustrative entries of the diversity of compounds that could form volatile molecules at STP.
The total number of atoms in a molecule with N ≤ 6 non-H atoms varies greatly and depends on to what extent the molecule is reduced. Thus our total list of all categories contains molecules containing six non-H atoms that have as many as 19 (e.g., methyl diethylamine: C5H13N) or 20 atoms (hexane: C6H14) and as few as 6 (tetrachloroethene: C2Cl4).
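The atom counts quoted above can be checked with a small formula parser. This is a minimal Python sketch with hypothetical helper names (`atom_counts`, `total_atoms`, `non_h_atoms`); it handles only simple formulae without parentheses or charges.

```python
import re


def atom_counts(formula):
    """Parse a simple molecular formula such as 'C5H13N' into element counts."""
    counts = {}
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(num) if num else 1)
    return counts


def total_atoms(formula):
    """Total number of atoms, including H."""
    return sum(atom_counts(formula).values())


def non_h_atoms(formula):
    """Number of non-H atoms, the N used throughout the text."""
    return sum(n for el, n in atom_counts(formula).items() if el != "H")


for f in ("C5H13N", "C6H14", "C2Cl4"):
    print(f, non_h_atoms(f), total_atoms(f))
# C5H13N 6 19
# C6H14 6 20
# C2Cl4 6 6
```

All three examples have six non-H atoms, while their total atom counts range from 6 to 20.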
Our list of volatile molecules has value in its applications in astrobiology. We now turn to one such application.
3.2. The fraction of molecules known to be produced by life on Earth
The list of all volatile, stable molecules is used to assess the fraction of molecules in the list that are known to be produced by life on Earth. This is partly to inform the search for biosignature gases and partly to further chemoinformatics. The tables (B1, B2, B3) indicate whether or not the molecule is known to be produced by life on Earth.
The value of the tables is in future applications (see discussion in Section 4.3), and here we provide a summary of our statistics.
3.2.1. CNOPSH life-producing molecules
We surveyed all possible potentially volatile stable molecules of the six biogenic elements CNOPSH of up to N = 6 non-H atoms. We find that about 25% are known to be made by life. It is notable that about 63% of those produced by life were not listed in natural product databases (ZINC and DNP) but only recovered from a manual literature search (see Fig. 2), which may be because the existing databases are primarily oriented to pharmaceutical discovery rather than to exhaustively cataloging metabolites. Molecules with N ≤ 6 non-H atoms are too small to provide enough specific points of interaction with pharmaceutical targets and are therefore not considered to be good candidates for development as drugs.

Comparison of molecules produced by life as found in published databases and found by a manual literature search. For CNOPSH compounds only.
There may be yet more molecules produced by life that are not yet known. Experimental searches may find them; see Section 4.3.1.
3.2.2. Extended CNOPSH life-producing molecules
Life uses elements other than CNOPSH elements in a very limited number of compounds (e.g., Berg et al., 2002; Wackett et al., 2004). For example, selenium is used widely by life but only for one metabolic function—as a component of glutathione peroxidase, an enzyme widely involved in cellular defense against toxins, metals, and reactive oxygen species (Burke, 1983). Another example is detoxification of heavy metals, which is carried out by their volatilization (e.g., dimethyl mercury). It is not known why some P-block element–containing compounds are used and others are not (Thayer, 2002; Wackett et al., 2004). For example, we found that life produces methyl germanes (Hirner et al., 1998; Rosenberg, 2008) but not methyl silanes (Tacke, 1999), even though silicon and germanium have very similar chemistry (Greenwood and Earnshaw, 1997). We have found that 43 compounds falling into our Extended CNOPSH List are made by life, out of 862 total. (See Table B2.)
3.2.3. Halogenated compound life-producing molecules
Life produces a diverse set of compounds with Cl, Br, or I bonded to a carbon atom. These include small molecules such as methyl chloride (which has already been considered as a biosignature gas; Segura et al., 2005) as well as much more complicated molecules such as the antibiotic tetracycline.
Life does not produce all small organohalogen molecules. There are three categories of halogenated compounds not known to be produced by life. The first category is molecules that are structurally similar to molecules that are known to be produced by life but that have not been detected experimentally. This category is for compounds with Cl, Br, I bonded to carbon. For example, many of the methyl halides are known to be made by life, but some are not (Fig. 3 and see sample Table B3). There does not appear to be any physicochemical or chemical pattern to predict which are made by life and which are not (Gribble, 2003; Paul and Pohnert, 2011). We could speculate that a more exhaustive experimental search for these compounds could expand the list of halocarbons produced by life. Many of them can be easily overlooked, especially those that are produced in trace amounts or by rare sets of organisms.

Halomethanes (excluding F) produced and not produced by life. A halomethane compound is a derivative of methane (CH4), where one or more of the H atoms are replaced by a halogen atom. A carbon atom is implicit in the structures shown in this figure. Out of the 34 (non-F) halomethanes, 12 are not known to be produced by life. Why just 22 are produced by life is not known.
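The count of 34 non-F halomethanes can be reproduced by enumerating every way of filling the four positions on a single carbon with H, Cl, Br, or I and then excluding methane itself. The sketch below is illustrative only; it assumes that the order of substituents does not matter and that mirror-image forms (e.g., of CHClBrI) are counted once. The names `halomethanes` and `formula` are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

SUBSTITUENTS = ["H", "Cl", "Br", "I"]  # fluorine is excluded, as in the figure

def halomethanes():
    """List every unordered choice of four substituents on one carbon, minus CH4."""
    result = []
    for combo in combinations_with_replacement(SUBSTITUENTS, 4):
        if all(s == "H" for s in combo):
            continue  # skip methane itself: at least one H must be replaced
        result.append(combo)
    return result

def formula(combo):
    """Format a substituent tuple as a formula string, e.g. ('H','H','H','Cl') -> 'CH3Cl'."""
    counts = Counter(combo)
    text = "C"
    for element in SUBSTITUENTS:
        n = counts[element]
        if n:
            text += element + (str(n) if n > 1 else "")
    return text

formulas = [formula(c) for c in halomethanes()]
print(len(formulas))  # total number of non-F halomethanes

# Fraction known to be made by life, using the 22 of 34 stated in the caption.
fraction_made_by_life = 22 / len(formulas)
print(round(fraction_made_by_life, 2))
```

The enumeration gives 35 combinations including methane, hence 34 halomethanes, and 22 of 34 corresponds to roughly 65%.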
The second category of halogenated volatiles not produced by life is fluorine compounds. Life produces very few organofluorine compounds, and their production appears to be very specialist biochemistry restricted to a small number of species [such as fluoroacetone (e.g., Walker and Chang, 2014)].
The third category is the halogens bonded to any atom other than carbon. Such bonds are usually quite hydrolytically unstable (highly reactive with water, such as the P-Cl bond) or extremely reactive (such as oxides of chlorine or fluorine) so would not be expected to be found as stable chemicals accumulating in a planetary atmosphere. Hypochlorite (chlorine bonded to an oxygen) is one exception of a stable molecule in this category.
Of the 5708 CNOPS compounds that contain Cl, Br, or I, 103 are known to be made by life. Only 3 of 206 inorganic halogenated compounds are known to be made by life (Table B3). However, the halogenated compounds reported are those that are made in the largest amounts by easily accessible species, usually seaweeds (Gribble, 2003; Cabrita et al., 2010; Paul and Pohnert, 2011; Pomin, 2011). Out of the total 4109 F-containing compounds 6 , only three are made by life [fluoroacetate (Moss et al., 2000; Murphy et al., 2003), fluoroacetone (Peters and Shorthouse, 1967), fluoroacetaldehyde (Moss et al., 2000)]. Given that over half of all possible chloro-, bromo- and iodomethanes are made by life (Fig. 3), it is possible that many more halogenated biochemicals await discovery.
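To put the counts in Sections 3.2.2 and 3.2.3 on a common scale, they can be converted to percentages. This is a small arithmetic sketch using only the numbers quoted in the text; the dictionary labels and the helper name `percent` are ours.

```python
# (made by life, total) pairs as quoted in Sections 3.2.2 and 3.2.3.
counts = {
    "Extended CNOPSH compounds": (43, 862),
    "CNOPS compounds containing Cl, Br, or I": (103, 5708),
    "inorganic halogenated compounds": (3, 206),
    "F-containing compounds": (3, 4109),
}

def percent(made_by_life, total):
    """Percentage of compounds in a category known to be made by life."""
    return 100.0 * made_by_life / total

percentages = {label: percent(*pair) for label, pair in counts.items()}
for label, value in percentages.items():
    print(f"{label}: {value:.2f}%")
```

The resulting percentages are roughly 5%, 1.8%, 1.5%, and under 0.1%, respectively, compared with about 25% for the CNOPSH molecules.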
4. Discussion
4.1. Caveats and future extensions to the list
4.1.1. Challenges for assessing stability and volatility
For this work we adopted the approximation that all small molecules can be considered volatile. Here we review the caveats to our approach, starting with some background.
A compound is volatile if the interactions between the molecules are of similar energy to the thermal energy of the molecules. Volatility is also influenced by noncovalent interactions. Noncovalent interactions between molecules are van der Waals–type forces, dipole interactions including hydrogen bonds, and electrostatic or charge interactions. Van der Waals interactions to a first order are dependent on molecular weight, so small molecules tend to be volatile. Dipole interactions are dependent on structure, but for small molecules even the most highly polarized molecules such as HF are still volatile. Charges on molecules render them nonvolatile (at least at ambient temperatures). However, for most organic molecules that have a charge at neutral pH, there is a pH at which they are uncharged and hence volatile. (This is not true of quaternary ammonium salts and zwitterions.)
Volatility was assessed by taking measured or calculated boiling points at standard pressure from ChemSpider. For searching ChemSpider and for generating combinatorial sets of halogens, we used a boiling point cutoff of 150°C to limit the number of compounds considered and for stability reasons (Section 2.4). The adopted cutoff is conservative, so molecules larger than those included in our list of all small molecules should be investigated more carefully in further work. For smaller data sets, including literature searches, we truncated searches based on molecular size and manually checked that the compounds were volatile based on boiling point.
The 150°C cutoff for volatility is a limitation because we may have included molecules that have low volatility at STP. The cutoff could be replaced by a molecule-by-molecule effort to evaluate vapor pressure at STP. Individual molecules of choice will have to be evaluated in detail for the specific environment of interest, especially for the non-STP conditions of exoplanetary environments different from Earth.
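The boiling point cutoff described above amounts to a simple filter. The sketch below is illustrative and is not the procedure actually run against ChemSpider: the boiling points are approximate literature values at standard pressure entered by hand, the inclusive comparison (≤ 150°C) is an assumption, and the names `CUTOFF_C` and `passes_cutoff` are hypothetical.

```python
CUTOFF_C = 150.0  # boiling point cutoff in degrees Celsius, as stated in the text

# Approximate normal boiling points (degrees C), for illustration only.
boiling_points_c = {
    "methyl chloride": -24.0,
    "dimethyl sulfide": 37.0,
    "n-hexane": 69.0,
    "ethylene glycol": 197.0,
}

def passes_cutoff(bp_c, cutoff_c=CUTOFF_C):
    """Treat a compound as potentially volatile if it boils at or below the cutoff.

    Whether the cutoff is inclusive is an assumption of this sketch.
    """
    return bp_c <= cutoff_c

kept = sorted(name for name, bp in boiling_points_c.items() if passes_cutoff(bp))
print(kept)
```

A boiling point filter of this kind is only a proxy. As noted above, a molecule-by-molecule evaluation of vapor pressure under the conditions of interest would be more rigorous.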
Turning to stability, we used a manual literature search for inherent stability and stability to hydrolysis. Stability is not easily predictable from chemical structure, but this may be less of a problem for larger organic molecules than for the small ones we worked with, as one can argue the stability of a large molecule by analogy with smaller analog molecules.
Stability to hydrolysis depends on the rate at which the most labile bond in a molecule is cleaved by water. For many molecules it is obvious from inspection that there is no water-labile bond present (easily attacked by water). For example, the halocarbons are deemed to be inherently stable to hydrolysis, as the sp3 carbon-halogen bond is inherently stable to hydrolysis. For others, it is clear that there is a bond that will be rapidly attacked by water, for example, the silicon-chlorides. Stability and instability of some classes of molecules can therefore be determined by inspection. For other molecules, stability to hydrolysis cannot be easily determined by inspection. For these we conducted a literature search to look for experimental measures of stability or instability. The challenge is how to assess the class of molecules intermediate between stable and unstable, particularly in the absence of data, and how to extend our understanding of stability to stable or unstable molecules at conditions other than STP. For example, at the bottom of the atmosphere, conditions on Earth approximate STP, and hydrolysis is dominant. However, at the top of the atmosphere, pressure and temperature differ substantially from the surface, and photolysis is the dominant destruction mechanism for most molecules.
We chose STP because this is where most properties are measured or calculated. While STP is appropriate for terrestrial-based applications of our list of molecules, this is a limitation for exoplanets whose environments will differ from STP. Stability and volatility away from STP will have to be estimated or measured for specific biosignature gases of interest; fortunately the range of temperatures and pressures is not infinite but largely constrained to those that support liquid water. Extensions beyond STP are a huge and demanding piece of research that we hope will be initiated in the future.
We consider stability to hydrolysis to be an important feature for a biosignature gas for mainstream astrobiological studies. The consensus is that water is the most likely solvent for life and so will be present on the surface of any inhabited world. In more detail, if we assume that life is based on water, then any volatile molecule will be made in water and will have to diffuse out of the cell in which it is made. While the instability of a molecule in life's surface surroundings and air is also a factor, it is likely not the main factor. Life's existence in more exotic solvents than water has been suggested, with the solvents including water/ammonia mixtures (Fortes, 2000), liquid methane or nitrogen (Bains, 2004; McKay and Smith, 2005), and supercritical CO2 (Budisa and Schulze-Makuch, 2014). If these alternative solvents are to be considered seriously, then reactivity to the appropriate solvent will have to be substituted for reactivity to water. This is work for the future.
4.1.2. Completeness of the list
In construction of a database, one must ask if anything is missed, either important categories or a substantial number of entries within an established category. In terms of search method, our multipronged approach should not have missed any molecules in our computational, database, and literature sources (see Fig. 1). Out of the 85,000 or so references collected, we found about 68,000 unique structures, of which 14,332 had N ≤ 6 non-H atoms.
Any missing molecules may result from the challenge of assessing volatility and stability. Here we erred on the conservative side by including molecules that may have limited volatility or stability.
List completeness is not considered a problem for our artificial cutoff for molecules with N ≤ 6 non-H atoms. It may well be a big challenge for a complete list of all volatile and stable molecules at STP, when applications in chemical space require completeness and where the manual multipronged search approach may not be practical.
4.1.3. Extending the list
The validation process of identifying chemicals and searching for stability measures, physical processes, and production by life is very labor intensive. While some processes may be automated (generating molecular structures is straightforward via computation), others are more challenging. Physical properties that are not known cannot be easily calculated. Finding out from the literature whether or not a chemical is known to be made by life is difficult because the literature is not indexed to be searched by chemical structure and the databases of molecules made by life are quite incomplete (Fig. 2). Even identifying the papers requires expertise and is not just a matter of search terms. Understanding which chemicals in the papers are genuinely biological products also requires substantial expertise. Distinguishing between a true natural product (chemical that is a product of a normal metabolism of a living organism) and a potential metabolite of an industrially synthesized chemical such as a drug or a pesticide can sometimes be challenging (see the example of aminourea in Van Poucke et al., 2011) 7 . We advocate for a community effort to register biological molecules.
The CNOPSH List. The number of compounds increases nearly exponentially (based on combinatorics) with the number of non-H atoms. This is because replacing a hydrogen with one new atom from CNOPS increases the opportunity to add more atoms; that is, the rate of change of the number of atoms is a function of the number of atoms. One method is completely exhaustive combinatorics, with a series of rules and checks to screen for real molecules. This approach is impractical for our purposes, as it generates trillions of molecules. A more practical approach might be combinatorics based on substructures that can be combined, or the use of genetic algorithms or graph theory. There are such established computational techniques for drug design for CNOSH molecules, such as those used for the Chemical Universe Database (Reymond et al., 2012), which results in nearly 1 billion structures. Generally the larger/heavier the molecule is, the less likely it is to be volatile. Although boiling point measurements usually are not available and rigorous calculation is very difficult, estimates are straightforward.
The Extended CNOPSH List. Our approach to inorganic compounds and others in the Extended CNOPSH List so far was manual and could be extended by a combination of manual and computational methods. Our existing list could be extended by replacing H atoms with organic groups (or halogens). As a complement we could take our current list of organic compounds, take every C-H bond, and replace the H with any P-block element. In any case, we would expect to reach the volatility limit relatively quickly, but volatility would be difficult to calculate.
The Halogenated Compounds List. The Halogenated Compounds List would be extended from the CNOPS List as already described (Section 2.3). Prediction of the boiling points is fairly reliable for hydrocarbons, the largest class. We must emphasize the anticipated staggeringly large number of fluorinated compounds that will be stable and volatile. There are (for example) five hexanes (C6H14 compounds: n-hexane, 2-methyl pentane, 3-methyl pentane, 2,2-dimethylbutane, and 2,3-dimethylbutane) but 2460 different hexane fluorocarbons, all of which are expected to be stable to heat and hydrolysis and to have a boiling point <80°C.
Production by life. A manual literature search for production by life is the only option at this point (Section 2.5). This is the limiting factor for making progress in any application described below.
4.2. Potential applications for astrobiology
4.2.1. Exoplanets: path forward for a list of potential biosignature gases
We expand on our initial motivation for constructing the list of all volatile, stable molecules: the concept that any gas could potentially be a biosignature gas accumulating in the atmosphere of another world. This concept is supported by many different examples.
Chemicals produced as a result of secondary metabolism (metabolism not related to acquiring energy or biomass buildup) can have very diverse functions that depend on the evolutionary history of a species. More specifically, volatile chemicals produced by life as a result of secondary metabolism could be used for signaling many behavioral cues like aggression, defense, sexual attraction, trail following, and other means for interaction with the environment [thousands of examples are given in the Pherobase 8 (El-Sayed, 2014)]. Many volatile molecules produced as carriers of such signals are unique and originated from coevolution among species; hence it is impossible to predict which specific volatile molecule will be the carrier of which behavioral signal (e.g., Tirindelli et al., 2009; Deisig et al., 2014; Steiger and Stökl, 2014). An illustration of such a multispecies, intricate, and complex volatile chemical signaling network was discovered recently between apple trees, great tits (Parus major), and caterpillars of Operophtera brumata (Amo et al., 2013). Great tits respond to specific volatile molecules (dodecanal and alpha-farnesene) emitted by apple trees infested by Operophtera brumata (Amo et al., 2013). Attracted to the trees, the great tits eat the caterpillars. The volatile chemicals produced by caterpillar-infested trees differ markedly from those produced by uninfested trees and are specifically tailored to Parus major olfactory receptors (Amo et al., 2013).
Methyl chloride (CH3Cl) and dimethyl sulfide (CH3SCH3 or DMS) have been studied as exoplanetary biosignature gases (Segura et al., 2003, 2005; Scalo et al., 2007; Domagal-Goldman et al., 2011; Rugheimer et al., 2013; Seager et al., 2013a, 2013b; Rugheimer et al., 2015) and are additional examples of the diversity of gases produced by life whose production is not linked in any predictable way to the physical or chemical properties of Earth. Rather, they happen to be produced in relatively large amounts by organisms that are also relatively common (Bates et al., 1993; Kiene et al., 2000; Yoshida et al., 2004).
Dimethyl sulfide is believed to be produced overwhelmingly by oceanic plankton from dimethylsulfoniopropionate (DMSP; Alcolombri et al., 2015), but other routes such as the methylation of H2S or methanethiol by photosynthetic, partially anaerobic bacterial mats (Visscher et al., 2003) are minor sources on Earth and may be more substantial routes of production on other worlds. DMSP itself is mainly made by oceanic phytoplankton, although the reason they make DMSP is controversial (reviewed in Sunda et al., 2002; Brimblecombe, 2003), but DMSP is also made by macroalgae (Stefels, 2000) and some species of land plants (Otte et al., 2004). Some of those land plants are associated with bacteria that produce DMS from the DMSP (Todd et al., 2007; Johnston et al., 2008), but it is unclear why the plants make DMSP (reviewed in Otte et al., 2004), and the reason is probably different from the reason that marine phytoplankton make it. This emphasizes that the biochemistry and ecology of a potential biosignature vary even among Earth life and so cannot be assumed to be the same on other worlds.
Methyl chloride is specifically synthesized by an enzyme system that halogenates small hydrocarbons (Ni and Hager, 1998; Vaillancourt et al., 2006). Although the biosynthetic pathways are well understood (Rhew et al., 2003; Roje, 2006), the exact ecological role and physiological function of methyl chloride remain to be determined, despite earlier suggestions that methyl chloride is produced as a toxicant to deter predators and/or suppress competitors (Hartmans et al., 1986; Manley, 2002). Methyl chloride is widely produced by many plant and algal species (Wuosmaa and Hager, 1990; Nadalig et al., 2011; Rhew et al., 2014) and also by fungi and marine bacteria (Tait and Moore, 1995; Khalil and Rasmussen, 1999).
While some biogenic gases and biochemicals can be predicted from the thermodynamics of metabolism, it is not straightforward to predict more complex biochemicals, such as hormones and others used in ecological dynamics. Gases that are oddities of biochemistry (i.e., gases that are members of groups of chemicals with very few members made by life and that are usually made by few species) further support the concept that a variety of gases might be produced by life even in a planetary environment with known composition. For example, fluoroacetone is one of the few F-containing compounds made by life, and it is made by only a few species (see the reviews in O'Hagan and Harper, 1999; Gribble, 2002; also discussed in Section 3.2.3).
We now turn back to the path forward for exoplanets. To proceed, we envision taking molecules, or classes of molecules, and assessing (1) how likely they are to accumulate in an exoplanetary atmosphere, (2) whether or not they are spectroscopically detectable by a remote space telescope, and (3) whether or not there are false positives for the gas in question, in context with different exoplanetary atmosphere and surface conditions, and whether any other spectral features could be observed to support or refute a given gas as being of biogenic origin. The first point depends on the UV radiation environment from the host star that drives photochemistry, the atmospheric composition and mass, and the surface and ocean chemistry (including sources and sinks). A practical path forward is to take classes of molecules and determine if they are stable and volatile at non-STP conditions appropriate to the types of exoplanets under consideration and can accumulate in different planetary environments by integration into existing models of planetary chemistry and photochemistry.
The spectroscopic detectability of gases relies on molecular lines or bandhead estimates, which for many molecules do not exist yet. For example, the renowned HITRAN compilation of molecular line lists and cross sections (Rothman et al., 1998) includes about 50 molecules, those relevant for studies of Earth's atmosphere. A useful collection for exoplanets from the Virtual Planet Laboratory 9 has about 130 molecules [compiled from HITRAN, PNNL 10 (Sharpe et al., 2004), and personal collections]. A few expert research groups calculate molecular line lists from ab initio theoretical quantum mechanics calculations (e.g., Tennyson and Yurchenko, 2012), efforts that can take a year or more per molecule. In preliminary work (Zhan et al., unpublished data), we have found gas phase spectra for about 1000 of the 14,000 volatiles in our list. Most of our spectra come from the IR gas phase NIST online infrared database 11 (Sharpe et al., 2004), which is widely distributed and is the basis for most of a few dozen or so commercial online spectral databases. About one-third of the 1000 spectra on our list of volatile molecules found in NIST are actually digitized transmittance spectra data taken experimentally from decades ago (The Coblentz Spectral Collections 12 ) and are lacking path length or other information, preventing cross sections from being derived, and therefore are not useful as input to exoplanet model atmosphere codes. Many of the spectra noted in the above databases are limited in wavelength range. More efforts to calculate even crude spectral information are needed to advance the search for biosignature gases. Our database of all molecules will eventually include links to existing spectra.
Clearly, there is a long way to go in coming up with a list of all potentially useful biosignature gases. The concept for application to potential biosignature gases is illustrated in Fig. 4.

Schematic for the concept of considering all volatile molecules in the search for biosignature gases. The goal is to start with chemistry and generate a list of all small molecules and filter them for utility as biosignature gases. The first filter is for molecules that are stable and volatile in temperature and pressure conditions relevant for exo-Earth planetary atmospheres. Further filters relate to the gas detectability, aided by its Type classification and spectroscopic characteristics. Geophysically or otherwise generated false positives must also be considered. In the ideal situation, this overall conceptual process would lead to a finite but comprehensive list of molecules that could be considered in the search for exoplanetary biosignature gases, based on atmospheric models with surface source and sink input details, as well as strengths of spectroscopic features of molecules. Figure credit: S. Seager and D. Beckner. Figure originally published in Seager and Bains (2015).
The first chance for the search for biosignature gases on exoplanets is with the James Webb Space Telescope (JWST), scheduled for launch in 2018 (Gardner et al., 2006). JWST will be able to observe the atmospheres of small exoplanets transiting small stars (e.g., Deming et al., 2009; Beichman et al., 2014). A dozen or more habitable zone planets are anticipated to be discovered by ground- and space-based surveys including the upcoming MIT-led NASA mission TESS, scheduled for launch in 2017 (Ricker et al., 2014). With its capability of near-IR spectral resolution of a few hundred to a few thousand, JWST will be capable of observing life-produced gases, if they exist. Determining which gases from our list of molecules are best to search for will require in-depth study (Zhan et al., 2015).
4.2.2. Nonterrestrial biochemistry
Our list of all volatile, stable molecules is constructed based on the assumption that life uses water as a solvent but is otherwise agnostic to the chemistry of life. Specifically, life originating in very different physical or chemical environments might select different basic sets of atoms and bonds from which to build biochemistry. For example, life on Earth rarely uses the C-F bond (Gribble, 2002; Wackett et al., 2004), which is one of the strongest single bonds known. Life evolving at much higher temperatures could use more C-F bonds to compensate for the greater instability of molecules to hydrolysis at those temperatures, and consequently would generate fluorocarbon biosignature gases. Fluorocarbons are particularly interesting as signature molecules as they are anomalously volatile for their molecular weight.
Small, covalent molecules are built from atoms with between 2 and 5 valencies, which can form the networks of bonds needed to form different structures. However, if they are not to form unbounded structures (polymers), small molecules need monovalent atoms to “cap” the valencies. On Earth, hydrogen nearly always plays this role. Life in extremely low water environments might choose chlorine as a monovalent atom in molecule construction rather than hydrogen, a possibility drawn from studies that address the point of life in low-water-activity environments (Grant, 2004; Schulze-Makuch and Irwin, 2008) and work describing a hypothetical chlorinic photosynthesis (Haas, 2010).
We do not propose the above as specific biochemistries for which to search. Rather, we illustrate that we can conceive of environments where very different chemistry might make sense. In order to be prepared for the biosignatures that we would need to detect to find evidence for life in such environments, we want to be inclusive in the chemical space explored.
4.3. Potential applications to terrestrial biochemistry
Ultimately, we would like to be able to understand more precisely what a biochemistry on another world would look like, including the by-product gases detectable by a distant observer, beyond the speculations given above. Outcomes from this work suggest that such predictions might be at least partly feasible and that higher-order abstractions about the chemistry of life may be possible with an organized structure list of molecules produced and not produced by life.
4.3.1. Yet undiscovered metabolites
One basic application is to ask why some metabolites are not made by Earth life. Our list shows some unexpected gaps in our knowledge of molecules produced by life, molecules that are “missing” from the database. By “missing” we mean molecules not produced by life among a set of many structurally similar molecules that are known to be made by life. For example, there is no report of terrestrial life making CCl3I, although life does make CCl4, CBr4, and CHCl2Br (see Fig. 3). There is only one report of life making dimethyl ether (CH3-O-CH3), although many organisms make other volatile ethers. Focused research in the field or lab to search for organisms producing these molecules could be informative for the biochemical space of molecules produced by life.
4.3.2. Patterns in terrestrial biochemistry's use of elements
We notice that terrestrial life does not use all elements with equal frequency. We quantified the use of elements by life as follows. We postulate that the probability that a compound containing a particular combination of elements is made by life is the product of element-specific probabilities. Thus, for example, carbon and nitrogen have high probabilities (0.71 and 0.32, respectively), and fluorine has a very low probability (0.01). Purely based on these probabilities, the chances that a hydrocarbon or an amine is made by life are therefore high, a fluorocarbon low, and the chances that nitrogen trifluoride is made by life are essentially zero. Predictions of how many compounds containing a set of elements are made by life using these probabilities match the observed frequencies to r² = 0.92. In general, CHONS are commonly used; other elements are used less commonly depending (roughly) on their “distance” on the periodic table from these core elements. We do not have a theoretical explanation for this, apart from speculating qualitatively that the chemistry of life is adapted to handle carbon, hydrogen, oxygen, nitrogen, and sulfur atoms; hence the more chemically different from CHONS an element is, the more adaptation of that basic chemistry is required to handle that element (Petkowski, Bains, Zhan, and Seager, unpublished data). Notably, phosphorus is not in this high-probability group; despite its presence (as phosphate) in many metabolites, the chemical diversity of phosphorus biochemicals is very limited (Petkowski, Bains, Zhan, and Seager, unpublished data). Our observation shows that a systematic compilation of all molecules can reveal patterns about the chemistry of life that are different from those made by calculating the bulk composition of life (Chopra and Lineweaver, 2010) and might inform our understanding of why life has the chemistry that it does. Further work to explore these patterns is underway.
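The product rule postulated above can be illustrated with the three element-specific probabilities quoted in the text. This is a sketch under stated assumptions rather than the fitted model itself: hydrogen is given no probability in the text and is therefore omitted here, each element is counted once regardless of how many atoms of it a compound contains, and the names `ELEMENT_PROBABILITY` and `probability_made_by_life` are hypothetical.

```python
from math import prod

# Element-specific probabilities quoted in the text. Other elements,
# including hydrogen, are not given there and are left out of this sketch.
ELEMENT_PROBABILITY = {"C": 0.71, "N": 0.32, "F": 0.01}

def probability_made_by_life(elements):
    """Product of element-specific probabilities over the distinct elements present.

    Assumption: each element contributes once, however many atoms of it occur.
    """
    return prod(ELEMENT_PROBABILITY[e] for e in set(elements))

compound_classes = {
    "hydrocarbon": ["C"],
    "amine": ["C", "N"],
    "fluorocarbon": ["C", "F"],
    "nitrogen trifluoride": ["N", "F"],
}
for name, elements in compound_classes.items():
    print(f"{name}: {probability_made_by_life(elements):.4f}")
```

Under these assumptions the hydrocarbon and amine classes come out at 0.71 and about 0.23, whereas the fluorocarbon and nitrogen trifluoride cases fall below 0.01, which matches the qualitative ordering described in the text.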
4.3.3. Patterns in chemical motifs for chemical scaffolding
The list of volatile, stable molecules as a type of systematic survey can also help identify patterns in Earth life's use of chemistry. In our analysis for the fraction of molecules in our list produced by life, we found that the large majority of the small molecules that life does not make are nevertheless made by living organisms as a chemical substructure contained in other, larger molecules. We call the chemical substructure, or fragment, a “chemical motif.” A good example of a chemical motif is dithioformic acid (CH2S2), which is unstable on its own but present in many biological molecules as a chemical motif (Lim et al., 1998). We have found that including chemical motifs in the production of molecules by life more than doubles the fraction of molecules produced by life for the CNOPSH molecules of N ≤ 6 non-H atoms.
We have further found that molecules or motifs rarely or not at all produced by life appear to fall into distinct categories. For example, we have found that some chemical motifs are very rarely made by life, such as allenes and cumulenes (an allene is a molecule with two consecutive carbon-carbon double bonds, i.e., three carbon atoms joined by two double bonds; a cumulene is a molecule with three or more consecutive carbon-carbon double bonds). We have also found that some chemical motifs are never made by life, such as triply bonded phosphorus [P(III), trivalent phosphorus 13 ]. We believe that further analysis of a complete list of motifs cross referenced to our list of all small molecules, via chemical motifs, can have use in toxicity and pharmacology studies (Petkowski, Bains, Zhan, and Seager, unpublished data).
5. Summary
We have constructed a list of molecules up to N = 6 non-hydrogen atoms that are stable (in the presence of water) and volatile at STP. The list contains about 14,000 molecules: about 2500 are composed of the six biogenic elements, CNOPSH; about 900 are inorganics; and about 11,000 are halogenated compounds (a large set because of the large number of combinatorial possibilities of adding halogens to carbon skeletons). The list was constructed by a combinatorial approach and an intense database and literature search.
We further investigated which of the molecules in our list are known to be produced by life on Earth. About one-quarter of the compounds containing the six biogenic elements up to N = 6 non-hydrogen atoms are known to be produced by life on Earth. Very few of the inorganics are produced by life. A very small fraction of halogenated compounds are known to be produced by life. Even though dozens are found to be produced, for example, by seaweed species, there are thousands of possible N = 6 non-H-atom halogenated organic volatile molecules.
More specifically, of the molecules composed of the six biogenic elements with N ≤ 6 non-hydrogen atoms, we found that about one-quarter (specifically, 622) are produced by life on Earth. As an aside, this search showed that database sources of chemicals made by life are substantially incomplete for small molecules (Fig. 2). Our manual literature search revealed that about 60% of the biogenic compounds were not in existing online databases. This reveals a need in the scientific community for a better registry system for biogenic molecules.
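The counts quoted in this summary can be tallied in a few lines. The sketch below is illustrative only: it uses the rounded figures from the text (about 2500, 900, and 11,000), the variable names are ours, and it assumes that the 60% figure is taken relative to the 622 biogenic CNOPSH molecules, which the text does not state explicitly.

```python
# Rounded category sizes quoted in the summary (approximate, not exact counts).
category_counts = {
    "CNOPSH": 2500,
    "inorganic": 900,
    "halogenated": 11000,
}

# Sum of the rounded categories; consistent with "about 14,000" molecules.
total = sum(category_counts.values())

# CNOPSH molecules (N <= 6 non-H atoms) found to be produced by life on Earth.
biogenic_cnopsh = 622
biogenic_fraction = biogenic_cnopsh / category_counts["CNOPSH"]

# ASSUMPTION: the "about 60%" absent from online databases is taken
# relative to the 622 biogenic CNOPSH molecules.
missing_from_databases = round(0.60 * biogenic_cnopsh)

print(f"total molecules (rounded categories): {total}")
print(f"biogenic fraction of CNOPSH list:     {biogenic_fraction:.2f}")
print(f"biogenic compounds not in databases:  {missing_from_databases}")
```

Because the category sizes are rounded, the results are estimates; the point is only that the quoted fractions are mutually consistent.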
The applications of the list are twofold. The first relates to the future search for biosignature gases in exoplanetary atmospheres: to identify all molecules that can accumulate in different types of hypothesized exoplanetary atmospheres and that also have strong spectral signatures, with future work to address false positives in an exoplanetary environmental context. The motivation for the biosignature gas application is that many gases on Earth that have been studied in the context of exoplanets (such as methyl chloride) appear to depend on accidents of ecology and the whims of evolution (such as fluoroacetone or stibine, SbH3) and so may be very different on other worlds. Hence we are motivated to construct a list of all molecules that are stable and volatile to be considered as potential biosignature gases on exo-Earths.
Related to the search for biosignature gases, we stated that Earth's atmospheric gases down to the parts-per-trillion level by volume are all produced by life (with the exception of the noble gases and the set of fluorocarbons that are entirely anthropogenically produced), though their dominant source may be abiotic.
The second application is the search for patterns among the molecules or motifs produced or not produced by life, for understanding the limits of terrestrial biochemistry. That only a few hundred organisms have been studied for their production of volatiles, out of the (probable) millions of species on the planet, suggests that many more chemicals remain to be discovered as gases produced by life on Earth.
Our database of molecules is available for community use. The current iteration of the database is in a form that allows for detailed classification of molecules with respect to included basic physicochemical properties and/or production by life. Yet the list of molecules is incomplete, both because of our cutoff factors in construction (including our proxy for volatility, STP, and the N = 6 non-H-atom cutoff) and because of the lack of information on biological occurrence. In this regard, the main future work is extending the database to larger molecules, for an exhaustive list of stable and volatile molecules. We emphasize the need for a community effort to register all discovered biological molecules in a single database. The community can download and also suggest additions to our database provided at
Acknowledgments
We thank MIT and the MIT Amar G. Bose Research Grant for support. We thank the reviewers for comments that improved the manuscript. We thank Charles Darrow and Victor Pankratius for useful comments, Ehsan Tofigh for computational support in the early phase of this work, and Dr. Jozica Dolenc from ETH Zürich, Informationszentrum (
Appendix A
| Database | Literature reference (if any) | URL | Comment |
|---|---|---|---|
| Dictionary of Natural Products | Dictionary of Natural Products, CRC Press, CD version | | |
| ZINC | John J. Irwin and Brian K. Shoichet. (2005) ZINC—A free database of commercially available compounds for virtual screening. J Chem Inf Model 45:177–182 | | Natural products subset of the database. Only available as Web page |
| Sigma catalogue | | | Commercial catalog, structure of all molecules ≤150 Da molecular weight (Bret Daniel, private communication, 2015) (Sigma-Aldrich) |
| Product catalogues from industrial gas suppliers | Compiled from Web catalogs of compounds provided by Air Liquide, Linde, British Oxygen Corp., and Matheson Trigas | | |
| Reference books | Richard E. Lewis, Sr., editor. (2007) Hawley's Condensed Chemical Dictionary, 15th ed., Wiley, Hoboken, NJ. | | From several chemical properties databases hosted by Knovel–apps. Knovel.com |
| | Knovel Solvents—A Properties Database | | |
| | Chemical Properties Handbook | | |
| | Dictionary of Inorganic and Organometallic Compounds | | |
| | James Speight, editor. (2005) Lange's Handbook of Chemistry, 16th ed., McGraw-Hill, New York. | | |
| | Yaws' Handbook of Antoine Coefficients for Vapor Pressure, 2nd electronic edition | | |
| | Ernest Flick. (1999) Industrial Solvents Handbook, 5th ed., Noyes Data Corporation, Westwood, NJ. | | |
| | Merck Index | | |
| Pherobase | | | |
Chemical databases and other sources used for populating our list of stable and volatile chemicals.
Appendix B
| IUPAC name | Mol. formula | SMILES | Mol. weight (Da) | Boiling point (°C) | Life | Ref |
|---|---|---|---|---|---|---|
| Bromotrichloromethane | CBrCl3 | C(Cl)(Cl)(Cl)Br | 198.27 | 105(e) | N | — |
| Bromodiiodomethane | CHBrI2 | IC(I)Br | 346.73 | 221.5(p) | Y | 1 |
| Chlorotribromomethane | CBr3Cl | C(Cl)(Br)(Br)Br | 287.18 | 156(p) | N | — |
| Trichloroacetonitrile | C2Cl3N | N#CC(Cl)(Cl)Cl | 144.38 | 84(e) | Y | 2 |
| 1-Fluoropropane | C3H7F | CCCF | 62.09 | −3(e) | N | — |
| Tetrafluoromethane | CF4 | C(F)(F)(F)F | 88 | −130(e) | N | — |
| Cyanic chloride | CNCl | C(#N)Cl | 61.47 | 13(e) | N | — |
| 2-Bromo-1,1-dichloroethene | C2HBrCl2 | C(=C(Cl)Cl)Br | 175.84 | 109(p) | Y | 3 |
| Sulfur hexafluoride | SF6 | FS(F)(F)(F)(F)F | 146.06 | −64(e) | N | — |
| Trichloroamine | NCl3 | N(Cl)(Cl)Cl | 120.37 | 71(e) | N | — |
Columns are IUPAC name, molecular formula, SMILES description, molecular weight, boiling point (e = experimentally measured, p = predicted), production by life (Y/N), and references for molecules produced by life. References: (1) Gribble (2003); (2) Ballschmiter (2003); (3) Nightingale et al. (1995).
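As a consistency check on entries of this kind, the molecular weight column can be recomputed from the molecular formula column. The following is a minimal sketch, not the procedure used to build the database: the function names, the rounded atomic masses, and the 0.05 Da tolerance are our assumptions, and the parser handles only simple formulas (element symbols with optional integer counts, no parentheses or charges).

```python
import re

# Standard atomic masses (g/mol), rounded; only the elements needed here.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "F": 18.998,
    "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C2HBrCl2'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum the atomic masses over the parsed formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())

# (formula, tabulated molecular weight in Da) copied from the table above.
TABLE = [
    ("CBrCl3", 198.27), ("CHBrI2", 346.73), ("CBr3Cl", 287.18),
    ("C2Cl3N", 144.38), ("C3H7F", 62.09), ("CF4", 88.0),
    ("CNCl", 61.47), ("C2HBrCl2", 175.84), ("SF6", 146.06),
    ("NCl3", 120.37),
]

# Hypothetical tolerance: flag rows whose recomputed weight differs by >= 0.05 Da.
for formula, tabulated in TABLE:
    computed = molecular_weight(formula)
    flag = "ok" if abs(computed - tabulated) < 0.05 else "CHECK"
    print(f"{formula:10s} tabulated={tabulated:8.2f} computed={computed:8.2f} {flag}")
```

Small differences in the second decimal place can arise from the choice of atomic mass table, which is why a tolerance is used rather than exact equality.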
