Abstract
Potential biosignatures that offer the promise of extraterrestrial life (past or present) are to be expected in the coming years and decades, whether from within our own solar system, from an exoplanet atmosphere, or otherwise. With each such potential biosignature, the degree of our uncertainty will be the first question asked. Have we really identified extraterrestrial life? How sure are we? This paper considers the problem of unconceived alternative explanations. We stress that articulating our uncertainty requires an assessment of the extent to which we have explored the relevant possibility space. It is argued that, for most conceivable potential biosignatures, we currently have not explored the relevant possibility space very thoroughly at all. Not only does this severely limit the circumstances in which we could reasonably be confident in our detection of extraterrestrial life, it also poses a significant challenge to any attempt to quantify our degree of uncertainty. The discussion leads us to the following recommendation: when it comes specifically to an extraterrestrial life-detection claim, the astrobiology community should follow the uncertainty assessment approach adopted by the Intergovernmental Panel on Climate Change (IPCC).
Introduction
Due to important technological advances in analytical instruments and new space missions, we should be prepared for the detection of potential biosignatures of extraterrestrial life in the near future. Data from the surface of Mars, for example, will be abundant over the next 20 years as a consequence of rovers such as NASA's Perseverance, the China National Space Administration's Zhurong, and the planned sample return missions. Data from exoplanets, including exoplanet atmosphere composition, will also be abundant over the next 20 years, in particular because of the deployment of the James Webb Space Telescope. With each potential biosignature, the degree of our uncertainty will be the first question asked. How sure are we that the signal actually derives from extraterrestrial life? Despite its clear importance, the challenge of assessing uncertainty in biosignature detection has thus far only briefly, and only very recently, been seriously addressed in the literature (Almár and Race, 2011; Green et al., 2021; Meadows et al., 2022; NASEM, 2022b). Moreover, the “problem of unconceived alternatives” discussed by philosophers of science (Stanford, 2006) is highly relevant but almost completely absent from the astrobiology literature.
This article highlights one particularly significant, underappreciated challenge for the uncertainty assessment frameworks proposed in the literature, especially Green et al. (2021) and Meadows et al. (2022). Specifically, we bring to bear the problem of unconceived abiotic explanations for phenomena of interest. Drawing on the Network for Life Detection and the Nexus for Exoplanet System Science (NfoLD/NExSS) community report (Meadows et al., 2022, p 26), we stress that articulating our uncertainty requires an assessment of the extent to which we have explored the relevant possibility space. It is argued that, for most conceivable potential biosignatures, we currently have not explored the relevant possibility space very thoroughly at all. Not only does this severely limit the circumstances in which we could reasonably be confident in our detection of extraterrestrial life, it also poses a significant challenge to any attempt to quantify our degree of uncertainty. The discussion leads us to the following recommendation: when it comes specifically to an extraterrestrial life-detection claim, the astrobiology community would do well to adopt a thoroughly time-tested framework—that utilized within Intergovernmental Panel on Climate Change (IPCC) reports.
The Challenges of Known and Unknown False Positives
Increasingly, we have at our disposal technologies and methods for analyzing the composition of exoplanet atmospheres (Catling et al., 2018). Suppose that one day soon we detect oxygen in the atmosphere of an exoplanet. 1 Moreover, that exoplanet is a rocky planet in the habitable zone of a “friendly” star. Is it a biosignature? There is a range of definitions of “biosignature” in the literature (Des Marais et al., 2003, p 234; Catling et al., 2018, p 710; Schwieterman et al., 2018, p 666); the important question to ask is how sure we are that the oxygen we have detected has a biotic cause. At one time there were no known plausible abiotic explanations for the accumulation of oxygen in the atmosphere of a planet. That is no longer the case; we now know of several different abiotic pathways to such oxygen accumulation (see, e.g., Meadows et al., 2018). These abiotic explanations, previously unconceived, would now give us pause for thought if oxygen were detected (greater pause for thought than before we had such abiotic explanations).
This is a cautionary tale. And in fact there are many such cautionary tales or “lessons from history” in the recent decades of the field of astrobiology (Green et al., 2021, p 575; Meadows et al., 2022, p 51). The worry is that we tend to jump too quickly from “There is no known abiotic explanation of φ” to “φ is probably caused by life.” This is an inference that has been made on several occasions in the history of astrobiology. The following three examples serve to demonstrate that this is not merely a hypothetical concern.
First, consider Sinton's (1957) interpretation of a curious, apparently martian absorption spectrum as evidence for the presence of organic molecules and even “vegetation” on Mars. He argued that the evidence made it “extremely likely that plant life exists on Mars” (p 239). Sinton (1959) continued the argument, dismissing carbonates as the cause and concluding that “[T]hese bands are most probably produced by organic molecules” (p 1237). Several years later, Shirk et al. (1965) put forward an abiotic explanation, and soon after it turned out that the bands were due to deuterium in the atmosphere of Earth, not Mars (Rea et al., 1965). Sinton's conclusions were surely too hasty, and the form of expression too confident; Dick (2020, p 697) suggested that he was “undoubtedly affected by preconceived ideas.”
Next consider the famous case of ALH84001, a martian meteorite discovered in Antarctica in 1984. McKay et al. (1996, p 929) accepted from the start that individual characteristics of the meteorite were easy enough to explain abiotically. But they contended that the full suite of characteristics—taken as a package—could not be explained abiotically and was thus good evidence of past microbial life on Mars. This conclusion was gradually undermined by work exploring “unconceived alternatives” (Golden et al., 2001; Martel et al., 2012). Arguably, the final word on the matter came in 2022 when Steele et al. (2022) presented evidence that mineral carbonation and serpentinization reactions on early Mars were the cause of the claimed evidence of microorganisms.
Finally, consider the work of Nutman et al. (2016). The authors proposed the existence of stromatolites in the Isua supracrustal belt of southwestern Greenland that dated back 3700 million years, extending the fossil record back by over 200 million years. To reach this conclusion, the authors identified four respects in which the structures could not be explained by known abiotic processes, and hence, “on these grounds, we rule out an abiogenic origin for Isua stromatolites” (p 3). But then, 2 years later, Allwood et al. (2018) offered a plausible abiotic explanation for the proposed stromatolites. The explanation largely attributes the apparent stromatolites to structural deformation and chemical alterations of layered rock. Nutman et al. (2016) might still be right (see their response in Nutman et al., 2019), but the existence of an abiotic explanation will certainly give the community much pause for thought.
Given such examples (and there are many more 2 ), we need to be highly cautious in a situation where we currently cannot think of a plausible abiotic explanation for the phenomenon in question. A scholar heavily influenced by the noted cautionary tales might even expect that, in due course, a plausible abiotic explanation of a current biosignature will likely be developed. It is not obvious when such an expectation would be irrational.
To illustrate, consider now a much stronger candidate for a genuine biosignature: oxygen-methane disequilibrium. Such a disequilibrium in the atmosphere of a planet is readily explained by the presence of life (Krissansen-Totton et al., 2018). We have a strong oxygen-methane disequilibrium in the atmosphere of our planet, for example, and it is caused by life. We do have plausible abiotic stories to tell concerning very weak oxygen-methane disequilibrium in the atmosphere of a planet, but not (yet) concerning strong oxygen-methane disequilibrium, such as that found on Earth (Simoncini et al., 2013; Thompson et al., 2022).
Suppose now that we detect such a signal. It would certainly be called a “biosignature” by some scientists and journalists. But how confident could we be that it was caused by life? The fact that we cannot think of an abiotic cause would no doubt tempt some commentators to conclude that it is probably caused by life. In the work of Krissansen-Totton et al. (2018), we find the following: “The methane flux required to sustain observed quantities of methane in the modern Earth's oxidizing atmosphere is greater than what abiotic processes could plausibly provide, and thus, biological methane leakage must be invoked to explain the persistent disequilibrium.”
But we might worry that the word “must” is too strong, especially when we look at other cases (such as those sketched above) where abiotic explanations were previously unconceived and then later conceived. The fact that we have a range of such examples from the history of astrobiology constitutes a cautionary tale: perhaps we should really expect that a plausible abiotic explanation of a strong oxygen-methane disequilibrium will one day be developed.
This is one consideration. Another comes from considering the extent to which we have explored the relevant possibility space. Consider two extremes: one where we have not even started exploring possible abiotic explanations of an oxygen-methane disequilibrium, and another where we have explored the topic for decades without finding a plausible abiotic explanation. In the first scenario, there should be no excitement at all if we detect an oxygen-methane disequilibrium since, as far as we know, highly plausible abiotic explanations may exist. In the second scenario, we should be very excited, since we are approaching the point where we are sure that such a disequilibrium must have a biotic cause. Thus, to judge how excited we should be about detecting an oxygen-methane disequilibrium, we need to judge where we are on the spectrum of exploration, with the two scenarios given above at its extremes (Fig. 1).

Fig. 1. The spectrum of exploration: the extent to which we have explored a certain possibility space, ranging from “not yet started” to “fully explored.” In many scenarios, we will not actually know when the space is half-explored or fully explored.
As Meadows et al. (2022, p 26) noted, “[I]f the scope of possible abiotic explanations is known to be poorly explored, it suggests we cannot adequately reject abiotic mechanisms.” Conversely, if it is known to be thoroughly explored, we probably can reject abiotic mechanisms.
When it comes to the specific case of atmospheric methane, Krissansen-Totton et al. (2018) stated that, “On the basis of current understanding, the conditions required to generate large fluxes of abiotic methane are specific and implausible” (p 8). But they say nothing about the extent of our current understanding. Thus, even though they discuss a handful of possible abiotic methane sources, the reader is left in the dark on the question of just how much relevant possibility space might still be “out there” waiting to be explored. But if we do not know this, we cannot assess the relevant uncertainty.
To put it another way, there are two completely different problems here—currently not clearly distinguished in the literature 3 —both coming under the general heading of “abiotic mimics” or “false positives.” On the one hand, it can already be extremely challenging to consider all currently known potential abiotic mimics for a given signature and adequately rule them out. The extent to which this has been done in a given case is going to be challenging to measure but needs to be measured if the degree of uncertainty is to be adequately articulated. On the other hand, even if one has thoroughly ruled out all plausible abiotic mimics given current knowledge, we cannot articulate our degree of uncertainty—or “confidence of life detection”—until we have also somehow factored in the extent to which we have explored the space of possible abiotic mimics. Ruling out all known abiotic mimics is little comfort if our knowledge of possible abiotic mimics is in its infancy.
We argue that astrobiology is still a young science, and research into “abiotic mimics” is in its infancy. That is, in most relevant contexts, we are much closer to the “under-explored” end of the spectrum than the “thoroughly explored” end. Consider McMahon and Cosmidis (2022) writing on “false biosignatures on Mars”:
“[K]ey evidence [for confirming possible biosignatures] will come from the investigation of abiotic physicochemical systems and their capacity to mimic the forms and properties of life. Yet this area of enquiry has received rather scant and unsystematic attention from astrobiologists, who have tended to focus their published work on expanding our knowledge of life's signatures rather than its abiotic mimics. […] The reliability of any detected biosignatures on Mars therefore depends crucially on our understanding of the abiotic processes that might mimic them. […] [B]iogenicity criteria are unable to discriminate sensitively and reliably between biosignatures and pseudobiosignatures unless they are grounded in extensive knowledge and understanding of both classes of phenomena. […] However, most known varieties of pseudobiosignature have not been characterized or understood in sufficient detail for this to be possible. Moreover, given the haphazard and unsystematic way in which varieties of false biosignature have so far been identified, we can only assume that many others remain undiscovered.”
4
The authors here considered abiotic mimics on Mars, similar to the case of martian meteorite ALH84001. But biosignatures associated with exoplanet atmospheres are not disanalogous in the relevant respects. Here, too, we may reasonably assume that many “false biosignatures” remain undiscovered.
Thus, we must embrace the thought that we have not thoroughly explored the relevant possibility space. We simply cannot say whether an abiotic mimic of a strong methane-oxygen disequilibrium is a serious possibility. The fact that we currently cannot imagine such an abiotic mimic seems like scant reason to believe that no such mimic exists. When we hear scholars say of oxygen and methane, “that combination is very hard to explain [abiotically],” 5 perhaps we should take that as saying more about our current state of knowledge than about what is or is not possible.
In fact, Krissansen-Totton et al. (2018) provided one example of significant abiotic methane production: “For terrestrial planets with a more reducing mantle than Earth, significant CH4 outgassing is conceivable” (p 7). They go on to claim that such an abiotic methane source could be identified via the presence of CO, on the grounds that “CO has few abiotic sinks.” Thus, the issues multiply: How thoroughly explored is the possibility space of abiotic CO sinks? Not very thoroughly at all, we would suggest, but the more fundamental point is that the extent of possibility space exploration needs to be included in any (un)certainty assessment.
Green et al. (2021) are to be commended for offering an initially plausible “Confidence of Life Detection” scale, the “CoLD” scale (Fig. 2). The proposed scale runs from 1 to 7, where 7 indicates very high confidence of detection of extraterrestrial life: the presence of life has been confirmed. It is explicitly put forward as a framework for clear communication with the general public; it is a “progressive one-dimensional scale” that can serve to communicate, with a number between 1 and 7, how confident we are that we have detected extraterrestrial life. We may now ask how it handles the two false-positive challenges noted in the previous section, since both must be factored into any certainty calculation.

Fig. 2. The CoLD scale, intended as a measurement of the degree of confidence for a particular life-detection claim. From Green et al. (2021), with permission.
Most obviously, at Level 4 of the CoLD scale we find,
“All known non-biological sources of signal shown to be implausible in that environment.”
As noted in the previous section, establishing the extent to which known non-biological sources have been ruled out can itself be extremely challenging. But what of currently unconceived abiotic mimics? If research on abiotic mimics is in its infancy, then the fact that all known non-biological sources have been ruled out is very weak evidence that no non-biological source exists. This worry is strengthened by the fact that abiotic mimics that were once unconceived have later been conceived.
We argue that meeting the “Level 4” requirement on the CoLD scale (having already met Level 1–3 requirements) can correspond to both (i) very high confidence (when we have more or less exhausted the space of possible abiotic mimics and shown them all to be implausible) and (ii) very low confidence (when scientific knowledge of possible abiotic mimics is in its infancy). In case (i), we imagine that we have exhausted the relevant possibility space (many teams working over many decades), and we have not found any plausible abiotic explanations of the phenomenon in question. In that scenario, the fact that all known abiotic sources of the signal are implausible entails that a biotic cause is the only plausible explanation; this is the “no alternatives argument” (Dawid et al., 2015) or what Cowie (2022) calls the “argument from elimination” in the context of the ‘Oumuamua debate. On a scale of confidence with 7 steps, this should be close to a “7.” 6 In case (ii), we imagine that we have not even started exploring the relevant possibility space. In this scenario, the fact that all known abiotic sources of the signal have been ruled out (Level 4 met) means nothing. As far as we know, there may be many plausible abiotic explanations.
If meeting the “Level 4” requirement on the CoLD scale can sometimes mean “high confidence” and sometimes mean “low confidence,” then this is a potential point of confusion for the general public. The basic idea of the 1–7 CoLD scale framework was to map distinct and specifiable scientific developments to the “certainty continuum,” 7 where a bigger number means that one is more certain. The problem of unconceived alternatives poses a serious challenge to this goal.
Any criticism of the CoLD scale should not be overstated, since it was merely “the beginning of an important dialogue” (Green et al., 2021, p 575) and “Discourse within the broader community should modify or supplant the scale” (p 578). In 2021–2022, the option to “supplant the scale” was taken up in a serious way by the “Standards of Evidence for Life Detection Community Workshop” (July 19–22, 2021), which ultimately led to a many-authored white paper: Meadows et al. (2022). Here the CoLD scale is left behind and a new framework proposed. The NfoLD/NExSS community of scholars behind this white paper apparently felt uncomfortable with the Green et al. (2021) attempt to map specifiable scientific developments to a one-dimensional numeric scale representing overall “confidence.” 8 Though they do not explicitly criticize the CoLD framework, they do present their own framework based on five “framework questions” (see Fig. 3) and state:

Fig. 3. The NfoLD/NExSS framework for life-detection assessments. The five questions are: (1) Have you detected an authentic signal? (2) Have you adequately identified the signal? (3) Are there abiotic sources for your detection? (4) Is it likely that life would produce this expression in this environment? (5) Are there independent lines of evidence to support a biological (or non-biological) explanation? From Meadows et al. (2022), with permission.
“While the framework questions are presented in order, there was a strong sense at the workshop that the application of the steps do not need to be in a particular order, and in some cases may be difficult to implement linearly.” (Meadows et al., 2022, p 7)
The rejection of “linearity” (other than a distinction between “Level 1” and “Level 2”—see Fig. 3) and preference for “iteration” are stressed throughout the white paper.
We may now ask: Where does this leave the “continuum of confidence,” or “certainty continuum” that Green et al. (2021) took to be at the heart of their framework? We agree with Meadows et al. (2022) that scientific methodology is far less linear, and far more iterative, than the CoLD framework allowed. And yet it remains the case that confidence of life detection should be something one can in principle map onto a linear scale ranging from “no confidence at all” to “full confidence.” With the move from the CoLD scale to the NfoLD/NExSS framework, we correct the presentation of scientific methodology, at the expense of de-prioritizing the goal to effectively communicate confidence of life detection. The NfoLD/NExSS white paper leaves this question behind, except to say that “additional community discussion” will be needed to develop a “ranking or numerical scheme for certainty in biosignature detection and interpretation” (p 60). Apparently, the best we can do is say that we can be confident to the extent that questions 1–5 in the NfoLD/NExSS framework have been satisfactorily answered.
Bayesian techniques are often put forward to quantify uncertainties in science quite generally. And Bayesian techniques are often stressed in the astrobiology literature in particular. 9 In principle, the Bayes formula can be used: we can input values into the formula in order to determine the probability of life, given the evidence, where in this particular case the evidence is a strong methane-oxygen disequilibrium (without accompanying CO). But do we know how to fill in the terms in the equation? If we do not, then we have a case of obscurum per obscurius, 10 but applied to assessing uncertainty: employment of the equation introduces as much or more uncertainty than the uncertainty we hoped to address.
Here is the relevant Bayesian formula, where we have this new evidence E and we want to know whether we have really detected life L:
p(L│E) = p(E│L) p(L) / [p(E│L) p(L) + p(E│¬L) p(¬L)]
Perhaps the most difficult term to evaluate is p(E│¬L), which asks for the probability of observing a strong methane-oxygen disequilibrium (the evidence E) given that it is caused by some abiotic means (not life, ¬L). On the one hand, we (the relevant scientific community) cannot currently think of any such abiotic story, and this might suggest a very low probability. On the other hand, we have a number of examples, from the history of astrobiology, of unconceived abiotic explanations of phenomena being developed where previously the only known explanation involved the presence of life. Moreover, we know that we are very far from exhausting the possibility space of abiotic mimics (recall the point, above, about abiotic CO sinks). This all needs factoring in.
As discussed, it seems reasonable to say that we have only just started exploring abiotic mimics. Thus, the responsible thing to say to someone who demands a value for p(E│¬L) seems to be, “We just don't know; we haven't done the research.” If we are subjective Bayesians, then we have the option to input a subjective value for p(E│¬L), and if we really have no idea, we might choose 0.5. But this seems controversial, since we really have no idea if the probability is anywhere near 0.5. As Stanford (2011, p 898) wrote in another context, “The austere Bayesian apparatus does promise to allow us to formally integrate the confirmational significance of various diverse forms of evidence, but this remains a promissory note when we have no way to responsibly determine likelihoods.” (emphasis added)
If an individual insists that 0.5 is her actual credence for p(E│¬L), then the posterior probability merely has meaning for her as an individual and does not say anything about the actual, objective probability that the signal has a biotic cause.
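To make the sensitivity concrete, the following is a minimal sketch of the standard two-hypothesis form of Bayes' theorem, p(L│E) = p(E│L) p(L) / [p(E│L) p(L) + p(E│¬L) p(¬L)]. The function name `posterior_life` and every numerical input are assumptions chosen for illustration; none of the values comes from the literature discussed here.

```python
def posterior_life(prior_life: float, p_e_given_life: float, p_e_given_not_life: float) -> float:
    """Return p(L|E) from Bayes' theorem, expanding p(E) over L and not-L."""
    inputs = {
        "prior_life": prior_life,
        "p_e_given_life": p_e_given_life,
        "p_e_given_not_life": p_e_given_not_life,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    numerator = p_e_given_life * prior_life
    denominator = numerator + p_e_given_not_life * (1.0 - prior_life)
    if denominator == 0.0:
        raise ValueError("p(E) is zero, so the posterior is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the sensitivity; not from the paper.
    prior, likelihood = 0.1, 0.5
    for p_mimic in (0.5, 0.1, 0.01, 0.001):
        result = posterior_life(prior, likelihood, p_mimic)
        print(f"p(E|not L) = {p_mimic:<6} -> p(L|E) = {result:.3f}")
```

With the other inputs held fixed, the posterior swings from roughly the prior to nearly 1 as the assumed p(E│¬L) shrinks, which is why an unsupported choice for that one term dominates the result.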
Another option is to use imprecise probability theory, essentially introducing an interval for p(E│¬L). We might even conduct a survey of community opinion and (discounting outliers) use the resultant range of values to define the interval. Of course, many of those surveyed might reasonably respond “I have no idea until much more research has been done,” which could only correspond to a maximum interval of [0,1].
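The interval idea can be sketched as follows. Because the posterior decreases monotonically as p(E│¬L) increases (for a prior strictly between 0 and 1 and a nonzero p(E│L)), an interval for p(E│¬L) maps directly onto an interval for the posterior. The function name `posterior_bounds` and the example inputs are hypothetical, and this is only one simple way of propagating an interval, not a full treatment of imprecise probability theory.

```python
from typing import Tuple


def posterior_bounds(
    prior_life: float,
    p_e_given_life: float,
    mimic_interval: Tuple[float, float],
) -> Tuple[float, float]:
    """Map an interval for p(E|not L) onto (lower, upper) bounds for p(L|E)."""
    low, high = mimic_interval
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("mimic_interval must satisfy 0 <= low <= high <= 1")
    if not 0.0 < prior_life < 1.0:
        raise ValueError("prior_life must lie strictly between 0 and 1")
    if not 0.0 < p_e_given_life <= 1.0:
        raise ValueError("p_e_given_life must lie in (0, 1]")

    def posterior(p_mimic: float) -> float:
        numerator = p_e_given_life * prior_life
        return numerator / (numerator + p_mimic * (1.0 - prior_life))

    # The posterior falls as p(E|not L) rises, so the endpoints swap roles.
    return posterior(high), posterior(low)


if __name__ == "__main__":
    # Hypothetical prior and likelihood; the maximal interval [0, 1] models "no idea".
    print(posterior_bounds(0.1, 1.0, (0.0, 1.0)))
```

In this sketch, the maximal interval [0, 1] yields a posterior interval whose upper bound is 1 and whose lower bound is no higher than the prior, so the evidence leaves the question essentially open.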
The Bayes approach is not completely devoid of information. It does show us clearly that, in a scenario where the possibility space has definitely been exhausted and no plausible abiotic explanation has been found, p(E│¬L) would be zero, and thus our posterior would be 1, whatever our prior p(L)—this is the “no alternatives argument” again (Dawid et al., 2015). But most other scenarios present immediate problems. In a more realistic scenario where we have (roughly speaking) half-explored the possibility space, it is not clear what would be a reasonable value for p(E│¬L). In the half of the space that we have not explored there could either be zero plausible abiotic stories to tell or several.
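The limiting case mentioned above can be written out explicitly. The following short derivation uses the standard two-hypothesis form of Bayes' theorem and assumes that p(E│L) p(L) is strictly positive; with a prior of exactly zero the expression is undefined or zero, so “whatever our prior” should be read as “whatever our nonzero prior.”

```latex
\[
p(L \mid E)
  = \frac{p(E \mid L)\, p(L)}{p(E \mid L)\, p(L) + p(E \mid \neg L)\, p(\neg L)}
\]
% Setting p(E | not-L) = 0 removes the second term of the denominator:
\[
p(E \mid \neg L) = 0
  \;\Longrightarrow\;
p(L \mid E) = \frac{p(E \mid L)\, p(L)}{p(E \mid L)\, p(L)} = 1,
  \qquad \text{provided } p(E \mid L)\, p(L) > 0 .
\]
```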
A Way Forward—the IPCC Uncertainty Framework
One option for moving forward is to try to assess the extent to which the relevant possibility space has been explored. For some such spaces, the parameters are known and well-defined, such as those considered by Harrison et al. (2013), who examined the limits for life under multiple extremes. Their figures 1–4 show not the limits for life under multiple extremes (temperature, pH, pressure, salt concentration) but rather our knowledge given the current extent of exploration of the space (see Fig. 4 for an example). One can more or less see, in the figures, the extent to which the space has been explored. 11

But in the case of possible abiotic mimics for, say, a CH4–O2 atmospheric disequilibrium, it is not nearly as clear how to define the possibility space. Probably it should not be conceived literally as a “space,” comparable with the figures of Harrison et al. (2013); probably we should think in terms of a list of possible abiotic causes. 12 In which case, how would we know that the list was complete? Or even halfway complete? In a scenario where we have only halfway explored the possibilities, there would still be no responsible value to input for p(E│¬L), as discussed in the previous section. Inputting a responsible value for this term requires that our exploration of the space of possibilities has been thorough (ideally exhaustive).
Thus, we propose a rather different approach, drawing on the uncertainty language framework (see Fig. 5) used in IPCC reports, including the most recent set of reports known as “AR6” (reports of Working Groups I, II, and III; see

Fig. 5. The IPCC uncertainty language framework. The x-axis corresponds to the degree of robustness of the evidence for a claim (assessed by a working group), and the y-axis represents the extent of agreement of the scientific community regarding the same claim. From Mastrandrea et al. (2010), with permission.
The ocean has absorbed about 30% of the anthropogenic carbon dioxide, resulting in ocean acidification and changes to carbonate chemistry that are unprecedented for at least the last 65 million years (high confidence).
Note the term “high confidence.” Most statements in this report include some reference to confidence, with six variations: low confidence, low-medium confidence, medium confidence, medium-high confidence, high confidence, and very high confidence.
One finds almost exactly the same framework in the work of Moss and Schneider (2000, pp 44, 45; their Figs. 3 and 4), where a “continuum of confidence” was analyzed in preparation for the IPCC Assessment Report 3 (AR3), published the following year, in 2001. If this uncertainty framework can be considered adequate for the IPCC reports over a period of more than 20 years (when the stakes are so high, and clear communication with laypersons and policymakers could not be more important), then might it not also work adequately for astrobiologists eager to adopt a closely analogous continuum, but now for confidence of (extraterrestrial) life detection?
Here, we do not intend a full analysis of pros and cons of the IPCC uncertainty framework. Over the past 20+ years—including four IPCC reports AR3, 4, 5, and 6—it has been heavily scrutinized (Mastrandrea and Mach, 2011; Mastrandrea et al., 2011; Budescu et al., 2012, 2014; Mach et al., 2017; Rehg and Staley, 2017; Molina and Abadal, 2021; Kause et al., 2022). Whatever your view, in the face of extraordinary scrutiny and the highest possible stakes, this uncertainty assessment framework has stood the test of time.
One possible objection might stem from the thought that, while the IPCC uncertainty framework is acceptable for IPCC statements, when it comes to confidence of life detection, we would prefer to somehow add detail, for example concerning the extent to which known abiotic mimics have been ruled out. This is precisely what Green et al. (2021) attempted, of course, with the CoLD framework. However, it is precisely such details that lead to criticism and community disagreement. As NASEM (2022b, p 1) note in their review of Meadows et al. (2022), “The diversity of opinion from the broader community indicates that universal adoption of any one framework or scale would be challenging.” They later refer to “strong opinions” and “lack of community consensus” (p 5). Thus, if we want to bring the astrobiology community together—as the earth sciences community has come together vis-à-vis uncertainty assessment in IPCC reports—we may want to avoid “overly structured frameworks” (NASEM, 2022b, p 18). We agree that “Any proposed framework should be kept as general as possible” (NASEM, 2022b, p 19). In other words, the lack of structure and detail in the IPCC framework should be considered a strength, not a weakness.
Are we offering here yet another framework to compete with both the CoLD scale and the NfoLD/NExSS framework, thus potentially further fracturing the community? We do not think so. The NfoLD/NExSS framework still has its place, since it was never seriously put forward as a way to “read off” confidence of life detection. Instead, it serves a different function: that of describing/prescribing good-practice scientific methodology when a new potential biosignature is put forward. Thus, we tend to agree with the National Academies of Sciences, Engineering, and Medicine (NASEM, 2022b, p 19) when they write of the NfoLD/NExSS framework: “It could also be argued that approaching all five of these proposed questions in any order is just the execution of the standard scientific process rather than a new assessment framework.”
It may be asked: How would the astrobiology community's adoption of the IPCC uncertainty framework actually work, in practice? This cannot be the place for a full answer to this question, but we can make some initial suggestions, as follows. Imagine that a committee (not unlike an IPCC “Working Group”) has been put in place to assess the uncertainty vis-à-vis a particular “biosignature” that has been put forward. Making use of the IPCC framework, that committee is required to think not only about the “first-order” scientific evidence for the claim under scrutiny (x-axis of Fig. 5) but also the “second-order” contribution coming from the extent of community agreement or “consensus” (y-axis of Fig. 5). This ensures that the committee considers where the community stands on the matter. This is worthwhile, since there may be cases where, although the evidence seems robust to a particular team of scientists, for some reason significant disagreement within the community remains. As Mastrandrea et al. (2011, p 678) put it, “The degree of agreement is a measure of the consensus across the scientific community on a given topic and not just across an author team.” This constraint limits the circumstances in which a team can say they are “confident.” It prevents them from being too confident too soon since it will take some time for a community opinion to take shape. If we cannot say anything concrete about “agreement,” we cannot (yet) say anything about “confidence,” whatever we might think of the (first-order) evidence. 14
What about possible abiotic mimics? How do they get factored into the uncertainty assessment? Consider again the particular case of a CH4–O2 disequilibrium in the atmosphere of an exoplanet. To assess the degree of confidence, our hypothetical committee must consider (i) the first-order scientific evidence and (ii) the extent of agreement in the community of experts. When it comes to (i), a judgement needs to be made concerning the extent to which available evidence links a CH4–O2 disequilibrium with life. The possibility of unconceived abiotic mimics is factored in implicitly within this judgement. While some scientists might jump too quickly from “There is no known abiotic explanation” to “It is highly likely to be caused by life,” others will be more cautious, and this expresses itself in the measure of agreement: a lack of consensus. The final judgement concerning our overall degree of confidence comes from combining (i) and (ii) and considering where we land on the “Confidence” chart.
To see more fully how it would work in practice, consider the case of Nutman et al. (2016) and Allwood et al. (2018) on the proposed 3.7-billion-year-old stromatolite. Imagine our committee, shortly after the publication of Nutman et al. (2016), attempting to fill in “evidence” and “agreement” to reach a degree of confidence. Even if they considered the evidence strong, they would not be able to give any score for “agreement,” since at that time there had not yet been the kind of peer scrutiny required to establish the extent of agreement. Filling in this score requires waiting for scrutiny to take place. And, once that scrutiny did take place, obvious community disagreement was expressed by way of Allwood et al. (2018). This case demonstrates how the IPCC uncertainty framework will sometimes deliver an undefined confidence score, since in the early stages of a new research program we will not yet have any value for “scientific agreement.” We see this as a virtue of using the IPCC uncertainty framework: when asked “How sure are we that this is extraterrestrial life?” we should indeed sometimes reply “It's too early to say.” Thus, adoption of the IPCC framework requires scientific community activity over a significant period of time. This is just as it should be, we contend, since sensible scores for overall confidence need to be determined by a process that washes out individual biases, perspectives, and blind spots that any individual scientist or any individual team of scientists may have.
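As a purely illustrative sketch of the committee procedure described above, the following Python fragment combines an evidence level and an agreement level into a confidence label, and returns an undefined result (None) while community agreement has not yet been established. The level names follow the IPCC guidance, but the numeric ranks and the additive rule that maps them onto five confidence labels are assumptions made here for illustration only; the IPCC framework leaves this synthesis to the judgement of the assessing authors and does not prescribe a formula.

```python
from typing import Optional

# Ordinal scales using the IPCC level names.
# The numeric ranks are an illustrative assumption, not part of the framework.
EVIDENCE = {"limited": 0, "medium": 1, "robust": 2}
AGREEMENT = {"low": 0, "medium": 1, "high": 2}
CONFIDENCE = ["very low", "low", "medium", "high", "very high"]


def confidence(evidence: str, agreement: Optional[str]) -> Optional[str]:
    """Combine evidence and agreement into a confidence label.

    Returns None (confidence undefined) when agreement is None, i.e. when
    peer scrutiny has not yet established the extent of community agreement.
    The additive mapping below is a hypothetical stand-in for committee judgement.
    """
    if agreement is None:
        return None  # "It's too early to say."
    return CONFIDENCE[EVIDENCE[evidence] + AGREEMENT[agreement]]


if __name__ == "__main__":
    # Strong evidence, but no community scrutiny yet: confidence is undefined.
    print(confidence("robust", None))
    # The same evidence once scrutiny has revealed disagreement.
    print(confidence("robust", "low"))
```

The point of the sketch is the early return: however the evidence is rated, no confidence label is produced until a value for agreement exists.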
These are early thoughts on the adoption of the IPCC uncertainty framework within the field of astrobiology and with application to “confidence of life detection” in particular. A key question remains how, exactly, the IPCC framework is supposed to factor in the possibility of unconceived abiotic mimics. Application to a case such as the proposed 3.7-billion-year-old stromatolites is straightforward, since an abiotic explanation was found quickly (within 2 years), as soon as the wider scientific community turned its attention to the case. In terms of the confidence scale, we move from “undefined” to “low confidence” over the course of 2016–2019 (not very low confidence, given the response: Nutman et al., 2019).
There are important questions regarding how the framework might apply to other historical examples, such as that of ALH84001, which is notable both for the complexity of the evidence involved and the slow accumulation of comprehensive abiotic explanations. In the years following the publication by McKay et al. (1996) announcing the possible evidence of life on Mars, the community reception was mixed. As Dick (2020, p 705) put it, “the stakes were high and the skeptics numerous.” This suggests that the “agreement” score would never have gotten beyond “low.” Similarly, the various lines of evidence were individually and collectively ambiguous, and it is hard to see how the “evidence” score could get beyond “limited” (see Fig. 5). Thus, in this case we propose that the overall confidence score would have started out low and then over the course of 1996–2022 would have gotten gradually lower in a series of steps coinciding with important publications such as Golden et al. (2001), Martel et al. (2012), and Steele et al. (2022). The final resting place for “confidence” is extremely low.
What would the framework say about a much harder case, such as an oxygen-methane disequilibrium biosignature? It must be accepted that many in the scientific community would be tempted by the move from “No currently conceived plausible abiotic explanation” to “It's probably life.” But a scientific community is heterogeneous (some scientists are more cautious than others), and some proportion of the community would no doubt be more open to possible, currently unconceived, abiotic explanations. We propose that there is no better way to measure the appropriate degree of uncertainty vis-à-vis possible abiotic mimics than to distil our assessment from the cut-and-thrust of scholarly debate, at conferences and in the professional literature, in the years following the detection of the disequilibrium. A committee tasked with determining a confidence score would then hope to factor in the possibility of abiotic mimics via consideration of the degree of community disagreement, following substantial community debate (which may or may not take several years, depending, for one thing, on the overall measure of community attention afforded to the issue).
Despite the argument from community heterogeneity, a worry may still linger that the scientific community would be skewed toward overconfidence, since the inference from “No currently conceived abiotic explanation” to “It's probably life” is tempting, and scientists are not used to factoring into their evidential assessments cautionary tales from the history of science, such as those sketched above, in Section 2. A partial solution to this would be to increase the interaction of scientists with science scholars (philosophers, historians, sociologists, anthropologists), who would offer such perspectives. While cross-disciplinary interactions in the broad field of astrobiology do take place to a certain extent (e.g., at AbSciCon), much more could be done to enrich interdisciplinary learning and dialogue (see Denning and Dick, 2019, especially Section 5.0 “Recommendations”).
This article makes three positive proposals. First and foremost, that the astrobiology community, searching for a way to assess biosignature uncertainty, would profit from adopting the time-tested IPCC uncertainty framework, and that this would help address challenges faced by the assessment frameworks found in the works of Green et al. (2021) and Meadows et al. (2022). The IPCC framework can be adopted alongside another framework such as that found in the work of Meadows et al. (2022), with the latter interpreted as describing, or perhaps prescribing, good-practice scientific methodology (key questions to be asked and answered) when a potential biosignature is put forward. Second, that if we want to assess biosignature uncertainty more rigorously than is currently possible, we need much more data on the topic of abiotic mimics (in the spirit of Rouillard et al., 2018). This is the only way we will be able to say something meaningful concerning what the National Academies of Sciences, Engineering, and Medicine (NASEM, 2022a) call “the reliability of a biosignature” (Q11.2). Third, we argue that astrobiologists who make biosignature judgments often need to factor in considerations that are best described as historical, philosophical, or sociological. This adds to calls for greater support (e.g., funding) for meaningful interdisciplinary engagement in the field (Denning and Dick, 2019, Section 5.0).
Acknowledgments
An early version of this paper was presented at the Isaac Newton Institute for Mathematical Sciences, in Cambridge, UK, in February 2022; huge thanks to delegates for helpful feedback and to Peter Coveney for the invitation. Thanks also to EURiCA project reading group participants for helpful discussion, especially Catriona Sellick, Martin Ward, and Chris Greenwell. For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.
Authors' Contributions
All authors contributed equally to the research; P.V. wrote the article.
Funding Information
This work is part of the EURiCA project: ‘Exploring Uncertainty and Risk in Contemporary Astrobiology’, funded by a Leverhulme Trust Research Project Grant (RPG-2021-274).
Author Disclosure Statement
The authors declare no conflict of interest.
