Commentary
In humanitarian crises, we fully recognize the insurmountable challenges in obtaining the right information of the right quality at the right time from the right people. The affected populations are often hard to reach, and the data we need to plan practical and appropriate solutions are difficult to get. For many obvious reasons (urgency, lack of local knowledge, convenience, and cost), critical information and analysis can be omitted, lost, or distorted.
In data science applications, especially in public health research and practice, we encounter missing data in a variety of ways: missing observations, missing variables, missing respondents, missing populations. We call this “data missingness.” We frequently trust that we have the tools to address this missingness, be it through an initial randomized design that should guarantee a representative sample or through sophisticated statistical approaches that allow us to account for missing observations. We consider the effect of missingness addressed once we either ignore the fraction of records with missing values or impute replacements for them, thereby ensuring (in a statistical sense) that the missingness is inconsequential.
All of these approaches result in minimal discourse on the topic, on the premise that missingness then has no effect on statistical inferences and conclusions. But at the core of our “solutions” lies the implicit and unverifiable assumption that missing data are rare and that, when data are missing, they are missing at random: whether a value is missing is unrelated to the value itself (once we account for the data we did observe), so the missingness introduces no bias. However, missingness is rarely, if ever, truly random. When we start asking why data are missing, the answers often reveal at least a suspicion of bias that may be directly relevant to the research question.
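To make the stakes concrete, a short simulation can contrast two hypothetical missingness mechanisms in an invented population of household “need” scores. This is a minimal sketch under stated assumptions (the score distribution, the 30% miss rate, and the way the chance of being missed rises with need are all made up for illustration), not an analysis of real survey data. In the first mechanism every household is equally likely to be missed; in the second, the neediest households are the most likely to be missed, as happens with hard-to-reach populations.

```python
import random
import statistics


def simulate(n=20_000, seed=1):
    """Compare complete-case means under two hypothetical missingness mechanisms.

    All parameters are illustrative assumptions, not values from any survey.
    """
    rng = random.Random(seed)
    # Hypothetical "need" score per household (higher = more vulnerable).
    need = [rng.gauss(50, 10) for _ in range(n)]

    # Mechanism 1: every household has the same 30% chance of being missed,
    # regardless of its need (missing completely at random).
    mcar = [x for x in need if rng.random() >= 0.30]

    # Mechanism 2: the chance of being missed rises with need, so the most
    # vulnerable households are the least likely to be recorded.
    def p_missing(x):
        return min(0.95, max(0.0, (x - 30) / 40))

    mnar = [x for x in need if rng.random() >= p_missing(x)]

    return {
        "true_mean": statistics.fmean(need),
        "mcar_mean": statistics.fmean(mcar),
        "mnar_mean": statistics.fmean(mnar),
        "mnar_response_rate": len(mnar) / n,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under the first mechanism the complete-case mean stays close to the true mean; under the second it is pulled well below it. Collecting more records of the same kind does not remove that gap, because the bias comes from who is missing rather than from how many are missing.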
When it comes to data in the humanitarian field (and more broadly), missingness is often directly related to marginalization and vulnerability, or to social, economic, or political structural problems that prevent the collection and sharing of the needed data. We call this type of missingness structural missingness, recognizing that certain people’s exclusion from the data is directly related to, and reinforces, existing societal challenges and inequalities. 1
Convenience Drives Designs and Decisions
Take, for example, the many surveys carried out in humanitarian contexts to monitor and predict disasters and to identify individuals, households, and communities for the targeting of aid and programming. While the survey designs nominally include some form of randomization, most “random” sampling in these contexts is in practice convenience sampling. We used to call this tarmac bias: the exclusion of hard-to-reach populations, that is, those not directly accessible from or near a road or other point of contact.
But structural missingness is far more pervasive than that. For example, mobile populations such as pastoralists are rarely included in surveys, with significant consequences. While some research has shown the importance of pastoralist systems for livestock production, this is poorly captured in official statistics, resulting in a misunderstanding of the pastoralist system and in counter-productive policies. 2 Similarly, when we treat a sedentary village as our sampling unit, we inadvertently miss households or individual household members who are not physically present. Research in Chad highlights the absence of women from data collection because they spend months at a time in temporary settlements preparing land for planting. 3 As a result, the input of an entire group is missing, and because their needs go unrecognized in the needs assessment, they benefit little from programs designed using village-based approaches.
Availability by Choice
Even when more participatory methods are used, elite capture (the domination of the process by local elites) can prevent the inclusion of vulnerable groups. In famine analysis, mortality data are frequently missing, and geographic coverage is limited by both security constraints and bureaucratic obstacles. Often, few data are available from the worst-affected areas, so the severity of the problem is underestimated. Data availability does not equal data usability. When data are judged to be of poor quality, fall below some predefined standard, or are recorded in a “wrong” format or with “wrong” instruments, they are excluded from the analysis. 4 Finally, some key information is simply absent from standard data collection protocols, such as information on social networks and indigenous systems, despite their critical contribution to resilience in the face of disasters. 5,6
Dire Consequences and Potential Solutions
Structural missingness recognizes that some people, populations, and types of information are excluded from data collection and hence from programming and policy. This exclusion of people and populations reinforces marginalization in multiple ways. First, they do not receive the humanitarian support they need; second, they are excluded from the discussion on improving responses to humanitarian crises; and third, they are not represented in the knowledge being produced, which further reinforces their marginalized status. 7 The exclusion of information also marginalizes the local or indigenous networks and systems that people often rely on. Marginalizing these systems can lead to duplication of efforts or, worse, to the inadvertent undermining of effective solutions. Thus, by ignoring structural missingness, we render certain populations and systems invisible and voiceless.
So, how can we better capture and address structural missingness? Greater transparency around data, and greater sharing of data, are critical. We need to continue to invest in data dashboards that meet a set of well-grounded principles and metrics. 8 Current examples of data dashboards in the humanitarian field include the World Food Programme’s HungerMap LIVE dashboard, UNICEF and the World Bank’s Joint Child Malnutrition Estimates interactive dashboard, and the Global Nutrition and Health Atlas dashboard. However, the data currently collected in humanitarian contexts are often not shared or, when they are, are aggregated at a level that makes it impossible to identify what and who is missing. When done well, dashboards allow us to assess and visualize that missingness. A recent review of FluNet’s influenza surveillance data, for example, developed completeness metrics, compiled metadata (information about the data), and created heat maps that let users examine data completeness over time and space. 9 Our analysis of the completeness of a foodborne outbreak surveillance system revealed that data quality could be improved with targeted efforts, well-defined criteria, and continuous assessment. 10
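As one way to picture such a completeness metric, a small Python sketch can compute the share of expected reports actually received for each region and week and print a crude text heat map. It is a hypothetical illustration only: the function names, the assumption that each region has a fixed number of reporting sites each submitting one weekly report, and the example data are invented here and do not reproduce the method of the FluNet review.

```python
from collections import Counter


def completeness_grid(reports, regions, weeks, expected_by_region):
    """Return the share of expected reports received per (region, week) cell.

    Hypothetical metric: `reports` holds one (region, week) tuple per report
    received, and `expected_by_region` gives how many reports each region
    should submit every week (e.g., its number of reporting sites).
    Values are capped at 1.0 so duplicate reports cannot inflate completeness.
    """
    received = Counter(reports)
    return {
        (region, week): min(1.0, received[(region, week)] / expected_by_region[region])
        for region in regions
        for week in weeks
    }


def print_heat_map(grid, regions, weeks):
    """Print a crude text heat map; denser symbols mean more complete data."""
    shades = " .:#"  # <25%, 25-50%, 50-75%, >=75% complete
    print("region  " + "".join(f"w{week:<3}" for week in weeks))
    for region in regions:
        cells = ""
        for week in weeks:
            level = min(3, int(grid[(region, week)] * 4))
            cells += shades[level] * 3 + " "
        print(f"{region:<8}{cells}")


if __name__ == "__main__":
    regions = ["North", "South"]
    weeks = [1, 2, 3]
    expected = {"North": 4, "South": 4}
    # Invented example: South reports poorly in weeks 1-2, North goes silent in week 3.
    reports = [("North", 1)] * 4 + [("North", 2)] * 3 + [("South", 1)] + [("South", 3)] * 4
    grid = completeness_grid(reports, regions, weeks, expected)
    print_heat_map(grid, regions, weeks)
```

Even a metric this simple makes gaps visible by place and time, which is the starting point for asking why a given region or week is missing.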
Yet dashboards alone are not sufficient. There needs to be greater collaboration with local experts and multidisciplinary teams to identify what might be missing from the data and why, how the missingness biases the results, how it can be avoided, and how its consequences can be communicated and addressed. The availability of and transparency around data through dashboards, data quality metrics, and metadata compilation can go a long way toward facilitating this collaboration.
We recognize that not every needed piece of information can be known or made available. We should therefore strive to disclose the extent of and reasons for missingness, as well as the obstacles to collecting and examining data. By applying the concept of intellectual humility, an attentiveness to and ownership of the limits of one’s knowledge, we can foster the effective and ethical use of data-powered tools, minimize the risk of distortion and misperception, and create better standards for responsible science and science communication. 11
Recent marked advances in theory, modeling capacity, and real-time data collection, as well as novel approaches to digitization and data analytics, have the potential to make real progress in addressing humanitarian need and in predicting and responding to crises in a timely manner. However, if we do not address structural missingness in the data itself, then, no matter how sophisticated our analytical tools, we may reinforce the vulnerabilities that contributed to the crisis in the first place and discount or undermine systems that support resilience yet remain invisible to the international community. To have a transformational impact, we need to bridge the divides among data scientists and modelers, humanitarian practitioners, governments, theorists, and affected populations, and commit to collaboration, data sharing, and transparency. We have to prioritize data completeness over data convenience and explore novel approaches for collecting data from hard-to-reach populations. At a minimum, we can improve standards for data reporting, ensure transparency in handling data, and start a critical discourse on missingness. In the end, we have a moral imperative to gather and process data in ways that consider and address who and what has been left out.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
