Datafication of Public Policy

Abstract

V. Eubanks, Automating Inequality: How high-tech tools profile, police, and punish the poor, St. Martin’s Press, 2018.

C. C. Perez, Invisible women: Exposing data bias in a world designed for men, Abrams Press, 2019.

D. Dery, Data and policy change: The fragility of data in the policy context, Springer Science & Business Media, 1990.

Data and the Policy Environment

Policies, both good and bad, are the direct products of the data that are taken into consideration when making them. Hence, data quality is of critical importance to bureaucrats and policymakers. However, while this criticality cannot be overemphasized, in actuality few policymakers are even aware of the data-related problems that plague our administrative structures. In this essay, I look at this issue from various angles using three books, by Dery (1990), Eubanks (2018) and Perez (2019), respectively.

What Underlies Data Quality

Sampling

Perez (2019) speaks to the fundamental problems embedded in the process of data collection when she says that in most socio-economic and scientific endeavours, data are often disproportionately more abundant in respect of men. She rues that ‘men’ are taken to be a default proxy for ‘human beings’ in sampling frames and this results in grossly inaccurate estimates that are generalized to the whole of the population. These estimates are then used by policymakers to craft policies that address the requirements of only half the population. Interestingly, there are two demographic constituents that regularly get excluded from sampling frames and who probably need good policies more than others: ‘poor people’ and ‘women’.
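The arithmetic behind this distortion can be sketched in a few lines of Python. This is only a minimal illustration and not a method drawn from any of the three books: the group sizes, the group averages and the function names below are all hypothetical, chosen to show how an estimate drifts towards the over-sampled group when men dominate the sampling frame.

```python
def pooled_mean(groups):
    """Average over sampled individuals, ignoring who is over-represented."""
    total = sum(g["sample_size"] * g["sample_mean"] for g in groups)
    return total / sum(g["sample_size"] for g in groups)


def population_weighted_mean(groups):
    """Average that weights each group by its share of the real population."""
    return sum(g["population_share"] * g["sample_mean"] for g in groups)


# Hypothetical figures, invented purely for illustration: some measured
# need averages 10 units among men and 20 units among women, but men
# make up 90% of the sample while being only half of the population.
groups = [
    {"name": "men", "sample_size": 900, "sample_mean": 10.0, "population_share": 0.5},
    {"name": "women", "sample_size": 100, "sample_mean": 20.0, "population_share": 0.5},
]

naive = pooled_mean(groups)                 # dominated by the male sample
weighted = population_weighted_mean(groups)  # reflects the actual population
print(naive, weighted)
```

Under these assumed numbers, the pooled figure sits close to the male average, while the population-weighted figure lies midway between the two groups. A policy sized to the pooled figure would therefore understate the need of the under-sampled half.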

Invisible Variables

Reserved slots at car parking lots in offices are often assigned to employees based on rank in the organization; perhaps this even makes sense from the perspective of hierarchical discipline. Sometimes, there are also slots for differently abled persons. However, is there a need to provide a few slots especially for pregnant employees as well? After all, how many pregnant women are typically in the workforce at a given time to command such attention? Sheryl Sandberg, later the COO of Facebook, faced these questions for the first time in her life when she herself became pregnant while working at Google. Sandberg’s pregnancy filled a critical data gap for the company, since neither she nor Google’s male founders had been pregnant before. Perez (2019) notes that in other instances, the availability and accessibility of government childcare or the location of government housing may be a severe hindrance to working-class women when it comes to joining the workforce and making a living. Going beyond the gender pay gap debate, something as basic as the organization of the public services that enable one to do one’s work at all also seems to be quite sexist. Women seem to be irrelevant or invisible as data points and are largely treated as outliers.

Obfuscating Externalities

Dery (1990) rues that project evaluators have oftentimes already made up their minds about what they want to do, based on criteria that are unrelated to project success. After that, they seek data only to reinforce those opinions. An example given in the book is that of a building project whose manager was very savvy with data presentation, visual charts and financial projections. The evaluators were so favourably impressed by his personability that they entirely overlooked the fact that the project was located at a site facing an imminent exodus of residents owing to the closure of a large local employer. This is a classic data-based fallacy, in which crucial gaps in the data are masked by irrelevant externalities. In the end, the project sanctioned through such practices overshot its budget and underperformed on key performance indicators such as resident satisfaction, public service accessibility and property prices.

Self-selection

This is probably the gravest of all data-related sins that a policymaker can commit. Dery (1990) rues that administrative agents may, at times, reject data that they do not feel comfortable with. This kind of self-selection bias can make way for self-aggrandizing and self-serving projects that do not add value to the public. Predictably enough, this bias, too, often manifests during project budget allocation stages.

When Data Contracts Bias and What to Do About It

The cost of hindsight is higher for high-precision and high-performance systems such as NASA’s space station. The question of absoluteness versus relativity in knowing the ‘truth’ is all the trickier for such systems. It is generally accepted that knowing all the variables, that is, ‘the complete truth’, is impossible for such complex systems. Even if they were known, it would be impossible to act on all of them. The choice then comes down to whether to give greater weight to ‘false positives’ or to ‘false negatives’. A false positive raises an alarm about a problem that is not actually there, while a false negative suggests that there is no cause for alarm when in fact there is one. In the aftermath of the space station tragedy, it has often been speculated that the so-called false alarms (true positives, as we now know) were always there and were disregarded anyway. It bears consideration that data bias such as this is many times harder for humans to process than their own mistaken convictions are to evaluate.
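Because these labels are easy to confuse, a small sketch may help to fix the vocabulary. It assumes that ‘positive’ means an alarm was raised and ‘negative’ means no alarm was raised; the records are invented for illustration, and the function name is hypothetical rather than taken from any of the books.

```python
from collections import Counter


def confusion_counts(actual, alarmed):
    """Count the four outcomes of a yes/no alarm against what really happened."""
    labels = {
        (True, True): "true_positive",    # alarm raised, problem was real
        (False, True): "false_positive",  # alarm raised, no problem (false alarm)
        (True, False): "false_negative",  # no alarm, problem was real (missed)
        (False, False): "true_negative",  # no alarm, no problem
    }
    counts = Counter(labels[(bool(a), bool(p))] for a, p in zip(actual, alarmed))
    return {name: counts.get(name, 0) for name in labels.values()}


# Hypothetical inspection records, invented purely for illustration.
problem_present = [True, False, False, True, False, True]
alarm_raised = [True, True, False, False, False, True]

result = confusion_counts(problem_present, alarm_raised)
print(result)
```

On this reading, a warning that is dismissed as a false alarm but later proves correct was a true positive all along, and the costly silent case is the false negative, where no alarm is raised although the problem is real.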

The (Broken) Link Between Real-world Processes and Cyber-world Anomalies

Algorithms are not infallible. Eubanks (2018) laboriously documents the infamous welfare automation experiment in Indiana between 2006 and 2010. The experiment showed that replacing the human discretion of social workers with that of engineers and contractors in determining who qualified for benefits supercharged the politics of discrimination. Thousands of poor, black women and sick children who did not fit the algorithm fell through the social safety net during this time. The errors of beneficiary identification and their cost in terms of human suffering were so great that the government sued the contractor, IBM, for material breach of contract. Going by the bare terms of the contract as laid down on paper, the justice system could not penalize IBM; the system’s actual human cost, on the other hand, remains uncalculated to date. The case also touches on the question of where precisely the ‘burden of proof’ originates (the contractor, the policymaker or the justice system) and where it ends up by the time everybody is finished (Dery, 1990).

How to Conscientiously Collect, Understand and Protect Data

The protocols of precisely how much or how little data are to be collected, from whom and to what end, must be religiously spelled out and adhered to. What emerges from the case studies in Eubanks’ (2018) book is that data collection done without a clearly predefined agenda may eventually lead to unforeseen distortions in usage and policies. Lack of a conscientious position may lead to data becoming hostile and detrimental to the very people they were supposed to serve in the beginning (Dery, 1990).

Combating (or Abetting) Poverty and Deprivation with Data-based Tools

It is important to recognize that data needs to be used to decrease inequities in public service delivery. For now, the automated decision-making tools and surveillance databases of the new age, or the ‘digital poorhouse’ as Eubanks (2018) calls it, serve to classify and criminalize the poor. In Los Angeles, homeless people’s answers to the VI-SPDAT survey are stored in an information database, while in Allegheny County a predictive algorithm speculates on which families are ‘likely’ to mistreat their children in the future. The usage of public resources for discrimination-inducing policies such as these needs to stop. This is where public philosophies such as the ‘right to be forgotten’ (Eubanks, 2018) need to be enforced, such that past data that is destructive to human life may be erased after a set period of time. We need to fall back on the fundamental principles of political justice, equality, liberty and national values if we are to dismantle the perversions of automation algorithms.

Real World Policies That Worked After Correcting for Data Distortions

Not all deprivation-related data has been used for unproductive purposes, though. For example, from 2006 onwards for about a decade, myriad international agencies tried unsuccessfully to introduce clean, high-efficiency cookstoves (HECs) in underdeveloped countries and to convince rural women to give up traditional, polluting stoves. This remained the case until 2015, when researchers in India decided to actually go and talk to the women about what the real problem with adoption was. It was found that cooking, which was strictly a female-domain activity, required fuelwood that need not be ‘split’, as splitting wood was physically cumbersome for women. The HEC, in contrast to the traditional stove, made use of split wood, and this turned out to be very discouraging for daily use. Eventually, the researchers came up with a metal device called the ‘Mewar Angithi’ (MA), which could be placed directly inside the traditional stove, could accommodate large, unsplit fuelwood pieces and provided the same airflow mechanism that an HEC does.

Concluding Remarks

In talking about policy-making, all three books address the fundamental issue of data quality at some level. Their common strength resides in the sociological learnings that emerge from them. For instance, Perez (2019) writes her book with as little feminist bias as is possible in writing a book on gender bias. The fact that it is well researched and profusely illustrated with examples and ‘data’ certainly makes her thesis extremely credible, while the book is also an enjoyable read. On the other hand, Eubanks (2018) delves deep into the various forms of ‘digital divides’ that data and automation are creating for citizens. We certainly need more robustly designed empirical research on the issues that these authors have highlighted by way of anecdotes and vignettes. In time, this research would help to inform the process of policy-making and make it more data-intensive.

A couple of limitations were noticeable in synthesizing the three books. Big data was not a topic that was handled to any extent. I would have liked a rudimentary understanding of how big data corresponds with policy-making in the real world. With big data harvesting becoming increasingly possible by virtue of increasing computing power, policy analysts do not have the luxury of remaining data illiterate. To be literate, they must necessarily think about data along the lines we discuss here and incorporate a protocol of data discipline in their processes. A second gap was that none of the authors spoke enough about how politically feasible it was to communicate a data-based proposal to policymakers or, for that matter, what strategies one may use to communicate such findings effectively. Dery (1990) mentions that data and statistics almost always end up being the last consideration, if at all, in a policy deliberation. This is an uncomfortable state of affairs and needs to be addressed through a more nuanced managerial approach.

ORCID iD

Roshni Das