Abstract

Academics have played an important role in building our understanding of what is happening online. But increasingly, social media companies are denying researchers access to the data they need. Kate Dommett assesses the prospects for change.
What impact does misinformation have on people who consume it? How is data used and sold in the online world? Do political adverts change our minds? Are foreign actors deliberately spreading disinformation to undermine democracy? These questions are critical to our understanding of the impact of the internet on our society. And yet, whilst these questions are seen to be important, academics’ ability to answer them is fundamentally hindered by the issue of data access.
As researchers, it can be easy to take our ability to study our interests for granted. Whatever method we use, be it interviews, surveys, ethnography, time diaries or experimental research, it is often relatively easy to generate data on our interests. When it comes to the internet, however, it is often far from simple for researchers to gain access, gather data or conduct analyses. Whilst this is by no means a unique challenge, data availability has been an ongoing difficulty for those who research the internet.
With large, corporate platforms – such as Facebook, Instagram, YouTube, Google, Snapchat, TikTok and Reddit – dominating the online world, access to information is often highly curtailed. Far from being open and transparent, information is often given at the grace of private corporations, with access defined on terms of their choosing. As a result, it is exceedingly difficult to determine what is happening online, meaning scholars can often provide only partial or limited answers to the questions above. As pressure grows around the need to protect democracy from problematic trends on the internet, there is an urgent need for further research about the online world. In this context, it is interesting to note growing interest from policymakers in the issue of researcher access, with specific moves from within the UK Parliament and European Commission to improve data access and online transparency.
Engaging with these debates, I draw on my recent experience of working as a Special Advisor for the House of Lords Committee on Democracy and Digital Technology to reflect on current debates around data access for researchers. Published in June 2020, the report Digital Technology and the Resurrection of Trust argues that ‘transparency of online platforms is essential if democracy is to flourish’ and makes explicit recommendations for greater researcher access. Reflecting on the experience of producing this report, I consider what has been recommended, and whether any change is likely to come about soon.
Studying the online world
Since the early days of the internet, researchers in academia and beyond have played an important role in building our understanding of what is happening online. Originating from different disciplines, an array of methods and approaches have been used, many of which have been affected by issues of data access.
Perhaps the biggest challenges have been faced by those using computational methods. Scholars in this tradition often gather large data sets (often of social media data) to analyse practices online. Identifying patterns in online interactions, profiling types of user and producing ‘sentiment’ analyses that look at the expression of emotion online, computational approaches have provided a range of insights into the online world. However, in recent years access to computational data – secured largely through Application Programming Interfaces (APIs) – has been heavily limited or withdrawn. Sparked by the Cambridge Analytica scandal, these changes were justified in relation to concerns over user privacy, consent and inappropriate data use, leading many platforms to curtail the information available to scholars (or other outside actors).
Notably, Facebook dramatically restricted its data access, rolling back API access and substantially limiting scholars’ ability to gather data about activity on its platform. Lynge Asbjørn Møller and Anja Bechmann (2019) have also traced trends across Twitter, YouTube and Instagram, to show that ‘the methods for data exchange provided by the social media platforms are subject to increasingly strict restrictions of data access, making it difficult – if not impossible – to extract substantial social media data for thorough investigations’. For computational analysts, these changes mean that systematic research is now often untenable.
In spotlighting issues of data access, it is, however, not only computational analysts that have been affected. With limited transparency around how platforms’ internal processes work, a culture of non-disclosure agreements and a tendency to refuse interviews and access for ethnographic studies, it is also challenging for researchers using interviews, observation, textual analysis or surveys to generate different kinds of insight. Whilst a few exceptions exist, many researchers have therefore been forced to adapt their methods, or to tailor their research questions to available data, limiting the insights on offer. Whilst these developments are of course frustrating for academics, they also raise more substantial questions about our understanding of what is happening online. With researchers no longer able to provide computational or qualitative scrutiny of the online world, it becomes harder to identify problematic practices and to hold those responsible to account.
This is not to say that no methods for analysis exist. In terms of computational analysis in particular, there have been some attempts to broker access. In addition to small research grants offered by WhatsApp and Instagram, Facebook has been working to facilitate academic research. Under the auspices of the Social Science One initiative launched by Gary King and Nate Persily, Facebook has agreed to provide access for some scholars to some social media data. This programme has, however, been defined by a series of delays and legal complexities. As a result, it took nearly two years for the first URL dataset to be made available, and whilst additional datasets are due to be added (for example, on political advertising) they have a limited scope, curtailing the extent of academic scrutiny. Other avenues for inquiry have similar limitations.
Some companies have chosen to make data about certain parts of their activity available. For example, in 2018 Facebook and Google launched online advertising archives to facilitate public scrutiny – an initiative that Snapchat has also adopted. These schemes provide some insight, but they have been widely criticised as lacking detail and functionality. Moreover, they place power in the hands of companies to decide what information they disclose and how, allowing them to direct the kind of research that can be done. Some scholars have taken a different approach to trying to generate insights. Research has been done by creating mock environments to simulate engagement online, and surveys have been used to gather data on practices online. Whilst research is not impossible, and many scholars have taken inventive approaches to overcome data limitations, it is clear that the breadth and detail of available information is far from ideal.
The politics of data access
Against this backdrop, it is notable that policymakers and civil society organisations have been voicing growing support for the idea of compelling online platforms to provide data access for researchers. Organisations such as AlgorithmWatch, for example, have argued that ‘ensuring adequate research access should…be a paramount priority’. The EU Commission, meanwhile, is currently consulting on ‘non-personal data access obligations’ as part of its Digital Services Act. Elsewhere, the Stigler Centre in the US has called on the Federal Trade Commission to ‘moderate independent researchers’ access to these databases’. These calls have drawn attention to the need for independent research in order to provide oversight and enable accountability. Yet they have also stressed the role policymakers can play in brokering access for researchers to online data. This step is seen to be particularly important because, as the Kofi Annan Foundation has argued, ‘even when the platforms have promised to make available data for independent academic research, those promises have often gone unfulfilled’. As such, there is seen to be a need to compel companies to comply with data access requests.
The latest call for greater access to data was made in the newly published report from the House of Lords Democracy and Digital Technology Committee. Set up in September 2019, and chaired by Lord David Puttnam, the Committee conducted an expansive inquiry looking to restore trust in digital technology. A key part of its argument contended that ‘For the public to trust individuals with power there must be transparency’, leading the report to consider ‘the information which platforms should share with researchers, what platforms should share about the algorithms that govern them, and how open companies should be about the decision-making process regarding what can stay on their platforms’. With regard to data sharing in particular, the importance of action was stressed, with oral evidence from Ben Scott from Luminate, Paddy McGuinness, Alex Krasodomski-Jones, Professors Helen Margetts and Cristian Vaccari, and many others emphasising the need for further insight into the online world.
Within the final report, the Committee argued for ‘independent researchers to have greater access to data from technology platforms’ in order to allow them to ‘verify their activities and effects’ and promote public trust (House of Lords, 2020, pp. 60-1). Reflecting on previous calls from Bruns (2018) and Asbjørn Møller and Bechmann (2019), the Committee recommends that:
[The broadcast regulator] Ofcom should be given the power to compel companies to facilitate research on topics that are in the public interest. The ICO [Information Commissioner’s Office] should, in consultation with Ofcom, prepare statutory guidance under Section 128 of the Data Protection Act 2018 on data sharing between researchers and the technology platforms. Once this guidance is completed, Ofcom should require platforms to:
(a) provide at least equivalent access for researchers to APIs as that provided to commercial partners;
(b) establish direct partnerships with researchers to undertake user surveys and experiments with user informed consent on matters of substantial public interest;
(c) develop, for sensitive personal information, physical or virtual ‘clean rooms’ where researchers can analyse data.

Outline of Industry–Academic Partnership Model
This recommendation seeks to secure greater access for researchers to platform data and to expand industry partnerships between academics and companies, yet it also aims to develop recent proposals for data analysis infrastructure in a safe space across and outside platforms (Asbjørn Møller and Bechmann, 2019). The idea of ‘clean rooms’ seeks to recognise the privacy and data protection concerns raised by companies by providing infrastructure for researcher access. This is an idea that has garnered significant interest in the EU, and which offers a fruitful path for data access (for more see Lineate, 2019).
Reflecting on my experience as special advisor to the Committee and the likelihood that these recommendations will bring about change, it is clear that there is growing pressure for greater data access. There is now a concerted effort to use policy change to secure academic access, recognising that digital companies are not inclined to share insights that are democratically vital.
And yet, whilst the pressure is growing, it became clear through the course of the inquiry that it will not be easy to secure change. When speaking to social media companies there was little appetite to open up, and especially to offer more expansive data access. Whilst parliamentary committees and governmental bodies are voicing increased support for these policies, it is likely that any attempt to legislate for change will face considerable opposition from online companies. It may be some time before academics secure greater data access – but it is reassuring to know that policymakers are committed to fighting for this idea.
Footnotes
Kate Dommett is a Senior Lecturer in the public understanding of politics at the University of Sheffield.
