Abstract
The changing environment of a datafied society pushes the statistical world into a long-distance race in which the finish line is never reached because the course keeps moving along the way. Many countries all over the world are searching for new approaches, new tools, new skills and possible new roles for the National Statistical Institutes (NSIs). Two horizontal issues are essential to address these changes properly. The first is access to Big Data (including the Internet of Things, IoT) and the legislative framework and ethical principles related to such access. The second is to communicate these principles to citizens and to inform them about the statistical treatment of the data from these new sources. The pandemic has further increased the use of these data and raised questions about their uses.
Some members of the European Statistical System (ESS) have already elaborated different ethical codes and even ethical assessment tools, which could overlap with and/or complement the European Statistics Code of Practice (ESCP). In parallel, under different projects and groups in the ESS and in the European Commission (related to the EU Data Strategy), some principles have been proposed along with recommendations, some of which fall under ethical behaviour. The European Statistics Code of Practice was last updated in 2017; however, this changing environment demands new rules and principles that could be incorporated in the Code or even in an amended Regulation 223/2009 on European Statistics. In this paper we analyse the existing ethical principles for Big Data uses (in a broad sense) under different scenarios and compare them with the current statistical principles. We also try to go further and consider what kind of principles we would need in the future if new roles are added to the NSIs.
Principles and ethics: Two sides of the same coin
Following the definition of the Oxford dictionary a principle is, as a first meaning, “a moral rule or a strong belief that influences your actions”, while ethics are “moral principles that control or influence a person’s behaviour”. Therefore, principles and ethics talk about how to behave in relation to a concrete subject or area.
When we move to the specific area of official statistics, we more often find the word “principle” or “value”. In particular, in the European Union, the statistical principles are “constitutionalised” in the Treaty (article 338 TFEU: impartiality, reliability, objectivity, scientific independence, cost-effectiveness, statistical confidentiality; not excessive burdens on economic operators) and defined in secondary law (article 2 of Regulation 223/2009 of the European Parliament and of the Council of 11 March 2009, on European statistics). These statistical principles set up by law are further elaborated in the European Statistics Code of Practice (ESCP), which aims “at ensuring public trust in European statistics by establishing how European statistics are to be developed, produced and disseminated in conformity with the statistical principles as set out in article 2(1) and best international statistical practice.”
In particular, the ESCP is based on the different ethics built upon a series of international, European and national standards and principles related to ethical behaviour, quality and good practices of official statistics. Here, we agree with O’Leary’s view that “codes of ethics provide a signal to those who interact with the relevant group as to what to expect of the group members” [1, p. 81].
Therefore, EU official statistics have their fundamental principles in the EU Treaty and their ethical code in a living document. Thus, from the legal point of view, in European statistical law “principles” rank much higher than the “ethics” inside the ESCP that guide our behaviour, which are of course based on those principles.
Data ethics literacy… what’s new?
Digitalization has brought data ethics issues to the forefront. Data Ethics is about the responsible and sustainable use of data. However, as David Hand [2, p. 178] recognises, “professions typically have their own formal code of practices based on ethical principles, but one of the characteristics of modern data science and technology is that there is no unique profession bearing responsibility for it”.
Then, when we talk about Data Ethics we can find many different ethical principles, guidelines or codes that are related to data but according to a specific area: Data Science code, Computer Ethics, Big Data Ethics, Ethics on Artificial Intelligence, Statistical principles, Digital Ethics, Cyber Ethics, Business Ethics, “Macroethics” (ethics of data, algorithm and practices [3]), etc. Some of them are quite new and have been emerging with the development of new technologies, while others have coexisted for many years or are just different ways of explaining similar issues.
Maybe now is the moment to stop, look back and forward, and analyse whether we are really moving to new principles that were left out of the picture when we updated our code, or whether this is just a development of the existing principles.
It is important to keep clearly in mind the starting point and what shall remain in its own regulatory sphere. To this end, we start from the following basic premises:
Fundamental Principles of Official Statistics have existed since 1994. In the European Union, the minimum and fundamental ethical principles for official statistics are laid down in the Treaty and in secondary law (Regulation 223/2009). Since 2005, these EU principles for official statistics have been further elaborated in the ESCP. In 2017, the ESS updated its Code of Practice to take into account the latest changes and innovations, such as emerging new data sources and the use of new technologies. Many of the codes and ethical principles adopted in parallel or tangential areas are addressed to the private sector or to the public sector without a specific collection mandate.
So, what is new, then? New is everything that pushes us to move, reproduce or adapt the principles. New is the EU Data Strategy and its related legal acts. New are all the different “principles” we are adopting in sectoral statistical areas when using new data sources or techniques.
Different approaches to big data principles
In 2016, Eurostat outsourced a study about Big Data, and one of the deliverables was a report on ethical guidelines on Big Data. The report recognised that “the European Statistics Code of Practice and the UN Fundamental Principles of official statistics (…) constitute the ethical framework of official statistics” [4, p. 2] and made some ethical recommendations based on an analysis of use cases, all embedded in the ESCP structure and mostly related to quality issues. This was the approach followed by the amended ESCP in 2017.
In 2018, the European Commission published the “Guidance on sharing private sector data in the European data economy”, defining principles “in order to ensure fair markets for IoT objects and for products and services relying on data created by such objects” (proportionality in the use of private sector data, purpose limitation, do-no-harm, conditions for data re-use, mitigate limitations of private sector data and transparency and social participation). And in 2020, the final report prepared by the High-Level Expert Group on Business-to-Government Data Sharing “Towards a European Strategy on Business-to-Government Data sharing for the Public Interest” adapted those principles (proportionality in the use of private sector data, data-use limitation, risk mitigation and safeguards, compensation, non-discrimination, mitigate limitations of private sector data and transparency and social participation, accountability, fair and ethical data use).
The European Data Strategy1 just mentions the FAIR principles (Findability, Accessibility, Interoperability and Reusability) to “strengthen the governance mechanisms at EU level”, the principle ‘as open as possible, as closed as necessary’ and some general EU principles. The Data Act2 will not change any such existing legislation, but future legislation in these areas should in principle be aligned with the horizontal principles of the Data Act. In particular, it refers to the principles of data minimisation and data protection by design and by default, the principle that the data holder may request reasonable compensation when legally obliged to make data available to the data recipient, the principle of transparency and the once-only principle.
In statistics, the European Statistical System Committee (ESSC) supported the outcome of the work done by the Group on Privately Held Data for Official Statistics as a good basis to move forward and obtain the support of the stakeholders. This Group proposed a set of principles concerning the access to and use of privately held data for official statistics to be included in the EU legislative framework3 (confidentiality, professional standards, data business interest, minimal data, proportionality, level playing field, equal access, transparency, proper access modality, free data access). In addition, one of the actions also proposed by the Group was to finalise the principles for accessing data using web scraping techniques.4
In parallel, the Commission (Eurostat) signed the Agreement on the exchange of data from online platforms regarding short-stay accommodation stressing, as the main principle, the protection of the privacy of citizens, including guests and hosts, in line with applicable EU legislation. “Data will not allow individual citizens or property owners to be identified”5 (this is our statistical confidentiality principle). The agreement contained some contractual terms (statistical confidentiality, preservation of the commercial interests of the platforms, security measures, etc.).
Summarising, we have our binding principles for official statistics. We also recognise and follow the UN Fundamental Principles of Official Statistics and have our own ethic code of practices and our quality framework. Non-statistical studies recommend some principles for accessing Smart Data for public uses (even mentioning statistical principles as a good example of how to do it well). This is also the aim of the Data Act. However, in our own statistical areas we are proposing new principles (not adopted formally) just for Big/Smart Data and, even more, we are proposing more principles for using specific innovative techniques (see Table 1: Comparison table on B2G principles). Many questions arise: Are those principles really new? Are some already subsumed in the existing statistical principles? Are they really principles? Are we creating too much confusion for the citizens and stakeholders?
Comparison table on B2G principles
Official statistics is a public good but is not a public policy. It is a tool to make proper decisions. Statistical authorities have always worked with data applying their ethical codes. Therefore, what we first realise when examining these new Data Ethics is that most of the principles are already included in our codes or are not applicable to us because we have our own specific legislation, for example, the mandate for data collection.
As mentioned at the beginning of this paper, the starting point here shall be that we are a public institution and we should not copy or reproduce principles addressed to private companies. We have our own legal framework and principles and also our ethical code. With this in mind, we analyse the proposals coming from the ESS WG Privately Held Data on possible new principles for Trusted Smart Statistics.
1. “Principle of confidentiality: Accessing and using privately held data for official statistics by NSIs should not compromise the privacy of individuals.”
We already have a statistical confidentiality principle recognised by law; therefore, using a similar name for a different issue is not advisable from the legal point of view. Moreover, for privacy issues we are subject to the General Data Protection Regulation (GDPR) as regards personal data. In addition, principle 6 of the UN Fundamental Principles states that “Individual data collected by statistical agencies for statistical compilation, whether they refer to natural or legal persons, are to be strictly confidential and used exclusively for statistical purposes”. This is also consistent with principle 5 of the ESCP on Statistical Confidentiality and Data Protection: The privacy of data providers, the confidentiality of the information they provide, its use only for statistical purposes and the security of the data are absolutely guaranteed. Therefore, this is not a new principle.
2. “Principle of professional standards: The NSI should act in full accordance with professional standards.”
The explanatory notes of this principle clarify that “the professional standards of the ESS have been codified in the European Statistics Code of Practice (…). The production of official statistics is based on scientific methods and on ethical principles”. This is already covered by the “Objectivity principle”, according to Regulation 223/2009, “meaning that statistics must be developed, produced and disseminated in a systematic, reliable and unbiased manner (…)”.
3. “Principle of business interest: Accessing and using privately held data for official statistics by NSIs should not compromise the reputation and business of the private data holder.”
This is the “Do-not-harm” principle, transformed by the High Level Working Group B2G into “Risk Mitigation and Safeguards”, which consists of keeping the possible harm to the business as low as possible. The question that should be asked is: Is this just in regard to Big Data? This principle emerged in the B2G context for any public purposes and without a specific legal framework. We already have access to “privately held data” for official statistics (not Big Data), so the message should not be “only if the data are huge is your business interest preserved”. We are talking about ethical principles and, in fact, the Treaty binds us to the “impartiality principle”, meaning that statistics must be developed, produced and disseminated in a neutral manner, and that all users must be given equal treatment. We understand the need to send a message to data holders as regards their business, but it should be addressed to all data holders. This is part of the “protection” or safeguards we offer, so we think that, rather than a new principle, this is a development of Principle 5 of the ESCP. We therefore propose to introduce a new indicator 5.7: “Accessing and using privately held data for official statistics by NSIs should not compromise the reputation and business of the private data holder”.
Trusted Smart Statistics “principles” proposal.
4. “Principle of minimal data: No more data should be requested from the private data holder than the minimum necessary for the production of the official statistics targeted by the request.”
This is the “proportionality principle” that most national statistical laws recognise for collecting data for statistical purposes, and it draws inspiration from the GDPR principle of data minimisation (it has also been included in the draft Data Act). However, this is already recognised under Principle 9 “Non-excessive Burden on Respondents” (The response burden is proportionate to the needs of the users and is not excessive for respondents. The statistical authorities monitor the response burden and set targets for its reduction over time), in particular in indicator 9.1: “The range and detail of European Statistics demands is limited to what is absolutely necessary”. Therefore, this is not a new principle.
5. “Principle of proportionality: Costs and efforts from the private data holder as well as the NSI should not only be kept to a minimum, but should also be reasonable compared to the envisaged public benefit of the related official statistics.”
This is clearly the ‘cost effectiveness’ principle, which, according to the law, means that “the costs of producing statistics must be in proportion to the importance of the results and the benefits sought (…)”. Again, this is not a new principle.
6. “Principle of level playing field: In order to guarantee a level playing field, the distribution across private data holders of the burden of giving access and providing data to NSIs should be fair.”
This is related to two principles, “impartiality” and “non-excessive burden on respondents”, in particular, indicator 9.2 “The response burden is spread as widely as possible over survey populations and monitored by the statistical authority”. Notwithstanding, this indicator could be slightly amended to cover not only surveys. Thus, we propose to amend indicator 9.2 as follows “The response burden is spread as widely as possible over survey populations and monitored by the statistical authority.
7. “Principle of equal access: If multiple NSIs need to get access to and use privately held data from the same data holder, they should treat such data holders in an equal way. In turn, these data holders should treat the NSIs concerned in an equal way as well”
This principle has two parts, one addressed to the statistical authorities and the other to the data providers. However, when we formulate a code of ethics it shall only refer to the actions and behaviours of its members. Thus, focusing on the NSIs, no one will be surprised at this stage if we say that this principle is part of the “impartiality principle” (neutral manner) and the “objectivity” principle (systematic, reliable and unbiased manner) and is related to the said indicator 9.2. Nevertheless, even if equal treatment of respondents is implicit in neutral and unbiased treatment, we consider that it should be made explicit, not as an independent principle but as part of principle 2, as it is related to access to the data (collection). As a consequence, we suggest amending indicator 2.4: “Access for statistical purposes to other data, such as privately held data, is facilitated, while ensuring
8. “Principle of transparency: NSIs as well as private data holders should practice full transparency towards the general public as well as those to whom the data pertain.”
Here we again find the two sides of the principle, as in a synallagmatic contract. What we have to regulate is what concerns the statistical authorities, and here the “Quality Declaration of the European Statistical System” included in the Code’s preamble recognises that “The development, production and dissemination of our statistics are based on sound methodologies, the best international standards and appropriate procedures that are well documented in a transparent manner.” Moreover, the principle of transparency is already subsumed in the “objectivity principle” (the policies and practices followed are transparent to users and survey respondents). Thus, this is not a new principle.
9. “Principle of proper access modality: For implementing access to and use of privately held data, several modalities may be considered. The modality is to be chosen by the NSI in consultation with the business, and by properly taking into account the business interest in accordance with the other principles.”
We have to be careful not to contradict the impartiality and objectivity principles, and to apply these criteria to all companies. In any case, this is related to the do-not-harm principle (“taking into account the business interest”) and is not a principle but a good practice. We propose just to mention it in a recital of the law or in a declaration, but not as an independent principle.
10. “Principle of free data access: Access and use of privately held data for official statistics should be free of charge for the NSI.”
This is the “compensation principle”, which does not follow the general recommendations from the B2G groups. From our point of view, this is neither a principle nor part of one; it falls under the conditions under which data are made available, which might include, for example, compensation for the special treatment of the data when requested by the NSIs. Therefore, we recommend considering this as part of the conditions to be added in the law when regulating access to new data sources or privately held data (PHD), similar to the Data Act but with our own rules. This could be, where applicable, one of the rules of our “PHD access policy”.
Therefore, our proposal is to update the European Statistics Code of Practice by including a new indicator 5.7 and amending indicators 2.4 and 9.2 (see Fig. 1: Trusted Smart Statistics “principles” proposal). This should be complemented with the inclusion of clear rules for PHD in the statistical legislation.
As for the web scraping principles proposed by the Big Data ESSnet on Web scraping, after this reflection it is easier to see that what the experts are proposing is a selection/extraction of part of the statistical principles/indicators. Thus, burden minimisation considering other data sources is part of principle 9; the protection of all personal data is principle 5; abiding by all applicable legislation is not a principle but an obligation; and using scientific principles is part of the objectivity principle and the reliability principle (scientific criteria are used for the selection of sources, methods and procedures). What is new is to “honour requests made by website owners to refrain from scraping their websites”. This could be an ethical principle, but as it is so specific to these techniques we propose to avoid calling it a “principle”. We could have a Webscraping Policy (and the same for other specialities that might emerge, such as Smart Surveys). In this policy we could declare that we act in accordance with our statistical regulations and principles and follow some specific guidelines/rules/best practices.
Now is the time to discuss what we need for the future in the context of the amendment of Regulation 223/2009 on European Statistics to cover new data sources, techniques and innovation.
We started our reflection saying that principles and ethics are two sides of the same coin. This is correct in the theoretical sphere, but in the legal sphere they may differ. As mentioned, we have 7 main statistical principles recognised in Primary Law, in the Treaty, and defined in secondary law (Regulation 223/2009). These principles are compulsory and immovable,6 but their definition can be updated if needed. The ESCP, however, is rather our ethical code, which should be flexible in order to cover all ethical issues,7 as we did in 2017. If the current ethical principles of the Code do not suit all statistical activities, we should adapt it to include the use of new data sources. It is also important to separate what ethical principles are, and what shall be accomplished and monitored by statistical authorities, from what are operational rules, possible contractual terms, policies or recommendations.
If we want to maintain and gain trust, our principles must indicate that we have been solid and robust in Data Ethics for years. Trying to elaborate new principles (sometimes with the same terminology as the existing statistical principles) for different data access or treatment just generates confusion among users and stakeholders, leading to uncertainty and to the risk of breaking our own principles on neutral and equal treatment. If there are some specificities related to webscraping or Smart Surveys, we could elaborate specific policies, protocols or declarations covering these particular aspects, always attaching our general statistical principles.
For these reasons, we propose to take the real gaps and amend some indicators of the code while adopting a declaration or policy when needed.
Concerning a possible revision of Regulation 223/2009, apart from the enabling clause and conditions for access, some recommendations regarding new sources and methods could be mentioned in the recitals but, from our point of view, no new principles are needed in the statistical law.8
Footnotes
6 Until the Treaty is amended, of course.
7 This is also the spirit of article 11: The Code of Practice shall be reviewed and updated as necessary by the ESSC.
8 And what about new roles for NSIs? Does this imply new data ethics? Over and above the convenience of including this at EU level, the answer will depend on how far we go. In the furthest case, by taking on national data stewardship, a National Statistical Institute will have two hats that should not be worn at the same time. Each role will have its own principles, but regulating data ethics for non-statistical purposes is outside our mandate and business. So, the said NSI will be subject to two different legal frameworks with their different principles, and this should be explained clearly to the public.
