Abstract
The author assesses the extent to which 361 consumer-oriented commercial Web sites post disclosures that describe their information practices and whether these disclosures reflect fair information practices. Although approximately 67% of the sites sampled post a privacy disclosure, only 14% of these disclosures constitute a comprehensive privacy policy. The study was initiated by the private sector as a progress report to the Federal Trade Commission (FTC) and is one in a series of efforts designed to assess whether consumer privacy can be protected through industry self-regulation or whether legislation is required. Although the FTC does not recommend legislation at this time, the study suggests that an effective self-regulatory regime for consumer privacy online has yet to emerge.
Privacy can be defined as people's ability to control the terms under which their personal information is acquired and used (Westin 1967). Personal information is information that can be associated with an identifiable individual. From a business perspective, privacy is really about making consumers comfortable disclosing the personal information needed for relationship marketing. This involves simultaneously communicating to the consumer the benefits of disclosure and providing assurances that disclosure of personal information is a low-risk proposition (Culnan and Milberg 1999).
In the Framework for Global Electronic Commerce (Clinton and Gore 1997), the Clinton administration stated that ensuring consumer privacy was essential if electronic commerce was to realize its full potential. The Internet is causing a fundamental shift to a customer-centered world in which customer relationship management becomes the core activity of an e-business. When consumers are dissatisfied with their interactions with a Web site and go elsewhere, the probability of reengaging that consumer is significantly reduced (PriceWaterhouseCoopers 1999).
Privacy concerns, or an unwillingness to disclose personal information, were seen by the Clinton administration as threatening electronic commerce and the emerging digital economy. For example, a recent public opinion survey finds that privacy concerns are an important reason that people who are not already online do not go online (Green 1998). Other surveys of Internet users find that consumers who are online decline to provide information requested by a Web site, or provide false information, if the site does not provide notice about why personal information is being collected and how it will be used (Georgia Tech Research Corporation 1997; Privacy and American Business 1997).
The section on privacy in the framework concludes with the following statement: “The Administration considers data protection critically important. We believe that private efforts of industry working in cooperation with consumer groups are preferable to government regulation, but if effective privacy protection cannot be provided in this way, we will reevaluate this policy” (Clinton and Gore 1997).
Self-regulation differs from a pure market approach in which consumer preferences drive company behavior. Under a pure market approach, it is assumed that consumers prefer to do business with firms that have implemented strong privacy protections and avoid firms that have breached privacy. In contrast, self-regulation is based on the three traditional components of government (legislation, enforcement, and adjudication), with these functions carried out by the private sector rather than the government (Swire 1997). Legislation refers to defining the appropriate rules, enforcement to initiating an enforcement action when the rules are broken, and adjudication to deciding whether a company has violated the privacy rules (Swire 1997).
Fair information practices define the privacy rules for a self-regulatory regime. They are global principles that balance the privacy interests of individuals with the legitimate need of business to derive value from customer information. At the heart of fair information practices are the following principles:
Notice of the firm's information practices regarding what personal information it collects and how the information is used;
Choice regarding subsequent uses of the information, particularly when the information is used by an organization for purposes other than those for which the information was collected, such as marketing;
Access, or the ability of users to view the data about themselves the organization has collected and to contest the data's accuracy and completeness;
Security/integrity, which requires the organization to take reasonable steps to ensure that personal information is secure during transmission and storage and that it is accurate and timely; and
Enforcement/redress, which means there must be mechanisms to ensure that organizations comply with fair information practices and that meaningful sanctions apply for noncompliance (FTC 1998).
Fair information practices represent good public policy for both consumers and business. Prior research on privacy finds that people are willing to disclose personal information in exchange for some economic or social benefit subject to the “privacy calculus,” an assessment that their personal information will subsequently be used fairly and they will not suffer negative consequences in the future (Laufer and Wolfe 1977; Milne and Gordon 1993). Assuming that the firm's practices are consistent with what it discloses, fair information practices signal to the consumer that the firm will abide by a set of rules that most consumers perceive as fair and will not behave opportunistically (Bradach and Eccles 1989; Shapiro 1987; Spence 1974). Because fair information practices minimize the risk of disclosure, they help build trust and promote the disclosure of the personal information needed for relationship marketing (Culnan and Armstrong 1999; Lewicki, McAllister, and Bies 1998; Milne and Boza 1999). Therefore, observing fair information practices is good for business. The remaining question is whether their implementation online should be governed by self-regulation or required by law.
The FTC's Privacy Initiative
In 1995, the FTC's Bureau of Consumer Protection launched a Consumer Privacy Initiative, an ongoing effort to bring consumers and businesses together to address the consumer privacy issues raised by the emerging electronic marketplace. The hallmark of the initiative was a series of public workshops. The chairs of the House and Senate Commerce Committees requested a summary of the June 1997 Public Workshop on Consumer Privacy. In its response to Representative Thomas Bliley and Senator John McCain, the FTC stated that during the next 12 months, it would monitor the information practices of commercial Web sites and report to Congress on the effectiveness of self-regulation (FTC 1997).
In June 1998, the FTC issued its report, Privacy Online: A Report to Congress (FTC 1998). This report contains the findings from the Web sweep conducted by the FTC in March 1998. This study surveyed the privacy disclosures posted by Web sites from six target populations: a comprehensive sample of .com Web sites; Web sites drawn from the health, retail, and financial sectors; Web sites targeted at children; and the most popular Web sites. The FTC found that though more than 85% of the first four populations collect personal information from consumers, only 14% provide any notice with respect to their information practices, and approximately 2% provide notice by means of a comprehensive privacy policy. The FTC found that 97% of the 111 most popular Web sites collected personal information, and 71% of these sites provided some form of privacy disclosure (FTC 1998).
The FTC concluded that an effective self-regulatory regime had yet to emerge. It recommended legislation to place parents in control of the online collection and use of personal information from children. In fall 1998, Congress enacted the Children's Online Privacy Protection Act (15 U.S.C. 6501 and 16 C.F.R. Part 312, October 21, 1999), which regulates the collection and use of personal information collected by Web sites directed at children under 13 years of age.
The commission further found that the majority of other online businesses had failed to adopt even the most fundamental elements of fair information practices: notice and choice. Furthermore, the majority of existing industry self-regulatory programs failed to provide meaningful enforcement. The commission did not recommend legislation for adults but stated it would recommend an appropriate response to protect consumer privacy at a later date (FTC 1998).
The Georgetown Study
Background
The Georgetown study was commissioned by the private sector in December 1998 as a progress report on self-regulation to the FTC. It was hoped that recent industry efforts to strengthen self-regulatory programs would persuade the FTC not to call for additional legislation. The Georgetown study asked the same three questions as the FTC study:
What types of personal information do Web sites collect from consumers?
How many Web sites have posted privacy disclosures?
To what extent do these privacy disclosures reflect fair information practices?
The study was funded by contributions of $5,000 or less from 17 different private sector companies and associations: America Online, American Express, BBBOnline, Compaq, eBay, eDirect, Ernst & Young, Direct Marketing Association, IBM, Media Metrix, Microsoft, MatchLogic, Online Privacy Alliance, Privaseek, Time Warner, TRUSTe, and Wave Systems.
Methodology
The methodology of the Georgetown study was modeled after that of the 1998 FTC study but was not an exact replication. The Georgetown study differed from the FTC study in four primary ways:
The number of sectors sampled: The Georgetown study was based on a single comprehensive sample of U.S. commercial Web sites likely to be of interest to consumers. Because the FTC did not find significant differences among its results for the general sample and the sectoral samples for retail, health, and financial services sites, no sectoral samples were used in this study. Furthermore, the Children's Online Privacy Protection Act was enacted into law during 1998, eliminating the need to survey Web sites specifically targeted at children. Finally, the most popular Web sites were surveyed in a separate study of the Top 100 funded by the Online Privacy Alliance (see Culnan 1999b).
The sampling frame: The samples for the two studies were drawn from different populations. The comprehensive sample for the FTC study was based on a random sample of the entire .com universe drawn from a Dun & Bradstreet database. The Georgetown study was drawn from a random sample of .com Web sites based on unduplicated traffic.
The sample size: The FTC's comprehensive sample had 674 observations, whereas the Georgetown study had 361 observations.
The survey questions: The Georgetown questionnaire items asked a wider range of questions about the content of privacy disclosures.
During the design phase of the study, the advisory group for the Georgetown study, composed of representatives from the business and privacy/consumer advocacy communities, provided substantive advice on the design of the sample and the survey form. The FTC also provided extensive advice on all aspects of the study. The advisory group discussions and discussions with the FTC about the sampling plan focused on the following criteria:
The study should be based on a random sample.
The sample should reflect Web traffic rather than the entire World Wide Web, which includes many sites that have little or no consumer traffic.
The sample should go deep enough into the Web to represent a significant proportion of consumer Web traffic and a large enough number of Web sites rather than focus only on the largest or the most popular Web sites.
The sample should be large enough to minimize sampling errors.
The final number of Web sites to be surveyed would also be subject to several operational constraints, including the number of workstations in the facility that would be used for data collection and the need to collect the data during a one-week period.
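The criterion that the sample be large enough to minimize sampling errors can be made concrete with a rough calculation. The sketch below assumes simple random sampling, a 95% confidence level, and the usual normal approximation for a proportion; none of these assumptions is stated in the study itself, so the figures are illustrative only.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    p: assumed true proportion, n: sample size, z: critical value (1.96 ~ 95%).
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Worst case (p = 0.5) for a 300-site minimum and for a 361-site sample.
for n in (300, 361):
    print(n, round(100 * margin_of_error(0.5, n), 1), "percentage points")
```

Under these assumptions, a sample of a few hundred sites yields a worst-case margin of error of roughly five to six percentage points.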
In discussions with the study's advisory group, many participants expressed the view that neither of the two general groups used in the 1998 FTC study, the comprehensive random sample and the census of the most popular sites, fully satisfied the sampling criteria. The comprehensive sample based on a random sample of the entire .com universe from the Dun & Bradstreet database did not necessarily reflect the Web sites that most consumers visit. The census of the top sites did not represent a random sample and could not be generalized beyond that list.
Furthermore, operational constraints made it infeasible to survey three samples (a replication of the general group from the 1998 FTC study, the most popular Web sites, and a third group based on Web traffic) while ensuring that the two random samples were large enough to address the concerns about sampling errors. Therefore, the study director decided that the policy process would be best served by a sample of at least 300 Web sites drawn from a sampling frame with reach as close to 100% of Web traffic as possible. The process of developing the actual sample consisted of three steps:
Identifying a listing of sites that could be used to represent the target population. This list constituted the sampling frame.
Generating a random sample from the sampling frame. The sites in the sample constituted the sampling pool.
Identifying qualifying sites from the sampling pool for the sample. Sites in the sampling pool were randomly examined until the number of examined sites qualifying for inclusion in the survey met or exceeded the target sample size. The sample consists of the examined qualifying sites, which were subsequently surveyed for information collection practices and information practice disclosures.
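The three steps above can be sketched in a few lines of code. This is a minimal illustration, not the procedure actually used in the study: the placeholder URLs, the pool size of 1,000, and the screening rule are all hypothetical, because the study does not report them here.

```python
import random

def draw_sample(frame, qualifies, pool_size, target, seed=0):
    """Sketch of steps 2 and 3: draw a pool, then screen it until `target` sites qualify."""
    rng = random.Random(seed)
    # Step 2: a simple random sample of the frame forms the sampling pool.
    # random.sample returns the selected items in random order.
    pool = rng.sample(frame, pool_size)
    # Step 3: examine pool sites in that random order and keep the qualifiers.
    sample = []
    for site in pool:
        if len(sample) >= target:
            break
        if qualifies(site):
            sample.append(site)
    return sample

# Step 1 (hypothetical frame): 7,500 placeholder URLs standing in for the ranked list.
frame = [f"site{i}.example" for i in range(1, 7501)]

# Hypothetical screening rule: pretend every third site fails the consumer-site check.
def is_consumer_site(url):
    return int(url[4:].split(".")[0]) % 3 != 0

sample = draw_sample(frame, is_consumer_site, pool_size=1000, target=300)
print(len(sample))
```

Stopping as soon as the target is reached mirrors the description that sites were examined until the number of qualifying sites met or exceeded the target sample size.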
The Georgetown study surveyed a random sample of 361 commercial U.S. Web sites drawn from a list supplied by Media Metrix of the top 7500 URLs (Uniform Resource Locator, the address of a computer or document on the Internet) based on unduplicated traffic by consumers surfing the Web from home during January 1999. These 7500 URLs constitute a total reach of 98.8% of the World Wide Web, which means that the sampling frame represented 98.8% or nearly all consumers who visited Web sites during this period. The audience of the sites in the sample ranged from more than 4.5 million unduplicated visitors per month to at least 32,000 unduplicated visitors per month. The Web sites not included in the sampling frame include .com Web sites with fewer than 32,000 unique visitors per month or Web sites from other domains (e.g., .edu or .net). Table 1 shows the sample distribution by audience.
Table 1. Sample Distribution by Web Site Audience
Notes: Data were supplied by Media Metrix (www.mediametrix.com). Rank is based on audience. In the sampling frame, URLs were ranked in descending order on the basis of audience. For example, the first URL on the list had the largest audience. Audience represents the minimum number of visitors for January 1999 based on unduplicated reach. Unduplicated reach means that each visitor to a Web site during a given period is counted only once, even if the person makes multiple visits to the same URL. For example, each URL in the last group was visited by at least 32,000 and by no more than 39,999 different people during January 1999. The top 25 URLs in the list had a minimum of 4,580,000 unduplicated visitors during January 1999.
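The idea of unduplicated reach described in the notes can be illustrated with a short sketch. The visit log, visitor identifiers, and URLs below are invented for illustration; how Media Metrix actually measured its panel is not described here.

```python
from collections import defaultdict

def unduplicated_reach(visits):
    """Count distinct visitors per URL from (visitor_id, url) pairs."""
    seen = defaultdict(set)
    for visitor_id, url in visits:
        # A set keeps each visitor once, however many visits were made.
        seen[url].add(visitor_id)
    return {url: len(ids) for url, ids in seen.items()}

# Hypothetical log: visitor v1 goes to a.example twice but is counted once.
visits = [
    ("v1", "a.example"),
    ("v1", "a.example"),
    ("v2", "a.example"),
    ("v1", "b.example"),
]
reach = unduplicated_reach(visits)
print(reach)
```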
Whereas the unit of analysis for the survey was the domain or Web site, the sampling frame was based on server rather than domain. Some of the larger sites or domains had more than one server or URL in the sampling frame, and these carried over to the sampling pool before they were detected and eliminated. Appendix B of the report discusses how duplicates were handled (Culnan 1999a).
Before data analysis, the Media Metrix rank of the URLs for which duplicates were detected was recoded to reflect the rank of the URL from the sampling frame with the largest audience for that Web site. This was done to provide the most accurate picture of the audience represented by the sample, as rank serves as a surrogate for audience size. For example, in the original ranked list, America Online was second. However, the URL for America Online that was included in the random sample was ranked 329. Therefore, the URL for America Online in the sample was recoded from 329 to 2 to represent its true audience.
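The recoding rule can be expressed as a small function. This is a sketch under stated assumptions: the URLs and the mapping from URL to Web site are hypothetical placeholders, and only the ranks 2 and 329 come from the America Online example above.

```python
def recode_ranks(frame_ranks, site_of, sample_urls):
    """Give each sampled URL the best (lowest) frame rank held by any URL of its site."""
    best = {}
    for url, rank in frame_ranks.items():
        site = site_of[url]
        best[site] = min(rank, best.get(site, rank))
    return {url: best[site_of[url]] for url in sample_urls}

# Hypothetical frame entries: two URLs belong to the same site.
frame_ranks = {
    "main.aol.example": 2,
    "members.aol.example": 329,
    "other.example": 10,
}
site_of = {
    "main.aol.example": "America Online",
    "members.aol.example": "America Online",
    "other.example": "Other Site",
}

recoded = recode_ranks(frame_ranks, site_of, ["members.aol.example", "other.example"])
print(recoded)
```

Because a lower rank means a larger audience, taking the minimum rank across a site's URLs matches the stated goal of representing the site by its largest audience.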
Data for the study were collected by 15 graduate students (“surfers”) from Georgetown University and George Mason University during the week of March 8–12, 1999. Each surfer first determined whether a site was a consumer site, which meant that the site would be of interest to at least some consumers. Purely business-to-business sites were eliminated, as were pornographic or adult sites, foreign sites without any U.S. presence, and nonworking sites. After a surfer concluded that the Web site qualified for inclusion in the sample, the surfer searched the site and completed a survey form for the site. Appendix C of the Georgetown report contains both a list of the Web sites in the final sample and the questionnaire with response frequencies (Culnan 1999a).
Summary of Findings
Question 1: Personal Information Collection
The first question asked what types of personal information Web sites collect from consumers. This study adopted the definition of “personal information” used by the FTC in its 1998 report. It included two broad categories of information: personal identifying information and demographic or preference information (hereafter referred to as demographic information). Personal identifying information includes information that can be used to identify a consumer, such as a name or an e-mail address. Demographic information by itself cannot be used to identify a consumer. It can be used in aggregate, nonidentifying form for market research or in conjunction with personal identifying information to create consumer profiles.
The majority of sites collected at least one type of personal information: A total of 335 sites (92.8%) collected personal identifying information, and 205 (56.8%) sites collected demographic information. Thirty-five sites (9.7%) collected e-mail addresses only. Twenty-four sites (6.6%) collected no personal information, whereas two sites (0.5%) collected only demographic information. Other types of demographic information collected included marital status, computer/software/online use information, personal characteristics (e.g., height, weight), and time zone. More than half the sites (56.2%) collected both personal identifying and demographic information. Table 2 shows how often each type of personal information was collected.
Table 2. Personal Information Collected (Base = 361)
Question 2: Frequency of Privacy Disclosures
The second research question asked how many Web sites posted privacy disclosures. The surfers searched the Web sites in the sample for two types of privacy disclosures—privacy policy notices and information practice statements—using the definitions provided in the 1998 FTC report. A privacy policy notice was defined as a comprehensive description of a site's practices that is located in one place on the site that may be reached by clicking on an icon or a hyperlink. An information practice statement was defined as a discrete statement that describes a particular information practice or policy from which at least one potential use could be inferred. Examples of information practice statements include the following:
“Click here if you do not want to receive e-mail from us.”
“We do not share your personal information with anyone.”
“We only share aggregate information with third parties.”
“Your order will be processed on our secure server.”
Nearly two-thirds of the Web sites in the sample (n = 238, 65.9%) contained at least one privacy disclosure. Seven percent posted a privacy policy only, 22% posted an information practice statement only, and 36% posted both types of disclosures. Thirty-four percent of the sample did not post either type of disclosure. Table 3 shows the frequency of privacy disclosures by the type of disclosure. Of the 337 Web sites that collect at least one type of personal information, 236 (70%) had posted at least one type of privacy disclosure.
Table 3. Types of Privacy Disclosures (Base = 361)
Of the 157 sites that posted a privacy policy notice, 79.6% linked the policy from the home page, and 74.5% linked the policy from at least one Web page on which personal information was collected. Of the 212 sites that posted at least one information practice statement, 81.1% of these statements appeared on at least one Web page on which personal information was collected.
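The disclosure counts reported above can be cross-checked with simple inclusion-exclusion arithmetic. The sketch below uses only figures stated in the text (361 sites, 238 with any disclosure, 157 with a privacy policy notice, 212 with an information practice statement); the variable names are illustrative.

```python
total = 361           # sites in the sample
any_disclosure = 238  # sites with at least one privacy disclosure
policy = 157          # sites with a privacy policy notice
statement = 212       # sites with at least one information practice statement

# Sites counted in both groups are counted twice in policy + statement.
both = policy + statement - any_disclosure
policy_only = policy - both
statement_only = statement - both
neither = total - any_disclosure

def pct(count, base=total):
    """Percentage of the base, rounded to one decimal place."""
    return round(100 * count / base, 1)

for label, count in [
    ("policy only", policy_only),
    ("statement only", statement_only),
    ("both", both),
    ("neither", neither),
]:
    print(label, count, pct(count))
```

The derived shares round to the 7%, 22%, 36%, and 34% reported for the four groups.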
Question 3: Nature of Disclosures
The third research question asked to what extent the privacy disclosures posted by Web sites are based on fair information practices. The contents of these privacy disclosures were analyzed to determine if they included notice, choice, access, and security. These four elements of fair information practices were operationalized in the survey as follows:
Notice was defined to include statements about what information is collected, how the information is collected, how the information collected will be used, whether the information will be reused or disclosed to third parties, and whether the site says anything about its use or nonuse of cookies. Consistent with the FTC study, the Georgetown study did not collect data about whether or not a Web site actually placed a cookie. Only disclosures about cookies were counted.
Choice was defined to include statements regarding choice offered about being contacted again by the same organization and choice about having nonaggregate personal information collected by the Web site disclosed to third parties.
Access was defined to include allowing consumers to review or ask questions about the information the site has collected and whether the sites disclosed how inaccuracies in personal information the site had collected were handled.
Security was defined to include protecting information during transmission and subsequent storage.
The privacy disclosures were further analyzed to determine if they provided contact information a consumer could use to ask a question about the site's information practices or to complain to the company or another organization about privacy. Contact information is the first step in providing consumer redress and enforcement. Redress and enforcement are also elements of fair information practices. Because this study focused only on disclosures, it was not possible to determine whether a Web site had implemented redress or enforcement procedures or whether the Web site's practices differed from its disclosures.
Of the 236 Web sites that collected personal information and posted a privacy disclosure,
89.8% included at least one survey element for notice,
61.9% contained at least one survey element for choice,
40.3% contained at least one survey element for access,
45.8% contained at least one survey element for security, and
48.7% contained at least one element for contact information.
For the same privacy disclosures posted by 236 Web sites, 21.2% contained only one of these five elements of fair information practices (i.e., the disclosure contained only notice, choice, access, security, or contact information), 22.5% contained any two of the five elements (e.g., the disclosure contained notice and security but did not include any of the other elements), 18.6% contained any three of the five elements, 24.2% contained any four of the five elements, and 13.6% contained all five elements.
Thirty-six (15.2%) of the 236 Web sites contained at least one survey item for all four elements of fair information practices. Of these 36 sites, 32 sites (13.6% of 236 sites) also included at least one survey item for contact information in their privacy disclosure(s).
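The tabulations in this section amount to coding each site's disclosures as a set of elements and then counting. The following sketch shows one way such a tally could be computed; the four coded sites are invented toy data, not the study's records, and the function is not the study's actual analysis procedure.

```python
from collections import Counter

ELEMENTS = ("notice", "choice", "access", "security", "contact")
CORE_FOUR = {"notice", "choice", "access", "security"}

def summarize(coded_sites):
    """Tally coded disclosures: share per element, sites by element count, all-four count."""
    n = len(coded_sites)
    per_element = {e: sum(e in s for s in coded_sites) / n for e in ELEMENTS}
    by_count = Counter(len(s & set(ELEMENTS)) for s in coded_sites)
    all_four = sum(CORE_FOUR <= s for s in coded_sites)
    return per_element, by_count, all_four

# Hypothetical coded data: one set of elements per site.
coded_sites = [
    {"notice"},
    {"notice", "security"},
    {"notice", "choice", "access", "security"},
    {"notice", "choice", "access", "security", "contact"},
]
per_element, by_count, all_four = summarize(coded_sites)
print(per_element, dict(by_count), all_four)
```

Keeping contact information separate from the four core elements makes it possible to report both the sites covering all four elements and the subset that also provides contact information.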
The Stakeholder Responses
The results of the Georgetown study provide ammunition for stakeholders on both sides of the issue. For the industry, the glass is 67% full. One organization commented, for example, that “The [Direct Marketing Association] believes the study shows that business has heeded the call from the White House and the Federal Trade Commission to promote privacy protection online through the adoption of self-regulatory measures” (Direct Marketing Association 1999). Consumer groups, in contrast, find the glass closer to empty. For example, the Consumer Federation of America (1999) stated that “The 1999 results, graded on a pass-fail basis, are being portrayed by the industry as a sign of great progress. When actual performance according to the FTC's fair information practices standards is graded, the industry fails. Meaningful and effective privacy protections for consumers are largely missing.”
Despite repeated cautions about drawing direct comparisons between the results of the Georgetown study and the results of the 1998 FTC study because the studies are based on different populations, the industry nonetheless cited the increase in the number of Web sites that post some form of notice to argue that though work remained to be done, new legislation was not needed. The consumer groups concluded that fair information practices were still the exception rather than the norm on the Web on the basis of the small number of Web sites posting comprehensive privacy policies that included all the elements of fair information practices (Center for Democracy and Technology 1999). Comments on the results of the Georgetown study submitted by members of the study's advisory group can be found in Appendix E of the Georgetown Report (Culnan 1999a).
In July 1999, the FTC issued a report to Congress stating that although the private sector continued to face significant challenges in promoting more widespread adoption of fair information practices, the commission did not recommend additional legislation at that time (FTC 1999a, b). The commission cited the substantial effort and commitment to fair information practices on the part of industry leaders and outlined the next steps it would take, including conducting another survey of online privacy disclosures in the spring of 2000 (FTC 1999b).
Beyond the Statistics: Discussion and Unanswered Questions
The Georgetown study provides statistics about the number of consumer-oriented Web sites that have posted privacy disclosures. These statistics can be used to make an initial assessment of whether an effective self-regulatory regime is emerging for Internet privacy. Although the majority of Web sites sampled here posted notice of their information practices, these disclosures did not fully reflect fair information practices. Furthermore, about one-third of the Web sites (34%) did not post any disclosures, which suggests that a full self-regulatory regime has not emerged.
Currently, there is no consensus about how to operationalize fair information practices online. Without operational definitions for the elements of fair information practices, Web sites are unlikely to post disclosures that satisfy the most demanding critics. The privacy seal programs such as TRUSTe and BBBOnline require certified firms to adhere to standards, but not all Web sites belong to these programs (see, e.g., www.truste.org and www.bbbonline.org). The absence of operational definitions for fair information practices, combined with the dynamic nature of the Web, provides additional measurement challenges for replicating the existing surveys or surveying different aspects of the Internet.
Furthermore, it is unlikely that all Web sites will self-regulate. The current method of enforcement is for the FTC to prosecute firms whose practices are at variance with their disclosures for engaging in a deceptive trade practice. However, Web sites are not required to post any disclosures, and without a posted privacy policy, the FTC has no basis for acting under its current authority. Therefore, it is likely at some point that legislation will be required. On the basis of the record of prior privacy legislation, the chances for new legislation improve when the interests of consumer groups and industry align (Regan 1995). This may occur when any proposed legislation is consistent with the existing practices of the firms with the greatest lobbying clout. The following three issues also merit attention during the policy process and cannot be addressed by the existing data.
First, the study's results are limited to the population sampled, which means that the results of the Georgetown study cannot be generalized to Web sites that have fewer than 32,000 different visitors per month. The population of Web sites used in the Georgetown study is the appropriate place to start the policy discussions about privacy on the World Wide Web, as it represents the commercial sites that the majority of consumers visit. However, ultimately the same protections should apply to consumers who visit the “one-stoplight towns” as to those who visit the “large metropolitan areas,” the sites that were surveyed in the Georgetown study. The 1998 FTC study addressed the former group by using a sample of .com Web sites that was not based on traffic. Unanswered questions, then, relate to how well self-regulation is working for the entire World Wide Web and how the Web should be defined and measured. For example, should a Web site maintained by a small business with five or fewer employees be exempted, as it would be from some other regulations, particularly if the Web site serves only as an electronic brochure with an e-mail link to contact the organization?
Second, it is important to look beyond the statistics in assessing whether privacy disclosures adequately reflect fair information practices for a specific context. The Georgetown study found that approximately 14% of the 236 Web sites that collect at least one type of personal information and have at least one privacy disclosure had posted disclosures that included all four elements of fair information practices (notice, choice, access, and security) as well as contact information. However, it is important to note that what constitutes an effective privacy disclosure is a function of the Web site's information practices and business model. For example, a site that collects personal information but does not use the information to contact the consumer for marketing or other unrelated purposes and does not share the information with affiliates or third parties does not need to give choice. It would be useful to understand whether Web sites with different information practices post the privacy disclosures that are appropriate for their business models (e.g., electronic commerce versus information dissemination only). Furthermore, the Georgetown study provides no evidence for determining what types of privacy disclosures communicate most effectively to consumers. These issues merit further investigation and require different research methods.
Third, the Georgetown study results do not provide any insights about what has worked to date in promoting effective self-regulation. The Georgetown study did not code Web sites by business model; industry; or membership in a trade association, seal program, or other type of business relationship. It would be useful to assess whether the Web sites with the best privacy disclosures have anything in common. By understanding what characteristics the sites that post privacy policies share as well as what, if anything, sites without policies have in common, the effectiveness of various industry initiatives can be assessed, and strategies can be developed for promoting self-regulation more widely.
