Big Data, Small Personas: How Algorithms Shape the Demographic Representation of Data-Driven User Segments

Abstract

Following from the notion of algorithmic bias, creating user segments such as personas from data may over- or under-represent certain segments (FAIRNESS), fail to properly represent the diversity of the user population (DIVERSITY), or produce inconsistent results when hyperparameters are changed (CONSISTENCY). Using user data on 363M video views from a global news and media organization, we compare personas created from this data with different algorithms. Results indicate that the algorithms fall into two groups: those that generate personas with low diversity and high fairness, and those that generate personas with high diversity and low fairness. The algorithms that rank high on diversity tend to rank low on fairness (Spearman's correlation: −0.83). The algorithm that best balances diversity, fairness, and consistency is Spectral Embedding. The results imply that the choice of algorithm is a crucial step in data-driven user segmentation, because the algorithm fundamentally affects the demographic attributes of the generated personas and thus influences how decision makers view the user population. The results have implications for algorithmic bias in user segmentation and for creating user segments that consider not only commercial segmentation criteria but also criteria derived from ethical discussions in the computing community.

Introduction

Conceptual underpinnings

Personas, introduced to computer science and human–computer interaction (HCI) by Cooper,¹ are defined as fictitious people who represent real user and customer types. Researchers believe that personas evoke a sense of empathy^2,3 that directs product and user experience designers, software developers, marketers, and other stakeholders to make more user-centric decisions regarding products, services, and other outputs offered to end-users and customers. Personas are, therefore, a personified form of user segmentation, that is, dividing the overall user or customer population into demographically or behaviorally defined segments.⁴

Algorithmic personas (APs) are developed from quantitative data to represent demographic and behavioral characteristics of the user base.⁵ Accordingly, AP generation is the use of algorithms and big data to create personas.⁶ The promise of algorithms for persona generation was first observed by Aoyama in two articles from 2005⁷ and 2007.⁸ McGinn and Kotamraju applied an algorithmic approach to personas in their seminal 2008 article.⁶ Since then, AP generation has become increasingly common in HCI, marketing,⁹ health informatics,^10,11 cybersecurity,¹² video game studies,^13,14 and many other domains that use personified user segmentation for understanding users or customers.

The shift from manually created personas to APs has been characterized as transformative,¹⁵ as AP generation can address the challenge of analyzing and making sense of large collections of personified big data¹⁶ from social media and online analytics platforms. Manual methods are ill-equipped to analyze such volumes of user data for user segmentation and persona generation. Rapid changes in online user behavior exacerbate the challenge, as APs must be updated constantly to keep up with user behaviors and characteristics.¹⁷ Thus, scalable and efficient user segmentation algorithms are beneficial for transforming this data into persona profiles (Fig. 1) or other forms of user segments that describe the key behaviors and demographics of user populations.

FIG. 1.

Creating a persona from data using DR. The left-side figure shows DR using the UMAP algorithm on the dataset collected for this study. The right-side figure shows Siim, an algorithmic persona that corresponds to the pattern circled in the left-side figure. Different algorithms may identify different patterns in the data, but previous research has not examined how this affects the composition of the generated personas. DR, dimensionality reduction; UMAP, Uniform Manifold Approximation and Projection.

Since the introduction of data-driven personas,⁶ researchers have applied a wide range of algorithms for persona generation,^13,18–23 the most common being clustering (grouping users in a way that users in the same group are more similar to each other than to users in other groups), principal component analysis (PCA) (summarizing the user information into smaller summary indices that aim at capturing most user information in a computationally efficient format), non-negative matrix factorization (NMF) (similar to PCA, NMF reduces user data from a higher dimensionality to a lower dimensionality that captures essential behavioral and/or demographic information), and latent semantic analysis (LSA) (identifying associations between a set of users and the content they are interested in by producing a set of latent concepts related to the users and content).²⁴
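To make the contrast between these algorithm families concrete, the sketch below applies each of them to the same small interaction matrix (demographic groups × content items) with scikit-learn. The toy matrix, its size, and the choice of p = 2 patterns are illustrative assumptions; this is not the implementation used in this study, which is described in Appendix A1. Note the structural difference in the outputs: clustering assigns each group exactly one label, whereas PCA, NMF, and LSA give each group a weight on every pattern.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF, PCA, TruncatedSVD

# Hypothetical toy interaction matrix: 6 demographic groups (rows) x 8 videos
# (columns); each cell is a view count. Real matrices are far larger and sparser.
rng = np.random.default_rng(0)
V = rng.poisson(lam=3.0, size=(6, 8)).astype(float)

p = 2  # number of latent patterns (i.e., personas) to extract

# Clustering: each demographic group receives exactly one cluster label.
kmeans_labels = KMeans(n_clusters=p, n_init=10, random_state=0).fit_predict(V)

# PCA: linear projection of each group onto p principal components.
pca_scores = PCA(n_components=p).fit_transform(V)

# NMF: non-negative group weights W (6 x p) and content weights H (p x 8).
nmf = NMF(n_components=p, init="nndsvda", max_iter=1000, random_state=0)
W = nmf.fit_transform(V)
H = nmf.components_

# LSA: truncated SVD of the matrix, yielding p latent "concepts" per group.
lsa_scores = TruncatedSVD(n_components=p, random_state=0).fit_transform(V)

print(kmeans_labels.shape, pca_scores.shape, W.shape, H.shape, lsa_scores.shape)
```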

In their review of literature, Zhu et al²³ report the use of decision trees, exploratory factor analysis, hierarchical clustering, k-means clustering, LSA, multidimensional scaling analysis, and weighted graphs for persona development. Minichiello et al²¹ provide a similar list of methods: cluster analysis (CA), factor analysis, PCA, and LSA. A shared trait of these methods is the attempt to simplify the user population into segments that are then transformed (“enriched”), one way or another, into finalized persona profiles for end-users (see Figs. 1 and 2).

FIG. 2.

The algorithmic persona generation process. APG, automatic persona generation; API, application programming interface; DFC, diversity, fairness, consistency.

Research problem

This collaborative process of persona generation between humans and algorithms involves multiple challenges. Although the algorithmic process is opaque to humans, they need to trust that the algorithm performs in a desirable way, that is, that it neither exacerbates demographic biases in the data nor favors one group over another when selecting traits for the personas or segments. In the following, we detail three important research problems (RPs) and how we address them in the current study.

First, there is a lack of studies comparing algorithms for persona generation (RP1). Apart from a brief comparison by Brickey et al,¹⁸ none of the studies address what kind of personas different algorithms produce. The lack of studies poses a major hindrance for understanding how the choice of algorithm affects persona generation and whether there is a risk for algorithmic bias in persona generation (and user segmentation in general).

Second, non-accuracy metrics are rarely used for persona evaluation (RP2). Apart from one study that evaluated inclusivity,²⁵ quantitative persona studies tend to focus on evaluating personas' technical accuracy^26,27 rather than on what kind of personas are generated. However, the characteristics of the generated personas matter because persona traits (e.g., age, gender, ethnicity) risk aggravating stereotypical thinking about user populations.^28,29

Third, there is a general lack of shared resources for persona generation (RP3). The lack of publicly available resources (code, data, algorithms, computational notebooks) poses a major hindrance to research on data-driven personas²⁴ and the broader field of user segmentation. Even though making resources available to others is a basic principle of scientific progress,³⁰ few data-driven persona studies have done so, which limits their contribution to the field.

To address RP1, we conduct an experimental study in which we fix (a) the dataset, (b) the number of personas generated, and (c) the method of enriching the personas, varying only the algorithm that processes the baseline user data. Although several effects of algorithms on persona generation (and more broadly on customer segmentation) could be examined, including accuracy, computational efficiency, or run-time, we focus on the effect of the algorithm on model outputs, that is, the generated personas. Specifically, we address the effect of the algorithm on the demographics of the persona set. We choose demographics as the unit of analysis because of their widespread use in both persona generation and customer segmentation.

To address RP2, we examine three aspects vital to decision makers interested in personas for representing their user, audience, or customer base. These aspects are diversity (i.e., the personas cover many unique characteristics found in the user base), fairness (i.e., the personas truthfully reflect the underlying data on the users), and consistency (i.e., the algorithm retains central persona traits when the hyperparameters are changed). We refer to these design goals for data-driven personas and customer segments as the diversity, fairness, consistency (DFC) criteria.

To address RP3, we make the resources (code, data, algorithms) available in an online repository^† to further advance empirical persona generation and customer segmentation experiments. To protect business-sensitive information, the data are made available in a scrambled format.

Building on this reasoning, our research question is: How do the diversity, fairness, and consistency of the generated personas' demographic attributes vary by algorithm? The results have implications for creating more diverse, fair, and consistent personas (and other forms of user segments) from digital user data. Our results inform persona developers (and marketers) of the strengths and weaknesses of different algorithms when applied to persona generation (or customer segmentation).

They also have wider implications for the potential of algorithmic bias when algorithms are used for user or customer segmentation, and they draw attention to non-commercial segmentation criteria informed by the ongoing fairness and bias discussion in the computer science and HCI communities,³¹ which has thus far not been addressed in the persona generation and customer segmentation domains. The main contributions are as follows: (a) providing the most extensive comparative study to date, comparing six different algorithms for persona generation; (b) introducing and applying novel evaluation metrics in the persona generation context, while also providing suggestions for applying different algorithms for different objectives; and (c) sharing computational resources to further persona generation research and development.

Literature Review

Table 1 shows commonly applied persona generation algorithms based on a sample of 63 research papers, obtained as follows. First, we searched two major academic search services, Google Scholar and the ACM Digital Library, using search phrases relating to persona generation (“algorithmic personas,” “data-driven personas,” “quantitative personas,” “procedural personas”). The retrieved articles were manually screened by reading the abstracts. We included an article only if it (a) was a full research article written in English, (b) was published in a peer-reviewed journal or conference, and (c) developed personas using data-driven approaches. We also carried out snowball sampling to fully account for relevant literature.³²

Table 1.

Top 5 methods for algorithmic persona generation from literature review

Method	Description	Frequency, N (%)	Examples (Refs.)
CA	Groups a dataset using a predetermined number of clusters. Popular approaches are partitioning-based approaches such as k-means and agglomerative such as hierarchical clustering	22 (34.9)	^{33,37,38,41,42}
PCA	Linear dimension-reduction algorithm used to extract information by removing non-essential elements with a relatively small variation	5 (7.9)	^{18,34,44–46}
LSA	Data analysis algorithm that uses singular value decomposition to detect hidden semantic relationships between words	5 (7.9)	^{18,40,46,47,83}
NMF	Method in which data matrices are constrained as non-negative and decomposed to extract sparse and meaningful features	4 (6.4)	^{48–50,54}
LDA	A generative statistical model that models each item of a collection (typically text) as a finite mixture over an underlying set of patterns	6 (9.5)	^{49,50,55,56,83,84}

Percentages are of total reviewed articles. The rest (N = 21, 33.3%) are classified as “Other.”

CA, cluster analysis; LDA, Latent Dirichlet allocation; LSA, latent semantic analysis; NMF, non-negative matrix factorization; PCA, principal component analysis.

In total, the database searches and snowballing yielded 163 articles for full-text screening. For the screening, we applied the same criteria (a–c). After the full-text review, a set of 63 final articles remained. We then reviewed these articles to identify the algorithm(s) applied.

From the reviewed articles, we identified five commonly used algorithms for persona generation: (1) CA, (2) PCA, (3) LSA, (4) NMF, and (5) Latent Dirichlet allocation (LDA). Table 1 provides an overview of these algorithms and the studies adopting them. The use of the five algorithms for persona generation is discussed in the following subsections. A technical description of each algorithm and its implementation for this study is given in Appendix A1. The focus of this study is not on the technical traits of these algorithms but on their implications for APs and user segmentation in general.

Clustering

Tanenbaum et al³³ utilized CA (k-means) to develop personas of diabetes patients and gauge patients' readiness for adopting different medical interventions. Validation was done by calculating the Euclidean distance between the different variables and conducting Chi-squared tests. Wang et al³⁴ also calculated Euclidean distances of different medical and demographic variables for their analysis of regional health data. A few articles qualitatively validated clusters by engaging subject experts as well as users themselves in reviewing the cluster results.^23,35,36

These individuals were tasked with assessing how well the generated clusters represented real-life scenarios. An et al^37,38 applied CA (k-means) for persona generation and observed that individual-level data are expensive to collect and raise privacy concerns. However, the researchers did not address what kind of personas different algorithms would produce.

Kwak et al,³⁹ using CA (k-means), identified a limitation: each demographic group must fall into a single persona. In reality, a single demographic group can correspond to several personas, as people in the same demographic group often behave differently. A potential issue of CA is the “need for specialists to use expert judgment during clustering [to define hyperparameters].”^21(p.19) However, this issue concerns other algorithms too, as persona developers typically need to set the number of personas as part of the process.

Miaskiewicz et al⁴⁰ and Mesgari et al⁴¹ applied hierarchical CA to develop clusters (and ultimately personas) of university members' experiences with learning management and institutional knowledge systems. Both studies validated their results by examining the relations between variables within clusters. The former calculated cosine similarity (the cosine of the angle between two non-zero vectors), whereas the latter calculated Pearson correlation (the strength of the linear relationship between two variables).
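Both validation measures take only a few lines to compute. The sketch below, using hypothetical cluster profile vectors, also shows that Pearson correlation is simply cosine similarity applied to mean-centred vectors, which is why the two measures can disagree when vectors share a large constant offset.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_r(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())


# Hypothetical cluster profiles (e.g., mean scores on four survey variables).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # same direction as x, twice the magnitude
z = [4.0, 3.0, 2.0, 1.0]  # same values as x in reversed order

print(f"cos(x, y) = {cosine_similarity(x, y):.3f}")
print(f"r(x, z) = {pearson_r(x, z):.3f}")
```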

Holden et al⁴² used hierarchical CA to analyze the medical and psycho-social variables of older adults with heart failure. Results were validated using the Kruskal–Wallis test and Welch's analysis of variance (ANOVA), which test for differences between groups without assuming normally distributed data and without assuming equal variances, respectively. However, the researchers did not address what kind of personas different algorithms would produce.

Principal component analysis

PCA was used for persona generation by Sinha,⁴³ who created personas based on users' characteristics. Wang et al³⁴ used PCA in combination with CA to develop health personas of regional groups. Similarly, Brickey et al¹⁸ used PCA in combination with LSA and CA to develop personas for users of an army knowledge management system. Combining PCA with other methods is thus prevalent; in fact, all of the studies we found that applied PCA complemented it with at least one other quantitative method.

As a result, validation metrics also varied and included Cohen's kappa (a statistical measure of interrater agreement between generated and expert clusters),¹⁸ Euclidean distances of different variables,³⁴ Spearman's $\rho$ (the strength of the monotonic association between two ranked variables),⁴⁴ and a qualitative review with survey participants.⁴⁵ However, the researchers did not address what kind of personas different algorithms would produce.
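Cohen's kappa and Spearman's $\rho$ are both available off the shelf. A minimal sketch with scikit-learn and SciPy follows; the cluster labels and rankings are hypothetical. Here 9 of 10 labels agree (observed agreement 0.90), chance agreement is 0.33, so kappa = (0.90 − 0.33)/(1 − 0.33) ≈ 0.85, and one swapped pair of ranks gives $\rho$ = 0.9.

```python
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Hypothetical cluster assignments for ten users: algorithm vs. a human expert.
# Assumes the algorithm's cluster IDs have already been matched to the expert's
# categories, since raw cluster IDs are arbitrary.
algorithm_labels = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]
expert_labels = [0, 0, 1, 1, 2, 1, 0, 1, 2, 2]

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
kappa = cohen_kappa_score(algorithm_labels, expert_labels)

# Spearman's rho: Pearson correlation computed on the ranks of two variables.
persona_rank = [1, 2, 3, 4, 5]
true_rank = [2, 1, 3, 4, 5]
rho, p_value = spearmanr(persona_rank, true_rank)

print(f"kappa = {kappa:.3f}, rho = {rho:.3f}")
```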

Latent semantic analysis

Apart from the aforementioned studies,^18,40,46 Dupree et al⁴⁷ used LSA to investigate attitudes toward digital privacy and cybersecurity among university students and recruited a separate population (different from the initial survey participants) to review the developed personas.⁴⁷ The evaluation asked individuals to self-identify with one of the five personas and to provide feedback on how realistic the personas were. However, the researchers did not address what kind of personas different algorithms would produce.

Non-negative matrix factorization

An et al^48,49 used NMF to decompose an interaction matrix constructed from the view counts of an organization's social media content. They obtained latent content consumption patterns, associating each distinctive behavior pattern with demographics of users (i.e., age, gender, country) by a weight assessment that encodes the strength of the relationship between the demographic groups and the underlying pattern. The demographic group with the highest NMF weight was chosen as the representative persona demographics for its corresponding behavioral video-viewing pattern.

The demographic group was then augmented with a name, occupation, photo, and other characteristics, yielding complete persona profiles. A similar approach was applied in other studies from the same research team,^17,50–54 with the general goal of developing personas from social media user statistics. However, the researchers did not address what kind of personas different algorithms would produce. For evaluation, rankings of the generated demographic groups in these studies were compared with true rankings based on real content engagements using the Kendall rank correlation coefficient.⁴⁹
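Kendall's rank correlation counts how many pairs of items two rankings order the same way. A minimal sketch with SciPy, assuming hypothetical rankings of five demographic groups (by model weight and by observed engagement): with one discordant pair out of ten, $\tau$ = (9 − 1)/10 = 0.8.

```python
from scipy.stats import kendalltau

# Hypothetical ranks of five demographic groups for one latent pattern:
# by model weight and by observed engagement with the pattern's content.
rank_by_weight = [1, 2, 3, 4, 5]
rank_by_engagement = [1, 3, 2, 4, 5]

# Without ties, tau = (concordant pairs - discordant pairs) / total pairs.
tau, p_value = kendalltau(rank_by_weight, rank_by_engagement)
print(f"Kendall tau = {tau:.2f}")
```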

Researchers have identified discriminative content for personas (i.e., content that a persona is more likely to engage with than other personas) using a Chi-square test.³⁸ Cosine similarity has also been used to compute the similarity between pairs of personas to identify the closest pairs.⁴⁸ Further, Salminen et al⁵⁴ used qualitative data of social media users in a geographical region, in the form of Instagram profiles and semi-structured interviews, to create what they term “hybrid personas” (the core algorithm being NMF).
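The exact test design in ³⁸ is not detailed here, so the sketch below assumes one plausible set-up: a 2 × 2 table comparing a persona's views of a given video with the views of all other personas, flagging the video as discriminative when the persona over-engages with it and the difference is significant. The counts and the 0.05 threshold are assumptions.

```python
from scipy.stats import chi2_contingency

# Hypothetical view counts:  [views of video X, views of all other videos]
table = [[120, 880],    # persona A
         [300, 8700]]   # all other personas combined

chi2, p_value, dof, expected = chi2_contingency(table)

# Share of each row's views that go to video X.
share_persona = table[0][0] / sum(table[0])
share_others = table[1][0] / sum(table[1])

# Discriminative: persona A over-engages with video X and the difference is significant.
ALPHA = 0.05  # assumed significance level
is_discriminative = share_persona > share_others and p_value < ALPHA
print(f"chi2 = {chi2:.1f}, p = {p_value:.2g}, discriminative = {is_discriminative}")
```

View counts are not independent observations (one user can generate many views), so the p-value is best read as a heuristic ranking signal rather than a formal significance test.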

Qualitative research was used to further enrich and improve their hybrid personas. However, the researchers did not address what kind of personas different algorithms would produce.

Latent Dirichlet allocation

LDA was used in two studies^49,50 to understand the viewing behavior of different demographic groups and develop user personas for a YouTube channel. In both studies, the authors built LDA topic models to construct matrices (in combination with NMF). Dhakad et al⁵⁵ developed buyer personas from click logs on an e-commerce portal, employing LDA to model persona preferences for different occasions by sampling fashion styles and relevant fashion items across online shoppers' activity. Further, Smith and Nayar⁵⁶ used LDA to develop gaming personas based on controller input data from video game analytics systems. However, the researchers did not address what kind of personas different algorithms would produce.

Methodology

Research design

Conceptually, our research methodology uses different algorithms on the same dataset to produce sets of 5, 10, and 15 segments that are the bases for the personas (common numbers used in persona research^48,49). We complete the persona generation using a standardized approach to fully generate persona profiles with a name, picture, age, gender, country, interests, and other information. We then compare these sets of personas using three quantitative metrics (see the Evaluation Metrics section) to examine how the generated personas differ by algorithm. The process is as follows, with a technical approach mentioned in parentheses:

The experiment set-up is as follows: each algorithm processes the same data using the same number of segments ($N \in \{5, 10, 15\}$), and we then enrich these segments using the same algorithmic process of assigning demographics and other information. In total, there are $3 \times 6 = 18$ persona sets (3 persona-count levels and 6 algorithms). Using three different levels for the number of personas enables us to analyze the consistency of the results.
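The full-factorial design can be written down as a grid of configurations. The sketch only enumerates the grid, using the algorithm abbreviations of Table 2; fitting each configuration would follow the per-algorithm implementations in Appendix A1, which are not reproduced here.

```python
from itertools import product

ALGORITHMS = ["CA", "PCA", "NMF", "LDA", "UMAP", "SE"]
PERSONA_COUNTS = [5, 10, 15]

# Each (algorithm, p) pair yields one persona set. The dataset and the enrichment
# step stay fixed, so among sets with the same p only the algorithm differs.
configurations = list(product(ALGORITHMS, PERSONA_COUNTS))
print(len(configurations))
for algorithm, p in configurations[:3]:
    print(algorithm, p)
```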

Note that the same enrichment process is employed, producing 5, 10, and 15 personas for each algorithm. Fifteen personas is more than the conventional number of “less than 10 personas” widely cited in the HCI literature.^1,3,57,58 In this study, we consider the higher number of personas sensible, as many organizations deal with large and diverse online audiences that cannot be captured in a handful of personas. The outcome of the data collection and application of the algorithms is a collection of datasets representing the main segments or dimensions in the data.

Data collection and pre-processing

We partner with an international news and media organization with 5.08M subscribers at the time of writing (April 2020) to collect a dataset that contains both behavioral (i.e., what videos were viewed and how many times) and demographic statistics about users. We collect the data from the organization's YouTube channel by leveraging the YouTube Analytics Application Programming Interface.^‡ The dataset contains 363M views for 12.3K videos published between March 2007 and December 2019.

The dataset suits our research goals for the following reasons: (a) it is large (typical for online user data), (b) its structure is typical of Web analytics platforms (e.g., Google Analytics and Facebook Insights provide the same output), and (c) its analysis extends beyond what can be done manually, requiring algorithmic processing to build robust personas. Also, (d) this dataset is typical for many large online businesses that generate much content or offer many products or pages.

The distributions of view counts by gender and age are shown in Figures 3a and 3b, respectively, and the geographic distribution is shown in Figure 3c.

FIG. 3.

Distribution of video view counts in the baseline data by (a) gender, (b) age, and (c) country and region (the smaller plot shows the Top 20). The distribution indicates a highly imbalanced dataset, typical for online user data.⁸⁵ This is also observed from descriptive statistics: there are 185 countries and regions; on average, a country or region has 1.97M views, but the standard deviation is 9.06M (4.6 times the average).

After collecting the user data from an online analytics platform, the data are transformed into an interaction matrix that captures the engagement between user groups (rows) and the content items (columns). In this dataset, the content indicates online videos, but depending on the dataset, the content can be webpages, e-commerce products, flight destinations, or other entities of interest.

The cell values in the interaction matrix indicate the number of interactions that a given demographic group (row) has for a given content (column). The demographic groups come from the online analytics and social media analytics platforms that use this grouping to aggregate user data and to protect the privacy of individual users. The age buckets used by these platforms—for example, YouTube Analytics, Google Analytics, and Facebook Insights—include 13–17, 18–24, 25–34, 35–44, 45–54, 55–64, and 65+. These groups are used to set up the interaction matrix.

For example, the demographic group “Finnish, 35–44, Male” can have 1200 views for “Video ABC.” Thus, the values of the matrix are counts (15, 4000, 55,867, …), which are always non-negative. Many demographic groups typically have zero values for a given content item, but this sparsity depends on the dataset. In our data matrix, 98.764% of the values are zeros, indicating high sparsity (the more content and demographic groups there are, the higher the sparsity tends to be, because not all groups are interested in all content).
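The transformation from an aggregated analytics export to the interaction matrix, and the sparsity figure, can be sketched as follows. The record layout (group, video, views) and the example rows are hypothetical; real exports, such as those from the YouTube Analytics API, have their own schemas. A SciPy sparse matrix is used because, when most cells are zero, storing only the non-zero cells saves most of the memory.

```python
from scipy.sparse import coo_matrix

# Hypothetical aggregated analytics export: (demographic group, video, views).
records = [
    ("Finnish, 35-44, Male", "Video ABC", 1200),
    ("Finnish, 35-44, Male", "Video DEF", 15),
    ("Indian, 18-24, Female", "Video ABC", 4000),
    ("US, 25-34, Female", "Video GHI", 55867),
]

# Assign a row index to each group and a column index to each video.
groups = sorted({g for g, _, _ in records})
videos = sorted({v for _, v, _ in records})
row_of = {g: i for i, g in enumerate(groups)}
col_of = {v: j for j, v in enumerate(videos)}

rows = [row_of[g] for g, _, _ in records]
cols = [col_of[v] for _, v, _ in records]
views = [n for _, _, n in records]

# Groups x videos matrix; duplicate (group, video) pairs are summed on conversion.
V = coo_matrix((views, (rows, cols)), shape=(len(groups), len(videos))).tocsr()

# Sparsity = share of cells that are zero.
sparsity = 1.0 - V.nnz / (V.shape[0] * V.shape[1])
print(V.toarray())
print(f"sparsity = {sparsity:.1%}")
```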

Figure 4 provides a more formal explanation using the example of the NMF algorithm. In Figure 4, $V$ is the $g \times c$ matrix of $g$ user groups and $c$ online contents. Each element $V_{ij}$ is a number that reflects user group $G_i$'s interest, engagement, or intent toward content $C_j$. In the case of Google Analytics, $V_{ij}$ is typically the number of sessions on webpage $C_j$ from user group $G_i$. NMF decomposes $V$ into two lower-dimensional non-negative matrices, $W$ (demographics, $g \times p$) and $H$ (contents, $p \times c$), such that $V \approx WH$; both are defined by $p$ latent patterns,⁴⁹ where $p$ is a hyperparameter that indicates the number of personas generated. More details about NMF can be found in the seminal work by Lee and Seung.⁵⁹
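The decomposition $V \approx WH$ can be reproduced with scikit-learn's `NMF`. This is a minimal sketch on a random sparse count matrix; the matrix size, $p = 5$, and the solver settings are assumptions. `solver="mu"` selects multiplicative updates of the kind introduced by Lee and Seung, and `init="nndsvda"` is used because plain NNDSVD initialization introduces zeros that multiplicative updates cannot move away from.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
g, c, p = 20, 50, 5  # hypothetical: 20 user groups, 50 content items, 5 personas

# Sparse non-negative toy matrix: most groups have zero views for most content.
V = rng.poisson(lam=0.3, size=(g, c)).astype(float)

model = NMF(n_components=p, solver="mu", init="nndsvda", max_iter=2000, random_state=0)
W = model.fit_transform(V)  # g x p: weight of each user group on each pattern
H = model.components_       # p x c: weight of each content item in each pattern

# Relative Frobenius reconstruction error of V ~ W @ H (0 would be a perfect fit).
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, f"relative reconstruction error = {rel_error:.2f}")
```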

FIG. 4.

Automatic persona generation. A user segmentation algorithm derives latent patterns from user data (left-hand side of the illustration). The APG then enriches these latent patterns with personified information such as name, picture, demographics, and so on (right-hand side of the illustration).

Each algorithm independently processes this interaction matrix, finding $p$ patterns (clusters, segments, components), where $p \in \{5, 10, 15\}$. In this case, because the content is online videos, the segmentation aims at preserving information on the video viewing behaviors of different demographic groups. These $p$ segments become $p$ personas when they are enriched with personified information (e.g., name, picture, topics of interest).

Personification and enrichment are standard procedures for persona generation^6,21,60; without them, the segments would remain nameless and faceless user representations. The general benefit of personification is that human attributes increase stakeholders' empathy toward the segment that the persona represents,⁶¹ whereas enrichment provides more rounded, detailed information about the persona.²

Persona generation

For the enrichment, we use automatic persona generation^§ (APG), a system for automatically generating personas from user data. This system has numerous advantages, including standardization: in our experiments, all outputs by the algorithms undergo the same enrichment process, which involves no manual intervention. For example, each piece of content is topically classified (explained in An et al⁴⁹), providing topics of interest for each persona.

Based on the outputs, APG chooses a representative demographic group for each latent pattern and enriches this demographic group with personified and other information (picture, name, topics of interest, quotes, etc.) to create a complete persona profile (Fig. 4). The result is a distinct set of personas based on behavioral and demographic attributes of the user population. Note that APG's procedure for assigning the demographic group is identical for all algorithms and deterministic: with the given data and algorithm, a set of p personas will always have the same age, gender, and country when using APG for persona generation. As the only variable that changes in our experiment is the algorithm, the differences in the generated personas stem from the algorithms. For a more detailed explanation and validation of APG, see An et al.^48,49
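The selection step can be sketched in a few lines, assuming APG applies the rule described for NMF in the literature review (the demographic group with the highest weight on a latent pattern becomes that persona's demographics) and that the algorithm's output has been expressed as a groups × patterns weight matrix like $W$. The groups and weights are hypothetical, and how APG maps each algorithm's output to such a matrix is not shown here.

```python
import numpy as np

# Hypothetical demographic groups (rows of W) and weights W (groups x patterns),
# e.g., as returned by NMF.fit_transform.
groups = [
    ("Finland", "35-44", "Male"),
    ("India", "18-24", "Female"),
    ("United States", "25-34", "Female"),
    ("United States", "25-34", "Male"),
]
W = np.array([
    [0.9, 0.0, 0.1],
    [0.1, 2.3, 0.0],
    [0.0, 0.4, 1.7],
    [1.2, 0.2, 0.3],
])


def representative_groups(W, groups):
    """For each latent pattern (column of W), return the group with the highest weight.

    np.argmax returns the first maximum on ties, so the result is deterministic.
    Two patterns may select the same group; this sketch does not deduplicate.
    """
    return [groups[i] for i in np.argmax(W, axis=0)]


for k, (country, age, gender) in enumerate(representative_groups(W, groups), start=1):
    print(f"Persona {k}: {gender}, {age}, {country}")
```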

Choice and implementation of algorithms

The algorithms identified for persona generation in this study are described in Table 2. Of the chosen algorithms, CA, PCA, NMF, and LDA have previously been applied to persona generation. LSA was not selected because it is a type of matrix factorization algorithm (using singular value decomposition), and we had already selected NMF for testing. Because we also wanted to test new methods for persona generation, we selected two novel algorithms (i.e., not previously applied to APs): Uniform Manifold Approximation and Projection (UMAP) and Spectral Embedding (SE).

Table 2.

Properties of the chosen algorithms

Algorithm	Sensitive to imbalanced data	Linearity	Compute complexity	Interpretability	No. of hyperparameters	Topology preservation	Impact of sparsity	Global/local	Impact of dimensionality
CA	No	L	High	High	3	No	Medium	Global	High
PCA	Yes	L	Low	Low	1	Yes	Low	Global	Low
NMF	Yes	L	Medium	Medium	1	No	Medium	Global	Low
LDA	No	L	Medium	Low	1	No	High	Global	Medium
UMAP	No	N	Medium	High	1	Yes	Low	Local	Low
SE	Yes	N	High	Medium	2	Yes	Low	Local	Low

Topology preservation: whether the algorithm preserves the structure of the data, that is, whether two items that are neighbors in the high-dimensional space remain neighbors in the low-dimensional space. Global/local: whether the algorithm tries to preserve global structure or local interactions; usually, linear algorithms focus on global structure and non-linear ones on local structure. Impact of dimensionality: how strongly the size of the original dimensionality affects the algorithm. For example, PCA and NMF work well for millions of dimensions, whereas CA quickly becomes inoperable.
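One common way to quantify topology preservation is the trustworthiness score, which penalizes points that become nearest neighbors in the embedding without being near in the original space (1.0 means no such intrusions). The sketch compares full-rank PCA, two-dimensional PCA, and two-dimensional Spectral Embedding on scikit-learn's synthetic Swiss-roll data. The dataset and neighborhood size are assumptions, and the entries of Table 2 are not derived from this measure.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding, trustworthiness

# A 3-D point cloud lying on a rolled-up 2-D sheet (synthetic benchmark data).
X, _ = make_swiss_roll(n_samples=500, random_state=0)

embeddings = {
    # Full-rank PCA only centres and rotates the data, so neighbourhoods stay intact.
    "PCA, 3 components": PCA(n_components=3).fit_transform(X),
    # Linear projection to 2-D: distant layers of the roll can land on top of each other.
    "PCA, 2 components": PCA(n_components=2).fit_transform(X),
    # Non-linear embedding built from a nearest-neighbour graph (default neighbourhood size).
    "SE, 2 components": SpectralEmbedding(n_components=2, random_state=0).fit_transform(X),
}

# Trustworthiness lies in [0, 1]; 1.0 means no false neighbours in the embedding.
scores = {name: trustworthiness(X, emb, n_neighbors=5) for name, emb in embeddings.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```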

L, linear; N, non-linear; SE, Spectral Embedding; UMAP, Uniform Manifold Approximation and Projection.

Given our analysis of existing algorithms, these two seem like the next logical algorithms for persona generation to take on. Thus, we ended up with six algorithms to test. The chosen algorithms support the goals of this study for multiple reasons: they (a) represent the most often used algorithms in persona research, so comparing them is relevant; (b) are standard approaches in computer science, which affords implementation and replicability in future studies; (c) involve variability (Table 2), which means that the comparison is likely to result in meaningful differences (e.g., in terms of linear and non-linear algorithms); and (d) are readily available in software packages and data science libraries, which facilitates their deployment in persona generation projects in practice. The specific implementation we use for each algorithm is explained in Appendix A1.

Evaluation metrics

Reasoning

The generated persona sets are compared by three metrics, explained in the following subsections. The metrics that we propose for this problem are vital for data-driven personas and customer segmentation in general, because they address non-commercial and ethical aspects of segmentation efforts, which have received little attention in these problem domains but are broadly acknowledged as important within the computer science community.^62,63,64

Diversity matters as a user segmentation goal because the segments should represent demographically diverse groups of people. Otherwise, decision makers may receive information about only select groups and thus ignore the needs and wants of other groups, disadvantaging those groups. As put by Drosou et al,^31(p.73) “diversity [is] an important component of a data-responsible society.”

Fairness is understood in terms of equity (“equity is defined as the quality of being fair and impartial**”): the probability that a demographic segment is included among the finite number of segments shown to decision makers should reflect the segment's share of voice (i.e., representativeness, size, importance) in the baseline dataset, meaning the dataset from which the user segments are created. If a segment is highly prevalent in the baseline data but hardly visible among the segments, that would not be a fair (or equitable) representation of the data.

Consistency matters because one would expect an algorithm to identify the same or similar segments across runs when its parameters, such as the number of segments created, are changed. If the algorithm instead identifies very different segments on each run, it is behaving in an unstable or random way, and its outputs should be trusted less than those of a more consistent algorithm.

Regarding the interpretation of the obtained scores, high diversity indicates that many different demographic groups are represented in the generated personas (segments). High fairness indicates that the generated personas (segments) correspond well with the most engaged users in the source data. High consistency indicates that the personas (segments) generated with a smaller number of personas (segments) also appear when the hyperparameter is increased to a higher number of personas (segments). This gives more confidence that these particular segments are important or, put differently, that the algorithm is not selecting the segments at random.

Diversity

We use the count of unique attributes to compare diversity. The count of unique attributes (D) is the number of persona attribute values (age groups, genders, countries) present in each persona set. For 15 personas, a value such as $D_{15}$ = 16 would indicate that the algorithm designed a set of personas with 16 unique demographic attributes, for example, 2 genders, 5 age groups, and 9 countries (D = 2 + 5 + 9 = 16). Note that D can be computed either over all three demographic attributes combined or for each demographic attribute separately.

For example, $D_{age}$ = 5 means the persona set contains personas from five unique age groups. D affords a straightforward interpretation of diversity between the persona sets. For example, if Persona Set A has personas from three age groups and Persona Set B has personas from six age groups, the latter is considered to be (6 − 3)/3 = 100% more diverse (in terms of age) than the former.
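As a minimal sketch of the count-of-unique-attributes metric, the Python snippet below computes D per attribute and in total, plus the relative difference used in the age example above. Representing each persona as a dictionary with the keys `age`, `gender`, and `country`, as well as the helper names, are illustrative assumptions rather than the study's implementation.

```python
from typing import Dict, List

# Hypothetical representation: each persona is a dict of demographic attributes.
Persona = Dict[str, str]
ATTRIBUTES = ("age", "gender", "country")


def unique_count(personas: List[Persona], attribute: str) -> int:
    """D for a single attribute: number of distinct values in the persona set."""
    return len({p[attribute] for p in personas})


def diversity(personas: List[Persona]) -> Dict[str, int]:
    """Per-attribute D values plus their sum (D_total)."""
    scores = {a: unique_count(personas, a) for a in ATTRIBUTES}
    scores["total"] = sum(scores[a] for a in ATTRIBUTES)
    return scores


def relative_difference(d_new: int, d_ref: int) -> float:
    """How much more diverse d_new is than d_ref, as a fraction of d_ref."""
    return (d_new - d_ref) / d_ref


# Hypothetical three-persona set, for illustration only.
example = [
    {"age": "25-34", "gender": "M", "country": "United States"},
    {"age": "25-34", "gender": "M", "country": "India"},
    {"age": "18-24", "gender": "F", "country": "United States"},
]
print(diversity(example))         # {'age': 2, 'gender': 2, 'country': 2, 'total': 6}
print(relative_difference(6, 3))  # 1.0, i.e., 100% more diverse
```

Counting distinct values with a set means that repeated attribute values across personas (here, two personas aged 25–34) do not raise D, which matches the definition of D as a count of unique values.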

Fairness

We use statistical parity (SP) to measure fairness. Fairness assessments in machine learning tend to focus on prediction or classification.^62,65 Persona generation, however, is not a classification task but an unsupervised learning task. Nevertheless, we can apply existing principles from computing studies that develop tools for fairness assessment. For example, Dwork et al⁶² define individual fairness such that similar individuals are treated similarly. In its most elementary interpretation, this implies that a member i in Group A who has the same characteristics (e.g., race, gender) as a member j in Group B will have an equal probability of succeeding (e.g., being chosen for a job).

A classic example of fairness analysis is using personal attributes such as gender or race to predict whether a person is rich or not (e.g., whether their annual salary exceeds 50k). In the case of personas, or customer segmentation in general, the analogous question for demographic groups is “What is the probability of this demographic class being selected by the algorithm among the generated personas?”

There are (at least) two ways to approach this issue⁶⁴: equality and equity. Equality would translate to any demographic group having the same expected probability of being included in the generated persona set. An equity-based approach would translate to some groups having a higher expected probability of being included in the personas, because of their special needs or other factors. In our setting, we apply the equity-based approach, and the “other factor” is the demographic group's share of the engagements in the baseline data.

This means that demographic segments with more views are expected to have a higher chance of appearing in the persona set, which is fair because it corresponds to a truthful representation of the user population. Although fairness criteria are always subject to some degree of relativism, for the objective of finding the bias of algorithms, an essential question is “If 30% of the total views come from the US, is it fair to say that 30% of the personas in the persona set should be from the US?” If the answer is yes, then SP is an appropriate fairness indicator.⁶²

Therefore, we calculate SP as the difference between two fractions:

$SP_{i} = \frac{p_{i}}{P} - \frac{n_{i}}{N}$ (1)

where, for a given demographic attribute value i (e.g., i = “Male”), $p_{i}/P$ is the fraction of the P personas in the persona set that have value i, and $n_{i}/N$ is the fraction of the N total engagements that come from that demographic group. For example, if five personas in a persona set of 10 are from the United States, then 50% of the personas are from the United States. Given that 30% of the views in the original data are from the United States, SP = 0.5 − 0.3 = 0.2. The total SP is calculated by averaging the absolute SP values across all demographic attribute values.
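The following Python sketch implements Eq. (1) for one demographic attribute, together with the mean-of-absolute-values aggregation. The function names and the input format (one attribute value per persona, plus a dictionary of baseline engagement counts) are hypothetical, and the view counts in the usage example are made up so that 30% of views come from the United States, as in the example above.

```python
from typing import Dict, List


def statistical_parity(persona_values: List[str], engagements: Dict[str, int]) -> Dict[str, float]:
    """SP_i = p_i / P - n_i / N (Eq. 1) for every value i of one attribute.

    persona_values: the attribute value of each persona (e.g., each persona's country).
    engagements: baseline engagement (view) counts per attribute value.
    """
    P = len(persona_values)
    N = sum(engagements.values())
    values = set(engagements) | set(persona_values)
    return {i: persona_values.count(i) / P - engagements.get(i, 0) / N for i in values}


def mean_absolute_sp(sp: Dict[str, float]) -> float:
    """Aggregate SP: mean of the absolute SP values across attribute values."""
    return sum(abs(v) for v in sp.values()) / len(sp)


# 10 personas, 5 from the United States; made-up view counts with a 30% US share.
countries = ["United States"] * 5 + ["India"] * 5
views = {"United States": 300, "India": 200, "Other": 500}
sp = statistical_parity(countries, views)
print(round(sp["United States"], 2))   # 0.2
print(round(mean_absolute_sp(sp), 3))  # 0.333
```

Taking absolute values before averaging keeps over-representation (positive SP) and under-representation (negative SP) from cancelling out; here "Other" receives SP = −0.5 because no persona represents it.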

Consistency

The notion of consistency matters because some researchers have found personas to be abstract and inconsistent.⁶⁶ For example, if the personas in a 5-persona set were very different from those in the 10-persona set, this would call into question the validity of the method. To evaluate how consistent the generated personas are, we computed a Consistency Score (CS) for each algorithm. For this, we take the demographic groups that an algorithm generates in the 5-persona set and count how many of them also appear in the 10-persona set; we then compare the 10-persona set with the 15-persona set in the same way.

For example, if all five demographic groups of the 5-persona set appear in the 10-persona set, the score is 1.00. We carry out this calculation three times: comparing 5 personas with 10 personas, 10 personas with 15 personas, and 5 personas with 15 personas. Thus, we end up with three fractions (e.g., 1.00, 0.80, 0.20), and their average is the final CS.

Using the number of personas is appropriate here since the number of segments is the major hyperparameter shared by all the unsupervised algorithms tested (a hyperparameter is a value set externally by the researcher, as opposed to being optimized internally by the algorithm itself).

Formally, the CS used in our study can be expressed as follows. Let there be three persona sets, denoted Set A, Set B, and Set C, containing $n_{A}$, $n_{B}$, and $n_{C}$ personas, respectively. Without loss of generality, let $n_{A} \le n_{B} \le n_{C}$. Comparing the three sets yields a sub-CS for each of three pairs: Set A–Set B, Set A–Set C, and Set B–Set C. That is, each comparison takes two of the three sets, giving $\binom{3}{2} = \frac{3!}{2!\,1!} = 3$ combinations.

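Then, the CS for each algorithm is calculated as follows:

$CS = \frac{1}{3}\left( \frac{1}{n_{A}} \sum_{i=1}^{n_{A}} \mathbb{1}_{P_{Ai} \in P_{B}} + \frac{1}{n_{A}} \sum_{i=1}^{n_{A}} \mathbb{1}_{P_{Ai} \in P_{C}} + \frac{1}{n_{B}} \sum_{i=1}^{n_{B}} \mathbb{1}_{P_{Bi} \in P_{C}} \right)$ (2)

where $P_{Ai}$ denotes Persona i of Set A and $P_{B}$ denotes all the personas of Set B. $\mathbb{1}_{P_{Ai} \in P_{B}}$ is the indicator function, which is:

$\mathbb{1}_{P_{Ai} \in P_{B}} = 1$ if $P_{Ai} \in P_{B}$, and $0$ otherwise. (3)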

Note that, to obtain each fraction, we divide by the size of the smaller persona set. For example, when comparing the 5-persona set and the 10-persona set, if there are five matches (the maximum possible), we divide by 5 (not 10). The maximum sum of the three fractions is, therefore, 5/5 + 5/5 + 10/10 = 3, so the maximum CS is 3/3 = 1. Also, note that a demographic group is matched only once. For example, if “Male, 65+, USA” appears once in the 5-persona set and three times in the 15-persona set, we count one match, as the group is represented at least once.
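A minimal Python sketch of the CS computation follows, assuming each persona is reduced to its demographic group (a gender, age, country tuple). The helper names are hypothetical, and the toy sets of 2, 4, and 6 personas stand in for the 5-, 10-, and 15-persona sets to keep the example short. Membership is tested against the set of groups in the larger persona set, so a group that appears several times there still yields a single match.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A persona reduced to its demographic group, e.g., ("M", "65+", "USA").
Group = Tuple[str, str, str]


def pair_fraction(smaller: Sequence[Group], larger: Sequence[Group]) -> float:
    """Share of the smaller set's personas whose group appears in the larger set."""
    larger_groups = set(larger)  # presence only: repeats in the larger set add nothing
    matches = sum(1 for g in smaller if g in larger_groups)
    return matches / len(smaller)  # divide by the size of the smaller set


def consistency_score(persona_sets: List[Sequence[Group]]) -> float:
    """Mean pair fraction over all C(k, 2) pairs of persona sets."""
    ordered = sorted(persona_sets, key=len)  # n_A <= n_B <= n_C
    fractions = [pair_fraction(a, b) for a, b in combinations(ordered, 2)]
    return sum(fractions) / len(fractions)


# Toy groups and persona sets (hypothetical), passed in arbitrary order.
g1, g2, g3 = ("M", "25-34", "US"), ("F", "18-24", "IN"), ("M", "65+", "US")
g4, g5, g6 = ("F", "35-44", "BR"), ("M", "45-54", "FI"), ("F", "13-17", "IE")
set_a = [g1, g2]
set_b = [g1, g2, g3, g4]
set_c = [g1, g3, g5, g6, g1, g3]
print(round(consistency_score([set_c, set_a, set_b]), 3))  # 0.667 = (1.0 + 0.5 + 0.5) / 3
```

Sorting by set size before forming pairs guarantees that each fraction is divided by the smaller set, as the text requires, regardless of the order in which the sets are passed in.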

The above formula shows the special case of the CS metric for our study; the general case of the CS is provided in Appendix A2.

Results

Diversity

The diversity results are shown in Table 3. Results from a two-factor repeated-measures ANOVA show that the algorithms significantly differ by their D values, F(5, 10) = 12.49, p < 0.001. The algorithms with the highest D values tend to be LDA, UMAP, and SE. A post hoc analysis (Welch's t-test) indicates that algorithms in Group 1: LDA, UMAP, and SE generate significantly more unique persona attributes (M = 16) than algorithms in Group 2: CA, PCA, NMF (M = 11), t(13.79) = −2.45, p = 0.028. The observed effect size (d = 1.16) indicates that the magnitude of the difference between the groups is large.

Table 3.

Diversity results

	CA	PCA	NMF	LDA	UMAP	SE
5 Personas
D_age	4	2	3	4	3	4
D_gender	1	2	2	2	2	2
D_country	2	3	2	5	4	4
D_total	7	7	7	11	9	10
10 Personas
D_age	5	3	3	5	6	5
D_gender	2	2	2	2	2	2
D_country	3	6	6	10	9	8
D_total	10	11	11	17	17	15
15 Personas
D_age	5	3	3	6	6	6
D_gender	2	2	2	1	2	2
D_country	5	10	11	15	14	13
D_total	12	15	16	22	22	21

Higher is better. For example, the value of 2 for CA, 15 personas (first column, third row from the bottom) indicates that among the 15 personas, CA generated personas of two different genders. “Total” indicates the sum of unique age groups, genders, and countries. Highest total values of diversity are bolded.

For the five personas set, LDA produces personas with 57.1% more unique demographic attributes than CA, PCA, and NMF. For the 10 personas set, LDA and UMAP produce personas with 70.0% more unique demographic attributes than CA and 54.5% more than PCA and NMF. For the 15 personas set, LDA and UMAP produce personas with 83.3% more unique demographic attributes than CA, 46.7% more than PCA, and 37.5% more than NMF.
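The percentages above can be recomputed from the D_total rows of Table 3. The short Python check below does so, along with group means for the post hoc comparison, assuming those means were taken over the nine algorithm-by-persona-set totals of each group (our reading). Under that assumption, the means come out at exactly 16 for LDA, UMAP, and SE and about 10.67, which rounds to the reported 11, for CA, PCA, and NMF.

```python
# D_total values transcribed from Table 3 (keys: number of personas).
d_total = {
    5: {"CA": 7, "PCA": 7, "NMF": 7, "LDA": 11, "UMAP": 9, "SE": 10},
    10: {"CA": 10, "PCA": 11, "NMF": 11, "LDA": 17, "UMAP": 17, "SE": 15},
    15: {"CA": 12, "PCA": 15, "NMF": 16, "LDA": 22, "UMAP": 22, "SE": 21},
}


def pct_more(d_new: int, d_ref: int) -> float:
    """Percentage by which d_new exceeds d_ref."""
    return 100 * (d_new - d_ref) / d_ref


for n, row in d_total.items():
    best = max(row.values())  # highest D_total for this persona-set size
    for alg in ("CA", "PCA", "NMF"):
        print(f"{n} personas: {best} vs {alg} ({row[alg]}) -> {pct_more(best, row[alg]):.1f}% more")

# Group means over the nine (algorithm, persona-set) totals of each group.
group_high = [d_total[n][a] for n in d_total for a in ("LDA", "UMAP", "SE")]
group_low = [d_total[n][a] for n in d_total for a in ("CA", "PCA", "NMF")]
print(sum(group_high) / len(group_high), round(sum(group_low) / len(group_low), 2))  # 16.0 10.67
```

The printed percentages (57.1%, 70.0%, 54.5%, 83.3%, 46.7%, 37.5%) match those stated in the text.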

In terms of age, two rare age groups are the youngest (13–17) and the oldest (65+) age groups. The age group 13–17 appears in five persona sets (LDA₅, LDA₁₀, UMAP₅, UMAP₁₀, and UMAP₁₅), whereas the age group 65+ only appears in three persona sets (UMAP₁₀, UMAP₁₅, and SE₁₅). An example of a persona from this age group is shown in Figure 5c. Although the age groups of these personas are less common, the countries of the generated personas tend to belong to the Top-10 countries in the baseline data, with the curious exception of Antigua and Barbuda. Interestingly, none of the persona sets contains personas from all age groups, implying that more than 15 personas would be needed to cover all age groups in the data.

FIG. 5.

“Three personas you would otherwise not see.” Among the examples of personas with unique demographics, there is only one persona from Ireland (a), only one from Finland (b), and only one from the age group of 65+ (c). The rarer demographics emerge only with certain algorithms when increasing the number of personas in a set. (a) Conor (SE₁₅), (b) Ville (UMAP₁₅), (c) Sarah (SE₁₅). SE, Spectral Embedding.

All the algorithms generate personas of both genders, except in two cases in which none of the generated personas were female: CA₅ and LDA₁₅ (Table 3).

The countries of the personas reveal an interesting finding: some of the algorithms generate personas that also represent fringe geographies, that is, countries that have a very low proportion of views in the baseline data. LDA, UMAP, and SE account for most of these marginalized personas, examples of which are shown in Figure 5. For example, users from Finland account for only 0.145% of the total view counts; however, UMAP₁₅ generates a Finnish persona (Fig. 5b).

Fairness

Fairness results are shown in Table 4. A two-factor repeated-measures ANOVA shows that the algorithms significantly differ by their SP scores, F(5, 10) = 8.21, p = 0.003. The lowest SP scores (i.e., the highest fairness) tend to be among CA, PCA, and NMF. Similarly, the persona sets significantly differ by their SP scores, F(2, 10) = 19.28, p < 0.001, indicating that the number of personas affects the fairness scores. However, unlike in the case of diversity, the best fairness values are obtained with 10 personas. As with D, the post hoc analysis indicates two distinct groups emerging from the fairness results.

Table 4.

Statistical parity scores

	CA	PCA	NMF	LDA	UMAP	SE
5 Personas
SP_age	0.04	0.11	0.05	0.19	0.12	0.14
SP_gender	0.14	0.06	0.06	0.26	0.26	0.26
SP_country	0.13	0.10	0.13	0.08	0.11	0.08
10 Personas
SP_age	0.03	0.08	0.08	0.13	0.11	0.09
SP_gender	0.04	0.04	0.04	0.06	0.06	0.06
SP_country	0.10	0.06	0.06	0.10	0.07	0.07
15 Personas
SP_age	0.05	0.09	0.10	0.10	0.07	0.09
SP_gender	0.01	0.08	0.08	0.14	0.19	0.12
SP_country	0.07	0.04	0.02	0.08	0.08	0.08

Lower is better (closer to the baseline data). The lowest numbers are bolded. The values were obtained as follows: (a) we converted all negative SP values to their absolute values; (b) we calculated the mean SP for each persona set of each algorithm; and (c) we repeated this for all the attribute values in a demographic category. For example, the age values in the table are averages over all seven age groups, indicating how well the given algorithm represents the age group distribution in the baseline data.

SP, statistical parity.

Group 1 (CA, PCA, and NMF) has significantly smaller SP scores (M = 0.070) than Group 2 (LDA, UMAP, and SE; M = 0.118), t(11.84) = −3.475, p < 0.005. The observed standardized effect size is large (d = 1.64). Note that a smaller SP score indicates higher fairness (the lower the value, the closer the persona attributes are to the baseline data).
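As a rough arithmetic check, the group means can be approximated by averaging the 27 rounded values per group in Table 4. This sketch assumes the means were taken over all algorithm × persona-set × attribute cells, which is our reading rather than something the text states. It yields about 0.070 and 0.119, and the small gap to the reported 0.118 presumably reflects rounding in the table.

```python
from statistics import mean

# SP values transcribed from Table 4; for each algorithm the order is
# 5, 10, 15 personas, each as (age, gender, country).
sp_table = {
    "CA":   [0.04, 0.14, 0.13, 0.03, 0.04, 0.10, 0.05, 0.01, 0.07],
    "PCA":  [0.11, 0.06, 0.10, 0.08, 0.04, 0.06, 0.09, 0.08, 0.04],
    "NMF":  [0.05, 0.06, 0.13, 0.08, 0.04, 0.06, 0.10, 0.08, 0.02],
    "LDA":  [0.19, 0.26, 0.08, 0.13, 0.06, 0.10, 0.10, 0.14, 0.08],
    "UMAP": [0.12, 0.26, 0.11, 0.11, 0.06, 0.07, 0.07, 0.19, 0.08],
    "SE":   [0.14, 0.26, 0.08, 0.09, 0.06, 0.07, 0.09, 0.12, 0.08],
}

# Pool all cells of the three algorithms in each group.
group_1 = [v for alg in ("CA", "PCA", "NMF") for v in sp_table[alg]]
group_2 = [v for alg in ("LDA", "UMAP", "SE") for v in sp_table[alg]]
print(f"Group 1 mean SP: {mean(group_1):.3f}")  # 0.070
print(f"Group 2 mean SP: {mean(group_2):.3f}")  # 0.119
```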

Consistency

The CS results indicate that most of the tested algorithms produce consistent personas according to our definition, that is, the persona demographic groups do not change when the number of generated personas changes. Four algorithms achieve the perfect score of CS = 1.0: CA, PCA, NMF, and SE. The remaining two behave more inconsistently, with UMAP (CS = 0.40) scoring higher than LDA (CS = 0.10). Perfect consistency implies that the personas that were part of the smaller persona sets are also part of the larger persona sets.

The fact that all the algorithms except LDA and UMAP score perfectly on CS implies that the algorithms behave consistently but differently: each algorithm generates the same personas in the 10- and 15-persona sets as in the 5-persona set, but the personas differ by algorithm. Each algorithm tries to identify key patterns, but the algorithms have different definitions of importance. Thus, the generated personas tend to differ (Table 5).

Table 5.

The sets of five personas exemplify how the personas differ, even when only a small number of personas is generated

	Persona 1	Persona 2	Persona 3	Persona 4	Persona 5
CA	M 25–34 United States	M 25–34 India	M 35–44 United States	M 18–24 India	M 45–54 United States
PCA	M 25–34 United States	M 25–34 India	M 18–24 United States	F 25–34 United States	M 25–34 Philippines
NMF	M 25–34 United States	M 25–34 India	M 35–44 United States	M 18–24 United States	F 25–34 United States
LDA	M 35–44 South Africa	M 13–17 India	F 13–17 United States	F 55–64 Malaysia	M 45–54 Serbia
SE	M 45–54 United States	M 55–64 United States	F 25–34 Singapore	M 55–64 Namibia	F 35–44 Grenada
UMAP	M 35–44 Brazil	M 35–44 Cote d'Ivoire	M 25–34 Venezuela	F 25–34 Venezuela	F 13–17 Dominican Republic

Personas appearing at least two times are bolded.

F, female; M, male.

We can further quantify how different the personas are by computing the Jaccard coefficient (J) for each pair of algorithms. J is the intersection over the union of two persona sets A and B generated by two different algorithms, where each persona is defined by its age, gender, and country; it can thus be interpreted as the similarity of the persona attributes. J equals 1 if the sets are identical and 0 if they are completely different. The results in Figure 6 show that the personas output by different algorithms differ substantially, with a clear clustering among CA, PCA, and NMF.
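To make the Jaccard coefficient concrete, the Python sketch below applies it to the five-persona sets of CA, PCA, NMF, and LDA from Table 5, treating each persona as a (gender, age, country) tuple. It only illustrates the formula on those rows and is not claimed to reproduce the values plotted in Figure 6.

```python
from itertools import combinations


def jaccard(a, b) -> float:
    """Intersection over union of two persona sets (1 = identical, 0 = disjoint)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


# Five-persona sets from Table 5 as (gender, age, country); ASCII hyphens in ages.
US = "United States"
persona_sets = {
    "CA": [("M", "25-34", US), ("M", "25-34", "India"), ("M", "35-44", US),
           ("M", "18-24", "India"), ("M", "45-54", US)],
    "PCA": [("M", "25-34", US), ("M", "25-34", "India"), ("M", "18-24", US),
            ("F", "25-34", US), ("M", "25-34", "Philippines")],
    "NMF": [("M", "25-34", US), ("M", "25-34", "India"), ("M", "35-44", US),
            ("M", "18-24", US), ("F", "25-34", US)],
    "LDA": [("M", "35-44", "South Africa"), ("M", "13-17", "India"), ("F", "13-17", US),
            ("F", "55-64", "Malaysia"), ("M", "45-54", "Serbia")],
}

for x, y in combinations(persona_sets, 2):
    print(f"J({x}, {y}) = {jaccard(persona_sets[x], persona_sets[y]):.2f}")
```

On these rows, PCA and NMF overlap the most (J ≈ 0.67, four of six distinct personas shared), CA and NMF give J ≈ 0.43, CA and PCA give J = 0.25, and LDA shares no persona with the other three (J = 0), in line with the clustering of CA, PCA, and NMF described above.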

FIG. 6.

Pairwise Jaccard coefficient values for personas generated from the data. The values indicate the overlap of the personas in terms of age, gender, and country. The circles illustrate the tendency of CA, PCA, and NMF to generate similar personas across the different generations. In contrast, the behavior of LDA, UMAP, and SE is more sporadic. (a) 5 personas, (b) 10 personas, and (c) 15 personas. CA, cluster analysis; LDA, Latent Dirichlet allocation; NMF, non-negative matrix factorization; PCA, principal component analysis.

The inconsistency of UMAP and LDA may partially explain the high D scores of these algorithms. In other words, LDA and UMAP appear inconsistent because they choose novel demographic groups when generating the persona segments. This proposition is supported by the observed strong negative relationship between CS and D ($ρ$ = −0.823), which implies a diversity–consistency trade-off.

This trade-off is defined as follows: if optimizing for diversity, fairness and consistency decrease. Conversely, if optimizing for fairness or consistency, there will be less diversity in the personas. This trade-off implies that diversity, fairness, and consistency (DFC) are conflicting design goals for AP generation, at least when increasing the number of personas. We can tackle this trade-off by averaging each algorithm's ranks on the DFC metrics to obtain a composite rank score for each algorithm. When doing so, SE appears as the most “balanced” algorithm (Table 6), followed by NMF. CA, despite being the most commonly used algorithm in AP research, ranks only fifth of six on this composite comparison.

Table 6.

Unweighted (left-hand side) and weighted (right-hand side) rankings of the algorithms based on average rank by Diversity, Fairness, Consistency criteria

Algorithm	Rank (MRS) unweighted	Algorithm	Rank (MRS) weighted
SE	1 (2.50)	NMF	1 (0.83)
NMF	2 (2.83)	PCA	2.5 (0.90)
PCA	3 (3.17)	SE	2.5 (0.90)
LDA	4 (4.08)	CA	4 (1.37)
CA	5 (4.17)	LDA	5 (1.48)
UMAP	6 (4.25)	UMAP	6 (1.52)

Bold indicates where the “best” algorithm changes when weights are applied. Weights provide a simple technique for persona creators to adjust the diversity, fairness, and consistency criteria according to their design goals.

Depending on the use case, persona developers may want to prioritize certain design goals, such as diversity over fairness (or vice versa). For this, the computations can be extended by introducing three parameters: Diversity Penalty ($α$), Fairness Penalty ($β$), and Consistency Penalty ($γ$). Suppose that, for a particular use case, the persona developers want to increase fairness while maintaining high diversity and consistency of the personas. They consider all metrics important, but fairness three times as important as each of the other two. Accordingly, they set $α$ and $γ$ to 0.20, and $β$ to 0.60, such that

$MRS_{i} = R_{D,i} \times 0.20 + R_{SP,i} \times 0.60 + R_{CS,i} \times 0.20,$ (4)

where the Mean Rank Score (MRS) of algorithm i is calculated as a weighted sum of the ranks R of algorithm i on each metric.
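A minimal Python sketch of Eq. (4), written for arbitrary weights α, β, and γ, follows. The ranks belong to two hypothetical algorithms (`alg_x`, `alg_y`) and are not the ranks behind Table 6; they only show how re-weighting can change which algorithm comes first (a lower MRS is better).

```python
from typing import Dict

Ranks = Dict[str, float]  # metric name -> rank (1 = best)


def mean_rank_score(ranks: Ranks, weights: Dict[str, float]) -> float:
    """Weighted sum of an algorithm's ranks on D, SP, and CS, as in Eq. (4); lower is better."""
    return sum(weights[m] * ranks[m] for m in weights)


equal = {"D": 1 / 3, "SP": 1 / 3, "CS": 1 / 3}        # unweighted mean rank
fairness_first = {"D": 0.20, "SP": 0.60, "CS": 0.20}  # alpha, beta, gamma from the example

# Hypothetical ranks for two made-up algorithms (not the ranks behind Table 6).
ranks = {
    "alg_x": {"D": 1, "SP": 4, "CS": 1},
    "alg_y": {"D": 4, "SP": 1, "CS": 3},
}

for label, weights in (("equal", equal), ("fairness-first", fairness_first)):
    scores = {alg: mean_rank_score(r, weights) for alg, r in ranks.items()}
    ordering = sorted(scores, key=scores.get)  # best (lowest MRS) first
    print(label, [(alg, round(scores[alg], 2)) for alg in ordering])
```

With equal weights, `alg_x` leads (2.0 vs. 2.67); with fairness weighted at 0.60, `alg_y` moves ahead (2.0 vs. 2.8) because of its better SP rank.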

Using these penalty parameters to compute the MRS, the ranking of the algorithms now favors NMF, with SE falling to a shared second position with PCA (Table 6). Note that these parameters are presented as examples only; future work should conduct a proper sensitivity analysis. Nonetheless, treating the DFC design goals as “weights” for the algorithms is intuitive.

Discussion

Research contribution

As far as we know, this is the most extensive study to experiment with different algorithms for persona generation to date. Overall, our results suggest that the tested algorithms can be categorized into two groups: (a) those with low diversity and high fairness (CA, PCA, NMF), and (b) those with high diversity and low fairness (LDA, UMAP, and SE). This relationship is supported by the strong negative correlation (Spearman's $ρ$ = −0.83) between the diversity and fairness rankings of the algorithms. Further, the results indicate that the highest diversity is consistently achieved with 15 personas rather than with 10 or 5 personas.

The same cannot be said for fairness: even though the number of personas has a statistically significant effect on SP, the best average performance is obtained with 10 personas. These findings have several implications. First, concerning AP generation, our findings expand the AP work by Chapman et al,²⁶ Chapman and Milham,²⁷ and Brickey et al.^18,46 Regarding the former, we provide quantitative metrics for persona evaluation²⁷ and evaluate demographic variables rather than coverage or prevalence.²⁶ Our study addresses the lack of standardized metrics for persona evaluation²⁴ by using three relevant metrics to assess persona generation outcomes.

We also confirm the findings that different algorithms tend to “disagree”¹⁸—that is, design different personas from the same baseline data. Although Brickey et al¹⁸ tested two algorithms (CA, PCA), our study considers these two and four additional algorithms. Also, Brickey et al¹⁸ used inter-rater agreement (Cohen's $κ$ ) for evaluation, whereas we used metrics specifically tailored for persona generation goals.

In terms of findings, it is interesting that the three methods that maximize diversity (LDA, UMAP, and SE) are also the three methods most rarely applied in the persona literature (Table 1). No previous study has used SE or UMAP to generate APs. LDA has been used previously,^48,49 but CA, PCA, and NMF are the dominant methods in persona generation. Our findings imply that the research and practice of persona generation (and user segmentation) would benefit from experimentation with novel algorithms, as these novel approaches can result in fairer or more diverse persona sets.

Technically oriented researchers often see the proclaimed objectivity of algorithms as an advantage over manual persona generation and user segmentation.^5,6,15,36 However, our findings suggest that algorithmically created segments may be no more diverse, fair, or consistent than those created entirely by humans through manual means. Instead, APs can also be subject to demographic biases that may originate from multiple sources, such as data distributions, the way algorithms process the data (in the mathematical sense), or the assigned hyperparameters.⁶³

To this end, the use of the DFC metrics shifts persona evaluation away from traditional technical metrics (e.g., perplexity, accuracy, loss, error) toward evaluating what kind of personas a given algorithm design produces. The metrics we use support the design goals of diverse, fair, and consistent personas, taking a step toward ethically robust APs and user segments that portray the diversity of the user base accurately.²⁸ Connecting persona generation and user segmentation to algorithmic fairness is an important contribution that should be further expanded upon in computational studies of customer segmentation using big data.

Hence, the results suggest a need for discussion of algorithmic bias in the customer segmentation literature. As we suggest, these concerns can be addressed, and awareness of them raised, by leveraging new metrics inspired by the ongoing fairness/bias discussion in the computing community.

Concerning the real-world impact of our findings, a crucial observation is the large impact that the choice of one algorithm over another has on the composition of the generated personas or user segments. Given that customer segmentation permeates almost every organization on the planet, there is a crucial need for awareness of how a simple change of algorithm can drastically alter the outcomes obtained from the same customer data. Because firms and decision makers do not seek to offer “everything for everyone” but instead tailor their offerings to segments, whether personified segments (i.e., personas) or other types of segments, the impact of the chosen algorithm appears to be not trivial but drastic.

This observation puts pressure on organizations from two sides: (a) which persona generation or customer segmentation algorithm should they choose for a given situation? (b) Given the obscure behavior of algorithms for persona generation and customer segmentation, is their use dangerous and potentially misleading, and should new segmentation techniques be developed from scratch? Because of these fundamental questions, we expect this study to be not an isolated instance but part of a larger research agenda on improving AP generation and customer segmentation. Although fairness frameworks such as Fairness, Accountability, and Transparency; Findable, Accessible, Interoperable, and Reusable;⁶⁷ Fairness, Accountability, Transparency, Ethics, Safety and Security;⁶⁸ and Equity, Accountability, Trust, and Explainability⁶⁹ have been developed by the research community and industry actors to scrutinize the use of algorithmic decision making in many fields, the application of these frameworks or concepts to APs (or user segmentation in general) is lacking.

Hence, our study makes an important contribution to investigating fairness in the context of algorithmic user segmentation. Focusing on what type of personas are created is essential, because AP studies often assume that the use of algorithms and quantitative data prevents persona developers from injecting their biased interpretations into the created personas.¹⁵ However, if algorithms produce unfair, inaccurate, or inconsistent personas, this would present a major issue for the ethics of persona generation and other types of algorithmic user segmentation.

Algorithms as conveyors of partial truths about the user base

Our results imply that the choice of an algorithm has a fundamental impact on the personas generated. This is an important discovery since APs tend to have an air of objectivity, credibility, and truthfulness in the eyes of stakeholders.⁷⁰ Our findings imply that attributing these properties to data-driven user segmentation might not be justified at all times. More precisely, it seems impossible to argue that any of the applied algorithms captures the “truth” about the users. Instead, each algorithm focuses on certain facets of the user population.

Moreover, the mathematical complexity of the algorithms typically makes it intractable to understand why a specific trait was chosen over another. This intractability concerns personas and all user segmentation efforts carried out using algorithms. The use of algorithms is always “biased” in the sense that different algorithms produce different outputs. Nevertheless, the use of algorithms is “objective” in the sense that, given the same data and the same parameters (including, for stochastic algorithms, the same random seed), an algorithm produces the same set of personas. Therefore, it is crucial to disentangle the concepts of truth and objectivity, as they do not refer to the same thing.

Despite this, researchers can define design goals and desiderata for APs, perhaps all the more so because there is no one perfect method for persona generation. In the absence of such a method, the focus should be on what kind of personas different algorithms design: are they diverse, are they fair, are they consistent?

Obtaining different results from the same data is a conundrum for the application of data science methods to persona generation and user segmentation in general. It not only stresses the “design power” that the algorithms have but also raises a more fundamental, perhaps unanswerable question: which algorithm correctly portrays the users? This question, associated with epistemological standpoints such as the truthfulness and objectivity of algorithms when creating data-driven user segments, can be traced back to the discussion on the (im)possibility of scientific verifiability and falsifiability of personas and data-driven user segments.²⁷

Within the scope of this study, we are unable to provide definite answers in this regard. However, we express the concern that a careless use of quantitative methods, coupled with stakeholders' overconfidence in algorithmic superiority due to the mystique surrounding quantitative data and mathematical formulas,⁷⁰ can result in a disservice. In turn, broader awareness that there is not necessarily “one truth” about the user segments is likely to increase confusion among end-users of personas about who the “real” personas are. Perhaps what is needed is to shift the argument for AP generation from a single objectivity to relative subjectivity: here is what an algorithm has to say about your users, but it is not the whole truth.

Practical implications

If all algorithms generate different personas, which one should a decision maker choose? An answer arises, on the one hand, from the preferences and context for which the personas are developed (i.e., the purpose) and, on the other hand, from an intimate understanding of how different algorithms behave when exposed to specific data (i.e., the know-how). Persona developers may choose to maximize DFC to generate persona sets that are the most applicable to their use case. Our results show that different target metrics favor different algorithms, so the choice of algorithm depends on the goal of persona generation. In particular:

To optimize for diversity, use LDA, UMAP, or SE.

To optimize for fairness, use CA or NMF.

To optimize for consistency, use CA, PCA, NMF, or SE.

When accounting for all three criteria, use SE or NMF.

Our results show that different algorithms design different personas from the same user data. Thus, the practitioner's choice of algorithm ultimately results in different personas. This implies that the choice of the algorithm should not be taken lightly. More precisely, the practitioner faces two important choices: (a) the choice of the algorithm for persona generation, and (b) the choice of the hyperparameters for the selected algorithm. The ethical implication is that rather than hiding these choices under the parlance of “statistical,” “objective,” and “data,” transparency and discussion of the pros and cons of these choices should be undertaken by the wielders of the algorithms.

If one cannot explain it, one probably should not be using it. Moreover, an important guideline is to consider the goal of the customer segmentation or persona generation exercise in the first place. For example, if one seeks as varied an understanding of the user population as possible, then an algorithm that maximizes diversity would be beneficial. If, instead, one seeks a focused understanding of the most engaged segments, a fairness-oriented algorithm would be applicable. The paradigm of “here is our data; algorithm, please show our segments” needs to be revised to “here is our data AND our goal; algorithm, please show our segments.”

Implications for segmentation researchers

Concerning replicability and applicability, the fact that the dominant online platforms tend to output a data structure that is compatible with persona generation means that, by using this data and publicly available data science algorithms, anyone with access to data and necessary programming skills can generate personas. Thus, the practical implications of this study range across many industries and contexts, like the method of personas itself. For example, from YouTube Analytics, one can collect videos and their view counts; from Google Analytics, pages and their session counts; from Facebook Ads, the ads and their interaction metrics (views, clicks, purchases).
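As a sketch of the kind of data structure meant here, the snippet below aggregates exported rows into a demographic-group × content matrix of view counts and reports the share of empty cells. The row layout (content ID, age, gender, country, views), the example values, and the choice of demographic groups as rows and content items as columns are assumptions for illustration; actual exports from YouTube Analytics, Google Analytics, or Facebook Ads use their own schemas.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical export row: (content_id, age, gender, country, views).
Row = Tuple[str, str, str, str, int]
GroupKey = Tuple[str, str, str]  # (age, gender, country)


def interaction_matrix(rows: Iterable[Row]) -> Dict[GroupKey, Dict[str, int]]:
    """Sum view counts per demographic group (rows) and content item (columns)."""
    matrix: Dict[GroupKey, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for content_id, age, gender, country, views in rows:
        matrix[(age, gender, country)][content_id] += views
    return matrix


def to_dense(matrix: Dict[GroupKey, Dict[str, int]]) -> Tuple[List[GroupKey], List[str], List[List[int]]]:
    """Dense group x content table; missing combinations become 0."""
    groups = sorted(matrix)
    contents = sorted({c for counts in matrix.values() for c in counts})
    dense = [[matrix[g].get(c, 0) for c in contents] for g in groups]
    return groups, contents, dense


rows = [
    ("v1", "25-34", "M", "United States", 120),
    ("v2", "25-34", "M", "United States", 30),
    ("v1", "18-24", "F", "India", 45),
    ("v1", "25-34", "M", "United States", 10),
]
groups, contents, dense = to_dense(interaction_matrix(rows))
zero_share = sum(v == 0 for r in dense for v in r) / (len(groups) * len(contents))
print(contents, dense, zero_share)  # ['v1', 'v2'] [[45, 0], [130, 30]] 0.25
```

The share of zero cells is one simple way to express the sparsity of such a matrix; real exports with many groups and content items are typically much sparser than this toy example.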

To advance the use of data-driven personas, we share our source code and data (with content IDs masked and view counts randomized to protect business-sensitive information). Researchers and practitioners can obtain these resources via the code repository.^†† Sharing data, algorithms, and code is crucial for progress in user segmentation research and practice,²⁴ and we hope that this research helps set an example of making computational materials and resources for persona generation and customer segmentation available to both research and practitioner communities.

Limitations and future research directions

There are several directions to pursue from our findings.

First, the properties of the algorithms most likely explain some of the results. For example, some algorithms may be more sensitive to imbalanced data. Although Table 2 provides descriptive information on the tested algorithms, we did not test how these properties affect the results, primarily because our parsimonious experimental setting focused on observing the effects of different algorithms on the demographic composition of the generated personas. More work is required to explain why the algorithms behave differently, but such explanations are beyond the scope of this work and are left for future research.

Second, future research should also be directed toward a systematic understanding of how dataset properties affect the results. These properties may include (a) the prevalence of different user demographics (i.e., the number of rows), (b) the distribution of engagement across those user attributes, (c) the volume of content (e.g., small organizations vs. big content producers), and (d) the sparsity of the interaction matrix. The more datasets one analyzes, provided they vary along these properties, the better one would understand the relationship between data properties and the personas generated by the algorithms.
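A minimal sketch of how some of these properties could be measured on a segments-by-content view matrix follows. The particular statistics (row and column counts, each segment's share of total views as a proxy for the engagement distribution, and the fraction of zero cells as sparsity) are illustrative assumptions, not metrics defined in this study.

```python
import numpy as np


def dataset_properties(matrix):
    """Summarize a segments-by-content view matrix.

    The chosen statistics are illustrative assumptions:
    - n_segments: number of demographic rows (prevalence of demographics)
    - n_contents: number of content columns (volume of content)
    - max_segment_share: largest single segment's share of all views
    - sparsity: fraction of segment-content cells with zero views
    """
    matrix = np.asarray(matrix, dtype=float)
    n_segments, n_contents = matrix.shape
    segment_share = matrix.sum(axis=1) / matrix.sum()
    sparsity = float(np.mean(matrix == 0))
    return {
        "n_segments": n_segments,
        "n_contents": n_contents,
        "max_segment_share": float(segment_share.max()),
        "sparsity": sparsity,
    }


# Small hypothetical matrix: 3 segments x 3 videos.
example = np.array([[120, 30, 0],
                    [80, 0, 10],
                    [0, 50, 0]])
print(dataset_properties(example))
```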

Third, another interesting question is whether personas generated by different algorithms would be more or less similar under a different parametrization. In this study, we kept the hyperparameters (mainly the number of personas generated) identical across algorithms to control for the effect of the number of personas on the results, but future work could investigate how manipulating the algorithms' hyperparameters affects persona generation. We limited the analysis to three persona sets because the main focus of the study was on the algorithms rather than on the number of personas.

We chose the set sizes (5, 10, and 15 personas) because previous research tends to favor a relatively low number of personas per set. Nevertheless, both of these parameters could be altered by (a) comparing more sets and (b) increasing the number of personas beyond 15. Such extended analyses would help clarify the effect of the number-of-personas hyperparameter on the APs.
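To illustrate how the number of personas enters as a hyperparameter, the sketch below factorizes a synthetic, randomly generated view matrix with non-negative matrix factorization (NMF) using the multiplicative update rules of Lee and Seung⁵⁹ and repeats the run for 5, 10, and 15 latent factors. Summarizing each factor by its highest-loading demographic segment is an assumed, simplified persona step (the function `persona_skeletons` is hypothetical); it is not necessarily the pipeline used in this study.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-10):
    """Factor a nonnegative matrix V (segments x contents) as W @ H with the
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        # eps avoids division by zero; updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def persona_skeletons(V, k, seed=0):
    """Hypothetical persona step: for each latent factor, return the row index
    of the demographic segment with the largest loading in W."""
    W, _ = nmf_multiplicative(V, k, seed=seed)
    return [int(np.argmax(W[:, f])) for f in range(k)]


# Synthetic view matrix (not real data): 40 segments x 200 videos.
rng = np.random.default_rng(42)
V = rng.poisson(3.0, size=(40, 200)).astype(float)

for k in (5, 10, 15):  # the persona set sizes discussed in the text
    chosen = persona_skeletons(V, k)
    print(k, "factors ->", len(set(chosen)), "distinct segments")
```

Because several factors can share the same top-loading segment, the number of distinct segments may be smaller than k, which is one way the number-of-personas hyperparameter can change the demographic composition of a persona set.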

Fourth, there is a grave need for explainability and interpretation of unsupervised algorithms such as those deployed in our study; more generally, this is a challenge for the whole machine learning domain (see, e.g., Hasani et al.⁷¹). As our findings show, the outputs of most algorithms are hard to predict, so their “thought process” in choosing a specific set of demographic segments should be scrutinized in dedicated studies.

Related to explainability and to the “disagreement” among the algorithms about which segments to highlight, there is a lingering question about designing entirely new data-driven persona and customer segmentation algorithms. Here, we believe that combining interactive and intelligent system functionalities with computational techniques such as top-N picking and outlier detection can yield simpler, more meaningful, and more interpretable results for stakeholders than the currently used black-box algorithms.
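As a rough illustration of how simple and transparent such techniques can be, the sketch below ranks hypothetical demographic segments by total views (top-N picking) and flags segments whose views lie far above the mean, using a z-score threshold as one assumed form of outlier detection. The segment names, view counts, and the 1.5 threshold in the example are invented for illustration.

```python
from statistics import mean, pstdev


def top_n_segments(segment_views, n):
    """Top-N picking: the n segments with the most views (ties broken by name)."""
    ranked = sorted(segment_views.items(), key=lambda item: (-item[1], item[0]))
    return [segment for segment, _ in ranked[:n]]


def outlier_segments(segment_views, z=2.0):
    """Flag segments whose views exceed the mean by more than z population
    standard deviations (a simple z-score rule, assumed for illustration)."""
    values = list(segment_views.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return sorted(s for s, v in segment_views.items() if (v - mu) / sigma > z)


# Hypothetical total views per demographic segment.
segment_views = {
    "18-24 male US": 150,
    "25-34 female US": 90,
    "35-44 female IN": 50,
    "25-34 male GB": 90,
}
print(top_n_segments(segment_views, 3))
# ['18-24 male US', '25-34 female US', '25-34 male GB']
print(outlier_segments(segment_views, z=1.5))
# ['18-24 male US']
```

Every selected segment can be traced back to a single ranking or threshold, which is the kind of interpretability the black-box algorithms lack.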

Fifth, algorithms beyond the ones we tested could be explored. We chose the specific algorithms based on how commonly they are used in persona generation and customer segmentation, but more advanced or differently designed algorithms from the current body of computer science could deliver complementary results. The chosen algorithms also have derivative versions, such as constrained NMF,⁷² that could be explored in future studies.

Finally, the evaluation of personas and customer segments is an ongoing research area with room for contribution.^27,73 Although we propose metrics to quantify “good personas” according to certain design goals, more quantitative metrics for persona evaluation could be devised, which remains an important goal for future research. User segmentation research could also investigate ways to incorporate the metrics directly into the algorithms' objective functions rather than relying on post hoc analysis of the personas. Such algorithms could let creators specify their diversity, fairness, and consistency (DFC) targets before persona or segment generation.

Conclusion

Persona generation via algorithms is widely considered objective, in contrast to manual persona generation, but it is largely overlooked that different algorithms actually generate very different personas. Our results indicate two groups of algorithms with very different outcomes for persona generation: those that generate personas with low diversity/high fairness and those that generate personas with high diversity/low fairness. Most algorithms produce consistent results regardless of the number of personas.
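The opposition between the two groups can be quantified with a rank correlation, such as the Spearman coefficient reported in the abstract. A minimal sketch, assuming no tied scores, uses ρ = 1 − 6Σd²/(n(n² − 1)), where d is the rank difference per algorithm; the algorithm labels A to F and their diversity and fairness scores are hypothetical and are not results of this study.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); assumes all values are distinct."""
    if len(set(values)) != len(values):
        raise ValueError("this simple formula assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical scores for six unnamed algorithms A-F (illustrative only).
diversity = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
fairness = [0.2, 0.3, 0.1, 0.8, 0.9, 0.7]
rho = spearman_rho(diversity, fairness)
print(round(rho, 3))  # -0.657
```

A strongly negative ρ means that algorithms ranking high on diversity tend to rank low on fairness, which is the pattern described above.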

Persona developers should take care when selecting an algorithm for persona generation (or user segmentation in general), as the choice of algorithm affects the DFC of the personas. The fact that the algorithms create different personas from the same user data implies that algorithms have more influence in the user segmentation process than is commonly understood.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

No funding was received for this article.

References

1. Cooper. The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How to Restore the Sanity (2nd ed.). Pearson Higher Education: Indianapolis, IN, USA; 2004.

2. Nielsen. Personas—User Focused Design (2nd ed.). Springer: New York, NY, USA; 2019.

3. Nielsen, Nielsen, Stage, et al. Going global with personas. In: Proceedings of the INTERACT 2013 conference. Springer: Berlin, Heidelberg, Cape Town, South Africa; 2013; pp. 350–357.

4. Jenkinson. Beyond segmentation. J Target Meas Anal Mark, 1994; 3(1):60–72.

5. Salminen, Jansen, An, et al. Are personas done? Evaluating their usefulness in the age of digital analytics. Persona Stud, 2018; 4(2):47–65; doi: 10.21153/psj2018vol4no2art737.

6. McGinn, Kotamraju. Data-driven persona development. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM: Florence, Italy; 2008; pp. 1521–1524; doi: 10.1145/1357054.1357292.

7. Aoyama. Persona-and-scenario based requirements engineering for software embedded in digital consumer products. In: Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE'05). Washington, DC, USA; 2005; pp. 85–94; doi: 10.1109/RE.2005.50.

8. Aoyama. Persona-scenario-goal methodology for user-centered requirements engineering. In: Proceedings of the 15th IEEE International Requirements Engineering Conference (RE 2007). Delhi, India; 2007; pp. 185–194; doi: 10.1109/RE.2007.50.

9. Clarke MF. The work of mad men that makes the methods of math men work: Practically occasioned segment design. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM: Seoul, Republic of Korea; 2015; pp. 3275–3284.

10. Gonzalez De Heredia, Goodman-Deane, Waller, et al. Personas for policy-making and healthcare design. In: Proceedings of International Design Conference DESIGN. 2018; vol. 6; pp. 2645–2656.

11. LeRouge, Ma, Sneha, et al. User profiles and personas in the design and development of consumer health technologies. Int J Med Inform, 2013; 82(11):e251–e268.

12. Alaqra, Wästlund. Reciprocities or incentives? Understanding privacy intrusion perspectives and sharing behaviors. In: HCI for Cybersecurity, Privacy and Trust: Lecture Notes in Computer Science. (Moallem, ed.) Springer International Publishing: Cham; 2019; vol. 11594; pp. 355–370.

13. Holmgard, Green, Liapis, et al. Automated playtesting with procedural personas with evolved heuristics. IEEE Trans Games, 2018; 99:1; doi: 10.1109/TG.2018.2808198.

14. Salminen, Vahlo, Koponen, et al. Designing prototype player personas from a game preference survey. In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI'20). Association for Computing Machinery: Honolulu, HI, USA; 2020; pp. 1–8.

15. Mijač, Jadrić, Ćukušić. The potential and issues in data-driven development of web personas. In: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). 2018; pp. 1237–1242.

16. Cichocki, Zdunek, Phan, et al. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons; 2009.

17. Jung S-G, Salminen, Jansen. Personas changing over time: Analyzing variations of data-driven personas during a two-year period. In: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems (CHI EA'19). ACM: Glasgow, UK; 2019; pp. LBW2714:1–LBW2714:6.

18. Brickey, Walczak, Burgess. Comparing semi-automated clustering methods for persona development. IEEE Trans Softw Eng, 2012; 38(3):537–546; doi: 10.1109/TSE.2011.60.

19. Guo, Binte Razikin. Anthropological user research: A data-driven approach to personas development. In: Proceedings of the Annual Meeting of the Australian Special Interest Group for Computer Human Interaction (OzCHI'15). ACM: New York, NY, USA; 2015; pp. 417–421.

20. Hirskyj-Douglas, Read, Horton. Animal personas: Representing dog stakeholders in interaction design. In: Proceedings of the 31st British Computer Society Human Computer Interaction Conference (HCI'17). BCS Learning & Development Ltd.: Swindon, UK; 2017; pp. 37:1–37:13.

21. Minichiello, Hood, Harkness. Bringing user experience design to bear on STEM education: A narrative literature review. J STEM Educ Res, 2018; 1(1–2):7–33.

22. Watanabe, Washizaki, Honda, et al. ID3P: Iterative data-driven development of persona based on quantitative evaluation and revision. In: Proceedings of the 10th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE'17). IEEE Press: Piscataway, NJ, USA; 2017; pp. 49–55.

23. Zhu, Wang, Carroll. Creating persona skeletons from imbalanced datasets—A case study using U.S. Older Adults' Health Data. In: Proceedings of the 2019 on Designing Interactive Systems Conference—DIS'19. ACM Press: San Diego, CA, USA; 2019; pp. 61–70.

24. Salminen, Guan, Jung S-G, et al. A literature review of quantitative persona creation. In: Proceedings of the ACM Conference of Human Factors in Computing Systems (CHI'20). ACM: Honolulu, HI, USA; 2020.

25. Goodman-Deane, Waller, Demin, et al. Evaluating inclusivity using quantitative personas. In: Design as a Catalyst for Change—DRS International Conference 2018. (Storni C, Leahy K, McMahon M, et al. eds.) 25–28 June, 2018, Limerick, Ireland.

26. Chapman, Love, Milham, et al. Quantitative evaluation of personas as information. In: Human Factors and Ergonomics Society 52nd Annual Meeting. 2008; pp. 1107–1111.

27. Chapman, Milham. The Personas' New Clothes: Methodological and practical arguments against a popular method. In: Human Factors and Ergonomics Society Annual Meeting. 2006; vol. 50; pp. 634–636.

28. Salminen, Froneman, Jung S-G, et al. The ethics of data-driven personas. In: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems (CHI'20). Association for Computing Machinery: Honolulu, HI, USA; 2020; pp. 1–9.

29. Turner, Turner. Is stereotyping inevitable when designing with personas? Design Stud, 2011; 32(1):30–44.

30. Kuhn TS. The Structure of Scientific Revolutions. University of Chicago Press: Chicago, IL; 1970.

31. Drosou, Jagadish, Pitoura, et al. Diversity in big data: A review. Big Data, 2017; 5(2):73–84; doi: 10.1089/big.2016.0054.

32. Radjenović, Heričko, Torkar, et al. Software fault prediction metrics: A systematic literature review. Inform Softw Technol, 2013; 55(8):1397–1418.

33. Tanenbaum, Adams, Iturralde, et al. From wary wearers to d-embracers: personas of readiness to use diabetes devices. J Diabetes Sci Technol, 2018; 12(6):1101–1107; doi: 10.1177/1932296818793756.

34. Wang, Li, Cai, et al. Analysis of Regional Group Health Persona Based on Image Recognition. In: 2018 Sixth International Conference on Enterprise Systems (ES). 2018; pp. 166–171.

35. Vosbergen, Mulder-Wiggers JMR, Lacroix, et al. Using personas to tailor educational messages to the preferences of coronary heart disease patients. J Biomed Inform, 2015; 53:100–112; doi: 10.1016/j.jbi.2014.09.004.

36. Zhang, Brown H-F, Shankar. Data-driven personas: Constructing archetypal users with clickstreams and user telemetry. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI'16). ACM: San Jose, CA, USA; 2016; pp. 5350–5359.

37. , Kwak, Jansen. Towards automatic persona generation using social media. In: Proceedings of Third International Symposium on Social Networks Analysis, Management and Security (SNAMS 2016), The 4th International Conference on Future Internet of Things and Cloud. IEEE: Vienna, Austria; 2016.

38. , Kwak, Jansen. Validating social media data for automatic persona generation. In: Proceedings of Second International Workshop on Online Social Networks Technologies (OSNT-2016), 13th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA). IEEE: Agadir, Morocco; 2016.

39. Kwak, An, Jansen. Automatic generation of personas using YouTube social media data. In: Proceedings of the Hawaii International Conference on System Sciences (HICSS-50). Waikoloa, HI, USA; 2017; pp. 833–842.

40. Miaskiewicz, Sumner, Kozar. A latent semantic analysis methodology for the identification and creation of personas. In: Proceeding of the Twenty-Sixth Annual CHI Conference on Human Factors in Computing Systems—CHI'08. ACM Press: Florence, Italy; 2008; p. 1501; doi: 10.1145/1357054.1357290.

41. Mesgari, Okoli, de Guinea. Affordance-based user personas: A mixed-method approach to persona development. In: AMCIS 2015 Proceedings, Puerto Rico, August 13–15, 2015. Available from: https://aisel.aisnet.org/amcis2015/HCI/GeneralPresentations/1 (last accessed June 1, 2021).

42. Holden, Kulanthaivel, Purkayastha, et al. Know thy eHealth user: Development of biopsychosocial personas from a study of older adults with heart failure. Int J Med Inform, 2017; 108:158–167; doi: 10.1016/j.ijmedinf.2017.10.006.

43. Sinha. Persona development for information-rich domains. In: CHI '03 Extended Abstracts on Human Factors in Computing Systems. 2003; pp. 830–831; doi: 10.1145/765891.766017.

44. Dang-Pham, Pittayachawan, Nkhoma. Demystifying online personas of Vietnamese young adults on Facebook: A Q-methodology approach. Austral J Inform Syst, 2015; 19:1204; doi: 10.3127/ajis.v19i0.1204.

45. , Dong, Rau, et al. Using cluster analysis in Persona development. In: 2010 8th International Conference on Supply Chain Management and Information. 2010; pp. 1–5.

46. Brickey, Walczak, Burgess. A comparative analysis of persona clustering methods. In: AMCIS 2010 Proceedings (Paper 217). 2010.

47. Dupree, Devries, Berry, et al. Privacy personas: Clustering users via attitudes and behaviors toward security practices. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI'16). ACM: New York, NY, USA; 2016; pp. 5228–5239.

48. , Kwak, Jung, et al. Customer segmentation using online platforms: Isolating behavioral and demographic segments for persona creation via aggregated user data. Soc Netw Analysis Mining, 2018; 8:1; doi: 10.1007/s13278-018-0531-0.

49. , Kwak, Salminen, et al. Imaginary people representing real numbers: Generating personas from online social media data. ACM Trans Web (TWEB), 2018; 12:3.

50. , Kwak, Jansen. Personas for content creators via decomposed aggregate audience statistics. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2017; pp. 632–635; doi: 10.1145/3110025.3110072.

51. Jung S-G, Salminen, An, et al. Automatically Conceptualizing Social Media Analytics Data via Personas. In: Proceedings of the International AAAI Conference on Web and Social Media. San Francisco, CA, USA; 2018.

52. Jung S-G, Salminen, Kwak, et al. Automatic persona generation (APG): A rationale and demonstration. In: Proceedings of the 2018 Conference on Human Information Interaction and Retrieval. ACM: New Brunswick, NJ, USA; 2018; pp. 321–324.

53. Salminen, Şengün, Kwak, et al. Generating cultural personas from social data: A perspective of middle eastern users. In: Proceedings of The Fourth International Symposium on Social Networks Analysis, Management and Security (SNAMS-2017). IEEE: Prague, Czech Republic; 2017.

54. Salminen, Şengün, Kwak, et al. From 2,772 segments to five personas: Summarizing a diverse online audience by generating culturally adapted personas. First Monday, 2018; 23(6); doi: 10.5210/fm.v23i6.8415.

55. Dhakad, Das, Bhattacharyya, et al. SOPER: Discovering the influence of fashion and the many faces of user from session logs using stick breaking process. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management—CIKM'17. ACM Press: Singapore, Singapore; 2017; pp. 1609–1618.

56. Smith, Nayar. Mining controller inputs to understand gameplay. In: Proceedings of the 29th Annual Symposium on User Interface Software and Technology (UIST'16). Association for Computing Machinery: Tokyo, Japan; 2016; pp. 157–168.

57. Pruitt, Grudin. Personas: Practice and theory (DUX'03). ACM: San Francisco, CA, USA; 2003; pp. 1–15.

58. Nielsen, Storgaard Hansen, Stage, et al. A template for design personas: Analysis of 47 Persona Descriptions from Danish Industries and Organizations. Int J Sociotechnol Knowl Dev, 2015; 7(1):45–61; doi: 10.4018/ijskd.2015010104.

59. Lee, Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999; 401(6755):788–791.

60. Wöckl, Yildizoglu, Buber, et al. Basic senior personas: A representative design tool covering the Spectrum of European Older Adults. In: Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'12). ACM: New York, NY, USA; 2012; pp. 25–32.

61. Stevenson, Mattson. The personification of big data. Proc Design Soc Int Conf Eng Design, 2019; 1(1):4019–4028; doi: 10.1017/dsi.2019.409.

62. Dwork, Hardt, Pitassi, et al. Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. Cambridge, MA, USA; 2012; pp. 214–226.

63. Hajian, Bonchi, Castillo. Algorithmic bias: From discrimination discovery to fairness-aware data mining. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM: New York, NY; 2016; pp. 2125–2126.

64. Kleinberg, Ludwig, Mullainathan, et al. Algorithmic fairness. AEA Papers Proc, 2018; 108:22–27; doi: 10.1257/pandp.20181018.

65. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 2017; 5(2):153–163.

66. Matthews, Judge, Whittaker. How do designers and user experience professionals actually perceive and use personas? In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI'12). ACM: New York, NY, USA; 2012; pp. 1219–1228; doi: 10.1145/2207676.2208573.

67. Wilkinson, Dumontier, Aalbersberg, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scient Data, 2016; 3(1):160018; doi: 10.1038/sdata.2016.18.

68. Wing. Data for Good: FATES Elaborated. 2020. Available from: https://datascience.columbia.edu/FATES-Elaborated (last accessed June 1, 2021).

69. College of Information and Computer Sciences. EQUATE. 2020. Available from: https://groups.cs.umass.edu/equate/ (last accessed June 1, 2021).

70. Siegel DA. The mystique of numbers: belief in quantitative approaches to segmentation and persona development. In: CHI'10 Extended Abstracts on Human Factors in Computing Systems (CHI EA'10). ACM: New York, NY, USA; 2010; pp. 4721–4732.

71. Hasani, Thirumuruganathan, Koudas, et al. Shahin: Faster algorithms for generating explanations for multiple predictions. In: Proceedings of the 2021 International Conference on Management of Data (SIGMOD/PODS'21). Association for Computing Machinery: New York, NY, USA; 2021; pp. 2235–2243.

72. Cai, Liu, Xiao, et al. Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization. Inform Sci, 2020; 536:171–184; doi: 10.1016/j.ins.2020.05.073.

73. Salminen, Santos, Kwak, et al. Persona perception scale: Development and exploratory validation of an instrument for evaluating individuals' perceptions of personas. Int J Hum Comput Stud, 2020; 2020:102437; doi: 10.1016/j.ijhcs.2020.102437.

74. Saxena, Prasad, Gupta, et al. A review of clustering techniques and developments. Neurocomputing, 2017; 267:664–681; doi: 10.1016/j.neucom.2017.06.053.

75. Wold, Esbensen, Geladi. Principal component analysis. Chemometr Intell Lab Syst, 1987; 2(1):37–52; doi: 10.1016/0169-7439(87)80084-9.

76. Paatero, Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 1994; 5(2):111–126; doi: 10.1002/env.3170050203.

77. Yoshida. Learning and utilizing a pool of features in non-negative matrix factorization. In: Active Media Technology (Lecture Notes in Computer Science). (Yoshida T, Kou G, Skowron A, et al. eds.) Springer International Publishing: Cham; 2003; pp. 96–105.

78. Xue J-H, Titterington. Do unbalanced data have a negative effect on LDA? Pattern Recogn, 2008; 41:1558–1571; doi: 10.1016/j.patcog.2007.11.008.

79. Luo, Wilson, Hancock. Spectral embedding of graphs. Pattern Recogn, 2003; 36(10):2213–2230; doi: 10.1016/S0031-3203(03)00084-0.

80. Qian, Saligrama. Spectral clustering with unbalanced data. 2013; arXiv:1302.5134 [stat].

81. McInnes, Healy, Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. 2018; arXiv:1802.03426 [cs, stat].

82. van der Maaten, Hinton. Visualizing Data using t-SNE. J Mach Learn Res, 2008; 9:2579–2605.

83. Bamman, O'Connor, Smith. Learning latent personas of film characters. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia, Bulgaria; 2013; p. 10.

84. Kross, Guo. Students, systems, and interactions: Synthesizing the first four years of learning@scale and charting the future. In: Proceedings of the Fifth Annual ACM Conference on Learning at Scale (L@S'18). Association for Computing Machinery: London, United Kingdom; 2018; pp. 1–10.

85. Avramova, Wittevrongel, Bruneel, et al. Analysis and modeling of video popularity evolution in various online video content systems: Power-law versus exponential decay. In: 2009 First International Conference on Evolving Internet. IEEE; 2009; pp. 95–100.