Abstract
One limitation of the naturalistic observation method is that little is known about how accurately observers can judge personal relationships in real-life settings. To assess this judgment accuracy, we observed 285 dyads of individuals in public places and then asked them whether they were affiliated or strangers. We found that human observers were very accurate in judging people’s actual personal relationships. Moreover, several nonverbal cues, including direct interaction and age similarity, were identified as correlates of affiliation. We conclude that researchers may accurately judge personal relationships from nonverbal observational data and recommend that such judgments be utilized as a methodological tool in naturalistic observational studies.
Introduction
Naturalistic observation is one of the fundamental ways by which social scientists study real-life human behavior—yet it is also a surprisingly underutilized method (Nippert-Eng 2015; Reiss 1992). This is linked to the belief that mere observation offers “thin” information about the social content of social action (e.g., its meaning, motivation, and context) (Geertz 1973; Takyi 2015). In turn, this appears to be one reason why self-reporting is the default method in much social science (Baumeister et al. 2007; Potter and Hepburn 2007). On the other hand, it should be appreciated that more social content may be extracted from observational data than is typically assumed (Collins 1983; Nassauer and Legewie 2018; Reiss 1971). Erving Goffman (1971) was an early advocate of this view and suggested that face-to-face behavior should be studied with methods of “interaction ethology,” adapting techniques of naturalistic observation found within animal ethology. He identified, among other things, the nonverbal “tie-signs” that people use to signal the existence of their personal relationships (e.g., by holding hands or with body alignments).
On the whole, these interaction–ethological insights have had a limited impact within Goffman’s field of micro-sociology and related fields, in which the concept of tie-signs is rarely utilized in naturalistic observational studies (but see Afifi and Johnson 1999; Fine et al. 1984; Petronio and Bourhis 1987). Thus, we have the odd fact that “probably more is known about interactions between chimpanzees than interactions between humans” (Martin 2017:118), as demonstrated by the fact that animal ethologists routinely do what social scientists rarely attempt: Infer relationships from observational data on interacting nonhuman primates and other social animals (Hinde 1983; Schülke et al. 2020). This situation is not satisfactory for the social sciences, although the insights from animal ethology may serve as a reminder that tie-signs are an evolutionarily adaptive capacity for social animals, human and nonhuman alike (Byrne and Whiten 1988; Christakis 2019).
The purpose of the current article is to demonstrate that naturalistic observation of tie-signs by trained human observers may provide methodologically valid information about personal relationships. In doing so, we address what perhaps is the major stumbling block for the scholarly application of this method: that the accuracy of inferences made from tie-signs about personal relationships is surprisingly understudied. There exists a body of research showing that people display and often correctly perceive tie-signs in laboratory settings (Barnes and Sternberg 1989; Liberman et al. 2014), but the generalizability of these results to research purposes in real-life settings remains unclear. A few studies have reported convincing interrater reliabilities of relationship judgments of people in public places (e.g., Ejbye-Ernst et al. 2020; Weenink et al. 2021). However, given that the observers may simply be systematically wrong (Pajo 2017), these reliabilities do not guarantee a high validity in measuring the “ground truth” of the relationships.
Finally, a rare study testing relationship judgment against a ground truth in a real-life public setting (Ge et al. 2012) found that both human observers and a computer vision algorithm measuring the proximity and velocity of pedestrians could judge relationship ties with substantial accuracy. This result is promising, but if relationship judgment is to become part of the methodological toolbox of observational research, reproducibility testing is needed (Freese and Peterson 2017).
In addition to its methodological utility, there are also good theoretical reasons why relationship judgment should find wider use, namely that the methodological tools we have available shape how the social world is theorized (Collins 1994). This may be illustrated by the line of research that examines bystander-helping behavior in real-life violent emergencies using naturalistic observation techniques. The studies that pioneered this approach did not use relationship judgment (Parks et al. 2013), despite their stated interest in relationship dynamics (Levine et al. 2011). Hence, their results offered no direct evidence for relationship influences on bystander helping. However, personal relationships are consistently found to be a chief predictor of bystander helping in subsequent studies that routinely use relationship judgment methods (Ejbye-Ernst et al. 2020; Liebst et al. 2019; Lindegaard et al. 2017). More broadly considered, the under-utilization of relationship judgment methods is part of the reason why social psychology and related fields overemphasize the power of the situation at the expense of group-based agency and influences (Smith 2015; Swann, Jr. and Jetten 2017).
Methods
Data were 285 dyads of two adult individuals who were systematically observed in public urban settings in Copenhagen, Denmark, during the fall of 2019 (data, materials, and full statistical outputs are available at osf.io/x4je9). We followed the practice of animal ethologists to develop an ethogram of detailed behavioral definitions (Jones et al. 2018). These definitions were compiled from open-ended qualitative field observations and tie-sign definitions found in the existing literature (Afifi and Johnson 2005; Goffman 1971; Morris 1971). Next, we field tested and adjusted the ethogram and then tested the reliability of its measures by comparing two independent observers’ ratings of 43 dyads. For all measures (see Figure 1), the interrater reliability scores were above Krippendorff’s (2004) alpha of 0.7, which is considered a good level of agreement in observational research (Hallgren 2012).
Figure 1. Forest plot of six separate logit models regressing actual personal relationships on tie-sign displays. Note: The confidence intervals with the thick line width are at the 95% level, and their counterparts with the narrow line widths are Bonferroni corrected to a 99.3% level. The numbers above the intervals are the point estimates (i.e., odds ratios). All models were specified with robust standard errors.
A troublesome part of the design was to define a relevant sampling frame, given that probability sampling would likely result in a sample heavily skewed toward strangers because people in urban spaces are largely strangers to each other (Lofland 1973). As such, we decided to sample dyads of persons who had been in each other’s co-presence, defined by Goffman (1963:17) as follows: “[P]ersons must sense that they are close enough to be perceived in whatever they are doing, including their experiencing of others, and close enough to be perceived in this sensing of being perceived.” The included dyads thus needed to display at least some minimal awareness of each other, such as glancing briefly at or making room for the other person. Note that the exclusion of dyads that did not comply with the co-presence criterion most probably made the relationship judgment more difficult (e.g., two persons standing in opposite corners of a town square without paying any attention to each other are, given their lack of co-presence, in all likelihood strangers).
After a dyad had been selected, it was observed for between 1 and 5 minutes, and at the end of the observation period, the observer judged the relationship status of the dyad members. Note that observers were instructed to make their best guess regarding the relationship, relying on their intuition and all available nonverbal information. Thereafter, the observers approached the dyad members and asked whether they were affiliated or strangers. In situations where the dyad members moved away from each other, the onsite observers coordinated so that each person was approached by one observer. The response rate to the questionnaire was very high, judged to be well above 90%. Note that our observations were conducted covertly and that this complies with the ethics code of the American Psychological Association (2010, § 8.05), stating that scholars are exempt from obtaining informed consent if the study, as in this case, consists of naturalistic observations in public places.
Measures
First, we measured the actual and judged relationships with self-report and observation, respectively. Actual personal relationship was a binary variable, where 1 = the dyad was affiliated, and 0 = the dyad was strangers to each other. Judged personal relationship was a binary variable where 1 = the dyad was judged to be affiliated, and 0 = judged to be strangers.
Second, we captured four measures of behavioral displays of tie-signs. Direct interaction was captured with a binary measure where 1 = the dyad had some interactional exchange (e.g., talking, joint activity), and 0 = no interaction. Arm’s reach was a binary measure where 1 = the dyad members were within arm’s reach for at least 60 seconds, and 0 = the dyad did not satisfy this criterion. Moving in proximity was a binary measure where 1 = the dyad was walking at the same speed and in the same direction for at least 10 seconds, and 0 = the dyad did not satisfy these criteria (i.e., persons standing or sitting were assigned a 0 by default). Note that this measure was constructed as a combination of three raw measures (i.e., whether the dyad members were walking, and if so, at the same speed and in the same direction). Touching was a binary variable where 1 = the dyad members touched each other, and 0 = no observed touches.
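The construction of the composite moving in proximity measure can be illustrated with a minimal sketch. The field names and the duration field below are hypothetical and are not taken from the study's ethogram or data files; only the three raw components and the 10-second threshold come from the description above.

```python
from dataclasses import dataclass


@dataclass
class RawMovement:
    """Raw movement codes for one dyad (hypothetical field names)."""
    both_walking: bool       # were both dyad members walking?
    same_speed: bool         # if walking, at the same speed?
    same_direction: bool     # if walking, in the same direction?
    seconds_together: float  # assumed: how long all three conditions held


def moving_in_proximity(raw: RawMovement, min_seconds: float = 10.0) -> int:
    """Return 1 only if every raw condition holds for at least min_seconds."""
    satisfied = (
        raw.both_walking
        and raw.same_speed
        and raw.same_direction
        and raw.seconds_together >= min_seconds
    )
    return int(satisfied)


# Dyads that are standing or sitting receive 0 by default.
walking_pair = RawMovement(True, True, True, 25.0)
seated_pair = RawMovement(False, False, False, 0.0)
print(moving_in_proximity(walking_pair), moving_in_proximity(seated_pair))
```

Coding the composite as a strict conjunction mirrors the stated rule that a dyad failing any single criterion is assigned a 0.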
Third, three measures operationalized tie-signs as personal similarities between the dyad members. Similar age was a binary variable where 1 = the dyad was judged to be within 5 years of age, and 0 = they appeared to have a larger age difference. Similar gender was a binary variable where 1 = the dyad was judged to have a similar gender, and 0 = the members appeared to have different genders. Similar ethnicity was a binary variable where 1 = the dyad was judged to have a similar ethnicity, and 0 = the members appeared to have different ethnicities. Finally, observation time captured for how many seconds the dyad was observed. Each included dyad was observed for at least 1 minute and a maximum of 5 minutes.
Analytical Approach
To evaluate the judgment accuracy, we used a number of statistical techniques. We report the percent agreement between the observer judgment and the relationship ground truth. However, given that a simple percent figure may overestimate the judgment accuracy because it does not account for chance agreements (Cohen 1960), we also evaluated the judgment accuracy with a Krippendorff’s (2004) alpha test. Next, we regressed the actual personal relationships on the measured tie-sign displays using separate logit regression models (Breen et al. 2018). Here, one concern was that we faced a multiple comparisons problem, given that we examined the association between tie-signs and personal relationships with seven separate tests (Gelman et al. 2020). We addressed this issue by using a Bonferroni corrected alpha level (0.05/7 = 0.007) alongside the traditional 5% significance level (Abdi 2007).
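To make the difference between raw and chance-corrected agreement concrete, the following is a minimal sketch of Krippendorff's alpha for the simplest case (nominal data, two sets of codes, no missing values), together with the Bonferroni-adjusted alpha level. The ratings are invented for illustration and are not the study's data; the function names are our own and do not correspond to the software used for the reported analyses.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(a, b):
    """Share of units on which the two codings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def krippendorff_alpha_nominal(a, b):
    """Krippendorff's alpha for nominal data, two coders, no missing values."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty codings")
    n = 2 * len(a)  # total number of coded values
    # Each disagreeing unit contributes two off-diagonal coincidences.
    observed = 2 * sum(x != y for x, y in zip(a, b))
    totals = Counter(a) + Counter(b)  # how often each category was used
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")
    return 1 - (n - 1) * observed / expected


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical codings: 1 = affiliated, 0 = strangers.
judged = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
actual = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
print(percent_agreement(judged, actual))           # raw agreement
print(krippendorff_alpha_nominal(judged, actual))  # chance-corrected
print(round(bonferroni_alpha(0.05, 7), 3))         # 0.007
```

In this invented example, the two codings agree on 9 of 10 units, yet alpha is noticeably lower because most units fall in one category and agreement by chance is therefore likely.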
The sample size of 285 dyads satisfied several statistical power scenarios (Faul et al. 2007), suggesting that around 210 cases would be sufficient to detect a small to medium effect size (Cohen’s f² ≈ 0.05) with a power of 90%. We accepted that the study could not detect effect sizes below the chosen thresholds because practical circumstances made data collection labor-intensive, and we had limited resources available. Note that we did not perform an a priori power analysis for the interrater reliability tests, although we stress that our sample satisfied a general recommendation of a minimum of 30 comparisons (McHugh 2012).
Results
Summary of Means and Krippendorff’s Alphas for all Measures.
Note: The means were calculated from the sample of 285 dyads, while the alpha scores were calculated from the interrater sample of 43 dyads. Actual personal relationship has no alpha score available because it was measured with the questionnaire. Observation time has no alpha score because, for practical reasons, it was not coded independently by the two observers in the interrater reliability testing procedure.
Next, we ran six separate logit models, regressing the actual personal relationship variable on the tie-sign display predictors (see the forest plot in Figure 1). All tie-sign displays, except similar gender and ethnicity, positively predicted personal relationships, evaluated at both a 5% and a Bonferroni-adjusted significance level.
In terms of effect sizes, the single strongest predictor was direct interaction, which increased the odds of affiliation by around a factor of 100. The effects of the other significant predictors (arm’s reach, moving in proximity, touching, and similar age) were substantially smaller, although odds ratios in the range of 5–15 should be considered large in magnitude (Rosenthal 1996).
What distinguished direct interaction (and the second strongest predictor, similar age) from the other behaviorally displayed tie-signs was that its association with the relationship outcome was symmetrical rather than asymmetrical (see Lieberson 1985). That is, when direct interaction was present, the proportion of affiliated dyads was high (221/231 or 96%), and when direct interaction was absent, the proportion of affiliated dyads was correspondingly low (9/54 or 17%). By comparison, for example, the association between touching and relationship was asymmetrical: When touching was present, the proportion of affiliated dyads was very high (62/64 or 97%), but when touching was absent, the majority of dyads were also affiliated (168/221 or 76%). In other words, both the presence and the absence of direct interaction were informative about whether people were affiliated, whereas only the presence of touching was informative.
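The counts reported above are sufficient to reproduce the order of magnitude of the odds ratios. For a logit model with an intercept and a single binary predictor, the maximum likelihood odds ratio equals the cross-product ratio of the corresponding 2 × 2 table. The following sketch applies this to the reported counts; the function name is our own, and the results are approximations that need not match the published estimates to the last decimal.

```python
def odds_ratio(aff_present, n_present, aff_absent, n_absent):
    """Odds ratio of affiliation when a tie-sign is present versus absent.

    aff_present / n_present: affiliated dyads among dyads showing the cue.
    aff_absent / n_absent:   affiliated dyads among dyads not showing it.
    """
    odds_present = aff_present / (n_present - aff_present)
    odds_absent = aff_absent / (n_absent - aff_absent)
    return odds_present / odds_absent


# Counts taken from the text.
direct_interaction = odds_ratio(221, 231, 9, 54)
touching = odds_ratio(62, 64, 168, 221)

print(f"direct interaction: OR of about {direct_interaction:.1f}")
print(f"touching:           OR of about {touching:.1f}")

# The symmetry argument in proportions: affiliation rate with and without the cue.
print(f"direct interaction: {221 / 231:.0%} vs {9 / 54:.0%}")
print(f"touching:           {62 / 64:.0%} vs {168 / 221:.0%}")
```

The direct interaction ratio comes out at roughly 110, in line with the factor of around 100 noted above, while touching falls within the 5–15 range.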
As robustness checks, we considered the impact of the observation time, and this covariate was found to be positively associated with actual personal relationships (OR = 1.67, 95% CI [1.25, 2.22], p < 0.001; note that observation time was z-standardized for this estimation). However, the inclusion of this covariate in each of the models yielded the same overall results, except that the estimate of moving in proximity was enlarged in magnitude. Further, we evaluated whether the very large odds ratio of direct interaction was inflated due to quasi-separation (i.e., a predictor variable separates the outcome variable almost completely). Firth’s logistic regression is robust against this issue (Heinze and Schemper 2002), and estimation of the above models with this approach yielded similar results.
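The logic of the Firth check can be sketched for the simplest case. For a logit model with a single binary predictor, the Firth-penalized estimate is, to our understanding, equivalent to adding 0.5 to each cell of the 2 × 2 table before computing the odds ratio. The sketch below relies on this special case rather than on a general Firth estimator, uses the counts reported above, and is not the procedure used to produce the reported results.

```python
def firth_odds_ratio_2x2(aff_present, str_present, aff_absent, str_absent):
    """Penalized odds ratio for one binary predictor.

    Assumes the special case in which Firth's penalty amounts to adding
    0.5 to every cell of the 2 x 2 table (affiliated or stranger by cue
    present or absent).
    """
    a, b = aff_present + 0.5, str_present + 0.5
    c, d = aff_absent + 0.5, str_absent + 0.5
    return (a * d) / (b * c)


# Direct interaction: 221 affiliated and 10 strangers with the cue,
# 9 affiliated and 45 strangers without it (counts from the text).
plain = (221 * 45) / (10 * 9)
penalized = firth_odds_ratio_2x2(221, 10, 9, 45)
print(f"unpenalized OR of about {plain:.1f}, penalized OR of about {penalized:.1f}")
```

Under this assumption, the penalized odds ratio shrinks only modestly (from roughly 110 to roughly 101), which is consistent with the statement that the two approaches yielded similar results.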
Discussion
Here, we have examined how accurately trained human observers judge the personal relationship status of people in public places. Our results provide compelling evidence that human observers were highly accurate in judging whether people were affiliated or strangers. Further, we examined what kinds of tie-signs predicted personal relationships. We found that several of the included tie-signs correlated with people’s actual personal relationships, with direct interaction (e.g., talking, joint activity) having the largest effect size. This coincides with a symmetrical association between direct interaction and personal relationships (Lieberson 1985), which aligns with research showing that affiliated persons are engaged in frequent face-to-face exchanges (Lawler et al. 2009), while strangers are often restricted to unfocused co-presence (Goffman 1963). Taken together, our results confirm that tie-sign displays are informative of personal relationships (Ge et al. 2012; Goffman 1971; Murphy 2016) and stress that relationship judgment is so accurate that it may be used as a valid methodological tool to ascertain personal relationships in real-life observational studies.
However, it should also be acknowledged that the current results may not be fully transferable to other study settings, groups, and designs. First, the plausibly evolutionary and universal basis of tie-signs does not preclude cultural influences and idiosyncrasies that may make their displays and recognition less predictable. For example, research shows that the tie-sign component of preferred interpersonal distance varies between national cultures (Hall 1966; Sorokowska et al. 2017). It is thus to be expected that the current findings do not generalize fully to other national contexts and types of situations with other tie-sign practices. Relatedly, it should be acknowledged that our sample is somewhat biased toward a younger age group, because part of the data was sampled in night-time economy settings (the average age of our sample and of the Danish population is 37 and 42 years, respectively).
Second, it is likely that other studies will face less ideal observation conditions than those in the current study—e.g., they may lack the ability to follow those observed, as in the case of video footage recorded by fixed surveillance cameras (Levine et al. 2011). This would limit the information available to the observers on which to base their relationship judgment, thus leading to lower judgment accuracy (e.g., as in the case of Ge et al. 2012).
Third and finally, it should be noted that we achieved perfect interrater reliability for the relationship judgment measure, which is not common in other observational studies where the score ranges from excellent to moderate (e.g., Ejbye-Ernst et al. 2020; Liebst et al. 2020; Weenink et al. 2021). Given that high interrater reliability is necessary for high construct validity (Pajo 2017), it may be assumed that the imperfect interrater reliability scores found in the existing research correspond to judgments with a somewhat lower validity than currently reported. In sum, it seems fair to assume that the current judgment accuracy results are transferable to other studies with high interrater reliability for personal relationships, while a low reliability would indicate weaker transferability.
Obviously, the methodological utility of tie-sign judgments is chiefly relevant for scholars practicing non-participatory observation, where the persons observed cannot be simply questioned about their personal relationships. For example, this may be preferred because non-participation offers more unobtrusive and objective measures (Reiss 1971), or minimizes the observer’s exposure to potential dangers in the field (Adang 2018). Further, non-participatory observations increasingly rely on video recordings from public settings (Nassauer and Legewie 2018), which provide fine-grained behavioral information but only limited personal information about those recorded (Philpot et al. 2019). To circumvent this limitation, video observational researchers have begun measuring relationship ties from nonverbal behavioral cues—without knowing how well these judgments correspond with actual relationships (Ejbye-Ernst et al. 2020; Liebst et al. 2019). Our results validate and encourage the application of tie-sign judgments within video-based observational studies.
The existence of the correlations between tie-sign displays and personal relationships offers a possible explanation of why the human observers could accurately judge the relationship: The tie-signs they observed in situ provided them with precise information about the personal relationships, and, as assumed by Goffman (1971) and confirmed in subsequent research (Hall et al. 2019), humans have the psychological capacity to translate such nonverbal information into accurate judgments about personal relationships. However, it should be acknowledged that we do not, in fact, know whether their judgments were cognitively informed by the examined tie-signs, a combination of these, or perhaps some unobserved tie-signs. It is plausible that the training involved in developing and testing the ethogram improved the judgment accuracy, but the judgment may also have been an intuition-based capacity that requires no training. We suggest that future research address this question by comparing the judgment accuracy of trained and untrained observers.
Regarding the specific tie-signs we have tested as predictors of personal relationships, it is noteworthy that all of those displayed through bodily behaviors were found to contain relationship information (i.e., direct interaction, arm’s reach, moving in proximity, touching). By contrast, only one of the three potential tie-signs displayed through person-specific similarities between the dyad members (age, but not ethnicity or gender) was linked with personal relationships. This indicates that behavioral displays are potentially more unambiguous in communicating tie-sign information than static person similarities. This makes sense in that behaviorally displayed tie-signs serve the function of signaling relationships, while person similarities typically do not serve such a communicative purpose (with uniforms as one exception). Instead, person similarities contain relationship information if they coincide with some operative homophily principle, e.g., the tendency that affiliation is more typical among similar persons (McPherson et al. 2001).
However, if this homophily tendency is weak (e.g., because it does not manifest itself in public settings), the person similarities only offer a noisy tie-sign signal, as may have been the case with respect to the non-significant gender and ethnicity measures. This argument supports the current practice that computer vision algorithms for recognizing relationship ties focus on behaviorally displayed tie-signs (Bernasco et al. 2022; Ge et al. 2012).
The implications of the current results go beyond the application of tie-signs to ascertain relationships; in our view, this is but one methodological step toward a wider appreciation of naturalistic observation within the social sciences, as envisioned by Goffman (1971). The current results regarding relationship judgment expand the scope of what type of social content observational researchers may extract from nonverbal cues (Murphy 2016)—be it emotions identified from bodily cues (Tracy and Matsumoto 2008) or verbal expressions inferred from nonverbal functional equivalents (Eibl-Eibesfeldt 1989). In sum, and put metaphorically (Phillips 2001), although naturalistic observation is centered around knowledge of the “hand,” it also bears witness to what the “head” thinks and says, and the “heart” feels, and thus with whom we are socio–emotionally affiliated.
Footnotes
Authorship Contributions
The current work builds on the Master’s dissertation written by LB and KLD and supervised by LSL. The article is co-first authored by LSL, LB, and KLD, and the authorship contributions are as follows: Conceptualization: LSL, LB, KLD. Methodology: LB, KLD. Validation: LSL. Formal analysis: LSL, LB, KLD. Investigation: LB, KLD. Data Curation: LB, KLD. Writing–Original Draft: LSL, LB, KLD, VP. Writing–Review and Editing: LSL, LB, KLD, VP, MRL.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
