Abstract
This study examines the linguistic construction of the girl figure in Turkish children’s literature through a corpus-based collocational analysis of physical descriptors. The data derive from the Turkish Children’s Literature Corpus (TCLC), comprising 1,089 books published between 1970 and 2012 (8.6 million tokens). Using systematic adjective filtering and semantic categorization, the study analyzes recurring adjective–noun patterns modifying the word kız (girl) in order to identify how physical description contributes to the discursive organization of girlhood. The findings reveal three central representational patterns. First, girlhood is structured along a developmental trajectory: childhood is associated with smallness and domestic positioning, adolescence foregrounds appearance-based evaluation, and adulthood is increasingly framed through marriage-related expectations. Second, physical descriptions converge on a relatively homogeneous prototype characterized by beauty, slenderness, tallness, and light-colored features. Third, aesthetic evaluation frequently extends into moral assessment, as positive appearance descriptors systematically co-occur with virtue-related traits. The analysis further identifies divergence between literary prototypes and demographic indicators in Türkiye, with light-colored features appearing at higher rates in the corpus than in population data. While no causal claims are made regarding reader effects, the findings indicate that physical appearance functions as a central organizing dimension in the linguistic construction of girlhood. By linking collocational regularities to subject positioning, performative repetition, and normalization, the study demonstrates how recurring linguistic patterns may contribute to the stabilization of gendered norms and offers a replicable corpus-based framework for future comparative research.
Plain Language Summary
This study looks at how girls are described in Turkish children’s books. We examined over one thousand books published between 1970 and 2012 and analyzed the words most often used to describe girls’ physical appearance. By studying recurring word patterns, we explored what kinds of images of “girlhood” are repeatedly presented to young readers. The results show three main patterns. First, descriptions of girls tend to follow a life-stage pattern: young girls are often described as small and connected to home life, adolescent girls are more frequently evaluated based on appearance, and adult women are often linked to marriage. Second, many books describe girls using a similar physical ideal, emphasizing beauty, slimness, tallness, and light-colored features. Third, positive physical traits are often connected to positive personality traits, suggesting that appearance is closely tied to moral value in these texts. Although this study does not examine how children respond to these books, it shows that repeated language patterns may shape common expectations about what girls should look like. The findings provide a clearer understanding of how children’s literature can contribute to shared ideas about gender.
Keywords
Introduction
Children’s literature constitutes a powerful site of discursive socialization, where linguistic and narrative patterns contribute to shaping children’s understandings of identity, social roles, and normative expectations. Beyond entertainment or moral instruction, children’s books participate in the symbolic organization of social reality by repeatedly associating particular traits, actions, and values with specific gendered figures. Through such patterned representations, texts do more than reflect social norms; they actively contribute to their circulation and stabilization within early cognitive and affective frameworks (Kortenhaus & Demarest, 1993; McCabe et al., 2011). In this sense, children’s literature operates as a formative discursive environment in which gendered identities are linguistically structured and rendered socially recognizable.
An important dimension of this discursive environment concerns gender representation. In children’s books, male and female characters are differentiated through roles and actions as well as through recurrent descriptive patterns that foreground specific traits as appropriate, desirable, or normative (Fox, 1993; McCabe et al., 2011). When particular attributes, behaviors, or values are repeatedly linked to a given gendered figure, these associations may acquire the appearance of naturalness or inevitability (Diekman & Murnen, 2004). The cumulative effect of such repetition lies not in isolated depictions, but in the systematic clustering of linguistic features that construct recognizable models of femininity and masculinity. These descriptive elements function as discursive resources that position characters within moral, aesthetic, and developmental hierarchies.
Such patterned repetition can be understood as a mechanism of norm formation. From a Foucauldian perspective (Foucault, 1980), discourse does not simply describe reality but organizes it by rendering certain configurations visible, legitimate, and normative. Butler’s (1990) theory of performativity further clarifies how gendered norms become stabilized: repeated linguistic and narrative acts sediment into recognizable and seemingly natural identities. In methodological terms, Butler’s notion of performative repetition finds its linguistic counterpart in collocational regularity; it is precisely through repeated co-occurrence that certain descriptive configurations acquire normative force. In this sense, gender is not simply represented in texts but iteratively constituted through recurring descriptive, evaluative, and relational structures.
Davies’s (1993) subject positioning framework provides a bridge between these macro-level accounts of norm formation and the micro-level linguistic patterns examined in this study. According to this approach, gender is constituted through the position options made available in discourse, including qualifying adjectives, labels, verb attributions, spatial markers, and evaluative expressions. While subject positioning operates across multiple linguistic dimensions, the present study focuses specifically on adjectival modification, as adjectives constitute the primary site of explicit physical evaluation in the corpus. The positioning process is therefore traced through recurrent adjectival modification patterns surrounding the lexical item kız (girl). By examining how physical descriptors systematically cluster around this node word, the analysis identifies how linguistic regularities contribute to the construction and stabilization of a prototypical model of girlhood.
In this context, the study investigates how collocational patterns of physical descriptors construct a prototypical model of girlhood through the recurrent adjectival modification of the word kız. The analysis is limited to adjectives denoting physical attributes that modify this lexical item, thereby focusing on the linguistic mechanisms through which bodily and aesthetic norms are discursively organized. The data derive from the Turkish Children’s Literature Corpus (TCLC; 1970–2012; 8.6 million tokens), one of the largest available corpora of Turkish children’s literature. The selected period captures the formative decades during which contemporary representational conventions were consolidated, establishing the linguistic substrate that continues to shape subsequent children’s literature.
While prior research has provided valuable qualitative insights into gender representation through interpretive discourse analysis and small-scale textual readings, the present study complements this tradition by offering a large-scale, corpus-based and replicable examination of how specific adjective-noun configurations systematically cluster around kız. Rather than focusing solely on thematic content, it analyzes how recurring linguistic combinations contribute to the normalization of particular aesthetic, developmental, and evaluative expectations. In doing so, the study provides (a) a transparent methodological framework, (b) quantitative insight into the descriptive construction of the girl figure, and (c) a measurable benchmark for future cross-cultural and longitudinal comparison.
Collocational Analysis in Gender Studies
In corpus linguistics, collocation is generally defined as the habitual co-occurrence of two or more words (Firth, 1957) or, from a statistical viewpoint, as a lexical item’s significantly frequent co-occurrence with others in a given textual context beyond what would be expected by chance (Sinclair, 1987). More than mere proximity, collocation involves meaningful and structural associations between words, contributing to both the construction and reflection of meaning (Halliday, 1961). These co-occurring items, known as collocates, play a crucial role in shaping the meaning of the target word within specific contexts, and as such, collocation has come to be seen as “the key to meaning” (Firth, 1957).
Collocational analysis is fundamentally defined as the study of the tendency of words to occur together within a text (Stubbs, 2002). Such analysis can only be conducted using either ready-made corpora or specially compiled digital corpora designed for specific research purposes. Major online reference corpora often provide built-in collocation tools; however, researchers working with self-compiled corpora must rely on concordance software such as AntConc (Anthony, 2024) or WordSmith Tools (Scott, 2024). In both cases, the researcher enters a node word—the keyword determined by the research questions—into the system and can observe its occurrences within the corpus through the concordance interface.
Although the data are retrieved through computational methods, the decision regarding which collocates to evaluate ultimately depends on the researcher’s analytical judgment. While frequency-based approaches are commonly adopted in collocation studies, there is no full consensus on what exactly constitutes a collocational relationship. Depending on the research scope, questions, and corpus size, some scholars may even consider single observed co-occurrences as collocational. At this stage, it is crucial that the criteria for inclusion be clearly defined and applied consistently (Baker, 2014).
The next step after identifying collocates is interpretation, which shifts the analysis from a purely computational process to a human-centered one. To ground interpretations scientifically, researchers are expected to compare their findings with relevant statistical data and prior scholarship and to interpret them within established theoretical frameworks or approaches. This stage therefore requires thorough preliminary work and conceptual preparation (Baker, 2014). The collocation analysis method, by systematically examining which words co-occur with others in a given context, reveals not only surface-level frequencies but also implicit semantic, grammatical, and discursive associations between words. This analytical power has made collocation analysis a prominent tool across fields including corpus linguistics, lexicography, language education, literary analysis, and sociolinguistics. Since the early 2000s, this method has also been increasingly employed in studies that investigate how gender is represented and reproduced through language. By analyzing recurrent word associations in various types of texts, collocation analysis enables researchers to uncover the linguistic traces of gender-related stereotypes, normative roles, and ideological tendencies, thereby establishing itself as a powerful analytical framework in the field of gender and language.
One of the earliest gender studies to apply collocation analysis was conducted by Pearce (2008), who examined the collocations of the words woman and man in the British National Corpus. Pearce found that both genders are often stereotyped, with men being represented in terms of power, physicality, and violence, and women being depicted in terms of their relationships with men or their ability to have children. Subsequent studies have explored similar gendered norms, including those where men are evaluated based on their roles and status in society while women are analyzed in terms of their appearance and sexuality (Caldas-Coulthard & Moon, 2010); where women and men are associated with different actions (Herdağdelen & Baroni, 2011); and even where the polarity of emotions differs by gender (Norberg, 2012). Findings of women’s marginalization in certain contexts have also emerged (Aull & Brown, 2013). Among these contributions, Paul Baker’s (2014) work is particularly notable for demonstrating how corpus-based tools can be systematically applied to analyze gendered discourse across diverse linguistic contexts. In particular, his study of the words girl and boy using Sketch Engine exemplifies how collocation analysis can reveal implicit ideologies, especially through verb associations, adjective patterns, and syntactic roles such as subject and object.
These studies are important not only because they highlight the contribution of collocation analysis to gender studies but also because they demonstrate how different collocations can yield different results when analyzed across various corpora, emphasizing the significance of corpus selection in research (Herdağdelen & Baroni, 2011). Furthermore, it is noteworthy that the collocational features of girl are more frequently altered than those of boy (Macalister, 2011).
Methodology
This study examines how gender representations are constructed at the linguistic level through the physical descriptions of the girl figure in Turkish children’s literature, using a corpus-based collocational analysis. The procedure follows corpus-driven approaches to gender research (Baker, 2014) and is interpreted within the frameworks of subject positioning (Davies, 1993), performative repetition (Butler, 1990), and normalization (Foucault, 1977). These frameworks guide the interpretation of recurring collocational patterns as indicators of discursively stabilized gendered expectations.
Research Questions
In line with the focus of the study, the following research questions were formulated:
Which physical attributes most frequently modify the word kız (girl) in Turkish children’s literature?
How do these attributes cluster into recurring representational patterns at different developmental stages (childhood, adolescence, adulthood)?
What evaluative and contextual patterns co-occur with physical descriptors, and how are these configurations descriptively contextualized within the socio-cultural setting of Türkiye?
Corpus and Data Preparation
The primary data source is the Turkish Children’s Literature Corpus (TCLC, n.d.), which consists of 1,089 Turkish-language children’s books published between 1970 and 2012 and contains approximately 8.6 million tokens across multiple genres for readers aged 2 to 16.
All orthographic instances of kız were retrieved. Because the corpus search function returns all character-string matches, each occurrence was manually screened to retain only cases in which kız denotes the common noun “girl.” Instances where kız functioned as part of a proper noun or fixed lexical unit (e.g., conventionalized character titles) were excluded. Such constructions represent formulaic or culturally frozen expressions and would artificially inflate frequency counts without reflecting productive adjectival modification in narrative contexts.
Since TCLC does not provide built-in collocation analysis, all validated concordance lines were exported and compiled into a separate dataset for analysis in AntConc.
Collocate Retrieval
Collocational analysis was conducted in AntConc. As adjectives in Turkish canonically precede the noun they modify, only the left span of the node word kız was examined. A five-word window to the left (L5–0) was selected to account for stacked or multi-word modifiers. In line with the study’s aim of foregrounding discursive patterning, statistical evaluation was limited to distributional frequency.
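The left-span retrieval described above can be illustrated with a minimal Python sketch. This is not the AntConc procedure used in the study. It is an assumption-laden toy that counts every token occurring within five positions to the left of the node word, using an invented sentence rather than a corpus example. The function name `left_collocates` and the whitespace tokenization are hypothetical simplifications.

```python
from collections import Counter

def left_collocates(tokens, node="kız", span=5):
    """Count tokens found within `span` positions to the left of each node hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Slice the left-hand window only; the node itself is excluded.
            counts.update(tokens[max(0, i - span):i])
    return counts

# Invented toy sentence (not from the TCLC):
# "a beautiful girl with long blond hair came"
tokens = "uzun sarı saçlı güzel bir kız geldi".split()
freq = left_collocates(tokens)
print(freq.most_common())
```

In this sketch the window captures stacked modifiers such as uzun, sarı saçlı, and güzel in a single pass, which is the motivation the text gives for a five-word span. Non-adjectival items in the window (here, bir) would still need to be removed manually.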
Adjective Identification and Semantic Categorization
Retrieved collocates were manually examined to identify adjectival forms functioning as modifiers in context. Non-adjectival forms (e.g., nouns, verbs, predicates) were excluded. A total of 14,077 adjectival tokens modifying kız were identified.
Semantic classification was conducted using the USAS Semantic Tagset—a multi-layered annotation framework developed at Lancaster University that organizes lexical items into 21 general discourse fields and their subcategories (Rayson et al., 2004). Adjectives were filtered according to two domains:
O4 (Physical Attributes): including general appearance, aesthetic judgment, color/visual properties, and shape/proportions.
T3 (Time: Age): age-related descriptors (e.g., little, young), included due to the intrinsic link between chronological age and bodily development in children’s narratives.
Classification focused on semantic range rather than lexical frequency. Infrequent adjectives were retained when they contributed to broader semantic domains relevant to physical representation, ensuring comprehensive coverage of descriptive patterns.
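As a rough illustration of the domain filter, the following Python sketch keeps only adjectives whose tag falls under O4 or T3. The tag assignments shown are illustrative assumptions, not the study's coding decisions, which were made manually and in context. The list `tagged` and the helper `is_physical` are hypothetical names introduced for this example.

```python
# Illustrative (adjective, tag) pairs; these assignments are assumptions.
tagged = [
    ("küçük", "T3"),     # little: age-related
    ("güzel", "O4.2"),   # beautiful: judgment of appearance
    ("akıllı", "X9.1"),  # clever: non-physical, so excluded
    ("sarı", "O4.3"),    # blond/yellow: color
]

def is_physical(tag, domains=("O4", "T3")):
    """Return True if the tag belongs to one of the retained domains."""
    return tag.startswith(domains)

physical = [word for word, tag in tagged if is_physical(tag)]
print(physical)
```

Matching on the tag prefix keeps all O4 subcategories together, which mirrors the decision to filter by semantic domain rather than by lexical frequency.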
Semantic tagging was conducted over a period exceeding 1 year to ensure exhaustive and iterative review of the dataset. Two researchers independently assigned USAS categories. A third researcher reviewed all coding decisions. In cases of disagreement, concordance lines were re-examined collectively and consensus was reached through discussion. Given the context-sensitive and interpretive nature of semantic categorization in narrative data, consensus-based validation was preferred over statistical reliability coefficients. This procedure aimed to minimize subjective bias while maintaining analytic rigor.
Contextual (KWIC) Analysis
All sentences containing adjectives categorized under O4 and T3 were compiled into a separate dataset. Using AntConc’s KWIC function, immediate concordance environments were examined to identify recurring evaluative and thematic patterns, including emotional states, actions, social roles, and spatial settings.
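For readers unfamiliar with KWIC displays, the sketch below shows the general idea in Python. It is a simplified stand-in for AntConc's concordance view, not a reproduction of it. The function `kwic`, the context width of four tokens, and the example sentence are all assumptions made for illustration.

```python
def kwic(tokens, node, width=4):
    """Return (left context, node, right context) for each occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented toy sentence (not from the TCLC):
# "the little girl was looking out of the window"
tokens = "küçük kız pencereden dışarı bakıyordu".split()
for left, node, right in kwic(tokens, "kız"):
    print(f"{left:>20} [{node}] {right}")
```

Aligning the node word in a fixed column makes recurring right-hand contexts, such as spatial references or verbs of observation, easier to spot when many lines are read together.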
Socio-Cultural Contextualization
The identified distributional patterns were interpreted by relating them to accessible socio-cultural indicators (e.g., national statistics on hair and eye color distribution; publicly available reports on age and marriage patterns) in order to situate the literary prototypes within broader cultural reference frameworks.
Methodological Limitations
The study relies on frequency-based collocational analysis within a left-sided window (L5–0) and does not employ statistical association measures. The findings therefore reflect distributional tendencies rather than statistically tested collocational strength. The analysis is limited to adjectival modification of kız and does not include verbal constructions, syntactic positioning, or visual representations. Although semantic categorization was conducted using the USAS taxonomy and verified through a tripartite consensus procedure, interpretive judgment remains inherent in qualitative classification. Contextual comparisons are descriptive and do not establish causal relationships between literary representation and social outcomes.
Translation Policy
All corpus examples originate from Turkish-language texts. For accessibility, examples are presented in English translation. Original Turkish forms are provided at first mention where necessary to preserve semantic nuance.
Results
Distribution of Physical Descriptors
The analysis of the Turkish Children’s Literature Corpus (TCLC, n.d.) shows that the lexical item kız (girl) occurs approximately 33,000 times, ranking 70th among the most frequent words in the corpus. This frequency is notably higher than its 141st position in general Turkish usage (Aksan et al., 2017), indicating the relatively high linguistic visibility of girl characters in children’s literature.
A total of 14,077 adjectives were identified as collocates modifying kız. Among these, 5,490 items (39%) were classified as physical descriptors according to the criteria outlined in the methodology. The remaining adjectives referred to non-physical domains such as personality, behavior, or evaluation and were therefore excluded from the present analysis.
The physical descriptors were grouped into semantic subcategories using the USAS semantic framework. Table 1 presents the distribution of these categories.
Table 1. Semantic Subcategories and Frequency of Adjectives Collocating With the Word kız.
As shown in Table 1, age-related descriptors constitute the largest category, accounting for 60% of all physical adjectives (n = 3,303). These are followed by judgments of appearance, which represent 26% of the dataset (n = 1,432). Descriptors referring to specific body parts make up 8.4% (n = 459), while clothing (2.4%), weight (1.8%), height (0.8%), and personal care (0.4%) together account for the remaining 5.4% of the dataset.
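The reported shares can be checked against the counts given in the text. The short Python sketch below recomputes them from the total of 5,490 physical descriptors and the three category counts stated above. The variable names are hypothetical, and the count for the four smaller categories is derived by subtraction rather than taken from the table.

```python
total = 5490  # physical descriptors reported in the text
reported = {"age": 3303, "appearance judgments": 1432, "body parts": 459}

# Percentage share of each reported category, to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in reported.items()}

# Clothing, weight, height, and personal care combined (derived, not reported).
remaining = total - sum(reported.values())
remaining_share = round(100 * remaining / total, 1)

print(shares, remaining, remaining_share)
```

At one decimal place the shares come to 60.2%, 26.1%, and 8.4%, with the remaining 296 items making up 5.4%, which agrees with the rounded figures in the text.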
This distribution shows that physical descriptions of the girl figure are concentrated primarily in two domains: developmental stage (age) and aesthetic evaluation. In the following sections, the categories of hair, eyes, skin, weight, and height are analyzed together as components of an idealized beauty prototype (see the section “The Aesthetic Prototype: Beauty Judgments and Body Ideals”) to provide a holistic view of recurring physical patterns.
Age-Based Representation of the Girl Figure
Age-related adjectives constitute the largest category of physical descriptors (60%, n = 3,303), indicating that developmental stage functions as a primary organizing principle in representations of the girl figure. Within the age-related set, descriptors cluster into three phases: childhood (66%, n = 2,180), adolescence (31%, n = 1,024), and adulthood (3%, n = 99). In addition to these size- and age-marking adjectives, numerical age adjectives provide a more explicit picture of the age ranges foregrounded in the corpus. As shown in Table 2, girl characters are predominantly portrayed within the 3 to 18 age range, while infancy (0–2 years) and adulthood are comparatively rare.
Table 2. Developmental-Period Classification of Numerical-Age Adjectives Collocating With the Word kız.
The following subsections examine how physical description, spatial association, and attributed roles shift across these developmental stages.
Childhood: Smallness and Domestic Settings
Within childhood-related descriptors, the adjective küçük (little) accounts for 95% (n = 2,071) of cases. In Turkish, this adjective can denote both age and physical size; however, when modifying kız, it primarily functions as an age marker. Analogical expressions such as damla gibi (as small as a drop) or fındık gibi (as small as a hazelnut) further reinforce diminutive size.
In this age group, the girl figure is rarely described with additional physical attributes. When aesthetic evaluation occurs, it tends to involve affectionate terms such as sevimli (cute) and tatlı (sweet), rather than güzel (beautiful). These patterns indicate that childhood representations foreground smallness and affection rather than attractiveness.
Collocational patterns show strong associations between little girl and domestic environments. Kinship terms (e.g., mother, father) and spatial references (e.g., house, door, window) are among the most frequent collocates. Verbal contexts emphasize playfulness (playing, running) and observation (looking, watching). These patterns suggest that childhood is primarily represented within family-centered domestic settings.
Emotional states most frequently associated with little girl include crying, happiness, and surprise, while behavioral descriptors such as giggling or being silly construct childhood as a phase of emotional expressiveness and play.
Adolescence: Emergence of Beauty-Oriented Descriptions
As descriptors shift from küçük kız (little girl) to genç kız (young girl), the linguistic profile changes. The expression genç kız typically refers to girls aged fourteen or older, corresponding approximately to high school age.
Unlike childhood descriptors, which emphasize smallness and affection, the phrase young girl is strongly associated with güzel (beautiful). It frequently co-occurs with adjectives denoting slenderness, grace, delicacy, and positive emotional traits. These patterns indicate a shift toward appearance-based evaluation during adolescence.
Emotional contexts involving young girl most frequently evoke joy, and spatial references continue to center on domestic settings such as the house or window. In addition, the phrase young girl frequently co-occurs with genç delikanlı (young man), suggesting that adolescence is linguistically framed as a stage associated with romantic or relational contexts.
Adulthood: Marriageability and Behavioral Expectations
Age-related adjectives referring to adulthood are comparatively rare (3%, n = 99). When present, they predominantly center on the notion of marriageability. The expression gelinlik kız (marriageable girl) denotes a girl considered ready for marriage and appears in contexts that articulate expectations regarding clothing, behavior, or mobility.
Corpus examples indicate that marriageability is treated as a limited temporal phase. Expressions such as yaşlı kız (“old girl”) or evde kalmış kız (“spinster”) are used for women who remain unmarried beyond the expected age. The idiomatic construction evde kal- (lit. “to be left at home”) is noteworthy, as it reflects the conventionalization of a spatial association between “girl” and “home” in linguistic usage.
Other adulthood-related expressions, including koca (big), kazık kadar (as big as a stake), or eşek kadar (as big as a donkey), often appear in contexts that impose behavioral expectations. These forms typically function as evaluative or corrective expressions rather than neutral physical descriptions.
1. You’re a big girl now, make the coffee and be useful!
2. You’ve become girls as big as a stake; you can’t ride a bike!
3. You’ve become girls as big as a donkey, aren’t you ashamed to play the darbuka and dance?
The Aesthetic Prototype: Beauty Judgments and Body Ideals
Physical descriptors related to appearance cluster around a limited set of facial and bodily features that collectively form a recurring beauty prototype in the corpus. Rather than appearing as isolated traits, these features tend to co-occur, producing a relatively homogeneous image of the girl figure across the corpus.
Facial Features and Color-Based Ideals
Adjectives describing body parts in relation to the girl figure most frequently refer to hair (39%, n = 179), skin (19%, n = 87), eyes (18%, n = 83), face (11%, n = 50), and cheeks (3%, n = 14). Other body parts are mentioned less frequently. This distribution reflects a selective pattern of physical representation in which the face and its components are foregrounded over other anatomical features.
Hair
Adjectives related to hair predominantly focus on color (63%, n = 113), length (19%, n = 34), and shape (18%, n = 32). Regarding hair color, blond hair is the most commonly attributed feature (58%), followed by black (28%), red (11%), and brown (3%). The most frequently depicted hairstyles are long, curly or wavy, and tied. Descriptions of long and blond hair frequently co-occur with adjectives denoting beauty in the corpus:
4. A girl with long blond hair braided at the back, large eyes, and a sweet face also came over to us.
Skin
In skin descriptions associated with the girl figure, there is a clear asymmetry between adjectives denoting whiteness/brightness (72%, n = 63) and those denoting darkness/paleness (28%, n = 24). References to brightness and whiteness are consistently associated with vitality and beauty, whereas darkness and paleness tend to carry negative connotations such as ugliness, weakness, or poor health:
5. The prince was surprised when he came across a dark-skinned and ugly girl instead of the beautiful, moon-like girl he left by the spring.
Eyes
Adjectives related to eyes mainly focus on color, condition, and shape. The most frequently mentioned eye color is blue (56%, n = 46), followed by green/hazel (24%, n = 20), black/brown (19%, n = 16), and gray (1%, n = 1). In depictions of girls, “blue eyes,” “blondness,” and “beauty/attractiveness” recur as significant collocates. In addition, descriptions of large and bright eyes frequently co-occur with concepts of beauty and admiration (see Example 4).
Face and Cheeks
Adjectives describing the face fall into four subcategories: facial expressions, judgments of appearance, color, and shape. Among these, facial expressions account for the largest share (42%, n = 21). Girls are portrayed as smiling, cheerful, or friendly in 95% of the relevant instances, whereas sullen or neutral expressions appear rarely:
6. Smile a little! Otherwise they’ll gossip about you, saying, “What a sullen girl.” Girls are supposed to be both friendly and smiling.
Adjectives describing the face overwhelmingly emphasize beauty: 94% of face-related descriptors are positive, most commonly using beautiful or related expressions, while negative descriptors such as ugly are rare.
Color-based descriptions cluster around red and its shades. Rosy cheeks are favored in 94% of cases and often connote health, beauty, and vitality. In addition, facial features are frequently presented in cumulative sequences and co-occur with beauty-related expressions, forming composite descriptions that include multiple attributes:
7. Rosy-cheeked, red-lipped, sweet-voiced girl.
Body Norms: Slenderness and Height
Adjectives related to body size and height reinforce a relatively narrow physical ideal. Among weight-related adjectives, 64% (n = 63) refer to thinness, whereas 36% (n = 35) refer to fatness. References to fatness predominantly occur in negative or evaluative contexts, while descriptions of thinness typically appear in positive contexts. A similar pattern is observed in height-related descriptions: 85% (n = 36) of tallness references co-occur with beauty-related adjectives. Tallness, slenderness, and beauty function as significant collocates. Moreover, adjectives denoting height and thinness frequently co-occur with descriptors such as youth and blondness:
8. The chubby-cheeked baby from 10 years ago has now become a very slender, tall, blond-haired, very beautiful girl, with her hair falling over her shoulders.
Beauty as an Evaluative Framework
Adjectives used to describe appearance are overwhelmingly positive (96%, n = 1,374). Among these, beautiful and related expressions stand out as the dominant descriptors, indicating that beauty functions as the central evaluative trait attributed to girls.
There is a recurrent narrative link between growing up and becoming beautiful, suggesting that physical attractiveness is presented as an expected outcome of female maturation:
9. The time for the girls who were growing up and becoming more beautiful to marry had drawn near.
When negative descriptions appear, they are largely centered around the adjective ugly (çirkin). Beauty tends to co-occur with references to brightness and lightness, whereas ugliness is more often associated with darker imagery (Example 5).
Physical appearance is also frequently linked to character evaluation. Girls described as beautiful are often associated with positive moral or personality traits, while those labeled ugly tend to appear in contexts involving undesirable behaviors:
10. The two ugly-tempered and ugly-faced sisters were not happy with this situation.
11. She grew up and became a beautiful and good-natured young girl.
This recurring pattern suggests that aesthetic evaluation functions not only as a physical description but also as a broader framework for character assessment within the corpus narratives.
Marginal Physical Categories
Compared with the categories discussed in the two sections above, the girl figure co-occurs less frequently with adjectives related to other aspects of physical appearance, including clothing (51%, n = 80), accessories (29%, n = 45), personal care (15%, n = 24), and make-up (5%, n = 7). Within these categories, no distinctive pattern was identified beyond clothing types and colors.
In clothing-related descriptions, dresses constitute the most frequently mentioned category (28%, n = 22), followed by smaller groups such as outerwear, trousers, school uniforms, and skirts.
Color adjectives occur frequently in clothing descriptions. The most common color categories are white (23%, n = 15), red (17%, n = 11), and blue (15%, n = 10), while pink appears less frequently (9%, n = 6). In many instances, color terms function not merely as modifiers of specific garments but as central elements in the visual portrayal of the girl figure:
12. Two young girls in white entered the room.
Discussion
Cross-Cultural Patterns and the Turkish Context
Comparative research on gender representations in children’s literature indicates that gender norms tend to display recurring structural patterns across cultural contexts, even when their specific manifestations vary (Adam & Harper, 2023). Moattar’s (2010) comparison of children’s books from Sweden and Iran illustrates this continuity: while Swedish narratives more frequently foreground female agency and Iranian texts more often position girls within domestic and marriage-oriented roles, both contexts retain underlying androcentric narrative structures.
The patterns identified in the Turkish Children’s Literature Corpus largely align with these international tendencies: the girl figure is predominantly constructed through physical descriptors emphasizing age, appearance-based evaluation, and bodily norms across an innocence–beauty–marriageability continuum. The equation of beauty with goodness and narrative reward has similarly been documented in Western fairy tales and children’s books (Baker-Sperry & Grauerholz, 2003; Hamilton et al., 2006), while the positive portrayal of thinness and tallness, coupled with the negative framing of larger body types, corresponds to findings in content analyses of children’s media across cultural settings (Brugeilles et al., 2002; Herbozo et al., 2004). The Turkish corpus thus appears to reflect widely observed representational conventions rather than an isolated national pattern.
Subject Positioning, Performativity, and Normalization
From the perspective of Davies’s (1993) subject-positioning framework, the recurring collocational clusters identified in this study establish a limited range of position options for the girl figure. The systematic co-occurrence of adjectives referring to beauty, slenderness, height, and light-colored features stabilizes expectations concerning how a girl should look and be evaluated. These patterns contribute to the discursive organization of normative subject positions. The repeated association between physical attractiveness and positive moral traits can be interpreted, in line with Butler (1990), as performative reinforcement. In Foucauldian terms, such regularities constitute processes of normalization, operating through the recurring presentation of a narrow aesthetic standard as natural and desirable (Foucault, 1977).
Body-related descriptors reinforce this pattern. The distributional asymmetry that privileges thinness and tallness in positive contexts, while associating larger body types with evaluative or corrective environments, resonates with feminist analyses of bodily discipline and the beauty regime (Bartky, 1990; Bordo, 1993). In the corpus, this regulation emerges through patterned co-occurrence and evaluative framing.
Demographic Divergence and the Construction of a Narrow Prototype
The corpus reveals marked divergence between literary prototypes and Turkish demographic characteristics. While dark hair and brown eyes predominate in the Turkish population (62% and 70%, respectively; Nalçaoğlu, 2012), the corpus disproportionately foregrounds light-colored features in positive evaluative contexts: blonde hair (58%) and blue eyes (56%). Relative to the population prevalence of these features, blonde hair is overrepresented nearly 15-fold and blue eyes more than fourfold. Similarly, thinness and tallness are privileged even though substantial proportions of the population do not conform to these body types (TURKSTAT, 2019).
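The fold figures in this paragraph are ratios of a feature's share in the corpus to its share in the population. The population shares of blonde hair and blue eyes are not restated in this section, so the sketch below does not supply them; it only back-calculates the population shares that the reported ratios imply. The function names are illustrative assumptions, not part of the study's method.

```python
def overrepresentation(corpus_share: float, population_share: float) -> float:
    """Ratio of a feature's share in the corpus to its share in the population."""
    if population_share <= 0:
        raise ValueError("population_share must be positive")
    return corpus_share / population_share


def implied_population_share(corpus_share: float, fold: float) -> float:
    """Population share at which the given fold ratio would hold exactly."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return corpus_share / fold


# Corpus shares reported in the text (positive evaluative contexts).
blonde_hair_corpus = 0.58
blue_eyes_corpus = 0.56

# "Nearly 15-fold" implies a population share slightly above this value.
blonde_lower_bound = implied_population_share(blonde_hair_corpus, 15)
# "More than fourfold" implies a population share below this value.
blue_upper_bound = implied_population_share(blue_eyes_corpus, 4)

print(f"{blonde_lower_bound:.1%}")  # 3.9%
print(f"{blue_upper_bound:.1%}")    # 14.0%
```

Read this way, the reported ratios are consistent with a population share of blonde hair a little above 3.9% and of blue eyes below 14%; the exact baseline values should be taken from the cited source rather than from this back-calculation.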
Comparable tendencies have been documented in studies of Chinese children’s books, where lighter skin tones and patterns described as “white preference” have been observed in award-winning titles (Yang et al., 2022). The Turkish data therefore align with broader cross-cultural patterns in which lightness and specific facial features function as recurring aesthetic markers. Beyond physical traits, the corpus shows that beauty becomes increasingly associated with marriageability in adolescence and adulthood. Similar linkages between female maturation and romantic or marital trajectories have been reported in East Asian contexts (Charlesworth et al., 2021), while analyses of Western fairy-tale corpora likewise demonstrate the centrality of romantic resolution (Baker-Sperry & Grauerholz, 2003). Recent research suggests a partial shift in some Western narratives toward representations emphasizing agency and self-sufficiency (Clarke et al., 2024).
Implications for Self-Worth and Bodily Diversity
Although the present corpus-based design does not permit direct claims about reader effects, prior research in body image and objectification theory provides a relevant interpretive framework. Studies indicate that repeated exposure to narrow aesthetic ideals is associated with appearance-based self-evaluation, body dissatisfaction, and diminished self-worth among children (Dittmar et al., 2006; Fredrickson & Roberts, 1997; Tiggemann & Slater, 2004).
In the present corpus, physical appearance is not presented as a neutral descriptor but is systematically linked to moral evaluation. Beauty co-occurs with virtue-related traits, while ugliness appears in negatively evaluated contexts (Section 2.3.3), establishing a dual framework in which children whose bodies diverge from the prototype may encounter implicit messages that position them as inadequate on both aesthetic and moral dimensions. Given that the idealized features identified in the corpus—blonde hair, blue eyes, slenderness, and tallness—are not representative of the broader demographic profile of Türkiye, this narrowing has implications for the representation of bodily diversity.
Rather than promoting diversity or self-acceptance, the corpus constructs a narrow ideal that may be perceived as culturally foreign and that positions many Turkish children as deviations from it. This ideal communicates implicit messages about desirability, belonging, and social value to children whose characteristics align with Türkiye’s majority demographic profile. The normalization operates not through explicit instruction but through the repeated naturalization of a limited aesthetic range as self-evidently desirable, a process through which children’s literature actively participates in the reproduction of beauty norms rather than merely reflecting them (Bartky, 1990).
Vulnerability, Constrained Agency, and Developmental Trajectories
In addition to aesthetic norms, the corpus reveals recurring patterns in which girls are positioned as vulnerable figures requiring protection. Childhood representations emphasize dependency, while later stages link physical maturation to marriageability and behavioral regulation (Sections 2.2.1–2.2.3). This developmental sequencing—innocence, appearance-centered evaluation, and relational positioning—constructs a trajectory in which physical appearance and relational status are foregrounded over autonomy and capability.
National reports and statistics document the continued presence of early marriage and protection-related concerns regarding girls in Türkiye (TBMM, 2010; TURKSTAT, 2021). While the corpus design does not establish causal connections between literary representation and social outcomes, the recurrence of such motifs suggests that children’s literature may participate in the discursive circulation of gendered expectations and may contribute to naturalizing early marriage as a normative developmental outcome. This positioning functions as normalization: the repeated presentation of limited subject positions as natural and inevitable constructs a girlhood defined less by possibility than by constraint, a trajectory oriented toward the fulfillment of gendered social roles rather than autonomy.
Directions for Future Research
The present findings open several avenues for further investigation. Future research could extend the present design by examining additional linguistic dimensions, such as verb-based agency and syntactic positioning, and by integrating visual analysis in multimodal studies of picture books. Longitudinal comparisons across time periods, publishers, and genres would help determine whether the identified patterns are stable or undergoing transformation. Reader-response studies examining how children interpret and negotiate these representational norms would provide valuable insight into reception processes and the extent to which literary prototypes influence self-concept and identity formation.
Conclusion
This study examined the linguistic representation of the girl figure in Turkish children’s literature through a corpus-based collocational analysis of physical descriptors. Three central representational patterns emerged. First, girlhood is structured along a developmental trajectory in which childhood is associated with smallness and domesticity, adolescence foregrounds appearance-based evaluation, and adulthood becomes increasingly linked to marriage-related expectations. Second, physical descriptions converge on a relatively homogeneous prototype characterized by beauty, slenderness, tallness, and light-colored features. Third, aesthetic evaluation frequently functions as a broader framework for moral assessment, with positive appearance descriptors systematically co-occurring with virtue-related traits.
These patterns are broadly comparable to those reported in international studies of children’s literature, suggesting that the Turkish corpus reflects widely documented representational conventions rather than an isolated national configuration. At the same time, the analysis reveals a marked divergence between literary prototypes and demographic characteristics in Türkiye: blonde hair and blue eyes are substantially overrepresented in positive evaluative contexts relative to their population prevalence (Nalçaoğlu, 2012). This demographic asymmetry, combined with the recurrent linkage between physical appearance and moral worth, indicates that the corpus constructs a relatively narrow aesthetic range that does not fully correspond to bodily diversity within the broader society.
From a theoretical perspective, the findings demonstrate how micro-level linguistic regularities may contribute to the stabilization of macro-level gender norms. In line with Davies’s (1993) subject-positioning framework, recurring collocational clusters delineate age-specific position options concerning how the girl figure should look and how she should be evaluated. Consistent with Butler’s (1990) notion of performative repetition, these patterned associations operate through iterative reinforcement rather than explicit prescription. In Foucauldian terms (Foucault, 1977, 1980), the convergence of aesthetic and evaluative descriptors can be understood as a form of normalization, whereby particular bodily traits are repeatedly presented as natural and desirable. By applying these frameworks to a large-scale Turkish-language corpus, the study extends their empirical grounding beyond predominantly Western datasets and illustrates the analytical value of corpus-based methods for examining ideological patterning in children’s literature.
As discussed in Section 3.4, the present design does not permit causal claims about reader effects; nevertheless, existing research suggests that sustained exposure to narrow aesthetic ideals may be associated with appearance-based self-evaluation among children (Dittmar et al., 2006; Fredrickson & Roberts, 1997; Tiggemann & Slater, 2004). The patterns identified here therefore warrant attention within broader discussions of representation, diversity, and inclusion in children’s media. The findings do not imply direct psychological outcomes; rather, they highlight how recurring descriptive structures may shape the symbolic environment within which young readers encounter models of femininity.
Several limitations should be acknowledged. The analysis focuses exclusively on textual collocational patterns and does not incorporate visual representations, syntactic role distributions, or reader-response data; accordingly, the interpretations offered here should be understood as analytically grounded but contextually bounded.
Future research may extend this design by incorporating verb-based agency, syntactic positioning, and multimodal analysis of textual–visual interaction in picture books. Longitudinal comparisons across publication periods and publishers would clarify whether the identified patterns are stable or undergoing transformation. Cross-cultural parallel corpora could further illuminate how local and global conventions intersect in shaping girlhood representations. Finally, reader-response studies would provide valuable insight into how children interpret, negotiate, or resist the aesthetic and evaluative patterns identified in the present corpus.
In sum, this study provides empirical evidence that physical appearance functions as a central organizing dimension in the linguistic construction of girlhood within Turkish children’s literature. By demonstrating how recurrent collocational patterns contribute to the stabilization of gendered expectations, the analysis underscores the importance of examining micro-level linguistic distributions in understanding broader processes of cultural norm formation. These findings may further inform efforts by publishers and educators to engage critically with narrow aesthetic standards and to promote more diverse representations of girlhood.
Footnotes
Ethical Considerations
This study is based exclusively on the analysis of published textual data from the Turkish Children’s Literature Corpus (TCLC), a publicly available digital corpus. No human participants were involved at any stage of the research, and no data were collected through interaction with individuals. Ethical approval was not required for this study, as it involved only the analysis of publicly available texts and posed no risk to individuals.
Consent to Participate
In accordance with Section 8.05 of the APA Ethical Principles of Psychologists and Code of Conduct, informed consent was not required.
Author Contributions
All authors whose names appear on the submission (Şahru Pilten-Ufuk, Gülhiz Pilten, Merve Yorulmaz-Kahve) (1) made substantial contributions to the conception or design of the work, or to the acquisition, analysis, or interpretation of data; (2) drafted the work or revised it critically for important intellectual content; (3) approved the version to be published; and (4) agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
