Abstract
The sonic aesthetics of technical communication have been underconsidered compared to their visual and textual counterparts. We highlight the aesthetic overlaps and contrasts between human-produced audio technical communication and Generative AI audio that mimics the aesthetic patterns of successful science communication podcasts, specifically Radiolab, while exploring the sonic dimensions of ethos, relatability, trust, and narrative. Applying close listening analysis, we explore two particular sonic aesthetic features: prosodic variation and shared conversational syntax. We argue that prosodic range and shared syntax contribute to an aesthetics of either overarching agreeableness and certainty or confrontation and uncertainty, and that this aesthetics in turn constitutes an orientation to scientific knowledge.
Technical and professional communication (TPC) takes place across many media, including audio-only modes, yet the styles and aesthetics of sonic technical communication have been underconsidered in comparison to their visual and textual counterparts. Relationships among texts and audiences are mediated, in part, through aesthetics (Kostelnick, 2020). Science communicators invite engagement, establish trust, and manage audience expectations via aesthetic choices and conventions. In the case of audio media, aesthetically pleasant sound design can elevate content that may seem dry or disengaging, enhancing arguments and enticing audiences to continue listening in part for enjoyment's sake. Aesthetically jarring or disruptive audio can surprise, unsettle, or derail attention, for better or worse. Prior work on aesthetics in scientific and technical communication has focused on visuals; the aesthetics of sonic TPC requires further investigation.
The award-winning radio show and podcast Radiolab is a quintessential example of sonic art combined with technical science journalism; its weekly episodes consistently leverage aesthetic experience as part of a playful exploration and explication of technical and scientific subjects. As “an exploration of science, philosophy and ethics using innovative composition and sound design,” Radiolab foregrounds aesthetic sonic elements to delight, surprise, and educate listeners (Radiolab, n.d.). Radiolab is a public-facing, audio-based technical communication project, and the aesthetic qualities of the show cannot be separated from its technical and scientific messaging.
But what happens when the carefully crafted aesthetic features of successful human-made content are appropriated by systems of Generative AI (GAI)? We begin to explore this question by comparing the aesthetic elements of Radiolab with those of similar audio content generated using the Audio Overview feature within Google's NotebookLM (NLM) and distributed as scientific, technical communication podcasts. Applying a close listening analysis to four episode pairs, we highlight the aesthetic overlaps and contrasts between each set of listening experiences. Comparing sonic details in this way allows us to think about the ways vocal aesthetics and oral delivery contribute to effective audio-based technical communication. In doing so, we offer an exploratory analysis and theorization of audio-based TPC aesthetics.
Marketed as an AI-powered “research and thinking partner” from Google, NLM allows users to upload or link to specific sources and leverage the latest Gemini AI models to engage with those sources. NotebookLM's “Audio Overview” draws on user-provided sources to export an AI-voiced dialogue summarizing their contents, “podcast style.” Following Sterne et al. (2008), Bowie (2012), Ceraso (2014), and Detweiler (2021), we understand a podcast as digital audio, distributed via RSS feed, and serialized (i.e., produced episodically, over time). On its own, audio output from NLM is merely an audio file until someone uploads it to an RSS feed. Accordingly, AI-generated podcast content relies on layers of human labor: not only in training and designing generative models, tailoring prompts, and (in the case of NLM) supplying source material, but also in curating, editing, and uploading the audio for distribution. Unlike the production of a human-made podcast, whose producers can be asked about their choices and processes, NLM's content generation lacks transparency; the system's proprietary nature means we can only guess at the intentions or designs underlying the ways its products are created (Bathaee, 2018). Despite the relative lack of accountability and intention behind them, we can nonetheless observe and interrogate the artifacts themselves as they circulate for human audiences. In analyzing the sonic aesthetics included in Radiolab and various products of NLM's Audio Overview, we begin to delineate the limitations, affordances, and implications of artificially generated audio.
In what follows, we engage with threads of existing scholarship on technical communication aesthetics, sonic rhetorics, and AI audio. Subsequent sections describe our sampling process, review our methods, and showcase key observations and analyses that emerged from our investigation. We then highlight possibilities for future research and pedagogical application, concluding with two main takeaways for TPC scholars, teachers, and practitioners. Our observations and recommendations here are far from comprehensive, and we invite more scholarly attention to the sonic aesthetic conventions of TPC, urging TPC pedagogues to consider how we might develop more robust sonic media and information literacies with our students, given the proliferation of AI-generated content across mediascapes. Ultimately, we argue for the value of teaching and practicing critical listening as part of more fully engaging with the aesthetic dimensions of effective sonic technical communication artifacts, whether as audiences, communicators, or both.
Literature Review
Designing aesthetic TPC content requires a balance between convention and novelty, a careful leveraging of established cultural means of understanding to “foster creativity, one of the aspects of technical communication that makes it human and satisfying, to both designers and audiences alike” (Kostelnick, 2020, p. 24). Aesthetic choices mark intent and emotion, convey trust, build narratives, and keep audiences engaged; aesthetic experience and vocabulary (Ishizaki, 2011) are integral to building relationships across TPC practitioners, artifacts, and audiences. Functionally, aesthetics feed into our “cultural knowledge” over time, in a “process of acculturation” that helps us know how to interpret or use scientific and technical artifacts (Kostelnick, 1995). Beyond this, aesthetics rely and build on the imagination (Kant, 1952), engaging the senses and stretching us beyond ourselves toward trust in the epistemological and ontological orientations of others (Carusi, 2008), providing vivid, meaningful context through storytelling (Barton & Barton, 1988; Jones & Walton, 2018; Vealey & Gerding, 2021). When stories are presented sonically, audiences’ sensory engagement and emotional investment can be even deeper and more impactful (Pettman, 2017; Swiatek, 2018). The intimacy of human or human-seeming voices has great communicative and aesthetic power.
Aesthetic-focused work within TPC, even when framed using multimodality, has prioritized the visual (Hardesty & Hollinger, 2020; Ishizaki, 2011; Kostelnick, 1995; Slater & Rosselot-Merritt, 2024; Welhausen, 2018). While sonic composition has been taken up by scholars across the overlapping fields of new media, sound writing, and podcast studies (Anderson, 2014; Beckstead et al., 2024; Blevins, 2019; Bowie, 2012; Ceraso, 2014; Clarke & Bjork, 2023; Knight, 2013; McKee, 2006; Selfe, 2009; Sterne et al., 2008), there has been less attention to the specific affordances and qualities of sonic modalities in TPC. As Wright (2019) observes, though it may be “tempting to dismiss the rhetorical dimensions of sound as something apart from technical communication […] it is nearly impossible to separate the rhetorical effects of sound from their signaling function” (p. 367). Sonic aesthetics have rhetorical impacts: they build conventions, elicit trust, and nurture cultures, all of which audiences use in interpreting interrelated texts and making sense of the world, as sound scholars have repeatedly recognized (Eckstein, 2018; Kjeldsen, 2018; Weidman, 2015). Features such as tone, pitch, tempo, timbre, and other aspects of aurality are also powerful carriers of intent and meaning (Fjellro, 2025; Juslin & Scherer, 2008; Kišiček, 2018; Ogden, 2006), as well as contributors to aesthetic expression and experience (Bjork, 2021).
Despite these threads of scholarship addressing aesthetics within TPC, there is a clear deficit of scholarship analyzing sonic TPC and aural aesthetics; many scholars and science communicators see an opportunity for more (Bjork, 2025; Husein et al., 2019). We have an opportunity to examine how the sonic aesthetics of human-generated audio are reproduced within GAI audio. Initial studies on AI voices specifically and their use within AI-generated podcasts not only indicate some optimism for the potential of public, multimodal, and “customizable” or “personalized” science communication but also point to the inconsistencies, hallucinations, mispronunciations, and mischaracterizations that haunt AI-generated podcast content and much AI content in general (Banihashemi et al., 2025; Do et al., 2025; Jordan et al., 2025). The human aesthetics that Kostelnick (2020) notes as essential for meaningful TPC are seemingly being mimicked within GAI TPC podcasts, but without the same intention, imagination, or accountability.
Methods
Though it is not always easy to detect, AI-generated content (in text, audio, video, and other modes) is widespread across digital platforms. In gathering a reliable sample of GAI audio for our comparison, we turned to Listen Notes, one of the largest podcast-specific searchable databases. Listen Notes maintains a list of AI-generated (or, in their terms, “fake”) podcasts (Listen Notes, 2025), which as of late 2025 included over 24,000 entries. From this starting point, we identified 245 that, based on their titles (e.g., Exploring the Cosmos or The Isotope Effect), centered on science-related topics. We narrowed our sample further by accessing the podcast feeds for each and eliminating shows with fewer than five episodes, shows with episodes created in a single download dump with no continuous updates, shows summarizing a single daily news brief, and those with descriptions that revealed a pseudoscientific orientation.
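The exclusion steps above can be read as a simple filter over candidate shows. The following is only a minimal sketch of that logic, not the procedure we actually ran: our screening was done by hand. The `Show` record, its field names, and the one-day `window_days` threshold for detecting a single dump are all hypothetical, and the two boolean flags stand in for judgments a human listener would have to make.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Show:
    """Hypothetical record for one candidate podcast feed."""
    title: str
    episode_dates: list[date]   # publication date of each episode
    daily_news_brief: bool      # hand-coded: only summarizes a daily news brief
    pseudoscientific: bool      # hand-coded: description signals pseudoscience


def is_single_dump(dates: list[date], window_days: int = 1) -> bool:
    """Treat a feed as a one-time dump if all episodes fall within window_days.

    The window is an assumed threshold chosen for illustration only.
    """
    if not dates:
        return True
    return (max(dates) - min(dates)).days <= window_days


def keep_show(show: Show, min_episodes: int = 5) -> bool:
    """Apply the four exclusion criteria described in the text."""
    if len(show.episode_dates) < min_episodes:
        return False
    if is_single_dump(show.episode_dates):
        return False
    if show.daily_news_brief or show.pseudoscientific:
        return False
    return True
```

A feed with five weekly episodes and neither flag set would pass this filter, while a feed whose episodes all share one publication date would not.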
From this subset, we selected a handful of podcasts that demonstrated (1) scientific exploration as their focus, (2) an intended general audience, (3) a sustained output of episodes, and (4) substantial episode length. Ultimately, we chose to analyze episodes from four AI-generated podcasts: Ask Science, Science Unveiled, Simply Science, and The Dead Scientists Podcast. To keep our focus on aesthetic features more than differences in topic or content, we first listened exploratorily to segments of roughly five minutes from older and newer content in each podcast, identifying target episodes that spanned a range of scientific disciplines. We then selected episodes on topics that we could also find in Radiolab's archives. Matching subject matter in this way allows us to directly contrast aesthetic choices rather than the specifics of content. Choosing older and newer episodes from the Radiolab catalog helps us consider science communication aesthetics across a range of hosts and production styles.
We center Radiolab in our analysis in part because of its plentiful archives and its undeniable impact on the culture and aesthetics of podcasting. Since its founding in 2002, Radiolab has fundamentally shaped the form and function of public-facing science communication podcasting. Because (the far newer) GAI technologies draw on existing public recordings across time, it's likely that GAI audio has pulled from Radiolab's archives across all 20+ years of its production. No form of GAI audio has yet grown to the level of recognition and influence that Radiolab has achieved, so the available selection of GAI artifacts for this research was much narrower.
The four pairs of GAI podcasts and Radiolab episodes are listed below:
- - - -
Once our episode set was established, we each listened closely to all episodes, noting what stood out to us aesthetically (Schulze, 2021). Close listening as methodology requires attentive engagement with the sonic dimensions of a recording, including (but not limited to) its acoustic qualities, noises included in a recording but not transcribed to text, languages, genres, features of the voices and ambience, alongside characteristics such as accents, pauses, and background noises (Hoffman, 2021). In our application of this method we took care to listen past the easy points of aesthetic difference (such as the presence of musical accompaniment, the inclusion of interviewees’ voices from diverse linguistic backgrounds, the relative length and complexity of each episode, etc.) in order to attend more closely to the sonic micro-aesthetics such as vocal tone, rhythm, pronunciation, pitch, pacing, emphasis, and structure.
We annotated copies of the official WNYC transcripts for the four Radiolab episodes and used basic Otter.ai transcripts for the NLM content. The transcriptions we include in this article for the GAI episodes have been edited for accuracy and readability. In the excerpts we include below, our annotations are marked in curly braces { } for clarity. Those in [ ] are from the official Radiolab transcripts; italics in their transcripts denote recorded interviews.
After independently listening closely to each episode and annotating our transcripts, we compared notes and collated our observations, negotiating to a consensus on what was most significant or worth deeper discussion. Across all eight episodes, we noticed the strongest aesthetic contrasts between human and GAI audio regarding speakers’ pacing, cadence, emotional expressiveness, and levels of agreeableness versus expressions of doubt or uncertainty. From among these themes, we outline two dimensions of sonic aesthetics particularly salient for audio TPC: prosody and shared syntax.
Observations and Analysis
Among the variances we noticed across our small sample, two points of comparison seemed most aesthetically significant: (1) differing amounts of prosodic variation, including pitch, pacing, tone, and emotional register, and (2) similar conversational patterns and the use of what we term “shared syntax.” We explain each along with examples, noting how sonic aesthetics in both categories contribute to contrasting ontological orientations toward science, along with varying emotional and narrative approaches to TPC.
Prosodic Variation
Prosody is not simply a nice embellishment to audio-based TPC: its aesthetics help shape content into accessible narrative forms. In linguistics, prosody refers to the ways intonation, stress, rhythm, loudness, emotional tone, pitch, lengths of sounds, timbre, pacing, and the like contribute to overall impressions of what is being communicated. Eckstein (2017) divides these aural strategies into diegetic and nondiegetic, observing that “both diegetic and non-diegetic sound mutually inform one another … [and] diegetic sound is tied to the progression of a narrative …[whereas] non-diegetic sound cues listeners to the proper sentiment to interpret the events” (p. 665). Both are at work in sonic TPC, especially in multimedia contexts where audiences expect nondiegetic elements of delight, playfulness, or emphasis along with diegetic verbal elements that may seem dull or unimportant otherwise. Alongside syntactic content, prosody can contribute to rhetorically cohesive—if layered and messy—significations.
The rhetorical aesthetics of sound design and attention shaping that Eckstein (2017) notes are largely absent or muted in the AI podcasts in our sample. The variations in volume, pitch, and pacing that are present are not as wide-ranging, sustained, or distinct as those in Radiolab. The NLM voices are not monotone, but neither are they as fully expressive as the hosts and guests of Radiolab. From Truesdale and Pell (2018), we understand that “greater maximum pitch and variability, greater loudness, and faster speech” typically signal more passionate, engaged conversation (p. 126). We found that the AI voices remained in the middle range of intonation and inflection with predictable, constrained rhythms and patterns.
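Our analysis relied on close listening rather than acoustic measurement, but the cues Truesdale and Pell (2018) name could in principle be quantified in follow-up work. The sketch below is one hypothetical way to do so, assuming a pitch contour (fundamental frequency in Hz, one value per analysis frame, with 0 marking unvoiced frames) has already been extracted by some other tool. The function names, the input format, and the choice of semitones relative to the speaker's median pitch are our assumptions for illustration; loudness is omitted because it would require the audio signal itself.

```python
import math
import statistics


def hz_to_semitones(f0_hz: float, ref_hz: float) -> float:
    """Convert a frequency to semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def prosody_summary(f0_hz: list[float], word_count: int, duration_s: float) -> dict:
    """Summarize maximum pitch, pitch variability, and speech rate for one excerpt.

    f0_hz: per-frame fundamental frequency; 0 is assumed to mean "unvoiced".
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2 or duration_s <= 0:
        raise ValueError("need at least two voiced frames and a positive duration")

    # Express pitch relative to the speaker's own median so that speakers
    # with different baseline pitch can be compared on the same scale.
    ref = statistics.median(voiced)
    semitones = [hz_to_semitones(f, ref) for f in voiced]

    return {
        "max_pitch_hz": max(voiced),
        "pitch_range_st": max(semitones) - min(semitones),
        "pitch_sd_st": statistics.stdev(semitones),
        "words_per_minute": 60.0 * word_count / duration_s,
    }
```

Under these assumptions, a voice that stays in the middle of its range would show a small `pitch_range_st` and `pitch_sd_st`, while a more expressive voice would show larger values on both.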
In contrast, the human hosts of Radiolab articulate with great prosodic variation. Throughout our listening experience, we heard several ways in which our sampled Radiolab content demonstrates a wider range of aesthetic elements in pacing, pitch, signs of embodiment, emotional tone, and types of responses. In “Zoozve” (February 2024), for example, we noted several gasps, laughter, and even a full yawn from Latif Nasser in the opening moments of the episode. These inclusions and other variations of tone and pitch signal anticipation and excitement:
LATIF: Okay, wait. But he did actually …
LULU: [gasps {small, short, anticipatory intake of breath}] Yes?
LATIF: … say one thing.
LATIF: Can you tell me if it's—in which direction it's leaning?
GARETH WILLIAMS: It's leaning for.
LATIF: [gasps {–big, open, prolonged “ohh”}]
LULU: [gasps {–more sudden, higher pitched excited gasp}]
LATIF: Really? (Radiolab, 2024)
In both the human-made and AI podcasts, we heard sonic signs of embodiment: audible breaths, gasps, mouth sounds, and other indications of physicality. However, the relatively unvarying and sometimes uncannily flat simulated sounds of “humanness” in the AI podcasts stood out in contrast to the far more exuberant and distinctive versions from Radiolab's hosts. When we heard embodied noises such as laughter or breath in the sample of AI podcast content, these seemed more uncanny than realistic. One brief example from Science Unveiled, “The Pluto-Neptune Illusion” (February 2025), in which the male voice vocalizes a monotone “ha ha” in response to the female voice's comment about having seen “way too many disaster movies,” stands out as especially off-putting. The lack of prosodic range makes this response sound empty and flat.
We also found the pacing of the AI audio unvarying, almost relentless in its momentum. The pacing and tempo sounded more expansive and free-ranging in Radiolab's audio. A few key excerpts help us illustrate these differences. For example, in Simply Science, “Colors of the Noise Explained,” the AI voices conclude with “Thank you for joining me” and “Of course this was fun,” but the second voice does not pause between “Of course” and “this was fun.” Rather than a gracious comment in response to the other host's gratitude, we hear a strangely emphatic statement with a continuous falling intonation about the conversation having, “of course,” been fun, with a tone of inevitability rather than of collegial enjoyment. The simulated tone and rhythm approximate a friendly exchange, but in a thin and uncanny way. To our ears, the relentless, incongruous pacing of the comment disjoints, rather than connects.
In contrast, during Radiolab's follow-up to their 2012 “Colors” episode, “Rippin’ the Rainbow an Even Newer One” (2018), we hear pauses and varied rhythms that bounce and stutter, start and stop, then start again:
ROBERT: That's what's called an umwelt. Like, every animal in the world lives with its own senses in a world that is defined by those senses. And {…} you will never know what a bat knows when it echolocates. You will never know what a deer, {slightly sad, wavering} when it looks out—because we know that deer don’t see orange. That's why all the hunters wear bright orange and yellow. They just don’t see that range.
JAD: Huh! {a soft, short murmur}
ROBERT: Do they see more of something else? I don’t know.
JAD: Well, how do—what's it? Umwelt?
ROBERT: It's U-M-W—it's a great word, umwelt. It's the word that says… {long, pregnant pause} that you are limited by what you can feel, touch, smell, see…
JAD: Yeah. On some level, I mean, I feel like that's a problem that exists even between people. {Krulwich chuckles}
The delight and solemn struggle expressed through these sonic aesthetics characterize exploration documented through story. Prosody gives shape to the technical content, tracing a narrative based on an ongoing search for meaning. Slater and Rosselot-Merritt (2024), drawing on Kenneth Burke's rhetorical aesthetic of form, argue that “because form emphasizes an audience's experience with a text or work of art, the delight and fulfillment that audiences find in a work does not depend solely on a mere recitation of facts or information but also comes from watching a ‘drama’ unfold through the creation and fulfillment of desire” (p. 62). Employing a range of prosodic features, Radiolab does more than summarize scientific information: it gives that information dramatic and relatable narrative forms. Rather than passively accepting a static overview of scientific knowledge, Radiolab listeners are invited to wander and wonder alongside the hosts, seeking meaning beyond mere facts and occasionally sitting with uncertainty.
Conversational Rhythm and Shared Syntax
In both sets of podcasts, we heard amiable conversational rhythms and aesthetically pleasant exchanges between hosts. Such exchanges, whether scripted, constructed in editing, or captured from interviews, are a foundational ingredient in many podcasts and digital media in general. Conversation in podcasting becomes a powerful tool for technical communication, as the interlocution of voices creates opportunities to “combine both depth and casualness” (Beckstead et al., 2024, p. 149). Popular science podcasts make use of this depth and casualness to take complicated ideas and render them approachable and engaging to their audiences. Spinelli and Dann (2019) specifically credit Radiolab with producing and promoting a markedly less didactic, far more playful and relatable form of science communication for the masses. Conversational audio contains simpler vocabulary, shorter sentences, and messier structure than written text, all of which can indicate the ongoing development of ideas through dialogue rather than a precise, final, definitive articulation (Beckstead et al., 2024). Brevity and quick exchange together construct an ethos of accessible expertise.
Aesthetically, the prosody of podcasting conversation is often characterized by regular rhythms of dialogue between speakers along with upbeat, congenial handoffs. Vocal tone and timbre in the exchanges we listened to were largely positive and upbeat, contributing to a sense of rapport between cohosts and of favorable, interested attitudes toward the subject matter. Vocal patterns, rhythms, language choice, and paralinguistic sonic cues all work together to construct an affect of awe (Juslin & Scherer, 2008). The affective impacts of pitch contouring and perturbation, intensity, volume, rapidity of onset, velocity of speech, and pauses combine to produce what we recognized as expressions of curiosity, interest, enthusiasm, and delight in both AI and human-created podcasts.
Beyond general conversational rhythm, we noted another common feature: what we have termed “shared syntax.” Whether edited together from separate recordings (as is often clearly heard in Radiolab's work) or generated that way via prompts to an AI system, shared syntax is the pattern of conversational flow where one speaker hands over the narration to another, while keeping grammatical structures, syntactic frames, and overall flow carefully intact. Through repeated phrases, finishing another's sentence, or other indications of shared syntax, the internal consistency and rapport built by this layering of voices communicates cohesion to listeners. Radiolab's “Zoozve” (February 2024) shows such weaving between the host Nasser's narration and interviewee Paul Wiegert's explanations:
LATIF: … you have to understand this one fact about the solar system.
PAUL WIEGERT: It's an ironclad rule of our solar system …
LATIF: That every celestial body moves in an orbit. And even though it can get gravitationally nudged around by other things near it, it primarily orbits one thing.
PAUL WIEGERT: And so the moons orbit planets, the planets orbit the Sun.
LULU: Wait, but moons—doesn’t a moon technically orbit the planet and the Sun?
PAUL WIEGERT: The answer is sort of technically, but we’re actually talking about something different.
In the Ask Science episode “The Secret Science of Fruit Ripening” (2025), we hear a similar conversational exchange characterized by shared syntax:
SPEAKER 1 (male voice): If it's climacteric bananas, avocados, pears, mangoes, and it's a bit firm, that's okay. You know it can ripen at home.
SPEAKER 2 (female voice): Exactly. You can let it sit. Maybe use the bag trick if you’re impatient.
SPEAKER 1: But if it's non-climacteric berries, grapes, cherries, pineapple, watermelon…
SPEAKER 2: …Then you need to choose carefully at the store. Pick the one that looks, feels, and maybe smells ripe already, because that's as sweet as it's gonna get.
SPEAKER 1: What you see is what you get, sweetness wise.
SPEAKER 2: Pretty much sums it up.
Both excerpts above employ shared syntax and pleasant interactions between hosts, though marked by different prosodic features and in service of different goals. In Radiolab's content, shared syntax, with broader prosodic ranges of expression, builds an engaging narrative. Potential disagreement and expressed confusion come through even when multiple voices share the same sentence structures. In contrast, the AI voices rarely, if ever, express doubt or uncertainty. The NLM audio employs shared syntax in service of agreeableness, favoring pleasant exchange, tidy conclusions, and the flat, even prosody noted earlier.
Beauty in Uncertainty
We each came away with a strong impression that, despite the casual, conversational similarities between Radiolab episodes and our sampled AI podcasts, Radiolab content contained much more frequent instances of disagreement and vocalized confusion when compared with the far more agreeable, affirmation-heavy AI content. Though there aren’t simple linguistic, paralinguistic, or aesthetic markers of agreeableness or disagreeableness, indications of agreement or disagreement, consent or challenge, are signaled along a spectrum by combinations of word choice and delivery (Ogden, 2006). We noted many instances of cheerful, unquestioning affirmation throughout the shared syntax and prosody of the AI podcasts. Such vocal agreeableness may be part of what many have recognized as the “sycophantic” nature of large language models in general (Sharma et al., 2023; Sun & Wang, 2026). Even when an AI voice expressed something we heard as surprise, the affect seemed flat. We noted no welling sadness nor soaring enthusiasm, no halting confusion nor deeper clarity. Beneath relatively flat prosody, we sensed flatness in ideas. In our sample, at least, the GAI audio neither modeled a full range of sonic aesthetics nor produced scientific communication beyond basic positivist summaries.
For example, in The Dead Scientists Podcast's “Instincts in Evolution: Darwin on Natural Selection and Behavior” (September 26, 2024), we hear the following exchange:
SPEAKER 2 (female voice): And Darwin, being the keen observer he was, didn’t stay away from these, {slight pause} shall we say, morally ambiguous examples. In fact, he used them to challenge us humans who tend to impose our moral judgments onto the natural world.
SPEAKER 1 (male voice): Good point, nature's playing by a different rule book, often with surprising, even unsettling results. {inhale} But Darwin, he didn’t just focus on individual animals, right? He was also fascinated by how instinct plays out in those complex social structures like ant colonies. Am I right?
SPEAKER 2: You’re spot on. And that leads us to another captivating example, the case of slave-making ants.
SPEAKER 1: Slave-making ants, ok now you’ve got my attention.
In contrast, and in alignment with the wider range of positionalities and emotional expressions we heard from Radiolab, its speakers do challenge and question each other earnestly and often. From a midpoint of the episode “Rippin’ the Rainbow an Even Newer One” (2018), the hosts discuss how scientific understandings of vision have changed:
JAD: Okay, so they {meaning mantis shrimps} still see colors that we don’t see, but they might not just be seeing as many {emphasis/stress on many} colors as we thought. Like, maybe their rainbow is more a series of {stumbles through enunciating this confidently} rather focused, discrete bands of color with not a lot in between {isolated musical tones representing each color, but don’t resolve into a harmony}.
ROBERT: I am very, very—well, that's actually extraordinarily puzzling {rushes through the extraordinarily puzzling}. They’re given the equipment that we use to see {pause} {slower} various shades of color, and they don’t use it to see shades of color? What do they use it to see?
JAD: Well, they use it to—{uh, then restarts with a tone of doubt} I mean, this is all speculative, but, {uh} you know, Tom Cronin was telling us in the fish shop that, like, the science seems to be heading towards this idea {pause} that they use colors to communicate. And if that's the case, like, they don’t need to see {emphasis on see} all the colors {a little somber tone to his voice, almost sad}, they just need to see the ones that mean {emphasis on mean} something.
To meaningfully communicate complex scientific and technical information, Radiolab embraces and wields emotional markers of confusion, contradiction, and hedging. The show's orientation toward science and science communication is explicitly framed as evolving and ongoing, all the more exciting and engaging because of the aesthetic mess such an orientation allows. Radiolab's audio content incorporates guests, interviews, field recordings, background audio, and multiple layers of aesthetically thick sonic arrangement. Taking up a range of positionalities, hosts and guests disagree, interrupt, misspeak, and occasionally speak over and under one another in ways that convey eagerness to participate in the messiness of science and science communication. Through aesthetically varied audio storytelling, where hosts regularly position themselves and their guests as limited, potentially wrong, and in the middle of an ongoing journey of discovery, Radiolab invites audiences to share a likewise curious, nimble approach to scientific understanding.
In the sonic aesthetics of these TPC artifacts, we can sense each podcast's orientation to scientific knowledge. Either, as the GAI voices seem to offer, science is largely settled, contained, and easily summarizable, or, as Radiolab indicates, true scientific understanding dances and shifts as we pursue it, opening more mysterious questions with every set of answers, inviting endless, ongoing exploration.
Applications for Teaching and Praxis
Understanding the rhetorical and ontological impacts of aesthetics and the practical value of audio storytelling can only benefit students as they learn to communicate clearly and effectively. Teachers of TPC can model productive listening skills and provide opportunities for students to practice careful listening in a variety of ways. We might spend time in class listening carefully, helping students to focus and take time to actively hear and comprehend a range of sounds. Assigning podcast episodes as “readings” can help students explore the ways TPC is enacted across media. Starting with in-person listening activities and soundscape analysis assignments such as those outlined in Angeli (2024) and Blevins (2019), we can help students recognize, analyze, and understand a wider range of voices, sounds, and audio-based rhetorical choices. Further sound-focused activities such as Droumeva and Murphy's (2018) aural literacy assignments or Cicchino's (2020) sound-mapping exercises ask students to attend carefully to the affordances of digital audio, practicing not only listening skills but also visualization, analysis, and argument. In one of our recent courses, students completed a sound-mapping assignment along with charts of each audio “scene,” using both to write a focused rhetorical analysis of a podcast episode. Assignments like these give us practice extending the concepts of visual topoi (Welhausen, 2018) toward aural or sonic topoi.
In more general TPC courses, we might draw on McKee (2006), Bjork (2021; 2025), and Eberly (2025) to give students relevant and useful vocabulary for talking about and analyzing the aesthetics and rhetorical impacts of audio. Depending on the course, we might also include exam questions that require careful, critical listening to and analysis of 2–3 minutes of relevant podcast audio (with headphones and listening devices allowed as an option) to center sonic ontologies and modes of inquiry in our assessments.
Limitations
This study has been exploratory in nature; we acknowledge a range of limitations to our approach. We trusted the data collection and vetting processes of Listen Notes as the foundation of our sampling. We did not fully consider NLM's newer generative features and more diverse voice options (Malik, 2025), nor has our analysis considered the proliferation of AI audio-generating tools beyond NLM. Our sample of only eight episodes is a tiny representation of sonic science and technical communication. Furthermore, the GAI episodes we reviewed and sampled from were generally far shorter than Radiolab's average output.
Our methods of close listening are constrained by our fallible human perception, attention, and interpretative ability. Representations of what we heard are also limited by the constraints of writing for a text-based journal, as we rely on the diegetic to describe the nondiegetic and inevitably fall short. Transcriptions can begin to capture the layers of sound that texture the sonic landscapes of these podcasts, but the full sonic experience of listening indisputably extends far beyond the transcribed linguistic text.
Ethical concerns also arose from our research methods. In the process of conducting this research, we have grown the listenership for four GAI podcasts, which could encourage the proliferation of such content. Responsibility for GAI content creation, publication, information validation, and circulation is too easily washed out in the black box of these technologies; it takes real focus to remember and account for the materiality and human labor behind it all. The ongoing energy and enthusiasm for proliferating AI technologies come at environmental and social costs that are difficult to render visible but nevertheless crucial for us to consider (Aguilar, 2025; Edwards, 2025; Sano-Franchini, 2025). We hope that, as these technologies continue to change, further research on their contributions to TPC and aesthetics will help us refine reasonable and careful approaches beyond what we have begun to do in this essay.
Conclusions
Well-designed and carefully composed audio can build a strong sense of ethos and develop trust among technical communicators and audiences. Aesthetics, sensation, and affect are not mere adornments or superficial extras. Aesthetics serve meaningful rhetorical purposes, helping technical communicators of all kinds relate to their work and to each other. We find that GAI audio can mimic some of the prosody and shared syntax of sonic TPC aesthetics, but with little variance and little function in service of the narrative, emotional, and ontological orientation of science communication. What results is a surface-level facsimile that orients listeners to science as a knowable, containable discipline rather than a wide-ranging set of practices largely marked by unknowns and processes of discovery.
Our recommendations are twofold: (1) those who use GAI must clearly disclose that use for ethical reasons, and (2) all of us should work to tune and refine our own sonic literacies.
First, the humans proliferating AI content must disclose how such content is created and what sources it draws from. Anyone who uses any kind of AI system to generate audio should openly disclose their processes and mark such content as artificially voiced. Honestly disclosing one's selection of GAI engines, input materials, prompts, and the instructions provided to GAI will help build trust and allow more informed engagement with the content being distributed. Moreover, given the labor and resources required to “improve” the aesthetics of GAI audio in the long term, it may be more ethical and efficient to invest in human talent.
Second, we emphasize the importance of a carefully honed ear and intentional, critical listening practices as key building blocks of media literacy. There is much about audio content like podcasts that is messy, disorienting, confusing, and complicated, requiring time and energy to engage with. Nevertheless, tuning our ears to rich and varied audio aesthetics is valuable. Allowing sonic literacies to atrophy means surrendering curiosity and open exploration in exchange for superficial mastery, and prioritizing stability and sameness over the messy complications that offer opportunities for growth.
Inattention to sonic aesthetics in technical communication, or worse, willful disregard for them, risks alienated audiences, lost opportunities, and missed connections among texts, communities, and conventions. Listening, rather than hearing, is an essential intervention for practitioners and pedagogues of TPC. Allowing ourselves to ignore these aspects of communication will, over time, diminish our critical listening skills and result in lower-resolution experiences with the otherwise richly meaningful aesthetics of sonic content.
Footnotes
Acknowledgments
The authors greatly appreciate our colleagues who have indulged us in meandering scholarly discussions and supported our work in writing groups and beyond: Angela Beck, Shannon Lodoen, Phil Chauveau, and Jonathan Adams.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sets used in this article are publicly available in the corresponding citation.
