Abstract
This article introduces ‘prompt culture’ as an emergent ecosystem at the intersection of artificial intelligence, language, market economy, and visual culture. Drawing on Vilém Flusser’s hybrid scholarship, this study seeks to make a theoretical contribution to the field of AI critical studies by positioning prompt culture between textual sequences and visual simultaneity. While Flusser observed that lines (text) historically emerged to interpret surfaces (images), prompt culture represents an unprecedented inversion, allowing texts to generate images through ‘technical images’: visual outputs that signify concepts rather than phenomena. The article maps prompt culture’s multidimensional ecosystem across four interconnected domains: divergent professional practices among prompt engineers and artistic creators; markets and industries evolving around prompts; linguistic and rhetorical dimensions of prompting as an inverted ekphrasis; and creation of ‘hyper-technical images’, technical images of technical images. This article establishes a theoretical foundation for understanding how AI image generation technologies reshape relationships between textual and visual meaning-making.
Introduction
In the early decades of the 21st century, a shift has emerged in how humans interact with visual technologies to create images. Where once we manipulated tools directly, dragging brushes across canvases, or clicking buttons on a camera, we now increasingly operate through a peculiar linguistic intermediary: the prompt (Liu and Chilton, 2022; Oppenlaender, 2022). While commonly defined as a text in natural language used to create images with an AI system, prompts are far from ‘natural’. They constitute a multilayered techno-linguistic formation that is changing the relationship between images and words and between humans and machines. The practice of writing textual inputs for AI generative models represents more than a simple technological shift; it signals the emergence of what I term ‘prompt culture’: a multifaceted ecosystem encompassing the practices, markets, communities, aesthetics and theoretical implications of text-based image generation.
Prompts operate across multiple registers, functioning simultaneously as linguistic artifacts, technical instruments, and cultural mediators. As linguistic artifacts, prompts employ natural language, i.e. the language humans use to communicate with each other, in order to communicate with machines. These machines, LLMs, cannot understand this language hermeneutically, the way humans do, but operate according to statistical probabilities and pattern recognition (Bender and Koller, 2020; Shanahan et al., 2023). While humans use language embedded with sociopolitical and cultural meanings, machines process it as statistical relationships between vectors in a latent space. This misalignment reveals the paradox at the heart of prompt culture: the more successfully we communicate with machines through natural language, the more we risk obscuring the very meaning of being human, that is, the plurality that language conveys.
As technical instruments, prompts function by way of commands or code that require a set of specific skills, thus establishing prompting as a new professional practice that needs to be learned and requires specialized knowledge (Oppenlaender et al., 2025). This need has constituted a new industry, with new professional categories, such as prompt engineers, and new markets where prompts can be bought and sold. As cultural mediators, prompts invoke and transform a rich tradition of verbal image-making. They echo the rhetorical practice of ekphrasis – the vivid description of artworks, a device that allowed an audience in antiquity to visualize in the mind’s eye an artwork they could not see. Prompts, on the other hand, present a form of inverted ekphrasis, as words are used to create an image that materializes in front of the eyes in seconds. Thus, within prompt culture, words no longer describe but rather instantiate visual representation.
These shifts carry profound implications for culture in general and visual culture in particular as AI-generated images become increasingly integrated into our cultural ecologies. While traditional art pedagogy maintains clear distinctions between tools, materials, apparatuses, and artifacts, AI image generators collapse these distinctions. The prompt functions simultaneously as a tool (like a pencil or brush) and a material (like paint or clay), while the model operates as the apparatus (like the camera) for mediating visuality, introducing a new participant into this ecology: the algorithmic interpreter that transforms linguistic descriptions into visual outputs.
The theoretical anchor for this article draws on the hybrid scholarship of Vilém Flusser, who wrote in the 1970s and 1980s, among other things, on the coevolution of humans and media. Following Flusser’s (2002) distinction between line and surface, between textual sequences and visual simultaneity, I argue that AI image generation represents an unprecedented event in the relationship between texts and images. While Flusser observed that lines (text) historically emerged to interpret and analyse surfaces (images), prompt culture inverts this relationship, allowing lines (text) to generate surfaces (images), thereby creating new temporal and cognitive dynamics in visual production. This article maps the contours of prompt culture across four interconnected domains: the professional practices emerging around prompting; the markets and industries that commodify it; its rhetorical and linguistic dimensions as a form of inverted ekphrasis; and its production of what I call ‘hyper-technical images’ – technical images derived from technical images that add another layer to Flusser’s framework.
Theoretical Anchor: Lines, Surfaces, and the Zero Dimensionality of the Latent Space
Vilém Flusser’s media archaeology of lines and surfaces provides the theoretical framework for this article’s analysis of how AI image generation technologies are reshaping the relationship between textual and visual modes of meaning-making. Drawing on the phenomenological difference Flusser traces between lines and surfaces, this article argues that prompt-based image generation creates a ‘double net’ effect: a structure of compounded conceptual filtering that both enables and constrains the (impossible) translation between text and image in ways Flusser’s framework illuminates.
Situating Flusser’s work, however, requires attention to both his method and his reception. The label ‘media theorist’, now commonly applied to Flusser, emerged primarily through the posthumous framing of his German-language publications. Flusser himself did not use this designation; he saw himself mainly as a philosopher (Finger et al., 2011). His approach is better understood through his practice. In ‘Towards a Theory of Techno-Imagination’, for instance, Flusser explicitly employs ‘phenomenological intuition’ to analyse the relationship between photographers and cameras (Flusser, 2012: 197). This phenomenological attention to lived experience, clearly visible in Gestures (Flusser, 2014), operates alongside, not in opposition to, his analysis of technical apparatuses. Flusser’s inquiry moves between the structure of apparatuses, the consciousness of those who operate them, and the embodied ways in which they do so.
This hybridity extends to his research methodology. Unlike German media theory in the Kittler tradition, which adopts what Wellbery (1990: xii) characterizes as a ‘post-hermeneutic’ stance, bracketing phenomenological experience in favour of the autonomous logic of technical systems, Flusser insists on holding both together: the apparatus and the consciousness that encounters it. It is this dual attention that makes his framework productive for analysing prompt culture, where the lived experience of writing a prompt and the computational processing of that prompt by an apparatus must be theorized simultaneously. This article draws primarily on three of Flusser’s texts: Writings (Flusser, 2002), Towards a Philosophy of Photography (Flusser, 2007), and Into the Universe of Technical Images (Flusser, 2011), because these works develop his most sustained treatment of the line-surface dialectic and the concept of the technical image.
By positioning writing (lines) and images (surfaces) within a historical dialectic, Flusser reveals how media forms structure thought itself, creating distinct modes of being-in-the-world. Flusser’s framing of this dialectic between image and concept resonates with, but differs from, other accounts of the image-concept relationship in the history of science and technology, notably Peter Galison’s (1997) analysis of how ‘image’ and ‘logic’ traditions in physics produce knowledge through their interaction rather than their opposition. Where Galison examines how material instruments mediate between pictorial and propositional reasoning within scientific practice, Flusser generalizes the dialectic to the level of civilizational modes of consciousness. For the Western Cartesian tradition, he notes, ‘lines are discourses of points’ and ‘each point is a symbol of something’ out there in the world (a ‘concept’). Therefore, ‘lines represent the world by projecting it as a series of successions, in the form of a “process”’ (Flusser, 2002: 21). Yet, the world in which we live, observed Flusser already in 1973, is increasingly a world of surfaces, of images. This shift from a predominantly linear to a predominantly imaginal culture provoked Flusser to formulate a series of vital questions: ‘What do these surfaces mean? Do they represent the world in the same way as lines? Is thought expressed in lines and thought expressed in surfaces the same kind of thought?’ (Flusser, 2002: 22).
Flusser’s response focuses not on the ‘essence’ of lines and surfaces but on their different temporal structures. ‘We must follow the written text if we want to get at its message, but in pictures, we may get the message first, and then try to decompose it’ (Flusser, 2002: 23). A glimpse of a few seconds may suffice for ‘getting’ an image; describing that same image in words may take far longer and inevitably transform it, because ‘in reading lines we follow a structure imposed upon us, whereas in reading pictures we move rather freely within a structure that has been proposed to us’ (Flusser, 2002: 22). A terminological note is necessary here, though. In ‘Line and Surface’, Flusser uses ‘surfaces’ and ‘images’ in overlapping but not identical ways. ‘Surfaces’ designates the two-dimensional medium – the plane that can be scanned by the eye, whether a TV screen, photograph, painting, or cave wall. ‘Images’ slides between two registers: sometimes referring to what appears on surfaces (the visual content), sometimes to a mode of thought itself (‘imaginal thought’ as opposed to ‘conceptual thought’). Flusser does not rigorously distinguish these terms; at one point he writes of ‘surface images’ as a compound (Flusser, 2002: 25), suggesting that the image is what emerges through engagement with the surface, although he does not use the terms this way consistently. For the analysis that follows, I refer to surfaces as what AI systems generate (arrangements of pixels) and to images as what users envision, perceive and interpret, though the boundary between the two is precisely what prompt culture destabilizes.
Until the appearance of mass media, Flusser writes, Western thought was primarily expressed through lines. Surfaces, which historically preceded lines (cave paintings, for example) and were one of the first methods humans used to produce intersubjectivity, enact a different structure on thought, representing the world through images and implying what Flusser calls an ‘unhistorical being-in-the-world’. Mass media, mainly through film and television, generate a new form of thought creating a ‘posthistorical being-in-the-world’ (Flusser, 2002: 26). Yet, this apparent progression from image to concept to image is more complex than it seems. Our contemporary ‘imaginal media’, Flusser observes, ‘are obviously developments from conceptual thought; for one thing, they result from science and technology, which are conceptual’ (Flusser, 2002: 31). While a painting may tell a story, a film tells its story differently: historically, sequentially, along a line.
Prompt culture intensifies this complexity. Where Flusser’s temporal analysis concerns the distinct temporalities of experiencing lines and surfaces, prompt culture introduces a further temporal complication at the level of production. In prompt culture, the act of production and the act of experience are no longer separable: the prompt writer simultaneously composes (production) and anticipates the visual result (experience), while the viewer of a generated AI image encounters a surface whose production temporality is inscribed in its visual characteristics. Flusser himself gestured toward this collapse when he argued that technical images blur the distinction between producer and consumer, turning both into ‘players’ of the apparatus (Flusser, 2007).
Prompts operate in linear time – typed sequentially, processed algorithmically – but aim toward the production of surface time: the immediate visual impact of the generated image. At the same time, the generated image itself exists in a peculiar temporal state: it references long historical times of image making. Think, for example, of the four years Michelangelo and his assistants needed to paint the Sistine Chapel versus the time it took to take good photographs of the Chapel, the time it took to digitize these photographs (cropping, colour, light adjustments) and upload them to the web, and the time for their circulation through different platforms. To these times we need to add the time of tagging, encoding, and decoding historical visual cultures, as well as the time invested in developing AI systems.
Prompt culture collapses these multiple temporal registers into one ‘algorithmic time’. In this new temporal order, the different temporalities are computationally flattened into statistical relationships that can be instantly recombined through prompts. The prompt itself operates in this collapsed temporality, functioning simultaneously as a technical command (processed sequentially by algorithms) and a cultural invocation (drawing upon aesthetic traditions, genres and styles instantaneously). In this way, the ‘spiral movement from image through concept to image’ that Flusser described becomes not just dialectical but recursive and exponential, each generated image potentially becoming training data for future models, creating closed loops of ‘technical images’ that reference other ‘technical images’.
Flusser’s account of technical images rests on a broader model of abstraction. At the final step, ‘the situation disintegrates into a swarm of particles and quanta, and the writing subject into a swarm of bits and bytes, moments of decision, and molecules of action. What remains are particles without dimension that can be neither grasped nor represented nor understood. They are inaccessible to hands, eyes, or fingers. But they can be calculated’ (Flusser, 2011: 10). Technical images emerge when these dimensionless particles are computed into ‘mosaic-like combinations’ assembled into visible surfaces through calculation rather than depiction. This yields Flusser’s crucial distinction: ‘The difference between traditional and technical images, then, would be this: the first are observations of objects, the second computations of concepts’ (Flusser, 2011: 10). Elsewhere Flusser describes such images as products of ‘zero-dimensional elements’ generated through what he calls a ‘calculating, formal consciousness’ (Flusser, 2002: 14; Krtilova, 2016: 2–3).
To position the dialectic historically, Flusser suggests ‘a ladder with five rungs’ (Flusser, 2011: 6). The model depicts a progressive distancing from concrete experience across five dimensional levels: from the four-dimensional spatial-temporal continuum of lived experience, through three-dimensional objects (tools, sculpture), to two-dimensional surfaces (traditional images), to one-dimensional lines (writing, text), and finally into zero-dimensionality, the realm of calculation and computation, where even the linearity of text dissolves into dimensionless particles, points, and bits (Flusser, 2011: 6–7).
Flusser insists that the program underlying technical images ‘is calculations’, a formulation that anticipates the statistical operations of contemporary AI image generation with remarkable precision (Flusser, 2002: 114; Krtilova, 2016: 3). Yet as Krtilova observes, Flusser’s turn to technical images as the privileged site for reflecting computation risks obscuring ‘the specifics of numerical thinking’, since ‘the link between image and number tends to push the specifics of numerical thinking into the background, preferring the vivid and seemingly obvious image’ (Krtilova, 2016: 7–8). This attention matters for the present analysis: the computational register, the zone of probability distributions, vector operations, and latent space, operates according to a logic irreducible to either the textual or the imaginal. The ‘double net’ I develop below must be understood as encompassing this computational layer, not as a simple oscillation between lines and surfaces.
Following his distinction between traditional images and technical images, Flusser distinguishes between traditional ‘imagination’, the capacity to decode images by identifying a human producer’s intention, and ‘techno-imagination’, which he defines as ‘the ability to lay technical images bare as symbols and to decipher them, bringing their hidden and masked “intentions” to the recipient’s attention’ (Flusser, 2012: 196). Technical images, unlike traditional pictures, appear to be ‘symptoms’ caused by what they depict, as footprints are caused by feet. This symptomatic appearance is an illusion, Flusser argues, because techno-images remain symbolic, but their encoding follows rules ‘originating from scientific theories’ rather than from individual human intention (Flusser, 2012: 199). The theories that make technical images possible become, in Flusser’s formulation, ‘pre-texts’: texts that ‘precede them in the form of scientific theories and render them possible as such’ (Flusser, 2012: 200).
What makes techno-imagination phenomenologically distinctive is that it requires a different mode of consciousness than either traditional image-reading or text-reading. Where traditional imagination involves recognizing a human producer’s intention behind an image, and conceptual thought involves following the sequential logic of a text, techno-imagination demands that the viewer decode an image whose ‘intention’ belongs not to a human agent but to the apparatus’s program. In prompt culture, this challenge is intensified: the user must simultaneously exercise conceptual thought (composing the linear prompt), imaginal thought (envisioning the desired surface), and a nascent form of techno-imaginal thought (anticipating how the apparatus will process the intersection of the two). The prompt writer thus occupies a phenomenological position without clear precedent, translating between modes of consciousness that Flusser’s framework identifies as fundamentally distinct.
Flusser considered photographs the first technical images, followed by videos and computer-processed images. Against the magic of photographic indexicality, Flusser argues that, as abstractions, images cannot be considered true or real representations of reality. For example, the very act of making a scene visible through a photograph necessarily renders some aspects of the same scene invisible. Through framing, focus, temporal compression, or technological constraints, every photograph creates a partial view that can be mistaken for completeness. Sometimes, images obscure the reality they aimed to show, not despite their representational authority but because of it (Offert and Phan, 2024). The ambiguity encoded in images, their capacity to evoke different feelings, memories and actions, to contain more than what can be conveyed in words, is the source of their power and may be the reason the Western tradition sought to privilege texts’ conceptual clarity over images’ overloaded complexity.
Flusser challenges this tradition by introducing a metaphor: ‘when we translate image into concept, we decompose the image, we analyse it. We throw, so to speak, a conceptual point-net over the image, and capture only such meaning as did not escape through the meshes of the net’ (Flusser, 2002: 28). Like a fishing net, our conceptual frameworks can only grasp certain types of meaning while others slip through undetected. Conceptual thought offers clarity and distinctness at the cost of fullness; imaginal thought represents facts more fully and completely, while conceptual thought represents them more clearly (Flusser, 2002: 28). Describing a painting in words, we gain analytical precision and logical structure but lose the simultaneous richness, ambiguity, and complexity that the image contains and that can be perceived all at once.
Extending Flusser’s metaphor, I propose that AI image generation involves not one but two successive point-nets, corresponding to two architecturally distinct operations within contemporary multimodal systems: a structure I call the ‘double net’ effect. In systems such as Stable Diffusion or DALL-E, a contrastive encoder learns associations between textual and visual fragments through self-supervised training on billions of web-scraped image-text pairs, while a separate generative model produces new images from the encoded representations (Impett and Offert, 2024; Offert and Phan, 2024; Radford et al., 2021). These are not two stages of a single process but two distinct apparatuses joined at the zero-dimensional space of the latent vector.
The first net is cast by the encoder as it compresses visual culture and language alike into vector coordinates: dimensionless points in a high-dimensional mathematical space. Unlike earlier supervised systems such as ImageNet, where human annotators imposed categorical labels through the WordNet hierarchy (Denton et al., 2021), contemporary self-supervised models derive their associations statistically, without direct human mediation. As Impett and Offert (2024) have argued, this shift from supervised to self-supervised learning fundamentally changes the quality of what is omitted: the filtering is no longer governed by the biases of human categorization but by the statistical regularities the apparatus itself extracts. Yet the supervised paradigm is not simply superseded. The web-scraped data on which contemporary models train carries within it the archaeological traces of decades of taxonomic practice: ImageNet-era categories, platform tagging conventions, and the ‘ghost work’ of labelling economies (Gray and Suri, 2019). Crawford’s (2021) critique of classification as erasure describes not the current mechanism but a sedimented layer whose effects persist within the training data.
The second net operates through the generative decoder, which fabricates new surfaces from the vector coordinates produced by the encoder. The prompt activates specific regions of the latent space, but the decoder reconstructs visual information by drawing on statistical constellations that exceed what the prompt specified, filling gaps according to probability distributions rather than semantic intention. This second operation is performed by a different apparatus, often trained on different data, with no structural guarantee that the associations learned by the encoder correspond to the generative capacities of the decoder. The ‘double net’ thus describes not a metaphorical doubling but an architectural discontinuity at the heart of multimodal systems: the descent from surfaces and lines into zero-dimensionality (encoding) and the ascent from zero-dimensionality back into surfaces (generation) are performed by separate black boxes, joined only by the numerical coordinates they exchange. Each passage filters differently: the first compresses qualitative cultural material into quantitative vectors, the second projects those vectors into simulated visual surfaces; and the gap between them is where meaning is both lost and fabricated. A prompt invoking ‘wabi-sabi aesthetics’ activates vector coordinates proximate to certain textures and colour palettes, but the Japanese philosophical tradition and the relationship to impermanence that give the concept its meaning, inseparable from material practice and temporal experience, have no vector representation. What the decoder produces is the statistical shadow of a cultural formation.
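The two passages through zero-dimensionality can be made concrete with a deliberately small sketch. It is not how Stable Diffusion or DALL-E work internally: the vocabulary, the two-dimensional vectors, the pattern names, and the functions `encode` and `decode` are all hypothetical, chosen only to show two separately built tables joined by nothing but the coordinates they exchange.

```python
from math import dist

# First net (encoder): a hypothetical word -> vector table.
# Any word missing from this table falls through the mesh.
ENCODER_VOCAB = {
    "rough": (0.9, 0.2),
    "weathered": (0.8, 0.3),
    "glossy": (0.1, 0.9),
    "bowl": (0.5, 0.5),
}

# Second net (decoder): a separate hypothetical table of visual patterns,
# built independently of the encoder's vocabulary.
DECODER_PATTERNS = {
    "matte earthen texture": (0.85, 0.25),
    "polished studio surface": (0.15, 0.85),
    "neutral grey object": (0.5, 0.5),
}


def encode(prompt):
    """Average the vectors of known words; report the words that were lost."""
    words = prompt.lower().split()
    kept = [w for w in words if w in ENCODER_VOCAB]
    lost = [w for w in words if w not in ENCODER_VOCAB]
    if not kept:
        raise ValueError("no word in the prompt is known to the encoder")
    dims = len(ENCODER_VOCAB[kept[0]])
    vector = tuple(
        sum(ENCODER_VOCAB[w][i] for w in kept) / len(kept) for i in range(dims)
    )
    return vector, lost


def decode(vector):
    """Return the stored pattern nearest to the vector, whatever the prompt meant."""
    return min(DECODER_PATTERNS, key=lambda name: dist(vector, DECODER_PATTERNS[name]))


vector, lost = encode("wabi-sabi rough weathered bowl")
print(lost)            # words that escaped the first net
print(decode(vector))  # what the second net fabricates from the coordinates
```

With these made-up tables, ‘wabi-sabi’ is dropped at the first net, and the decoder returns ‘matte earthen texture’: a plausible surface assembled from the remaining coordinates, with no trace of the concept that motivated the prompt.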
This recursive feedback points toward the technological realization of Flusser’s vision of a post-historical visual culture where the relationship between text and image becomes cyclical rather than linear, yet at the cost of missing meaning in each cycle. Each generated image carries traces of countless previous images that were once decomposed into lines, now reconstituted into new surfaces that will themselves be subject to further analysis and regeneration. In this new paradigm, prompts do not simply describe desired images: they participate in an ongoing dialogue between conceptual and imaginal modes of meaning-making, creating closed loops of images and concepts that cannot escape the double net and risk collapsing into themselves (Shumailov et al., 2024).
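The narrowing that such closed loops risk can be illustrated with a toy simulation. This is a simplified stand-in, not the procedure studied by Shumailov et al. (2024): the function `recycle`, the starting data, and the rule that each generation reproduces only values within one standard deviation of the centre are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def recycle(values, generations):
    """Feed each generation's 'typical' outputs back in as the next training set.

    Returns the spread (population standard deviation) of the data before
    the first generation and after each subsequent one.
    """
    spreads = [pstdev(values)]
    for _ in range(generations):
        centre = fmean(values)
        spread = pstdev(values)
        # Hypothetical 'model': it reproduces only what lies close to the centre,
        # so the tails of the previous generation never reach the next one.
        values = [v for v in values if abs(v - centre) <= spread]
        spreads.append(pstdev(values))
    return spreads


spreads = recycle(list(range(101)), generations=5)
print([round(s, 2) for s in spreads])
```

In this sketch the spread shrinks with every pass, because whatever lies outside the most probable region of one generation is unavailable to the next.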
Professional Practices: Between Engineers and Artists
Prompt culture transforms visual craft and expertise into ‘prompt craft’, that is, the practice of understanding how different linguistic units, from words and phrasings to stylistic references and technical parameters, influence generated images. This craft has developed into a new professional practice: prompt engineering (Dang et al., 2022; Deckers et al., 2024; Liu and Chilton, 2022; Oppenlaender et al., 2025). Despite the ‘engineering’ label, prompting is more than a technical skill; it constitutes a new form of cultural craftsmanship that operates at the intersection of the double net problem. Practitioners must navigate the limitations of both the original training data translation and the constraints of natural language description, developing expertise around what cannot, by definition, be fully resolved.
This new craftsmanship builds on three discursive lineages that reveal how practitioners attempt to maximize what survives the conceptual nets’ filtering processes: artistic discourse (art history, schools, movements), technical discourse (apparatus, lighting techniques, materials), and stylistic discourse (aesthetic qualities, cultural references, emotional tones, metaphors) (Laba, 2026; Oppenlaender et al., 2025). Within practitioners’ communities, certain keywords function as linguistic triggers that improve output quality or invoke specific styles (Liu and Chilton, 2022; Oppenlaender, 2023). Expert practitioners are distinguished by their skilful deployment of ‘prompt modifiers’: objects, adjectives, settings, colours, mood, atmosphere, perspective, composition, analogies, comparisons, metaphors and specific artistic styles that function as linguistic triggers for the model (Oppenlaender, 2022, 2023; Xie et al., 2023). Because initial outputs are highly random, as a consequence of the ineffectiveness of discrete language for describing images, effective prompting often requires several iterations, revealing how cultural knowledge becomes technical expertise navigating the gap between qualitative intention and computational processing (Oppenlaender, 2023).
Practitioners adopt distinct strategies within the field. Some reclaim creative agency by positioning themselves as artists working within algorithmic systems, asserting human intentionality despite the double net constraints (Duester, 2024), while other practitioners speak about co-creation with AI technologies (Moura, 2024; Oppenlaender et al., 2025). Others adopt a critical stance, deliberately crafting prompts or influencing the dataset and the system in ways that expose or subvert generative AI’s limitations, making visible what escapes the conceptual nets (Crawford and Paglen, 2021; Paglen and Downey, 2024). Still others focus on community building, forming professional networks that share techniques, establish ethical standards, and advocate for recognition as legitimate cultural workers operating in this new techno-linguistic-visual field.
Amid these different strategies, the persistent challenge that practitioners face may reflect a deeper transformation which Joanna Zylinska (2023) identifies as a transformation in human perception by algorithmically driven images. Rather than maintaining human creative agency within algorithmic systems, Zylinska’s framework suggests practitioners participate in what she calls the ‘perception machine’, a hybrid system where human and machine vision become indistinguishable. Within this framework, prompt modifiers and specialized vocabularies are not only human expertise operating on machines but evidence of a transformation in human perception itself. Prompt engineers are not just learning to communicate with AI systems; they are being trained to judge, see and imagine in ways that are compatible with machine vision (Musih and Fisher, 2024). In Flusser’s terms, this constitutes the emergence of a new form of techno-imagination: practitioners learn to imagine through the statistical logic of the apparatus, internalizing its associative patterns as a condition of effective practice. This transformation of human perception creates new forms of value that extend beyond creative practices, as the linguistic competencies required for effective prompting become commodified resources in emerging digital markets. The same processes that reshape human vision also generate new sites of capital accumulation.
The Commodification of Conceptual Nets
Prompting is a language-based practice and the use of language is supposed to be intuitive to humans. Yet, while generative AI companies promote their tools as requiring only natural language interaction, they simultaneously invest substantial resources in teaching users how to improve their prompting. Creating effective prompts demands specialized methods. Schulhoff et al. (2025) document 58 distinct prompting techniques, ranging from basic instruction formats to complex multi-step reasoning chains. The prompt industry emerges from the gap between the promised seamlessness of ‘natural language’ interaction and the actual complexity of navigating conceptual nets. Rather than democratizing image creation and generating an ‘engine for imagination’, as corporate narratives suggest and Nataliia Laba (2024) critically analyses, prompt culture becomes a site of digital enclosure where linguistic competence is restructured as technical expertise requiring market mediation.
These dynamics unfold across three fields: the educational market around prompt engineering, the prompt market where prompts are sold, and the extraction practices of tech companies that collect, archive, and reuse user prompts in non-transparent ways. The prompt engineering market, encompassing software, services, and platforms, was valued at over USD 220 million in 2023 and is projected to exceed USD 2 billion by 2030, reflecting the rapid commodification of the interpretive labour that mediates between human intention and computational processing. Companies offer extensive documentation, tutorials, and best practices guides for prompt engineering while a secondary market of courses and instructional materials proliferates. These materials promise accessibility through natural language, while simultaneously acknowledging that optimization requires specialized knowledge. The companies that create the models thereby position themselves as necessary intermediaries to bridge this gap. This rhetorical structure creates a market solution for what is essentially a theoretical impossibility: the translation of imaginal thought into conceptual frameworks.
Launched in mid-2022, PromptBase was one of the first marketplaces dedicated to prompt trading, allowing users to purchase specific prompt formulations for as little as $1.99. By 2025, it had hosted over 260,000 prompts compatible with models including ChatGPT, Midjourney, Stable Diffusion, and DALL-E. PromptBase is one node in a broader AI prompt marketplace ecosystem. The global market for platforms where prompts are bought, sold, and traded – spanning text, image, audio, and video modalities – was valued at approximately USD 1.4 billion in 2024, with projections exceeding USD 10 billion by 2033 (Grand View Research, 2025). That the marketplace for trading prompt outputs substantially exceeds the market for optimizing prompt inputs is significant: it suggests that commodification, rather than skill development, drives the economic engine of prompt culture. Consumers buy the prompt rather than the image, acquiring ‘a recipe’ for reconstructing the recursive feedback between conceptual input and imaginal output. In this scenario, the prompt is no longer or not just the means for producing an image but a product in itself.
The commodification of prompts through platforms like PromptBase exemplifies what Chow and Celis Bueno (2025) critique as the ‘cloak of creativity’: the instrumental harnessing of creativity discourse to obscure labour extraction. Users believe they are creating and selling creative products, but they are providing the interpretive labour necessary to bridge the fundamental misalignment between human meaning-making and algorithmic processing. This creativity discourse justifies participation in what amounts to unpaid training of proprietary systems. Platforms capture behavioural patterns to improve prompting, whether through suggestions of how to improve a prompt as in ChatGPT and DALL-E, or through ‘shadow prompting’ (Salvaggio, 2023), while offering minimal compensation for individual sales. The commodification that matters most, then, operates not at the level of individual transactions but at the infrastructural level: platforms own the zero-dimensional substrate, the latent space, through which all prompt culture must pass, extracting value from each translation between human meaning-making and computational probability.
Corporate data collection practices reveal this extraction at work. Midjourney’s Terms of Service (2024) exemplify this mechanism: the platform retains rights to all prompts, generated images, and user interaction data ‘including text, images, and those generated from spoken input and summarized into text, and other content such as photos, videos, documents, and messages that you input into the Services’ for ‘service improvement and model training purposes’. Users’ prompt histories, including failed attempts and iterative refinements, become proprietary datasets. The platform’s ‘Prompt Optimization Suggestions’ feature, which recommends improvements to user prompts, demonstrates how extracted behavioural data gets repackaged as value-added services.
The challenge for critical theory is to imagine how the recursive feedback between text and image might be organized differently, as a commons rather than a commodity, a site of collective meaning-making rather than corporate value extraction. This requires not just technical innovation but political alternatives that prioritize the democratic circulation of meaning over the accumulation of capital from linguistic labour. Such imagination becomes possible when we recognize that the capacity to create images through words draws on rhetorical traditions far older than contemporary AI systems, suggesting that current commodified arrangements are neither natural nor inevitable.
Inverted Ekphrasis: The Textual and Rhetorical Dimensions of Prompting
Creating images with words is not new. Ancient rhetoricians developed ekphrasis, an art in itself, to describe visual representations and thus create images in the minds of their audiences. As W.J.T. Mitchell (2005) writes: ‘You can hang a picture, but you cannot hang an image’ (p. 85). Although Flusser writes about images, and later on about technical images, his analysis concerns images that appear on a surface. What makes AI image generation unprecedented is not that language can evoke visual experience (it already did so in antiquity) but that what language evokes can be visually materialized on surfaces. This materialization process reveals prompting as a contemporary manifestation of rhetoric as techne, a craft with ancient roots and new consequences (Hallsby, 2024).
Where rhetorical ekphrasis involves verbal description of visual art (lines) to create images (surfaces), generative AI systems invert this relationship, transforming textual descriptions (lines) into visual outputs (surfaces). Yet, contrary to rhetorical ekphrasis, the prompt does not describe existing images, but rather activates conceptual relationships encoded within the AI system to generate new ones. The relationship between prompting and ekphrasis has recently attracted sustained scholarly attention (Bajohr, 2024; Scorzin, 2024; Verdicchio, 2024). While Bajohr productively theorizes text-to-image generation as ‘operative ekphrasis’ in which the text/image distinction collapses, the present analysis emphasizes what does not collapse: the qualitative-quantitative incommensurability that persists at the zero-dimensional substrate through which both text and image must pass.
Where Bajohr applied the concept of operative ekphrasis to visual poems, prompt culture extends this operativity into the more general realm of image production. For example, the prompt ‘an enchanting farewell of the day painted in a palette of gold’ does not represent any existing sunset; instead, it performs a conceptual operation that generates a new image by activating statistical relationships between linguistic concepts – metaphors (farewell of the day) and colours (palette of gold) – within the training data. Now, the words ‘farewell of the day’ and ‘palette of gold’, although metaphorical, have an ontological and thus representational value. But what about the word ‘enchanting’?
Paul Bellow, writing in the OpenAI Developer Community’s ‘master thread for all DALLE3 tricks and tips’, observes that analogies and comparisons sometimes prove more effective than literal descriptions, yet also recommends avoiding overloading the prompt and using specific and detailed language. Similar practitioner advice proliferates across platforms: every major AI system has its own ‘tips and tricks’ for image generation, and communities on social platforms like Reddit, Meta, and TikTok share techniques for obtaining the best prompt results. Existing taxonomies of prompting, such as Oppenlaender’s (2023) taxonomy of prompt modifiers, classify the content of prompts, that is, what practitioners include in them. The categories I propose operate at a different level: they concern the form rather than the content of prompts, describing the rhetorical mechanisms through which prompts function. Drawing on these discourses, and the rhetorical analysis developed in this article, I propose four analytical categories through which prompting operates:
Direction: Prompts state in a specific and detailed way what it is that they want to convey. For example, a prompt like ‘a black cat on a windowsill’ will generate simply that.
Indirection: Effective prompts often use oblique rather than literal description. Rather than describing visual elements directly, prompt writers invoke moods, atmospheres, and cultural references that generate desired visual qualities.
Condensation: Prompts compress complex visual and cultural concepts into minimal linguistic triggers that can activate extensive statistical relationships within the model. So, for example, a phrase like ‘film noir aesthetic’ condenses decades of cinematic history, visual conventions, and cultural associations into a compact textual command.
Layering: Prompts combine multiple discursive registers (technical parameters, aesthetic references, emotional descriptors, and cultural invocations), creating semantic density that activates multiple pathways within the model simultaneously.
When ancient rhetoricians crafted verbal descriptions that could make audiences ‘see’ (with the eyes of their minds) surfaces that were not in front of them, they were exploiting a fundamental porosity between linguistic and visual meaning-making: that of imagination. Prompt culture repurposes this ancient recognition, putting automation in the place of imagination. While classical ekphrasis operated through direction that was culturally shared, with both rhetorician and audience drawing from common visual vocabularies, AI prompting operates through direction that navigates the statistical aggregation of different cultural vocabularies. The indirection of traditional ekphrasis relied on human interpretive capacity to fill the gaps between word and image, whereas the indirection of prompting fills these gaps through algorithmic inference. Where rhetorical condensation compressed cultural references within living memory and shared experience, AI condensation operates across vast temporal and cultural distances, collapsing incompatible aesthetic traditions into single surfaces. The layering mechanism reveals this transformation most clearly: prompting does not simply automate ekphrastic convention but fundamentally changes it, replacing the shared cultural ground between speaker and audience with the statistical vectors of the latent space.
Transformation of Image Making
Images are not one thing. Different types of images and different practices of image making carry their own tradition, craft and values. Flusser distinguishes between traditional images, those produced by humans, and technical images, those produced by apparatuses (Flusser, 2002, 2007, 2011). Apparatuses are, for Flusser, ‘the products of applied scientific texts’ and technical images are therefore the ‘indirect product of scientific texts’ (mechanics, optics, chemistry, physics, etc.) which gives them a different historical and ontological position than traditional images.
Ontologically, traditional images are abstractions of the first order insofar as they abstract from the concrete world while technical images are abstractions of the third order: They abstract from texts which abstract from traditional images which themselves abstract from the concrete world. (Flusser, 2011: 14)
A painting, for example, translates the three-dimensional, temporal, multisensory experience of being in a landscape into a two-dimensional visual surface. The painting is an abstraction, a first order abstraction according to Flusser, because it captures some aspects of the landscape while necessarily leaving others out. Texts are abstractions of the second order, as they abstract from images, whether through art criticism or scientific descriptions. Technical images are abstractions from texts that are abstractions from images. A photograph, for example, does not simply represent ‘what was there’ as Barthes (1981) would put it, but materializes concepts embedded in optical theory, chemical processes, and mechanical engineering developed through textual and scientific knowledge. Irrgang (2023) traces this process to what he calls ‘projective imagination’ in Flusser’s work, a new form of human imagination that emerged from the convergence of information aesthetics and video art practices. In prompt culture, this concept acquires new force. What Irrgang identifies as technical images representing ‘a new kind of human imagination’ evolves into a hybrid form where human linguistic imagination interfaces directly with algorithmic generative processes, creating what might be understood as a ‘computational projective imagination’.
AI generated images represent a fourth order of abstraction. These images are abstractions from prompts (texts), which are abstractions from data (code), which are abstractions from technical images (already digitized images), which in some cases are abstractions from traditional images, which are in turn abstracted from the concrete world. While photography translates optical and chemical concepts into visual form, AI image generation translates vast archives of already-technical images into new outputs through the statistical processing of technical images (photographs, digital art, scanned paintings) decomposed into mathematical representations in latent spaces. AI generated images might therefore be understood as ‘hyper-technical images’, technical images of technical images, fourth order representations, that add another layer to Flusser’s framework. I use ‘hyper’ deliberately to signal that these are images produced through a recursively intensified process in which previous technical images have been decomposed into zero-dimensional vectors and reconstituted as new surfaces. The prefix marks a degree of computational mediation and distance from concrete experience that exceeds what Flusser’s original formulation of technical images encompassed.
The apparatus in AI image generation is, strictly speaking, the model itself: the trained system with its frozen pre-texts, its latent spaces, its capacity to transform textual input into visual output (Flusser, 2012: 197). The prompt is not an apparatus but a text that operates upon an apparatus already saturated with frozen texts. Yet prompts are hybrid entities, simultaneously scientific texts that function as technical commands and cultural texts that carry meaning, aesthetics, and ideology. Consider the difference between ‘RAW sunset photo, 85 mm lens, f/1.4, natural lighting’ and ‘a melancholic sunset in the style of Edward Hopper, lonely figure silhouetted against golden light, American realism, cinematic composition’. The first prompt approximates Flusser’s conception of a scientific text, with technical parameters corresponding to photographic apparatus settings. The second, however, operates as a cultural text, invoking artistic traditions, emotional registers, and aesthetic references that carry cultural meaning far beyond their technical function as algorithmic commands. Both function in the same way within the apparatus generating hyper-technical images.
Hyper-technical images represent what Zylinska (2023: 2) has recognized as the increasingly blurring distinction between image capture (as in photography) and image creation. Prompt culture operates precisely at this blurred boundary as prompts neither capture existing images nor create entirely new ones but activate statistical relationships within training data that itself consists of captured images, transformed into generative possibilities.
Conclusion
Prompt culture fundamentally reorganizes how humans create, interpret, and relate to images. The seemingly straightforward operation of writing a line to create a surface is not straightforward at all. It requires intermediaries tasked with translating between incompatible systems of meaning-making. This translation creates prompt culture as a new epistemic order where the historical distinction between conceptual and imaginal thought collapses into algorithmic mediation. As this article has shown, prompt culture encompasses an entire sociotechnical ecosystem that includes: the infrastructural labour of dataset preparation, new professional categories and market relations, hybrid discourses blending technical instruction with rhetorical invocation, the transformation of linguistic competence into algorithmic expertise, and the reconfiguration of visuality through training data archives. Prompt culture extends beyond individual interactions with AI systems to encompass the broader reorganization of the production, reproduction, and circulation of visual meaning.
Flusser’s observation of the spiral movement from image through concept to image becomes, in prompt culture, a continuous feedback loop that erases temporal distinctions between sequential and simultaneous meaning-making. The hyper-technical image, AI’s fourth-order abstraction, embodies this temporal collapse. Each abstraction layer removes knowledge that cannot be recovered. This filtering is not a bug in the system but a feature that generates continuous demand for human interpretive labour disguised as creativity (Chow and Celis Bueno, 2025). Corporate platforms exploit the creativity promise (Laba, 2024), transforming human communicative practices into linguistic labour extraction. The recursive feedback between prompts and outputs accumulates value from meaning-making while constraining it within proprietary systems. Because prompts cannot fully describe an image, the apparatus fills the gaps by drawing on statistical constellations to create hyper-technical images that progressively consume the diversity, differences, ambiguities, and contradictions that constitute the plurality of human creativity.
As platforms capture and process human linguistic labour, they simultaneously undermine the political and cultural conditions that made that labour possible. The phenomenon of model collapse, where AI systems trained on their own outputs produce increasingly homogenized results, represents one possible endpoint of this process (Shumailov et al., 2024). The hyper-technical image thus embodies a temporal paradox: it depends upon the accumulated diversity of human visual culture while systematically reducing that diversity through each cycle of generation. Each passage through the zero-dimensional substrate progressively eliminates what resists quantification: the ambiguity of cultural expression, the embodied knowledge of craft traditions, the contextual specificity of local visual cultures. These are structural consequences of routing cultural production through a space where qualitative distinctions have been replaced by statistical proximities. What survives recursive cycling through zero-dimensionality is precisely what is most amenable to mathematical encoding, and therefore most generic.
Yet, this course of events is neither necessary nor inevitable. Understanding prompt culture through Flusser’s lens suggests that its transformative potential lies not in more sophisticated prompting techniques or more powerful models but in different organizational forms that support a productive dialogue between conceptual and imaginal modes of thought. Instead of proprietary platforms that capture linguistic labour for capital accumulation, we could imagine commons-based systems where the recursive feedback between prompts and images serves collective cultural production, addressing model collapse not through corporate control but through community curation that actively maintains cultural diversity at the intersection of text and image.
Prompt culture emerges as a symptom of our current historical moment, a site of transformations in how humans relate to images, and potentially a site of resistance. Its analysis demands critical frameworks capable of addressing this new ecosystem that bridges linguistic artifacts, technical instruments and cultural mediators. Critical theory has developed tools for analysing texts, and visual studies has developed tools for analysing images. Prompts are not just, not quite texts, and the images they create are not just, not quite images. We need to develop new critical tools to understand this space in between: prompt culture.
Acknowledgements
I am grateful to the anonymous reviewers of this article for their generous engagement and thoughtful suggestions. I thank Blake Hallinan for their careful reading and encouragement, and Eran Fisher for his intellectual friendship, for allowing me to think out loud, and for his brilliant insights.
