Abstract
This article examines how artificial intelligence (AI) reshapes qualitative research (QR) by accelerating the circulation of classifications and introducing recursive feedback loops that reuse their own outputs. We draw on Hacking’s concept of looping effects and extend it through two mechanisms: acceleration (compressed loop time) and recursion (output re-entry). Reviewing current AI uses in QR, data synthesis, and retrieval, we identify four epistemic risks: over-smoothing, self-validation, performativity amplification, and externalized reflexivity. We then propose four provisional, reportable evaluative criteria suited to AI-mediated QR. These criteria translate long-standing commitments to transparency and credibility into the techno-epistemic age.
Keywords
Extended AI-Accountability Statement
The authors verified and are fully responsible for all content. No AI-system is listed as an author. Generative AI-tools (ChatGPT 5.0 Pro, Gemini 2.5 Pro) were used during four steps:
Step 1 (Sep 2025): The initial idea built on our previous work (Meyer & Miggelbrink, 2013) at the crossroads of qualitative fieldwork and Hacking’s work (1995). We combined this with reflections on the inclusion of (semi-)automated processes of data analysis in a discourse-theoretical project on populism (Nguyen & Meyer, 2026). After researching literature on epistemic risks, we triangulated this literature base with an LLM-assisted (ChatGPT 5.0 Pro) search for further literature, (I) expanding the literature base. Based on this expanded corpus, we refined our summary of epistemic risks and developed our conceptualization of acceleration and recursion as extensions of Hacking’s looping effects.
Step 2 (Oct 2025): We initially derived five, later four, provisional, reportable evaluative criteria. We then used ChatGPT 5.0 Pro and Gemini 2.5 Pro to (II) review these criteria for applicability and feasibility. Based on these reviews, we revised the criteria. In this step, all literature references were re-validated.
Step 3 (Oct 2025): We drafted the full manuscript based on this preparatory work, including the prologue and epilogue, and drawing on Haraway (1985). Once the draft was complete, we prompted ChatGPT 5.0 Pro and Gemini 2.5 Pro to (III) act as scientific reviewers and to generate extensive, thorough, critical, and constructive reviews, in response to which we revised the manuscript.
Step 4 (Oct 2025): ChatGPT 5.0 Pro was used to cross-check that in-text citations and the list of references correspond and to check the required formatting, (IV) serving as a formatting assistant. This was checked manually afterwards.
Prologue
Qualitative inquiry has always been an endeavour of crossings – between voices and codes, emotions and notebooks, lives and concepts. But the crossings have changed. Circuits hum inside our methods now. They have crept in through their pervasiveness in our daily lives. Once, sound and ink waited patiently for the interpretative labour we so ambitiously performed from within the black boxes of our human subjectivity. They have become silicon and immediacy.
Who dares to doubt our capacities for transference? The cyborg age no longer sits at a distance; it threads through our prompts, our internet searches, our private conversations. This information accelerates and loops back into what we hope remain our real lives. Outputs re-enter as inputs; drafts become datasets; templates harden into habits.
Change at machine speed? While some nevertheless attempt to retreat from it, we cyborg apologists have to reflect on the consequences and ponder new progressive futures arising from what could easily mirror our most regressive nightmares. Rather than retreating from this condition, we ask what AI changes in the classificatory loops through which we produce and evaluate knowledge. “By the late twentieth century, our time, a mythic time, we are all chimeras, theorized and fabricated hybrids of machine and organism; in short, we are cyborgs.” (Haraway, 1985, p. 6).
1. Introduction
Since the 1990s, data-, software- and hardware-driven innovations have been reshaping interpretative qualitative research (QR) (Flick, 2019; Williams, 2024) – both approaches whose goal is to “reconstruct fixed structures and meanings that actors follow and that can, at best, be interpreted by actors” (Knoblauch, 2008, p. 210) and approaches that assume a “fundamental scope for subjective interpretations” that “is not absorbed into meanings and structures” (Knoblauch, 2008, p. 210). Early on, these impulses came from using software to assist in lexicometric and qualitative-analytical research processes such as content analysis. In the 2020s, new techno-epistemic assemblages have emerged and provided new stimuli: Big Qualitative Data (BQD) (meaning: large, often textual corpora processed, e.g., with topic modeling, Computer-Assisted Qualitative Data Analysis Software (CAQDAS), or assistance from large language models (LLMs)) (Chandrasekar et al., 2024), Synthetic Data in model training and evaluation (SDT) (the growing presence of synthetic or model-amplified text in training/evaluation, used to further expand model capabilities) (Shumailov et al., 2024), and AI-mediated Qualitative Analysis (AIQA/AIaQA) (making use of LLMs, embeddings, and auto-coding, integrated into software analysis pipelines) (Mayring, 2025). These stimuli expand what can be analysed, how much, and how quickly; they also alter how qualitative classifications are formed and circulated (e.g., Foraker et al., 2025; Loni et al., 2025; Pink et al., 2025).
Undoubtedly, AI-tools can be a force for increased productivity in QR, for interdisciplinary innovation, and for a wider and more readily available circulation of knowledge. Furthermore, they can allow us to extend existing analytical approaches beyond current capabilities. However, much enthusiasm around AI in QR implicitly imports quantitative ideals into interpretative practice, such as representativeness/generalization, reliability/reproducibility, and objectivity – each with discipline-specific limits in qualitative inquiry (Braun & Clarke, 2025; Davison et al., 2024; Nguyen & Welch, 2025). AI-mediated methods fuel hopes of expanding interpretative methods so that they acquire the robustness commonly attributed to quantitative methods. From this, one might infer that qualitative research could be made to meet the quality criteria customary for quantitative methods.
• Representativeness/generalization: Enlarged corpora can broaden coverage and rhetorical reach, yet text corpora rarely rest on stable sampling frames. In discourse/content analysis, statistical representativeness is widely questioned as a suitable measure; the relevant standard is instead analytic generalization/transferability and transparent corpus construction (Bednarek, 2024; Keller, 2011; Stoltz & Taylor, 2024). Here, “representativeness” usually means semantic coverage of positions and meanings rather than a statistical property. Corpus size widens coverage, but inclusion/exclusion remains theory-led and must be documented (e.g., via corpus boundaries, search strategy, timestamps) (Bendel Larcher, 2023; Gür-Şeker, 2014).
• Reliability/reproducibility: Automated routines can standardize mechanical steps (data gathering, transcription, retrieval, first-pass semi-automated coding), but interpretative reproducibility is bounded by model stochasticity and by prompt, path, and retrieval dependence. Furthermore, repeatability does not necessarily equal agreement in meaning (Golafshani, 2003; Tracy, 2010). Comparative studies report mixed results: hybrid human-AI workflows sometimes match, but do not consistently exceed, top human or model baselines, especially on meaning-sensitive tasks (Castellanos et al., 2025; Liu et al., 2025; Nguyen & Welch, 2025; Paulus & Marone, 2025; Vaccaro et al., 2024).
• Objectivity: Assistance can reduce some idiosyncrasies of researcher subjectivity, but it does not remove positionality from the underlying data or its situatedness/context, nor the biases embedded in models, data, and prompts. Claims to neutrality must rest on ethical handling and transparent disclosure (model, data, prompts, oversight) rather than being presumed (Davison et al., 2024; Resnik et al., 2025).
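The gap between repeatability and agreement in meaning can be made concrete. One common way to report run-to-run consistency of a stochastic coding step is Cohen’s kappa over the labels that two runs assign to the same segments. The sketch below assumes that each run yields one string label per segment; the code labels are hypothetical. A high kappa shows only that the same labels recur, not that they capture the same meaning.

```python
from collections import Counter


def cohens_kappa(run_a: list[str], run_b: list[str]) -> float:
    """Cohen's kappa for two runs that label the same segments in the same order."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)
    # Observed agreement: share of segments with identical labels in both runs.
    p_observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Chance agreement, from each run's marginal label frequencies.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both runs used one identical label throughout; kappa is undefined, report full agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Two hypothetical runs of the same AI-assisted coding prompt on six segments.
run_1 = ["distrust", "distrust", "hope", "fear", "hope", "distrust"]
run_2 = ["distrust", "hope", "hope", "fear", "hope", "distrust"]
print(round(cohens_kappa(run_1, run_2), 3))  # 0.739
```

Whether the recurring label “distrust” means the same thing in both runs is a question the statistic cannot answer; it still has to be settled interpretatively.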
Given AI’s increasing push into QR, and building on Hacking’s (1995) looping effects of human kinds, we argue that AI does not merely speed up analysis (Chubb et al., 2022); it reconfigures the mechanics of qualitative classification. Of the resulting effects, we highlight two. First, as classifications circulate through tools and publics, they feed back faster into talk, practice, and thus data – resulting in acceleration (Rosa, 2013). Evidence from human-AI interaction and machine learning shows feedback loops that can amplify or skew judgments (Brown et al., 2022; Glickman & Sharot, 2025; Pagan et al., 2023; Perdomo et al., 2020). Second, recursion: when pipelines (partly) re-use their own outputs or synthetic surrogates as inputs (e.g., auto-coding templates, prompt libraries, synthetic corpora), loops iterate without renewed contact with lived worlds, closing the evidential circuit and risking evidence erosion, synthetic drift of meaning, or even model collapse (Foraker et al., 2025; Loni et al., 2025; Shumailov et al., 2024).
Hacking’s original looping effect concerns the circulation of classifications through institutions and public life, their uptake, resistance, or inhabitation by the people classified, and the subsequent revision of those classifications through renewed encounters with practice. We extend this account in two analytically distinct directions. Acceleration names the compression of the interval between classification, uptake, and reclassification. Recursion names the re-entry of earlier outputs into later analytical stages without renewed empirical contact. Acceleration changes the tempo of the loop; recursion changes its evidential architecture.
This article develops a conceptual framework and proposes four provisional, reportable evaluative criteria for AI-mediated qualitative research. It speaks both to researchers who use these tools to process qualitative data and seek methodological reflection, and to methodologists who are reflecting on, re-negotiating, and reframing the conditions and criteria for evaluating qualitative research methods. It extends existing debates on reflexivity and transparency by identifying acceleration and recursion as specific techno-epistemic dynamics. These criteria are not intended as fixed metrics or a closed checklist, but as points of orientation that make AI-mediated interpretative work visible, reviewable, and contestable. In this sense, the paper serves both as a theoretical extension of looping effects and as a methodological guideline for documenting AI-mediation in interpretative research.
Our focus here is interpretative QR on text- and transcript-based materials (interviews, documents, online sources), which we regard as typical resources for single-method QR as well as mixed-methods projects. We do not treat multimodal/sensory ethnography or visual analytics in depth. Our claims concern analytical practices. This framework is written primarily from and for interpretative, constructivist, and critical traditions of qualitative inquiry, where meaning, situatedness, and reflexive accountability are central, and where quality is not reducible to statistical representativeness or procedural standardization. Accordingly, the risks we foreground (e.g., over-smoothing, authenticity drift, performativity amplification, and externalized reflexivity) are framed as threats to interpretative validity and to the politics of classification. We nevertheless expect the framework to travel, with adaptations, into more post-positivist qualitative or mixed-methods contexts, where the criteria can serve as transparency and governance add-ons. However, we do not claim that these criteria replace validity- or reliability-related procedures typically emphasised in positivist paradigms; rather, they complement them by making AI-mediated decisions, feedback loops, and normative assumptions visible and contestable.
As a theory-led response: (i) we extend Hacking (1995) by defining acceleration (compressed loop time) and recursion (output re-entry) for AI-mediated QR (section 2); (ii) we provide a concise overview of the key technical developments of AI and their impacts on QR (section 3); (iii) we diagnose four groups of resulting epistemic risks (section 4); (iv) we propose provisional, reportable evaluative criteria for AI use and its concomitant risks (section 5).
“A cyborg is a cybernetic organism, a hybrid of machine and organism, a creature of social reality as well as a creature of fiction.” (Haraway, 1985, p. 6)
2. Conceptual Background: Qualitative Epistemology and Hacking’s Legacy
Hacking’s account of so-called human and interactive kinds and their looping effects remains an influential framework for understanding how categories and the people classified by them coproduce one another. In the classic formulation (Hacking, 1995, 2006), expert classifications of people (often value-laden) circulate through institutions, media, and everyday practice. People recognize, resist, or inhabit these ascriptions, and consequently, their behaviour and self-descriptions change. Classifications develop a social life of their own by being reproduced, confirmed, criticized, and contested. The category is then re-made in light of those changes by scientific or diagnostic inquiry. Hence, human kinds are “moving targets” because the very act of classification alters what is being classified (Hacking, 1999/2000, 2006).
Qualitative inquiry has long addressed the performative circulation of classifications by foregrounding situatedness (Haraway, 1988), thick description (Geertz, 1973), abduction (Timmermans & Tavory, 2012), negative-/deviant-case analysis via constant comparison (Becker, 1998; Glaser & Strauss, 1967), reflexivity (Macbeth, 2001), or dynamics such as retro-action and counter-performativity (Hostniker & Meyer, 2024). QR has therefore been a significant source and target of criticism of reified and naturalized classifications as well as of classificatory infrastructures.
Subsequent authors have operationalized Hacking’s work. Contributions on classificatory infrastructures show how standards enact worlds (Bowker & Star, 2000). Works on quantification demonstrate that turning qualities into metrics reshapes behaviour and organizations (Espeland & Sauder, 2007; Espeland & Stevens, 1998; Mennicken & Espeland, 2019). Studies on peripheralisation illustrate how spatial ascriptions may influence behaviour in shrinking regions (Meyer & Miggelbrink, 2013). Economic studies have interrogated markets and models as engines rather than cameras (Burrell & Fourcade, 2021; MacKenzie, 2006). Clinical studies reveal epistemic injustices that develop in interactions between patients and clinicians (Gauld et al., 2025), and historiographies of objectivity locate neutrality claims in specific technologies, disciplines, and methods (Daston & Galison, 2007). Taken together, labels, rankings, and algorithms do not simply discover social kinds; they re- and co-produce them (Fourcade & Healy, 2017a; Fourcade & Healy, 2017b) – a dynamic reflected in, and willingly utilized by, modern marketing (Paulson & O’Guinn, 2018), self-optimization, and plastic surgery (Elliott, 2019).
What is new in the 2020s is not looping per se but that AI modifies the temporal and evidential conditions of the loop.
In Hacking’s original account, looping still depends on renewed empirical encounters with people and practices. Our extension identifies two ways in which AI-mediated research modifies that structure. Acceleration compresses the time between classificatory acts and their uptake. Recursion, by contrast, allows classifications, summaries, prompts, labels, or synthetic surrogates to re-enter later stages of analysis as inputs. The novelty of recursion is therefore not simply greater speed, but a partial displacement of empirical correction by pipeline-internal circulation.
This has profound ramifications. In the machine-learning literature, some models exhibit performative prediction: they do not just predict the world; their predictions change the world they then learn from (Brown et al., 2022; Perdomo et al., 2020). Pagan et al. (2023) offer a taxonomy of feedback loops (who or what feeds back, with what bias effects). Contemporary philosophy has recently revisited Hacking’s theses under these conditions (Tsou, 2025; Vesterinen, 2021). In short, looping may shift from slow, dialogical circuits anchored in empirical data to machine-assisted, high-frequency, partially self-referential circuits.
Where Hacking’s original account presumes that empirical correction happens through renewed encounters with people and practices (Hacking, 1995), today’s classification pipelines can self-validate through prompt-library inheritance, fine-tuning on analyst-approved outputs, preferences for prior labels, and synthetic corpora that encode earlier manual classifications. The result is a lack of renewed empirical checks – a loop that tightens on its own outputs (Pagan et al., 2023; Shumailov et al., 2024). Similar dynamics appear across diverse disciplines, e.g., in applied domains where disclosing algorithmic scores induces self-fulfilling prophecies (Bauer & Gill, 2024), or where clinical practice co-produces diagnostic kinds (Gauld et al., 2025; Lindholm & Wickström, 2020).
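The difference between renewed empirical contact and pipeline-internal circulation becomes checkable if each analytic artefact records what it was derived from. The minimal sketch below assumes a hypothetical lineage table in which each item carries an origin tag and a list of parent items; it computes how many derivation steps separate an item from the nearest empirical material and returns None when the loop is closed. The tags and item names are illustrative, not the format of any existing tool.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    origin: str  # hypothetical tags: "empirical", "model", "analyst", "synthetic"
    parents: list[str] = field(default_factory=list)


def hops_to_evidence(item_id: str, lineage: dict[str, Item]) -> Optional[int]:
    """Fewest derivation steps back to empirical material, or None if there is none."""
    queue = deque([(item_id, 0)])
    seen = {item_id}
    while queue:  # breadth-first search, so the first empirical hit is the nearest
        current, depth = queue.popleft()
        if lineage[current].origin == "empirical":
            return depth
        for parent in lineage[current].parents:
            if parent not in seen:  # also guards against cyclic re-use
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None  # closed loop: no empirical source anywhere in the lineage


lineage = {
    "interview_07": Item("empirical"),
    "code_v1": Item("model", ["interview_07"]),
    "prompt_template": Item("analyst", ["code_v1"]),
    "synthetic_batch": Item("synthetic", ["prompt_template"]),
    "code_v2": Item("model", ["synthetic_batch"]),
    "seed_label": Item("model"),
    "code_v3": Item("model", ["seed_label"]),
}
for item in ("code_v1", "code_v2", "code_v3"):
    print(item, hops_to_evidence(item, lineage))
```

Here code_v2 still reaches the interview, but only through a template and a synthetic batch (four steps), while code_v3 has no empirical anchor at all.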
Without explicit attention to acceleration, recursion, and updated standards for documenting the research process, Hacking’s framework under-describes AI-mediated QR. We therefore propose a formal extension that differentiates acceleration and recursion. Building on these insights, the following sections explore current AI-dynamics and their effects on QR.
3. Technological Vectors and the Techno-Epistemic Effects on QR
AI-Influences Affecting Qualitative Research (Oct 2025)
“The main trouble with cyborgs, of course, is that they are the illegitimate offspring (…). But illegitimate offspring are often exceedingly unfaithful to their origins. Their fathers, after all, are inessential.”
(Haraway, 1985, p. 8)
3.1. Immediate, High-Prevalence Influences
A) AI-assisted qualitative analysis
LLMs and embedding tools (meaning: tools that represent texts as multidimensional numeric vectors so that similarity and relationships among texts can be computed) now appear at many stages of qualitative work (e.g., Mayring, 2025; Nguyen-Trung, 2025; Pattyn, 2024; Perkins & Roe, 2024; Pilati et al., 2024; Sun et al., 2025): transcription, search across corpora, first-pass coding and code suggestions, writing memos and summaries, even outlining and drafting. In day-to-day practice, teams may rely on prompt libraries and workflow templates that standardize practices (Liu et al., 2025). Retrieval may bring back earlier labels, nudging analyses toward the same frames (Davison et al., 2024; Zhang et al., 2025). Comparative studies show clear gains in speed and scoping, whereas meaning fidelity plateaus on more interpretative tasks (Liu et al., 2025), may differ from that of human researchers (Castellanos et al., 2025), or requires more engagement to manage (Bijker et al., 2024).
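The similarity computation behind embedding search can be sketched briefly. Real embedding models produce dense vectors learned from large corpora; purely for illustration, the sketch below substitutes simple word-count vectors (an assumption, not how any cited tool works) and ranks segments by cosine similarity, the measure such tools commonly use. The segments are invented.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Stand-in for a learned embedding: lower-cased word counts.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


segments = [
    "the council ignored our concerns",
    "the council listened to our concerns",
    "we organised the repair cafe ourselves",
]
query = bag_of_words("council concerns")
ranked = sorted(segments, key=lambda s: cosine_similarity(query, bag_of_words(s)), reverse=True)
for segment in ranked:
    print(f"{cosine_similarity(query, bag_of_words(segment)):.2f}  {segment}")
```

In this toy space the first two segments score 0.63 and 0.58 although they report opposite experiences: proximity in such a space reflects shared wording or usage, which the researcher still has to interpret.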
The manifold ways in which interpretative labour is executed by AI-assistants are one main contact point between AI and QR. This shifts reflexive labour from close reading to prompt curation, introduces prompt- and path-dependence into interpretation, and raises expectations for transparent methods disclosures to editors, reviewers, and ethics boards. Properly disclosed and audited, these tools can improve coverage and efficiency and help surface novel leads in large corpora.
B) Provenance, licensing, and traceability
As AI-tools enter analysis, expectations rise to show where data and labels came from (including licences and attribution; Longpre et al., 2024), and to document the analytical workflow (Schlegel & Sattler, 2025). Recent ethics work also stresses disclosing any synthetic content and the specific roles AI played in the study (Resnik et al., 2025; Shanley et al., 2024).
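Such documentation can be kept as a structured record per AI-assisted step, so that reviewers can see which tool touched which material and whether synthetic or earlier generated content re-entered the workflow. The field names below are a hypothetical schema sketched for illustration; they are not a published reporting standard, and the tool name and date are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AIStepRecord:
    step: str               # e.g. "first-pass coding"
    tool: str               # model or software name and version (placeholder below)
    date: str               # date of the run (placeholder below)
    inputs: list[str]       # identifiers of the materials fed into the step
    prompt_ref: str         # pointer to the archived prompt or template
    synthetic_inputs: bool  # did synthetic or model-generated text enter this step?
    reused_outputs: list[str] = field(default_factory=list)  # earlier outputs re-entering as inputs
    human_check: str = ""   # who reviewed the output, and how


record = AIStepRecord(
    step="first-pass coding",
    tool="<model name and version>",
    date="YYYY-MM-DD",
    inputs=["interview_01", "interview_02"],
    prompt_ref="prompts/coding_v3.txt",
    synthetic_inputs=False,
    reused_outputs=["codebook_v2"],
    human_check="second coder reviewed all suggested codes",
)
print(json.dumps(asdict(record), indent=2))
```

Keeping `reused_outputs` as an explicit field is the design choice that matters here: it turns output re-entry from an implicit habit into something a reviewer can read.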
Because comprehensible workflows and interpretative labour are cornerstones of QR, provenance and traceability are fast becoming preconditions of credibility: they underpin later reporting criteria (e.g., a provenance-and-recursivity statement and a reflexive AI-audit) and let reviewers judge how AI shaped interpretation. However, traceability practices need to keep pace with the infusion of AI-mediation into QR-workflows.
C) Retrieval-augmented Generation
Typically, retrieval-augmented generation (RAG) supplements an LLM’s context by retrieving material from other sources (e.g., web data, transcribed interviews, and other documents), often within a data space defined by the user, and passing it to the model together with the prompt. This allows statements or summaries on specific topics to be extracted from corpora that were not part of the model’s training data. RAG impacts the evidence-gathering stages of qualitative research by locating, clustering, and summarizing materials from dynamic web corpora and proprietary indexes (Castellanos et al., 2025; Paulus & Marone, 2025; Zhang et al., 2025).
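The generic shape of the retrieval step can be sketched under simplifying assumptions: rank passages from a user-defined corpus against a question, keep the top k, and assemble them into a prompt. Word-overlap scoring stands in for the embedding similarity a real system would typically use, the call to an actual LLM is omitted, and the corpus entries are hypothetical. Tagging each passage with its source and date is one way to keep the evidence base reportable.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str
    retrieved_on: str  # placeholder; a real log would store the actual retrieval date


def score(query: str, text: str) -> int:
    # Crude relevance: distinct query words that also occur in the passage.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    return sorted(corpus, key=lambda p: score(query, p.text), reverse=True)[:k]


def build_prompt(query: str, passages: list[Passage]) -> str:
    # Each passage is tagged with source and date so the context stays traceable.
    context = "\n".join(f"[{p.source}, {p.retrieved_on}] {p.text}" for p in passages)
    return f"Answer using only the sources below.\n{context}\n\nQuestion: {query}"


corpus = [
    Passage("residents describe the closure of the local school", "interview_03", "YYYY-MM-DD"),
    Passage("the mayor frames the closure as modernisation", "newspaper_12", "YYYY-MM-DD"),
    Passage("a youth club reopened in the old station", "field_note_05", "YYYY-MM-DD"),
]
question = "how do residents talk about the school closure"
hits = retrieve(question, corpus)
prompt = build_prompt(question, hits)  # this string would be sent to an LLM (call omitted)
print(prompt)
```

The field note never reaches the model, because it shares almost no words with the question; which material is silently excluded at this stage bears directly on claims of saturation and traceability.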
This approach may change what counts as retrievable evidence, with potential consequences for claims of saturation, the reproducibility of literature bases, and transparency about sources and dates. As the potential contextual sources for the LLMs we use become more diverse and more recent, the interpretative labour assisted by AI becomes harder to trace.
D) Human-in-the-loop alignment & instruction-tuning
The basic idea here is that AI-mediated processing of qualitative data takes place in an interplay between AI and human intervention. In coding processes, for example, the AI gradually learns which specifications, preferences, or guiding principles exist on the human side of data processing and adapts its proposed coding of the material accordingly. AI can also be used to provide structured, rubric-based feedback on the quality of individual steps in the research process. Contemporary foundation and assistant models are aligned with human judgement: developers collect human-coded preference data and apply rubric-guided feedback (reinforcement learning from human feedback, RLHF), and increasingly AI-mediated feedback (sometimes called RLAIF). Models are also instruction-tuned on corpora assembled from public prompts, curated tasks, and crowd annotations (Bai et al., 2022; Ouyang et al., 2022; Stiennon et al., 2020). Synthetic instructions can top up these sets but do not replace preference data (Wang et al., 2022).
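How preference data shapes a model can be illustrated with the pairwise objective commonly used to train reward models in RLHF pipelines of the kind described by Stiennon et al. (2020) and Ouyang et al. (2022): the reward model is penalised by −log σ(r_chosen − r_rejected), so it learns to score the response annotators preferred above the one they rejected. The reward values below are made-up numbers; this is a sketch of the loss alone, not of a full training pipeline.

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): low when the preferred response scores higher."""
    margin = r_chosen - r_rejected
    # Numerically stable form; both branches equal log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two candidate summaries of the same interview passage,
# where annotators following their rubric preferred the first one.
agrees = pairwise_preference_loss(r_chosen=2.0, r_rejected=0.5)
disagrees = pairwise_preference_loss(r_chosen=0.5, r_rejected=2.0)
print(f"ranking agrees with annotators: {agrees:.3f}; contradicts them: {disagrees:.3f}")
```

Whatever the rubric rewarded becomes the training signal, which is one route by which annotators’ normative choices can surface later as seemingly neutral coding suggestions.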
Even when synthetic materials are present, trainer subjectivities and organizational rubrics leave normative and linguistic imprints on models that later may influence coding and summarizing. For interpretative work, these latent value choices can surface as seemingly neutral suggestions or themes. In theory, resulting biases should be disclosed (alignment method, sources of preference data, presence of annotator guidelines, dates of fine-tunes) and, where possible, triangulated with human checks, diverse coders, and negative-case searches.
3.2. Near-Term, Field-Dependent Influences
E) Big Qualitative Data (BQD)
As qualitative research enters the era of BQD, the use of digital and AI-mediated tools to process extensive text corpora expands analytical capacity while also raising important methodological challenges. CAQDAS and LLM-based tools for large, text-centric corpora now support batch auto-coding and semantic search. Typical exploratory strategies draw on word frequencies, lemmatization, word clouds and word trees, keyword-in-context, and dictionary-based methods, automatic summaries, paraphrases and codes, as well as cross-project reuse of topics (Bijker et al., 2024; Chandrasekar et al., 2024; Liu et al., 2025).
BQD encourages researchers to enlarge QR corpora in order to widen their interpretative scope. However, some authors caution that topic modelling and large-scale aggregation can over-smooth heterogeneity (Bednarek, 2024) and falsely infer meaning from text (Curry et al., 2024). BQD increases coverage and speed, but it affects saturation and the recognition of rare cases, and it can entrench codebooks across contexts if we keep adding data without reflecting on the context-sensitive and context-specific tools with which we engage in our interpretative labour.
F) Synthetic Data in model training and evaluation (SDT)
Synthetic text now appears upstream in model training and evaluation, and downstream in studies as augmentation, fill-in for data gaps, or as proxy for under-represented voices (Hämäläinen et al., 2023; Kapania et al., 2024; Wiles, 2025). Reviews and governance papers map the respective benefits and risks (Foraker et al., 2025; Goyal & Mahmoud, 2024; Loni et al., 2025; Resnik et al., 2025; Shanley et al., 2024). Controlled studies show model degradation, collapse and drift of meaning under recursive or high-synthetic regimes (Seddik et al., 2024; Shumailov et al., 2024).
For qualitative workflows, the evidential base of AI-models, which is only vaguely traceable, alters the evidential mix behind AI-mediated coding and summaries; and when synthetic text is analysed as data, questions of provenance, authenticity, and disclosure come to the fore.
G) Multimodal AI (audio/video/image)
Automatic speech recognition and speaker diarisation (meaning: who spoke when), machine translation, and video/gesture extraction are increasingly folded into qualitative and mixed-methods pipelines across disciplines (Castellanos et al., 2025; Fieldhouse et al., 2025). There is active debate about whether paralinguistic signals (e.g., tonality, affect, gaze, gesture) can be validly carried over into interpretative claims (Kushwaha, 2024).
Such tools extend analysis beyond text and raise questions of interpretative fidelity in a realm of the social where meaning-making poses far more complex obstacles. They also raise issues of consent and privacy, transcription and segmentation fidelity (accent/language/model bias), and the interpretability of embodied or affective outputs.
H) Low-resource and non-English LLMs
Broader language coverage and better tooling continuously allow more languages to be used with or in AI, but quality and bias remain uneven across models and settings (Fieldhouse et al., 2025; Qadhi et al., 2024).
This enables more inclusive studies and cross-language comparisons, yet it also requires culturally sensitive prompting, careful back-translation or bilingual training, and validation protocols that check meaning fidelity, not just lexical overlap, before drawing interpretative claims.
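Why lexical overlap is an insufficient check can be shown with a toy case: a back-translation that drops a negation shares almost all its words with the original, so it scores high on a word-overlap measure, yet it reverses the meaning. The sketch uses the Jaccard index over word sets as a stand-in for overlap-based metrics; the sentences are invented examples.

```python
def jaccard(a: str, b: str) -> float:
    """Share of distinct words two sentences have in common (word-set Jaccard index)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


original = "the participants did not trust the new council"
back_translation = "the participants did trust the new council"
print(round(jaccard(original, back_translation), 2))  # 0.86, yet the meaning is reversed
```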
3.3. Emerging Influences
I) Agentic/autonomous research assistants
Task-chaining agent systems (e.g., systems using the Model Context Protocol, MCP) can orchestrate retrieval, coding, memo writing, and even drafting. Early reports note clear workflow gains but also opaque authorship and accountability: it can be hard to see which decisions were made by the agent and which by the analyst (Combrinck, 2024; Jiang et al., 2021; Zhang et al., 2025).
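One way to keep attribution visible is an append-only log that records, for every step, whether the agent or the analyst acted, what was done, and what the step relied on. The sketch below is a hypothetical, minimal structure; it is not the logging interface of MCP or of any agent framework, and the example entries are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Decision:
    actor: str      # "agent" or "analyst"
    action: str     # what was done
    basis: str      # what it relied on: document IDs, prompt reference, rationale
    timestamp: str  # UTC time of recording


class AuditLog:
    """Append-only record of who made which analytic decision (hypothetical structure)."""

    ACTORS = {"agent", "analyst"}

    def __init__(self) -> None:
        self._entries: list[Decision] = []

    def record(self, actor: str, action: str, basis: str) -> None:
        if actor not in self.ACTORS:
            raise ValueError(f"unknown actor: {actor!r}")
        stamp = datetime.now(timezone.utc).isoformat()
        self._entries.append(Decision(actor, action, basis, stamp))

    def entries(self) -> tuple[Decision, ...]:
        return tuple(self._entries)  # read-only copy; recorded entries are never edited

    def share_by(self, actor: str) -> float:
        if not self._entries:
            return 0.0
        return sum(e.actor == actor for e in self._entries) / len(self._entries)


log = AuditLog()
log.record("agent", "retrieved passages on school closure", "query text and index snapshot ref")
log.record("agent", "proposed code 'abandonment'", "passages 3, 8, 17")
log.record("analyst", "renamed code to 'institutional withdrawal'", "memo on fieldwork context")
print(f"agent share of decisions: {log.share_by('agent'):.2f}")
```

A share of decisions is a crude indicator, but it makes the division of interpretative labour between agent and analyst reportable at all.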
These agents can speed up routine labour, but they also complicate attribution and auditability: without careful logging and disclosure, it becomes difficult to trace how interpretative steps were taken or to justify them to reviewers and participants.
J) Utilizing on-device LLMs
Running models on-device or in trusted/secure environments reduces cloud exposure during sensitive fieldwork and aligns with debates on human control/teaming with AI (Howison et al., 2024; Tsamados et al., 2025). It supports offline capture and processing, but may be constrained by computing power and speed.
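Because on-device results can depend on hardware, model files, and library versions, each run can at least record its computational environment alongside its outputs. The sketch below uses only Python’s standard library (`platform`, `importlib.metadata`); the package names passed in are placeholders for whatever a given pipeline actually uses, and packages that are not installed are reported as such rather than guessed.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages: list[str]) -> dict:
    """Record interpreter, platform, and installed package versions for a run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),
        "system": platform.system(),
        "packages": versions,
    }


# "hypothetical-model-runtime" stands in for the local inference library a project would use.
snapshot = environment_snapshot(["numpy", "hypothetical-model-runtime"])
print(json.dumps(snapshot, indent=2))
```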
These tools enable privacy-preserving handling of recordings and transcripts, yet require updated consent and security language (meaning: where processing occurs, what leaves the device), and can limit reproducibility if results depend on device-specific hardware/models and cloud-computing power. Furthermore, different versions of programming libraries may yield significantly different results (Shahriari et al., 2022).
K) Affective/voice & adversarial/poisoning risks
Tools that claim to infer sentiment or emotion from voice or text are easy to over-interpret, because affect cues are culture-, context- and genre-dependent, and model validation rarely matches the field setting (Aguilera et al., 2023; Hovy & Prabhumoye, 2021). In parallel, pipelines that depend on large-scale corpora face data-feedback effects and poisoning risks that can distort analyses (Pagan et al., 2023; Taori & Hashimoto, 2023).
While such tools may enhance our interpretative capabilities, tracing the interpretative fidelity of their findings, given the conditions on which those findings may depend, can prove almost impossible.
“Simultaneously material and ideological, the dichotomies may be expressed in the following chart of transitions from the comfortable old hierarchical dominations to the scary new networks I have called the informatics of domination” (Haraway, 1985, p. 31)
4. Epistemic Risks of Integrating AI Into QR
Overview of Epistemic Risks of Integrating AI Into QR
The risks discussed below do not all have the same evidential status. Some are directly documented in adjacent empirical literatures, such as over-smoothing in computational text analysis, feedback effects in human-AI interaction, or degradation under recursive synthetic training conditions. Others are plausible methodological risks for AI-mediated qualitative research inferred from those findings and from common workflow features. A third subset is forward-looking and normative: concerns not yet systematically demonstrated in qualitative research, but important to articulate because they bear on emerging standards of interpretative accountability.
4.1. Over-Smoothing and Novelty Loss
As shown in Section 3, AI-mediation may shorten analytic cycles and stabilize labels. The process of gradually condensing observations into a new and creative interpretation based on empirical material (as in grounded theory approaches, for example), which is quite common in qualitative research, can be lost in AI-mediated designs because these operate within the framework of already known, sedimented categories and knowledge systems. This tends to suppress an open search for clues in the material, which may even be inspired by memories of (and notes on) the original interview contexts.
Mechanistically, large corpora and templated prompting speed up the diffusion of sedimented categories, while topic models and auto-coding can over-smooth context (Bednarek, 2024). Concurrently, BQD-related studies report productivity gains alongside mixed meaning fidelity (Bijker et al., 2024; Chandrasekar et al., 2024; Liu et al., 2025). Widely available AI-models are inherently probabilistic: they generate output by estimating probabilities for possible continuations given the prompt, additional input, and contextual information, and common decoding settings favour the most probable continuations. Positions that are subjective, rare, or deviant in the data are therefore less likely to be predicted and more likely to be overlooked. This contradicts the claim that QR should give voice and visibility to subjective, divergent, and deviant views. For QR, this can mean premature consensus, fewer genuinely new codes, and thinner description as codebooks travel across sites without re-grounding. On a more political note, special, unique, or rare features in qualitative data may be overlooked, allowing synthetic majorities to emerge (van Dijck, 2014). These dynamics risk epistemic injustice by privileging majoritarian or model-amplified voices over minority ones (Peters & Chin Yee, 2025).
Empirical support is strongest for smoothing tendencies and mixed meaning fidelity in adjacent computational studies; the specific consequence for novelty loss in qualitative inquiry is a methodological inference developed here.
4.2. Self-Validation and Authenticity Drift (Toward Fossilisation)
Synthetic or augmented data and fine-tuning in AI-mediated analysis, often combined with retrieval-augmented generation, allow outputs to re-enter the pipeline as inputs. When synthetic tokens are mixed in upstream (during training or evaluation) or downstream (as study input), prior classifications can be tacitly validated against surrogates of themselves. Empirical work documents degradation and warns of potential model collapse and drift under recursive or high-synthetic training regimes (Shumailov et al., 2024), while ethics- and governance-related contributions call for explicit disclosure of synthetic exposure (Resnik et al., 2025; Shanley et al., 2024). For QR, the danger is authenticity drift: categories may stabilize even as lived practice changes. The growing distance from empirical reality could even become self-reinforcing, to the point where lived empirical reality itself appears deviant.
We therefore treat authenticity drift in qualitative workflows as a plausible downstream risk, not as a directly demonstrated endpoint in current QR practice.
4.3. Performativity Amplification, Lexical Anticipation, and Outlier Suppression
If QR is paradigmatically justified by its sensitivity to the subjective and the deviant, beyond statistical representativeness and hegemonic interpretations, then AI often works against this justification. As with over-smoothing (Risk 1), the interplay of AI-mediated analysis, large-corpus work, and retrieval-augmented generation can amplify the performative circulation of analyst labels (Castellanos et al., 2025; Chandrasekar et al., 2024; Zhang et al., 2025). The reuse of prompt templates and codebooks standardizes practices, while retrieval tends to bring back thematically similar, already-labelled passages (retrieval lock-in), overshadowing negative or rare cases (Bednarek, 2024). Evidence on human-AI feedback loops shows that such interactions can magnify bias. For QR, this may appear as lexical anticipation by participants, an apparent thinning of negative cases, and faster policy travel of quasi-standardized social categories. On a political note, these dynamics could also be exploited deliberately, by seeding categories and flooding training data to skew the tools of social research more profoundly.
Human-AI feedback effects are empirically documented, whereas lexical anticipation and intensified outlier suppression in qualitative settings remain plausible but not yet demonstrated concerns.
4.4. Externalized Reflexivity, De-Skilling, and Governance-by-Pipeline
The use of AI in qualitative research requires an examination not only of the processes that take place within the system but, above all, of the decisions made about using AI. If AI is viewed primarily as a means of making work easier, which it often can be, its performative character is underestimated, and reflexive vigilance may decline or be externalised to AI as well. Yet because current LLMs rely on probabilistic language modelling to simulate or support reasoning, their outputs must not be confused with reflexive reasoning itself (Jowsey et al., 2025).
As AI-mediated qualitative analysis becomes routine, reflexive labour risks being offloaded to prompts and templates, while provenance and traceability remain uneven and agentic assistants blur authorship and accountability. The literature flags ethical cautions for AI in QR and offers guidance for responsible use (Davison et al., 2024), and technical work shows that end-to-end provenance capture is feasible (Schlegel & Sattler, 2025). Alignment choices embed political values into ostensibly neutral tools, raising questions about the regressive potential of AI-futures. Researchers therefore remain responsible for making these value imprints visible. For QR, the risk revolves around interpretative de-skilling, reproducibility gaps, and a drift toward policy-by-proxy as vendor defaults encode choices such as governance decisions.
This risk is primarily methodological and normative: it concerns what researchers may cease to notice, justify, or disclose as AI-routines become infrastructural. “… taking responsibility for the social relations of science and technology means (…) embracing the skillful task of reconstructing the boundaries of daily life (…), in communication with all of our parts.” (Haraway, 1985, p. 59)
5. Provisional, Reportable Evaluative Criteria for QR in the Age of AI
The following reportable criteria are proposed as provisional evaluative anchors for AI-mediated qualitative research. They are not intended as universal thresholds, nor as a closed checklist that could be applied uniformly across qualitative traditions. Rather, their function is to make AI-mediated interpretative work reportable, inspectable, and contestable: they ask whether claims remain empirically refreshed, whether provenance and recursive output re-entry are visible, whether heterogeneity and negative cases remain analytically available, and whether the conditions of AI-mediation are reflexively disclosed. In this sense, the criteria structure justification and critique without foreclosing debate about their future refinement.
The criteria extend classic commitments to credibility/trustworthiness (Lincoln & Guba, 1985) or reflexivity/transparency (Macbeth, 2001), adapting them to AI-mediated workflows. Because this article is a conceptual intervention, these criteria are deliberately proposed in provisional form. Their purpose is not to close discussion, but to open a more explicit debate about how AI-mediated qualitative work should be described and evaluated as practices, tools, and standards evolve.
Tracy (2010, p. 840), for instance, identifies eight QR criteria: a worthy topic, rich rigor, sincerity, credibility, resonance (with an audience), significant contribution (to the state of the art), ethics, and meaningful coherence. While we consider our suggestions consequential, they partly depart from such established qualitative criteria because they must address the hybrid regime of AI-mediated QR.
By proposing the following criteria, we aim, firstly, to introduce potential additional criteria for research designs that employ AI-mediated practices. Secondly, we aim to prompt deeper reflection on the potential ramifications of AI-mediated qualitative research. Given the rapid advancement of AI-technologies and their implementation in research designs, our criteria can only serve as a temporary snapshot of such reflection, deliberately leaving room for an ongoing, critical, and restless debate in the face of current and future technological developments. As progress continues, these criteria will need to be questioned, replaced, or refined to match upcoming innovations.
5.1. Empirical Refresh Ratio
5.1.1. Rationale
When pipelines reuse their own outputs or include synthetic text, periodic encounters with lived worlds help counteract adverse model behaviour (e.g., Shumailov et al., 2024). Ethics and governance work urges clear handling of synthetic exposure (Resnik et al., 2025). Because empirical research thrives on correction by empirical material, the impact of synthetic data needs to be explicitly accounted for. While researchers can rarely influence the role of synthetic data in model training, its role in QR can be hedged through transparency.
5.1.2. Definition
Share of newly collected empirical evidence used at each research stage (e.g., coding, model evaluation, write-up).
5.1.3. Evaluative Function
This criterion asks whether interpretative claims continue to be corrected by newly encountered empirical material rather than being progressively stabilized through recycled or synthetic inputs.
5.1.4. Possible Reporting
Indicate the relative shares of newly collected, previously collected, and synthetic material at each workflow stage; define a recency window (e.g., the percentage of material collected in the last 12 months) and justify any exceptions (e.g., historical corpora).
5.1.5. Implementation
Although empirical data are indispensable, a fixed ratio is difficult to specify because it cannot be justified independently of the research problem. Rather than a rigid threshold, what matters is a transparent account of the extent to which, and the points at which, primary, secondary, and synthetic material is fed in (e.g., in transcription, in the coding process, in the formation of categories). For instance, researchers may record for each analytical stage whether the input was newly collected empirical material, previously collected empirical material, or synthetic or model-generated material.
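The following Python sketch illustrates one way to tabulate such per-stage accounting. It assumes that each input unit (e.g., a transcript, field note, or generated text) has been tagged with its workflow stage and with one of the three source types named above. The function names, the tags, and the 365-day recency window (mirroring the 12-month example in 5.1.4) are hypothetical choices for illustration. They are not a prescribed procedure or threshold.

```python
from collections import Counter
from datetime import date

# Hypothetical tags for the three input types named in 5.1.5.
NEW, PREVIOUS, SYNTHETIC = "new", "previous", "synthetic"


def refresh_shares(items):
    """Return the share of each input type per workflow stage.

    `items` is a list of (stage, source_type) tuples, one per input unit,
    e.g. ("coding", NEW). Units are counted, not weighted by length.
    """
    by_stage = {}
    for stage, source in items:
        by_stage.setdefault(stage, Counter())[source] += 1
    shares = {}
    for stage, counts in by_stage.items():
        total = sum(counts.values())
        # Counter returns 0 for source types absent at this stage.
        shares[stage] = {s: counts[s] / total for s in (NEW, PREVIOUS, SYNTHETIC)}
    return shares


def recency_share(collection_dates, reference, window_days=365):
    """Share of material collected within `window_days` before `reference`."""
    if not collection_dates:
        return 0.0
    recent = [d for d in collection_dates if 0 <= (reference - d).days <= window_days]
    return len(recent) / len(collection_dates)


shares = refresh_shares([
    ("coding", NEW), ("coding", NEW), ("coding", SYNTHETIC), ("coding", PREVIOUS),
    ("write-up", PREVIOUS),
])
print(shares["coding"])  # {'new': 0.5, 'previous': 0.25, 'synthetic': 0.25}
print(recency_share([date(2025, 6, 1), date(2023, 1, 1)], reference=date(2025, 10, 1)))  # 0.5
```

Counting units rather than tokens or words is itself a design decision. Whichever unit is chosen would need to be reported alongside the shares.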
5.2. Provenance & Recursivity Accounting
5.2.1. Rationale
Audit literature documents licensing and attribution gaps in widely used AI datasets (Longpre et al., 2024), although end-to-end provenance capture is technically feasible (Schlegel & Sattler, 2025). Ethics guidance recommends declaring synthetic exposure and AI-roles (Resnik et al., 2025; Shanley et al., 2024). Documenting how the researcher uses AI to perform analytical steps, responds to the (interim) results produced, and prepares conclusions is therefore both necessary and possible.
5.2.2. Definition
A statement of where data, labels, and tools come from and how much output re-entry (recursion) the pipeline contains.
5.2.3. Evaluative Function
This criterion asks whether the evidential lineage of data, labels, prompts, and tools is visible enough for readers to identify where recursive output re-entry may have tightened the analytical loop.
5.2.4. Possible Reporting
List dataset licenses and attribution; disclose the estimated share of synthetic tokens in upstream tools/inputs; note codebook or prompt inheritance, fine-tunes, and vendor constraints; state the alignment regime (RLHF/RLAIF), whether instruction-tuning used human vs synthetic prompts, and whether annotator guidelines are public.
5.2.5. Implementation
This criterion is concerned less with numerical information than with coherent and comprehensive documentation of the analytical pipeline (data, labels, tools). Complete documentation may not be achievable, because AI-service providers may license and retrain models and thereby create a multi-layered cascade of documentation. It should nonetheless be as comprehensive as possible. This requires anticipating documentation needs from the outset, when designing, deciding on, and executing AI-mediated analytical steps. As technology advances, more or different documentation will become necessary. A minimum set of disclosure items may include where the data, labels, codebooks, and prompts came from, which tools and versions were used, where recursive output re-entry occurred, and which vendor or licensing limits remained.
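The Python sketch below shows one minimal way to make recursive output re-entry visible. It assumes the pipeline is logged as an ordered list of steps, each with the artifact identifiers it consumed and produced. The `Step` record, the `reentry_report` function, and the tool names are hypothetical. The operationalization is deliberately broad: any artifact produced by an earlier AI-mediated step counts as re-entered.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One logged step of a hypothetical AI-mediated analytical pipeline."""
    name: str
    tool: str                 # tool name and version, as used
    ai_mediated: bool
    inputs: list = field(default_factory=list)   # artifact IDs consumed
    outputs: list = field(default_factory=list)  # artifact IDs produced


def reentry_report(steps):
    """Share of each step's inputs that were produced by an earlier AI-mediated step.

    Steps must be given in execution order.
    """
    ai_outputs = set()
    report = {}
    for step in steps:
        reentered = [a for a in step.inputs if a in ai_outputs]
        report[step.name] = len(reentered) / len(step.inputs) if step.inputs else 0.0
        if step.ai_mediated:
            # Register this step's outputs only after its own inputs were checked.
            ai_outputs.update(step.outputs)
    return report


# Hypothetical pipeline log; tool names and artifact IDs are placeholders.
pipeline = [
    Step("transcription", "asr-tool 1.0", True, ["audio-01"], ["transcript-01"]),
    Step("coding pass 1", "llm-a v1", True, ["transcript-01", "codebook-v1"],
         ["codes-01", "codebook-v2"]),
    Step("coding pass 2", "llm-a v1", True, ["transcript-02", "codebook-v2"], ["codes-02"]),
]
print(reentry_report(pipeline))
# {'transcription': 0.0, 'coding pass 1': 0.5, 'coding pass 2': 0.5}
```

A stricter variant could count only outputs that return to the same tool or prompt lineage, for example an inherited codebook. Whichever definition is used would itself belong in the disclosure.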
5.3. Heterogeneity Preservation Assessment
5.3.1. Rationale
Topic-model critiques warn against over-smoothing (Bednarek, 2024); saturation debates caution against premature closure (Tight, 2024); and others emphasize preserving nuance (Braun & Clarke, 2025). Over-smoothing is certainly also a problem with ‘manual’ coding methods. AI tends to exacerbate it, however, because the immediate decision is generated through prompting rather than made by the researcher working with the material. As the analytical over-smoothing of rare and unique cases contradicts the premise and strengths of QR, the corresponding adverse effects of AI-implementation need to be assessed and documented. The Heterogeneity Preservation Assessment (HPA) measures how far the diversity, inconsistency, and marginality of empirical statements are retained in the course of an AI-mediated analysis. It is intended to indicate the extent to which individual or rare positions (statements) remain recognisable or have been subsumed. It thus indicates how far (and, if applicable, at which analytical step) the number of codes and cases decreases as a result of summarisation.
5.3.2. Definition
A measure of diversity in codes/cases over time (e.g., code distributions across sites/cases).
5.3.3. Evaluative Function
This criterion asks whether deviant, contradictory, low-frequency, or minor material remains analytically visible rather than being absorbed into model-amplified regularities.
5.3.4. Possible Reporting
Report changes in heterogeneity across stages; flag notable drops; describe steps to protect negative/outlier cases (e.g., targeted sampling, counter-queries).
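The Python sketch below shows one possible way to report changes in heterogeneity across stages. It uses Shannon entropy of the code distribution as the diversity measure, alongside a simple count of distinct codes. Neither the measure nor the 20% relative-drop threshold used for flagging is part of the criterion as defined; both are assumptions for illustration. The code labels are placeholders.

```python
import math
from collections import Counter


def shannon_entropy(codes):
    """Shannon entropy (bits) of the distribution of code labels; 0.0 if empty."""
    counts = Counter(codes)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over all observed codes.
    return sum((n / total) * math.log2(total / n) for n in counts.values()) if total else 0.0


def heterogeneity_by_stage(stages, max_relative_drop=0.2):
    """Report distinct codes and entropy per stage; flag large drops from the previous stage.

    `stages` is an ordered list of (stage_name, list_of_code_labels).
    """
    rows, previous = [], None
    for name, codes in stages:
        h = shannon_entropy(codes)
        flagged = (previous is not None and previous > 0
                   and (previous - h) / previous > max_relative_drop)
        rows.append((name, len(set(codes)), round(h, 3), flagged))
        previous = h
    return rows


for row in heterogeneity_by_stage([
    ("manual coding", ["a", "b", "c", "d"]),
    ("AI pass 1", ["a", "a", "a", "b"]),
]):
    print(row)
# ('manual coding', 4, 2.0, False)
# ('AI pass 1', 2, 0.811, True)
```

Entropy summarizes the whole distribution. It can remain fairly stable while a single rare code disappears, so a flag (or its absence) should prompt a qualitative check of negative and outlier cases rather than replace it.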
5.3.5. Implementation
The HPA specifically concerns the methodological tool of “categories” in coding-centric qualitative research designs. This criterion may be difficult to implement in general, because research contexts vary, as do practices of establishing and using categories and ways of assessing intercoder comparability. We do not propose a single standardized calculation of heterogeneity preservation. Depending on research design, this criterion may be rendered numerically, comparatively, or narratively. Its central purpose is not to impose a universal metric but to require explicit reflection on whether AI-mediated coding, summarization, or retrieval smooths away deviant, marginal, contradictory, or low-frequency material. At minimum, authors should indicate where heterogeneity may have been reduced, how negative or outlier cases were checked, and how any remaining loss of variation is justified. We therefore suggest two options for assessing the HPA, while explicitly inviting further debate on their practical implementation:
Option 1: As the HPA measures change in variability, this option assesses how far the results of specific analytical steps in AI-mediated designs differ from the results of the same steps performed manually by humans on similar data. It may be a numerical representation of the relation between a set number of possible topical categories that may emerge in a given dataset (e.g., a set of interviews) and the number of categories actually coded in this dataset. It thus compares an AI-mediated design with a manual, human-centric design.
The first step is to define a baseline case that was coded manually and to establish its HPA(0). Ideally, this baseline uses an intercoder design with several coders who have completed intercoder coaching. While such a baseline may be out of reach for small projects, manually coded datasets on various topics are already available and may serve as proxy baselines. In an iterative process, subsequent steps assess the HPA after AI-mediated coding steps (e.g., after each AI-assisted coding pass) and compare it to the baseline HPA(0).
Option 2: As over-smoothing rare cases is a core concern, the HPA may focus on the preservation of rare cases or outliers, that is, material which, when coded by humans, occurs only in the lowest frequency percentile. Although this could be calculated, the preservation of such cases can also be assessed by proxy. A number of segments in the dataset are designated, on conceptual premises, as deviant, marginal, or contradictory. The HPA is then a non-numerical, more narrative assessment of the extent to which these marginal cases are represented in, or vanish from, AI-mediated findings.
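The Python sketch below shows how the comparison with HPA(0) under Option 1 might be operationalized. It assumes a predefined reference set of possible topical categories. It then treats the HPA as the share of these categories actually coded, counting only codes within the reference set so that the value stays between 0 and 1. This ratio is an assumed reading of the "relation" described above. The category labels are invented placeholders.

```python
def hpa(coded, possible):
    """Share of the possible topical categories that were actually coded (assumed HPA)."""
    possible = set(possible)
    return len(possible & set(coded)) / len(possible)


def compare_to_baseline(baseline_coded, passes, possible):
    """Return HPA(0) and, per AI-mediated pass, its HPA and difference from HPA(0)."""
    hpa0 = hpa(baseline_coded, possible)
    rows = []
    for name, coded in passes:
        value = hpa(coded, possible)
        rows.append((name, value, value - hpa0))
    return hpa0, rows


# Hypothetical reference set of categories for an interview dataset.
POSSIBLE = ["trust", "fear", "nostalgia", "resistance"]
hpa0, passes = compare_to_baseline(
    baseline_coded=["trust", "fear", "nostalgia", "resistance"],
    passes=[("AI pass 1", ["trust", "fear", "nostalgia"]), ("AI pass 2", ["trust", "fear"])],
    possible=POSSIBLE,
)
print(hpa0, passes)
# 1.0 [('AI pass 1', 0.75, -0.25), ('AI pass 2', 0.5, -0.5)]
```

Reporting the difference from HPA(0) for each coding pass makes visible the step at which variation is lost.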
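Option 2 is narrative rather than numerical, so tooling can only support the assessment. The Python sketch below assumes that each designated segment carries an identifier and that the AI-mediated findings at each stage can be traced back to the segment identifiers they draw on. Under these assumptions, it lists which designated segments remain represented and which have vanished, as input for the narrative account. Identifiers and stage names are hypothetical.

```python
def track_designated(designated, findings_by_stage):
    """List which designated segments remain represented, and which vanish, at each stage.

    `designated` is a set of segment IDs marked as deviant, marginal, or contradictory;
    `findings_by_stage` maps stage names to the set of segment IDs the findings draw on.
    """
    report = {}
    for stage, represented in findings_by_stage.items():
        report[stage] = {
            "kept": sorted(designated & represented),
            "lost": sorted(designated - represented),
        }
    return report


# Hypothetical segment IDs (interview number and segment number).
designated = {"int03-s12", "int07-s04", "int11-s20"}
report = track_designated(designated, {
    "AI summary": {"int03-s12", "int05-s01"},
    "AI themes": set(),
})
for stage, result in report.items():
    print(stage, result)
# AI summary {'kept': ['int03-s12'], 'lost': ['int07-s04', 'int11-s20']}
# AI themes {'kept': [], 'lost': ['int03-s12', 'int07-s04', 'int11-s20']}
```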
5.4. Reflexive AI-Audit
5.4.1. Rationale
Information-systems and qualitative-methods research calls for reflexive disclosure of AI use (Davison et al., 2024; Hayes, 2025); some authors advocate reporting guidelines grounded in values-based transparency (Braun & Clarke, 2025).
5.4.2. Definition
A transparent account of the models, training libraries, hardware, prompts, retrieval sources and snapshots, parameters and seeds, and human oversight used in the analysis (supplementing criterion 5.2).
5.4.3. Evaluative Function
This criterion asks whether the conditions of AI-mediation are disclosed in enough detail for the interpretation to be inspectable and contestable.
5.4.4. Possible Reporting
Provide model family and version; intervention points (transcription, retrieval, coding, summarizing); prompt snippets or link to a prompt repository; retrieval sources with snapshot dates; parameters/seeds where applicable; oversight steps; known limitations. Indicate search strategy and link to the inclusion/exclusion table for discovered materials.
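A reflexive AI-audit may be kept as structured entries, one per intervention point, and checked for gaps before submission. The Python sketch below assumes hypothetical field names derived from the items listed above. It treats an empty field as a gap, so items that do not apply must be stated explicitly (here as "n/a"). Retrieval sources without a snapshot date are flagged separately. All model names, paths, and values are placeholders.

```python
# Hypothetical field names derived from the disclosure items in 5.4.4.
REQUIRED_FIELDS = (
    "model_family", "model_version", "intervention_point", "prompt_ref",
    "retrieval_sources", "parameters", "oversight", "limitations",
)


def audit_gaps(entries):
    """Return {entry index: [gaps]} for audit entries with missing items or undated sources.

    Each entry documents one AI intervention point. Empty or missing fields count as
    gaps, so items that do not apply must be stated explicitly, e.g. as "n/a".
    """
    gaps = {}
    for i, entry in enumerate(entries):
        problems = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        sources = entry.get("retrieval_sources")
        if isinstance(sources, list):
            problems += [f"undated source: {s.get('name', '?')}"
                         for s in sources if not s.get("snapshot_date")]
        if problems:
            gaps[i] = problems
    return gaps


entries = [
    {"model_family": "example-llm", "model_version": "2025-09", "intervention_point": "coding",
     "prompt_ref": "prompts/coding_v3.txt",
     "retrieval_sources": [{"name": "interview corpus", "snapshot_date": "2025-10-01"}],
     "parameters": "temperature=0.2, seed=7", "oversight": "two coders reviewed all AI codes",
     "limitations": "dialect passages poorly transcribed"},
    {"model_family": "example-llm", "model_version": "", "intervention_point": "summarizing",
     "prompt_ref": "prompts/summary_v1.txt", "retrieval_sources": [{"name": "field notes"}],
     "parameters": "n/a", "oversight": "spot checks", "limitations": "n/a"},
]
print(audit_gaps(entries))  # {1: ['model_version', 'undated source: field notes']}
```

Requiring an explicit "n/a" instead of a blank keeps deliberate omissions distinguishable from forgotten ones. This mirrors the criterion's aim of making the conditions of AI-mediation inspectable.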
5.4.5. Implementation
Reflexive AI-audit means comprehensive documentation of AI’s involvement in the research process, including disclosure of the exact specifications of the tools used and their application – either as a reflective text or in accordance with defined protocols. Currently, (topic-specific) protocols are still the exception; however, it is very likely that this will change with the expansion of the use of AI-tools, not least in order to comply with the rules of good scientific practice. The audit may be provided in a methods appendix, a supplementary table, or a protocol-like disclosure note. “The cyborg does not dream of community on the model of the organic family (…). The cyborg would not recognize the Garden of Eden.” (Haraway, 1985, p. 8).
6. Discussion and Conclusion
Qualitative interpretation depends on situated judgment, responsiveness to difference, and the possibility of being corrected by the material. In contrast to Jowsey et al. (2025), we do not generally doubt the potential of AI for QR. Yet, AI-mediated QR must confront a structural tension: current models generate statistically probable continuations, whereas interpretative inquiry seeks to reconstruct meaning, contradiction, and context in ways that cannot be reduced to probability alone. AI may therefore assist interpretative work, but it does not remove the researcher’s responsibility for reflexive judgment.
Our claim is therefore not that AI creates looping from scratch, but that it intensifies one dimension of looping (tempo) and transforms another (evidential re-entry). Long before LLMs, CAQDAS (e.g., Atlas.ti, NVivo, MAXQDA) already sped up and supported coding, retrieval, and cross-project reuse. Likewise, recursive dynamics were discussed in non-AI settings under performativity, retroaction, and counter-performativity (see Hostniker & Meyer, 2024). What is distinctive today is (i) the magnitude and speed of these dynamics in AI-mediated pipelines, and (ii) their partial decoupling from renewed field encounters – through output re-entry via prompt libraries, fine-tunes, and synthetic data.
Our critique rests on a normative judgment, not an empirical claim: researchers often treat some degree of acceleration and recursion as acceptable when it is human-driven, and problematic when it is AI-mediated. The boundary is fragile and contingent, and overlaps with debates about how much human control or “teaming” with AI is desirable or possible (Tsamados et al., 2025). Noting this boundary surfaces a hidden premise: we want humans to remain agents of social change, or at least accountable stewards of the classifications we circulate, as “[o]ur own mythology consists in imagining ourselves as radically different, even before searching out small differences and small divides.” (Latour, 1993, p. 116)
As Latour argues, modern social change is better understood by attending to mediations (the hybrids of nature and society) rather than presuming a purified separation between them (Latour, 1993). In this perspective, actants are any human or nonhuman entities that participate in networks of association; agency is not possessed in isolation but emerges from relations among these actants (Latour, 2005). This shift from purified domains to mediating networks helps to explain the nature of change – and, to some extent, the core conflict of paradigms at hand. As a matter of fact, most foundation models are produced and governed by large vendors; thus, vendor defaults can steer qualitative methods and topics. Keeping accountability with the researcher requires disclosure and, where possible, auditable open pipelines.
Although we sympathise with the concerns raised by Jowsey et al. (2025), change is already underway, across fields and within qualitative research, as research teams adopt AI-tools and practices. Consequently, QR needs to adapt. This descriptive claim simply notes that the widespread circulation of large, industry-built foundation models is creating a new state of the art, one that will shape what counts as evidence, how we analyse it, and the conclusions we draw.
Since widely accessible large language models entered practice only in late 2022, the field is changing faster than its standards. The abundance of new capabilities risks rendering established assumptions, procedures, and quality heuristics obsolete. We therefore call for explicit normative commitments for knowledge production using AI, detailing how evidence is collected, traced, audited, interpreted, and governed – both in general and for qualitative research in particular in this new age of techno-epistemology.
Epilogue
Should Qualitative Inquiry dream of Eden? Is there any garden to return to outside the circuitry of our tools and the endless cycles of progress, mediation, and regress?
Our criteria are not commandments but commitments that keep methods porous to the lives they read. We do not need to quietly inherit the hierarchies of technological corporatism. The criteria we offer will not open the loop; they render it visible. They stage conditions under which error can be seen, contested, and repaired. They are cyborg practices for a cyborg craft: neither anti-machine nor credulous, but accountable.
And accountability, in the end, is less a rule than a posture, and the readiness to be addressed by what we study and by those who will live with our classifications.
Footnotes
Acknowledgements
We thank the anonymous reviewers and our colleague Dr. Dominik Kremer (Leibniz Institute for Regional Geography) for their constructive feedback.
Author Contributions
All authors contributed to methodological discussions on incorporating AI. Frank Meyer conceptualized, refined, organised, and created the manuscript. Judith Miggelbrink and Paul Nguyễn supported the manuscript creation, the editorial work, and the revisions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded through the German Research Foundation (DFG), grant no. 531002243.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is therefore not applicable.
