Acceleration and Recursion in AI-Mediated Qualitative Research: Discussing Adapted Criteria for Qualitative Inquiry in the Techno-Epistemic Age

Abstract

This article examines how artificial intelligence (AI) reshapes qualitative research (QR) by accelerating the circulation of classifications and introducing recursive feedback loops that reuse their own outputs. Drawing on Hacking’s concept of looping effects, we extend it through two mechanisms: acceleration (compressed loop time) and recursion (output re-entry). Reviewing current AI uses in QR, data synthesis, and retrieval, we identify four epistemic risks: over-smoothing, self-validation, performativity amplification, and externalized reflexivity. We then propose four provisional, reportable evaluative criteria suited for AI-mediated QR. These criteria translate long-standing commitments to transparency and credibility into the techno-epistemic age.

Keywords

qualitative research, artificial intelligence, AI-assisted analysis, looping, Hacking, evaluative criteria, reporting

Extended AI-Accountability Statement

The authors verified and are fully responsible for all content. No AI-system is listed as an author. Generative AI-tools (ChatGPT 5.0 Pro, Gemini 2.5 Pro) were used during four steps:

Step 1 (Sep 2025): The initial idea was based on our previous work (Meyer & Miggelbrink, 2013) at the crossroads of qualitative fieldwork and Hacking’s work (1995). We intersected that with reflection on the inclusion of (semi-)automated processes of data analysis in a discourse-theoretical project on populism (Nguyen & Meyer, 2026). After researching literature on epistemic risks, we triangulated this database with an LLM-assisted (ChatGPT 5.0 Pro) search for further literature, (I) expanding the literature base. Based on this expanded corpus, we refined our summary of epistemic risks, and proceeded with the conceptualization based on Hacking’s concepts of recursion and acceleration.

Step 2 (Oct 2025): We initially derived five, and later four, provisional, reportable evaluative criteria. We then used ChatGPT 5.0 Pro and Gemini 2.5 Pro to (II) review these criteria for applicability and feasibility. Based on these reviews, we revised the criteria. In this step, all literature references were re-validated.

Step 3 (Oct 2025): We created the full manuscript based on the previous work, including Pro- and Epilogue, and diving into Haraway (1985). After the full manuscript was devised, we prompted ChatGPT 5.0 Pro and Gemini 2.5 Pro to (III) act as scientific reviewers and thoroughly, critically, and constructively generate extensive reviews, in response to which we then revised the manuscript.

Step 4 (Oct 2025): ChatGPT 5.0 Pro was used to cross-check reference usage in the list of references and in the manuscript, as well as regarding the necessary formatting, (IV) serving as a formatting assistant. This was checked manually afterwards.

Prologue

Qualitative inquiry has always been an endeavour of crossings – between voices and codes, emotions and notebooks, lives and concepts. But the crossings have changed. Circuits hum inside our methods now. They creep inside through their pervasiveness in our daily lives. Once sound and ink were patiently waiting for the interpretative labour we so ambitiously executed from the black boxes of our human subjectivity. This has become silicon and immediacy.

Who dares to doubt our capabilities of transference? The cyborg age no longer sits at a distance; it threads through our prompts, our internet searches, our private conversations. This information accelerates and loops back into what we hope to remain our real lives. Outputs re-enter as inputs; drafts become datasets; templates harden into habits.

Change at machine speed? Whilst some, nevertheless, attempt to retreat from it, we cyborg apologists have to reflect on the consequences and ponder new progressive futures from what could easily mirror our most regressive nightmares. Rather than retreat from this condition, we ask what AI changes in the classificatory loops through which we produce and evaluate knowledge.

“By the late twentieth century, our time, a mythic time, we are all chimeras, theorized and fabricated hybrids of machine and organism; in short, we are cyborgs.”

(Haraway, 1985, p. 6).

1. Introduction

Since the 1990s, data-, software- and hardware-driven innovations have been re-shaping interpretative qualitative research (QR) (Flick, 2019; Williams, 2024) – approaches whose goal is to “reconstruct fixed structures and meanings that actors follow and that can, at best, be interpreted by actors” (Knoblauch, 2008, p. 210) and approaches that assume a “fundamental scope for subjective interpretations” that “is not absorbed into meanings and structures” (Knoblauch, 2008, p. 210). Early on, these impulses came from using software to assist in lexicometric and qualitative-analytical research processes such as content analysis. In the 2020s, new techno-epistemic assemblages have emerged and provided new stimuli: Big Qualitative Data (BQD) (meaning: large, often textual corpora processed, e.g., with topic modeling, Computer-Assisted Qualitative Data Analysis Software (CAQDAS), or LLM assistance) (Chandrasekar et al., 2024), Synthetic Data in model training and evaluation (SDT) (the growing presence of synthetic or model-amplified text in training/evaluation to further expand model capabilities) (Shumailov et al., 2024), and AI-mediated Qualitative Analysis (AIQA/AIaQA) (making use of LLMs, embeddings, and auto-coding, integrated into software analysis pipelines¹) (Mayring, 2025). These stimuli expand what and how much can be analysed and how quickly; they also alter how qualitative classifications are formed and circulated (e.g., Foraker et al., 2025; Loni et al., 2025; Pink et al., 2025).

Undoubtedly, AI-tools can be a force for increased productivity in QR, for interdisciplinary innovation, and for a wider and more readily available circulation of knowledge. Furthermore, they can allow us to enhance existing analytical approaches well beyond current capabilities. However, much enthusiasm around AI in QR implicitly imports quantitative ideals into interpretative practice, such as representativeness/generalization, reliability/reproducibility, and objectivity – each with discipline-specific limits in qualitative inquiry (Braun & Clarke, 2025; Davison et al., 2024; Nguyen & Welch, 2025). AI-mediated methods fuel hopes that interpretative methods can be expanded in a way that lends them the robustness commonly assumed for quantitative methods. From this, one could deduce that qualitative research can be enabled to meet the quality criteria customary for quantitative methods. Each of these ideals, however, meets limits in interpretative work:

• Representativeness/generalization: Enlarged corpora can broaden coverage and rhetorical reach, yet text corpora rarely rest on stable sampling frames. In discourse/content analysis, statistical representativeness is doubted as a suitable measure. Instead, the relevant standard is analytic generalization/transferability and transparent corpus construction (Bednarek, 2024; Keller, 2011; Stoltz & Taylor, 2024). For instance, in discourse/content analysis, “representativeness” is often meant to be semantic, meaning a proper coverage of positions/meanings, rather than a statistical measure. Corpus size widens coverage, but inclusion/exclusion remains theory-led and must be documented (e.g., via corpus boundaries, search strategy, timestamps) (Bendel Larcher, 2023; Gür-Şeker, 2014).²

• Reliability/reproducibility: Automated routines can standardize mechanical steps (data gathering, transcription, retrieval, first-pass semi-automated coding), but interpretative reproducibility is bounded by model stochasticity and prompt/path (and retrieval) dependence. Furthermore, repeatability does not necessarily equal agreement in meaning (Golafshani, 2003; Tracy, 2010). Comparative studies report mixed results: hybrid human-AI-workflows sometimes match, but do not consistently exceed, top human or model baselines, especially on meaning-sensitive tasks (Castellanos et al., 2025; Liu et al., 2025; Nguyen & Welch, 2025; Paulus & Marone, 2025; Vaccaro et al., 2024).

• Objectivity: Assistance can reduce some idiosyncrasies of researcher subjectivity, but it does not remove positionality from the underlying data or its situatedness/context, nor the biases embedded in models, data, and prompts. Claims to neutrality require ethical handling and transparent disclosure (model, data, prompts, oversight) rather than being presumed (Davison et al., 2024; Resnik et al., 2025).
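The reliability point in the list above can be made concrete with a small sketch. It computes Cohen's kappa, a standard chance-corrected agreement coefficient, between two coding runs over the same segments. The run data and code labels are hypothetical placeholders, and the coefficient is used here only as an illustration: it captures whether labels repeat, not whether two runs agree in meaning.

```python
from collections import Counter


def cohens_kappa(run_a, run_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("runs must be non-empty and of equal length")
    n = len(run_a)
    # Share of segments that received the same label in both runs.
    observed = sum(a == b for a, b in zip(run_a, run_b)) / n
    # Agreement expected by chance, from each run's label frequencies.
    freq_a, freq_b = Counter(run_a), Counter(run_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both runs used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two runs of the same prompt on four segments.
first_run = ["x", "x", "y", "y"]
second_run = ["x", "y", "y", "y"]
print(cohens_kappa(first_run, second_run))
```

A high coefficient across repeated runs would indicate stable labelling under a given model and prompt; it would not show that the labels are interpretatively adequate, which still requires human reading of the coded segments.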

Given AI’s increasing push into QR, and building on Hacking’s (1995) looping effects of human kinds, we argue that AI does not merely speed up analysis (Chubb et al., 2022); it reconfigures the mechanics of qualitative classification. This results in various effects, of which we specifically want to highlight two: As classifications circulate through tools and publics, they feed back faster into talk, practice, and thus data – resulting in acceleration (Rosa, 2013). Evidence from human-AI interaction shows feedback loops that can amplify or skew judgments (Brown et al., 2022; Glickman & Sharot, 2025; Pagan et al., 2023; Perdomo et al., 2020). This concerns a second shift – recursion: when pipelines (partly) re-use their own outputs or synthetic surrogates as inputs (e.g., auto-coding templates, prompt libraries, synthetic corpora), loops iterate without renewed contact with lived worlds, closing the evidential circuit and risking evidence erosion, synthetic drift of meaning, or even model collapse (Foraker et al., 2025; Loni et al., 2025; Shumailov et al., 2024).

Hacking’s original looping effect concerns the circulation of classifications through institutions and public life, their uptake, resistance, or inhabitation by the people classified, and the subsequent revision of those classifications through renewed encounters with practice. We extend this account in two analytically distinct directions. Acceleration names the compression of the interval between classification, uptake, and reclassification. Recursion names the re-entry of earlier outputs into later analytical stages without renewed empirical contact. Acceleration changes the tempo of the loop; recursion changes its evidential architecture.

This article develops a conceptual framework and proposes four provisional, reportable evaluative criteria for AI-mediated qualitative research. Specifically, it speaks to researchers using these tools in processing qualitative data who seek methodological reflections as well as methodologists reflecting, re-negotiating and reframing conditions and criteria for evaluating qualitative research methods. It extends existing debates on reflexivity and transparency by identifying acceleration and recursion as specific techno-epistemic dynamics. These criteria are not intended as fixed metrics or a closed checklist, but as points of orientation that make AI-mediated interpretative work visible, reviewable, and contestable. In this sense, the paper serves both as a theoretical extension of looping effects and as a methodological guideline for documenting AI-mediation in interpretative research.

Our focus here is interpretative QR on text- and transcript-based materials, which we regard as typical resources for single-method as well as mixed-methods QR projects (interviews, documents, online sources). We do not treat multimodal/sensory ethnography or visual analytics in depth. Our claims concern analytical practices. This framework is written primarily from and for interpretative, constructivist, and critical traditions of qualitative inquiry, where meaning, situatedness, and reflexive accountability are central, and where quality is not reducible to statistical representativeness or procedural standardization. Accordingly, the risks we foreground (e.g., over-smoothing, authenticity drift, performativity amplification, and externalized reflexivity) are framed as threats to interpretative validity and to the politics of classification. We nevertheless expect the framework to travel, with adaptations, into more post-positivist qualitative or mixed-methods contexts, where the criteria can be used as transparency and governance add-ons. However, we do not claim that these criteria replace validity- or reliability-related procedures typically emphasised in positivist paradigms; rather, they complement them by making AI-mediated decisions, feedback loops, and normative assumptions visible and contestable.

As a theory-led response: (i) we extend Hacking (1995) by defining acceleration (compressed loop time) and recursion (output re-entry) for AI-mediated QR (section 2); (ii) we provide a concise overview of the key technical developments of AI and their impacts on QR (section 3); (iii) we diagnose four groups of resulting epistemic risks (section 4); (iv) we propose provisional, reportable evaluative criteria for the AI use and concomitant risks (section 5).

“A cyborg is a cybernetic organism, a hybrid of machine and organism, a creature of social reality as well as a creature of fiction.”

(Haraway, 1985, p. 6)

2. Conceptual Background: Qualitative Epistemology and Hacking’s Legacy

Hacking’s account of so-called human and interactive kinds and their looping effects remains an influential framework for understanding how categories and the people classified by them coproduce one another. In the classic formulation (Hacking, 1995, 2006), expert classifications of people (often value-laden) circulate through institutions, media, and everyday practice. People recognize, resist, or inhabit these ascriptions, and consequently, their behaviour and self-descriptions change. Classifications develop a social life of their own by being reproduced, confirmed, criticized, and contested. The category is then re-made in light of those changes by scientific or diagnostic inquiry. Hence, human kinds are “moving targets” because the very act of classification alters what is being classified (Hacking, 1999/2000, 2006).

Qualitative inquiry has long addressed the performative circulation of classifications by foregrounding situatedness (Haraway, 1988), thick description (Geertz, 1973), abduction (Timmermans & Tavory, 2012), negative-/deviant-case analysis via constant comparison (Becker, 1998; Glaser & Strauss, 1967), reflexivity (Macbeth, 2001), or dynamics such as retro-action and counter-performativity (Hostniker & Meyer, 2024). QR has therefore been a significant source and target of criticism of reified and naturalized classifications as well as of classificatory infrastructures.

Subsequent authors have operationalized Hacking’s work. Contributions on classificatory infrastructures show how standards enact worlds (Bowker & Star, 2000). Works on quantification demonstrate that turning qualities into metrics reshapes behaviour and organizations (Espeland & Sauder, 2007; Espeland & Stevens, 1998; Mennicken & Espeland, 2019). Studies on peripheralisation illustrated how spatial ascriptions may influence behaviour in shrinking regions (Meyer & Miggelbrink, 2013). Economic studies interrogated markets and models as engines rather than cameras (Burrell & Fourcade, 2021; MacKenzie, 2006). Clinical studies reveal epistemological injustices developing in interactions between patients and clinicians (Gauld et al., 2025), and historiographies of objectivity locate neutrality claims in specific technologies, disciplines, and methods (Daston & Galison, 2007). Taken together, labels, rankings and algorithms do not simply discover social kinds; they re- and co-produce them (Fourcade & Healy, 2017a; Fourcade & Healy, 2017b) – a dynamic reflected and willingly utilized in modern marketing (Paulson & O’Guinn, 2018), self-optimization or plastic surgery (Elliott, 2019).

What is new in the 2020s is not looping per se but that AI modifies the temporal and evidential conditions of the loop. First, acceleration: AI-mediated coding, retrieval and summarization shorten the interval between classification, uptake, and reclassification. We term this acceleration a compression of loop time, consonant with macro-accounts of acceleration of social relations (Rosa, 2013). Second, recursion: synthetic data and pipeline re-use allow outputs to re-enter as inputs, so loops iterate without renewed contact with lived worlds and empirical verification. Recent evidence shows model behaviour drifting when trained on synthetic/recursively generated data (sometimes termed model collapse), potentially resulting in evidence erosion and drift of meaning in machine-mediated systems (Resnik et al., 2025; Shumailov et al., 2024; Taori & Hashimoto, 2023). A loop can be accelerated without being recursive, and recursive without major temporal compression; the two therefore should not be collapsed into one another. Both acceleration and recursion change fundamental mechanics of generating knowledge by altering reflection and empirical correction of knowledge.

In Hacking’s original account, looping still depends on renewed empirical encounters with people and practices. Our extension identifies two ways in which AI-mediated research modifies that structure. Acceleration compresses the time between classificatory acts and their uptake. Recursion, by contrast, allows classifications, summaries, prompts, labels, or synthetic surrogates to re-enter later stages of analysis as inputs. The novelty of recursion is therefore not simply greater speed, but a partial displacement of empirical correction by pipeline-internal circulation.
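Recursion in this sense can be illustrated with a deliberately simple toy simulation. It is a sketch under strong assumptions, not a model of any study cited here: a "corpus" of code labels is repeatedly replaced by a same-sized sample drawn only from the previous generation, so no new empirical material ever enters. The label names and corpus sizes are hypothetical.

```python
import random


def resample_generation(labels, rng):
    """Build the next 'corpus' by sampling only from the previous one."""
    return [rng.choice(labels) for _ in labels]


def simulate_recursion(initial_labels, generations, seed=0):
    """Return the number of distinct codes present in each generation."""
    rng = random.Random(seed)
    labels = list(initial_labels)
    support = [len(set(labels))]
    for _ in range(generations):
        labels = resample_generation(labels, rng)
        support.append(len(set(labels)))
    return support


# Hypothetical starting corpus: one dominant code and three rare ones.
initial = ["mainstream"] * 40 + ["rare_a"] * 3 + ["rare_b"] * 2 + ["rare_c"]
support_sizes = simulate_recursion(initial, generations=30)
print(support_sizes)
```

Because each generation can only reuse labels that survived the previous one, a code that drops out can never return, so the number of distinct codes never increases. Only a renewed draw from source material could reintroduce a lost category, which is the sense in which pipeline-internal circulation displaces empirical correction.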

This has profound ramifications. In the machine-learning literature, some models exhibit performative prediction: they do not just predict the world; their predictions change the world they then learn from (Brown et al., 2022; Perdomo et al., 2020). Pagan et al. (2023) offer a taxonomy of feedback loops (who/what feeds back, with what bias effects). Contemporary philosophy has recently revisited Hacking’s theses under these conditions (Tsou, 2025; Vesterinen, 2021). In short, looping may shift from slow, dialogical circuits anchored in empirical data to machine-assisted, high-frequency, partially self-referential circuits.

Hacking’s original account largely presumes that empirical correction happens through renewed encounters with people and practices (Hacking, 1995). Today, classification pipelines can self-validate, in the shape of prompt-library inheritances, fine-tuning on analyst-approved outputs, preferences for prior labels, and synthetic corpora that encode earlier manual classifications. The result is a lack of renewed empirical checks – a loop that tightens on its own outputs (Pagan et al., 2023; Shumailov et al., 2024). Similar dynamics appear across diverse disciplines, e.g., in applied domains where disclosing algorithmic scores induces self-fulfilling prophecies (Bauer & Gill, 2024), or when clinical practice co-produces diagnostic kinds (Gauld et al., 2025; Lindholm & Wickström, 2020).

Without explicit attention to acceleration, recursion, and updated standards for documenting the research process, Hacking’s framework under-describes AI-mediated QR. We therefore propose a formal extension that differentiates acceleration from recursion. Based on these insights, the following sections explore current AI-dynamics and their effects on QR.

3. Technological Vectors and the Techno-Epistemic Effects on QR

In the following, we will descriptively cover the spectrum of technological developments in the AI-sector and structure them in relation to their relevance for QR. As of Oct 2025, we have identified a total of 11 technological developments, referred to below as A to K. We have grouped these into three categories based on their existing or anticipated relevance to qualitative research processes. The first category comprises developments with immediate, high-prevalence influences (A to D). The second category comprises developments with near-term, field-dependent influences (E to H). The third category comprises developments with emerging influences (I to K). Table 1 gives an overview of these developments, current practices and relevance for QR.

Table 1.

AI-Influences Affecting Qualitative Research (Oct 2025)

Tier	Influence	Explanation	Current practices	Relevance for QR
1 A	AI-assisted qualitative analysis (AIQA/AIaQA)	Large language models and embeddings now support every step of qualitative analysis - from transcription and retrieval to first-pass coding, memoing, summarizing and drafting.	In day-to-day projects, teams use prompt libraries and workflow templates; retrieval often resurfaces earlier labels, which nudges analyses toward the same frames.	This is the most common AI-touchpoint; it shifts reflexive labour, introduces prompt/path dependence, and raises transparency expectations.
1 B	Provenance, licensing & traceability	Provenance means documenting where data, labels and tools come from and how each analytical step was produced, including synthetic exposure.	Journals and funders increasingly ask for licenses, attribution and pipeline records; many teams add provenance capture to CAQDAS/LLM workflows.	Credibility now depends on such records; they underpin Provenance & Recursivity Accounting/Reflexive AI-Audit and let reviewers see how AI shaped interpretations.
1 C	Retrieval-augmented generation	Retrieval-augmented generation combines search with generation to find, cluster and summarize materials from the (changing) web.	Researchers rely on live indices and proprietary corpora when scoping literatures or coding document sets, then paste summaries into memos.	Because the web is volatile, RAG changes what counts as evidence; time-stamping and freeze-framing are needed for saturation and reproducibility.
1 D	Human-in-the-loop alignment & instruction-tuning	Alignment pipelines train models with human preference ratings and rubrics (RLHF) and, increasingly, AI-mediated feedback (RLAIF); instruction-tuning mixes public prompts, curated tasks and crowd annotations.	These trainer decisions and rubrics are embedded in general-purpose assistants that later help with coding and summarizing.	Trainer subjectivities can appear as “neutral” suggestions; qualitative studies should disclose alignment regimes and triangulate outputs.
2 E	Big Qualitative Data (BQD)	Very large text corpora analysed with CAQDAS+LLM features enable batch coding, semantic search and fast spread of “themes.”	Codebooks and auto-rules are reused across datasets; themes move quickly between related projects.	Scale aids coverage but strains negative-case work; treat genre/platform as variables and re-ground codes across sites.
2 F	Synthetic Data in model training and evaluation (SDT)	Synthetic text now appears upstream (in model training/evaluation) and downstream (as augmented inputs in studies).	Projects combine real and synthetic materials; prior model outputs may be re-used as “data.”	Alters the evidential mix and demands explicit provenance and authenticity checks.
2 G	Multimodal AI (audio/video/image)	Audio, video and image tools (diarisation, translation, gesture extraction) are entering qualitative and mixed-methods pipelines.	Teams generate automated transcripts, add speaker turns and timing, and sometimes code paralinguistic cues.	Widens what can be analysed but raises consent, fidelity and interpretability issues.
2 H	Low-resource/non-English LLMs	New models expand support for non-English and low-resource languages, though quality and bias vary.	Researchers run multilingual retrieval and coding and, where possible, cross-check with native speakers.	Improves inclusion and comparability but requires culturally aware prompting and validation.
3 I	Agentic/autonomous research assistants	Task-chaining agents coordinate retrieval, coding, memoing and drafting with limited supervision.	Analysts launch multi-step runs that produce labels and summaries; authorship and accountability can blur.	Saves time on routine work yet complicates attribution and auditability.
3 J	Utilizing on-device/edge LLMs	Models run locally on secure devices, reducing reliance on the cloud.	Fieldworkers process sensitive materials offline and sync later.	Supports privacy-preserving work but can limit reproducibility and requires clear consent/security language.
3 K	Affective/voice & adversarial/poisoning risks	Some tools claim to infer emotion or sentiment; open pipelines face data-feedback loops and poisoning risks.	Sentiment/affect add-ons are bolted on; datasets from the open web may be contaminated or drift over time.	These risks threaten corpus integrity and category stability; provenance checks are needed.

“The main trouble with cyborgs, of course, is that they are the illegitimate offspring (…). But illegitimate offspring are often exceedingly unfaithful to their origins. Their fathers, after all, are inessential.”

(Haraway, 1985, p. 8)

3.1. Immediate, High-Prevalence Influences

A) AI-assisted qualitative analysis

LLMs and embedding tools (meaning: tools that employ a multidimensional numeric representation of similarity and relationships among texts) now appear at many stages of qualitative work (e.g., Mayring, 2025; Nguyen-Trung, 2025; Pattyn, 2024; Perkins & Roe, 2024; Pilati et al., 2024; Sun et al., 2025): transcription, search across corpora, first-pass coding and code suggestions, writing memos and summaries, even outlining and drafting. In day-to-day practice, teams may lean on prompt libraries and workflow templates that standardize practices (Liu et al., 2025). Retrieval may bring back earlier labels, nudging analyses toward the same frames (Davison et al., 2024; Zhang et al., 2025). Comparative studies show clear gains in speed and scoping, whereas meaning fidelity plateaus on more interpretative tasks (Liu et al., 2025), may differ from human researchers (Castellanos et al., 2025), or require more engagement to manage (Bijker et al., 2024).
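To make the notion of embedding-based similarity concrete, the following minimal sketch compares text segments by the cosine of the angle between their vectors. The three-dimensional vectors and segment names are hypothetical stand-ins; real embedding tools produce vectors with hundreds or thousands of dimensions, and the sketch makes no claim about any particular tool.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings for three transcript segments.
embeddings = {
    "segment_1": [0.9, 0.1, 0.0],
    "segment_2": [0.8, 0.2, 0.1],
    "segment_3": [0.0, 0.1, 0.9],
}

# Rank the other segments by similarity to segment_1, most similar first.
query = embeddings["segment_1"]
ranking = sorted(
    (name for name in embeddings if name != "segment_1"),
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranking)
```

Retrieval built on such rankings returns whatever is numerically closest to the query, which is one route by which earlier labels and frames can resurface in later analysis.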

The manifold ways in which interpretative labour is executed by AI-assistants are one main contact point between AI and QR. It shifts reflexive labour from close reading to prompt curation, introduces prompt- and path-dependence into interpretation, and raises expectations for transparent methods disclosures to editors, reviewers, and ethics boards. Properly disclosed and audited, these tools can improve coverage and efficiency and help surface novel leads in large corpora.

B) Provenance, licensing, and traceability

As AI-tools enter analysis, expectations rise to show where data and labels came from (including licences and attribution; Longpre et al., 2024), and to document the analytical workflow (Schlegel & Sattler, 2025). Recent ethics work also stresses disclosing any synthetic content and the specific roles AI played in the study (Resnik et al., 2025; Shanley et al., 2024).

Because comprehensible workflows and interpretative labour are cornerstones of QR, provenance and traceability are fast becoming preconditions of credibility: they underpin later reporting criteria (e.g., a provenance-and-recursivity statement and a reflexive AI-audit) and let reviewers judge how AI shaped interpretation. However, traceability-related practices need to keep up with the infusion of AI-mediation into QR-workflows.
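One way such workflow documentation could look in practice is sketched below. The ledger records, for each analytical step, who or what performed it and which texts it consumed, and it flags steps that drew only on earlier outputs without touching source material. All class, function, and step names are hypothetical, the texts are placeholders, and the "no source contact" rule is an assumed operationalization of output re-entry rather than an established standard.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text):
    """Stable identifier for a piece of text (SHA-256 of its UTF-8 bytes)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Step:
    name: str
    actor: str        # e.g. "human" or a model identifier
    input_ids: list   # fingerprints of the texts the step consumed
    output_id: str    # fingerprint of the text the step produced


class ProvenanceLedger:
    def __init__(self, source_texts):
        # Fingerprints of empirical source material (interviews, documents).
        self.source_ids = {fingerprint(t) for t in source_texts}
        self.steps = []

    def record(self, name, actor, input_texts, output_text):
        step = Step(name, actor,
                    [fingerprint(t) for t in input_texts],
                    fingerprint(output_text))
        self.steps.append(step)
        return step

    def steps_without_source_contact(self):
        """Names of steps whose inputs include no source material at all."""
        return [s.name for s in self.steps
                if not any(i in self.source_ids for i in s.input_ids)]


# Placeholder texts standing in for real materials and outputs.
source = "<interview excerpt 1>"
codes = "<first-pass codes>"
summary = "<theme summary>"

ledger = ProvenanceLedger([source])
ledger.record("first-pass coding", "llm:hypothetical-model", [source], codes)
ledger.record("theme summary", "llm:hypothetical-model", [codes], summary)
ledger.record("analyst memo", "human", [summary, source], "<memo>")
print(ledger.steps_without_source_contact())
```

A flagged step is not an error in itself; summarizing codes is ordinary analytical work. The flag only marks where interpretation proceeded without renewed contact with the material, so that this can be reported and, where needed, checked against the sources.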

C) Retrieval-augmented Generation³

Typically, retrieval-augmented generation (RAG) supplements an LLM’s context by enabling it to retrieve and evaluate additional sources (e.g., web data, transcribed interviews, and other documents) while remaining within the data space defined by the user. This allows statements or summaries on specific topics to be extracted from corpora that go beyond the data the LLM was trained on. RAG impacts the evidence-gathering stages of qualitative research by locating, clustering, and summarizing materials from dynamic web corpora and proprietary indexes (Castellanos et al., 2025; Paulus & Marone, 2025; Zhang et al., 2025).

This approach may change what counts as retrievable evidence, with potential consequences for claims of saturation, the reproducibility of literature bases, and transparency about sources and dates. As the potential contextual sources for the LLMs we use become more diverse and more recent, the traceability of AI-assisted interpretative labour is hampered.
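Transparency about sources and dates can be supported by freezing what was retrieved at the time of analysis. The sketch below stores a timestamped manifest of content hashes for a retrieved document set and later reports which sources have changed or disappeared. Function names, field names, and the example sources are hypothetical, and the sketch is independent of any specific RAG system.

```python
import hashlib
from datetime import datetime, timezone


def content_hash(text):
    """SHA-256 of the document text, used to detect later changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def freeze_retrieval(query, documents, retrieved_at=None):
    """Record what was retrieved for a query, and when.

    `documents` is a list of dicts with the keys 'source' and 'text'.
    """
    if retrieved_at is None:
        retrieved_at = datetime.now(timezone.utc).isoformat()
    return {
        "query": query,
        "retrieved_at": retrieved_at,
        "items": [
            {"source": d["source"], "sha256": content_hash(d["text"])}
            for d in documents
        ],
    }


def changed_sources(manifest, documents_now):
    """Sources from the manifest that now differ or can no longer be found."""
    current = {d["source"]: content_hash(d["text"]) for d in documents_now}
    return sorted(
        item["source"]
        for item in manifest["items"]
        if current.get(item["source"]) != item["sha256"]
    )


# Placeholder documents standing in for retrieved web pages.
retrieved = [
    {"source": "page_a", "text": "<text of page a>"},
    {"source": "page_b", "text": "<text of page b>"},
    {"source": "page_c", "text": "<text of page c>"},
]
manifest = freeze_retrieval("<query>", retrieved,
                            retrieved_at="2025-10-01T00:00:00+00:00")

# Later: page_b was edited and page_c is no longer retrievable.
later = [
    {"source": "page_a", "text": "<text of page a>"},
    {"source": "page_b", "text": "<edited text of page b>"},
]
print(changed_sources(manifest, later))
```

Storing only hashes documents that a source changed, not what it said; where licences and consent permit, archiving the retrieved text itself alongside the manifest would be needed to re-examine the original evidence.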

D) Human-in-the-loop alignment & instruction-tuning

The basic idea here is that AI-mediated processing of qualitative data takes place in an interplay between AI and human intervention. In coding processes, for example, the AI gradually learns which specifications, preferences, or guiding principles exist on the human side of data processing in order to adapt the proposed coding of the material accordingly. AI can also be used to provide structured feedback on the quality of individual steps in the research process based on rubrics. Contemporary foundation and assistant models are aligned with human judgement: developers collect human-coded preference data and apply rubric⁴-guided feedback (reinforcement learning from human feedback, RLHF), and increasingly AI-mediated feedback (sometimes called RLAIF). Models are then instruction-tuned on corpora assembled from public prompts, curated tasks, and crowd annotations (Bai et al., 2022; Ouyang et al., 2022; Stiennon et al., 2020). Synthetic instructions can top up these sets but do not replace preference data (Wang et al., 2022).

Even when synthetic materials are present, trainer subjectivities and organizational rubrics leave normative and linguistic imprints on models that later may influence coding and summarizing. For interpretative work, these latent value choices can surface as seemingly neutral suggestions or themes. In theory, resulting biases should be disclosed (alignment method, sources of preference data, presence of annotator guidelines, dates of fine-tunes) and, where possible, triangulated with human checks, diverse coders, and negative-case searches.

3.2. Near-Term, Field-Dependent Influences

E) Big Qualitative Data (BQD)

As qualitative research enters the era of BQD, the use of digital and AI-mediated tools to process extensive text corpora expands analytical capacity while also raising important methodological challenges. CAQDAS and LLM-based tools for large, text-centric corpora now support batch auto-coding and semantic search. Typical exploratory strategies draw on word frequencies, lemmatization, word clouds and word trees, keyword-in-context and dictionary-based methods, automatic summaries, paraphrases and codes, as well as the cross-project reuse of topics (Bijker et al., 2024; Chandrasekar et al., 2024; Liu et al., 2025).

BQD nourishes the urge to enlarge QR data corpora in order to expand our interpretative scope. However, some authors caution that topic modelling and large-scale aggregation can over-smooth heterogeneity (Bednarek, 2024) and falsely infer meaning from text (Curry et al., 2024). BQD widens coverage and increases speed, but it affects saturation and the recognition of rare cases, and it can entrench codebooks across contexts if we simply keep adding data without reflecting on the context-sensitive and context-specific tools with which we carry out our interpretative labour.

F) Synthetic Data in model training and evaluation (SDT)

Synthetic text now appears upstream in model training and evaluation, and downstream in studies as augmentation, fill-in for data gaps, or as proxy for under-represented voices (Hämäläinen et al., 2023; Kapania et al., 2024; Wiles, 2025). Reviews and governance papers map the respective benefits and risks (Foraker et al., 2025; Goyal & Mahmoud, 2024; Loni et al., 2025; Resnik et al., 2025; Shanley et al., 2024). Controlled studies show model degradation, collapse and drift of meaning under recursive or high-synthetic regimes (Seddik et al., 2024; Shumailov et al., 2024).

For qualitative workflows, the resulting – and only vaguely traceable – evidential base of AI-models alters the evidential mix behind AI-mediated coding and summaries; and when synthetic text is analysed as data, it foregrounds provenance, authenticity, and disclosure.

G) Multimodal AI (audio/video/image)

Automatic speech recognition and speaker diarisation (meaning: who spoke when), machine translation, and video/gesture extraction are increasingly folded into qualitative and mixed-methods pipelines across disciplines (Castellanos et al., 2025; Fieldhouse et al., 2025). There is active debate about whether paralinguistic signals (e.g., tonality, affect, gaze, gesture) can be validly carried over into interpretative claims (Kushwaha, 2024).

Such tools extend analysis beyond text and thereby raise questions of interpretative fidelity in a realm of the social where practices of meaning-making pose far more complex obstacles. They also raise issues of consent and privacy, transcription and segmentation fidelity (accent/language/model bias), and the interpretability of embodied or affective outputs.

H) Low-resource and non-English LLMs

Broader language coverage and better tooling continuously allow more languages to be used with or in AI, but quality and bias remain uneven across models and settings (Fieldhouse et al., 2025; Qadhi et al., 2024).

This enables more inclusive studies and cross-language comparisons, yet it also requires culturally sensitive prompting, careful back-translation or bilingual training, and validation protocols that check meaning fidelity, not just lexical overlap, before drawing interpretative claims.

3.3. Emerging Influences

I) Agentic/autonomous research assistants

Task-chaining agent systems (e.g. using Model Context Protocol (MCP)) can orchestrate retrieval, coding, writing memos, and even drafting. Early reports note clear workflow gains but also opaque authorship and accountability. It can be hard to see which decisions were made by the agent versus the analyst (Combrinck, 2024; Jiang et al., 2021; Zhang et al., 2025).

These agents can speed routine labour, but they also complicate attribution and auditability: without careful logging and disclosure, it becomes difficult to trace how interpretative steps were taken or to justify them to reviewers and participants.

J) Utilizing on-device LLMs

Running models on-device or in trusted/secure environments reduces cloud exposure during sensitive fieldwork and aligns with debates on human control/teaming with AI (Howison et al., 2024; Tsamados et al., 2025). It supports offline capture and processing, but may be constrained by computing power and speed.

These tools enable privacy-preserving handling of recordings and transcripts, yet require updated consent and security language (meaning: where processing occurs, what leaves the device), and can limit reproducibility if results depend on device-specific hardware/models and cloud-computing power. Furthermore, different versions of programming libraries may produce significantly different results (Shahriari et al., 2022).

K) Affective/voice & adversarial/poisoning risks

Tools that claim to infer sentiment or emotion from voice or text are easy to over-interpret, because affect cues are culture-, context- and genre-dependent, and model validation rarely matches the field setting (Aguilera et al., 2023; Hovy & Prabhumoye, 2021). In parallel, pipelines that depend on large-scale corpora face data-feedback effects and poisoning risks that can distort analyses (Pagan et al., 2023; Taori & Hashimoto, 2023).

Whereas such tools may enhance our interpretative capabilities, tracing interpretative fidelity proves almost impossible given the many conditions on which their findings may depend.

“Simultaneously material and ideological, the dichotomies may be expressed in the following chart of transitions from the comfortable old hierarchical dominations to the scary new networks I have called the informatics of domination”

(Haraway, 1985, p. 31)

4. Epistemic Risks of Integrating AI Into QR

We address four risks that we consider most important for interpretative validity in qualitative inquiry. These risks are specific to how AI intersects with everyday qualitative practice, not generic AI-failure modes. For each risk, we show how it arises from the vectors in Section 3 and why it matters for interpretation. Again, we summarise our observations in Table 2.

Table 2.

Overview of Epistemic Risks of Integrating AI Into QR

No.	Risk	Explanation	Evidential status
1	Over-smoothing and novelty loss	• Risk of preserving existing categories and overlooking new developments	• documented tendency in adjacent literature
		• Risk of generating an outcome by predicting the most statistically probable sequence of data based on the prompt, input, and contextual information	• plausible QR consequence
		• Risk of overlooking subjective or deviant positions	• plausible QR consequence
2	Self-validation and authenticity drift (toward fossilisation)	• Risk of (falsely) stabilising existing knowledge through outputs that re-enter the pipeline as inputs	• documented recursive degradation/synthetic drift
		• Ethical risks through inadequate use of synthetic data	• potential QR consequence
		• Risk of losing context sensitivity	• potential QR consequence
3	Performativity amplification, lexical anticipation, and outlier suppression	• Risk of further privileging hegemonic positions through research based on statistical probabilities, while opposing positions no longer appear	• documented feedback-loop effects
			• plausible QR consequence
4	Externalized reflexivity, de-skilling, and governance-by-pipeline	• Risk of delegating or externalising not only key decisions in the data evaluation process, but also the reflection on the quality of these decisions	• primarily methodological/normative concern
			• supported by emerging practice-related literature

The risks discussed below do not all have the same evidential status. Some are directly documented in adjacent empirical literatures, such as over-smoothing in computational text analysis, feedback effects in human-AI interaction, or degradation under recursive synthetic training conditions. Others are plausible methodological risks for AI-mediated qualitative research inferred from those findings and from common workflow features. A third subset is forward-looking and normative: concerns not yet systematically demonstrated in qualitative research, but important to articulate because they bear on emerging standards of interpretative accountability.

4.1. Over-Smoothing and Novelty Loss

As shown in Section 3, AI-mediation may shorten analytic cycles and stabilize labels. The process of gradually condensing observations into a new and creative interpretation based on empirical material (as in grounded theory approaches, for example), which is quite common in qualitative research, can be lost in AI-mediated designs because these operate within the framework of already known, sedimented categories and knowledge systems. This tends to suppress an open search for clues in the material, which may even be inspired by memories of (and notes on) the original interview contexts.

Mechanistically, large corpora and templated prompting speed up the diffusion of sedimented categories, while topic models and auto-coding can over-smooth context (Bednarek, 2024). Concurrently, BQD-related studies report productivity gains alongside mixed meaning fidelity (Bijker et al., 2024; Chandrasekar et al., 2024; Liu et al., 2025). Widely available AI-models are inherently probabilistic: they generate an outcome by predicting the most statistically probable sequence of data based on the prompt, additional input, and contextual information. Subjective or deviant positions in the data are hard to predict and are therefore likely to be overlooked. This runs counter to the claim that QR should give voice and visibility to subjective, divergent and deviant views. For QR this can mean premature consensus, fewer genuinely new codes, and thinner description as codebooks travel across sites without re-grounding. On a more political note, special, unique or rare features in qualitative data may be overlooked, allowing synthetic majorities to emerge (van Dijck, 2014). These dynamics risk epistemic injustice by privileging majoritarian or model-amplified voices over minority ones (Peters & Chin Yee, 2025).

Empirical support is strongest for smoothing tendencies and mixed meaning fidelity in adjacent computational studies; the specific consequence for novelty loss in qualitative inquiry is a methodological inference developed here.

4.2. Self-Validation and Authenticity Drift (Toward Fossilisation)

Synthetic/augmented data and fine-tuning of AI-mediated analysis, often combined with retrieval-augmented generation, enable outputs to re-enter the pipeline as inputs. When synthetic tokens are mixed upstream (during training or evaluation) or downstream (in the course of study input), prior classifications can be tacitly validated against surrogates of themselves. Moreover, empirical work documents degradation, and cautions against a potential model collapse and drift under recursive or high-synthetic regimes (Shumailov et al., 2024), while ethics- or governance-related contributions call for explicit disclosure of synthetic exposure (Resnik et al., 2025; Shanley et al., 2024). For QR, the danger is authenticity drift: categories may stabilize even as lived practice changes. The increasing distance from empirical reality could even self-reinforce to the point where empirical life could be seen as the deviant.

We therefore treat authenticity drift in qualitative workflows as a plausible downstream risk, not as a directly demonstrated endpoint in current QR practice.

4.3. Performativity Amplification, Lexical Anticipation, and Outlier Suppression

If QR is paradigmatically justified by being sensitive to the subjective and deviant beyond statistical representativeness and hegemonic interpretations, then AI often counteracts this claim. Similar to Risk 1, the interplay of AI-mediated analysis, large-corpus work, and retrieval-augmented generation can amplify the performative circulation of analyst labels (Castellanos et al., 2025; Chandrasekar et al., 2024; Zhang et al., 2025). The reuse of prompt templates and codebooks standardizes practices while retrieval tends to bring back thematically similar, already-labelled passages (called retrieval lock-in), overshadowing negative or rare cases (Bednarek, 2024). Evidence on human-AI feedback loops shows such interactions can magnify bias. For QR this appears as lexical anticipation by participants, an apparent thinning of negative cases, and faster policy travel of quasi-standardized social categories. On a political note, this may be utilized to maliciously create categories and flood training data to more profoundly skew our tools of social research.

Human-AI feedback effects are empirically documented, whereas lexical anticipation and intensified outlier suppression in qualitative settings remain a potential but credible concern.

4.4. Externalized Reflexivity, De-Skilling, and Governance-By-Pipeline

The use of AI in qualitative research requires not only an examination of the processes that take place within the system, but above all an examination of the decisions made regarding the use of AI. If AI is primarily viewed as a means of making work easier, which it can often be, its performative character is underestimated and reflexive vigilance can decline or be externalised to AI as well. Yet current iterations of LLMs rely on probabilistically modelling language to simulate or support reasoning, and this must not be confused with reflexive reasoning itself (Jowsey et al., 2025).

As AI-mediated qualitative analysis becomes routine, reflexive labour risks being offloaded to prompts and templates, while provenance and traceability remain uneven, and agent assistants blur authorship and accountability. The literature flags ethical cautions for AI in QR and offers guidance for responsible use (Davison et al., 2024), and technical work shows that end-to-end provenance capture is feasible (Schlegel & Sattler, 2025). Alignment choices embed political values into ostensibly neutral tools, leading to questions about the regressive potential of AI-futures. Therefore, researchers remain responsible for making those value-imprints visible. For QR, the risk revolves around interpretative de-skilling, reproducibility gaps, and a drift toward policy-by-proxy as vendor defaults encode, for example, governance choices.

This risk is primarily methodological and normative: it concerns what researchers may cease to notice, justify, or disclose as AI-routines become infrastructural.

“… taking responsibility for the social relations of science and technology means (…) embracing the skillful task of reconstructing the boundaries of daily life (…), in communication with all of our parts.”

(Haraway, 1985, p. 59)

5. Provisional, Reportable Evaluative Criteria for QR in the Age of AI

The following reportable criteria are proposed as provisional evaluative anchors for AI-mediated qualitative research. They are not intended as universal thresholds, nor as a closed checklist that could be applied uniformly across qualitative traditions. Rather, their function is to make AI-mediated interpretative work reportable, inspectable, and contestable: they ask whether claims remain empirically refreshed, whether provenance and recursive output re-entry are visible, whether heterogeneity and negative cases remain analytically available, and whether the conditions of AI-mediation are reflexively disclosed. In this sense, the criteria structure justification and critique without foreclosing debate about their future refinement.

The criteria extend classic commitments to credibility/trustworthiness (Lincoln & Guba, 1985) or reflexivity/transparency (Macbeth, 2001), adapting them to AI-mediated workflows. Because this article is a conceptual intervention, these criteria are deliberately proposed in provisional form. Their purpose is not to close discussion, but to open a more explicit debate about how AI-mediated qualitative work should be described and evaluated as practices, tools, and standards evolve.

More specifically, Tracy (2010, p. 840) identifies eight QR-criteria: the topic, rigor, sincerity, credibility, resonance (with an audience), contribution (to a state-of-the-art), research ethics, and meaningful coherence. Interestingly, whilst we consider our suggestions consequential, they also seem to stray from these qualitative criteria because they need to cover the hybrid regime of AI-mediated QR.

By proposing the following criteria, we aim, firstly, to introduce potential additional criteria for research designs that employ AI-mediated practices. Secondly, we aim to trigger deeper reflection on the potential ramifications of AI-mediated qualitative research. Given the rapid advancements in AI-technologies and their implementation into research designs, our criteria can only serve as a temporary snapshot of such reflections, deliberately allowing for an ongoing, critical, and restless debate in the face of current and future technological advancements. With further progress, these criteria need to be doubted, substituted, and/or refined to match upcoming innovations and developments.

5.1. Empirical Refresh Ratio

5.1.1. Rationale

When pipelines reuse their own outputs or include synthetic text, periodic encounters with lived worlds help prevent adverse model behaviour (e.g., Shumailov et al., 2024). Ethics/governance work urges clear handling of synthetic exposure (Resnik et al., 2025). As empirical research thrives on empirical correction, the impact of synthetic data needs to be explicitly accounted for. Whereas researchers generally cannot influence the role of synthetic data in model training, they can contain its role in QR through transparency.

5.1.2. Definition

Share of newly collected, empirical evidence used at each research stage (e.g., coding, model evaluation, write-up).

5.1.3. Evaluative Function

This criterion asks whether interpretative claims continue to be corrected by newly encountered empirical material rather than being progressively stabilized through recycled or synthetic inputs.

5.1.4. Possible Reporting

Indicate the proportions or relative shares present at each workflow stage; define a recency window (e.g. % of material collected in the last 12 months) and justify any exceptions (e.g. historical corpora).

5.1.5. Implementation

Whereas the necessity of empirical data cannot be overstated, it is difficult to specify a fixed rate because this cannot be justified independently of the research problem. Rather than a rigid specification, it is therefore more important to provide a transparent representation of the extent to which and at which points primary, secondary and synthetic material is fed in (e.g. in the transcription, in the coding process, in the formation of categories). For instance, researchers may be asked for each analytical stage whether the input was either newly collected empirical material, previously collected empirical material, or synthetic or model-generated material.
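To illustrate how such stage-wise reporting could be tabulated, the following minimal sketch computes the relative share of newly collected, previously collected, and synthetic material per analytical stage, plus a recency share for a 12-month window. It is an illustration under stated assumptions, not a prescribed metric: the names `InputItem`, `stage_shares`, and `recency_share` are hypothetical, the unit of counting (one item per document or segment) is our assumption, and the example dates are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical labels mirroring the three-way question asked per analytical stage.
NEW, PRIOR, SYNTHETIC = "new_empirical", "prior_empirical", "synthetic"


@dataclass(frozen=True)
class InputItem:
    stage: str          # e.g. "transcription", "coding", "category_formation"
    kind: str           # one of NEW, PRIOR, SYNTHETIC
    collected_on: date  # collection (or generation) date of the item


def stage_shares(items):
    """Return, per stage, the relative share of each input kind."""
    counts = {}
    for item in items:
        counts.setdefault(item.stage, Counter())[item.kind] += 1
    return {
        stage: {kind: n / sum(c.values()) for kind, n in c.items()}
        for stage, c in counts.items()
    }


def recency_share(items, today, window_days=365):
    """Share of empirical (non-synthetic) items collected within the window."""
    empirical = [i for i in items if i.kind != SYNTHETIC]
    if not empirical:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    return sum(i.collected_on >= cutoff for i in empirical) / len(empirical)


# Invented example: four inputs to the coding stage.
items = [
    InputItem("coding", NEW, date(2025, 6, 1)),
    InputItem("coding", NEW, date(2025, 7, 1)),
    InputItem("coding", PRIOR, date(2019, 3, 1)),
    InputItem("coding", SYNTHETIC, date(2025, 8, 1)),
]
shares = stage_shares(items)
recent = recency_share(items, today=date(2025, 10, 1))
```

In this invented example, half of the coding inputs are newly collected, and two of the three empirical items fall inside the recency window. The numbers are only a reporting aid; what share is adequate still has to be justified against the research problem.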

5.2. Provenance & Recursivity Accounting

5.2.1. Rationale

Audit literature documents licensing and attribution gaps (Longpre et al., 2024), although end-to-end provenance capture is technically feasible (Schlegel & Sattler, 2025). Ethics guidance recommends declaring synthetic exposure and AI-roles (Resnik et al., 2025; Shanley et al., 2024). Accordingly, it is necessary and possible to document the process in which the researcher uses AI to perform analytical steps, to respond to the (interim) results produced, and to prepare conclusions.

5.2.2. Definition

A statement of where data, labels, and tools come from and how much output re-entry (recursion) the pipeline contains.

5.2.3. Evaluative Function

This criterion asks whether the evidential lineage of data, labels, prompts, and tools is visible enough for readers to identify where recursive output re-entry may have tightened the analytical loop.

5.2.4. Possible Reporting

List dataset licenses and attribution; disclose estimated share of synthetic tokens in upstream tools/inputs; note codebook or prompt inheritance, fine-tunes, and vendor constraints; state the alignment regime (RLHF/RLAIF), whether instruction-tuning used human vs synthetic prompts, and whether annotator guidelines are public.

5.2.5. Implementation

This criterion is not concerned with numerical information, but with conclusive and comprehensive documentation of the analytical pipeline (data, labels, tools). Completeness may not be achievable, as AI-service providers may license models and retrain them, creating a multi-layer cascade of documentation; nevertheless, documentation should be as comprehensive as possible. This requires keeping these documentary necessities in mind from the outset, when designing, deciding on, and executing the AI-mediated analytical steps. As technical advancements progress, more or different documentation will be necessary. A minimum set of items for disclosure may include where the data, labels, codebooks, and prompts came from, which tools and versions were used, where recursive output re-entry occurred, and what vendor or licensing limits remained.
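The minimum set of disclosure items named above could be kept as a simple structured record that is filled in while the pipeline is being designed. The sketch below is one possible layout, not a standard: the class `ProvenanceRecord`, its field names, and the helper `missing_items` are hypothetical, and the example values are placeholders.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    # Where data, labels, codebooks, and prompts came from.
    data_sources: list = field(default_factory=list)
    label_sources: list = field(default_factory=list)
    codebook_sources: list = field(default_factory=list)
    prompt_sources: list = field(default_factory=list)
    # Tool name -> version string.
    tools: dict = field(default_factory=dict)
    # Steps at which model outputs re-entered the pipeline as inputs.
    # Assumption: write ["none"] explicitly rather than leaving this empty,
    # so that "no re-entry" and "not yet documented" stay distinguishable.
    reentry_points: list = field(default_factory=list)
    # Vendor or licensing limits that prevented fuller documentation.
    vendor_limits: list = field(default_factory=list)


def missing_items(record):
    """List the disclosure items that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# Placeholder example: a partially documented pipeline.
record = ProvenanceRecord(
    data_sources=["own interviews (placeholder description)"],
    tools={"example-coding-tool": "1.2.0"},
    reentry_points=["none"],
)
todo = missing_items(record)
```

Here `todo` lists the items that still need to be documented before the statement can be considered complete in the sense of the minimum set.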

5.3. Heterogeneity Preservation Assessment

5.3.1. Rationale

Topic-model critiques warn against over-smoothing (Bednarek, 2024); saturation debates caution against premature closure (Tight, 2024), and others emphasize preserving nuance (Braun & Clarke, 2025). This is certainly also a problem with ‘manual’ coding methods, but it tends to be exacerbated by AI because the immediate decision is not made by the researcher working with the material, but is generated by prompting. As analytical over-smoothing of rare and unique cases contradicts the premise and strengths of QR, the impact of respective adverse effects of AI-implementation needs to be assessed and documented. As a measure of how strongly the diversity, inconsistency and marginality of empirical statements are retained in the course of an AI-mediated analysis, the Heterogeneity Preservation Assessment (HPA) is intended to indicate the extent to which individual or rare positions (statements) remain recognisable or have been subsumed. It thus indicates the extent (and, if applicable, the analytical step) to which the number of codes and cases decreases as a result of summarisation.

5.3.2. Definition

A measure of diversity in codes/cases over time (e.g., code distributions across sites/cases).

5.3.3. Evaluative Function

This criterion asks whether deviant, contradictory, low-frequency, or minor material remains analytically visible rather than being absorbed into model-amplified regularities.

5.3.4. Possible Reporting

Report changes in heterogeneity across stages; flag notable drops; describe steps to protect negative/outlier cases (e.g., targeted sampling, counter-queries).

5.3.5. Implementation

The HPA specifically concerns the methodological tool of “categories” in coding-centric qualitative research designs. This criterion may be difficult to implement in general, as research contexts, practices of establishing and using categories, and assessing their intercoder-comparability vary. We do not propose a single standardized calculation of heterogeneity preservation. Depending on research design, this criterion may be rendered numerically, comparatively, or narratively. Its central purpose is not to impose a universal metric, but to require explicit reflection on whether AI-mediated coding, summarization, or retrieval smooths away deviant, marginal, contradictory, or low-frequency material. At minimum, authors should indicate where heterogeneity may have been reduced, how negative or outlier cases were checked, and how the remaining loss of variation is justified. In the following, we therefore suggest two options of assessing an HPA – yet explicitly invite further debate for practical implementation:

Option 1: As the HPA needs to be a measure of the change in variability, it assesses how far specific analytical steps in AI-mediated designs produce results that differ from those obtained when the same steps are carried out manually by humans on similar data. It may be a numerical representation of the relation between a set number of possible topical categories that may emerge in a given dataset (e.g., an array of interviews) and the actual number of categories coded in this dataset. It thus compares an AI-mediated design with a manual, human-centric design.

Step 1 needs to be the definition of a baseline case that was coded manually (ideally using an intercoder design featuring several coders who have completed intercoder coaching), and for which an HPA(0) needs to be established. Whilst this may be out of reach in small projects, manually coded datasets for various topics are already available and may serve as proxies for baselines. Following an iterative process, the next steps need to assess the HPA after AI-mediated coding steps (e.g., after each coding pass using AI) and compare the results to the baseline HPA(0).
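One way to read Option 1 numerically is as a coverage ratio: the number of categories actually coded divided by the number of possible categories, computed first for the manual baseline and then after each AI-mediated pass. The sketch below follows that reading only; it is our assumption, not a fixed definition of the HPA, and the function names, category labels, and segment identifiers are hypothetical.

```python
def category_coverage(coding, possible_categories):
    """Share of the possible categories that were coded at least once.

    `coding` maps a segment id to the set of categories assigned to it.
    """
    possible = set(possible_categories)
    if not possible:
        raise ValueError("at least one possible category is required")
    used = {code for codes in coding.values() for code in codes} & possible
    return len(used) / len(possible)


def hpa_series(baseline, ai_passes, possible_categories):
    """Return HPA(0) and, per AI pass, (name, HPA, change against HPA(0))."""
    hpa0 = category_coverage(baseline, possible_categories)
    rows = []
    for name, coding in ai_passes:
        hpa = category_coverage(coding, possible_categories)
        rows.append((name, hpa, hpa - hpa0))
    return hpa0, rows


# Invented example with four possible categories and three segments.
possible = {"trust", "fear", "pride", "irony"}
baseline = {"s1": {"trust"}, "s2": {"fear", "irony"}, "s3": {"pride"}}
ai_passes = [
    ("pass 1", {"s1": {"trust"}, "s2": {"fear"}, "s3": {"pride"}}),
    ("pass 2", {"s1": {"trust"}, "s2": {"trust"}, "s3": {"pride"}}),
]
hpa0, rows = hpa_series(baseline, ai_passes, possible)
```

In the invented example the manual baseline uses all four categories, the first AI pass loses one, and the second loses two. A falling ratio flags a stage worth inspecting; it does not by itself show which positions were subsumed or whether the loss is justified.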

Option 2: As over-smoothing rare cases is a core concern, the HPA may focus on the preservation of rare cases or outliers which – if coded by a human – only occur in the lowest percentile of frequency. Whereas this may be calculated, we could simulate their preservation by proxy: In such a design, a number of segments in the datasets may be designated that are considered deviant, marginal, or contradictory. This decision may be taken based on conceptual premises. An HPA would then be a non-numerical and more narrative assessment of the extent to which these marginal cases are represented or vanish in AI-mediated findings.
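Although Option 2 is meant to be narrative, the underlying check can be supported by a small lookup: for each segment designated in advance as deviant, marginal, or contradictory, list the AI-mediated themes that still cite it. The sketch assumes that findings are available as themes with cited supporting segments; that data layout, the function names, and all identifiers are hypothetical.

```python
def trace_designated_segments(designated, themes):
    """Map each designated segment id to the themes that cite it.

    `themes` maps a theme name to the set of segment ids cited as support.
    """
    return {
        seg: sorted(name for name, cited in themes.items() if seg in cited)
        for seg in designated
    }


def vanished(report):
    """Designated segments that no theme cites any more."""
    return sorted(seg for seg, cited_by in report.items() if not cited_by)


# Invented example: two designated segments, two AI-generated themes.
designated = ["int03_seg12", "int07_seg04"]
themes = {
    "distrust of institutions": {"int01_seg02", "int03_seg12"},
    "local pride": {"int02_seg09"},
}
report = trace_designated_segments(designated, themes)
lost = vanished(report)
```

The resulting list of vanished segments is a starting point for the narrative assessment: the analyst still has to judge whether a cited segment is represented in its own terms or merely absorbed into a broader theme.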

5.4. Reflexive AI-Audit

5.4.1. Rationale

Work in information systems and qualitative methods calls for reflexive disclosure of AI use (Davison et al., 2024; Hayes, 2025); some authors encourage reporting guidelines following values-based transparency (Braun & Clarke, 2025).

5.4.2. Definition

A transparent account of the models, training libraries, hardware, prompts, retrieval sources and snapshots, parameters and seeds, and human oversight used in the analysis (as a supplement to criterion 5.2).

5.4.3. Evaluative Function

This criterion asks whether the conditions of AI-mediation are disclosed in enough detail for the interpretation to be inspectable and contestable.

5.4.4. Possible Reporting

Provide model family and version; intervention points (transcription, retrieval, coding, summarizing); prompt snippets or link to a prompt repository; retrieval sources with snapshot dates; parameters/seeds where applicable; oversight steps; known limitations. Indicate search strategy and link to the inclusion/exclusion table for discovered materials.

5.4.5. Implementation

Reflexive AI-audit means comprehensive documentation of AI’s involvement in the research process, including disclosure of the exact specifications of the tools used and their application – either as a reflective text or in accordance with defined protocols. Currently, (topic-specific) protocols are still the exception; however, it is very likely that this will change with the expansion of the use of AI-tools, not least in order to comply with the rules of good scientific practice. The audit may be provided in a methods appendix, a supplementary table, or a protocol-like disclosure note.
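A protocol-like disclosure note could be as simple as a structured entry that is checked for completeness and then exported for a methods appendix. The sketch below is one possible form under our own assumptions: the field list in `REQUIRED_FIELDS` paraphrases the reporting items of this criterion, the function `build_audit_note` is hypothetical, and every value in the example is a placeholder rather than a real tool, repository, or setting.

```python
import json

# Assumed field names, paraphrasing the reporting items of this criterion.
REQUIRED_FIELDS = (
    "model_family",
    "model_version",
    "intervention_points",
    "prompts",
    "retrieval_sources",
    "parameters",
    "oversight_steps",
    "known_limitations",
)


def build_audit_note(entry):
    """Check that all required fields are present and return a JSON note."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        raise ValueError("audit entry lacks: " + ", ".join(missing))
    return json.dumps(entry, indent=2, sort_keys=True, ensure_ascii=False)


# Placeholder example entry; none of these values refer to real systems.
example_entry = {
    "model_family": "example-model",
    "model_version": "0.0-placeholder",
    "intervention_points": ["transcription", "coding"],
    "prompts": "see prompt repository (placeholder link)",
    "retrieval_sources": [{"name": "project corpus", "snapshot": "2025-10-01"}],
    "parameters": {"temperature": 0.0, "seed": 1234},
    "oversight_steps": ["second coder reviews every AI-suggested code"],
    "known_limitations": ["no access to vendor training data"],
}
note = build_audit_note(example_entry)
```

The completeness check only guards against forgotten fields; whether the disclosed detail is sufficient for the interpretation to be inspectable remains a matter of reflexive judgment.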

“The cyborg does not dream of community on the model of the organic family (…). The cyborg would not recognize the Garden of Eden.”

(Haraway, 1985, p. 8)

6. Discussion and Conclusion

Qualitative interpretation depends on situated judgment, responsiveness to difference, and the possibility of being corrected by the material. In contrast to Jowsey et al. (2025), we do not generally doubt the potential of AI for QR. Yet, AI-mediated QR must confront a structural tension: current models generate statistically probable continuations, whereas interpretative inquiry seeks to reconstruct meaning, contradiction, and context in ways that cannot be reduced to probability alone. AI may therefore assist interpretative work, but it does not remove the researcher’s responsibility for reflexive judgment.

Our claim is therefore not that AI creates looping from scratch, but that it intensifies one dimension of looping (tempo) and transforms another (evidential re-entry). Long before LLMs, CAQDAS (e.g., Atlas.ti, NVivo, MAXQDA) already sped up and supported coding, retrieval, and cross-project reuse. Likewise, recursive dynamics were discussed in non-AI settings under performativity, retroaction, and counter-performativity (see Hostniker & Meyer, 2024). What is distinctive today is (i) the magnitude and speed of these dynamics in AI-mediated pipelines, and (ii) their partial decoupling from renewed field encounters – through output re-entry via prompt libraries, fine-tunes, and synthetic data.

Our critique rests on a normative judgment, not an empirical claim: researchers often treat some degree of acceleration and recursion as acceptable when it is human-driven, and problematic when it is AI-mediated. The boundary is fragile and contingent, and overlaps with debates about how much human control or “teaming” with AI is desirable or possible (Tsamados et al., 2025). Noting this boundary surfaces a hidden premise: we want humans to remain agents of social change, or at least accountable stewards of the classifications we circulate, as “[o]ur own mythology consists in imagining ourselves as radically different, even before searching out small differences and small divides.” (Latour, 1993, p. 116)

As Latour argues, modern social change is better understood by attending to mediations (the hybrids of nature and society) rather than presuming a purified separation between them (Latour, 1993). In this perspective, actants are any human or nonhuman entities that participate in networks of association; agency is not possessed in isolation but emerges from relations among these actants (Latour, 2005). This shift from purified domains to mediating networks helps to explain the nature of change – and, to some extent, the core conflict of paradigms at hand. As a matter of fact, most foundation models are produced and governed by large vendors; thus, vendor defaults can steer qualitative methods and topics. Keeping accountability with the researcher requires disclosure and, where possible, auditable open pipelines.

Although we sympathise with Jowsey et al. (2025), change is already underway, across fields and within qualitative research, as research teams adopt AI-tools and practices. Consequently, QR needs to adapt. This descriptive claim simply notes that the widespread circulation of large, industry-built foundation models is creating a new state of the art, one that will shape what counts as evidence, how we analyse it, and the conclusions we draw.

Since widely accessible large language models entered practice only in late 2022, the field is changing faster than its standards. The abundance of new capabilities risks rendering established assumptions, procedures, and quality heuristics obsolete. We therefore call for explicit normative commitments for knowledge production using AI, detailing how evidence is collected, traced, audited, interpreted, and governed – both in general and for qualitative research in particular in this new age of techno-epistemology.

Epilogue

Should Qualitative Inquiry dream of Eden? Is there any garden to return to outside the circuitry of our tools and the endless cycles of progress, mediation, and regress?

Our criteria are not commandments but commitments that keep methods porous to the lives they read. We do not need to quietly inherit the hierarchies of technological corporatism. The criteria we offer will not open the loop; they render it visible. They stage conditions under which error can be seen, contested, and repaired. They are cyborg practices for a cyborg craft: neither anti-machine nor credulous, but accountable.

And accountability, in the end, is less a rule than a posture, and the readiness to be addressed by what we study and by those who will live with our classifications.

Footnotes

Acknowledgements

We thank the anonymous reviewers and our colleague Dr. Dominik Kremer (Leibniz Institute for Regional Geography) for their constructive feedback.

ORCID iDs

Frank Meyer

Paul Nguyễn

Author Contributions

All authors contributed to methodological discussions on incorporating AI. Frank Meyer conceptualized, refined, organised, and created the manuscript. Judith Miggelbrink and Paul Nguyễn supported the manuscript creation, the editorial work, and the revisions.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded through the German Research Foundation (DFG), grant no. 531002243.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is therefore not applicable.*

Notes

References

Aguilera, Mellado, & Rojas (2023). An Assessment of In-the-Wild Datasets for Multimodal Emotion Recognition. Sensors, 23(11), 5184. https://doi.org/10.3390/s23115184

Bai, Jones, Ndousse, Askell, Chen, DasSarma, Drain, Fort, Ganguli, Henighan, Joseph, Kadavath, Kernion, Conerly, El-Showk, Elhage, Hatfield-Dodds, Hernandez, Hume, ... Kaplan (2022). Training a helpful and harmless assistant with RLHF. arXiv Preprint. https://arxiv.org/abs/2204.05862

Bauer, & Gill (2024). Mirror, mirror on the wall: Algorithmic assessments, transparency, and self-fulfilling prophecies. Information Systems Research, 35(1), 226–248. https://doi.org/10.1287/isre.2023.1217

Becker, H. S. (1998). Tricks of the trade: How to think about your research while you’re doing it. University of Chicago Press.

Bednarek (2024). Topic modelling in corpus-based discourse analysis: Uses and critiques. Discourse Studies, 27(4), 659–671. https://doi.org/10.1177/14614456241293075

Bendel Larcher (2023). Linguistische Diskursanalyse. Ein Lehr-und Arbeitsbuch. Narr Francke Attempto. https://doi.org/10.24053/9783823395867

Bijker, Merkouris, S. S., Dowling, N. A., & Rodda, S. N. (2024). ChatGPT for automated qualitative research: Content analysis. Journal of Medical Internet Research, 26, e59050. https://doi.org/10.2196/59050

Bowker, G. C., & Star, S. L. (2000). Sorting things out: Classification and its consequences. MIT Press.

Braun, & Clarke (2025). Reporting guidelines for qualitative research: A values-based approach (BQQRG). Qualitative Research in Psychology, 22(2), 399–438. https://doi.org/10.1080/14780887.2024.2382244

Brown, Hod, & Kalemaj (2022). Performative prediction in a stateful world. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022), A Virtual Conference, 28-30 March 2022. Proceedings of Machine Learning Research. https://proceedings.mlr.press/v151/brown22a.html

Burrell, & Fourcade (2021). The society of algorithms. Annual Review of Sociology, 47(1), 213–237. https://doi.org/10.1146/annurev-soc-090820-020800

Castellanos, Jiang, Gomes, Vander Meer, & Castillo (2025). Large language models for thematic summarization in qualitative health care research: Comparative analysis of model and human performance. JMIR AI, 4, e64447. https://doi.org/10.2196/64447

Chandrasekar, Clark, S. E., Martin, Vanderslott, Flores, E. C., Aceituno, Barnett, Vindrola-Padros, & Vera San Juan (2024). Making the most of big qualitative datasets: A living systematic review of analysis methods. Frontiers in Big Data, 7, 1455399. https://doi.org/10.3389/fdata.2024.1455399

Chubb, Cowling, & Reed (2022). Speeding up to keep up: Exploring the use of AI in the research process. AI & SOCIETY, 37(4), 1439–1457. https://doi.org/10.1007/s00146-021-01259-0

15.

Combrinck

(2024). A tutorial for integrating generative AI in mixed methods data analysis. Discover Education, 3(116), 116. https://doi.org/10.1007/s44217-024-00214-7

16.

Curry

Baker

Brookes

(2024). Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT. Applied Corpus Linguistics, 4(1), 100082. https://doi.org/10.1016/j.acorp.2023.100082

17.

Daston

Galison

(2007). Objectivity. Zone Books.

18.

Davison

Chughtai

Nielsen

Marabelli

Iannacci

van Offenbeek

Tarafdar

Trenz

Techatassanasoontorn

Díaz Andrade

Panteli

(2024). The ethics of using generative AI for qualitative data analysis. Information Systems Journal, 34(5), 1433–1439. https://doi.org/10.1111/isj.12504

19.

Elliott

(2019). The looping effects of enhancement technologies. Journal of bioethical inquiry, 16(1), 127–131. https://doi.org/10.1007/s11673-018-9893-2

20.

Espeland

W. N.

Sauder

(2007). Rankings and reactivity: How public measures recreate social worlds. American Journal of Sociology, 113(1), 1–40. https://doi.org/10.1086/517897

21.

Espeland

W. N.

Stevens

M. L.

(1998). Commensuration as a social process. Annual Review of Sociology, 24(1), 313–343. https://doi.org/10.1146/annurev.soc.24.1.313

22.

Fieldhouse

J. K.

Randhawa

Wolking

Genovese

B. N.

Mazet

J. A. K.

Desai

(2025). The intersection of artificial intelligence with qualitative or mixed methods for communicable disease research: A scoping review. Public Health, 248(November), 105961. https://doi.org/10.1016/j.puhe.2025.105961

23.

Flick

(2019). The concepts of qualitative data: Challenges in neoliberal times for qualitative Inquiry. Qualitative Inquiry, 25(8), 713–720. https://doi.org/10.1177/1077800418809132

24.

Foraker

Morrow

J. D.

Johnson

J. A.

Wilcox

A. B.

Forster

A. J.

Payne

P. R. O.

(2025). Understanding synthetic data: Artificial datasets for real-world evidence. BMJ Evidence-Based Medicine, 31(3), 148–151. https://doi.org/10.1136/bmjebm-2024-113617

25.

Fourcade

Healy

(2017). Classification situations: Life-chances in the neoliberal era. Historical Social Research, 42(1), 23–51. https://doi.org/10.12759/hsr.42.2017.1.23-51

26.

Fourcade

Healy

(2017). Seeing like a market. Socio-Economic Review, 15(1), 9–29. https://doi.org/10.1093/ser/mww033

27.

Gauld

Nicolle

Constant

Gagné-Julien

A.-M.

(2025). The role of clinicians in the looping effect: epistemic injustices and looping breaks. Medicine, Health Care and Philosophy, 28(3), 561–576. https://doi.org/10.1007/s11019-025-10279-2

28.

Geertz

(1973). The interpretation of cultures. Basic Books.

29.

Glaser

B. G.

Strauss

A. L.

(1967). The discovery of grounded theory: Strategies for qualitative research. Aldine Publishing Company.

30.

Glickman

Sharot

(2025). How human–AI feedback loops alter human perceptual, emotional and social judgements. Nature Human Behaviour, 9(2), 345–359. https://doi.org/10.1038/s41562-024-02077-2

31.

Golafshani

(2003). Understanding Reliability and Validity in Qualitative Research. The Qualitative Report, 8(4), 597–607. https://doi.org/10.46743/2160-3715/2003.1870

32.

Goyal

Mahmoud

Q. H.

(2024). A systematic review of synthetic data generation techniques using generative AI. Electronics, 13(17), 3509. https://doi.org/10.3390/electronics13173509

33.

Gür-Şeker

(2014). Zur Verwendung von Korpora in der Diskurslinguistik. In Angermüller

Nonhoff

Herschinger

Macgilchrist

Reisigl

Wedl

Wrana

Ziem

(Eds.), Kompendium der interdisziplinären Diskursforschung (pp. 583–603). transcript. https://doi.org/10.14361/transcript.9783839427224.583

34.

Hacking

(1995). The looping effects of human kinds. In Sperber

Premack

A. J.

(Eds.), Causal Cognition: A Multidisciplinary Debate (pp. 351–383). Clarendon Press, Oxford University Press.

35.

Hacking

(1999/2000). The Social Construction of What? Harvard University Press.

36.

Hacking

(2006). Kinds of people: Moving targets: British Academy Lecture. In Marshall

P. J.

(Ed.), Proceedings of the British Academy, 151 (p. 285-317), 2006 Lectures. https://doi.org/10.5871/bacad/9780197264249.003.0010

37.

Haraway

(1985). A manifesto for cyborgs: Science, technology, and socialist-feminism in the 1980s. Socialist Review, 80, 65–108.

38.

Haraway

(1988). Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective. Feminist Studies, 14(3), 575–599. https://doi.org/10.2307/3178066

39.

Hayes

A. S.

(2025). “Conversing” with qualitative data: Enhancing qualitative research through Large Language Models (LLMs). International Journal of Qualitative Methods, 24(February), 16094069251322346. https://doi.org/10.1177/16094069251322346

40.

Hämäläinen

Tavast

Kunnari

(2023). Evaluating large language models in generating synthetic HCI research data: A case study. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), Hamburg Germany, April 23 - 28, 2023, Association for Computing Machinery. https://doi.org/10.1145/3544548.3580688

41.

Hostniker

Meyer

(2024). Blinded by the lights. Improvisational theater as a method for researching regional identities. Forum: Qualitative Social Research, 25(3), https://doi.org/10.17169/fqs-25.3.4223

42.

Hovy

Prabhumoye

(2021). Five sources of bias in natural language processing. Language and Linguistics Compass, 15(8), e12432. https://doi.org/10.1111/lnc3.12432

43.

Howison

Angell

Hasting

J. S.

(2024). Protecting sensitive data with secure data enclaves. Digital Government: Research and Practice, 5(2), 14. https://doi.org/10.1145/3643686

44.

Jäger

Zimmermann

(2019). Lexikon Kritische Diskursanalyse. Eine Werkzeugkiste. Unrast Verlag.

45.

Jiang

J. A.

Wade

Fiesler

Brubaker

J. R.

(2021). Supporting serendipity: Opportunities and challenges for human–AI collaboration in qualitative analysis. Proceedings of the ACM on Human–Computer Interaction, 5(94), 1–23. https://doi.org/10.1145/3449168

46.

Jowsey

Braun

Clarke

Lupton

Fine

(2025). We Reject the Use of Generative Artificial Intelligence for Reflexive Qualitative Research. Qualitative Inquiry, https://doi.org/10.1177/10778004251401851

47.

Kapania

Agnew

Eslami

Heidari

Fox

(2024). Simulacrum of stories: Examining large language models as qualitative research participants. arXiv Preprint. https://arxiv.org/abs/2409.19430

48.

Keller

(2011). Wissenssoziologische Diskursanalyse. Grundlegung eines Forschungsprogramms. VS Verlag für Sozialwissenschaften. https://doi.org/10.1007/978-3-531-92058-0

49.

Knoblauch

(2008). Sinn und Subjektivität in der qualitativen Forschung. In Kalthoff

Hirschauer

Lindemann

(Eds.), Theoretische Empirie. Zur Relevanz qualitativer Forschung (pp. 210–233). Campus.

50.

Kushwaha

(2024). AI and non-verbal communication: Enhancing understanding of emotional cues for hearing impairment children. In Ahmad

Hussain

M. I.

Mustaqueem

Kushwaha

R. K.

(Eds.), Transforming Learning: The Power of Educational Technology (pp. 13–27). BlueRose One.

51.

Latour

(1993). We have never been modern. Harvard University Press.

52.

Latour

(2005). Reassembling the social. An introduction to Actor-Network Theory. Oxford University Press.

53.

Lincoln

Y. S.

Guba

E. G.

(1985). Naturalistic Inquiry. Sage.

54.

Lindholm

S. K.

Wickström

(2020). ‘Looping effects’ related to young people’s mental health: How young people transform the meaning of psychiatric concepts. Global Studies of Childhood, 10(1), 26–38. https://doi.org/10.1177/2043610619890058

55.

Liu

Zambrano

A. F.

Baker

R. S.

Barany

Ocumpaugh

Zhang

Pankiewicz

Nasiar

Wei

(2025). Qualitative coding with GPT-4: Where it works better. Journal of Learning Analytics, 12(1), 169–185. https://doi.org/10.18608/jla.2025.8575

56.

Longpre

Mahari

Chen

Obeng-Marnu

Sileo

Brannon

Muennighoff

Khazam

Kabbara

Perisetla

Shippole

Bollacker

Villa

Pentland

Hooker

(2024). A large-scale audit of dataset licensing and attribution in AI. Nature Machine Intelligence, 6(8), 975–987. https://doi.org/10.1038/s42256-024-00878-8

57.

Loni

Poursalim

Asadi

Gharehbaghi

(2025). A review on generative AI models for synthetic medical text, time series, and longitudinal data. npj Digital Medicine, 8(1), 281. https://doi.org/10.1038/s41746-024-01409-w

58.

Macbeth

(2001). On “Reflexivity” in Qualitative Research: Two Readings, and a Third. Qualitative Inquiry, 7(1), 35–68. https://doi.org/10.1177/107780040100700103

59.

MacKenzie

(2006). An Engine, not a camera: How financial models shape markets. The MIT Press.

60.

Mayring

(2025). Qualitative content analysis with ChatGPT: Pitfalls, rough approximations and gross errors. A field report. Forum: Qualitative Social Research, 26(1), https://doi.org/10.17169/fqs-26.1.4252

61.

Mennicken

Espeland

W. N.

(2019). What’s new with numbers? Sociological approaches to the study of quantification. Annual Review of Sociology, 45(1), 223–245. https://doi.org/10.1146/annurev-soc-073117-041343

62.

Meyer

Miggelbrink

(2013). The subject and the periphery. About discourses, loopings and ascriptions. In Fischer-Tahir

Naumann

(Eds.), Peripheralization. The making of spatial dependencies and social injustice (pp. 207–233). Springer. https://doi.org/10.1007/978-3-531-19018-1_10

63.

Nguyen

Meyer

(2026). Conceptualizing Rural Populism: The Heartland and Processes of Interpellation and Transformation. Berichte. Geographie und Landeskunde, 99(1), 41–63. https://doi.org/10.25162/bgl-2026-0004

64.

Nguyen

D. C.

Welch

(2025). Generative artificial intelligence in qualitative data analysis: analyzing—or just chatting? Organizational Research Methods, 29(1), 3–39. https://doi.org/10.1177/10944281251377154

65.

Nguyen-Trung

(2025). ChatGPT in thematic analysis: Can AI become a research assistant in qualitative research? Quality & Quantity, 59(6), 4945–4978. https://doi.org/10.1007/s11135-025-02165-z

66.

Ouyang

Jiang

Almeida

Wainwright

Mishkin

Zhang

Agarwal

Slama

Ray

Schulman

Hilton

Kelton

Miller

Simens

Askell

Welinder

Christiano

Leike

Lowe

(2022). Training language models to follow instructions with human feedback. arXiv Preprint. https://doi.org/10.48550/arXiv.2203.02155

67.

Pagan

Baumann

Elokda

De Pasquale

Bolognani

Hannák

(2023). A Classification of feedback loops and their relation to biases in automated decision-making systems. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO '23), Boston MA USA, 30 October 2023- 1 November 2023, Association for Computing Machinery. https://doi.org/10.1145/3617694.3623227

68.

Pattyn

(2024). The value of generative ai for qualitative research: A pilot study. Journal of Data Science and Intelligent Systems, 3(3), 184–191. https://doi.org/10.47852/bonviewJDSIS42022964

69.

Paulson

E. L.

O’Guinn

T. C.

(2018). Marketing social class and ideology in post-world-war-two american print advertising. Journal of Macromarketing, 38(1), 7–28. https://doi.org/10.1177/0276146717733788

70.

Paulus

T. M.

Marone

(2025). “In minutes instead of weeks”: Discursive constructions of generative ai and qualitative data analysis. Qualitative Inquiry, 31(5), 395–402. https://doi.org/10.1177/10778004241250065

71.

Perdomo

Zrnic

Mendler-Dünner

Hardt

(2020). Performative prediction. Proceedings of Machine Learning Research, 119, 7599–7609. https://proceedings.mlr.press/v119/perdomo20a.html

72.

Perkins

Roe

(2024). The use of Generative AI in qualitative analysis: Inductive thematic analysis with ChatGPT. Journal of Applied Learning and Teaching, 7(1), 390–395. https://doi.org/10.37074/jalt.2024.7.1.22

73.

Peters

Chin Yee

(2025). Generalization bias in large-language-model summarization of scientific research. Royal Society Open Science, 12(4), 1-1. https://doi.org/10.1098/rsos.241776

74.

Pilati

Munk

A. K.

Venturini

(2024). Generative AI for social research: Going native with artificial intelligence. Sociologica, 18(2), 1–8. https://doi.org/10.6092/issn.1971-8853/20378

75.

Pink

Korsmeyer

Lyall

(2025). Generative AI and broken futures. Qualitative Inquiry, Online First, 10778004251358070. https://doi.org/10.1177/10778004251358070

76.

Qadhi

S. M.

Alduais

Chaaban

Khraisheh

(2024). Generative AI, research ethics, and higher education research: Insights from a scientometric analysis. Information, 15(6), 325. https://doi.org/10.3390/info15060325

77.

Resnik

D. B.

Hosseini

Kim

J. J. H.

Epiphaniou

Maple

(2025). GenAI synthetic data create ethical challenges for scientists: Here’s how to address them. Proceedings of the National Academy of Sciences, 122(9), e2409182122. https://doi.org/10.1073/pnas.2409182122

78.

Rosa

(2013). Social acceleration: A new theory of modernity. Columbia University Press.

79.

Schlegel

Sattler

K.-U.

(2025). Capturing end-to-end provenance for machine learning pipelines. Information Systems, 132(July), 102495. https://doi.org/10.1016/j.is.2024.102495

80.

Seddik

M. E. A.

Chen

S.-W.

Hayou

Youssef

Debbah

(2024). How bad is training on synthetic data? A statistical analysis of language model collapse. arXiv Preprint. https://arxiv.org/abs/2404.05090

81.

Shahriari

Ramler

Fischer

(2022). How do deep-learning framework versions affect the reproducibility of neural network models? Machine Learning and Knowledge Extraction, 4(4), 888–911. https://doi.org/10.3390/make4040045

82.

Shanley

Hogenboom

Lysen

Wee

Lobo Gomes

Dekker

Meacham

(2024). Getting real about synthetic data ethics: Are AI ethics principles a good starting point for synthetic data ethics? EMBO Reports, 25(5), 2152–2155. https://doi.org/10.1038/s44319-024-00101-0

83.

Shumailov

Shumaylov

Zhao

Papernot

Anderson

Gal

(2024). AI models collapse when trained on recursively generated data. Nature, 631(8022), 755–759. https://doi.org/10.1038/s41586-024-07566-y

84.

Stiennon

Ouyang

Ziegler

Lowe

Voss

Radford

Amodei

Christiano

P. F.

(2020). Learning to summarize with human feedback. In Larochelle

Ranzato

Hadsell

Balcan

M. F.

Lin

(Eds.), Advances in neural information processing systems 33 (NeurIPS 2020)(p. ). Proceedings. https://proceedings.neurips.cc/paper/2020/hash/1f89885d556929e98d3ef9b86448f951-Abstract.html

85.

Stoltz

D. S.

Taylor

M. A.

(2024). Mapping texts: Computational text analysis for the social sciences. Oxford University Press. https://doi.org/10.1093/oso/9780197756874.001.0001

86.

Sun

Kim

Choi

(2025). A methodological exploration of generative artificial intelligence (AI) for efficient qualitative analysis on hotel guests’ delightful experiences. International Journal of Hospitality Management, 124(January), 103974. https://doi.org/10.1016/j.ijhm.2024.103974

87.

Taori

Hashimoto

(2023). Data feedback loops: Model-driven amplification of dataset Biases. Proceedings of Machine Learning Research. https://proceedings.mlr.press/v202/taori23a.html

88.

Tight

(2024). Saturation: An overworked and misunderstood concept? Qualitative Inquiry, 30(7), 577–583. https://doi.org/10.1177/10778004231183948

89.

Timmermans

Tavory

(2012). Theory Construction in Qualitative Research: From Grounded Theory to Abductive Analysis. Sociological Theory, 30(3), 167–186. https://doi.org/10.1177/0735275112457914

90.

Tracy

S. J.

(2010). Qualitative quality: Eight “big-tent” criteria for excellent qualitative research. Qualitative Inquiry, 16(10), 837–851. https://doi.org/10.1177/1077800410383121

91.

Tsamados

Floridi

Taddeo

(2025). Human control of AI systems: from supervision to teaming. AI Ethics, 5(2), 1535–1548. https://doi.org/10.1007/s43681-024-00489-4

92.

Tsou

J. Y.

(2025). Hacking on looping effects and kinds of people. https://philsci-archive.pitt.edu/25137/

93.

Vaccaro

Almaatouq

Malone

(2024). When combinations of humans and AI are useful: A systematic review and meta-analysis. Nature Human Behaviour, 8(12), 2293–2303. https://doi.org/10.1038/s41562-024-02024-1

94.

van Dijck

(2014). Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society, 12(2), 197–208. https://doi.org/10.24908/ss.v12i2.4776

95.

Vesterinen

(2021). Identifying the explanatory domain of the looping effect: Congruent and incongruent feedback mechanisms of interactive kinds. Journal of Social Ontology, 6(2), 159–185. https://doi.org/10.1515/jso-2020-0015

96.

Wang

Kordi

Mishra

Liu

Smith

N. A.

Khashabi

Hajishirzi

(2022). Self-Instruct: Aligning language models with generated instructions. arXiv Preprint. https://doi.org/10.48550/arXiv.2212.10560

97.

Wiles

(2025). Recursive cognition in practice: How AI dialogue generated and analyzed its own methodology. International Journal of Qualitative Methods, 24(September), 16094069251381709. https://doi.org/10.1177/16094069251381709

98.

Williams

R. T.

(2024). Paradigm shifts: Exploring AI’s influence on qualitative inquiry and analysis. Frontiers in Research Metrics and Analytics, 9, 1331589. https://doi.org/10.3389/frma.2024.1331589

99.

Zhang

Xie

Lyu

Cai

Carroll

J. M.

(2025). Harnessing the power of AI in qualitative research: Exploring, using and redesigning ChatGPT. Computers in Human Behavior: Artificial Humans, 4(2), 100144. https://doi.org/10.1016/j.chbah.2025.100144