Abstract
I offer this manuscript as a methodological provocation for qualitative health research at a moment when artificial intelligence (AI)–powered translation is becoming increasingly normalised in cross-language interviewing, transcription, and analysis. Rather than reiterate the now-familiar claim that “AI translation can be risky,” I argue that the central problem is epistemic: translation technologies are quietly becoming part of the infrastructure of qualitative knowledge production, yet remain methodologically under-disclosed and analytically under-theorised. Drawing on epistemic injustice and cross-language qualitative scholarship, I show how AI translation can flatten culturally saturated narratives. I then explain why Reflexive Thematic Analysis (RTA) functions as a methodological stress test in this terrain: when researchers rely on AI-mediated translations without robust human verification, the interpretive commitments of RTA become difficult to sustain. In response, I propose a guideline: a Minimum Disclosure Standard for AI-mediated translation (MDS-AIMT) in qualitative health research, specifying considerations for authors regarding the selection and use of translation tools, the languages involved, the verification processes employed, data governance protocols, the alignment of analysis methodologies, and the incorporation of AI-specific reflexivity in their work. Finally, I demonstrate a concrete hybrid “repair” strategy—returning to a vignette in which a culturally dense proverb is flattened into “she feels sad about medicine”—to show how AI can assist with access without becoming an unacknowledged co-author of the data. My aim is to move the field toward an ethic of epistemic accountability: technology as an assistant, not an author, and translation as an interpretive practice rather than a methodological footnote.
Keywords
Prologue
Every act of communication is an act of translation. —Gregory Rabassa (Rabassa, 1974)
A qualitative health researcher based in a high-income country (HIC) prepares for a remote interview with community health workers in a rural region of a low- and middle-income country (LMIC). The project is ethically approved, the research question is important, and the researcher is not indifferent to the craft of qualitative inquiry; indeed, they have built their scholarly identity around it. Yet, the conditions of contemporary research practice press in from every side: finite budgets, fragile access arrangements, tight timelines, and institutional expectations to “make do” with available resources. The line item for professional interpretation was not retained in the funding round, and the bilingual research assistant intended to support the interviews is no longer available. The researcher is left with a choice that will feel familiar to many qualitative scholars working across languages: postpone and risk losing access altogether, or proceed and accept methodological compromise.
They proceed. A translation application is opened—not as a sign of carelessness, but as a pragmatic attempt to maintain a connection across linguistic distance. A community health worker begins to speak: quickly, fluently, with the cadence of someone who has told this story before, and with an idiom that carries more than information. It carries moral weight. The participant offers a proverb about scarcity and obligation—one of those expressions that compresses local history into a few words. A beep signals the translation has “worked.” She said she feels sad about medicine.
The sentence is intelligible. It is also strangely thin, a flattened echo of what was just said. Something is missing—not just “detail,” but the structure of meaning itself: the proverb’s social force, the emotional texture, the implied critique of systems, the culturally situated sense of betrayal or endurance. In the moment, the researcher hesitates. If they treat this output as data, then the translation app is not simply a neutral tool; it becomes a co-producer of what is thinkable, codeable, and ultimately publishable. The researcher is no longer interpreting participants’ accounts alone; they are interpreting an algorithm’s approximation of those accounts.
This is where the ethical and epistemological stakes sharpen. Cross-language qualitative research has long treated translation not as a mechanical step but as an interpretive act that shapes what counts as knowledge (Temple & Young, 2004; van Nes et al., 2010). Institutional ethics guidance, meanwhile, can be uneven: requirements to ensure that consent is understood may coexist with surprisingly limited and inconsistent guidance on what linguistic equity demands in practice, especially for participants who are already socially and politically marginalised (Perry, 2011; Squires, 2009). Into this gap steps artificial intelligence (AI)–powered translation: convenient, persuasive, increasingly normalised, and often treated in manuscripts as a minor methodological footnote rather than a substantive epistemic intervention.
The scene above is therefore not a caricature of “bad research,” nor an attempt to shame researchers working under real constraints. It is a description of a structurally produced dilemma—one that becomes more consequential when the people being interviewed are positioned as “vulnerable,” and when their accounts are the very material through which health inequities will be named and explained. In such contexts, the risk is not only mistranslation; it is epistemic injustice: the subtle ways certain speakers, languages, and forms of expression are rendered less intelligible, less credible, or less fully knowable within dominant knowledge-making practices (Fricker, 2007). When a proverb becomes “sadness,” when a culturally dense account becomes a generic affect word, whose reality is ultimately carried forward into the literature, and whose is lost?
This paper begins from that hesitation. It explores what is at stake when AI translation becomes the medium of qualitative encounter in healthcare research, and asks what methodological responsibilities follow when translation devices are not simply assisting communication but actively shaping the evidence on which our claims about inequity are built (Fricker, 2007).
Pearls, Piths, and Provocations: Why I Offer This Methodological Intervention
This manuscript is written for Qualitative Health Research’s Pearls, Piths, and Provocations section—an editorial space explicitly designed to foster debate about significant issues, communicate methodological advances, and invite discussion of provocative ideas that can move qualitative health research forward (Qualitative Health Research, 2025). In that tradition, I am not offering an empirical study with a conventional “methods-results” arc, nor am I arguing for a categorical prohibition of AI tools. I am identifying an emerging methodological challenge—one that is becoming increasingly apparent in manuscripts but is seldom theorised with the necessary rigour.
An obvious objection is: who among qualitative health researchers is unaware that translation technologies can distort meaning? Yet, the problem I am tracing is not simply that AI translation can “go wrong.” The deeper issue is that AI translation is becoming infrastructural: treated as an unremarkable instrument that quietly moves from recruitment to interviewing to transcription to analysis, while the epistemic and ethical consequences remain “below the waterline.” This concern is echoed in recent scholarship on cross-language qualitative work, which argues that translation is not a technical step to be dispatched quickly, but a methodological and ethical site where meaning is inevitably reshaped and where AI may help or hinder depending on the purpose, context, and safeguards in place (Lingard & Klasen, 2025). The provocation here is therefore aimed at the field’s normalisation of AI translation as a routine methodological footnote rather than a consequential interpretive act.
This matters in qualitative healthcare research because language is not only a vehicle for information; it is part of the phenomenon. Accounts of illness, stigma, suffering, care, inequity, and system navigation are routinely communicated through idiom, moral language, hesitation, metaphor, and culturally saturated expressions. A second reason this intervention is timely is the broader transformation of qualitative practice through digital and remote modalities. Over the past decade—and especially since the COVID-19 pandemic—online qualitative research has expanded rapidly, creating genuine opportunities for access, participation, and methodological innovation (Carter et al., 2021; Howlett, 2022). At the same time, online approaches intensify long-standing concerns about exclusion, rapport, privacy, and inequities in participation, particularly for medically underserved groups and those with limited access to technology (Jackson et al., 2023). AI translation tools accelerate this remote turn by promising to make cross-language interviewing frictionless, but the removal of friction is not always a virtue in qualitative inquiry: sometimes, friction is what alerts us to interpretive danger.
Finally, this paper is designed to be useful, not only conceptually provocative. I aim to go beyond critique by providing a tangible contribution: a Minimum Disclosure Standard for AI-mediated translation (MDS-AIMT) in qualitative health research. The premise is straightforward: if AI translation is used, it should not be methodologically invisible. It should be disclosed in sufficient detail for readers, reviewers, and editors to evaluate its credibility, analytical coherence, and ethical integrity. In doing so, I build toward practical guidance on hybrid models (AI-assisted translation rather than AI-substitutive translation) and clarify why interpretive analytic approaches—particularly Reflexive Thematic Analysis (RTA)—function as a methodological “stress test” when the researcher is linguistically distant from participants’ accounts.
Methodological Innovation—or Shortcut?
Qualitative research in healthcare has long been valued for its narrative richness, contextual sensitivity, and relational depth. Immersive fieldwork and interpersonal exchange remain core elements of this approach (Lim, 2024). Over the past decade, global connectivity has enabled researchers to conduct interviews across time zones via video and telephone calls (Carter et al., 2021). The COVID-19 pandemic further accelerated this shift, compelling researchers to innovate and incorporate digital modalities into qualitative research processes (Howlett, 2022). On the surface, this may appear to be a democratising development. A doctoral candidate in a well-resourced institution can, in principle, engage participants across languages without the steep costs of travel, extended fieldwork, or professional translation. Yet, the methodological question remains: what kind of knowledge is produced when linguistic access is purchased through automation rather than through relationship, immersion, and accountability? The risk is not that AI translation necessarily produces unusable data, but that it can produce plausible data—data that read smoothly in the dominant language while quietly stripping away the interpretive density that constitutes qualitative evidence in the first place. In this sense, the methodological problem is not speed itself, but the possibility that speed becomes a substitute for the labour of meaning-making. If qualitative healthcare research is to maintain credibility while adopting new tools, then innovation must be paired with explicit standards for transparency, congruence, and epistemic responsibility.
Two Scenes, One Dilemma
To clarify the stakes when AI-powered translation enters qualitative healthcare research, I use two intentionally contrasted scenes. These vignettes are not meant to romanticise a “pre-AI” past or to portray contemporary researchers as careless. Rather, they illuminate a methodological pivot point: when translation becomes automated, the interpretive infrastructure of qualitative research changes—often without being acknowledged as such in study design, ethics review, analysis, or reporting. Translation has long been recognised as an interpretive act that shapes the final knowledge product (Temple & Young, 2004), and scholarship on cross-language methods has repeatedly warned that the “invisibility” of translation work undermines trustworthiness (Squires, 2009). The scenes below provide a concrete way to see how that invisibility can be amplified—rather than resolved—by contemporary translation technologies.
Scene 1: Pre-AI Era (Human Interpreters as Linguistic and Cultural Infrastructure)
A qualitative health researcher from an HIC travels in person to a rural region of an LMIC. Before any formal interviewing begins, the researcher spends time learning key phrases, reading locally situated accounts, and establishing relationships with community gatekeepers. A trained interpreter is recruited—ideally someone familiar with the health system context and accountable to both the research team and the community. The interpreter is not treated as a neutral conduit; they are positioned as a visible methodological actor whose interpretations influence what becomes recordable and analysable data (Temple & Young, 2004).
On arrival, the researcher learns that the “work” of translation extends beyond words. The interpreter explains when a phrase is idiomatic rather than literal, when an expression signals respect or resistance, and when silence communicates more than speech. The research team debriefs after interviews. They document moments of difficulty (“untranslatable” concepts, contested meanings, cultural references that do not travel cleanly) and create translation memos that remain part of the audit trail. This is slow work, but it is epistemically generative: it makes interpretation visible rather than automatic. It also aligns with long-standing methodological arguments that meaning—especially meaning embedded in metaphors and culturally situated narratives—can be lost or transformed when moved across languages (van Nes et al., 2010).
Crucially, this scene is not presented as “ideal” or universal. Cross-language health research has repeatedly shown that interpreters are often rendered invisible, their credentials unreported, and their influence on meaning unexamined (Squires, 2009). Interpreter positionality, power dynamics, and the risks of relying on untrained mediators have also been extensively discussed—particularly in work involving migrants and refugees, where interpretive mediation can carry legal, clinical, and ethical implications (Chiumento et al., 2018). What this first scene does, then, is highlight a methodological principle: good cross-language qualitative research makes mediation explicit and accountable.
Scene 2: AI-Powered Interviews (Machine Translation as an Unacknowledged Co-Author)
Now consider a second scene. The researcher does not travel. A remote interview is arranged—sometimes because budgets are limited, sometimes because access is politically fragile, sometimes because the study is designed to minimise travel and emissions. A participant is handed a phone preloaded with an AI translation app. The participant speaks, the device converts the speech into text, and the English text appears on the researcher’s screen. This seems efficient. It also produces a seductive form of “fluency”: communication appears immediate, as though the researcher and participant share a language.
Yet, the interpretive scaffolding present in Scene 1 is largely absent. The app does not explain why an idiom matters, what a proverb signals socially, or how certain metaphors encode local moral worlds. Instead, AI translation can flatten culturally dense expressions into generic categories—turning layered accounts into simplified statements that appear coherent but lose their analytic texture (Abjalova & Sharipova, 2024; Ji, 2023; Shahmerdanova, 2025). The translation may also standardise speech in ways that create a misleading homogeneity across participants’ accounts: different people, speaking from different contexts, can begin to “sound the same” once rendered through a single algorithmic pipeline.
In this scene, the risk is not only that the translation contains errors. The greater methodological danger is that the translation output becomes treated as the data itself, and the tool’s interpretive decisions become invisible. This is a well-known problem in cross-language research: when translation work is not documented, readers cannot evaluate how meanings are transformed across languages (Yunus et al., 2022). AI translation intensifies that invisibility because it produces polished text rapidly, often without leaving a transparent trace of uncertainty, alternative renderings, or culturally situated explanation.
Finally, AI-powered interviewing can quietly reshape who can participate fully. “Access” is not evenly distributed: app-based interaction assumes a degree of digital comfort, auditory clarity, stable connectivity, and ease in navigating technology under interview conditions. Those least comfortable with technology—or those who are illiterate, older, living with sensory impairments, or experiencing cognitive fatigue—may be most at risk of exclusion or misrepresentation when AI tools mediate their speech. In this way, AI translation can simultaneously promise inclusion while producing new layers of inequity.
One Dilemma: Frictionless Access Versus Accountable Meaning
The dilemma, then, is not “humans good, AI bad.” The dilemma is whether the field is willing to accept frictionless cross-language access at the cost of making interpretive mediation invisible—particularly in healthcare research where narrative nuance is central to what we claim to know. The first scene is slower and more resource-intensive, but it foregrounds translation as an interpretive and ethical practice. The second scene is efficient and scalable, but it risks delegating meaning-making to a system whose epistemic assumptions, training data, and translation priorities are rarely interrogated within qualitative manuscripts.
This is why the rest of this paper does not stop at warning. Instead, it develops an argument for methodological accountability: when AI translation is used, it must be disclosed, contextualised, and accompanied by safeguards appropriate to the analytic approach and the vulnerability of the population under study.
Epistemic Injustice in Translation
A central concept in this paper is epistemic injustice—a term that draws attention to how some people are wronged specifically in their capacity as knowers (Fricker, 2007). This risk becomes sharper when epistemic injustice is understood not only as a problem of interpersonal credibility but also as a structural phenomenon that shapes whose knowledge becomes legible and authoritative. In cross-cultural healthcare research, participants often describe deeply personal and context-specific experiences of illness and care, experiences embedded in local moral worlds, social expectations, and culturally shaped narratives (Woodland et al., 2021). These narratives deserve accurate and culturally informed representation. When an AI translator stands between participant and researcher, participants are effectively compelled to speak into an interpretive infrastructure that may not recognise their idioms, metaphors, or culturally specific concepts, and may “repair” their speech into standardised forms that are more palatable to dominant linguistic norms (Li, 2025). Translation is always an interpretive act, but cross-language qualitative research has long warned that when translation practices are rendered invisible, the trustworthiness of the knowledge claim is undermined (Temple & Young, 2004). AI translation can deepen that invisibility by producing text that appears polished and immediate, while obscuring uncertainty, alternative renderings, and culturally necessary explanations.
This is where epistemic injustice becomes tangible. Testimonial injustice can occur when machine translation renders a participant’s account incoherent, simplistic, or emotionally “thin,” leading the researcher—often unknowingly—to treat it as less analytically meaningful. A participant’s proverb, humour, irony, or culturally coded critique might be converted into a bland sentiment label, as in the prologue’s “feels sad about medicine.” The participant’s credibility is not challenged directly; rather, it is quietly diminished by the technological mediation that shapes what the researcher can hear and record. Hermeneutical injustice arises when the shared interpretive resources required to understand a participant’s account are absent, because the AI system cannot carry cultural meaning across languages, and the researcher lacks a linguistic or cultural route to reconstruct it. In such cases, the participant’s experience becomes partly unsayable within the translated dataset, not because it lacks meaning but because the tools and frameworks used to capture it cannot hold that meaning. This concern resonates with scholarship that explicitly links multilingual communication and translation practices to risks of epistemic injustice, particularly in relation to distress and wellbeing narratives (White et al., 2022).
The injustice is amplified by the political economy of language technology. AI’s training data tend to privilege widely spoken languages and standardised dialects, while minoritised dialects, local idioms, and low-resource languages are underrepresented (Helm et al., 2024). As a result, local expressions may be mistranslated, sidelined, or flagged as “unknown” (Ravindran, 2023). This does not merely create technical “noise”; it can enact a patterned form of epistemic exclusion, where only those experiences that are easily rendered into dominant linguistic categories survive the translation pipeline.
In a field increasingly attentive to epistemic justice, the crucial methodological question is therefore: what does it mean to claim fidelity to participants’ voices when those voices are algorithmically re-authored into the dominant language without transparent checks, cultural brokerage, or interpretive accountability? The answer cannot be a simple endorsement or rejection of translation technology. It must be a demand for explicit methodological visibility: what tool was used, for which languages, under what conditions, with what verification, and with what implications for analytic interpretation.
AI, Equity, and the Global South
Of particular concern is that many AI language systems have been developed and optimised in Western contexts, with disproportionate emphasis on major global languages and standardised linguistic forms (Kastrati et al., 2025). For rural communities in LMICs, local dialects and minority languages may not be represented adequately—if at all—in the data that train commercial translation systems. In such settings, translation errors are not occasional annoyances; they can be pervasive and systematic, especially when cultural meaning is expressed through idioms, proverbs, or locally anchored references that do not map cleanly onto dominant-language categories (Zhu et al., 2024).
There is an ethical irony here. Cross-cultural qualitative healthcare research often claims a justice-oriented purpose: to surface experiences that are otherwise marginalised, to inform equitable policy, to illuminate structural harm. Yet, if AI tools are used without culturally embedded mediation, the process can become epistemically extractive, capturing fragments of speech that are “AI-legible,” while losing precisely the contextual richness that would make the account socially and politically meaningful. The very populations positioned as “hard to reach” risk becoming easy to quote but hard to understand.
This is why epistemic injustice is not a rhetorical flourish in this paper; it is the conceptual lens through which I evaluate whether AI translation practices advance or undermine qualitative health research’s commitments to credibility, representation, and justice. The next section turns to the institutional and methodological conditions that are making “DIY cross-language research” increasingly common—and why speed, in this domain, can become a methodological hazard rather than a virtue.
The Rise of “Do-It-Yourself (DIY) Cross-Language Research”
I increasingly encounter a familiar line in contemporary qualitative manuscripts: “A translation app was used for participants who did not speak the researcher’s language.” This sentence is typically positioned as a neutral procedural note—almost an administrative aside—rather than as an epistemic and ethical decision that shapes what can be known, whose meanings are legible, and which voices remain intact through analysis and dissemination.
“DIY cross-language research” is not simply a matter of convenience; it is a mode of knowledge production. It tends to treat translation as a technical add-on rather than as a relational, interpretive, and accountability-laden practice. The result is a methodological drift: a normalisation of cross-language interviewing without the corresponding infrastructure (human, cultural, interpretive) that has historically been necessary to do such work responsibly and well.
What the Literature Shows About IRBs and Language Guidance
Any narrative suggesting that institutional review boards (IRBs) have historically provided consistent, stringent, or standardised guidance on language mediation is not well supported. The empirical literature points in a different direction: IRB approaches to “vulnerability” and non-dominant language participation have been variable, uneven, and often minimal.
For example, Perry’s analysis of 32 university IRB websites demonstrated wide variation in how “vulnerable populations” are defined and revealed that many IRBs offered little guidance on conducting research with participants with limited English proficiency (Perry, 2011). McMillan’s review of IRB policies at top NIH-funded research centres similarly found inconsistent practices and a lack of robust guidance on what constitutes valid consent processes for non-English-speaking adults, suggesting an institutional reluctance to engage deeply with the ethical underpinnings of consent in this area (McMillan, 2020).
Even where professional translation and interpreting are widely acknowledged as best practice in principle, translation ethics scholarship emphasises that “best practice” has never been synonymous with consistent implementation, and that ethical responsibility cannot be outsourced to institutional templates alone (Drugan, 2017). This matters because methodological minimalism thrives in spaces where oversight is ambiguous. When IRB guidance is thin, translation decisions become individualised, budget-driven, and often unreported.
The practical consequence is that cross-language qualitative work can move forward with surprisingly little methodological disclosure: who translated what, at which stage, with what competence, and with what verification. This silence becomes particularly consequential in research with groups labelled “vulnerable,” where ethical frameworks demand heightened attention to power asymmetries and meaningful comprehension, not merely procedural consent (Chaar et al., 2025; Taquette & Borges da Matta Souza, 2022).
The Allure of Speed—and the Illusion of Adequacy
AI translation tools offer what qualitative researchers are often denied: speed without staffing, reach without travel, volume without infrastructure. But speed can create an illusion of methodological adequacy. A large corpus of translated transcripts can appear to embody “rigour” while actually masking interpretive fragility, especially when translation is treated as an invisible step rather than a consequential analytic transformation.
Qualitative quality is not produced by volume; it is built through relational depth, interpretive accountability, and trust—often cultivated slowly, and frequently dependent on cultural attunement (Tuck & McKenzie, 2015). Trust is not just a research instrument; it is part of what makes participant narratives possible and what enables interpretive nuance to surface. When translation is mechanised, researchers risk losing exactly the features that make qualitative inquiry distinctive: ambiguity, metaphor, emotion, and culturally saturated meaning.
This matters even more in research embedded in community-engaged or justice-oriented traditions. Ethical qualitative practice often demands reciprocal relationships, sustained accountability, and an explicit refusal of extractive knowledge production (Guishard et al., 2018). If translation apps enable researchers to “collect” narratives without building relationships or local partnerships, a new kind of extractivism becomes possible: rapid cross-language data harvesting under the banner of accessibility and innovation.
Speed can also reconfigure reflexivity itself. When the pipeline accelerates, reflection is often compressed. Methodological decisions that should be deliberative—who translates, what is lost, how meaning is verified—become tacit. In that sense, “DIY” becomes not just a logistical choice but a stance toward qualitative ethics: a willingness to tolerate uncertainty about meaning while still proceeding to publish claims.
AI Translation Is Not Uniform
One crucial point must be stated plainly: AI translation quality is not uniform. “A translation app” is not a method; it is a wide and unstable category.
There are meaningful differences across platforms (Google Translate, DeepL, Microsoft Translator, LLM-based systems), modalities (text-to-text vs speech-to-speech), languages and dialects, and context-specific vocabulary (general language vs medical or mental health terminology). Empirical evaluations in health settings repeatedly show that machine translation can produce clinically and contextually significant errors, particularly when conversations involve nuance or high-stakes content (Herrmann-Werner et al., 2021; Patil & Davies, 2014).
Recent work comparing tools also suggests variability in performance across translation systems, with errors that can alter meaning in medically relevant ways (Sebo & de Lucia, 2024). Even when translation appears “good enough” at a surface level, the more subtle losses—tone, register, idiom, implication—often remain unmeasured and therefore unaccounted for in qualitative interpretation.
Importantly, some studies highlight not only accuracy concerns but also user experience, trust, and appropriateness. For instance, limited-English-proficient patients have reported mixed perceptions of relying on Google Translate in clinical contexts—raising questions about whether machine translation supports comprehension or merely simulates it (Khoong et al., 2019). Broader reviews of machine translation in health communication similarly emphasise uneven evidence bases and the need for caution in deployment (Herrera-Espejel & Rach, 2023; Zappatore & Ruggieri, 2024).
The variability in AI translation efficacy is compounded by the fact that its performance depends on factors that are not uniformly distributed across contexts. Audio quality, background noise, speaker accent, dialect mismatch, and the presence of domain-specific vocabulary all influence output. The context of the interview matters: in-person interactions with a tech-savvy participant using a high-quality microphone are not equivalent to remote interviews conducted via unstable connections or shared devices.
Equity Impacts: Who Is Most Harmed by Variability?
The unevenness of AI translation quality does not land neutrally. It maps onto the digital divide—and therefore risks amplifying inequities.
Participants who are older, less digitally confident, or living with disability may face higher cognitive and practical burdens when AI tools are introduced into the interaction. Digital literacy is increasingly recognised as a social determinant shaping access to services and participation in health-related practices (Arias López et al., 2023). People with disabilities experience persistent digital inequities that affect access to and the usability of technology (Pettersson et al., 2023). Older adults’ experiences with digital health systems illustrate how usability, confidence, and design assumptions can exclude or exhaust users (Hvalič-Touzery et al., 2024).
When translation apps become part of qualitative interviewing, the risk is not only mistranslation; it is differential participation—who can speak comfortably, who feels surveilled by a device, who becomes hesitant, who gives shorter answers, who withdraws. This is where methodological choice becomes an equity issue.
The Translation Chain
A key reason “DIY” translation is methodologically slippery is that AI can be inserted at multiple points in the research process—often without clear disclosure. I call this the translation chain: a sequence of translation-mediated steps where meaning can shift repeatedly.
Rather than treating translation as a single event (“we translated interviews”), it is necessary to specify where AI is used, how it is used, and the degree of verification. Cross-language research methodologists have long emphasised that translation is interpretive and consequential—shaping not just words but analytic possibilities (Squires, 2009; Temple & Young, 2004; van Nes et al., 2010). In an AI era, this insight becomes even more urgent because translation can now occur rapidly, repeatedly, and invisibly.
To make the chain explicit, disclosure should distinguish at least six stages:
• Recruitment and relationship-building: Was AI used to communicate eligibility, build rapport, or negotiate participation?
• Consent and comprehension: Were consent forms or consent discussions AI-mediated? How was understanding assessed?
• Interview interaction: Was AI used in real time to translate questions, responses, or follow-up probes?
• Transcription: Were “translated transcripts” generated directly from audio? Was there a human checking?
• Analysis and theme development: Were codes derived from translated text, bilingual text, or both? Who adjudicated meaning disputes?
• Quotation and dissemination: Were the published quotes AI-translated? Were original-language quotations retained anywhere for verification?
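The six stages above lend themselves to a simple structured record. What follows is a minimal sketch in Python and is not part of the MDS-AIMT itself: the names Stage, StageDisclosure, undisclosed_stages, and unverified_ai_stages are hypothetical, chosen only for illustration, and the example entries are invented. It shows how a research team might log, for each stage, whether AI was used and whether a human verified the output, so that gaps in disclosure become visible.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The six stages of the translation chain, in order."""
    RECRUITMENT = "recruitment and relationship-building"
    CONSENT = "consent and comprehension"
    INTERVIEW = "interview interaction"
    TRANSCRIPTION = "transcription"
    ANALYSIS = "analysis and theme development"
    QUOTATION = "quotation and dissemination"


@dataclass
class StageDisclosure:
    """One disclosure entry (hypothetical structure, for illustration only)."""
    stage: Stage
    ai_used: bool
    human_verified: bool = False
    note: str = ""


def undisclosed_stages(entries):
    """Return the stages for which no disclosure entry exists."""
    covered = {entry.stage for entry in entries}
    return [stage for stage in Stage if stage not in covered]


def unverified_ai_stages(entries):
    """Return the stages where AI was used without human verification."""
    return [e.stage for e in entries if e.ai_used and not e.human_verified]


# Invented example: three of the six stages have been documented so far.
chain = [
    StageDisclosure(Stage.CONSENT, ai_used=True, human_verified=True,
                    note="Consent discussion checked by a bilingual reviewer."),
    StageDisclosure(Stage.INTERVIEW, ai_used=True,
                    note="Real-time app translation; no interpreter present."),
    StageDisclosure(Stage.TRANSCRIPTION, ai_used=False),
]

print([s.value for s in undisclosed_stages(chain)])
print([s.value for s in unverified_ai_stages(chain)])
```

A record of this kind would support a narrative account of the translation chain rather than replace it: the two helper functions only flag where an explanation is still owed.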
Mapping the translation chain is not a bureaucratic detail. It is an epistemic audit: it shows where meaning may have drifted, where accountability sits, and how interpretive claims can be defended. It also raises a necessary methodological question that naturally follows into the next section: when the data are already filtered through AI translation, what happens to analytic approaches—such as reflexive thematic analysis (RTA)—that require deep immersion in participants’ language, tone, and cultural nuance?
Why RTA Is a Stress Test for AI-Mediated Translation
The previous section argued that AI can enter multiple points in the translation chain and that the location of AI use matters. This section extends that logic to the analytic stage: some analytic approaches are more vulnerable than others to AI-mediated distortions of meaning. I focus on RTA because it places exceptionally high demands on interpretive proximity, linguistic nuance, and the researcher’s immersion in participants’ meaning-making. In other words, if AI-mediated translation can “hold” under RTA’s assumptions, it likely can hold under many other approaches; if it fails under RTA, the failure is instructive for the broader field.
Why Focus on Reflexive Thematic Analysis
RTA is widely used in qualitative health research because it offers a flexible yet theoretically informed way to develop patterns of meaning (themes) from interview, observational, and textual data (Braun & Clarke, 2024b). Across healthcare and applied qualitative inquiry, RTA is frequently chosen precisely because it can address complexity—how people narrate illness, stigma, inequity, and care within particular sociocultural and institutional contexts (Trainor & Bundon, 2021). It is also common in health professions education and clinical research, where authors often adopt RTA to generate interpretive claims while retaining analytic transparency (Braun & Clarke, 2024a; Olmos-Vega et al., 2023).
This matters because RTA is not simply a “technique” for sorting text. It is built on a view of analysis as active meaning-making: the researcher is not a passive extractor of themes but a positioned interpreter who develops themes through deep engagement with data, reflexive memoing, and iterative re-reading (Braun & Clarke, 2023). In RTA, how language is used—metaphor, cadence, ambiguity, cultural reference, irony, silence—often constitutes the analytic material rather than being peripheral “noise.” This is why I treat RTA as a stress test.
If AI translation is used to mediate cross-language interviewing, the translation tool becomes part of the meaning-making apparatus. It no longer sits “outside” analysis as a neutral conduit; it becomes an upstream interpreter that shapes what is available to the researcher for coding and theming. This is why the current debates about AI and reflexive qualitative methods are directly relevant, even when the AI in question is “translation” rather than “analysis.” A recent and highly visible intervention by Jowsey and colleagues explicitly argues that reflexive qualitative approaches (including RTA) are human practices requiring a “subjective, positioned, and reflexive researcher,” and that the use of generative AI is therefore not methodologically congruent with them (Jowsey et al., 2025). While their target is GenAI used for reflexive analysis, the logic is instructive for AI-mediated translation: if reflexive qualitative analysis depends on situated human meaning-making, then upstream algorithmic mediation of meaning becomes methodologically consequential, not incidental.
Reflexive Thematic Analysis and the Problem of Distance
AI-mediated translation introduces distance precisely where RTA requires closeness. If the researcher is not fluent in the participant’s language and the transcript is generated by an app (or AI-assisted translation at the point of transcription), then the researcher is no longer reading the participant’s words. They are reading a model’s rendering of those words. That rendering may be competent at a literal, denotative level, but still flattens the affective and cultural texture that RTA often treats as analytically central.
Cross-language qualitative methods scholarship repeatedly shows that meaning can shift when translated, particularly when the source text contains idioms, metaphors, culturally embedded concepts, or context-dependent cues (Temple & Young, 2004; van Nes et al., 2010). Crucially, these shifts are often not random. They can standardise language, “smooth” ambiguities, or reframe culturally located expressions into dominant-language categories that appear coherent but lose local resonance. When RTA is applied to such translated data without human verification, the analysis is constrained by what the AI system recognises as translatable and by the linguistic norms embedded in its training data.
This is where reflexivity becomes difficult to operationalise. In a typical RTA workflow, the researcher reads, re-reads, and revisits data in relation to evolving themes, field notes, and analytic memos (Campbell et al., 2021). Yet, if the translation output is the only accessible layer of data, the researcher has limited capacity to ask a foundational RTA question: What else might this have meant? The act of reflexive interrogation becomes constrained because the researcher cannot reliably cross-check the transcript’s fidelity to the participant’s intended meaning—particularly when the participant’s meaning was carried in tone, cultural reference, or linguistic artistry rather than in literal propositional content.
Methodological Congruence: When “RTA + AI Translation” Becomes Incoherent
The question is not whether AI translation can ever be used in qualitative research. It is whether—and under what conditions—an author can plausibly claim to have conducted RTA when the primary data are AI-mediated translations that have not been robustly checked.
Methodological congruence in RTA depends on a basic alignment between (1) the claimed analytic approach, (2) the researcher’s relationship to the data, and (3) the interpretive warrant for the resulting themes. When AI translation is used without safeguards, that alignment becomes fragile.
RTA claims become methodologically weak (or incoherent) when most of the following conditions apply:
(1) AI translation is used as the primary (or sole) bridge across languages during the interview and/or transcription, and the researcher does not understand the original language sufficiently to evaluate how meaning may have shifted.
(2) There is no human verification layer (e.g., trained interpreter, bilingual co-researcher, cultural broker, or independent bilingual checking) to examine whether key passages, metaphors, emotional cues, and culturally embedded references were translated in ways that preserve analytic meaning.
(3) There is no documented translation audit trail: no record of which segments were AI-translated, which were human verified, where discrepancies occurred, how disagreements were resolved, or which interpretive uncertainties remain. Cross-language research quality depends not on the fantasy of perfect equivalence but on transparent documentation of interpretive decisions (Temple & Young, 2004).
(4) Themes are developed entirely from AI-translated text, without triangulation with audio, field notes, original-language excerpts, or member reflection in the participant’s language (Braun & Clarke, 2024b). In such situations, the researcher’s “immersion” is immersion in a translated artefact, not immersion in participants’ meaning-making.
(5) RTA is primarily used as a label rather than as a clearly defined analytical approach. There is a lack of clarity regarding whether the analysis is mainly “small q” descriptive or “Big Q” interpretive (Braun & Clarke, 2024b). Additionally, there is minimal discussion of reflexive practices and limited consideration of how translation affects theme development. This is precisely where RTA’s increasing popularity can become risky: RTA language gets used, while the analytic practices do not match its epistemic commitments.
This is why I consider RTA a stress test. RTA does not tolerate hidden mediation well—because it explicitly positions meaning-making as relational, situated, and reflexive. If authors wish to retain RTA in AI-mediated cross-language studies, they need to meet a higher disclosure and verification standard than is currently typical in many manuscripts. If they cannot, then the analytic approach should be reconsidered (e.g., shifting toward more explicitly descriptive approaches or adopting a hybrid translation model that restores interpretive accountability). In short, the more interpretive the analytic approach, the more ethically and epistemically consequential the translation choices become.
A Cautionary Tale—and Where I Draw the Line
A Composite Scenario of Analytic Flattening
Imagine a “plausible composite” qualitative study exploring postpartum care experiences in an LMIC. The researcher conducts remote interviews across languages using an AI translation tool and subsequently applies an interpretive analytic approach—perhaps even RTA—to the translated transcripts. The manuscript reads smoothly. The themes are familiar: “Financial Hardship,” “Limited Hospital Resources,” and “Desire for Family Support.” Yet, the dataset begins to feel oddly uniform: participants’ accounts converge into similar emotional labels (“sad,” “stressed,” “worried”), similar moral framings (“family is important”), and similar system critiques (“resources are lacking”). The analysis appears coherent, but coherence can be a methodological mirage when translation has standardised what was originally diverse (Silu, 2024).
In this composite scenario, the problem is not that the researcher is careless. The problem is that the analytic substrate has already been shaped—quietly—before analysis begins. AI translation may compress culturally dense postpartum narratives into generic categories that are legible in the dominant publication language, but misaligned with the moral worlds in which postpartum care is lived: shame, obligation, spiritual interpretations of pain, culturally shaped norms of disclosure, and shifting power relations within households. When these worlds are translated into a narrow emotional vocabulary, the analysis risks producing themes that are not “wrong” in a trivial sense, but thinner than the phenomenon itself. The researcher—without language fluency or robust human verification—may not even know what was lost or what was substituted in its place (Xian, 2008).
What Is Lost—and What Is Silently Substituted
AI-mediated translation can remove or transform features that matter deeply in qualitative health research: idiom; temporality (“before/after” markers that structure illness narratives); hesitation and silence (often meaningful in stigma-laden topics); voice register; code-switching; culturally coded metaphors; and “untranslatable” terms that require explanation rather than replacement (Temple & Young, 2004). What is silently substituted is often a more standardised, publication-ready version of speech—one that appears clear while concealing uncertainty, cultural density, and interpretive plurality.
A Proportionality Principle for AI Translation Use
This leads to my guiding principle: the more interpretive the analytic approach, the more sensitive the topic, and the more “vulnerability-weighted” the population, the higher the standard for human mediation and verification must be. AI translation may be defensible for low-stakes logistical coordination, preliminary scoping, or limited clarifications, but it becomes methodologically dangerous when asked to bear the burden of meaning for interpretive claims.
Red Lines and High-Risk Contexts
There are contexts where AI translation alone should be treated as insufficient: trauma narratives; asylum/refugee contexts; stigmatised conditions (e.g., sexual health, mental health, and intimate partner violence); safeguarding situations; and settings where digital capacity is limited or uneven. Likewise, analytic approaches that rely on deep interpretive immersion—RTA, interpretive phenomenology, and narrative inquiry—should not proceed on AI-translated text without human verification and an explicit audit trail.
Vulnerability as an Ethical Amplifier
Finally, “vulnerability” should not be a label attached to participants; it must be operationalised as power asymmetry, heightened risk of misrepresentation, and heightened ethical demands for meaningful comprehension and the integrity of consent (Perry, 2011). In such contexts, treating AI translation as a minor methodological detail is not just a limitation; it is a credibility risk that the field can no longer afford to normalise.
Minimum Disclosure Standard for AI-Mediated Translation in Qualitative Health Research
If AI-mediated translation is used in qualitative health research, it should not be relegated to a single sentence—“we used a translation app”—that renders epistemic mediation invisible. Translation is not a neutral conduit; it is an interpretive act that shapes what becomes analysable “data.” In cross-language studies, trustworthiness depends not on a fantasy of perfect equivalence, but on transparent documentation of how meanings were moved, negotiated, and—where necessary—left unresolved (van Nes et al., 2010).
For that reason, I propose the MDS-AIMT (Supplemental File 1). Importantly, and in keeping with Qualitative Health Research’s position that checklist-driven evaluation can encourage mechanical “box-ticking” at the expense of substantive methodological reasoning, I present MDS-AIMT as a narrative disclosure guideline rather than a compliance checklist (Morse, 2021). The intent is not bureaucratic inflation. It is an epistemic safeguard: to make the “translation chain” visible so that readers, reviewers, editors, and ethics committees can assess credibility, evaluate analytic congruence, and appraise whether epistemic injustice is being inadvertently reproduced.
Hybrid Models in Action: Repairing “Feels Sad About Medicine”
A disclosure guideline is only valuable if it can be enacted in practice. To demonstrate the practical application of the MDS-AIMT narrative guideline, I return to the prologue’s moment: “She feels sad about medicine.” The sentence is plausible, but it bears the unmistakable signature of flattening—an affective label standing in for a culturally dense account. The methodological response is not to discard AI translation outright, but to treat the AI output as an interpretive prompt rather than as “data” that can be coded at face value. What follows is a worked hybrid model that makes translation mediation visible, accountable, and analytically congruent, while remaining feasible for many real-world qualitative health research contexts.
A Worked Example of a Hybrid Approach, Mapped to MDS-AIMT in Practice
Domain 1: Study Framing and Cross-Language Scope
At the design stage, the cross-language approach is explicitly stated: the participant and researcher do not share a primary language, and translation will mediate recruitment, consent, interview interactions, and quotations (MDS-AIMT Domain 1). Rather than presenting AI translation as a neutral convenience, its use is justified as a bounded response to resource constraints and limited access. Critically, the approach is designed to be hybrid: AI may support real-time comprehension, but it will not be treated as the authoritative account of meaning, particularly when participants are positioned within asymmetric power relations or described as “vulnerable.” This framing establishes, from the outset, that the goal is not frictionless communication but interpretive fidelity through accountable mediation.
Domain 2: AI Tool Disclosure and Interaction Conditions
In practice, the researcher documents the conditions under which the AI translation occurred (MDS-AIMT Domain 2): the modality (e.g., speech-to-text), the interaction format (remote vs. in-person), the operator, and the audio environment. These are not technical trivia; they shape what the system can “hear” and therefore what it can translate. When the app produces “She feels sad about medicine,” the researcher records an immediate reflexive memo: “Output appears semantically plausible but culturally thin; idiom/proverb may be collapsed; do not code as affect without verification.” This single memo operationalises the core move: the AI output is designated as provisional rather than authoritative. In qualitative terms, this is a micro-audit trail that marks a point of epistemic risk in the translation chain.
Domain 3: Translation Chain Procedures
Next, the researcher initiates repair during the interview itself (MDS-AIMT Domain 3). Rather than asking for confirmation (“Do you mean you feel sad?”), the researcher probes for meaning in the participant’s terms: “Can you tell me what that proverb means to you in this situation?” This invites elaboration of the moral world embedded in the proverb—often where the analytic substance resides—rather than forcing the participant into the AI’s affective category. If the participant responds with another idiom or a culturally specific reference, the researcher slows the pacing and uses strategic repetition, thereby allowing the translation tool to capture complete units of meaning rather than fragmented speech. Where possible, the researcher preserves original-language audio for later verification and creates a brief “untranslatables” note (e.g., terms without clean equivalents, culturally loaded phrases, and locally specific references). The aim is not to eliminate ambiguity, but to make ambiguity and translation loss visible rather than to smooth them away.
Domain 4: Verification, Analysis, and Reflexivity
Finally, the hybrid model reasserts interpretive accountability through targeted verification (MDS-AIMT Domain 4). Instead of translating the entire interview through a human interpreter (often infeasible), the researcher escalates high-stakes segments (metaphors, proverbs, emotionally salient passages, contested meanings) to a bilingual reviewer, interpreter, or cultural broker for post-hoc verification. This step is deliberately selective and analytic: it concentrates verification where the interpretive stakes are highest. Where feasible, the researcher conducts member reflection in the participant’s language at the level of meaning rather than grammar: “Does this reflect what you intended?” The outcome is documented in translation memos: what was accepted, what was corrected, what remained uncertain, and how those uncertainties shaped the analytic claims. This is precisely the difference between “AI translation happened” and “AI translation was methodologically managed.”
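The translation memos described here can be pictured as a small log. The following is a minimal sketch in Python under stated assumptions: TranslationMemo, summarise, and provisional_segments are hypothetical names, the three outcome labels simply mirror the wording above (accepted, corrected, uncertain), and the example entries are invented rather than drawn from any study. It illustrates one way of keeping unresolved segments marked as provisional so that they are not coded at face value.

```python
from collections import Counter
from dataclasses import dataclass

# Outcome labels mirror the text: accepted, corrected, or left uncertain.
OUTCOMES = ("accepted", "corrected", "uncertain")


@dataclass
class TranslationMemo:
    """One verification record for a high-stakes segment (hypothetical)."""
    segment_id: str
    ai_output: str
    outcome: str
    verified_by: str
    note: str = ""

    def __post_init__(self):
        # Reject labels outside the three documented outcomes.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def summarise(memos):
    """Count memos per outcome, including outcomes with zero entries."""
    counts = Counter(m.outcome for m in memos)
    return {outcome: counts.get(outcome, 0) for outcome in OUTCOMES}


def provisional_segments(memos):
    """Return ids of segments whose meaning remains unresolved."""
    return [m.segment_id for m in memos if m.outcome == "uncertain"]


# Invented entries for illustration only.
memos = [
    TranslationMemo("seg-01", "She feels sad about medicine", "corrected",
                    "bilingual reviewer",
                    note="Proverb collapsed into an affect label."),
    TranslationMemo("seg-02", "The clinic is far from the village", "accepted",
                    "bilingual reviewer"),
    TranslationMemo("seg-03", "It is the will of the house", "uncertain",
                    "cultural broker",
                    note="Possible idiom; raise in member reflection."),
]

print(summarise(memos))
print(provisional_segments(memos))
```

The design choice worth noting is that "uncertain" is a first-class outcome rather than a failure state: the log preserves ambiguity for the analyst instead of forcing a resolution.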
What “Good Enough” Looks Like—and What It Doesn’t
A pragmatic thresholding guide follows from this hybrid enactment:
• Low-risk use (often acceptable): logistics, scheduling, brief clarifications, and non-sensitive exchanges where translation is not the basis for interpretive claims.
• Moderate-risk use (conditional): descriptive studies with bounded claims, where AI is paired with targeted human verification of key excerpts and a documented audit trail.
• High-risk use (requires robust safeguards): studies involving vulnerability, trauma, stigma, safeguarding concerns, or interpretive analytic approaches (e.g., RTA, narrative inquiry, and interpretive phenomenology) without bilingual verification and transparent documentation. In these contexts, AI-only translation is not only a limitation; it is a credibility threat that can undermine analytic congruence and make interpretive claims untenable.
In short, the hybrid model demonstrates how MDS-AIMT functions as a guideline for narrative accountability: it does not demand perfection, but it does require that translation be treated as a visible methodological site where epistemic justice, ethical integrity, and interpretive rigour are actively protected.
Implications for the Field
For Authors
• Treat AI translation as a methodological decision, not an operational footnote.
• Adopt the MDS-AIMT so readers can evaluate credibility.
• Avoid methodological overclaiming: if the translation chain is AI-dominant and unverified, interpretive conclusions should be correspondingly restrained.
• Build translation into study design and budgeting as an epistemic requirement, not a luxury.
For Reviewers and Editors
• Ask: Where was AI used in the translation chain? What was verified, by whom, and how?
• Require an analytic congruence statement when interpretive methods are claimed.
• Treat absence of disclosure as a quality issue, not a stylistic omission, especially in studies with marginalised or linguistically minoritised groups.
For IRBs and Ethics Committees
The literature suggests that many IRBs provide limited guidance on research involving non-dominant-language speakers. AI translation heightens the need for explicit ethical review questions, such as:
• What tools are used and what data leave the device?
• How is comprehension ensured beyond translated text?
• What protections are in place for sensitive disclosures processed through third-party systems?
• How will meaning be verified for analytic purposes?
For Training and Pedagogy in Qualitative Health Research
Methodological curricula now require translation literacy: training researchers to treat translation as interpretive infrastructure, to document uncertainty, and to recognise risks of epistemic injustice in cross-language work. This is not an “AI module”; it is a rearticulation of what qualitative rigour requires in multilingual worlds.
Concluding Reflections
Qualitative health research, at its best, is not simply about collecting words; it is about understanding worlds—moral worlds, social worlds, clinical worlds, worlds shaped by inequity. AI translation can increase access to those worlds, but it can also quietly reshape them by transforming what is sayable, what is recordable, and what becomes analysable. The core challenge is not whether AI translation is “allowed.” The challenge is whether we will accept AI-mediated accounts as if they were unmediated, and whether we will allow the convenience of frictionless translation to normalise epistemic thinning—especially for those already positioned at the margins of knowledge production.
In this paper, I have argued that we must shift from an era of casual disclosure (“we used an app”) to an era of epistemic accountability. Translation, whether human or machine, is an interpretive act; treating it as invisible invites epistemic injustice. RTA and other interpretive approaches make this particularly clear: the more an analytic method depends on nuance, cadence, metaphor, and culturally situated sense-making, the less tenable it becomes to outsource meaning to opaque systems without verification.
My forward stance is principled but not anti-technology: technology should be an assistant, not an author. The MDS-AIMT and hybrid repair strategies offered here are intended to make AI use visible, contestable, and improvable—so that innovation does not come at the cost of credibility. If qualitative health research is to remain a justice-oriented practice, then linguistic difference cannot be treated as a technical nuisance to be solved by automation. It must be treated as an ethical and epistemic site where our commitments to representation, rigour, and responsibility are tested and made explicit.
Supplemental Material
Supplemental Material for Making Artificial Intelligence (AI)–Mediated Translation Visible: A Minimum Disclosure Standard for Qualitative Health Research by Animesh Ghimire in Qualitative Health Research
Footnotes
Ethical Considerations
This study did not require ethical board approval because it did not contain human or animal trials.
Author Contributions
Conceptualisation: AG; Formal Analysis: AG; Investigation: AG; Writing—Original Draft Preparation and Writing—Review and Editing: AG. The author approved the publication of the final version.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing is not applicable to this article as no new data were created or analysed in this study.
Supplemental Material
Supplemental material for this article is available online.
References
