Abstract
Debates about generative artificial intelligence (AI) have produced claims that such tools should be excluded from qualitative, and especially reflexive, research. This note challenges that conclusion. We argue that generative AI is not a single method but a heterogeneous set of transformer-based models, many open and configurable for scholarly use. Situating large language models within a longer lineage of computational text analysis, we suggest decoder-only models offer affordances aligned with qualitative epistemologies, including attention to context and holism. We outline how these models can support reflexive analysis without replacing judgment. We propose technological reflexivity to guide responsible, ethical use.
Introduction
Reflexive qualitative researchers in sociology and beyond have been incorporating computational methods, drawn from information sciences, natural language processing, machine learning, and computer science, for more than a decade (e.g., Abramson et al., 2025; Baumer et al., 2017; Brandt & Timmermans, 2021; DiMaggio et al., 2013; Ibrahim & Voyer, 2025; Mohr, 1998; Mohr et al., 2015; Nelson, 2020; Pardo-Guerra & Pahwa, 2022; Voyer, Kline, & Danton, 2022; Voyer, Kline, Danton, & Volkova, 2022). Until a few years ago, this happened largely uncontested but also, with notable exceptions, largely in parallel to developments in non-computational reflexive qualitative methods (e.g., Braun & Clarke, 2012; Deterding & Waters, 2021; but see also Charmaz, 2025).
The release of ChatGPT in 2022 changed this by making computational methods more widely visible and accessible. The broad popularity of ChatGPT has brought computational qualitative scholars and non-computational qualitative scholars into a new, uneasy conversation. On one hand, some computational researchers asking qualitative questions approach their research from more of a data science background, handling questions of research quality and validity from what we might think of as a more traditionally quantitative perspective, and sometimes even arguing that only those with expertise in data science should use large language models (LLMs) for qualitative research (e.g., Azaria et al., 2024; Bano et al., 2024). On the other hand, many traditional qualitative researchers are skeptical of the use of computational tools. This has come out most visibly in two recent position statements related to the use of generative artificial intelligence (AI; the definition of which we return to below) in qualitative research. The first of these was published in Qualitative Inquiry and was signed by over 400 international qualitative researchers (Jowsey et al., 2025). The second position statement has been posted on the Social Science Research Network (Teixeira et al., 2025). The conclusion of both of these position statements is that generative AI should not be used in any capacity in qualitative research, and in reflexive qualitative research in particular.
The authors of this note, who have long been working with computational methods for qualitative research, read these other statements with interest but fundamentally disagree with the conclusions. We likewise disagree with claims that qualitative researchers must become data scientists to work with computational tools. This inspired us to write our own statement about the future of reflexive qualitative research, which the authors of this note are deeply invested in. We do not seek to refute prior arguments against the use of generative AI in qualitative research point by point, and we encourage readers to first read the two position statements listed above. Instead, we seek to provide an alternative perspective on what generative AI is, how it can reflexively be used in qualitative research, and why it is important that qualitative research experts be central to the continued development of computational methods, as qualitative scholars have been for the past 15 years.
This note proceeds in four parts. First, we explain why it is difficult to define generative AI (exemplified by LLMs such as GPT and Claude, and including open-source alternatives such as Olmo and Pleias), we situate generative AI in the timeline of computational text analysis methods, and we articulate the important differences within and between LLMs. We end this section explaining why generative AI (specifically the decoder-only transformer architecture, which we define below) is potentially a powerful tool for qualitative researchers. Second, even as we complicate whether generative AI has a clear definition, we discuss how qualitative scholars have been carefully testing the use of generative AI models for qualitative research. Third, we discuss best practices around using generative AI in reflexive research in particular, arising from this ongoing research. Fourth, we end with concluding remarks on the future of reflexive qualitative research as a methodological field.
What Is Generative AI?
Part of the challenge with productively discussing the potential use of generative AI in qualitative research is defining what generative AI actually is. Generative AI is not a precise technical term, but it is a phrase that has come to refer, loosely, to a collection of models developed via deep learning techniques and, in particular, the transformer architecture, used to generate media such as images, text, and audio. Decoder-only transformer models, the most common technology underlying what are commonly called “generative AI” models, are computer systems trained to produce text by predicting one word at a time based on what has come before (technically one token, or subword, at a time—hereafter we use the more technical term token). The base models are not searching a database of stored sentences, nor are they retrieving answers from the internet. Instead, they have been exposed to enormous collections of prior texts and have learned (in the technical sense) statistical patterns about how words tend to follow one another in different contexts. When you give such a model a prompt, it treats that prompt as the beginning of a sequence and then repeatedly calculates what token is most likely to come next, given everything it has been provided so far. It generates longer, coherent responses by making this prediction thousands of times in a row. It does not matter whether the model “understands” language in a human sense; what matters for qualitative research is that these models produce context-sensitive continuations based on learned regularities in past language use. Their mathematical basis, we suggest in this note, is potentially powerful for reflexive qualitative researchers (we reflect on these affordances in more detail in the section “Generative AI for Reflexive Qualitative Research” below).
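The token-by-token process described above can be illustrated with a deliberately tiny stand-in. The sketch below is not a transformer: it is a bigram counter that conditions only on the single previous token, whereas a decoder-only model conditions on everything provided so far and works with subword tokens. It also always picks the most frequent continuation, while deployed models often sample. The corpus and the function names (`train_bigram_counts`, `generate`) are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(corpus):
    """Count how often each token follows each other token in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for previous, following in zip(tokens, tokens[1:]):
            counts[previous][following] += 1
    return counts


def generate(counts, prompt, max_new_tokens=5):
    """Extend the prompt one token at a time, always choosing the most
    frequent observed continuation of the last token (greedy decoding)."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last token was never followed by anything
        # Ties are broken alphabetically so the output is deterministic.
        next_token = min(followers, key=lambda t: (-followers[t], t))
        tokens.append(next_token)
    return " ".join(tokens)


# Hypothetical three-sentence "training corpus".
corpus = [
    "the interview was recorded",
    "the interview was transcribed",
    "the interview was transcribed verbatim",
]
counts = train_bigram_counts(corpus)
print(generate(counts, "the interview"))
```

Even at this scale, the continuation is computed from learned regularities in the corpus rather than retrieved as a stored sentence, which is the point the paragraph above makes about base models.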
Most users interact with these models through browser-based, chatbot-like natural language interfaces. Developers and technical researchers, however, including qualitative researchers, interact with these models differently, treating them less like end-user chatbots and more like research tools that we can configure and adapt to our preferences. Instead of working through a web browser, we often use an application programming interface (API), which gives us control over available parameters and allows us to experiment with and test model output. Many of us also run these models locally rather than sending data into the cloud. Running models locally gives us full control over where data are stored and how they are used and provides a way to systematically prompt and interpret output from models.
In addition, while the population at large tends to interact exclusively with commercial models, like ChatGPT, Perplexity, or Claude, many of us using these models for research turn to publicly available open-source models such as Llama, Gemma, Mistral, and Olmo, among others. Commercial models, as has been rightly pointed out by many (e.g., Spirling, 2023), including in the two position statements that motivated this response, are black boxes. We do not know the exact data they were trained on, we do not know the exact training process, including any post-training steps (e.g., instruction tuning), and we do not know what guardrails have been put in place that might impact output. The owners of these models are primarily interested in making profit, not serving academic interests. Such proprietary, black-box models are thus not ideal for academic research, especially qualitative research where we care about meaning and interpretation. Furthermore, because qualitative researchers often work with sensitive and personal data, data privacy is a primary concern and proprietary models often have questionable data practices.
There is, however, a growing community of researchers creating truly open language models that qualitative researchers can, and do, use instead. For fully open models, such as the Olmo (Groeneveld et al., 2024) and Pleias (Langlais et al., 2025) families of models, we do know what happened at every step of the process, from data to training to post-training. We can additionally download these models onto our own machines, meaning no data changes hands when we use them.
In short, generative AI is not any one thing. LLMs trained using transformer architectures are only one example of generative AI, and commercial models such as ChatGPT and Claude are not the only LLMs that we can use and are typically not the primary models used by academics, including qualitative researchers. If scholars want to claim that qualitative researchers should not use ChatGPT, we might tend to agree (though there are many cases where a scholar might want to use ChatGPT, e.g., if they want to study how ChatGPT may be impacting users). Indeed, quantitative researchers have made similar arguments for avoiding commercial models in quantitative research (Spirling, 2023).
But some critics have gone further, arguing that generative AI should be categorically excluded from qualitative research. This is a much broader claim, and one that we do not think is defensible. We think this specifically because of the way transformer-based architectures model language, which includes taking account of contextual cues and incorporating relationships between words across large amounts of text more holistically. This way of modeling language is a resource for qualitative researchers, including, as we suggest below, supporting them in systematically probing for interpretive understandings.
Does Generative AI “Reason”?
If we accept the commonly used definition of generative AI as a collection of models based on deep learning and the transformer architecture that can generate media, what do these models do? From the perspective of the authors of this note, generative AI models so defined are simply the next stage in the development of methods that computational qualitative scholars have long been incorporating into qualitative research (e.g., static word embeddings, unsupervised machine learning). The underlying technology of these models is not substantially different from earlier approaches, though it is improved in important ways, but the simple natural language interface of tools like ChatGPT, which allows users to enter a prompt and receive output that can read as impressively (if uncannily) human-like, has fundamentally changed how we interact with computational tools. We think it is helpful here to step back and situate generative AI in the past 15 years of advancement in text analysis tools, as they have intersected with qualitative research.
Topic models were among the first class of text analysis models that excited qualitative researchers. The special issue of Poetics published in 2013 captured this excitement (Mohr & Bogdanov, 2013) and conveys different ways qualitative researchers were thinking about how to incorporate topic models into qualitative analysis, even with a more critical lens (Mohr et al., 2013). Broadly, topic models are used to cluster words and documents in ways that can reveal patterns across a corpus, which are then used to do further analysis to understand specific phenomena and to generate theory. The excitement was never, nor is it now, that topic models could do the work of qualitative researchers, but that they are a potentially powerful method to help researchers do the work qualitative scholars want to do.
After the topic model wave came the static word embedding wave. Where topic models model text at the document level, word embeddings model at the word level. They are used to transform words into vectors, created based on how words co-occur with other words in a corpus. This allows qualitative researchers to understand relationships between words, especially along socially meaningful dimensions such as meanings attached to morality (Arseniev-Koehler & Foster, 2020), gender (Boutyline et al., 2023), immigration (Stoltz & Taylor, 2021) and immigrants (Voyer, Kline, Danton, & Volkova, 2022), intersectionality (Nelson, 2021), and class (Kozlowski et al., 2019; Voyer, Kline, & Danton, 2022).
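The idea of turning words into vectors based on co-occurrence can be sketched in a few lines. This is a count-based simplification, not the learned dense embeddings used in the studies cited above: each word is represented by raw counts of the words appearing within a small window around it, and two words are compared by cosine similarity. The three-sentence corpus, the window size, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words found within `window`
    positions of it, pooled over every sentence in the corpus."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical corpus: two care-work sentences and one finance sentence.
corpus = [
    "nurses care for patients",
    "doctors care for patients",
    "bankers invest in stocks",
]
vectors = cooccurrence_vectors(corpus)
print("nurses vs doctors:", round(cosine(vectors["nurses"], vectors["doctors"]), 2))
print("nurses vs bankers:", round(cosine(vectors["nurses"], vectors["bankers"]), 2))
```

Words that appear in similar surroundings end up with similar vectors. Note that each word receives exactly one vector for the whole corpus, regardless of how it is used in any particular sentence.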
Static word embeddings have known limitations. They do not handle polysemy (multiple meanings) well, struggle with out-of-vocabulary words and rare words, and fail to adapt to new language, assigning one fixed vector to a word regardless of its usage. The transformer architecture changed the way embeddings were created, dynamically adjusting embeddings based on surrounding contexts. For example, the word “state” can now have a different embedding in “nation state” versus “static state,” representing the context-specific polysemy of the word “state.”
We want to pause here to emphasize how powerful this move from static to dynamic embeddings is for qualitative scholars. Qualitative scholars are often interested in how words are used in the context we are studying, in what they mean for those using them, and in how and why meaning shifts across cultures, contexts, and over time. The transformer architecture builds that into the way embeddings are created. This is, potentially, a very powerful tool if our goal is to understand the meaning conveyed in the data we are trusted to analyze.
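One mechanism through which transformers produce context-dependent embeddings is attention (Vaswani et al., 2017). The sketch below shows scaled dot-product attention for a single word, using the "state" example from above. It rests on strong simplifying assumptions: the two-dimensional static vectors are invented, the same vectors serve as queries, keys, and values (real models apply learned projections, multiple attention heads, and many layers), and only the final word of each phrase is contextualized, which matches the decoder-only setting where a token attends to itself and to what precedes it.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector: score the query
    against every key, normalize the scores, and average the values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    output = [
        sum(weight * value[i] for weight, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hypothetical static vectors, invented for illustration only.
STATIC = {"nation": [1.0, 0.0], "static": [0.0, 1.0], "state": [0.5, 0.5]}


def contextual_vector(phrase):
    """Contextual vector for the last word of a phrase, built by attending
    over the static vectors of every word in the phrase."""
    vectors = [STATIC[word] for word in phrase.split()]
    _, output = attend(vectors[-1], vectors, vectors)
    return output


nation_state = contextual_vector("nation state")
static_state = contextual_vector("static state")
print("state in 'nation state':", nation_state)
print("state in 'static state':", static_state)
```

The word "state" starts from the same static vector in both phrases but ends with a different vector in each, pulled toward the word it appears with.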
More practically, the transformer architecture was used to create, for example, the family of BERT models (based on encoder-only transformers). These models are very good at classifying (shorter) texts into categories and are much better at tasks such as classifying sentiment and stance (e.g., is a paragraph conveying a pro- or anti-immigrant stance, or is it neutral?) compared to previous methods (Bonikowski et al., 2022; Chausson et al., 2025). These tasks can be selectively useful for qualitative researchers, who do sometimes have specific things they want to know about a collection of texts.
Generative AI, such as the well-known GPT family of models, is instead built using decoder-only transformers. As described above, these models generate text token by token and, combined with additional techniques such as instruction tuning (a fine-tuning process in which models are trained on examples of prompts and appropriate responses to better follow user instructions) and reinforcement learning (a training process in which models are rewarded for preferred behaviors and penalized for undesirable ones), have led to the chatbot models we know today, among many other advances. This token-by-token text generation is remarkably good at things like text continuation and instruction following (more on how qualitative researchers can use this class of models below).
We went into the technical weeds for a specific reason. Common criticisms of this class of models are that they produce “incorrect” output and that they do not “understand” text the way that humans do and thus should not be used in qualitative research. We suggest here that models built using the transformer architecture are powerful for qualitative researchers not because they excel at outputting facts (though they have proven to do this at increasingly high rates). And, philosophical debates about sentience aside (as intellectually interesting as they are), these models are not powerful because they have human-like reasoning capabilities. They are powerful because they embed words based on their context-specific use, and they can do so at a very large scale. It is really that simple. This is why we believe generative AI should not be dismissed outright by qualitative researchers.
Generative AI for Qualitative Research
Why should we use generative AI models for qualitative research? Because these models skip the computationally heavy encoder stage, decoder-only models can scale over long sequences, and the attention mechanism (Vaswani et al., 2017) allows assigning differential weight to tokens based on how words relate to one another across a text. This has two main benefits for qualitative research. First, it allows qualitative researchers to analyze long sequences of text where the understanding of themes requires a more holistic reading of entire documents (Than et al., 2025). Second, qualitative researchers can better specify what they are trying to understand about long texts and tailor models to more complex understanding tasks by, for example, providing multiple examples or long, contextualized prompts (Ibrahim & Voyer, 2025).
These two features (long training sequences and complex prompting) are wonderfully aligned with qualitative commitments: we cannot always understand the meaning of a text by working sentence by sentence; we often have to think about the document or corpus as a whole; and the meaning of the text that we want to use for analysis cannot typically be specified in simple information extraction frameworks. Decoder-only models are thus also aligned with qualitative epistemologies, and these models are already being carefully and responsibly tested for use cases important for qualitative scholars.
Practically, qualitative scholars have used generative AI to identify complex themes that can only be extracted through a holistic reading of entire documents (Than et al., 2025), to generate themes from a corpus of text (Ibrahim & Voyer, 2025), and to check how comprehensive qualitative researchers have been in identifying all relevant themes in a corpus (Meng et al., 2025).
Decoder-only models also support the processual aspects of interpretive work. Because qualitative analysis often proceeds iteratively as researchers clarify key concepts, specify boundaries, and refine what counts as key empirical evidence, decoder-only models can be valuable as tools for articulating, revising, and documenting analytic decisions in language while leaving the adjudication of meaning and warrants with the researcher (Ibrahim & Voyer, 2025).
From the standpoint of qualitative rigor, decoder-only models are especially useful. For example, they can be prompted to generate rival readings of the same excerpt under alternative theoretical lenses, to search in the data for negative or deviant cases that would challenge an emerging theme, and to identify ambiguities or missing evidence that would be required to support a stronger claim. As qualitative research tools, they can also support disciplined memoing and transparency by helping to link interpretive claims to the empirical material and serving as a record of how prompts, codes, and interpretations developed over time. Used in this way, generative models do not replace qualitative judgment, but they do help researchers systematically test the coherence, boundaries, and completeness of their interpretations.
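Prompting for rival readings, as described above, can be set up systematically rather than improvised in a chat window. The sketch below builds one prompt per theoretical lens for the same excerpt, so that the wording stays constant and only the lens varies. The prompt wording, the example excerpt, the lens names, and the function name `rival_reading_prompts` are all hypothetical. The code only constructs the prompts; sending them to a model and interpreting the responses is left to the researcher.

```python
def rival_reading_prompts(excerpt, lenses):
    """Build one prompt per theoretical lens for the same excerpt, keeping
    the instructions identical so that only the lens changes."""
    prompts = {}
    for lens in lenses:
        prompts[lens] = (
            f"Read the excerpt below through the lens of {lens}.\n"
            "Offer one interpretation, quote the words that support it, "
            "and name any evidence that is missing or ambiguous.\n\n"
            f"Excerpt:\n{excerpt}"
        )
    return prompts


# Hypothetical excerpt and lenses, invented for illustration.
excerpt = "I stayed late again because nobody else would cover the ward."
lenses = ["emotional labor", "organizational constraint"]

for lens, prompt in rival_reading_prompts(excerpt, lenses).items():
    print(f"--- {lens} ---")
    print(prompt)
    print()
```

Because each prompt is generated from the same template, differences in model output can more plausibly be attributed to the lens than to incidental changes in phrasing, and the template itself can be archived alongside the analysis.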
Generative AI for Reflexive Qualitative Research
We readily acknowledge that there are bad and inappropriate uses of generative AI for qualitative research, and the two letters we are responding to cover important misuses. We do not believe it is good practice, for example, to just “ask ChatGPT” or another proprietary model and accept its output as something deserving of our attention. We should know the technique we are using and use it appropriately, incorporate accuracy checks into our process, and always maintain interpretive control. And there is by now enough careful methodological work by reflexive qualitative researchers (some of which was covered in the previous section) that, rather than foreclosing the use of generative AI altogether, we can now start to articulate best practices around the use of generative AI for reflexive qualitative research.
Ibrahim and Voyer (2025) frame the use of generative AI in interpretive qualitative research through the lens of technological reflexivity. Technological reflexivity is an extension of qualitative research’s primary approach to establishing research quality. While quantitative approaches to validity and research quality typically emphasize standardization, measurement reliability, statistical identification, and replicable procedures designed to minimize the influence of the individual researcher on results, qualitative researchers foreground reflexivity as a central criterion of rigor, making explicit how the researcher’s positionality, theoretical commitments, and interpretive decisions shape what counts as evidence and how claims are constructed. Extending reflexivity to the use of computational tools like generative AI means employing an explicit and ongoing attentiveness to how generative AI shapes analytic possibilities and constraints and how the researcher’s own choices in configuring and engaging the tool become part of the method.
Working reflexively with generative AI in qualitative research is something that traditional qualitative researchers are already prepared to do. A first requirement is to treat model bias as an ordinary and consequential feature of research with computational tools rather than an occasional error that can be estimated statistically. Because generative AI models are impacted by alignment and safety constraints built into their architecture and their training data, they can systematically privilege particular registers and normative framings while muting others and may perform unevenly across tasks. Qualitative researchers should therefore assume bias and develop a working understanding of what a chosen model tends to amplify or suppress, and should assess these tendencies just as they consider their own perspective and work to mitigate their own biases and limitations.
Technological reflexivity further calls for oversight of generative AI in the same way one would seek to manage multiple qualitative researchers working on the same data and analyses. A holdout strategy operationalizes this by having the researcher independently analyze a subset of the qualitative data and then compare that interpretation to model-assisted analysis. The same materials can also be used to compare alternative models or configurations and to justify model choice.
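A holdout comparison of this kind can be organized with very little code. The sketch below assumes the simplest possible case, one code per excerpt, and uses hypothetical excerpt identifiers and code labels. It reports the share of excerpts on which the researcher and the model agree, but its main product is the list of disagreements, which is where interpretive review would be directed. The agreement share is a navigational aid, not a validity statistic.

```python
def compare_holdout(excerpt_ids, researcher_codes, model_codes):
    """Compare independently assigned researcher codes with model-assisted
    codes on a holdout subset. Returns the share of excerpts coded
    identically and the list of disagreements for closer reading."""
    if not (len(excerpt_ids) == len(researcher_codes) == len(model_codes)):
        raise ValueError("all three lists must have the same length")
    disagreements = [
        {"excerpt": excerpt_id, "researcher": own, "model": suggested}
        for excerpt_id, own, suggested
        in zip(excerpt_ids, researcher_codes, model_codes)
        if own != suggested
    ]
    total = len(excerpt_ids)
    agreement = (total - len(disagreements)) / total if total else 0.0
    return agreement, disagreements


# Hypothetical holdout subset of four excerpts.
excerpt_ids = ["e1", "e2", "e3", "e4"]
researcher_codes = ["care", "care", "work", "work"]
model_codes = ["care", "work", "work", "work"]

agreement, disagreements = compare_holdout(
    excerpt_ids, researcher_codes, model_codes
)
print(f"agreement share: {agreement:.2f}")
for item in disagreements:
    print("review:", item)
```

Running the same function with codes from a second model or configuration gives a like-for-like basis for comparing alternatives on the same materials.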
Additionally, technological reflexivity highlights the importance of documenting researcher–model interaction, similar to the way qualitative researchers account for their interaction with research participants. For example, the instructions (prompts) that researchers provide generative AI embed assumptions and steer the operation of the model. For this reason, prompt development should be recorded as an iterative analytic practice, analogous to refining an interview protocol. A codebook, another common element of qualitative research, can preserve prompts as they are tested and revised, document their effects, and archive chat records, making the researcher’s steering role visible and enabling others to evaluate the resulting interpretations.
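Recording prompt development as an iterative practice can be as simple as keeping one structured record per prompt revision. The sketch below is one possible shape for such a log, not an established format: the field names, the model name `example-open-model`, and the `temperature` setting are hypothetical placeholders. Records are serialized as JSON Lines (one JSON object per line) so that the log can be appended to over time and archived with the project.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class PromptRecord:
    """One revision of a prompt, with the reasoning behind it."""
    version: int
    prompt: str
    model: str            # which model and release was used
    rationale: str        # why the prompt was written or changed
    observed_effect: str = ""   # what changed in the output, in the researcher's words
    settings: dict = field(default_factory=dict)  # configuration choices
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(
        json.dumps(asdict(record), ensure_ascii=False) for record in records
    )


def from_jsonl(text):
    """Rebuild records from JSON Lines text, skipping blank lines."""
    return [
        PromptRecord(**json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]


# Hypothetical two-step prompt history.
log = [
    PromptRecord(
        version=1,
        prompt="List the themes in this interview.",
        model="example-open-model",
        rationale="First pass; deliberately open-ended.",
        observed_effect="Themes were generic and ignored workplace context.",
        settings={"temperature": 0.0},
    ),
    PromptRecord(
        version=2,
        prompt="List the themes in this interview with a hospital nurse, "
               "quoting the passage that supports each theme.",
        model="example-open-model",
        rationale="Added setting and required supporting quotations.",
        settings={"temperature": 0.0},
    ),
]
print(to_jsonl(log))
```

Keeping the rationale and the observed effect next to each prompt version is what makes the researcher's steering role visible to later readers.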
Technological reflexivity also requires continual evaluation of generated output against the empirical record, the expertise of the researchers, and relevant theory. Generative AI cannot substitute for human judgment about evidentiary support, conceptual fit, and significance. Instead, interpretive control remains with the researcher, who uses model suggestions to extend, challenge, and improve their interpretation rather than to replace it.
Finally, as is commonly noted, reflexive qualitative research with generative AI must take account of our ethical commitments as researchers. Because proprietary, cloud-based systems may log or retain inputs, researchers must manage confidentiality and privacy risks, especially with sensitive materials. They should use local and open-source models when possible and adopt consent and ethics procedures that clearly disclose how data are processed, what protections exist, and what risks remain. Community-based generative AI projects, such as Pleias (Langlais et al., 2025), are experimenting with approaches that foreground data provenance, consent, and collective governance, treating training data as something that should be used with community participation rather than extracted anonymously. At the same time, smaller and more specialized models such as the Phi family of models (Abdin et al., 2024), among others, seek to reduce environmental costs by prioritizing lower energy use, local control, and context-sensitive performance over sheer scale. In short, there is a range of technical and ethical options from which researchers can choose in light of their own methodological commitments, including, quite naturally, not to use generative AI at all.
The Future of Reflexive Qualitative Research
As our tools change, our methodological repertoire changes. Qualitative research has never been static: it has continually adapted to shifts in the empirical material available to us and to the emergence of new techniques for engaging that material. The contemporary research landscape includes established and emerging computational tools that can meaningfully assist qualitative researchers. Transformer-based language models represent a striking development not because they “simulate” human understanding, but because they offer practical affordances that can extend long-standing qualitative aims, including sustained attention to context, sensitivity to meaning as relational and polysemous, and the capacity to move iteratively between close reading and broader pattern recognition. Their value, in other words, lies in their alignment with what qualitative researchers already seek to do: to trace patterns and themes across the complexity of our empirical material, build and refine concepts through engagement with that material, and articulate interpretive claims transparently.
Building responsibly on technological developments means treating them as part of an ongoing methodological lineage rather than as a rupture in which data science replaces interpretive qualitative inquiry. Generative AI is a tool that demands we engage in the methodological work necessary to specify appropriate use and to develop conventions for documentation and research ethics. This is precisely where qualitative expertise is indispensable. This moment requires qualitative researchers who can theorize how these new tools shape what becomes salient during the process of research discovery. We need qualitative researchers to identify and respond when model outputs are misaligned with local meaning. We also need qualitative researchers to lead in the design of analytic procedures that preserve interpretive control. In other words, qualitative researchers should come to the table as methodological experts who can define how these tools ought to be used in ways that are commensurate with our interpretive research aims.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
