Abstract
We conducted a systematic search of qualitative research into the individual’s experience of chronic low back pain. Two reviewers independently read through 740 unique abstracts. Inter-rater reliability was fair. The final sample comprised 19 articles which we critiqued using the Critical Appraisal Skills Programme checklist. This article focuses on the critical appraisal. Limitations include a lack of an adequate rationale for the theoretical framework, a lack of an account for the decisions made across recruitment and data collection, and a lack of reflexivity. Finally we discuss and offer recommendations for reflexivity and the explication of qualitative methodology in research articles.
Introduction
The inception of this review initially came from curiosity. As a team of researchers and clinicians we were interested in the impact of qualitative research on clinical practice, specifically in relation to the work of the Pain Management department. We were interested in whether qualitative research made a difference in the knowledge base of health professionals. In the course of this review, however, we became concerned that some aspects (representing the strengths) of qualitative research appeared to be neglected. In this article we first highlight some contemporary issues surrounding the selection and use of critical appraisal tools. We then describe our systematic approach to article selection and appraisal. We report on the results of the appraisal and subsequently discuss two of our main concerns: (i) that researchers are still explaining and justifying qualitative research; and (ii) that a core part of qualitative research, reflexivity, is not adequately evidenced in any of the articles we reviewed.
In line with other researchers (Miller, 2010; Noyes et al., 2008; Reid et al., 2009) we recognize that the issue of quality appraisal in qualitative research is full of tension and ambiguity. Such debate is ongoing, though some argue for the urgent need to achieve some form of conclusion (Dixon-Woods and Fitzpatrick, 2001). There are already a number of articles whose authors offer an in-depth discussion of the key components that comprise good qualitative research. To avoid duplicating the literature we direct interested readers to articles such as Horsburgh (2003), Koch (2006) and, more recently, Sin (2010) for further information on quality criteria within qualitative research.
A gold-standard appraisal tool?
The evolutionary nature of the debate has not hindered the development of numerous appraisal tools. Indeed several authors point to the proliferation of qualitative tools that are available to appraise quality (Dixon-Woods et al., 2006; Noyes et al., 2008). Unsurprisingly, some researchers are now moving on from the question of how to appraise qualitative research, towards the consideration of which appraisal tool is the strongest. For instance Katrak et al. (2004) undertook a systematic review of both quantitative and qualitative appraisal tools in which they sought to identify the most appropriate items across a range of diverse (research) appraisal tools. Their initial search revealed 193 articles which potentially described an appraisal process and through refinement, they arrived at 121 tools, of which seven were qualitative. Katrak and colleagues emphasized that such tools should have an empirical base to support both their development and application. Furthermore they recommended that researchers should purposively select the appraisal tool according to the needs of the research. The latter suggestion is perhaps recognition that there can be no ‘one size fits all’ approach to quality appraisal.
In a more recent study, Hannes, Lockwood and Pearson (2010) sought to assess the extent to which three qualitative appraisal tools successfully took account of the validity of qualitative research, arguing that validity is akin to methodological rigor. In selecting tools for evaluation they considered whether the tool was free of cost and available online; its applicability to different qualitative approaches; its use in published syntheses of qualitative research; and the extent of organizational support and involvement (beyond “individual academic interest” (Hannes et al., 2010: 1737)). Of the eight tools they initially selected, Hannes and colleagues reviewed the Critical Appraisal Skills Programme (CASP) tool, the Evaluation Tool for Qualitative Studies (ETQS) and the Joanna Briggs Institute (JBI) tool. Of these three the researchers concluded that the CASP was the “least sensitive to validity” (p. 1741). The JBI was the most sensitive.
The issue of context
Of course, one risks over-simplifying the issue by attempting to ‘arrive at’ a conclusion that a particular appraisal tool is the best one to use. Such a view is open to challenge. Many qualitative researchers hold anti-realist assumptions; the data they generate within their research is contextually bound to a particular time and space. This is no less true in the use of qualitative checklists. For instance Barbour and Barbour (2003) argue that concepts of trustworthiness are not rigid but are subject to change over time. Knowledge is subject to paradigm influence. That which is considered trustworthy in one era might not stand up to scrutiny in a subsequent community of practice. Barbour and Barbour (2003) also note how the use of qualitative appraisal tools in earlier years of qualitative research may have excluded seminal articles: an absence of techniques such as triangulation does not preclude a research article from offering valuable and groundbreaking insights.
If criteria for quality can change over time, they can also vary in the space of a single moment. Qualitative research is full of diversity. Indeed Dixon-Woods et al. (2004) argue that the quest for a universal checklist potentially undermines the diversity of qualitative methods and therefore the criteria to evaluate one type of approach might not be appropriate or relevant when assessing another. Unsurprisingly, the notion of a ‘true’ gold-standard appraisal tool is unlikely to be congruent with the anti-realist approach to knowledge that many qualitative scholars hold. Certainly caution must be exercised when debating and selecting qualitative checklists.
A comparison of three appraisal tools
With this in mind we now discuss how we came to select the CASP for use in the present study. We began planning the protocol for this review in March 2009. As a useful starting point we consulted the Cochrane Handbook for Systematic Reviews of Interventions (Noyes et al., 2008) because this contained a useful chapter on qualitative research. We also consulted the website of the Joanna Briggs Institute, given its role in evidence appraisal. These led us to consider three appraisal tools which were free of cost and easily accessible; we considered the CASP (Public Health Resource Unit, 2006); the QARI Critical Appraisal Instrument (Joanna Briggs Institute, 2007) and a report entitled Quality in Qualitative Evaluation (QQE) (Spencer et al., 2003).
a) Critical Appraisal Skills Programme
The CASP comprises ten questions addressing clarity of aims; appropriateness of qualitative methodology, research design, recruitment strategy and data collection method; consideration of reflexivity and ethical issues; rigor of analysis; clarity of findings; and the value of the research. Each of the ten questions is followed by one or more prompts for the reviewer to use when addressing the question. The first two questions (which concern the statement of aims and the appropriateness of a qualitative methodology) act as screening questions. The reviewer can then choose whether to continue with the remaining questions. The priority of these two questions suggests they are important and integral to qualitative research, though we would question whether omitting an article on the basis of unclear research aims would be appropriate. In terms of completing the CASP, the first two screening questions invite a yes or no response. The remaining eight questions leave room for the reviewer’s comments.
b) QARI Critical Appraisal Instrument
The QARI tool, like the CASP, has ten items. These are presented as statements which the reviewer must decide are either present (yes), absent (no) or unclear. These items cover congruity between the philosophical perspective and the research methodology; congruity between the research methodology and each of the research aims, the data collection methods, the data analysis and presentation, and the data interpretation; whether the researcher offers a cultural or theoretical reflexive statement; whether the researcher addresses outside influences; the extent to which participant voices are adequately represented; whether the research was ethically conducted; and the grounding of the conclusions in the data. There is room for the reviewers to sum and score the responses to these ten items and then make a decision to include the article. There is also room for the reviewer’s comments.
c) Quality in Qualitative Evaluation
Finally, we also considered the QQE. This report (and tool) begins with a seven-page discussion about the context of the framework and the appraisal questions. Following this the authors offer 18 questions. Five of these questions consider the research findings in relation to their credibility, their contribution to knowledge, how they address the research aims, their ‘generalizability’, and the application of critical appraisal. One question concerns the design rationale. Two questions consider the sample rationale. Another question assesses the conduct of data collection. A further four questions assess data analysis with respect to the clarity of the analytic approach, the retention of data context, the use of multiple perspectives, and the communication of data complexity. Two questions consider the way the results are reported, including the grounding of interpretation in the data. The remaining three questions consider reflexivity, ethics and auditability. Following each of these 18 items there are a number of “quality indicators” which expand on the question. Finally, there is room for the reviewer to write any comments they have. Like the CASP, this appraisal tool does not score the article.
Rationale for the CASP
The selection of an appraisal tool is a subjective decision. Following a comparison of the three appraisal tools, we selected the CASP. Although any of the three we considered could have offered a useful framework for appraising quality, we felt the CASP occupied some middle ground. We acknowledge the work of Hannes et al. (2010) who suggest that the JBI (QARI) tool is stronger than the CASP in assessing validity. However, their article was published over a year after the design of our protocol. Moreover, our decision not to employ the QARI was influenced by the lack of guidance for each question: the availability of anchors can facilitate the critical appraisal, particularly for novice reviewers. The anchors help to reduce ambiguity surrounding the question so that a similar interpretation of the question can be achieved between two raters. As we, like others (Dixon-Woods et al., 2004), felt the QQE was somewhat unwieldy, we settled on selecting the CASP.
Literature search methods
We undertook this review as part of assessing whether qualitative research made a difference to the knowledge of frontline clinicians. As chronic low back pain (CLBP) is a relatively common illness (NICE, 2009) with high costs at personal, social and economic levels (Miller and Timson, 2004; NICE, 2007; Reid, 2004) we settled on this topic. Specifically, our review sought to explore recently published qualitative research on the patient experience of living with CLBP.
Search strategy
Our search strategy was developed by BN in conjunction with a Consultant Anaesthetist (JR) and two Chartered Psychologists (RG & KL). In addition, we consulted a university information specialist for advice. We used a combination of keyword and thesaurus search terms across a broad range of health-related databases: AMED, BNI, CINAHL Plus with Full Text, EMBASE, IBSS, MEDLINE, PsycINFO and PsycARTICLES. We combined the search terms for the PsycINFO and PsycARTICLES databases because they use the same thesaurus terms. Appendix 1 shows the search strategy for the CINAHL database as a typical example. The keywords used across databases remained the same, whereas the subject headings chosen differed according to the unique indexing of each database.
Selection criteria
We used the following selection criteria when assessing an abstract’s suitability for inclusion:
We defined recent research as articles published within the previous five years. Given the common nature of CLBP we anticipated a large number of articles. By narrowing our search to recent research we were able to focus on contemporary and therefore more relevant articles. For ease of searching this period was considered to be from the start of 2004 until the search day, 12 October 2009.
We only looked at articles that focussed on the patient experience of CLBP. We decided to narrow our criteria to the patient experience because of the large number of qualitative studies that have been published in the field of CLBP. We believe this enabled the synthesis component to be more manageable. In defining the ‘patient experience’ we asked whether the article considers:
what it is like to live with CLBP;
how CLBP has impacted the life of the patient;
whether CLBP has changed the patient’s life story;
how CLBP affects different aspects of their “person” (e.g. their cognition, emotions, behaviour).
We defined chronic pain as pain lasting more than three months (Bond et al., 2006) in the lower back. Where there were several conditions alongside low back pain, we included the article if it was clear the majority (50% or more) of participants experienced CLBP.
We sought a definition of qualitative research and settled on Noblit and Hare’s (1988) broad and encompassing definition: the article was selected if it was primarily concerned with generating explanation and understanding that was derived in part from individuals’ reported experience and in line with an interpretive framework.
We excluded articles that were case studies, systematic reviews or from Dissertation Abstracts International. Case studies often do not provide sufficient information, while systematic reviews typically contain their own syntheses. We excluded articles from Dissertation Abstracts International as it was not feasible to request original dissertations.
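The criteria above were applied by human reviewers reading each abstract, not by software. Purely as an illustration of how the stated thresholds combine, the following minimal sketch encodes them as a single check. The record structure, field names and type labels are hypothetical and do not come from the review protocol; only the date window, the three-month duration and the 50% rule are taken from the text.

```python
from dataclasses import dataclass
from datetime import date

# Search window stated in the text: start of 2004 to the search day.
SEARCH_START = date(2004, 1, 1)
SEARCH_END = date(2009, 10, 12)

# Hypothetical labels for the excluded publication types.
EXCLUDED_TYPES = {"case study", "systematic review", "dissertation abstract"}


@dataclass
class Record:
    """Hypothetical summary of what a reviewer extracts from one abstract."""
    published: date
    source_type: str
    is_qualitative: bool            # reviewer judgement (Noblit and Hare definition)
    patient_experience_focus: bool  # reviewer judgement
    pain_duration_months: float
    participants_total: int
    participants_with_clbp: int


def meets_selection_criteria(r: Record) -> bool:
    """Return True only if every stated criterion is satisfied."""
    in_window = SEARCH_START <= r.published <= SEARCH_END
    chronic = r.pain_duration_months > 3  # "more than three months"
    # "majority (50% or more)" of participants experienced CLBP
    majority_clbp = 2 * r.participants_with_clbp >= r.participants_total
    allowed_type = r.source_type not in EXCLUDED_TYPES
    return all([
        in_window,
        chronic,
        majority_clbp,
        allowed_type,
        r.is_qualitative,
        r.patient_experience_focus,
    ])


example = Record(
    published=date(2007, 5, 1),
    source_type="journal article",
    is_qualitative=True,
    patient_experience_focus=True,
    pain_duration_months=6,
    participants_total=20,
    participants_with_clbp=10,
)
print(meets_selection_criteria(example))
```

The two judgement-based fields are deliberately left as inputs: whether an article is qualitative, or focuses on the patient experience, is an interpretive decision that a check like this cannot make.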
Abstract review
The abstracts were reviewed by BN and ZR, both psychology graduates. BN has significant experience in undertaking qualitative research in chronic pain; ZR has clinical experience through working in a physical health psychology service. The reviewers independently reviewed the abstracts and decided whether they were suitable, not suitable or uncertain for inclusion according to the selection criteria we described above. Where there were discrepancies the reviewers resolved these through discussion. In cases that could not be resolved, the reviewers consulted a third party (RG/JR/KL). When there was a lack of clarity concerning whether the article met the selection criteria, one of the reviewers (BN) made contact with the author(s) of the article. Typical queries the reviewer made include finding out the number of participants in the article who experienced low back pain, and, the duration of pain chronicity. A good response was received from many of the authors he contacted. We removed duplicate abstracts prior to calculating the inter-rater reliability for the two reviewers.
Critical appraisal
The reviewers obtained full-text articles in cases where the abstract was initially judged as meeting the selection criteria. They subsequently excluded articles that did not meet the selection criteria. As per the QUOROM flowchart (Figure 1) the reviewers excluded a total of 19 full-text articles. The main reason for excluding articles was that the focus was not on the patient experience of CLBP. Some articles were excluded by the reviewers for several of the reasons listed in Figure 1 and so the corresponding numbers do not add up to 19 articles. The reviewers independently read the articles in conjunction with the CASP (Public Health Resource Unit, 2006) so that a critique of the article could be made. In line with others (Atkins et al., 2008) we adapted the CASP, replacing the CASP’s definition of qualitative research in question two with Noblit and Hare’s (1988) definition which we described earlier. This offered a more comprehensive understanding of qualitative research. Aside from this change we made no other alterations to the CASP tool. We did not exclude studies based on the result of the critical appraisal unless the authors inappropriately used a qualitative approach (CASP question two). We did not appraise excluded articles.

Figure 1. QUOROM statement flowchart.
The reviewers met to discuss their individual appraisal of each article. Together they went through their appraisal for each of the CASP’s ten questions and jointly produced an appraisal summary for each article. Finally, using the summary of each individual appraisal, one of the reviewers (BN) produced a further summary for each section of the CASP’s ten questions. He re-read a selection of the articles, identifying their strengths and weaknesses. There were a number of reasons for the production of the additional summary. Primarily we sought to honour the work of the authors whose articles we reviewed; we wanted to produce a fair review and one that we had confidence in. We felt the joint appraisal alone did not achieve this. The reviewers experienced the common research phenomenon of practice effect whereby their later reviews were stronger than their earlier ones. Re-reviewing the articles allowed for greater consistency across all 19 articles. Given the fluid nature of the CASP prompts under each question, we found some details were not initially covered. The re-review allowed for such details to be obtained. Overall we believe this additional process enabled us to produce a thorough review, subsequently strengthening the credibility of the critique.
Reflections on the critical appraisal process
As first author and one of the main reviewers (BN), I wanted to offer some reflections on the process of undertaking the critical appraisal. The second reviewer (ZR) and I met up a number of times over a 12 month period. The first stage of the research was to decide which articles to request. This involved independently reviewing the abstracts, comparing our selection and then discussing our disagreements. The second stage was similar in many respects in that we independently read the 19 full-text articles, made notes on them using the CASP and then met to develop a synthesized critique.
As a doctoral student undertaking qualitative research, I had more experience in working with qualitative research than ZR; this certainly affected the dynamic between us in the early months of the research. I was aware of this and exercised caution in expressing my own voice, so that my colleague felt able to express her dissent where necessary. The earlier phase of the research was characterized by developing a joint understanding of how we interpreted the protocol. As time progressed we both developed confidence and ZR developed greater freedom in expressing her opinion. I believe my colleague helped bring a healthy balance to the process of discussion; as I naturally lean towards the critical side, it was a great strength to have a colleague bring another voice into the conversation and decision-making process, particularly during the joint synthesis of the CASP critique.
The supervisory team (RG/JR/KL) also provided invaluable input. In the early days we drew on their depth of clinical and academic experience in resolving disagreements that we could not resolve ourselves. This occurred at various stages of the research process; for instance in understanding how CLBP was defined and in making key methodological decisions (such as inter-rater reliability calculations). Towards the end of the research process I encountered a further dilemma which I resolved with the support of my academic supervisor (RG). Two of the articles we reviewed were authored by the future external examiner for my doctorate. Through liaising with my supervisor and the journal editor we were able to assess the potential impact of this on the critique and to ensure that it was fair and balanced. We are confident this impact is minimal and, perhaps more importantly, that our disclosure allows us to maintain integrity through transparency. Research can be replete with such dilemmas and we believe that reflexivity and auditability go a long way to successfully holding and managing these tensions. We encourage others to take similar reflexive steps.
Results
Results of abstract selection
The search yielded a total of 1482 abstracts (see Table 1). There were a total of 740 unique abstracts and 742 duplicate abstracts. One abstract was unintentionally treated as a duplicate and subsequently found to be a unique abstract. This was because there were two other updated abstracts as part of a Cochrane Systematic Review, all sharing similar titles and authors. In any case all three abstracts failed to meet the selection criteria. The first author carried out a re-checking process of the abstracts and found this to be an isolated incident. The inter-rater reliability was calculated without this abstract. The authors are confident this had no effect on the review. The kappa value for the two reviewers was 0.573. Orwin (1994, cited by Higgins and Deeks, 2008) suggests kappa values between 0.40 and 0.59 reflect fair agreement whereas values between 0.60 and 0.74 suggest good agreement. The reviewers’ agreement can therefore be considered fair, approaching good.
Table 1. Breakdown of Resultant Abstracts & Articles Selected Across Databases
Note: (a) Duplicate abstracts were removed in a cumulative manner. Hence the BNI database contained 2 abstracts that were duplicated in the AMED database. In the same manner, there were full-text articles that were requested in earlier databases (e.g. CINAHL) that were also present in later databases (e.g. MEDLINE) but which were not counted for the sake of calculating inter-rater reliability.
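For readers less familiar with the statistic, the sketch below shows how a kappa value such as the 0.573 reported above can be computed and read against the bands quoted from Orwin. It assumes Cohen's unweighted kappa, which the text does not specify, and the agreement table in the example is invented for illustration only; the reviewers' actual cross-tabulation is not reported. The function names are likewise hypothetical.

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's unweighted kappa for a square agreement table.

    table[i][j] is the number of abstracts that reviewer 1 placed in
    category i and reviewer 2 placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of abstracts on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two reviewers' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


def orwin_band(kappa: float) -> str:
    """Map kappa onto the two bands quoted in the text.

    Treating the bands as half-open intervals is an assumption made here
    so that values such as 0.595 are not left unclassified.
    """
    if 0.40 <= kappa < 0.60:
        return "fair"
    if 0.60 <= kappa < 0.75:
        return "good"
    return "outside the bands quoted in the text"


# Invented counts for the three screening decisions
# (suitable, not suitable, uncertain); NOT the review's data.
hypothetical_table = [
    [12, 3, 2],
    [4, 80, 5],
    [1, 6, 7],
]
print(round(cohen_kappa(hypothetical_table), 3))
print(orwin_band(0.573))
```

Because kappa corrects for chance agreement, a large "not suitable" category can yield a high raw agreement rate alongside a more modest kappa, which is one reason the statistic is preferred over simple percentage agreement in abstract screening.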
Characteristics of final sample
Of the 740 unique abstracts, 38 full-text articles were requested. The reviewers independently read these and following discussion agreed that 19 of these met the selection criteria. Table 2 provides detailed information about these articles. For one of the articles (May, 2007), the reviewers considered an earlier article (May, 2001) referenced by the author as this contained more detailed information surrounding the method. A further article (Rahman et al., 2004, referenced by Harding et al., 2005) was also considered by BN during the summary process.
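The counts reported across the search and selection stages can be checked for internal consistency. The following minimal sketch does so using only figures stated in the text (1482 abstracts, 740 unique, 742 duplicates, 38 full-text articles requested, 19 included, 19 excluded); the function and key names are hypothetical.

```python
def check_screening_flow(
    total_abstracts: int,
    unique_abstracts: int,
    duplicate_abstracts: int,
    full_text_requested: int,
    included: int,
    excluded_at_full_text: int,
) -> dict:
    """Verify that the stage counts add up and return simple proportions."""
    if unique_abstracts + duplicate_abstracts != total_abstracts:
        raise ValueError("unique + duplicate abstracts must equal the total")
    if included + excluded_at_full_text != full_text_requested:
        raise ValueError("included + excluded must equal full texts requested")
    return {
        "excluded_at_abstract_stage": unique_abstracts - full_text_requested,
        "proportion_requested": full_text_requested / unique_abstracts,
        "proportion_included": included / unique_abstracts,
    }


# Figures as reported in the text.
flow = check_screening_flow(
    total_abstracts=1482,
    unique_abstracts=740,
    duplicate_abstracts=742,
    full_text_requested=38,
    included=19,
    excluded_at_full_text=19,
)
for name, value in flow.items():
    print(name, value)
```

A check of this kind only confirms that the totals reconcile; it cannot reproduce the per-reason exclusion counts, which the text notes do not sum to 19 because some articles were excluded for more than one reason.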
Table 2. Full-Text Articles Selected for Critique & Synthesis
The majority of studies draw on participants from the United Kingdom. Three studies emanate from Sweden and two studies from Australia. The remaining two studies originate from the United States and from Holland. The mode publication year is 2007. The approach, data collection method and analysis method are presented separately given these are far from singular, unitary concepts. The most common theoretical frameworks used are narrative (5) and phenomenological (5) approaches. All but two studies used interviews to collect data, either individually or as part of a focus group: one used an open-ended questionnaire and the other used a cumulative group discussion. Some of the studies supplemented interview data with a questionnaire. Finally, for data analysis, six studies used a form of phenomenology, six studies used a form of thematic analysis, four studies used a form of grounded theory approach, four studies used a framework approach and one study used “common ideas in qualitative analysis (Dey, 1993)” (Hansson et al., 2006: 2186). These approaches were sometimes mixed, so the counts sum to more than 19.
Critique results
We wish to emphasize that our critique is based upon the words authors have used in their article. These words may or may not effectively represent the quality underpinning their research. Our discussion therefore reflects the tensions of the context in which qualitative research is published. Our intention is to stimulate discussion about how researchers can most effectively present their work to a research community that does not wholly accept qualitative research as a credible endeavour. We present the results of the critique as per the CASP headings. For brevity’s sake we reference individual articles using a study code, which is detailed in Table 2.
Occasionally we found articles did not put forward a strong rationale for why the topic was worth considering and furthermore some articles (e.g. S10) had unwieldy introductions: a more concise introduction would suffice and might better engage the reader. We believe article S7 offers a strong example of an introduction: the research aims are situated in the context of background literature and the literature review adopts, at times, a critical stance. This critique helps form the rationale for the study.
In contrast we found on a number of occasions authors sought to explain the rationale or the philosophy of a qualitative approach per se. Such explication is not necessarily wrong or a shortcoming; however, we believe this indicates qualitative research might continue to be less privileged by the research community. The ‘need to explain’ has the potential to consume words. Such words might be better directed towards the discussion of reflexivity. For instance in two of the articles the authors offer an outline of what qualitative research is (S12 and S17) and in three articles the authors describe the purpose of qualitative approaches (S7, S12 and S17). Others highlight the appropriate use of qualitative approaches for the study of human experience (S2 and S16). Authors of one article (S18) suggest the qualitative methodology they used (IPA) should not be seen as seeking to replace quantitative approaches. Finally one article (S11) describes how qualitative research attains credible data. We will discuss some of the concerns surrounding these findings later on.
Only four articles (S6, S7 and S18-19) offered a strong justification for both the underpinning methodology and the method used. Authors frequently offered a rationale for either method or methodology, but rarely for both. In some instances, authors detailed a theoretical framework but then used a different method of analysis. This is evident from the articles listed in Table 2. We believe authors need to discuss and justify how the method of analysis relates to their stated theoretical perspective of choice.
Articles such as S6 represent studies with a strong research design. Authors of such articles describe the aims of their philosophical approach with some historical context. Furthermore they justify their choice of approach specifically in relation to the research aims. They also justify their use of semi-structured interviews.
We felt one of the main shortcomings in this section was the lack of justification for the decisions made in the recruitment process. Five articles (S1, S4, S6, S9 and S11) could have provided greater detail of the recruitment process. In two articles the authors employed sampling processes which might not be considered appropriate for qualitative research. For instance S5 used a random sampling process and S13 used a systematic sampling strategy. In both cases it was unclear why these strategies were used. In contrast we did find that in eight articles (S2, S6, S7, S10-11, S13, S15 and S18) the authors described using purposive sampling processes, in one article (S3) the author described using theoretical sampling and in a further article (S4) the authors described the sampling strategy in terms of both of these. All of the 19 articles offered data on participant characteristics. This ranged from minimal information to detailed data presented in tables.
One of our main concerns arising from section five of the CASP was the lack of discussion surrounding data saturation. Only five of the 19 articles explicitly discussed saturation (S3-4, S9, S13 and S15). The principle of saturation is not necessarily pertinent to all research designs; however, we believe reference should be made to the choice of sample size (as S14 achieves) and how the decision to stop recruitment came about. In the case of an unplanned analysis (for instance, in the case of open-ended questionnaire items) we suggest that authors should confirm their belief (or not) in the saturation of the themes.
In contrast however we report that many of the articles discussed anonymity and confidentiality issues. Furthermore ten of the articles specified the name of the ethics committee approving the study (S1-2, S4-5, S8, S12, S15-17 and S19). One article (S6) put in place psychological procedures to protect participants who were distressed and another sought to emphasize the disconnected nature of the research from clinical activities (S2).
We found that bias was addressed in all of the studies, either through the use of an independent researcher (S3-4, S6-7, S11, S14-15 and S17-18), through the use of multiple researchers undertaking an analysis (S1-2, S4-5, S8-10, S12 and S19), or through discussion with peers (S13 and S16).
The quotes chosen for inclusion were rarely given a rationale behind their selection. We found that three of the articles contained quotes which did not always appear to support the themes (S1, S4 and S15). In two articles (S13 and S16) we felt some of the themes were too broad or simplistic. In the majority of cases however we considered the quotes successfully supported the interpretation or themes.
Finally, we could only find direct evidence of contrasting data in three articles (S3-4 and S10) though a number of articles did offer evidence of data variation. We would caution placing too much confidence in this aspect of the critique however, given that the reviewers found it difficult to assess contradictory data and data variation.
Whereas authors often made reference to previous research in their discussion, many of these references were typically in support of their research findings, demonstrating how their study extends previous research. Only a minority of references within articles represented a challenge to the study’s findings. A more critical stance might be warranted in which authors both compare and contrast their findings with previous research. Finally we note in eight of the articles the authors did not appear to suggest areas for further research (S1, S3-4, S9-10, S15 and S18-19).
Discussion
Summary
Following an appraisal of 19 articles the reviewers identified a number of strengths within this sample. We believe most articles had clear aims, described in detail the questions researchers asked participants, contained research conducted to a good ethical standard, offered a rigorous analysis, sought to address bias, described their findings in a clear manner and proffered clinical implications. We note a number of key limitations however. Researchers typically failed to offer an adequate explanation and rationale for the guiding theoretical framework used, they did not give an adequate account for the decisions made in the recruitment and data collection process, they often did not describe either saturation or how the decision to stop collecting data was made, and finally, they failed to demonstrate adequate reflexivity throughout the research process.
Are we thinking qualitatively?
This review has led us to question why researchers feel the need to explain qualitative methods. The following points highlight some of our concerns arising from the review:
The ‘need to explain’ qualitative methodology and the use of inappropriate sampling strategies left us with the impression that some researchers are still influenced by positivist ideas. Detailed explication of qualitative methodology is perhaps unnecessary; qualitative research is after all an established discipline. For instance, Miller (2010) notes that within nursing practice qualitative approaches have become the “standard way in which researchers generate knowledge” (p. 193). Similarly, Pope and Mays (2009) suggest qualitative research has become well accepted within health services research, with the focus moving towards concerns surrounding rigor and quality. We therefore believe researchers need to move away from justifying qualitative approaches in general and instead explain why their chosen methodology is the appropriate approach for the research question.
We occasionally found that researchers did not follow the methodology through into all aspects of the research process. This is evident from Table 2, which includes many studies that draw on influences (in their approach to the research) that are not followed through into the data analysis. A rigorous qualitative study cannot employ a “cook-book” approach to study design without thoughtful justification. We believe researchers need to avoid the mindset that “anything goes” in the design process of qualitative research and employ strong justification in outlining their approach.
Only a minority of studies described how saturation was achieved. While accepting that not all qualitative methods call for data saturation, we suggest researchers do need to articulate clearly how they reached a decision to stop data collection, or in the case of retrospective data, how confident the researchers are that the themes are well-saturated. Of course unsaturated themes do not make for a worthless study. The richness of qualitative data allows for valuable insights to be gained without attaining saturation. However researchers must be transparent in discussing their recruitment process and how they decided to stop collecting data.
Finally, the reviewers were concerned about the use of independent checks to prevent bias, whether in the form of multiple coders, an independent coder or external advice. Although efforts to privilege the participant’s voice should be applauded, it was not evident to us that independent checks always had this purpose. The notion of multiple coders has echoes of positivism, in which there is a single reality to be uncovered. Our concerns lie in the apparent unacceptability of a single researcher’s interpretation. Done with sensitivity, adequate reflexivity and within a constructivist framework, the interpretation of a single researcher should be considered an acceptable qualitative approach. For instance Daly (1997, cited by Horsburgh, 2003) believes that “all facts are interpreted facts” (p. 308). He also argues for the importance of “preserving” the voice and meaning of the participant. We therefore believe it is more important to deliver a single interpretation that stays as close as possible to the meaning and nuances intended by the participant than to have multiple interpretations that potentially fail in this.
Reflexivity: are we practicing what we preach?
Reflexivity is an integral part of conducting good qualitative research. There are, however, different approaches to reflexivity. For some it is about minimizing the impact of the researcher on the data (Sin, 2010), whereas for others, reflexivity is about making use of subjectivity (Gough, 2003). This stands in contrast to research located within a positivist framework, where subjectivity is seen as a threat to validity. Gough suggests reflexivity “facilitates a critical attitude towards locating the impact of research(er) context and subjectivity on project design, data collection, data analysis, and presentation of findings” (p. 22). In the current review we were surprised at the lack of reflexivity demonstrated across the 19 articles. In the few articles that were reflexive we occasionally felt these had positivist overtones, suggesting different perceptions about the function of reflexivity. There was not a single article that we considered an exemplar of reflexive practice.
In the course of her clinical work, one of us (RG) came across a recently published article discussing the experiences of visually impaired individuals. This article might be considered such an exemplar. Thurston (2010) offers a detailed reflexive account following her results. She became aware of her own assumptions and prejudices through the process of journaling and bracketing. She acknowledges the potential effect of such bias on her data. Thurston then describes the impact of the research on herself, noting how it has challenged the way she sees herself as a blind individual.
A consequence of failing to be reflexive is to risk obscuring the subjective. Researchers might subsequently present their approach, intentionally or otherwise, as objective. Even within this review the authors made subjective decisions: the selection criteria consist of decisions jointly made based on clinical and academic experience; nevertheless they are subjective calls. Horsburgh (2003) might agree, noting “reflexivity refers to active acknowledgement by the researcher that her/his own actions and decisions will inevitably impact on the meaning and context of the experience under investigation” (p. 308). We believe that researchers must acknowledge and embrace the subjectivity within their work. This will help form the basis of rigorous qualitative research.
Through presenting preliminary results at the 16th Annual Qualitative Health Research Conference, we gained insight into some of the possible explanations for the lack of reflexivity. Delegates highlighted the restrictions that journal editors place on manuscript length. Reflexive accounts therefore might be cut short or omitted because of word limit restrictions. In other words the researchers may have taken reflexive steps in the research process but may not have reported doing so in their article. There is, then, a difference between the conduct and the reporting of a study. A limitation of evaluating the conduct of a study from its published report is that the critique can only be based on the contents of the article.
It is not just qualitative work that could benefit from evidence of reflexivity. The notion of reflexivity appears to be alien in quantitative research. If reflexivity benefits research through facilitating a critical examination of how the researcher interacts with and influences the data, then surely quantitative researchers would also do well to embrace a reflexive approach? Quite how this could be incorporated into a very different paradigm remains to be seen, but it is a debate that is yet to be seriously entertained.
Although we have advocated in this article for greater evidence of reflexivity, we do not pretend that reflexive practice is without its difficulties. One of the dangers of incorporating reflexive narratives into our work is to risk disengaging other parts of the research community for whom personal narrative is synonymous with stream of consciousness. This risks the perception that such work is inadmissible or indeed irrelevant. Such a risk is perhaps acutely felt in auto-ethnographical work (e.g. Holt, 2003) where narratives of the self are (for some, questionably) used as data.
A further problem of open reflexive accounts is to risk unnecessarily undermining the quality of the research. In a culture where reflexivity is not yet widely embraced, to expose all of one’s subjectivities, internal tensions and weaknesses is to create an unlevel playing field among the many articles that do not undertake this practice. One solution might be to make available reflexive notes online, in a forum detached from the journal article yet referenced within it. This would allow reflexivity to be evidenced while removing such exposition from the peer-review process. There are certainly challenges for the reflexive researcher to manage and we acknowledge that incorporating reflexivity is no easy feat.
Limitations
The authors sought to design and implement a rigorous and trustworthy review process. The audit trail, maintained through field notes by one of the reviewers (BN), helped towards assuring this. However the strength of the review was limited in a couple of respects. Firstly, upon reflection we feel the appraisal process could have benefited from the involvement of another experienced qualitative researcher. Although the main reviewer is a doctoral researcher engaged in qualitative work, the second reviewer (ZR) lacked experience in qualitative research. Despite this, we consider that her clinical experience within physical health psychology particularly helped with identifying the clinical utility of the research studies. Furthermore we believe she brought a healthy challenge to the joint appraisal discussions, enabling a more balanced approach to be achieved. Secondly, we were disappointed not to have achieved a higher inter-rater reliability. Our definition of “patient experience” might have been too loose and was therefore partly responsible for a lower rate of agreement than we had hoped for; it was often difficult to determine from the abstract alone whether an article was appropriate.
Recommendations
Given that qualitative research is an established field, researchers might do well to consider, before describing what qualitative research is, whether their scientific counterparts would use valuable manuscript space describing more established methodologies such as the RCT. We suggest there is scope for researchers to grow in using qualitative approaches in a confident and rigorous manner. The research community must resist the urge to justify qualitative methods. When editors ask researchers to describe qualitative approaches, authors should consider offering a rebuttal, drawing on the literature that demonstrates the established nature of qualitative methods.
Moving on to reflexivity, more needs to be done on a number of levels. We suggest that reflexivity needs to be more clearly evidenced in articles. However this forces difficult decisions about what to leave out in order to meet the word limits that journals impose. At the 16th Annual Qualitative Health Research Conference delegates suggested that researchers need to “lobby” journal editors. Indeed editors of both qualitative and quantitative journals need to develop an expectation of reflexivity within research articles. Reviewers similarly must welcome open and honest accounts of reflexivity, even if this appears to acknowledge what might be seen as “biased interpretations”. Finally, we believe authors must ensure their research is both conducted reflexively and reported in a reflexive manner. Granted, not all editors will be open to devoting journal space to accounts of reflexivity. However a short two-line reflexive statement is enough to demonstrate authors have considered the impact they, as researchers, have had on the research process. Perhaps the extra space will come from removing the statement, “it’s only a qualitative study” (Morse, 2008).
Conclusion
There is a substantial body of qualitative research focusing on CLBP, the majority of which, in this review, stems from the United Kingdom. The bulk of the articles we reviewed describe research conducted to high standards with findings that can be transferred into the clinical context. However the lack (in substance or in reporting) of some core qualitative principles is of concern. For qualitative research to be considered credible, those using its principles need to offer evidence to demonstrate their work is rigorous and trustworthy.
Appendix 1
Search Strategy Used With CINAHL Plus With Full Text
Acknowledgements
We acknowledge and thank Stephen Gough, Liaison Librarian at Birmingham City University, for his guidance in the protocol design, and Dr Sayeed Haque, Consultant Statistician, for his support with calculating inter-rater reliability. We thank the two anonymous reviewers and the journal editor for all their invaluable input. We also acknowledge that parts of the critique contained within this article were presented in rudimentary form at the 16th Annual Qualitative Health Research Conference, 3rd-5th October 2010 in Vancouver, and thank the delegates for sharing their thoughts.
Declaration of Interests
The authors declared no conflicts of interest with respect to the authorship and/or publication of this article.
Funding
Ben Newton is a PhD student at Birmingham City University and has been funded by a bursary.
