Abstract
An important aspect of conducting peer observation of teaching (POT) is the observational form that guides the observation process. Although several published POT forms are available for faculty to use, these forms tend to be unique to particular academic disciplines, and none originates from the perspective of faculty in the field of public health. As such, the purpose of this study was to create a new POT form, grounded in the published literature, for in-person classes in a newly established Department of Public Health. This study used a modified Delphi method with 10 faculty experts in the department, who ranged in academic title and in years of experience teaching, observing others’ teaching, and being observed as teachers. A foundation of 657 relevant items found in 33 published POT forms was used across six phases of anonymous surveys, group discussions, and pilot testing. The final POT form consisted of 22 new items, representing over 30 categories of established, observable teaching practices, and a dichotomous measurement scale with open-ended text boxes. The study underscored previous research regarding the utility of the Delphi method, including its capacity to foster instructor buy-in on the items that would be used to evaluate their own teaching practices.
Introduction
In 1994, the American Association for Higher Education (AAHE) took a noteworthy step toward promoting peer review of teaching at the college level. Specifically, the AAHE sponsored the project “From idea to prototype: The peer review of teaching” to promote peer review and enhance faculty members’ teaching skills (Hutchings, 1995). Twelve universities from across the United States participated in the project, each forming “teams” of two faculty members per department from a variety of academic disciplines (e.g., chemistry, mathematics, English, history, music, business, engineering, nursing) to develop various peer review practices (e.g., reviewing course syllabi, observing teaching; Gabaccia, 1996).
During the same period, the University of North Carolina (UNC) System also made an important decision regarding peer review of teaching. Prompted by a controversial tenure decision at the University of North Carolina at Chapel Hill, the Board of Governors questioned how teaching was evaluated in the tenure process (Gabaccia, 1996). Acting on committee members’ recommendations to improve faculty teaching practices, the Board of Governors required that all non-tenured faculty be directly observed during classroom teaching (University of North Carolina System, 1993).
In the decades following the AAHE’s and UNC System’s promotion of peer observation of teaching (POT), research identified benefits of teaching observations at the college level. Several reviews of the published literature indicated that POT can improve college instructors’ knowledge of teaching practices, confidence in teaching, adoption of new teaching skills, and critical self-reflection on teaching (Corcelles-Seuba et al., 2025; Cutroni & Paladino, 2023; Johnston et al., 2022; Thomas et al., 2014; Zeng, 2020). In addition, POT can provide benefits at the departmental level, including the establishment of minimum teaching standards, professional development of junior faculty, and improved collegiality among peers (Corcelles-Seuba et al., 2025; Cutroni & Paladino, 2023).
An important aspect of conducting POT is the observational form that guides the observation process. In the popular “collaborative model” of POT, a faculty member is observed by peers in a non-judgmental manner to promote faculty discussion, reflection, and improvement of teaching (Gosling, 2014). The collaborative model of POT tends to consist of three phases (G. A. Martin & Double, 1998). First, during the pre-observation meeting, the faculty member being observed (i.e., the “observee”) and those conducting the observation (i.e., the “observers”) discuss the context of the course, the focus of the observation, the day and time of the class session to be observed, and which POT form will be used for the observation. Second, the observation takes place, during which the observers unobtrusively watch the class and complete the POT form. Third, during the post-observation feedback meeting, the observee and observers use the completed POT form to discuss what was observed and to provide constructive criticism of how the observee’s teaching practices could be improved (G. A. Martin & Double, 1998).
Given that observational forms are a vital component of the second and third phases of POT (G. A. Martin & Double, 1998), choosing a POT form that best fits the course being observed is an important step in the POT process (Siddiqui et al., 2007). The literature suggests that several POT forms have been published and are available for faculty to use (Dillon et al., 2020; Otero-Saborido et al., 2024); however, these forms tend to be specific to particular academic disciplines, especially science, technology, engineering, and mathematics (i.e., STEM fields; Dillon et al., 2020). Although public health borrows from a variety of disciplines, it is a distinct field grounded in educating others, which has warranted its own dissemination of scholarship of teaching and learning (Gambescia, 2015), including a POT form for faculty in the field. The POT form developed by Jia et al. is perhaps the most relevant to the field of public health, as it was developed for faculty at an interprofessional school of health professions; however, that form was specifically designed for evaluating faculty who teach online courses (Jia et al., 2025). Other POT forms have been developed to assess faculty who teach in any discipline, but those forms have been limited either by a narrow focus (e.g., inclusive teaching practices [Addy et al., 2023], teaching in classrooms physically designed to enhance active learning strategies [Birdwell & Harris, 2022], teaching techniques to elicit student participation [Nunn, 1996]) or by an excessive number of assessment items (Murray, 1983), which can make POT a cumbersome task (Otero-Saborido et al., 2024).
As such, the purpose of this study was to develop a POT form based on the perspectives of faculty in the field of public health. In July of 2025, our institution formed a new Department of Public Health. As a new department, its faculty were required to develop POT protocols per UNC System requirements, which included determining which form would be used for POT. Rather than using an existing POT form originating from a different academic discipline, or a general POT form with a narrow focus, the faculty aimed to create a new POT form for in-person classes that would be useful for those teaching in the field of public health while remaining grounded in the published literature.
Methods
This project was conducted as a modified Delphi study. The Delphi methodology was developed during the Cold War by the United States Air Force in collaboration with the RAND Corporation, in which seven experts were asked to estimate the number of bombs the Soviet Union would need to eliminate strategic targets in the United States (Dalkey & Helmer, 1963). To reach a consensus, several phases of anonymous surveys were used to gather the experts’ opinions, and the results of each phase were shared with the experts to inform their subsequent answers. By using anonymous surveys rather than in-person discussion, the study avoided direct confrontation among experts and allowed for independent thought by reducing the potential for experts to sway one another’s opinions (Dalkey & Helmer, 1963).
The Delphi method, and modifications to it, have since been used by scholars in a wide array of academic disciplines, including education (Green, 2014) and the health sciences (Nasa et al., 2021; Schifano & Niederberger, 2025), to determine the level of consensus among experts on a variety of issues. Although the Delphi method has frequently been modified to fit the topic and purpose of a study, common elements of a Delphi study include selecting several experts on a topic, conducting iterative phases of anonymous surveys, providing experts with anonymous results from each survey phase, and reaching a specified level of consensus (Keeney et al., 2001; Nasa et al., 2021; Niederberger & Spranger, 2020; Schifano & Niederberger, 2025).
Selection of Experts
The Delphi “experts” for this project were the 10 current faculty members of the new Department of Public Health, who ranged in academic title and years of teaching experience. Although additional research is needed on the optimal number of experts for Delphi studies, Shang’s (2023) narrative literature review on the topic recommends including at least eight experts. The faculty were considered experts based on their years of teaching at the college level, the number of times they had observed their peers’ teaching, and the number of times their own teaching had been observed by peers (Shang, 2023; Table 1). They were also considered representative experts, as they would eventually use the POT form both as observers and as observees.
Table 1. Experts’ Academic Title, Years of College-Level Teaching Experience, Number of Times Serving as an Observer of Peers’ Teaching, and Number of Times Their Teaching Was Observed by Peers.
Phase 1
The first phase of the Delphi study consisted of identifying items that had previously been used in POT forms. In January of 2025, the authors collected college-level POT forms that had been published in the peer-reviewed literature. POT forms were located using the following databases: PubMed, the Cumulative Index to Nursing and Allied Health Literature (CINAHL), the Education Resources Information Center (ERIC), and Google Scholar. To locate articles, the authors used the following search terms in various combinations: college, peer, observation, teaching, and form. The reference lists of relevant articles were also searched to locate additional publications. POT forms were included regardless of academic discipline or mode of instruction (e.g., in-person, online); however, forms were included only if they had been used at the college level, contained items on instructor teaching practices, and had been published by scholars in the United States. A total of 33 published college-level POT forms were included in the project (Table 2).
Table 2. Published POT Forms Used for the Modified Delphi Study.
The first author combined the items from each POT form into a single document and sorted them into separate categories. Items were included in the document if they evaluated the instructor’s teaching practices and could be directly observed during a single class session. Items were excluded if they were meant to evaluate teaching practices outside of class (e.g., quality of syllabi, quality of homework assignments and rubrics, organization of online learning management systems [e.g., Blackboard, Moodle, Canvas]) or student behaviors during class (e.g., paying attention, taking notes, using cell phones, being distracted by others). A separate category contained the different measurement scales used in the published POT forms. A total of 657 items were included in the study. The first author grouped these items into 60 categories (Table 3), which were then included in an anonymous online survey. The study was submitted to our institution’s IRB and was deemed “exempt” (study # HS-25-229).
Table 3. POT Categories and Number of Items Selected Per Phase of the Delphi Study.
Category did not reach 70% consensus.
Category was reflected in new items created by combining and editing items during Phases 5 and 6.
Phase 2
The Experts were then invited to complete the survey, which was administered anonymously online via Google Forms. The Experts were given 1 to 2 weeks to complete each survey phase, with reminders sent close to each deadline to achieve a full response rate. For Phase 2, Experts were instructed to select any number of items from any category that they believed would be relevant to evaluating good teaching practices for POT in the new department. If an expert selected any item within a category, that category received a selection from that expert. A total of 448 items from 59 categories were selected (Table 3). After completing the Phase 2 survey, the Experts met to discuss the cutoff level for consensus in future phases. Based on Diamond et al.’s (2014) systematic review of Delphi studies, the Experts agreed to include only categories of items that had been selected by at least 70% of the Experts (i.e., at least 7 of the 10 Experts).
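The study does not describe any software for tallying responses; the following is a minimal Python sketch of the two rules above, assuming each Expert’s responses are stored as a set of (category, item) pairs keyed by a hypothetical anonymous expert ID. A category counts once per Expert no matter how many of its items that Expert chose, and a category meets consensus when at least 70% of the Experts selected it.

```python
from collections import defaultdict


def category_support(responses):
    """Count how many experts selected at least one item in each category.

    `responses` maps a (hypothetical) anonymous expert ID to the set of
    (category, item) pairs that expert selected.
    """
    support = defaultdict(int)
    for selections in responses.values():
        # Collapse to distinct categories so that an expert who picks
        # several items in one category adds only one selection.
        for cat in {cat for cat, _item in selections}:
            support[cat] += 1
    return dict(support)


def consensus_categories(support, n_experts, threshold_pct=70):
    """Return categories selected by at least `threshold_pct` percent of experts.

    Integer arithmetic avoids floating-point edge cases at the cutoff.
    """
    return {
        cat
        for cat, count in support.items()
        if 100 * count >= threshold_pct * n_experts
    }
```

With 10 Experts, the comparison `100 * count >= 70 * 10` reduces to `count >= 7`, which matches the cutoff described above.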
Phase 3
Items that were selected in Phase 2 were then used to create the Phase 3 survey; items that were not selected in Phase 2 were eliminated from the study. Experts received the same instructions as in Phase 2 when completing the Phase 3 survey; however, they were also provided with the anonymous results from the Phase 2 survey, which showed the number of Experts who selected each item in each category. These results were meant to help build consensus by informing Experts of their peers’ selections. Applying the 70% consensus threshold, a total of 245 items from 43 categories were selected (Table 3).
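The anonymous feedback described above amounts to per-item selection counts with Expert identities discarded. A minimal Python sketch, under the same assumption that responses are sets of (category, item) pairs keyed by a hypothetical expert ID, could produce that summary as follows; sorting items by descending count is an illustrative choice, not something the study specifies.

```python
from collections import Counter, defaultdict


def anonymous_item_counts(responses):
    """Tally how many experts selected each item, grouped by category.

    Only counts are returned; expert identifiers are discarded so that the
    feedback shared with the panel stays anonymous.
    """
    counts = Counter()
    for selections in responses.values():
        # `selections` is a set, so each (category, item) pair counts once per expert.
        counts.update(selections)

    by_category = defaultdict(dict)
    for (category, item), n in counts.items():
        by_category[category][item] = n

    # List the most frequently selected items first within each category.
    return {
        category: dict(sorted(items.items(), key=lambda kv: (-kv[1], kv[0])))
        for category, items in by_category.items()
    }
```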
Phase 4
Items that were selected in Phase 3 were then used to create the Phase 4 survey. Categories that did not achieve 70% consensus in Phase 3, along with their respective items, were eliminated from the study. Experts received the same instructions as in Phase 3 and were again provided with the anonymous results from the Phase 3 survey. A total of 161 items from 39 categories were selected (Table 3).
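Building each subsequent survey combines the two pruning rules described above: drop items nobody selected, and drop categories (with all of their items) that fell below the 70% threshold. The following self-contained Python sketch assumes the same hypothetical data layout as before (sets of (category, item) pairs keyed by anonymous expert ID) and is an illustration rather than the procedure the authors actually ran.

```python
from collections import defaultdict


def next_phase_items(responses, n_experts, threshold_pct=70):
    """Return the (category, item) pairs carried into the next survey phase.

    An item survives if at least one expert selected it and its category
    was selected (via any of its items) by at least `threshold_pct` percent
    of the experts.
    """
    selected_items = set()
    experts_per_category = defaultdict(set)
    for expert, selections in responses.items():
        selected_items |= selections
        for category, _item in selections:
            experts_per_category[category].add(expert)

    kept_categories = {
        category
        for category, experts in experts_per_category.items()
        if 100 * len(experts) >= threshold_pct * n_experts
    }
    # Unselected items never appear in `selected_items`, so they drop out too.
    return {
        (category, item)
        for category, item in selected_items
        if category in kept_categories
    }
```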
Phase 5
As a modification to the Delphi method, the Experts then met to discuss and edit the remaining items. Group discussion during the final phases of a Delphi study has been used as a modification to the method in previous healthcare research (Broder et al., 2022; Vonk Noordegraaf et al., 2011). At the outset of the study, it was determined that group discussion would likely be needed to edit items for the creation of an original POT form. Specifically, the Experts met to combine items, eliminate items based on the number of selections, and edit each item for consistency in tense, word choice, and length. Phase 5 resulted in a POT form consisting of 31 new items that reflected 39 of the original categories (Table 3).
Phase 6
The POT form that resulted from Phase 5 was then pilot tested by each faculty expert. Given the difficulty of coordinating the Experts’ schedules to observe a peer’s in-person course, YouTube was searched for a lecture that the Experts could observe and evaluate at their convenience using the POT form. Because of the lack of classroom lectures uploaded to YouTube, a faculty workshop posted on the YouTube channel of the University of South Carolina’s Center for Teaching Excellence (2020) was used for the pilot test. The workshop focused on implementing active learning strategies in large classrooms and was presented to colleagues by an Associate Professor at that university. The workshop was selected for the pilot test because it resembled a lecture for college students, was similar in length to a typical college lecture (i.e., 50 min), featured several practices that the POT form was designed to assess, provided clear audio, and showed the workshop’s slides, presenter, and audience members.
After completing the pilot test, the Experts met to edit the POT form. Based on the pilot observation experience, the Experts discussed revising the form’s measurement scale to eliminate ambiguity, combining items, and removing items that seemed excessive (e.g., redundant, irrelevant; Otero-Saborido et al., 2024). Phase 6 resulted in a POT form consisting of 24 new items. As a group, the Experts discussed how each item mapped onto the original categories; the final items ultimately reflected 36 categories (Table 3).
Results
Several items and categories were eliminated during the phases that involved the Experts’ anonymous feedback (Phases 2, 3, and 4). From the original 657 items gathered during Phase 1, the number of items decreased to 448, 235, and 161 items, respectively (Table 3). Likewise, from the original 60 categories in Phase 1, the number of categories decreased to 59, 43, and 39 categories, respectively (Table 3). Specifically, by Phase 4, the following categories had been eliminated from each overarching topic: Scale of observation form (checklist/dichotomous, scale), Content (giving reminders to students about dates/deadlines, transitioning clearly between topics, defining key terms, explaining relevance of content, using accurate content, preparing students for future exams), Questions and answers (assessing students’ understanding at the start of class, allowing enough time for students to answer questions, incorporating students’ ideas into the class session, being flexible with class session based on questions), Diverse teaching and active learning methods (interacting and providing feedback during active learning), Communication skills (moving around the room, being aware of students, capturing students’ attention), Rapport (showing humility, using humor, having good rapport with students in general), Classroom management (managing disruptions by students, managing external disruptions), and Overall effectiveness (rating of instructor’s overall teaching effectiveness; Table 3).
A major refinement of the items and POT form took place during Phases 5 and 6, the modified portion of the Delphi study in which Expert anonymity was removed (i.e., the Experts met in person to discuss and edit the remaining items). In these phases, the Experts’ discussions reduced the remaining 161 items to 24 items (including the form’s measurement scale) and the remaining 39 categories to 36. Part of this reduction came from the Experts working together to combine items. For example, by adding the word “clarity” to item #3 on the POT form (Figure 1), “spoke with effective volume, clarity, pace, and tone,” the Experts were able to absorb a separate item that assessed whether an instructor spoke clearly.

Figure 1. Final peer observation of teaching form.
During the modified Phases 5 and 6 of the study, two categories initially met consensus yet were ultimately eliminated after group discussion. The Experts decided to eliminate the category of conversing with students before, during, or after class, as doing so may be too difficult for faculty who teach back-to-back classes. In addition, the Experts chose to eliminate the category of being sensitive to differing student views or backgrounds, as they were apprehensive about potentially violating the UNC System’s recent changes to the implementation of diversity, equity, and inclusion on campus (University of North Carolina System, 2024; Table 3).
Similarly, during Phases 5 and 6, one category that had been eliminated earlier in the Delphi study was added back into the POT form after group discussion. Specifically, the Experts agreed that the category of allowing enough time for students to answer questions was valuable and could be included by editing item #12 of the POT form (i.e., “asked questions during the class session and gave time for answers”; Figure 1).
In addition, during Phase 6, the Experts decided to switch from a Likert-based measurement scale to a dichotomous measure. Prior to the pilot test, the Experts had reached consensus on using a Likert-based scale for each item on the POT form; however, after the pilot test, they determined that a Likert-based measurement for each item was too ambiguous. After discussion, the Experts decided that a dichotomous measurement (i.e., no revisions vs. suggested revisions), with space for open-ended comments on each item that needed revisions or deserved commendation, was best suited to providing observees with developmental feedback (Table 1 and Figure 1).
At the end of the study, the final items were arranged on the POT form according to the order in which they would typically be observed during a class session: two items assess the beginning of class, 15 items assess the class session itself, and five items assess the end of the class session (Figure 1). By structuring the POT form around when observers would assess each item during the flow of class, the form offered a more user-friendly experience during an observation session.
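For departments that want to record completed forms electronically, the structure described above (a dichotomous rating with an optional comment per item, grouped into beginning, during, and end of class) can be encoded as a small data model. The Python sketch below is hypothetical: it assumes the 22 items are numbered consecutively in section order (items 1 to 2, 3 to 17, and 18 to 22), and the names `Rating`, `ItemResponse`, and `revisions_by_section` are illustrative rather than part of the published form.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rating(Enum):
    NO_REVISIONS = "No revisions"
    SUGGESTED_REVISIONS = "Suggested revisions"


# Section sizes from the final form; consecutive numbering is an assumption.
SECTIONS = [("Beginning of class", 2), ("During class", 15), ("End of class", 5)]


def section_of(item_number: int) -> str:
    """Map an item number to its section, assuming items are numbered in order."""
    if item_number < 1:
        raise ValueError(f"item numbers start at 1, got {item_number}")
    upper = 0
    for name, size in SECTIONS:
        upper += size
        if item_number <= upper:
            return name
    raise ValueError(f"item {item_number} is outside 1-{upper}")


@dataclass
class ItemResponse:
    item_number: int
    rating: Rating
    comment: Optional[str] = None  # open-ended commendation or suggested revision


def revisions_by_section(responses):
    """Count items marked 'suggested revisions' in each section of the form."""
    counts = {name: 0 for name, _ in SECTIONS}
    for r in responses:
        if r.rating is Rating.SUGGESTED_REVISIONS:
            counts[section_of(r.item_number)] += 1
    return counts
```

A per-section count of suggested revisions is only one possible summary; because the Experts chose the dichotomous scale for developmental feedback, the open-ended comments would likely remain the main content of a post-observation meeting.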
Discussion
This study used a modified Delphi method to create a new POT form from existing POT items found in the peer-reviewed literature, based on the perspectives of experienced college instructors of public health. From a foundation of 657 relevant items found in 33 published POT forms, six phases of surveys, group discussions, and pilot testing resulted in a final POT form for in-person classes consisting of a dichotomous measurement scale and 22 new items. The items represented over 30 categories of established, observable teaching practices.
To the authors’ knowledge, this study is the most comprehensive analysis of common categories found in existing POT forms in higher education. In a systematic review, Otero-Saborido et al. (2024) identified 13 published POT forms used in higher education. Our study expanded upon that review by including 33 forms and by categorizing the POT items into frequently observed teaching practices. The resulting list of POT categories may be useful for faculty in other departments who are creating or adapting POT forms, or for scholars interested in studying POT.
The findings from this study reflect previous research regarding the utility of the Delphi method. Granted, the Delphi method has its share of critiques (Diamond et al., 2014; Keeney et al., 2001; Niederberger & Spranger, 2020); however, it has been widely used, including by two studies that created POT forms in the field of medical education (Frankl et al., 2017; Newman et al., 2009), both of which were included in the first phase of this project. Instructors in higher education may want to consider using the Delphi method, or a modification of it, when creating their department’s own POT form. If a department opts to use the Delphi method, the items from the articles included in this study, or simply the categories listed in this article, may serve as a starting point for faculty to review.
The process of creating a POT form through the Delphi method allowed for instructor buy-in on the items that would be used to evaluate their own teaching practices. Since faculty may view POT with mistrust and anxiety (Cutroni & Paladino, 2023), it is reasonable to expect that involving faculty, as both observers and observees, in deciding which teaching practices should be evaluated could help to improve the trustworthiness of a POT form. As K. L. Harris et al. (2008) stated, “. . .including staff in the decision-making process about how peer review of teaching will be conducted should significantly assist the ‘buy-in’ of staff to the program and, consequently, affect the degree to which they participate effectively and productively” (p. 21). As such, those creating or revising a POT form may want to invite all faculty in a department to provide input on the items used to evaluate their teaching practices.
It is important to note that those who use this study’s POT form may want to adapt its measurement scale. Although the study’s Experts concluded that a dichotomous scale (i.e., no revisions vs. suggested revisions) with optional open-ended comments would best serve observees’ teaching development, that measurement may not be suitable for a department’s summative evaluation of teaching or for faculty members’ promotion dossiers. In those cases, faculty may want to opt for a more nuanced Likert-based scale found in previously published POT forms (e.g., Solid evidence, Partial evidence, None or insufficient evidence [Bryan et al., 2018]; Commendable with no recommendation for improvement, Needs minor improvement, Adequate as the instructor attempted to do this but could use some development or revision, Below adequate levels as development is recommended, Missing as the instructor did not do this or should consider adding [Davis, 2011]; Truly exemplary, Done well, Needs improvement [Gaskamp & Kintner, 2014]).
The results from this study have at least two major implications for future research. First, the POT form resulting from this study would benefit from being tested for validity and reliability. Although the items that served as a basis for this study were published in peer-reviewed journals, the POT form consists of newly created items produced by combining and editing those pre-existing items; thus, like most POT forms (Otero-Saborido et al., 2024), it needs to be tested for validity and reliability. Second, scholars may want to study the impact of the POT form on enhancing teaching practices and improving student learning. There is a clear need in the research literature to examine whether POT leads to changes in the observee (i.e., implementing practices suggested by an observer) or the observer (i.e., trying new teaching practices used by the observee). In addition, research on POT is lacking regarding changes in student outcomes (e.g., assessment scores, improved active learning) that result from changes faculty implement due to POT (Cutroni & Paladino, 2023).
There are several limitations that should be considered when interpreting the findings from this study. First, given the search terms and databases used to retrieve POT forms from the peer-reviewed literature, additional forms may have been unintentionally excluded from the study. Second, although the study’s selection of Experts had strengths, only one of the Experts held an advanced degree in the field of education. Had additional experts in education been included, some POT items that were eliminated might have been retained by those with a more nuanced understanding of the study’s topic. This limitation reflects Keeney and colleagues’ statement in their critical review of Delphi studies: “. . .consensus from a Delphi process does not mean that the correct answer has been found” (Keeney et al., 2001, p. 198). Third, although the Experts’ answers to the survey phases were anonymous, complete anonymity could not be assured, since the Experts knew each other and were able to see other Experts’ collective survey responses and feedback during the in-person group discussions in the study’s final phases. Fourth, the items from the POT forms were initially grouped into categories by the first author alone, and later by the group of Experts; a more robust method (e.g., factor analysis) could have been used to identify meaningful groupings. Fifth, the POT form was developed only by the faculty in the Department of Public Health at our institution; thus, it may not be generalizable and may not reflect the perspectives of faculty in other departments of public health across the country. Sixth, the POT form was designed for in-person classes and may not be relevant to online courses. Finally, the POT form that resulted from this study has yet to be tested for validity or reliability.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Appalachian State University’s Center for Excellence in Teaching and Learning for Student Success’ 2025 Teaching Quality Framework Grant.
