Practice Still Makes Proficient: A Replication of Kershaw et al. (2018, 2023; Kershaw & Gordon, 2024) Using an Alternate Form of the Journal Article Comprehension Assessment

Abstract

Background

Recent research suggests that empirical research comprehension can be improved through structured assignments that distribute deliberate practice opportunities over the semester.

Objective

This recent research administered the same Journal Article Comprehension (JAC) assessment at pretest and posttest to evaluate the structured practice. The current study's aim was to develop an alternate form of the JAC assessment.

Method

Across four semesters, 125 students were assigned one of two forms of the JAC at the pretest and one of two forms at the posttest, so that some students received the same form twice and others received different forms. The forms were matched on question type and answer options.

Results

All groups improved in empirical research comprehension between the pretest and posttest, as shown through the total JAC score. This improvement occurred regardless of which combination of JAC articles they received at the pretest or posttest.

Conclusion

The findings suggest that both JACs are equivalent measures of empirical research comprehension. The current results also replicate past research showing overall improvement, that improvement is not just a practice effect, and that improvement can occur with only a few practice opportunities.

Teaching Implications

The equivalence of the new form of the JAC gives educators an alternate way to assess empirical research comprehension within the context of a semester or to assess the transfer of skills across the curriculum.

Keywords

reading empirical articles, instructional scaffolds, alternate forms, assessment, cognitive psychology

Comprehending and analyzing empirical research articles is an important science process skill for students in multiple STEM fields (Coil et al., 2010). As noted by Goudsouzian and Hsu (2023), educational organizations have made multiple calls to enhance instruction in reading empirical research articles. Being able to comprehend and analyze empirical research articles fits under the American Psychological Association's (APA, 2023) learning goals of exercising scientific reasoning and demonstrating evidence of psychological literacy. A large number of past publications have suggested methods for teaching students how to read empirical research articles. Still, there is a paucity of evidence regarding the assessment of the efficacy of these methods (for a review, see Goudsouzian & Hsu).

Several studies have, however, provided support for instructional methods such as peer tutoring (Van Lacum et al., 2014) and structured assignments (Bachiochi et al., 2011; Sego & Stuart, 2016; Wenk & Tronsky, 2011) in improving students’ comprehension of empirical research articles. Each of these studies used a pretest/posttest design and embedded their assessments within the context of an undergraduate course.

Similarly, my colleagues and I developed an instructional method to scaffold comprehension of empirical research articles, the Reading Worksheet assignment (RW). The RW consists of six standard open-ended questions that ask students about the goals of a research study, its methodology, the main results, the conclusions, a critique of the study, and an idea for future research (Kershaw et al., 2018). RWs have been implemented in undergraduate cognitive psychology courses taught by multiple university professors. The choice of cognitive psychology courses was made because of my background and that of my colleagues, what courses we typically taught, and the role of cognitive psychology in the curricula at our universities as a bridge between statistics and research methods courses. Further, upper-level psychology courses can be effective testbeds for the honing of career-related skills (Halonen & Dunn, 2018) such as reading, interpreting, summarizing, and applying data, which are relevant to career paths that psychology majors often follow, such as in health care, human resources, and business (Appleby, 2018).

Effective administration of this assignment involves a tutorial to introduce the RW, followed by multiple RWs distributed across the semester (Kershaw et al., 2018, 2023; Kershaw & Gordon, 2024), which is done in line with the effective learning strategies of deliberate and distributed practice (cf. Dunlosky et al., 2013). See the Method section and Kershaw (2024) for a full description of the implementation of the RW assignment, its standard questions, and suggestions for articles that have been used for the RWs.

To evaluate the effectiveness of the RW as a learning intervention, we (Kershaw et al., 2018, 2023; Kershaw & Gordon, 2024) assessed students’ comprehension of empirical research articles through a multiple-choice instrument called the Journal Article Comprehension (JAC) assessment, which asks students seven multiple-choice questions about an empirical research article, covering the same topics as the RW (the JAC breaks the method into multiple questions). In past research, we only used one version of the JAC based on Beilock and Carr (2005). Using this JAC, we found that students who completed seven or more RWs improved on the JAC between pretest and posttest, while a control group who only completed one article summary did not improve on the JAC (Kershaw et al., 2018). We also found that completing three or four RWs is as effective as completing 9, 10, or 30 RWs (Kershaw et al., 2023; Kershaw & Gordon).

Although the RW is an effective instructional scaffold across multiple instructors and universities, its associated assessment, the JAC, has only used questions about one article, Beilock and Carr (2005). Giving the same assessment at pretest and posttest has been done in other studies that have found improvement in research comprehension (e.g., Bachiochi et al., 2011; Wenk & Tronsky, 2011), raising the concern of practice effects. Practice effects are extremely common in testing situations. For example, two meta-analyses showed an average improvement of one-quarter of a standard deviation for standardized cognitive ability tests (Hausknecht et al., 2007) and cognitive components of neuropsychological assessments (Calamia et al., 2012). Larger effect sizes were shown for identical test forms (Hausknecht et al.), but alternate forms eliminated practice effects for most tests (Calamia et al.).

The current study's aim was to replicate Kershaw et al.'s (2018, 2023; Kershaw & Gordon, 2024) methodology of assigning multiple RWs between pretest and posttest administrations of the JAC, with the intent of replicating past findings that these practice assignments improve students’ research comprehension. My second goal was to extend my past research by developing an alternate form of the JAC based on a different empirical article. Although we did not find practice effects on the JAC in a control group (Kershaw et al., 2018), an alternate form would help rule out practice effects as an explanation of our past results. It would also give instructors flexibility in choosing an assessment instrument. I hypothesized that the alternate form of the JAC would be just as sensitive as the original form in detecting changes in students’ empirical research article comprehension. I also did not expect changes in comprehension to differ based on which form of the JAC was received.

Method

Participants and Instructional Context

The participants were 182 students enrolled in a face-to-face cognitive psychology course during the Fall 2021 through Spring 2023 semesters. Of the 182 students enrolled, 11 did not consent to share their data; 30 did not complete the assessments at pretest, posttest, or both; and 16 were retaking the course after failing it. Removal of data from these students resulted in a final sample of 125 participants. No demographic data were collected about the participants, but students in this course were generally in their second or third year of study, were all psychology majors, and were taking the course to fulfill a major requirement. At my university, cognitive psychology has statistics as a prerequisite but is taken before a content-based research methods course. Thus, this course is a bridge between statistics and research methods, with one of its skill goals being learning how to comprehend published research. The current sample of participants is from the same course, same university, and has the same instructor as groups in my past research (Kershaw et al., 2018, 2023; Kershaw & Gordon, 2024), thus replicating the sample characteristics and instructional context of my past research.

Students were randomly assigned to one of four combinations of JAC versions at pretest and posttest. This resulted in 25 students receiving the Original JAC (based on Beilock & Carr, 2005) at pretest and posttest, 33 students receiving the Alternate JAC (based on Breslin & Safer, 2011) at pretest and posttest, 33 students receiving the Original JAC at pretest and the Alternate JAC at posttest, and 34 students receiving the Alternate JAC at pretest and the Original JAC at posttest. Group membership is uneven due to the removal of data for not consenting, retaking the course, or not completing the pretest, posttest, or both, as explained above. Kershaw et al. (2018, 2023; Kershaw & Gordon, 2024) only used the Original JAC at pretest and posttest, making the development and administration of the Alternate JAC an extension of my past research.
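The sample accounting and the four-condition design described above can be sketched in a few lines of Python. The article does not say how randomization was carried out, so the block-randomization scheme below is an assumption, and the names `assign_conditions`, `FORMS`, and `CONDITIONS` are hypothetical. Only the counts are taken from the text.

```python
import random
from collections import Counter

# Hypothetical labels for the two forms; the four conditions are every
# pretest/posttest pairing of those forms.
FORMS = ("Original", "Alternate")
CONDITIONS = [(pre, post) for pre in FORMS for post in FORMS]


def assign_conditions(student_ids, seed=None):
    """Randomly assign students to pretest/posttest form pairings.

    Assumed scheme (not stated in the article): shuffle the roster, then
    deal the conditions out in turn, so group sizes differ by at most one
    before any exclusions are applied.
    """
    rng = random.Random(seed)
    roster = list(student_ids)
    rng.shuffle(roster)
    return {sid: CONDITIONS[i % len(CONDITIONS)] for i, sid in enumerate(roster)}


# Sample accounting, using only the counts reported in the text.
enrolled = 182
excluded = {"no consent": 11, "missing pretest/posttest": 30, "retaking course": 16}
final_n = enrolled - sum(excluded.values())

# Group sizes after exclusions, keyed by (pretest form, posttest form).
reported_groups = {
    ("Original", "Original"): 25,
    ("Alternate", "Alternate"): 33,
    ("Original", "Alternate"): 33,
    ("Alternate", "Original"): 34,
}

assignment = assign_conditions(range(enrolled), seed=0)
sizes_before_exclusion = Counter(assignment.values())
```

Under this scheme the groups start out nearly equal in size, which is consistent with the explanation that the uneven final group sizes come from exclusions rather than from the assignment itself.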

Materials

Beilock and Carr (2005) was used as the article for the Original JAC by Kershaw et al. (2018, 2023; Kershaw & Gordon, 2024). I chose Breslin and Safer (2011) for the Alternate JAC following the guidelines I used to originally select Beilock and Carr: a short article, a single study, and related to topics covered in the cognitive psychology course. The two articles address different topics: Breslin and Safer examine autobiographical memory for sporting events, whereas Beilock and Carr examine how individual differences in working memory capacity predict choking under pressure in mathematical problem-solving.

Following item cloning procedures suggested by Clause et al. (1998), I constructed the Alternate JAC by matching answer key options from the Original JAC so that there were the same answer options (fully correct, partially correct, and incorrect) in the same order for each question (both versions of the JAC are in Kershaw, 2024).

Procedure

This study was approved by the University of Massachusetts Dartmouth IRB, protocol 15.005. Replicating the procedure followed by Kershaw et al. (2018, 2023; Kershaw & Gordon, 2024), students completed the pretest JAC within the first three weeks of classes, prior to completing the practice reading worksheet (RW) and having an in-class tutorial about reading research articles (see Kershaw, 2024, for the in-class tutorial slides). When students began the pretest, they read a consent letter and chose whether to share their data beyond the context of the class for research purposes. Between the pretest and posttest, students were assigned five RWs, based on Kershaw et al. (2023) and Kershaw and Gordon's (2024) results about a smaller number of RWs being sufficient to increase research article comprehension. Thus, the current study replicates the use of smaller numbers of RW assignments used by some groups in Kershaw et al. (2023) and Kershaw and Gordon, but differs from groups within these studies who received larger numbers of RW assignments. Please see Kershaw (2024) for the RW questions and research articles assigned for the RWs. Students completed the posttest JAC during finals week, after all RW assignments had been completed, replicating the placement of the posttest in Kershaw et al. and Kershaw and Gordon.

Results

Descriptive statistics for the overall JAC score by article are provided in Figure 1. A 2 × 2 × 2 mixed factorial ANOVA was conducted on the total JAC score, with time (pretest vs. posttest) as a within-subjects variable and two between-subjects variables, JAC at pretest (Original or Alternate) and JAC at posttest (Original or Alternate). There was a main effect of time, such that there was an improvement in the total JAC score between the pretest and posttest, F(1, 121) = 14.13, p < .001, $η_{p}^{2}$ = .11 (see Figure 1). There were no main effects of JAC assignment at pretest, F(1, 121) = 1.49, p = .23, $η_{p}^{2}$ = .01, or posttest, F(1, 121) = 0.25, p = .62, $η_{p}^{2}$ = .002. There were no significant two-way interactions between time and pretest JAC, F(1, 121) = 0.71, p = .40, $η_{p}^{2}$ = .01; time and posttest JAC, F(1, 121) = 0.002, p = .97, $η_{p}^{2}$ < .001; nor pretest JAC and posttest JAC, F(1, 121) = 0.15, p = .70, $η_{p}^{2}$ = .001. Likewise, the three-way interaction was not significant, F(1, 121) = 0.03, p = .86, $η_{p}^{2}$ < .001.
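As a reading aid, partial eta squared can be recovered from a reported F ratio and its degrees of freedom. The sketch below assumes the standard conversion, $η_{p}^{2}$ = (F × df_effect) / (F × df_effect + df_error); the function name is illustrative and is not taken from the original analysis. Values recomputed this way can differ slightly from the reported ones in the last decimal place because the published F ratios are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported above for the two between-subjects main effects,
# each tested on (1, 121) degrees of freedom.
pretest_form = partial_eta_squared(1.49, 1, 121)
posttest_form = partial_eta_squared(0.25, 1, 121)

print(f"pretest JAC:  {pretest_form:.3f}")
print(f"posttest JAC: {posttest_form:.3f}")
```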

Figure 1.

Mean accuracy on the journal article comprehension (JAC) assessment at pretest and posttest by group.

Because there was no interaction between time and JAC assignment, a series of paired-samples t-tests were conducted to examine which components of the JAC improved between the pretest and posttest. As shown in Table 1, students improved in identifying the purpose of the study, its dependent variables, and its potential weaknesses.

Table 1.

Mean Accuracy on Each Question of the Journal Article Comprehension (JAC) Assessment at Pretest and Posttest.

JAC Question	Pretest M (SD)	Posttest M (SD)	t (124)	p	d
Purpose	0.70 (0.42)	0.82 (0.36)	−2.37	.019*	0.21
Participants/Procedure	0.84 (0.32)	0.88 (0.25)	−1.35	.180	0.12
IVs	0.65 (0.35)	0.69 (0.37)	−0.95	.346	0.09
DVs	0.62 (0.33)	0.73 (0.35)	−2.52	.013*	0.23
Results	0.88 (0.31)	0.90 (0.31)	−0.30	.767	0.03
Conclusions	0.92 (0.22)	0.92 (0.21)	0.00	1.000	0.00
Criticisms	0.51 (0.26)	0.64 (0.26)	−4.01	< .001*	0.36

Note. * p < .05. Cohen's d is a measure of effect size. A d value of 0.2 is a small effect size, 0.5 is a medium effect, and 0.8 is a large effect.
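The article does not state which formula was used for Cohen's d. One common convention for paired designs divides the absolute t value by the square root of the number of pairs (often written d_z). The sketch below applies that convention, as an assumption, to the three significant rows of Table 1; the function and variable names are hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value, n_pairs):
    """Paired-design effect size (d_z): |t| divided by sqrt(number of pairs)."""
    return abs(t_value) / math.sqrt(n_pairs)


# Table 1 reports t(124), so there are 125 pretest/posttest pairs.
N_PAIRS = 125

# t values copied from the rows of Table 1 marked as significant.
reported_t = {"Purpose": -2.37, "DVs": -2.52, "Criticisms": -4.01}

effect_sizes = {
    question: round(cohens_d_from_paired_t(t, N_PAIRS), 2)
    for question, t in reported_t.items()
}
```

For these three rows the convention gives values that match the d column of Table 1 to two decimal places, which suggests, but does not confirm, that a comparable calculation was used.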

Discussion

The goal of the current study was to replicate and extend past research by Kershaw et al. (2018, 2023; Kershaw & Gordon, 2024) by developing an alternate form of the JAC assessment that was equivalent to the original form in detecting changes in students’ comprehension of empirical research articles after completing multiple RW assignments during the semester. Between pretest and posttest, all groups improved in research article comprehension, as shown through mean score improvements across time on the JAC. This result replicates findings by Kershaw et al. and Kershaw and Gordon showing that completing multiple RW assignments leads to improvement in research comprehension. Importantly, this improvement occurred regardless of which combination of JAC articles students received at the pretest or posttest. This finding suggests that the Alternate JAC introduced here is an equivalent measure of performance change in research article comprehension to the Original JAC. By developing an alternate form of the JAC, I have addressed concerns that improvements in research article comprehension were due to practice effects, a common effect and concern in psychological testing and assessment (Calamia et al., 2012; Hausknecht et al., 2007). Because using alternate assessment forms eliminates most practice effects (Calamia et al.), the Alternate JAC gives educators a different way to assess empirical research comprehension, whether within the context of a semester or to assess the transfer of skills across the curriculum. The current results further bolster Kershaw et al.'s (2018) finding of no improvements in research comprehension in a control group that did not complete any RWs, again leading to the conclusion that the improvements on the JAC (Original or Alternate) are not due to practice effects.

The current results also replicate past research by Kershaw et al. (2018, 2023) and Kershaw and Gordon (2024), showing overall improvement in JAC scores and, specifically, improved understanding of article purpose, comprehension of dependent variables, and identification of study weaknesses (Kershaw & Gordon). The scores and changes on these particular components of the JAC are comparable to those of my previous research (Kershaw et al., 2018, 2023; Kershaw & Gordon). Further, I conducted a post-hoc power analysis using G*Power 3.1 (Faul et al., 2007) based on the obtained effect size for the main effect of time ($η_{p}^{2}$ = .11), the sample size, the number of groups, number of measurements, and the correlation between the pretest and posttest measures, r(123) = .12. For the current study, the power was 1.00, which is above Cohen's (1988) recommended level of .80. Thus, the current sample size of 125 participants was sufficient for detecting changes in research comprehension and the study is not underpowered.

As shown in my past research (Kershaw et al., 2018, 2023; Kershaw & Gordon, 2024) and previous literature (Wenk & Tronsky, 2011), students show a good understanding of the general conclusions from research, which may explain why there were high scores for the results and conclusions questions for the JAC that did not change over time. Likewise, my past research and past literature (Wenk & Tronsky) have shown some changes in understanding of the methodology, as reflected in changes in comprehension of dependent variables, but not changes in comprehension of independent variables or characteristics of the participants and procedure. While this could suggest that these concepts are difficult or resistant to change, in line with Stoa et al.'s (2022) findings that students rated psychological concepts related to research methodology, such as operational definitions and different study designs, as particularly difficult, there are other methodology concepts that students already understand fairly well. For example, Stoa et al. found that students rated independent and dependent variables less difficult than other methodology-related concepts. In addition, Varela et al. (2005) found that more advanced students did not show changes in comprehending a research article's methodology.

Limitations and Future Directions

The scope of the current study is limited to a single university and cognitive psychology courses. I have adapted the RW assignment for introductory psychology courses by providing more direction with the questions, such as referring to a particular page or paragraph in an empirical article. However, I have not collected empirical data on this more in-depth scaffolding for beginning students, which could be done in future research. Further, I have only examined research article comprehension changes across one semester. Examining change in research article comprehension across the curriculum based on appropriate scaffolds would also support the APA's (2023) distinction between foundation and baccalaureate learning indicators.

One promising finding in the current study was that students showed an improvement in identifying weaknesses of the research, in contrast to some of my past research (Kershaw et al., 2018, 2023) and past literature (Varela et al., 2005; Wenk & Tronsky, 2011). Kershaw and Gordon (2024), similar to the current study, did find an improvement in identifying weaknesses, which may be due to a change in the wording of the RW question from “What criticisms do you have of this research? Is there anything you would do differently?” (Kershaw et al., 2018, p. 927; also used by Kershaw et al., 2023) to “What weaknesses in the design or procedure of this study limit how much you trust its findings? Choose ONE weakness and explain it” (see Kershaw, 2024, for the newest RW template). The new RW question could give students better direction in identifying weaknesses than the previous version. To more fully explore how students’ ability to identify weaknesses changes over the semester, a colleague and I are currently conducting a project in which we are examining changes in the weaknesses students identify in research summaries written as an instructional resource by Millis et al. (2021).

Implications for Instruction

The present results replicate past findings that improvement in research article comprehension is not just a practice effect (Kershaw et al., 2018) and that improvement can occur with a limited number of practice assignments (Kershaw et al., 2023; Kershaw & Gordon, 2024). In the current study, students showed improved research comprehension with five RWs, but past research has shown improvement with as few as three RWs (Kershaw & Gordon). For instructors trying to meet APA (2023) learning goals related to scientific reasoning and psychological literacy, achieving these learning goals in only a few assignments distributed over the semester allows for efficiency and flexibility to meet other goals.

Overall, the current research supports that science process skills (Coil et al., 2010), such as comprehending empirical research articles, can be honed using deliberate and distributed practice (cf. Dunlosky et al., 2013) over the semester. In line with the APA's (2023) learning goals, the instructional approach outlined in this article and its corresponding assessment are meant to support instructors in teaching scientific inquiry and critical thinking skills. These skills can be honed in upper-level psychology courses (Halonen & Dunn, 2018), thereby also helping students to practice career-readiness skills (National Association of Colleges & Employers, 2024) in relation to the APA's (2023) learning goal of professional development. Given the demands on instructors to show that their pedagogical practices are effective, I hope the example provided in this article gives instructors an empirically supported method to improve their students’ understanding of research articles. In addition to this example, I encourage fellow psychology instructors to use the many assessment resources available from sources such as the Society for the Teaching of Psychology's Resources for Teachers of Psychology website or APA's Project Assessment (see references for information for both). Further, I hope this example inspires instructors to use and develop alternate forms of assessments to address various learning goals.

Footnotes

Author Note

Trina C. Kershaw

The data that support these findings, as well as some materials, are openly available at

Preliminary results from this research were presented at the 46th Annual National Institute on the Teaching of Psychology. I thank Heloisa Alves for access to her cognitive psychology course for developing the assessments, Jordan Lippman for his assistance in developing the Original JAC, and Judy Sims-Knight for her advice regarding power analyses. I also thank the editors and anonymous reviewers for their helpful feedback.

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Trina C. Kershaw

Open Practices

For publishing their material, Kershaw received badges for Open Data and Open Materials. The public content may be retrieved from .

References

American Psychological Association. (2017). Project Assessment (PASS). http://pass.apa.org.

American Psychological Association. (2023). APA guidelines for the undergraduate psychology major: Version 3.0. https://www.apa.org/about/policy/undergraduate-psychology-major.pdf.

Appleby, D. C. (2018). Preparing psychology majors to enter the workforce: Then, now, with whom, and how. Teaching of Psychology, 45(1), 14–23. https://doi.org/10.1177/0098628317744944

Bachiochi, Everton, Evans, Fugere, Escoto, Letterman, & Leszczynski (2011). Using empirical article analysis to assess research methods courses. Teaching of Psychology, 38(1), 5–9. https://doi.org/10.1177/0098628310387787

Beilock, S. L., & Carr, T. H. (2005). When high-powered people fail: Working memory and “choking under pressure” in math. Psychological Science, 16(2), 101–105. https://doi.org/10.1111/j.0956-7976.2005.00789.x

Breslin, C. W., & Safer, M. A. (2011). Effects of event valence on long-term memory for two baseball championship games. Psychological Science, 22(11), 1408–1412. https://doi.org/10.1177/0956797611419171

Calamia, Markon, & Tranel (2012). Scoring higher the second time around: Meta-analyses of practice effects in neuropsychological assessment. The Clinical Neuropsychologist, 26(4), 543–570. https://doi.org/10.1080/13854046.2012.680913

Clause, C. S., Mullins, M. E., Nee, M. T., Pulakos, & Schmitt (1998). Parallel test form development: A procedure for alternate predictors and an example. Personnel Psychology, 51(1), 193–208. https://doi.org/10.1111/j.1744-6570.1998.tb00722.x

Cohen (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Erlbaum.

Coil, Wenderoth, M. P., Cunningham, & Dirks (2010). Teaching the process of science: Faculty perceptions and an effective methodology. CBE—Life Sciences Education, 9(4), 524–535. https://doi.org/10.1187/cbe.10-01-0005

Dunlosky, Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58. https://doi.org/10.1177/1529100612453266

Faul, Erdfelder, Lang, A.-G., & Buchner (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146

Goudsouzian, L. K., & Hsu, J. L. (2023). Reading primary scientific literature: Approaches for teaching students in the undergraduate STEM classroom. CBE—Life Sciences Education, 22(3), 1–13. https://doi.org/10.1187/cbe.22-10-0211

Halonen, J. S., & Dunn, D. S. (2018). Embedding career issues in advanced psychology major courses. Teaching of Psychology, 45(1), 41–49. https://doi.org/10.1177/0098628317744967

Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology, 92(2), 373–385. https://doi.org/10.1037/0021-9010.92.2.373

Kershaw, T. C. (2024, July 25). Alternate form JAC project. https://doi.org/10.17605/OSF.IO/C67EB

Kershaw, T. C., Fugate, J. M. B., & O’Hare, A. J. (2023). Teaching undergraduates to understand published research through structured practice in identifying key research concepts. Scholarship of Teaching and Learning in Psychology, 9(2), 216–233. https://doi.org/10.1037/stl0000239

Kershaw, T. C., & Gordon, L. T. (2024). Practice assignments improve research comprehension skills and reinforce concept learning. Scholarship of Teaching and Learning in Psychology, 10(4), 536–551. https://doi.org/10.1037/stl0000337

Kershaw, T. C., Lippman, J. P., & Fugate, J. M. B. (2018). Practice makes proficient: Teaching undergraduate students to understand published research. Instructional Science, 46(6), 921–946. https://doi.org/10.1007/s11251-018-9456-2

Millis, Halpern, Wiemer, & Wallace (2021). Evaluating research summaries. In Society for the Teaching of Psychology, Resources for teachers of psychology. https://teachpsych.org/page-1603066.

National Association of Colleges and Employers. (2024). Competencies for a career-ready workforce. https://naceweb.org/docs/default-source/default-document-library/2024/resources/nace-career-readiness-competencies-revised-apr-2024.pdf?sfvrsn=1e695024_6.

Sego, S. A., & Stuart, A. E. (2016). Learning to read empirical articles in general psychology. Teaching of Psychology, 43(1), 38–42. https://doi.org/10.1177/0098628315620875

Society for the Teaching of Psychology. (n.d.). Resources for teachers of psychology. https://teachpsych.org/page-1603066.

Stoa, Chu, T. L., & Gurung, R. A. R. (2022). Potential potholes: Predicting challenges and learning outcomes in research methods in psychology courses. Teaching of Psychology, 49(1), 21–29. https://doi.org/10.1177/0098628320979881

Van Lacum, E. B., Ossevoort, M. A., & Goedhart, M. J. (2014). A teaching strategy with a focus on argumentation to improve undergraduate students’ ability to read research articles. CBE – Life Sciences Education, 13(2), 253–264. https://doi.org/10.1187/cbe.13-06-0110

Varela, M. F., Lutnesky, M. M. F., & Osgood, M. P. (2005). Assessment of student skills for critiquing published primary scientific literature using a primary trait analysis scale. Microbiology Education, 6(1), 20–27. https://doi.org/10.1128/me.6.1.20-27.2005

Wenk & Tronsky (2011). First-year students benefit from reading primary research articles. Journal of College Science Teaching, 40(4), 60–67. https://sites.hampshire.edu/ctl/files/2014/07/JCST_Wenk.pdf.