Framing Validity Evidence: Revisiting “Reliability and Validity of a Teacher Impressions Scale to Assess Social Play of Swedish Children in Inclusive Preschools”

Abstract

During our tenure as editor-in-chief and editorial assistant, we placed a strong emphasis on how validity was conceptualized in the manuscripts we reviewed. Specifically, we encouraged authors to align their discussions of validity with the widely accepted understanding that it involves evaluating the trustworthiness and appropriateness of the intended uses and interpretations of test scores. In this commentary, we revisit the 2022 article by Mina Sedem, Eva Siljehag, Mara Westling Allodi, and Samuel Odom, titled “Reliability and Validity of a Teacher Impressions Scale to Assess Social Play of Swedish Children in Inclusive Preschools.” We selected this article because it exemplifies how multiple sources of evidence can be used to support claims about the validity of test score interpretations and uses.

Keywords

observation, infant/toddler/preschool, social-emotional behavior, rating scale

As editor of Assessment for Effective Intervention (AEI), it was both a pleasure and privilege to collaborate with numerous authors who entrusted our editorial team with their valuable scholarship. While not all manuscripts we received were ultimately published in AEI, we approached every submission with the utmost respect, recognizing that each represented an important part of someone’s body of scholarly work. Our team, comprising associate editors, the editorial assistant, and the editor-in-chief, deeply valued the opportunity to engage with the authors, learn about their research, and understand the potential impact of that research on the field. Without exception, every manuscript taught us something new, and we remain grateful for the trust the authors placed in our hands.

For those manuscripts not published in AEI, a frequent reason was misalignment with the journal’s aim and scope. As outlined in the author guidelines, AEI is unique in its focus on research related to the development, validation, and use of assessments specifically designed to inform intervention. Over time and across editorial teams, the journal’s aim and scope have held nuanced distinctions. During our tenure, we placed a strong emphasis on the conceptualization of validity. Recognizing the diverse audience AEI serves, we sought to foster cohesion within the field by unifying perspectives on how validity is framed. We emphasized the widely accepted understanding that validity is not a property of the test itself but rather an evaluation of the trustworthiness and appropriateness of the intended uses and interpretations of test scores (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 2014). This perspective underscores the importance of gathering and evaluating multiple sources of evidence to assess an instrument’s suitability for making specific decisions. As described in the Standards for Educational and Psychological Testing, these sources of evidence fall into five categories: test content, response process, internal structure, relations to other variables, and consequences of testing.

In alignment with the journal’s aim and scope, the articles published during our tenure addressed assessment development, validation, and use for intervention across a variety of domains, including academic skills, social-emotional behavior, and mental health. Although all highlight important aspects of assessment for effective intervention, we opted to revisit the 2022 article, “Reliability and Validity of a Teacher Impressions Scale to Assess Social Play of Swedish Children in Inclusive Preschools” by Mina Sedem, Eva Siljehag, Mara Westling Allodi, and Samuel Odom.

This article exemplifies the framing of validity as the evaluation of multiple sources of evidence to examine the trustworthiness and appropriateness of an assessment’s intended uses and interpretations. The article focuses on the Swedish version of the Teacher Impression Scale (TIS-S), a behavioral rating scale designed for use in inclusive preschool settings to help teachers identify children’s need for support in play and social interactions. The authors identified four key claims that needed to be tested before having confidence in the trustworthiness and appropriateness of the results for informing these decisions. These claims included:

The TIS-S scale is highly reliable based on measures of internal consistency including Cronbach’s alpha and item-total correlations.

The scores demonstrate stability over time based on evidence of test-retest reliability using scores collected 1 week apart.

The scores on the TIS-S differentiate between students who have special education needs and typically developing peers.

A principal components analysis (PCA) supports the construct stability of the item structure of the TIS-S.

To provide evidence related to these four claims, the analyses focused on evaluating the internal structure of the scale and gathering criterion-related evidence. Ultimately, the results support their claim that the TIS-S “can help identify children’s need for support in play and social interaction in inclusive preschool settings” (p. 58).
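For readers less familiar with the internal consistency statistics named in the first claim, the following is a minimal sketch of how Cronbach’s alpha and corrected item-total correlations can be computed. It is not the authors’ analysis: the function names and the small ratings table are hypothetical and invented purely for illustration, and the original article’s software and procedures are not reproduced here.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of ratings.

    rows: list of lists, one inner list per child, one value per item.
    """
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def corrected_item_total(rows):
    """Correlate each item with the total of the remaining items."""
    items = list(zip(*rows))
    totals = [sum(r) for r in rows]
    return [
        pearson(col, [t - v for t, v in zip(totals, col)])
        for col in items
    ]


# Hypothetical ratings (5 children x 4 items); NOT data from the article.
demo_ratings = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
]

print("alpha:", round(cronbach_alpha(demo_ratings), 3))
print("item-total r:", [round(r, 3) for r in corrected_item_total(demo_ratings)])
```

Higher values of alpha and of the item-total correlations indicate that the items vary together, which is the sense in which these statistics bear on evidence about internal structure.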

In their examination of the accumulating evidence, the authors considered multiple sources of evidence and critically examined practical constraints to some existing sources. For example, they explored the advantages and limitations of inter-observer agreement within the context of validity evidence for a teacher observational rating system. The authors did not evaluate inter-rater agreement at the item level. The authors stated,

The TIS is a rating scale based on observations and differs from a time-based, discrete categorical observational system that generates frequencies of behavior or percentages of intervals in which a behavior occurs. For the latter system, detailed measurement of inter-rater agreement is more important. (p. 59)

The authors go on to recommend that future research should assess inter-rater agreement to supplement the existing evidence.

They highlighted the importance of a broader research agenda to fully evaluate the validity of the uses and interpretations of the TIS-S. The TIS-S builds on an existing measure called the TIS, which was developed to evaluate a “Play Time/Social Time” intervention designed to support students with and without disabilities through peer-mediated play. Their findings supported the intended uses and interpretations of the instrument for guiding teachers’ intervention decisions. Broadly speaking, the article demonstrates how multiple sources of validity evidence support teachers in using observational ratings to inform instruction and targeted interventions. The authors make a clear connection to informing instruction, stating that the scores from the measure support the identification of children’s strengths and weaknesses and may help teachers identify supports that could be provided at an early stage. In addition, the measure is “user friendly” (p. 59) and does not require extensive rater training.

Notably, this manuscript illustrates the value of assessments that emphasize children’s strengths. Specifically, the authors write,

Results indicate that all items in the instrument are positively formulated, in the sense that they describe children’s normally occurring social behaviors and strengths. The formulations do not evoke difficulties and do not induce the teachers to observe shortcomings; on the contrary, the teachers are made aware of which behaviors could be encouraged and thus occur more often among children in social play situations. (p. 58)

Unlike assessments that may unintentionally emphasize deficits, the TIS-S is explicitly designed to spotlight positive social behaviors children exhibit with their peers. This strength-based approach aligns with the instrument’s purpose: to guide decisions to support children’s play and social interactions. By focusing on strengths, the TIS-S facilitates teachers’ ability to design effective interventions and strengthen social competence and play between children in a way that builds on children’s existing capabilities.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Leanne R. Ketterlin-Geller

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.

Sedem, M., Siljehag, E., Allodi, M. W., & Odom, S. L. (2022). Reliability and validity of a teacher impressions scale to assess social play of Swedish children in inclusive preschools. Assessment for Effective Intervention, 48(1), 52–61. https://doi.org/10.1177/15345084221100416