Abstract
Increasing numbers of children are presenting to paediatric hospital settings in mental health crisis. Typically, non-mental health professionals are responsible for the initial assessment of these children and are required to identify immediate physical and emotional health needs. To ensure the safety of these children, immediate risk of suicide and self-harm should be assessed. However, no standardized assessment tool is used in clinical practice, and for those tools that are used, their validity and reliability are unclear. A scoping review was conducted to identify the existing assessment tools of immediate self-harm and suicide risk. Searches of electronic databases and relevant reference lists were undertaken. Twenty-two tools were identified and most assessed acute risk of suicide, with only four tools incorporating a self-harm assessment. The tools varied in number of items (4–146), subscales (0–11) and total scores (16–192). Half incorporated Likert-type scales, and most were completed via self-report. Many tools were subject to limited psychometric testing, and no single tool was valid or reliable for use with children presenting in mental health crisis to non-mental health settings. As such, a clinically appropriate, valid and reliable tool that assesses immediate risk of self-harm and suicide in paediatric settings should be developed.
Introduction
Globally, the prevalence of mental health problems in children and young people (CYP) is increasing (Merikangas et al., 2009), with estimates of up to 10% of all CYP being clinically diagnosable (Green et al., 2005). These statistics represent a spectrum of conditions, requiring different levels of healthcare across primary and secondary care service settings.
At the acute end of this spectrum are CYP experiencing mental health crisis. This is defined as a psychiatric emergency involving ‘an acute disruption of psychological homeostasis whereby usual coping mechanisms fail and distress and functional impairment’ results (Lewis and Roberts, 2001). This may include extreme anxiety or panic attacks; psychotic episodes (including delusions, hallucinations, paranoia or hearing voices); hypomania or mania; other behaviours that feel out of control; and acts of suicide or self-harm (Mind, 2013).
There are varying definitions of both self-harm and suicide in the literature. However, for the purpose of this article, self-harm has been defined as an act with nonfatal outcome, in which an individual, irrespective of motivation, initiates a non-habitual behaviour that, without intervention from others, will cause self-harm, or deliberately ingests a substance in excess of the prescribed or generally recognized therapeutic dosage, and which is aimed at realizing desired changes (National Institute for Health and Clinical Excellence (NICE), 2013; Schmidtke et al., 1996).
Internationally, emergency department (ED) attendance for self-harm and suicidal behaviour is high (Bethell et al., 2013; Rhodes et al., 2014), with literature indicating that CYP presenting to the ED due to self-harm are likely to present again within the months following a previous presentation (Bennardi et al., 2016; Hulten et al., 2001). Suicide remains prevalent in CYP, with prevalence rates of 5.3 per 100,000 in 15–19-year-olds in the United Kingdom (Office of National Statistics, 2016). Moreover, 145 suicides of young people under the age of 20 were reported in England between 2014 and 2015 (Rodway et al., 2016). Mental health crisis is the primary cause of approximately 5% of ED attendances (Parsonage et al., 2012) with the most prevalent presenting conditions being self-harm or suicide behaviours. Moreover, in CYP aged 10–19 years in England, suicide prevalence is 4.3 per 100,000, and self-harm is 435.95 per 100,000, with repeat ED attendance becoming increasingly commonplace (Hawton et al., 2012).
In the United Kingdom, for CYP presenting to hospital in mental health crisis, initial assessment is often undertaken by non-mental health professionals (paediatricians or children’s nurses) within ED and paediatric ward settings (Anderson and Standen, 2007). This assessment aims to address immediate physical health needs (Olfson et al., 2005) and identify immediate risks to CYP’s safety while they await expert assessment by specialist mental health professionals.
Evidence suggests that risk assessments are no more accurate at predicting risk than expert specialist mental health professional clinical judgement in non-acute psychiatric outpatients (Quinlivan et al., 2017). However, acute paediatric care settings present specific differences in utility, focus and context that make the application of an assessment of suicide and self-harm unique. For example, the assessment is usually made by non-mental health experts who may lack specialist knowledge and experience to inform clinical decisions (Crawford et al., 2003). Furthermore, the focus of these assessments is to assess any immediate (i.e. hours or days) risks of self-harm or suicide while in receipt of acute paediatric care. Additionally, assessments are performed in time-limited circumstances with CYP with potentially dynamic and fluctuating mental health. Therefore, to enable implementation of a plan of care where immediate risks can be mitigated, healthcare professionals require appropriate support and guidance to inform their assessment. In the United Kingdom, the NICE (2004) guidelines advocate that CYP who self-harm should be assessed for risk. This assessment should identify the psychiatric illness and its relationship to self-harm, assess personal and social context and any specific factors predicting self-harm, and recognize any significant relationships that may be supportive or represent a threat. Such an assessment would need to consider the relatively immediate risk of self-harm or suicide in order to make time critical risk management decisions. Moreover, it would need to consider the developmental age of the CYP as children can often find verbal expression difficult, especially when in emotional distress (Vatne et al., 2010). 
Furthermore, the risk assessment should include assessment of previous ED presentations as this represents one of the strongest predictors of future ED repetitions across age and gender in young people (Bennardi et al., 2016; Hawton and James, 2005). Currently, however, there is no standardized assessment tool utilized in clinical practice in the United Kingdom, and for those that are used, their validity, reliability and acceptability remain questionable.
Aim
There is a need for a scoping review exploring the breadth and psychometric properties of existing assessment tools of immediate risk of self-harm and suicide in CYP. The aim of this review is to scope the literature for such tools and to synthesize their characteristics and psychometric properties.
Method
A scoping review method adhering to a published framework (Arksey and O’Malley, 2005) was employed to guide evidence identification, data charting, collating, summarizing and reporting. Scoping reviews offer a transparent and systematic approach to reviewing literature and are particularly useful in research areas with emerging evidence bases and where the research questions go beyond intervention effectiveness (Arksey and O’Malley, 2005; Levac et al., 2010).
This scoping review employed a sequential two-phase approach. Phase 1 identified the assessment tools from the published literature. Phase 2 identified the psychometric testing papers for each assessment tool. Phases 1 and 2 involved searching four online databases (PubMed, MEDLINE, EMBASE and PsycINFO) and reference lists of included papers.
Search strategy
Phase 1 searches were conducted in November 2016. Predefined search terms and Boolean phrasing were used to identify assessment tools of immediate risk of self-harm or suicide (see Table 1).
Table 1. Phase 1 search terms.
Phase 2 was conducted in May 2017. The assessment tool names identified from the phase 1 search were used to search each online bibliographic database to identify the psychometric testing papers for each assessment tool.
For both phases, the searches were saved and the references extracted into a reference management package (Mendeley™) for duplicate removal, followed by abstract and full-text eligibility screening.
Eligibility criteria
Inclusion criteria were (1) an assessment, scale or measure that assesses immediate suicide/self-harm risk; (2) validity/reliability testing of the assessment with CYP (aged 1–18 years); (3) English language publication; (4) full text accessible; (5) peer reviewed journal publication.
Exclusion criteria were (1) validity/reliability tested in adults only; (2) reported only in books/commentaries; (3) assessment is a subscale only; (4) assessment is a structured interview; (5) accessible as abstract only; (6) unpublished/grey literature; (7) the assessment tool is a screener of previous behaviour as opposed to an assessment tool of potential future behaviour.
Screening, data extraction and analysis
For the products of search phases 1 and 2, each abstract and full text was screened for eligibility by two reviewers independently. Following identification of eligible full texts, data were charted, collated and summarized using the approach outlined by Arksey and O’Malley (2005). This involved one researcher (GMW) extracting data pertaining to the characteristics (including focus of assessment, number of items, target population, completion and response formats) and psychometric properties (specifically reliability and validity) of the assessment tools into a table with predefined headings to ensure standardization of included data. Two researchers (TC, JCM) then agreed suitability and checked the extracted data for accuracy. These charted data were then collated and narratively summarized in relation to the risk assessment tool characteristics, and then their psychometric properties.
Findings
Phase 1 searches revealed 22 eligible full-text articles through which 26 risk assessment tools were identified. From these, 20 assessment tools met the eligibility criteria with reasons for exclusion shown in Figure 1. Phase 2 searches revealed two further assessment tools which met the eligibility criteria and were subsequently included in the review. The phase 2 searches also identified 62 papers that tested the reliability and validity of the 22 assessment tools (see Figure 2).

Figure 1. Eligibility flow diagram for phase 1 search.

Figure 2. Eligibility flow diagram for phase 2 search.
Overview of risk assessment tool characteristics
Please see Table 2 for the characteristics of each separate risk assessment tool. Most assessment tools assessed immediate risk of suicide only (18/22; 81.8%), with the remainder (4/22; 18.2%) incorporating a limited number of self-harm questions (Angelkovska, 2014; Horowitz et al., 2001; Pfeffer, 1986; Reynolds, 1990). The completion format for most of the assessment tools was self-report (13/22; 59.1%) (Conrad et al., 2009; Cotton and Range, 1996; Cull and Gill, 1982; Horowitz et al., 2001, 2012; Miller et al., 1986; Osman et al., 1998; Pfeffer et al., 2000; Plutchik et al., 1989; Range and Lewis, 1992; Reynolds, 1987a, 1987b; Shaffer et al., 2004), with the remainder being clinician-report (7/22; 31.8%) (Beck et al., 1974; Larzelere et al., 2004; Orbach et al., 1984, 1991; Pfeffer, 1986; Posner et al., 2011; Reynolds, 1990); parent report (1/22; 4.5%) (Angelkovska, 2014); or included provision for self, parent or clinician report (1/22; 4.5%) (Flamarique et al., 2016).
Table 2. Assessment tool/scale characteristics.
The assessment tools varied in relation to the number of items/questions (range: 4–146); subscales (range: 0–11); and maximum total score (range: 16–192), with fewer than half of the assessment tools not reporting total scores (9/22; 40.9%) (Angelkovska, 2014; Conrad et al., 2009; Cull and Gill, 1982; Horowitz et al., 2001; Larzelere et al., 2004; Plutchik et al., 1989; Range and Lewis, 1992; Reynolds, 1990; Shaffer et al., 2004). The assessment tools varied in response format with a mixture of Likert-type scale only (11/22; 50%) (Beck et al., 1974; Cotton and Range, 1996; Cull and Gill, 1982; Flamarique et al., 2016; Miller et al., 1986; Orbach et al., 1984, 1991; Osman et al., 1998; Range and Lewis, 1992; Reynolds, 1987a, 1987b); binary only (6/22; 27.3%) (Conrad et al., 2009; Horowitz et al., 2001, 2012; Larzelere et al., 2004; Pfeffer et al., 2000; Plutchik et al., 1989); mixed response (4/22; 18.2%) (Angelkovska, 2014; Pfeffer, 1986; Posner et al., 2011; Reynolds, 1990); and visual analogue scales (1/22; 4.5%) (Shaffer et al., 2004).
All included assessment tools were psychometrically tested in at least one subsequent testing paper. The Columbia Suicide Severity Rating Scale (C-SSRS) (Posner et al., 2011) was the most rigorously studied, with 11 subsequent psychometric testing papers (Atkinson et al., 2014; Emslie et al., 2014, 2015; Findling et al., 2013; Flamarique et al., 2016; Horwitz et al., 2015; Kerr et al., 2014; King et al., 2015; Knafo et al., 2015; Mirkovic et al., 2015; Posner et al., 2011).
Overview of psychometric testing
Psychometric testing across the assessment tools was undertaken on mixed ethnicities and populations aged 5–19 years (Online Supplemental Tables 3 and 4). It was also undertaken across various settings, including inpatient hospitals (14/22; 63.6%) (Cotton and Range, 1996; Eltz et al., 2007; Fennig et al., 2005; Ferrara et al., 2012; Grilo et al., 1999; Gutierrez et al., 2000; Knafo et al., 2015; Koutek et al., 2016; Mcnicholas, 2011; Mieczkowski et al., 1993; Mirkovic et al., 2015; Morano et al., 1993; Ofek et al., 1998; Orbach et al., 1984, 1991; Osman et al., 1994, 2000; Pettit et al., 2009; Pfeffer et al., 2000; Posner et al., 2011; Range and Lewis, 1992; Romanowicz et al., 2013; Schwartz-Stav et al., 2006; Shaunesey et al., 1993; Spirito et al., 1987, 1996); schools (13/22; 59.1%) (Allison et al., 1995; Angelkovska, 2014; Cho et al., 2008; Cotton and Range, 1996; Davis, 1992; Jia et al., 2015; Labelle et al., 2015; Lee, 2011; Mazza, 2000; Mazza and Reynolds, 1999; Miranda et al., 2014; Orbach et al., 1984, 1991; Osman et al., 1994, 1998; Pfeffer et al., 2000; Range and Lewis, 1992; Reynolds, 1990; Reynolds and Mazza, 2001; Shaffer et al., 2004; Wong, 2004); universities (1/22; 4.5%) (Osman et al., 1993); outpatient departments (9/22; 40.9%) (Angelkovska, 2014; Atkinson et al., 2014; Emslie et al., 2014, 2015; Findling et al., 2013; Flamarique et al., 2016; King et al., 1997, 2014; Labelle et al., 2015; Orbach et al., 1984; Range and Lewis, 1992; Rosenberg et al., 2006; Storch et al., 2014); EDs (4/22; 18.2%) (Horowitz et al., 2001, 2012; Horwitz et al., 2015; King et al., 2015; Stanley et al., 2013); non-hospital community settings (3/22; 13.6%) (Angelkovska et al., 2012; Gutierrez, 1999; Kerr et al., 2014; Zhang et al., 2014); detention centres (1/22; 4.5%) (Stathis et al., 2008); foster care settings (1/22; 4.5%) (Larzelere et al., 2004); and residential facility/home settings (3/22; 13.6%) (Badura Brack et al., 2012; Larzelere et al., 2004; Larzelere et al., 1996).
Most psychometric testing papers were undertaken in English-speaking populations in the United States (45/62; 72.6%). Several tools were tested in non-English language translation, as follows: Hebrew (4/22; 18.2%) (Fennig et al., 2005; Ofek et al., 1998; Orbach et al., 1991; Schwartz-Stav et al., 2006); Chinese (3/22; 13.6%) (Jia et al., 2015; Wong, 2004; Zhang et al., 2014); Korean (2/22; 9.1%) (Cho et al., 2008; Lee, 2011); French (4/22; 18.2%) (Flamarique et al., 2016; Knafo et al., 2015; Labelle et al., 2015; Mirkovic et al., 2015); German (1/22; 4.5%) (Flamarique et al., 2016); Dutch (1/22; 4.5%) (Flamarique et al., 2016); Italian (2/22; 9.1%) (Ferrara et al., 2012; Flamarique et al., 2016); and Spanish (1/22; 4.5%) (Flamarique et al., 2016).
Face validity
Face validity was tested with varying degrees of rigour for five (22.7%) assessment tools (Flamarique et al., 2016; Larzelere et al., 2004; Mieczkowski et al., 1993; Pfeffer et al., 2000; Range and Lewis, 1992). The Suicide Intent Scale (Mieczkowski et al., 1993) reported face validity without description of method or outcome. The Life Orientation Inventory (LOI) items were reviewed by psychologists and previously suicidal individuals (Range and Lewis, 1992). The Child Suicide Risk Assessment (CSRA) (Larzelere et al., 2004) items were reviewed by children, and it was found that 85% of the items were understood well. The Suicidality Treatment Occurring Paediatrics – Suicidality Assessment Scale (STOP-SAS) (Flamarique et al., 2016) reported child feedback on item comprehension and problems differentiating items; consequently, re-wording, sentence shortening and children’s suggested examples were incorporated into the scale. The Child-Adolescent Suicidal Potential Index (Pfeffer et al., 2000) was reviewed by psychiatric professionals, leading to revision of instructions, items and response formats. Children’s suggested changes to wording to aid comprehension were also implemented.
Predictive validity
Predictive validity was tested for 19 assessment tools (86.4%). Methods of predictive validity were as follows: first, assessment score correlations with actual events (such as past, present or future suicide/self-harm thoughts or behaviours; 14/22; 63.6%) (Eltz et al., 2007; Fennig et al., 2005; Ferrara et al., 2012; Flamarique et al., 2016; Gutierrez et al., 2000; Horwitz et al., 2015; Kerr et al., 2014; King et al., 2014, 2015; Koutek et al., 2016; Larzelere et al., 2004; Larzelere et al., 1996; Mieczkowski et al., 1993; Miranda et al., 2014; Osman et al., 2000; Pfeffer et al., 2000; Posner et al., 2011; Reynolds, 1990; Zhang et al., 2014); second, sensitivity and specificity (13/22; 59.1%) (Flamarique et al., 2016; Gutierrez et al., 2000; Horowitz et al., 2001, 2012; King et al., 2015; Larzelere et al., 2004; Larzelere et al., 1996; Osman et al., 1994, 2000; Pfeffer et al., 2000; Posner et al., 2011; Shaffer et al., 2004; Stathis et al., 2008; Zhang et al., 2014); and third, the proportion of positive and negative findings that were true positive and true negative results, that is, positive predictive value (PPV) and negative predictive value (NPV; 8/22; 36.4%) (Horowitz et al., 2001, 2012; Koutek et al., 2016; Larzelere et al., 2004; Larzelere et al., 1996; Shaffer et al., 2004; Zhang et al., 2014). The C-SSRS (Posner et al., 2011) had predictive validity most rigorously tested (four psychometric testing papers) (Horwitz et al., 2015; Kerr et al., 2014; King et al., 2015; Posner et al., 2011).
Eight assessment tools (36.4%) consistently predicted suicide/self-harm events (Eltz et al., 2007; Flamarique et al., 2016; Grilo et al., 1999; Gutierrez et al., 2000; Koutek et al., 2016; Larzelere et al., 1996; Miranda et al., 2014; Pfeffer et al., 2000; Reynolds, 1990; Zhang et al., 2014); five (22.7%) predicted suicide/self-harm variably (Fennig et al., 2005; Ferrara et al., 2012; Horwitz et al., 2015; Kerr et al., 2014; King et al., 2014, 2015; Larzelere et al., 2004; Osman et al., 2000; Posner et al., 2011; Zhang et al., 2014); and one (4.5%) did not predict suicide/self-harm (Mieczkowski et al., 1993).
Sensitivity and specificity testing across the studies revealed substantial variability, suggesting that although these scales were able to identify those at risk, they were also likely to classify some individuals as at risk when they were not. The C-SSRS, the Suicidal Ideation Questionnaire – Junior Version and the Suicidal Ideation Questionnaire had the highest sensitivity ratings, suggesting they are the most likely to identify those at risk of engaging in suicidal or self-harming behaviour.
Total-item PPVs were calculated for 6/22 (27.3%) assessment tools (Cull and Gill, 1982; Horowitz et al., 2012; Larzelere et al., 2004; Reynolds, 1987a, 1987b; Shaffer et al., 2004) and NPVs were calculated for 3/22 (13.6%) assessment tools (Horowitz et al., 2012; Larzelere et al., 2004; Shaffer et al., 2004). Total-item PPVs were variable across studies (range: 8.8–71.3%) (Horowitz et al., 2012; Larzelere et al., 2004; Larzelere et al., 1996; Shaffer et al., 2004; Zhang et al., 2014) as were total-item NPVs (range: 13.6–99.7%) (Horowitz et al., 2012; Larzelere et al., 2004; Shaffer et al., 2004). The Suicide Probability Scale (SPS) (Cull and Gill, 1982) had the lowest PPV (Larzelere et al., 1996) and the CSRA (Larzelere et al., 2004) had the lowest NPV. The Adolescent Suicide Questionnaire (Horowitz et al., 2012) had the highest PPV and NPV.
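To illustrate how the four predictive validity statistics reported above relate to one another, the following minimal Python sketch derives sensitivity, specificity, PPV and NPV from a 2 × 2 classification table. The function name and the counts are hypothetical and are not taken from any study in this review; they were chosen only to show how a tool can combine high sensitivity with a low PPV when the outcome is rare.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts.

    tp: at-risk CYP flagged by the tool      fn: at-risk CYP missed by the tool
    fp: not-at-risk CYP flagged by the tool  tn: not-at-risk CYP not flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of at-risk cases detected
        "specificity": tn / (tn + fp),  # proportion of non-cases correctly cleared
        "ppv": tp / (tp + fp),          # proportion of positive results that are true
        "npv": tn / (tn + fn),          # proportion of negative results that are true
    }


# Hypothetical counts for illustration only (1,000 screened, 20 truly at risk).
metrics = diagnostic_metrics(tp=18, fp=82, fn=2, tn=898)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented example the tool detects 18 of 20 at-risk individuals, yet fewer than one in five positive results is a true positive, which mirrors the pattern of over-classification described in the text.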
Convergent validity
Convergent validity, that is, the degree to which two measures of constructs that should theoretically be related do in fact correlate, was tested for 19 (86.4%) assessment tools, all of which tested total-item convergent validity. Subscale convergent validity was tested in 10/22 (45.5%) of the assessment tools (Beck et al., 1974; Cotton and Range, 1996; Cull and Gill, 1982; Horowitz et al., 2001; Larzelere et al., 2004; Orbach et al., 1991; Osman et al., 1998; Pfeffer, 1986; Pfeffer et al., 2000; Posner et al., 2011). Correlations between assessments and construct measures were variable. Five assessment tools (22.7%) failed to demonstrate significant correlations between all subscales and construct measures (Cotton and Range, 1996; Ferrara et al., 2012; Gutierrez, 1999; Mieczkowski et al., 1993; Ofek et al., 1998; Orbach et al., 1991; Osman et al., 1994, 2000; Rosenberg et al., 2006; Spirito et al., 1996). Furthermore, four assessment tools (18.2%) failed to correlate total-item scores with some construct measures (Grilo et al., 1999; Pettit et al., 2009; Rosenberg et al., 2006; Storch et al., 2014). The Multi-Attitude Suicide Tendency (MAST) scale had convergent validity most rigorously tested, in six psychometric testing papers (Ferrara et al., 2012; Gutierrez, 1999; Orbach et al., 1991; Osman et al., 1994, 2000; Wong, 2004).
Discriminant validity (between groups)
Discriminant validity was tested for 20 (90.9%) assessment tools, of which 16 (72.7%) tested total-item discriminant validity (Angelkovska, 2014; Beck et al., 1974; Conrad et al., 2009; Cotton and Range, 1996; Cull and Gill, 1982; Horowitz et al., 2012; Larzelere et al., 2004; Orbach et al., 1991; Osman et al., 1998; Pfeffer et al., 2000; Plutchik et al., 1989; Posner et al., 2011; Range and Lewis, 1992; Reynolds, 1987a, 1987b; Reynolds, 1990), and 9 (40.9%) tested subscale discriminant validity (Conrad et al., 2009; Cull and Gill, 1982; Miller et al., 1986; Orbach et al., 1984, 1991; Osman et al., 1998; Pfeffer et al., 2000; Posner et al., 2011; Reynolds, 1990). Numerous demographic and characteristic domains were also tested, and some assessment tools were consistently able to discriminate between age (2/22; 9.1%) (Pfeffer et al., 2000; Reynolds, 1990); gender (2/22; 9.1%) (Horwitz et al., 2015; Pfeffer et al., 2000); psychiatric diagnosis (6/22; 27.3%) (Knafo et al., 2015; Mazza, 2000; Range and Lewis, 1992; Schwartz-Stav et al., 2006; Spirito et al., 1987, 1996); suicide and self-harm status (10/22; 45.5%) (Grilo et al., 1999; Gutierrez et al., 2000; Horwitz et al., 2015; King et al., 2015; Larzelere et al., 2004; Lee, 2011; Morano et al., 1993; Osman et al., 1998; Range and Lewis, 1992; Reynolds, 1990; Romanowicz et al., 2013; Shaunesey et al., 1993; Spirito et al., 1987, 1996); physical illness status (2/22; 9.1%) (Angelkovska, 2014; Spirito et al., 1996); accidental injury (1/22; 4.5%) (Rosenberg et al., 2006); and family history of suicide (1/22; 4.5%) (Romanowicz et al., 2013).
Some assessment tools consistently failed to discriminate for age (3/22; 13.6%) (Romanowicz et al., 2013; Spirito et al., 1987, 1996; Zhang et al., 2014); gender (3/22; 13.6%) (Allison et al., 1995; Grilo et al., 1999; Spirito et al., 1996); psychiatric diagnosis (2/22; 9.1%) (Grilo et al., 1999; Rosenberg et al., 2006); and history of abuse (1/22; 4.5%) (Grilo et al., 1999). The C-SSRS (Posner et al., 2011) was most rigorously tested for discriminant validity (eight psychometric testing papers) (Atkinson et al., 2014; Emslie et al., 2014, 2015; Findling et al., 2013; Horwitz et al., 2015; King et al., 2015; Knafo et al., 2015; Mirkovic et al., 2015).
Internal consistency
Internal consistency was tested for 17/22 (77.3%) assessment tools (Angelkovska, 2014; Beck et al., 1974; Cull and Gill, 1982; Flamarique et al., 2016; Horowitz et al., 2012; Larzelere et al., 2004; Miller et al., 1986; Orbach et al., 1991; Osman et al., 1998; Pfeffer et al., 2000; Pfeffer, 1986; Plutchik et al., 1989; Posner et al., 2011; Range and Lewis, 1992; Reynolds, 1987a, 1987b; Reynolds, 1990). Total-item internal consistency (range: α = .60–.99) was higher and less variable overall than subscale internal consistency (range: α = .38–.95). Taken as a whole, therefore, the scales demonstrated better and more stable internal consistency than their individual subscales. The Suicidal Ideation Questionnaire – Junior Version (SIQ-JR) (Reynolds, 1987a) achieved the highest internal consistency (α = .99) (Gutierrez, 1999). The MAST scale (Orbach et al., 1991) was most rigorously tested for internal consistency (five psychometric testing papers) (Gutierrez, 1999; Orbach et al., 1991; Osman et al., 1994, 2000; Wong, 2004).
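For readers less familiar with the α coefficient reported above, the following Python sketch shows one common way of computing Cronbach's alpha from item-level responses: α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical and do not come from any tool in this review; the included studies may have used different software or variants of the statistic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five respondents answering a three-item Likert-type scale.
example_responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(example_responses)
print(f"alpha = {alpha:.2f}")
```

Because α depends on the number of items as well as on how strongly they covary, short subscales tend to yield lower values than the full scale, which is consistent with the lower subscale range reported above.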
Inter-rater reliability
Inter-rater reliability was tested for 4/22 (18.2%) assessment tools (Flamarique et al., 2016; Pfeffer, 1986; Posner et al., 2011; Reynolds, 1990). These assessment tools were subjected to total-item inter-rater reliability (2/22; 9.1%) (Flamarique et al., 2016; Reynolds, 1990) and subscale inter-rater reliability (2/22; 9.1%) (Fennig et al., 2005; Kerr et al., 2014; Ofek et al., 1998). Total-item inter-rater analyses revealed variable correlations (range: .47–.99) as did the subscale inter-rater analyses (range: .40–.97). The Suicide Behaviour Interview had the highest inter-rater reliability (intraclass correlation coefficient = .99) (Reynolds, 1990) and the STOP-SAS had the lowest (r = .47) (Flamarique et al., 2016). The Child-Adolescent Suicide Potential Scale (Pfeffer, 1986) had inter-rater reliability most rigorously tested (two psychometric testing papers) (Fennig et al., 2005; Ofek et al., 1998).
Test-retest reliability
Test-retest reliability was tested for 7/22 (31.8%) assessment tools, demonstrating variable reliability (range: r = .32–.92) (Cull and Gill, 1982; Orbach et al., 1984; Pfeffer, 1986; Pfeffer et al., 2000; Range and Lewis, 1992; Reynolds, 1987a; Shaffer et al., 2004). Three (13.6%) assessment tools reported subscale test-retest reliability, with less variability (r = .39–.78), suggesting these scales have some ability to remain consistent over time (Ofek et al., 1998; Orbach et al., 1984; Pfeffer et al., 2000). The SPS and LOI (r = .92) (Larzelere et al., 1996; Range and Lewis, 1992) had the highest test-retest reliability and the Columbia Suicide Screen had the lowest (r = .32) (Shaffer et al., 2004). One assessment tool (1/22; 4.5%), the Fairy Tales Test (Orbach et al., 1984), failed to achieve test-retest reliability for all questions.
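As a brief illustration of the r values reported above, test-retest reliability is commonly estimated as the Pearson correlation between total scores from the same respondents at two time points. The sketch below assumes that approach; the function name and the scores are hypothetical, and individual studies in this review may have used other coefficients or retest intervals.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    spread_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    return covariance / (spread_first * spread_second)


# Hypothetical total scores for five respondents at baseline and at retest.
time_1 = [10, 14, 8, 20, 16]
time_2 = [12, 13, 9, 18, 17]
retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {retest_r:.2f}")
```

A high r indicates that respondents keep a similar rank order across administrations; it does not by itself show that absolute scores are unchanged, which matters when risk is expected to fluctuate over short periods.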
Discussion
The assessment tools included in this review varied in length, response and scoring format, age ranges and degree of psychometric testing. Most assessments were tested across broad age ranges, and may be criticized as lacking developmental sensitivity. The SIQ and the SIQ-JR, however, were exceptions, having undergone age-based revisions/adaptations. Some measures of suicide risk incorporated risk items relating to self-harm. No measure assessed risk of self-harm in isolation. Most assessment tools were tested only in the United States and primarily with inpatients, in contrast to cross-cultural psychometric guidelines (Beaton et al., 2000). Few papers reported language translations and none reported cultural adaptations. Most assessment tools were originally developed in the English language, but few reported psychometric testing in UK populations, suggesting limited applicability in acute paediatric settings in this region. As such, it is understandable that UK guidelines do not promote the use of any one assessment tool to safely manage immediate risk of self-harm or suicide to inform clinical decisions in acute paediatric settings (Horowitz et al., 2014).
Across the included tools, internal consistency and test-retest reliability were generally moderate to good, suggesting, respectively, that many tools are constructed of items that are likely to measure the same construct (i.e. risk of suicide) and that the tools are able to produce similar scores when administered at a number of time points. Test-retest reliability was, however, variable across many of the studies, which may be due to suicide/self-harm risk being sensitive to change.
Only four assessments (Flamarique et al., 2016; Pfeffer, 1986; Posner et al., 2011; Reynolds, 1990) investigated inter-rater reliability, thus we have little evidence that the current assessment tools provide consistent results across different raters. Moreover, for those tools for which this testing was undertaken, it appears the majority were tested with raters (i.e. clinician, self and parent) with limited scientific or clinical justification.
Although face validity is considered the weakest validity test, it is typically considered a prerequisite before performing other validity/reliability tests (Devon et al., 2007). However, few studies tested it, and those that did lacked thorough methodological reporting, thus reducing the potential usefulness of the tools and limiting the ability to replicate procedures (Schulz et al., 2010). Moreover, there appears to be limited consideration of developmental issues within the tools included in this review. As such, considering the substantial differences in cognitive ability, perception and understanding between younger children and those closer to 18 years of age, the current tools appear unable to provide an accurate representation of potential risk for CYP across the age range.
This review highlights that the majority of previous assessment tools of immediate risk of self-harm and/or suicide have not been tested to levels recommended by psychometric guidelines (Devon et al., 2007). Moreover, several of these assessment tools demonstrate inconsistent validity and reliability ratings across different testing studies. Additionally, cut-off values denoting high-risk scores are sparsely defined thus limiting their clinical utility as such values can be a useful adjunct to suicide risk assessment in non-psychiatric emergency settings (Cochrane-Brink et al., 2000). Several assessment tools were only tested in one subsequent psychometric testing paper, highlighting limited testing across the majority of the assessment tools. An exception is the C-SSRS which generally performed well across multiple psychometric domains and has been used to monitor medication safety in clinical practice (Atkinson et al., 2014; Emslie et al., 2014, 2015; Findling et al., 2013).
The findings from this scoping review stem from an extensive, transparent search of the literature and provide a summary of the characteristics, and ratings of reliability and validity, of assessment tools of immediate self-harm and suicide risk in CYP. This is a scoping review, however, and as such it cannot be concluded with certainty that additional risk assessment tools have not been developed and psychometrically tested. Moreover, use of only the terms ‘self-harm’ and ‘deliberate self-harm’ in the search strategy represents a potential limitation, as additional studies may have identified the behaviour using alternative terminology.
However, the review has identified key gaps and deficits including limited immediate self-harm risk assessment tools for CYP, limited psychometric testing of current assessment tools in specific contexts and regions, and no one assessment tool having been fully validated in an inpatient paediatric setting.
Thus, there are clear implications for clinical practice, as currently there appears to be no suicide/self-harm risk assessment tool validated for use in inpatient paediatric settings where there may be an immediate risk of self-harm or suicidal behaviour (i.e. within hours of the triage assessment). As a result, healthcare professionals working within paediatric inpatient settings have to resort to using their own clinical judgement (which may be based on minimal experience and training) or a risk assessment framework/tool that has not been developed for the specific needs of this population/setting. Consequently, this may lead to an inaccurate assessment of risk, potentially resulting in over- or underestimation of risk and in inappropriate safety management strategies being utilized.
Considering the increasing prevalence of mental health problems in CYP, and the paucity in existing risk assessments outlined here, future research should be focused on the development of a clinically appropriate, psychometrically tested assessment tool of immediate risk of self-harm and suicide behaviour for CYP. This assessment tool could then be used to support safety management decisions across acute paediatric care settings.
Supplemental Material
Supplemental Tables 3 and 4 for Assessment tools of immediate risk of self-harm and suicide in children and young people: A scoping review by Tim Carter, Gemma M Walker, Aimee Aubeeluck and Joseph C Manning in Journal of Child Health Care
Footnotes
Authors’ note
TC and JCM designed the scoping review protocol. GMW wrote the initial draft of the manuscript. TC, AA and JCM revised, edited and finalized the manuscript. TC, JCM and GMW conducted the database searches, confirmed included papers and agreed data extraction. All authors agreed the final manuscript for publication. The views represented are the views of the authors alone and do not necessarily represent the views of the Nottinghamshire Clinical Commissioning Group or Nottingham University Hospitals NHS Trust.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Nottinghamshire Clinical Commissioning Group and Nottingham University Hospitals NHS Trust sponsorship.
Supplemental Material
Supplemental material for this article is available online.
References
