Abstract
Background:
Semmes-Weinstein monofilaments (SWMs) are used for the assessment of tactile sensitivity. The aim of this study is to investigate the intra- and inter-rater reliability of the full 20 SWM kit for the assessment of tactile sensitivity at hand level in a large sample of community-dwelling adults.
Methods:
A consecutive convenience sample of community-dwelling adults was enrolled in an outpatient clinic. By applying stimuli to 5 different locations, according to the dermatomeric levels of the upper limb, a study was designed to test the intra- and inter-rater reliability of the SWM in assessing tactile sensitivity of the hands. Intra-rater reliability was investigated in 2 evaluation sessions; during each session 2 independent examiners performed the test (inter-rater reliability). Intra-rater and inter-rater reliability, for each site and for both sides, were estimated with the quadratic weighted kappa index, with 95% confidence intervals. The false-negative responses were also recorded.
Results:
A total of 102 participants completed the study. According to the perceived filament, values reached the .70 threshold in terms of inter-rater and intra-rater reliability, except for the dorsal surface of the hand at the base of the thumb in both hands. The number of false-negative responses was 6.1% of all stimuli administered.
Conclusions:
The full 20 SWM kit is reliable at the hand level in healthy subjects, when used in groups of individuals. These results are based on a large sample, in a high number of sites. When doing the overall assessment, false-negative responses should be considered.
Introduction
Semmes-Weinstein monofilaments (SWMs) are used for the assessment of tactile sensitivity. They consist of a nylon filament embedded at right angles to a plastic support handle. The complete SWM kit consists of 20 monofilaments made to be able to apply a perpendicular force to the skin surface to be tested, ranging from a minimum of 0.008 (size 1.65) to 300 (size 6.65) grams of force. Through a logarithmic calculation, these values are converted to a conventional number (size) and color-coded into 5 levels, with different clinical meanings: green = normal sensitivity (size, 1.65-2.83, including 4 filaments); blue = slight decrease in surface sensitivity to touch (size, 3.22-3.61, 2 filaments); purple = decrease in protective sensitivity (size, 3.84-4.31, 4 filaments); red = loss of protective sensitivity (size, 4.56-6.45, 9 filaments); red lines = deep pressure sensation only (size, 6.65, 1 filament).1,2
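The color-coding rule described above can be sketched as a small lookup. This is only an illustration: it assumes that the upper size bound of each range acts as an inclusive cut-off, and the name `color_grade` and the label strings are hypothetical, not part of any kit documentation.

```python
# Upper size bound of each color grade, taken from the ranges listed in the text.
# Treating the upper bound as an inclusive cut-off is an assumption of this sketch.
COLOR_GRADES = [
    (2.83, "green: normal sensitivity"),
    (3.61, "blue: slight decrease in surface sensitivity to touch"),
    (4.31, "purple: decrease in protective sensitivity"),
    (6.45, "red: loss of protective sensitivity"),
    (6.65, "red lines: deep pressure sensation only"),
]


def color_grade(size):
    """Return the color grade for a nominal filament size (hypothetical helper)."""
    if size < 1.65 or size > 6.65:
        raise ValueError("size outside the 1.65-6.65 range of the full kit")
    # The first grade whose upper bound is not exceeded is the matching one.
    return next(grade for upper, grade in COLOR_GRADES if size <= upper)


if __name__ == "__main__":
    for size in (2.44, 3.22, 4.08, 6.65):
        print(size, "->", color_grade(size))
```

Keeping the cut-offs in one table makes it straightforward to re-code filament-level data into color grades before repeating an analysis.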
Previous studies investigating the reliability of the instrument on healthy subjects at the hand level3-10 frequently reported low reliability. This may be due to several issues, such as inadequate sample size,4-11 the use of the mini-kit of 5 SWMs instead of the full kit,2,4,8,9 or the investigation of intra-rater5,6,10 and inter-rater7 reliability separately. An additional source of caution in interpreting the results pertains to the statistical approach adopted. Specifically, different studies employed varying methods: in some instances, the kappa statistic was used,4,5 whereas in others, the intraclass correlation coefficient (ICC) was applied,3,6-9,11 occasionally accompanied by the calculation of the standard error of measurement (SEM).9,11 Nevertheless, as the data were collected as ordinal variables, the kappa statistic is not appropriate, being designed for dichotomous data, whereas both ICC and SEM are suitable only for continuous variables.
Finally, sources of heterogeneity that warrant caution in interpreting the results may arise from the unreported training or experience of the raters and from differences in the locations where the sensitivity tests were conducted, as variability in these parameters may lead to differences in reliability. Furthermore, the absence of a standardized assessment protocol may contribute to additional variability across studies.12-14
The aim of this study is to investigate the intra-rater and inter-rater reliability of the full 20 SWM kit for the assessment of tactile sensitivity of 5 different locations at hand level in a large sample of community-dwelling adults.
Methods
This is an observational, cross-sectional, single-center study approved by the local ethics committee (24924_OSS). Written informed consent was obtained from all subjects before participation. The study reporting followed the guidelines recommended in the GRRAS checklist. 15
Study Design
A consecutive convenience sample of community-dwelling adults was enrolled. By applying stimuli to 5 different locations, a reliability study was designed to test the intra-rater and inter-rater reliability of the SWM in assessing tactile sensitivity of the hands.
Setting and Procedure
Tests were conducted in an outpatient clinic providing rehabilitation of upper limb and hand. The rating was performed by 2 trained physiotherapy undergraduate students (SN, MaS), under supervision. Training and supervision were provided by the lead researcher (MiS) (senior physiotherapist, faculty member with expertise in upper limb and hand rehabilitation). Two 1-hour specific training sessions on the use of SWMs and interpretation of results were held.
As recommended, the assessment procedures were performed in a standardized environment, in terms of temperature, sound insulation, and brightness. 16
For reliability assessment, each subject underwent 2 evaluation sessions (intra-rater reliability), performed with an interval between 2 and 7 days. During each session, 2 independent examiners performed the test 20 minutes apart (inter-rater reliability), as suggested by previous literature.17,18
A standardized test administration procedure was established. The order in which the 2 examiners performed the evaluation, the order in which the hands were evaluated, and the order in which the tactile stimuli were applied to the 5 areas of the hand were randomized and remained unchanged at the retest. Randomization was performed using an online random sequence generator 19 by a researcher who was not otherwise involved in the evaluation process.
Participants were assessed in a sitting position, blindfolded with eyes closed, and unaware of the order in which the stimuli would be presented. The arm was positioned on a table with the hand held in a relaxed open position, supine. Five different locations on both the right and left hands were tested. The 5 areas of the hands tested were: index finger fingertip (IF; median nerve, digital branches); little finger fingertip (LF; ulnar nerve, digital branches); palmar surface of the hand on the hypothenar eminence (HE; ulnar nerve, superficial branch); palmar surface of the hand on the thenar eminence (TE; median nerve, palmar branch); and dorsal surface of the hand at the base of the thumb (BT; radial nerve, superficial branch, and dorsal digital branches) (Figure 1). The sites were chosen to represent the different dermatomes of the upper limb.

Figure 1. The 5 areas of the hands tested.
A monofilament size of 2.83 is typically considered the optimal threshold for detecting normal tactile sensitivity. As women have slightly more sensitive hands, a monofilament size of 2.44 was chosen as the starting point for the assessment. 1 If the participant did not perceive the stimulus, we progressed to the next higher-gauge filament (ascending protocol).
In accordance with the literature, 11 the monofilament was applied gently, perpendicular to the area being tested, with a progressive force such that the nylon filament would bend without sliding over the skin. The application and removal times were both 1.5 seconds, with the total time for the process being about 3 seconds. The participant was invited to say “yes” when they perceived the stimulus. These data were recorded and considered the perceived threshold. For filaments ranging from 2.44 to 4.08, the stimulus was applied a maximum of 3 times. For the others, it was applied only once. 20 The test finished when all sites were tested.
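The ascending procedure described above can be sketched as a simple search over the kit. This is a minimal illustration under stated assumptions: the list of nominal sizes is that of a standard 20-filament kit (the text names only some of them), `perceives` is a hypothetical callback standing in for the participant's verbal "yes", and the function does not model how unperceived stimuli were scored in the study.

```python
# Nominal sizes of a standard 20-filament kit (assumed; not all are named in the text).
KIT_SIZES = [1.65, 2.36, 2.44, 2.83, 3.22, 3.61, 3.84, 4.08, 4.17, 4.31,
             4.56, 4.74, 4.93, 5.07, 5.18, 5.46, 5.88, 6.10, 6.45, 6.65]


def perceived_threshold(perceives, start=2.44, repeat_up_to=4.08, max_repeats=3):
    """Return the first filament size the participant reports feeling, or None.

    `perceives(size)` is a hypothetical callback that returns True when the
    participant says "yes" to one application of a filament of that size.
    """
    for size in KIT_SIZES:
        if size < start:
            continue  # the protocol starts at 2.44 and only moves upward
        # Filaments from 2.44 to 4.08 are applied up to 3 times, the others once.
        attempts = max_repeats if size <= repeat_up_to else 1
        for _ in range(attempts):
            if perceives(size):
                return size
    return None


if __name__ == "__main__":
    # Simulated participant who feels any filament of size 3.22 or larger.
    print(perceived_threshold(lambda size: size >= 3.22))
```

In this sketch the function would be called once per test site, so a full session yields 10 thresholds per rater (5 areas on each hand).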
Study Population
We invited individuals who attended the outpatient clinic for study or work reasons, as well as those who accompanied patients, to participate in the research. Individuals attending the clinic for any other reason, including patients, were not eligible for the study. Assessment of eligibility criteria and participant enrolment were performed or supervised by the lead researcher.
The inclusion criteria were age ≥18 years, being community-dwelling, and willingness to participate in the study. Exclusion criteria included: (1) presence of ongoing inflammatory disease; (2) history of musculoskeletal, vascular, traumatic, or central and peripheral nervous system diseases with upper extremity impairment; (3) presence of cognitive deficits that limit understanding and performance of the required task; (4) presence of ulcers, calluses, abrasions, wounds or necrotic tissue at the stimulation sites; (5) any other comorbidity or disability that would preclude participation in the assessment program.
Individuals participated on a voluntary basis; no compensation was offered. All participants were given verbal and written information about the study and gave signed informed consent before enrollment. After enrollment, we collected information on each participant’s gender, age, and level of education. The number and distribution of true-positive and false-negative responses were recorded based on the filament and the area tested.
Sample Size
A minimum sample size of 100 participants was deemed necessary, in line with the recommendations of the Consensus-Based Standards for the Selection of Health Measurement Instruments (COSMIN). 21
Statistical Analysis
We used descriptive statistics to depict characteristics of participants and monofilament test results. Categorical data were reported by frequencies and percentages with 95% confidence interval (CI). Shapiro-Wilk’s test was used to test the normality of distribution. Due to the skewed distributions, continuous variables were reported by median and interquartile range (IQR).
Pearson χ2 test was used to calculate differences between groups for nominal or ordinal variables, except when cell counts were <5, in which case Fisher exact test was used. In case of significant results, post hoc analyses were performed using adjusted standardized z-score residuals. 22 The Mann-Whitney or Kruskal-Wallis tests were used to analyze interval variables. If significant results were obtained from the Kruskal-Wallis test, a post hoc analysis was conducted using the Dunn test with Bonferroni correction. We performed a sign test to determine whether the proportions of false-negative responses observed by the 2 raters or in the 2 hands were similar.
Intra-rater (test-retest) and inter-rater reliability were estimated at the filament level using the quadratic weighted kappa index and 95% CI for each site and both hands separately. Then, the data observed at the filament level were color-coded and the reliability analysis was repeated.
Reliability was interpreted as follows: excellent reliability (reliability such that the procedure can be used to evaluate individuals) = weighted kappa coefficient > .90; good reliability (reliability such that the procedure can be used to evaluate groups of individuals) = weighted kappa coefficient > .70. 23
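For readers unfamiliar with the index, the quadratic weighted kappa and the interpretation thresholds above can be sketched in plain Python. This is not the code used in the study, which relied on Jamovi and StatsDirect. The sketch assumes that categories are treated as equally spaced ranks in the order supplied; the function names, the label for values at or below .70, and the example ratings are hypothetical. Confidence intervals are not computed here.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, categories):
    """Cohen's kappa with quadratic disagreement weights for two rating lists."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least 2 ordered categories are required")
    rank = {category: i for i, category in enumerate(categories)}
    n = len(ratings_a)

    # Cross-tabulate the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[rank[a]][rank[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        return float("nan")  # undefined when all ratings fall in one category
    return 1.0 - weighted_observed / weighted_expected


def interpret_reliability(kappa):
    """Apply the > .90 and > .70 thresholds stated in the text."""
    if kappa > 0.90:
        return "excellent (can be used to evaluate individuals)"
    if kappa > 0.70:
        return "good (can be used to evaluate groups of individuals)"
    return "below the .70 threshold"  # hypothetical label for the remaining case


if __name__ == "__main__":
    sizes = [2.44, 2.83, 3.22, 3.61]          # ordered categories (illustrative)
    rater_1 = [2.44, 2.83, 3.22, 3.61]        # made-up ratings, not study data
    rater_2 = [2.44, 2.83, 3.22, 3.22]
    kappa = quadratic_weighted_kappa(rater_1, rater_2, sizes)
    print(round(kappa, 3), interpret_reliability(kappa))
```

Because the weights grow with the squared distance between categories, a disagreement of one filament step is penalized far less than a disagreement of several steps, which is why the index suits ordinal data such as filament sizes.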
Statistical analyses were conducted with Jamovi (The jamovi project, 2025; R Core Team, 2025) and StatsDirect (Buchan I. StatsDirect statistical software; http://www.statsdirect.com. England: StatsDirect Ltd 2024) software. The significance level was set at P < .050.
Results
One hundred and two participants were enrolled in the study and completed it with no missing data. They were men and women; no additional gender information was provided. Sample characteristics are reported in Table 1. The median age was 46.5 years; half of the participants were men and 10 were left-handed. The sample consisted of healthcare professionals, undergraduate physiotherapy students, and accompanying persons. The groups differed in age and levels of education attained (Table 1).
Table 1. Characteristics of the Participants.
Data are median (IQR; minimum-maximum), or absolute frequency (relative frequency; 95% confidence interval).
Kruskal-Wallis test, χ2(2) = 49.0.
Post hoc analysis: undergraduate physiotherapy students younger than health care professionals (z = 3.53, P = .001) and accompanying persons (z = 6.98, P < .001).
χ2 test; χ²(2) = 1.87.
Fisher exact test; χ²(2) = 0.60.
Fisher exact test; χ²(8) = 53.2.
Post hoc analysis: a greater proportion of health care professionals had tertiary education (adjusted z = 6.15), a greater proportion of undergraduate physiotherapy students had upper secondary education (adjusted z = 3.85), a greater proportion of accompanying persons had primary or lower secondary education (adjusted z = 2.94 and 2.47, respectively).
P < .050.
A total of 4345 stimuli were administered (test: N = 2,202; retest: N = 2,143), that is, all stimuli with a value assigned (102 × 40 = 4080) plus false-negative responses. The perceived size of the filaments ranged from 2.44 to 4.08. According to the color codes, the observed tactile sensitivity fell within the green, blue, or purple grades. The relative distribution of true-positive responses across filaments and areas is presented in Table 2.
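The stimulus counts above can be cross-checked with a few lines of arithmetic. The sketch assumes that every participant contributed one scored stimulus per area, hand, rater, and session (consistent with the absence of missing data), so that the remaining stimuli in each session are the false-negative responses. The variable names are illustrative.

```python
participants = 102
areas_per_hand = 5
hands = 2
raters = 2
sessions = 2

# Scored stimuli: one per participant, area, hand, and rater in each session.
scored_per_session = participants * areas_per_hand * hands * raters
scored_total = scored_per_session * sessions

# Stimuli administered per session, as reported in the text.
administered = {"test": 2202, "retest": 2143}

# Assumption: stimuli beyond the scored ones are the false-negative responses.
false_negatives = {s: n - scored_per_session for s, n in administered.items()}
total_administered = sum(administered.values())
total_false_negatives = sum(false_negatives.values())

print(scored_total, total_administered, total_false_negatives)
for session, n in administered.items():
    rate = 100 * false_negatives[session] / n
    print(session, false_negatives[session], round(rate, 1))
```

Under these assumptions the totals are 4080 scored stimuli, 4345 administered stimuli, and 265 false-negative responses.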
Table 2. Relative Distribution of True-Positive Responses Across Filaments and Areas.
Data are absolute frequency (relative frequency; 95% confidence interval).
Abbreviations: IF, index finger fingertip; LF, little finger fingertip; HE, palmar surface of the hand on the hypothenar eminence; TE, palmar surface of the hand on the thenar eminence; BT, dorsal surface of the hand at the base of the thumb.
The number of false-negative responses was 265, ranging from 0 to 34 per participant (median, 0; IQR, 2). These responses accounted for 6.1% (95% CI = [5.4, 6.9]) of the stimuli that were administered. Fifty-eight participants had only true-positive responses (56.9%, 95%CI = [47.2, 66.1]). Including the 11, 10, and 4 participants who recorded 1, 2, or 3 false-negative responses, respectively, brings the total to 81.4% of the sample (95% CI = [72.7, 87.7]).
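The text does not state how the 95% confidence intervals for proportions were obtained. As one plausible reading, the sketch below computes a Wilson score interval; the choice of method is an assumption, and `wilson_interval` is a hypothetical helper rather than a function of the software used in the study.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a proportion (z defaults to the 95% level)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n with n > 0")
    p = successes / n
    denominator = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Counts taken from the text: 265 of 4345 stimuli, 58 of 102 participants.
    for successes, n in ((265, 4345), (58, 102)):
        low, high = wilson_interval(successes, n)
        print(successes, n, round(100 * low, 2), round(100 * high, 2))
```

For 58 of 102 participants this gives approximately 47.2% to 66.1%, in line with the interval reported above, which suggests (but does not prove) that a Wilson-type interval was used.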
The number of false-negative responses per area ranged from 13 to 42 (median, 27; IQR, 17) (Figure 2, Table 3).

Figure 2. Percentage distribution of false-negative responses by area.
Table 3. Cumulative False-Negative Responses in Each Assessment Session.
Data are absolute frequencies.
Abbreviations: IF, middle fingertip area of the index finger; LF, middle fingertip area of the little finger; HE, palmar surface of the hand on the hypothenar eminence; TE, palmar surface of the hand on the thenar eminence; BT, dorsal surface of the hand at the base of the thumb.
There were more false-negative responses in the first rating round (7.4 vs 4.8%, χ2(1) = 12.3, P < .001). One rater registered more false-negative responses than the other one (N = 162, 61.1%, 95% CI = [55.0, 67.0]; sign test, P < .001) (Table 3). A higher number of false-negative responses were recorded for the right hand than for the left (N = 154, 58.1%, 95% CI = [51.9, 64.1]; sign test, P < .001) (Figure 2, Table 3).
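The comparison between rating rounds can be illustrated with a Pearson χ2 statistic for a 2 × 2 table. The sketch makes two assumptions: no continuity correction is applied, and the cell counts are derived rather than reported, namely 162 and 103 false-negative responses (2202 and 2143 administered stimuli minus 2040 scored stimuli per round). The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_1, row_2, col_1, col_2):
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)


if __name__ == "__main__":
    # Rows: first and second rating round; columns: false-negative, scored stimuli.
    # The counts are derived under the assumptions stated in the lead-in.
    statistic = chi_square_2x2(162, 2040, 103, 2040)
    print(round(statistic, 1))
```

With these derived counts the statistic is close to the reported value of 12.3, which supports the reading that the test compared false-negative proportions between the two rounds.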
No differences were observed in the number of false-negative responses between men and women, left- and right-handed participants, or health care professionals, students and accompanying persons.
Reliability
The results of intra-rater and inter-rater reliability analyses are reported in Tables 4 and 5, respectively. According to the analysis, when the filaments were considered, the absolute values of the quadratic weighted kappa and the lower limit of the 95% CI reached the threshold of 0.70 for all areas tested except the base of the thumb (BT) (Tables 4 and 5). When color codes were considered, the absolute values of quadratic weighted kappa and the lower limit of the 95% CI reached the 0.70 threshold in intra-rater reliability testing for the right-hand IF area, as well as in inter-rater reliability testing for the IF areas of both hands (Tables 4 and 5).
Table 4. Intra-rater Reliability.
Data are weighted kappa (95% confidence interval).
Abbreviations: IF, middle fingertip area of the index finger; LF, middle fingertip area of the little finger; HE, palmar surface of the hand on the hypothenar eminence; TE, palmar surface of the hand on the thenar eminence; BT, dorsal surface of the hand at the base of the thumb.
Table 5. Inter-rater Reliability.
Data are weighted kappa (95% confidence interval).
Abbreviations: IF, middle fingertip area of the index finger; LF, middle fingertip area of the little finger; HE, palmar surface of the hand on the hypothenar eminence; TE, palmar surface of the hand on the thenar eminence; BT, dorsal surface of the hand at the base of the thumb.
Discussion
The study’s key finding is that SWMs are a reliable tool for assessing tactile sensitivity in groups of individuals. This includes the fingertips of the index and little fingers areas, as well as the thenar and hypothenar eminences.
The reliability of the measurement was lower at the level of the dorsal surface of the hand at the base of the thumb, innervated by the radial nerve. Similar findings were reported by Mamino et al 8 and could be explained by the area’s specific characteristics. First, the skin of the radial site is nonglabrous, unlike the skin of the other tested sites. Nonglabrous skin has a slightly different set of receptors than glabrous skin. For example, there are no Meissner corpuscles; these are functionally replaced by rapidly adapting receptors associated with hair follicles. Nonglabrous skin also has slowly adapting mechanoreceptors (Merkel cells). However, the presence of rapidly adapting hair follicle mechanoreceptors may have reduced reliability at the dorsal level of the hand. 17 Furthermore, the base of the thumb is the only tested area located in the dorsal part of the hand. The standard assessment position was not optimal for evaluation of the BT area; the radial dermatome would ideally have been evaluated with the forearm in pronation. However, the order in which the sites were examined was randomized and concealed from the participants, and changing the hand position from the standard would have allowed the area of stimulus delivery to be anticipated. It is worth pointing out that the rate of false-negative responses in BT areas is comparable to that in other areas.
Taking color codes into account results in lower reliability; satisfactory levels were observed only in the evaluation of the index fingertips (right hand, intra-rater and inter-rater; left hand, inter-rater).
Reliability indexes reported in previous studies show a large variability ranging from .150 5 to .945 9 for intra-rater reliability and from .400 4 to .950 9 for inter-rater reliability.
Our findings show higher intra-rater reliability than that reported by Poole et al, 10 although the 2 studies are comparable in terms of protocol and statistical analysis. However, the different sample size (n = 30) and the only partially overlapping assessment sites (all fingers) may explain the differences in results.
Other comparisons are not possible because of the high variability in sample size,4-11 the protocol used,3-5 and the use of the mini-kit of 5 SWMs instead of the full kit.3,4,8,9
The complete kit showed higher reliability than that reported in previous studies in which the mini-kit of 5 SWMs was used.3,4,8 This result was unexpected, as a larger range of sizes should result in higher variability; it could therefore also be due to other methodological issues.
False-negative responses represent 6.1% of the total. As expected, this percentage is lower than that reported for people with diabetes, both with and without ulcerations. 24 The frequency of this phenomenon varies depending on the laterality, the time of assessment, the rater, and the area assessed. Although this issue is underexplored in previous reliability studies because it relates to validity, it should be taken into account when interpreting the results of reliability.
The SEM and the minimal detectable change were not calculated because the intervals between different force gram levels are not constant, and therefore, the 2 indices do not provide applicable information (ie, the expected random change in scores when no real change has occurred, and the minimum amount of change that must be observed for it to be considered a real change). 25
Limits
The main limitation is the characteristics of the raters, who were undergraduate physiotherapy students, and the results refer only to the raters of interest. However, the raters were trained and supervised by an experienced physiotherapist and the higher reliability compared with previous studies suggests that the characteristics of our raters may not actually be a limitation. To achieve consistent clinical assessments, a structured approach combining standardized test administration procedures and training is required. Having both raters trained by the same instructor likely improved the consistency of their ratings, and we recognize that including raters with different training backgrounds would make the study’s findings more generalizable.
Furthermore, including a consecutive convenience sample may not guarantee the representativeness and generalizability of the results. However, the large number of participants, wide age range, and lack of differences between men and women, left- and right-handed individuals, health care professionals, students, and accompanying persons suggest that the results may be representative of the reference population.
In conclusion, the main finding of this study is that the full 20 SWM kit is reliable at the hand level in healthy subjects, when used in groups of individuals. These results are based on a large sample, in a high number of sites and using a rapid and widely used protocol. When doing the overall assessment, false-negative responses should be considered.
Footnotes
Ethical Considerations
This study was approved by the local ethics committee (Comitato Etico Regione Toscana – Area Vasta Centro (CEAVC) number 24924_OSS). All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008.
Consent to Participate
Informed consent was obtained from all individual participants included in the study.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Statement of Human and Animal Rights
This article does not contain any studies with human or animal subjects.
