Abstract
BACKGROUND:
High knee flexion postures are often adopted in occupational settings and may lead to increased risk of knee osteoarthritis. Pattern recognition algorithms using wireless electromyographic (EMG) signals may be capable of detecting and quantifying occupational exposures throughout a working day.
OBJECTIVE:
To develop a k-Nearest Neighbor (kNN) algorithm for the classification of eight high knee flexion activities frequently observed in childcare.
METHODS:
EMG signals from eight lower limb muscles were recorded for 30 participants. Signals were decomposed into time- and frequency-domain features and used to develop a kNN classification algorithm. Features were reduced to a combination of ten time-domain features from eight muscles using neighborhood component analysis in order to most effectively identify the postures of interest.
RESULTS:
The final classifier was capable of accurately identifying 80.1% of high knee flexion postures based on novel data from participants included in the training dataset, yet only achieved 18.4% accuracy when predicting postures based on novel subject data.
CONCLUSIONS:
EMG based classification of high flexion postures may be possible within occupational settings when the model is first trained on sample data from a given individual. The developed algorithm may provide quantitative measures leading to a greater understanding of occupation specific postural requirements.
Introduction
Repetitive cyclic or prolonged joint loading are known factors in the progressive degradation of knee joint tissue and increased incidence of knee osteoarthritis (OA) [1–4]. High knee flexion postures (such as kneeling, squatting, bending, or lifting), where the flexion angle exceeds 120°, are rarely adopted for activities of daily living in Western cultures. In occupational settings, however, workers are often required to perform repetitive high knee flexion motions for a significant portion of their working hours [5–7]. Childcare (including day care workers and early childhood educators) is one such industry involving the frequent adoption of high flexion postures, yet it has been largely overlooked in recent research [3, 8]. Therefore, the postural demands of this occupation must be studied to assess the OA-related risks due to high knee flexion.
Childcare workers are required to perform a variety of physically demanding tasks throughout a typical day. These tasks have been reported to lead to shoulder, back, and lower limb discomfort and have been linked to an increased risk of musculoskeletal injuries [8, 9]. While recent interest has focused on lifting postures leading to back and shoulder pain, no study to our knowledge has examined the effects of these exposures on the knee despite 37.2% of 85 childcare workers reporting knee pain in a study by Horng et al. [10]. Labaj et al. [11, 12] questioned 32 childcare workers and found over 40% reported knee pain, with severity equal to or greater than that of other body segments. In fact, the Occupational Health Clinics for Ontario Workers (OHCOW) has recommended a “Tripod Lift” for lifting infants from the floor, performed in a single leg kneeling posture in order to protect the back; however, the effects of this lift on the knee are unknown [13]. There is therefore a need to quantify childcare worker exposure to occupational high knee flexion postures.
Several challenges exist in studying the occupational demands of childcare. Firstly, there is a need to protect the privacy of the children under the care of the childcare worker, limiting the types of measurement tools which can be used. Additionally, one must consider the active nature of this occupation as well as the potential for occlusion of recording instrumentation due to children, toys, or furniture within laboratory and childcare settings. A cameraless, wireless method of recording high knee flexion postures would enable continuous exposure measurement, without the concerns for privacy or loss of data that would accompany other motion capture technologies.
Advances in pattern recognition and machine learning algorithms have led to increased interest in human movement recognition using wearable technologies [14–17]. Pattern recognition has historically been performed using repeatable features derived from the kinematic trajectories of human motion [18–21]. For this reason, inertial sensors have become a popular tool for the classification of human movement for a broad range of applications given their ability to directly measure linear and angular displacements of the segments on which they are placed [22, 23]. However, these sensors are incapable of providing any measure of the underlying muscular activations responsible for driving the observed motions. The use of electromyography (EMG) provides a means for the measurement of electrical signals associated with muscle contractions [24]. However, EMG signals are highly variable both between and within individuals. EMG measures are affected by a variety of factors including signal amplitude fluctuations due to sensor placement, tissue conductivity, and the need for normalization [25]. These limitations have resulted in a decreased interest in the use of EMG signals for human gesture recognition applications, yet recent work has sought to apply these signals directly as an alternative source for human gesture recognition [20, 27].
Research efforts have recently been focused on the application of EMG signals as a source for human movement classification for myoelectric control of upper- and lower-body prostheses [15, 28–30]. Based on the promising results of these works, it may be possible to apply similar techniques to those used in myoelectric control of lower-leg prostheses to the classification of high knee flexion activities in occupational settings. Given that the electrical impulses which are responsible for controlling the lower extremity joints pass through a variety of muscles in the lower limbs, it may be possible to detect knee and ankle positions based on shank and thigh EMG, with the added benefit of gaining more information about the muscular effort required. Using EMG placed on the proximal segments rather than on the foot, this approach may prove useful in settings where footwear changes may be made (as has been observed in childcare settings) or where protective footwear is required (as in many factory settings). The objective of this study was therefore to develop an EMG based classification algorithm capable of identifying various high knee flexion activities similar to those assumed by childcare workers. Further, the study sought to assess whether the developed model could be applied to the classification of high knee flexion postures in individuals on which it was not trained.
Methods
Participants
A sample of thirty participants (sex: fourteen males/sixteen females, age: 23.0 ± 3.17 years, height: 1.72 ± 0.1 m, mass: 74.6 ± 16.6 kg) were recruited from the university student population to participate in this study. Individuals with a history of knee joint disease, current knee pain or leg injury, or who were incapable of kneeling or squatting without difficulty were excluded from participating. This study was approved by the University of Waterloo Research Ethics Board (IRB#31497) and informed consent was received from each subject prior to testing.
Experimental protocol and instrumentation
Surface EMG data were recorded during all trials from eight muscles of the participants’ right leg using a wireless EMG system (Wave Plus, Cometa srl, Milan, IT; input impedance = 20 M ohms, common mode rejection ratio = 120 dB at 60 Hz). Each wireless EMG transmitter also contained a tri-axial accelerometer used for trial segmentation (detailed explanations of trial segmentation are provided in section 2.3 below). The muscles examined were: vastus medialis (VM) and lateralis (VL), rectus femoris (RF), tibialis anterior (TA), medial and lateral gastrocnemius (MG and LG), semitendinosus (ST), and biceps femoris (BF). These muscles were selected as they have been shown to be important contributors to the achievement of deep knee flexion postures [31]. Surface EMG sites were shaved, abraded, and cleaned with alcohol prior to the application of bipolar Ag/AgCl electrodes (Ambu Blue Sensor N, Denmark), which were placed according to SENIAM guidelines [32]. Raw EMG signals were bandpass filtered via the hardware (10–500 Hz), amplified, and sampled at 2048 Hz.
The eight knee flexion postures replicated in this protocol were previously defined through the analysis of video observations of 18 childcare workers from five daycares [11, 12]. Identified postures included dorsiflexed kneeling, plantarflexed kneeling, heels-up squatting, flatfoot squatting, floor height lifting, knee height lifting (e.g. to lift a child who is standing), child chair sitting, and cross-legged sitting. These postures were simulated in our laboratory and performed as seen in Fig. 1. Participants completed three trials of each high knee flexion posture in a block randomized manner until all eight movements were recorded.

Fig. 1. Participant in end range of motion postures with EMG instrumentation on right leg performing: DK, dorsiflexed kneeling; PK, plantarflexed kneeling; HS, heels-up squatting; FS, flatfoot squatting; FL, floor height lifting; KL, knee height lifting; CCS, child chair sitting; and CLS, cross-legged sitting.
Each motion trial consisted of the participant taking a step forward, leading with their left foot, before transitioning into the pose from standing (descent phase), holding the fully flexed pose for 8 seconds, and transitioning back to standing (ascent phase). Each posture was presented using a photograph and verbal cues rather than demonstrations in order to avoid influencing participants’ motion while encouraging them to perform each posture as naturally as possible. When performing the floor height lift, participants were instructed to grasp a wooden block placed on the ground, moving as they typically would to pick up an object from the ground. For the knee height lifting task, participants were instructed to grasp the back of a child’s chair (0.56 m above the ground) as they would grasp a standing child in order to lift them. For the lifting postures, the participants were not specifically instructed to flex the knees. Although the photos presented to the participant showed some knee flexion, the range of knee flexion in the knee height lift varied widely. Kneeling transitions were performed asymmetrically through a lunging posture such that the left knee made contact with the ground before the right. Participants were invited to practice each of the eight postures until comfortable. Upon completion of the high flexion postures, participants completed three walking trials over a 10 m distance at a self-selected pace which would be used for reference voluntary contraction (RVC) normalization.
All data analysis was performed using Matlab 9.1 (The Mathworks, Release R2016a, Natick, MA). Prior to data processing and feature extraction, all trials were segmented in order to contain only motion data for each posture. Each trial’s raw EMG data were therefore segmented from initiation of descent through the completion of ascent using the vertical acceleration component from the accelerometer within the TA transmitter. The start of the trial was identified as the point when the vertical component of the acceleration signal exceeded a threshold value (set as the mean of the first 1000 frames of data + five times the standard deviation of those same 1000 frames) while the end of the trial was identified visually as the instance when the signal settled consistently below the same threshold.
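The onset rule described above can be illustrated with a short sketch. It is written in Python rather than Matlab, and the function name `detect_onset`, the use of the population standard deviation, and the choice to begin searching only after the baseline frames are assumptions made for illustration; the end of each trial was identified visually in the study and is not automated here.

```python
from statistics import mean, pstdev

def detect_onset(vertical_acc, baseline_frames=1000, n_sd=5.0):
    """Return (onset_index, threshold) for a vertical acceleration trace.

    The threshold is the mean of the baseline frames plus n_sd times
    their standard deviation. onset_index is None if the threshold is
    never exceeded after the baseline.
    """
    baseline = vertical_acc[:baseline_frames]
    # Assumption: population SD; the text does not say which form was used.
    threshold = mean(baseline) + n_sd * pstdev(baseline)
    # Assumption: search starts after the quiet baseline period.
    for i in range(len(baseline), len(vertical_acc)):
        if vertical_acc[i] > threshold:
            return i, threshold
    return None, threshold
```

A trace with a quiet first 1000 frames followed by a clear rise in vertical acceleration returns the index of the first frame above the threshold.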
There are multiple forms of machine learning algorithms which have been used to classify human movement data yet, to date, no single combination of features and classification algorithm has proved superior for all applications. Kim et al. [33] demonstrated that, of three potential classifiers, a k-Nearest Neighbor (kNN) classifier performed best when classifying directions of wrist movement based on EMG features. Therefore, a kNN classifier was chosen for the classification of high knee flexion postures in this study.
In order to build the kNN classifier, an overlapping windowing technique was employed such that features were calculated from consecutive windows of 142 ms (285 samples), with an overlap of 71 ms between each. Previous studies have utilized a variety of time-domain, frequency-domain, and time-frequency-domain features for the classification of movements using EMG signals [15, 34–37]. Therefore, a combination of 9 features (5 frequency- and 4 time-domain) was extracted from each EMG signal window for initial classification.
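A minimal sketch of the overlapping windowing scheme is given below. The window length of 285 samples is taken from the text; the step of 142 samples (roughly half a window) is an assumption, as is the function name `sliding_windows`. Because 142 ms at 2048 Hz corresponds to about 291 samples rather than 285, both values are left as parameters rather than derived from the durations.

```python
def sliding_windows(signal, window=285, step=142):
    """Yield (start_index, samples) for consecutive overlapping windows.

    Only complete windows are returned; a trailing partial window is
    dropped. The overlap between neighbours is window - step samples.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(signal) - window + 1, step):
        yield start, signal[start:start + window]
```

For example, a 1000-sample segment yields six complete windows starting at samples 0, 142, 284, 426, 568, and 710.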
Frequency features were derived from each window of raw EMG data using a fast Fourier Transform (FFT) and the magnitudes of the first five frequency features from this spectral analysis were retained. In order to extract the time-domain features, EMG data were full-wave rectified and linear enveloped using a second order forward backward low-pass Butterworth filter with a cut-off frequency of 6 Hz. EMG signals were then normalized to a percentage of their respective peak linear-enveloped RVC activation. Once processed, each window of normalized EMG data was represented using four time-domain features: the maximum, minimum, mean, and standard deviation. The total number of features initially extracted was therefore 72 (8 muscles×9 features/muscle) for each data window.
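The per-window feature vector can be sketched as follows, using only the Python standard library. Several points are assumptions: the "first five frequency features" are taken here to be DFT bins 1 to 5 (skipping the DC term), the standard deviation is the population form, and the linear envelope is assumed to have been rectified, low-pass filtered, and RVC-normalized beforehand (the Butterworth filtering step is not reproduced). The function names are hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev

def dft_magnitudes(window, n_bins=5, first_bin=1):
    """Magnitudes of n_bins DFT coefficients starting at first_bin."""
    n = len(window)
    mags = []
    for k in range(first_bin, first_bin + n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        mags.append(abs(coeff))
    return mags

def time_features(envelope_window):
    """Maximum, minimum, mean, and standard deviation of the envelope."""
    return [max(envelope_window), min(envelope_window),
            mean(envelope_window), pstdev(envelope_window)]

def window_features(raw_window, envelope_window):
    """Nine features for one muscle: 5 frequency followed by 4 time-domain."""
    return dft_magnitudes(raw_window) + time_features(envelope_window)
```

Concatenating the nine values from each of the eight muscles gives the 72-element feature vector for a window.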
K-Nearest Neighbor classifier training and testing
Features extracted from a subset of 25 randomly selected participants were combined to create the algorithm building dataset (for training and testing) while the remaining five participants were held back to create the validation data set for our kNN cross-validated classifier. The developed kNN classifier was tasked with identifying the category of an unknown feature according to the k nearest labeled training samples, which are the nearest neighbors to the test feature based on Euclidean distances [38]. A value of three was determined experimentally and assigned to k so as to minimize the cross-validation loss such that the classification category of any unknown feature would be assigned that of the majority of its three nearest training features within the multidimensional feature space.
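The decision rule described above (Euclidean distance, k = 3, majority vote) can be written as a brute-force sketch. This is an illustration only, not the Matlab implementation used in the study; the function name `knn_predict` is hypothetical, and how ties among three different labels are resolved is a property of this sketch rather than something stated in the text.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query feature vector by majority vote of its k nearest
    training vectors, using Euclidean distance."""
    # Sort all training samples by distance to the query (brute force).
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With the reduced feature set, each training row would hold the 10 selected features for one window and each label would be an integer from 0 to 7.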
For our eight-activity classification problem, each posture was identified as a different class, α, such that $\alpha_i = 0, 1, 2, \ldots, 7$. The algorithm building dataset was therefore composed of a pair of matrices $(x_i, z_i)$, where x was an (n×72) training data matrix containing n windows of 72 features and z was the corresponding (n×1) class indicator matrix.
The algorithm building dataset was divided so that 80% of the feature sets for each motion were assigned as training samples while the remaining 20% of feature sets were withheld as test samples. A five-fold cross-validation was then used on the training set to develop the nearest neighbor classifier. Once the initial classifier was built, neighborhood component analysis was performed in order to reduce the number of features required for classification and reduce the risk of over-fitting to the training data [39]. With the reduced feature set, the nearest neighbor classifier was retrained and subsequently evaluated using the test samples (the 20% of features held back from the algorithm building dataset) in order to determine the model’s performance when identifying unknown feature categories for data from subjects on which the classifier was trained.
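The partitioning described above can be sketched as a per-class 80/20 split followed by a division of the training indices into five folds. The function names, the fixed random seed, and the rounding of the per-class test count are assumptions for illustration; the neighborhood component analysis step is not reproduced here.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class contributes test_fraction
    of its samples to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_folds(indices, n_folds=5, seed=0):
    """Shuffle indices and deal them into n_folds disjoint folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]
```

In each cross-validation round, one fold serves as the validation fold and the other four are used for training.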
Finally, reduced feature sets were extracted from the validation dataset (obtained from the five remaining participants’ data) in order to test the classification performance when provided with novel data.
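The participant-level hold-out (25 participants for algorithm building, 5 for validation) differs from the window-level split because every window from a held-out participant is excluded from training. A minimal sketch follows, assuming each feature set carries a participant identifier; the function name and the seed are hypothetical.

```python
import random

def split_by_participant(participant_ids, n_holdout=5, seed=0):
    """Return (build_idx, validation_idx, held_out_ids) so that no
    participant contributes windows to both sets."""
    rng = random.Random(seed)
    unique = sorted(set(participant_ids))
    held_out = set(rng.sample(unique, n_holdout))
    build = [i for i, p in enumerate(participant_ids) if p not in held_out]
    validation = [i for i, p in enumerate(participant_ids) if p in held_out]
    return build, validation, held_out
```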
Classification performance of all models was quantified overall as the ratio of correct classifications to the total number of classifications. Additionally, sensitivity and specificity were calculated for each activity to determine whether any postures suffered from greater classification errors than others within the training and testing classification models. Sensitivity, a measure of true positive classifications, was calculated as the proportion of feature sets of a certain activity correctly identified, as in (1).
$$\text{Sensitivity} = \frac{TP}{P} \tag{1}$$
where TP, or true positives, are the number of feature sets within a given activity correctly identified as such, and P represents the total number of feature sets for the given activity [40]. Specificity, a measure of true negative classifications, was calculated as the proportion of feature sets not belonging to a specific activity identified as other activities, as in (2).
$$\text{Specificity} = \frac{TN}{N} \tag{2}$$
where TN, or true negatives, are the number of feature sets not belonging to a given activity correctly classified as belonging to the other activities, and N represents the total number of feature sets belonging to all other activities [40].
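As a worked illustration of these definitions, the sketch below computes overall accuracy together with per-class sensitivity (TP/P) and specificity (TN/N) from a confusion matrix of counts. The layout (rows as target class, columns as output class) and the function name are assumptions, and the sketch assumes every class has at least one sample.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of feature sets of target class i
    that were assigned to output class j.

    Returns (accuracy, [(sensitivity, specificity), ...]) per class.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        p = sum(confusion[c])                       # all sets of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp
        neg = total - p                             # all sets of other classes
        tn = neg - fp                               # others not labeled as c
        metrics.append((tp / p, tn / neg))
    return accuracy, metrics
```

For a three-class matrix `[[8, 2, 0], [1, 9, 0], [0, 0, 10]]`, the first class has a sensitivity of 8/10 and a specificity of 19/20.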
Results
Five-fold cross-validation was used to develop the initial k-Nearest Neighbor classifier using 72 time- and frequency-domain features from the EMG data of 25 subjects. Initial classification accuracy after cross-validated training was found to be 75.0%. Sensitivity and specificity values for all classifiers are presented in Tables 1 and 2, respectively. From this initial model, relatively consistent specificities were observed between postures (mean specificity was found to be 75.0% ± 0.4%), suggesting that no one posture was prone to higher misclassification rates than any other. The highest levels of misclassification for this training data were observed in plantarflexed kneeling and heels-up squatting.
Confusion matrices for the classification of eight postures of high knee flexion based on training, feature reduced, testing, and novel subject EMG data using a kNN classifier
All values are expressed as percentages relative to total number of classifications for each high knee flexion posture. Bolded cells denote correct classifications. The eight high flexion postures analyzed were: dorsiflexed kneeling (DK), plantarflexed kneeling (PK), heels-up squatting (HS), flatfoot squatting (FS), floor height lifting (FL), knee height lifting (KL), child chair sitting (CCS), and cross-legged sitting (CLS). *Output class: class predicted by the classifier. **Target class: correct class of features.
Specificity for each classifier in the classification of eight postures of high knee flexion
The eight high flexion postures analyzed were: dorsiflexed kneeling (DK), plantarflexed kneeling (PK), heels-up squatting (HS), flatfoot squatting (FS), floor height lifting (FL), knee height lifting (KL), child chair sitting (CCS), and cross-legged sitting (CLS).
Based on the neighborhood component analysis, 10 features (all in the time domain) emerged as having the highest influence on classification results. These features were peak normalized EMG values from four muscles (VM, RF, MG, and LG) as well as minimum normalized EMG values from six muscles (RF, VL, TA, LG, ST, and BF). Once the feature set had been reduced from 72 to 10 features, the cross-validated classification accuracy increased slightly, to 78.9%, suggesting that a greater number of features contributed to some confusion within the classifier.
The developed classifier’s performance was next evaluated using the test dataset, which constituted 20% of the feature sets from subjects included in the algorithm building dataset. Classification accuracy for this data was found to be 80.1%, suggesting that the classifier was capable of identifying novel data from trials performed by individuals for which it was trained. Sensitivity and specificity values for each activity revealed that the greatest classification accuracy occurred in stooping while the most common misclassifications occurred between dorsiflexed and plantarflexed kneeling, as well as between child chair sitting and cross-legged sitting; however, consistent classification specificities were observed across all groups (80.0% ± 0.01%). Misclassifications can most likely be attributed to the similarity in muscular activation between the permutations of kneeling, squatting, and lifting postures studied, as well as the variability in muscle recruitment strategies between individuals studied.
Finally, the trained and tested classification model was used to predict the high flexion posture based on the 10 time-domain features (peak normalized EMG of VM, RF, MG, and LG and minimum normalized EMG of RF, VL, TA, LG, ST, and BF) extracted from the surface EMG data of 5 novel participants (withheld from the initial training dataset). Classification accuracy of novel data was found to be 18.4%. These results suggest that while the developed classification model was capable of identifying high flexion postures in novel data from subjects included in its training data, it is incapable of accurately identifying these same postures in data from novel subjects.
Discussion
In this study, a novel approach for the classification of high knee flexion postures observed in childcare workers was proposed. A combination of 72 time- and frequency-based features was extracted from EMG data of eight lower limb muscles in order to develop a five-fold cross-validated kNN classifier for the classification of postures including dorsiflexed kneeling, plantarflexed kneeling, heels-up squatting, flatfoot squatting, floor height lifting, knee height lifting, child chair sitting, and cross-legged sitting. These features were then reduced using neighborhood component analysis and a final classifier was developed based on ten time-domain features that included the peak normalized EMG signals from VM, RF, MG, and LG as well as the minimum normalized EMG signals from RF, VL, TA, LG, ST, and BF. Experimental results showed that this model was capable of identifying high flexion activities in novel data from subjects on which the model was trained with 80.1% accuracy.
The EMG classification approach used in this paper has previously been adopted for the development of myoelectric signals for the control of upper body [29, 33], and lower-body [15, 30] prostheses as well as the classification of locomotion modes [41]. To our knowledge, this paper constitutes the first study to classify activities involving high knee flexion based on lower limb EMG. The ability to classify occupational postures without the need for a researcher present is essential for the assessment of high knee flexion postures within the workplace. Classified data may serve to provide quantitative measures of the frequency and duration of such high flexion postures, leading to a greater understanding of occupational requirements. These findings may ultimately drive the development of necessary workplace guidelines to minimize the risk of knee joint degenerative diseases and knee OA. The developed model was not capable of classifying movements for participants who were not included in the training data set but was capable of classifying novel data from individuals included in the training protocol. Thus, it may be possible to apply the same classification model to study occupational high knee flexion postures throughout multiple childcare facilities or within various occupational settings given the opportunity to collect a brief set of supervised training data from each individual.
The developed classification model’s recognition rate was above chance (12.5%) for the classification of novel data from individuals included in its training dataset as well as, albeit only slightly, for the classification of novel subject data. These findings suggest that EMG data alone may provide some (although relatively little) power to distinguish between high knee flexion postures, yet may prove to be a valuable addition to more traditional inertial-based classification approaches. The poor classification results on novel subjects may be attributed to high inter-subject variability in muscular contraction signals and suggests that a generic model for high flexion postures based on EMG signals alone is not possible. However, classification is possible using subject-specific training data and classification algorithms, and it is likely that subject-specific classification models would be more successful than using the group-wide model trained on data from 25 participants. Confusion in classification between kinematically similar postures may also have played a role in the rate of misclassifications, given that muscle activation was, in some cases, quite similar between many of the postures investigated in this study.
During the course of data collection, three raters were responsible for muscle palpation and the placement of EMG electrodes onto the participants. Multiple raters may have resulted in slight inconsistencies between electrode placements over the muscles of interest; however, the raters were trained in the same palpation procedures, and it is believed that slight variations in electrode placements will not have affected the sensitivity of the developed classification model.
While the proposed algorithm was found to be capable of classifying high knee flexion postures in the test datasets from participants on which the model was trained, there are several limitations of this classifier which should be considered. The current classifier was trained using simulated occupational data, in a controlled laboratory environment. While participants were encouraged to complete each posture as they would outside of the laboratory setting, descent and ascent phases were controlled which most likely would not occur within a childcare setting. It would therefore be important to provide the classifier with additional training data of more variable descent and ascent strategies in order to minimize the risk for misclassification.
Similarly, postures were each recorded in separate trials so that no window of data contained a transition from one movement type to another. While the number of windows for each posture to be included in the training and testing data sets were chosen to capture the entirety of the descent and ascent phases of motion, it is possible that in an occupational setting an individual might transition from kneeling to sitting without returning to standing between the two postures. The developed algorithm would most likely not be capable of classifying these transitional movements. Future work could explore the possibility of extending the classification to distinguish between descent, static hold, and ascent portions of each posture which might enable the detection of more transitional movements. However, additional measurements would need to be taken in order to successfully classify a transitional window.
Additionally, the current study did not consider the potential for changes in EMG signal magnitudes and frequency spectrums with fatigue [42, 43]. Future work should consider the measurement of the current high knee flexion activities following a muscle fatiguing protocol to determine how the classification accuracy would be affected. It is currently unknown whether the high flexion postures employed in occupational childcare would elicit muscle fatigue throughout a working day.
In order to train a classification algorithm without bias towards any class of posture, an equal number of features should be provided per class. In the current study, certain EMG signals experienced dropouts in full flexion due to occlusion of the wireless transmitter signals by the body. In the cases where missing EMG data was observed, no feature sets were provided for these windows of partial data. Therefore, the feature sets for each class were not equal within the training dataset where the number of windows per class was 4798±807 (mean±standard deviation). Missing data was most commonly observed in plantarflexed kneeling, with only 3055 feature sets included compared to an average of 5046±427 between the remaining 7 classes. Classification accuracy of plantarflexed kneeling in the novel data was found to be the lowest of all activities (74.5%), therefore the decreased number of feature sets within the training data set may have played a role in the classifier’s difficulty in successfully identifying plantarflexed kneeling. Future work should seek to mitigate this potential bias by including an equal number of feature sets from each posture.
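One simple way to obtain equal class sizes is random undersampling to the size of the smallest class. The sketch below illustrates that option only; it is not a method used in the study, and the function name and seed are hypothetical. With the counts reported above, every class would be reduced to the 3055 feature sets available for plantarflexed kneeling.

```python
import random
from collections import defaultdict

def balance_by_undersampling(labels, seed=0):
    """Return sorted sample indices with every class randomly reduced
    to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, target))
    return sorted(kept)
```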
Finally, in this paper a supervised learning algorithm was utilized, which required that manual labeling of the training data be completed. Given that the training data were collected in a laboratory setting, this labeling was performed at the time of collection; however, additional steps would need to be taken if more training data were collected outside of the laboratory setting. A possible solution for this issue would be to have a labeler present during the supervised portion of collection to take note of the occurrence of activities of high knee flexion.
Conclusion
Classification of eight high knee flexion postures (dorsiflexed kneeling, plantarflexed kneeling, heels-up squatting, flatfoot squatting, lifting an object from the ground, knee height lifting, child chair sitting, and cross-legged sitting) was performed using a k-Nearest Neighbor classification algorithm. A combination of ten time-domain features (peak magnitude of the VM, RF, MG, and LG as well as the minimum magnitude of the RF, VL, TA, LG, ST, and BF) was found to be most effective when classifying these knee flexion postures frequently adopted by childcare workers. A classification accuracy of 80.1% was obtained when classifying novel data from participants on which the model was previously trained. Classification accuracy when classifying novel subject data was 18.4%, suggesting that in order to identify high knee flexion postures based on EMG features, the model must first be trained on sample data from a given individual. While the current classifier has been trained using simulated occupational data, future work should focus on testing its performance with muscle activation data measured directly in occupational settings.
Conflict of interest
None to report.
Acknowledgments
Funding for this study was provided by the Natural Sciences and Engineering Research Council (NSERC) under Grant number 418647 as well as by the Queen Elizabeth II Graduate Scholarship in Science and Technology (QEII-GSST).
The authors would like to acknowledge G. Frew and C. Kapteyn for their contributions to the data collection.
