Abstract
Various sensors are embedded throughout smart environments to monitor and collect data about the status of those environments. The goal of a smart environment is to improve quality of life by enhancing the efficiency of services, meeting residents’ needs through different technologies, and mining the data captured in the environment. Mining such data for valuable knowledge requires that critical activities and situations in smart environments be detected effectively. Activity recognition is of great interest to researchers in context-aware computing. However, traditional activity recognition techniques have not addressed the correlations between activities and their frequent patterns. Recently, some researchers have considered frequent pattern extraction for activity detection in smart environments. Even so, the sequences of sensor activations and the time durations between them have not been examined for activity recognition. In this paper, an extension of frequent pattern-based algorithms is proposed for activity recognition. The algorithm considers the sequence of activated sensors as well as the time durations between them in order to extract frequent sequential patterns for activity/situation detection in smart environments. Experimental results on publicly available datasets demonstrate that the suggested method is efficient and can significantly improve the accuracy of activity recognition in smart environments, owing to the sequence matching-based conflict resolution and the order of the activated sensors.
Introduction
The network of physical objects, which are embedded with sensors, software, and network connectivity that empowers these objects to collect and exchange data, is called the Internet of Things (IoT) [1]. Internet of Things permits the objects to be sensed and controlled remotely, and therefore, creates opportunities for a more direct integration of the physical world into the computer-based systems and results in an improved efficiency, accuracy, and economic benefit [2]. When the IoT is augmented with sensors and actuators, the technology becomes an instance of a more general class of the cyber-physical systems, which also encompasses critical technologies, such as smart grids, smart homes, intelligent transportation systems, and smart cities.
Various sensors are embedded in different places of smart cities, such as parking lots, streets, etc., to monitor and collect data about the status of cities. Mining such data to extract valuable knowledge can be beneficial for urban planning and decision making [3]. It enriches the quality of life through improving efficiency of services and supplying the residents’ needs. Activity recognition from sensor data in the smart and IoT environments can affect the analysis of sensor data, because in this way, the sensors’ raw data can be combined and converted into a meaningful situation, event, and activity and afterward, these activities and correlations can be mined. Mining the recognized activities can provide a chance to extract the valuable knowledge, which has not been achievable from the raw data, captured by sensors.
The data analysis system of a smart environment is an instance of a context-aware application, which helps make decisions that benefit the users of the system by analyzing and reasoning about the environment’s situation. Such systems need a subsystem that converts the sensors’ raw data into the situations and activities currently being performed in the environment. For example, a context-aware application in a smart home may turn off the TV if it detects that nobody is at home or that all the residents have been asleep for a long time. In this example, the system can decide to turn off the TV because the activity detection subsystem has detected the sleeping activity or the absence of any resident. The application can consequently detect the residents’ behavior patterns at home by analyzing the activities performed over the course of the days. In addition, the knowledge mined from smart environments by these kinds of systems can be employed for various other applications, such as the detection of abnormal situations, which can help minimize danger.
Activity recognition is an inference engine in the data analysis system of smart environments. Using labelled data, it learns the knowledge needed to detect activities from sensor readings. The inference process depends on machine learning techniques such as Decision Trees, Hidden Markov Models (HMM), and Support Vector Machines (SVM). The knowledge in the activity detection subsystem can be represented by decision rules, transition probabilities, support vectors, and weights of potential functions [4].
In general, there are two ways of activity recognition based on the type of sensors: 1) measuring human body signals (such as the body’s velocity and acceleration) using sensors attached to the human body, with the aid of machine learning models [5]; 2) capturing the interactions between humans and objects in the environment, and making an inference with machine learning models to detect the activities [6].
There are various methods and algorithms for activity recognition in smart environments that use the sensors’ triggers as features and train a model on annotated sensor data. They then use the model to recognize activities from further sensor readings. In the training phase, the classification model builds a mapping between the captured sensor data and the activities, and it uses this mapping in the testing phase. It is important to consider the discriminative power of the features used for model training. Some works, such as Ling and Intille [7] and [8], used various ways of selecting discriminative features in order to yield a higher accuracy. Although human activities can be distinguished by sensor events to some extent, some activities share these events, which reduces the accuracy of traditional activity recognition methods. Some researchers (such as Wen et al. [4]) used the frequencies of sensor events to address this problem. In this way, the frequent pattern of an activity is characterized by the sensors that are frequently triggered for that activity.
In this paper, we propose an algorithm that addresses the above problems for activity recognition based on the frequently activated sensors. The sequences of activation and the time durations between sensor activations are considered in order to detect activities more accurately. In the proposed algorithm, a sequence matching-based approach is employed for activity classification, using the frequent sequential patterns extracted for each activity. This resolves the conflicts between activities that share commonly triggered sensors and improves the accuracy of the algorithm over the traditional machine learning-based methods and over a recent work by Wen et al. [4], which enhanced the accuracy of the traditional activity recognition methods by considering the frequency of sensors. The contributions of the paper can be summarized as follows:
An efficient frequent pattern-based activity recognition algorithm is proposed, which incorporates the sequences of the triggered sensors in smart environments and the time durations between them into the reasoning process.
A sequence-based approach for scoring the conflicting activities during classification is employed in order to resolve the conflicts between activities that share common and frequent sensor events.
The average accuracy of the method outperforms that of the traditional machine learning-based approaches and of a recent efficient frequent pattern-based approach for activity recognition.
The rest of the paper is organized as follows: Section 2 explains the related works. Section 3 describes the details of the proposed algorithm. The experiments, comparisons with the traditional and frequent pattern-based methods, along with the achieved results are demonstrated in Section 4. Finally, the conclusions are presented in Section 5.
Frequent sequential pattern mining
With the emergence of computing in all aspects of environments, the amount of accessible data has exploded. A great deal of data are produced in cities, homes, etc., which are all stored in computers and ready to access in mass. Data mining is an important tool for the people, who wish to analyze all these data in order to determine the associated patterns. In this way, machine learning can attempt to tell how to automatically discover a good predictor based on the past experiences.
The time-stamp is an important attribute of a dataset: taking it into account in the data mining process can yield more accurate and useful information. In recent years, sequential pattern mining [9] has become an essential data mining technique and has been applied in many applications, such as intrusion detection systems, gene analysis in bioinformatics, and customer behavior prediction on e-commerce websites. In general, the main objective of sequential pattern mining is to discover the frequent sequences within a transactional database. Numerous approaches, including projection-based [10, 11, 12], Apriori-based [11, 13], and pattern-growth-based algorithms [14], have been proposed to mine sequential patterns. The problem was first proposed in [15], and the formal definition can be described as follows:
Activity recognition
Generally, activity recognition models can be categorized into two sub-categories: knowledge-driven and data-driven models. In knowledge-driven models, rules based on common sense are used to represent the activities. In this case, the rules can be reused across various environments. The static nature of the rules that recognize the activities makes these models unable to handle the uncertainty and noise in the captured sensor data [9]. It should be noted that human behavior is random and erratic by nature. Data-driven models can address this issue by training the model on a real dataset. Examples of data-driven models include Naïve Bayes [17, 7], CRF [18, 19], SVM [20, 21], HMM [22], decision tree [23], and KNN-based approaches [24].
A multi-task clustering framework for the activity of the daily life analysis from the visual data, gathered from wearable cameras, was suggested in [25]. The authors in [26] introduced a hierarchical algorithm for the online human activity detection, with two levels in the feature extraction methods. At the lower level, the algorithm gets the sensor data from the accelerometer and the microphone of the user’s smartphone, and extracts the models about the motion and the environment detection of the user. At the higher level, the algorithm takes an input as the combination of the output from these models and extracts the model about the human activity detection.
Frequent patterns for activity recognition
In [27], researchers used the frequent pattern-mining approach for activity recognition, through relying on the relevant weights of objects, as the basis for activity discrimination. In their approach, for each activity, the most relevant objects are extracted according to their normalized usage frequency. An unsupervised approach was proposed in their work, which is based on the object-use fingerprints to recognize the activities without human labeling. The activity models based on the object-use fingerprints are built for activity recognition, which are the sets of contrast patterns, describing the significant differences of the object use between any two activity classes.
An approach was provided in [28], which suggested to apply the sequence alignment approach for pattern mining and matching, in the recognition of the human activities. The proposed sequence alignment algorithm is invoked to extract out the representative patterns, which denote the specific activities of a person from the training patterns. However, this work did not consider the frequency of sensor events in the activity recognition and also, differs from our work, because we use the sequence alignment for the conflict resolution of the activities, with a common frequent sequence sensor activation. It is worth expressing that the activities that trigger the same set of sensors are difficult to differentiate, although they present different patterns (such as the different frequencies of the sensor events). In [4], the authors presented an efficient association rule mining technique to find the association rules between the activities and their frequent patterns, and to build an activity classifier based on these association rules. The classification of the overlapped activities was addressed by incorporating the global and local weights of the patterns. In their work, the sequences of the sensors’ activation as well as the time durations between these activations were ignored.
In order to overcome the problems stated above, we propose a frequent sequential pattern-based algorithm for activity recognition in smart environments. By finding the frequent patterns of sensor activations for an activity, we can use those patterns to classify the activities. In this algorithm, both the time durations between sensor events and the order of the sensor activations are considered when extracting the frequent sequential patterns for each activity. Since some activities share sensor events, we use a sequence alignment-based technique to resolve the resulting conflicts in the classification of activities, which increases the average classification accuracy over the previous works.
Frequent sequential pattern-based activity recognition (FSP-AR)
In this section, the description of the FSP-AR algorithm is given. This algorithm considers the sequence of the sensors’ triggers and the time durations between these activations to recognize the class of activities. At first, the activity trace data is investigated and then, the details of the algorithm are explored in order to show the operation of the algorithm.
Activity trace data (ATD)
An activity trace data (ATD) set contains the sensor data collected from a smart environment. The sensor data consist of the sensor events that are triggered when an activity is performed. An example of such a dataset is exhibited in Fig. 1. The sensors M018, M019, and M015 were triggered in that order and were also turned off in the same order. If such a pattern occurs repeatedly for the meal preparation activity in the smart environment (exactly the same pattern or with small changes), this frequent sequential pattern can thereafter be used to recognize a meal preparation activity. In addition, conflicts among several activities are possible when the activities share a sensor trigger. Thus, a conflict resolution process is included in our method in order to improve the accuracy of the algorithm. The ultimate goal of our algorithm is to discover these kinds of sequential patterns for the activities and to generate rules for activity recognition.
The frequent sequential patterns of the triggered sensors for an activity can be discovered as the set of sensor-trigger sequences that recur, within the respective time durations, across the large number of sensor triggers recorded for that activity in the activity trace data. For this purpose, consider a database of activity trace data in a smart home, where each record represents the status of a sensor for an activity during the specified time duration. These records are converted into sequences according to the activated sensors and the time durations between the activations. Only the triggered sensors are included in a sequence. Suppose that the activity trace data records the status of the sensors together with the time-stamp. The triggered sensors are then converted into sequences of sensor events for that activity. If the time distance between two sensor events is lower than the maximum time duration threshold (MTDT), those sensor events are placed in the same sequence. In this regard, the MTDT bounds the time duration between sensor triggers that belong to one sequence. The discovered patterns are the sequences of sensors most frequently triggered by the activities. An example is “80% of the meal preparation activity includes a sequence of triggers for the sensors M018, M019, and M015, within a duration of 3 seconds”. Such a pattern can be exploited for activity recognition. However, conflicts among various activities are likely because of commonly triggered sensors, and these conflicts should be resolved.
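The conversion of trace records into sequences can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that each record is a (time, sensor name, status) tuple, that a trigger is a record with status "ON", and that a gap equal to or larger than the MTDT starts a new sequence. The function name `to_sequences` and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def to_sequences(events, mtdt_seconds):
    """Split the records of one activity instance into trigger sequences.

    Assumptions (not taken from the paper): only records with status "ON"
    count as triggers, and a new sequence starts when the gap between two
    consecutive triggers is not lower than mtdt_seconds.
    """
    sequences, current, last_time = [], [], None
    for time, sensor, status in sorted(events):
        if status != "ON":
            continue  # only triggered sensors are kept in a sequence
        gap_too_large = (
            last_time is not None
            and (time - last_time).total_seconds() >= mtdt_seconds
        )
        if gap_too_large:
            sequences.append(current)
            current = []
        current.append(sensor)
        last_time = time
    if current:
        sequences.append(current)
    return sequences


# Hypothetical records for one meal preparation instance.
t0 = datetime(2010, 11, 4, 8, 0, 0)
events = [
    (t0 + timedelta(seconds=0), "M018", "ON"),
    (t0 + timedelta(seconds=1), "M019", "ON"),
    (t0 + timedelta(seconds=2), "M015", "ON"),
    (t0 + timedelta(seconds=3), "M018", "OFF"),
    (t0 + timedelta(seconds=4), "M019", "OFF"),
    (t0 + timedelta(seconds=5), "M015", "OFF"),
    (t0 + timedelta(seconds=20), "M019", "ON"),
    (t0 + timedelta(seconds=21), "M018", "ON"),
]
sequences = to_sequences(events, mtdt_seconds=3)
print(sequences)
```

With an MTDT of 3 seconds, the first three triggers form one sequence and the two later triggers form a second one, because they are separated from the first group by a much longer gap.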
An instance of the activity trace data (ATD).
Discovering all of the frequent sequences of the triggered sensors in a large activity trace data is a quite challenging task. In fact, the search space is extremely large. For example, for m sensors available in a smart environment, there are
Employing the format (Time, Sensor Name, Status) to record the status of the sensors between the start and the end of an activity.
Converting the sensor readings into sequences of triggered sensors, which include only the activated sensors and in which the time gap between triggers is bounded by the MTDT.
Discovering the frequent sequential patterns of the triggered sensors for the activities in smart environments.
Recognizing the activities with the aid of the frequent sequences extracted from the triggered sensors.
Incorporating sequence alignment methods to measure the similarity between the input sequences and the frequent sequential patterns of the activities, in order to minimize the conflicts between activity classes during recognition.
The problem of discovering the sequential patterns of the triggered sensors for activity recognition can be stated as follows: let
Various methods have been proposed for frequent sequential pattern mining, such as SPADE [31], FreeSpan [12], GSP [32], and PrefixSpan [33]. Since PrefixSpan offers the required capabilities and can be adapted for parallel processing on a big data platform, we chose it as the base approach for our experiments and used it to discover the frequent sequential patterns of the triggered sensors for activity recognition.
Algorithm
Algorithm 1. FSP-AR-part1 shows the pseudo-code of the first part of our proposal for activity recognition. In steps 1 and 2, the algorithm takes the ATD (training part) and converts it into sequences, taking the MTDT into account. In these steps, multiple sets of sequences are created. Each set represents an activity and contains all the sequences of triggered sensors for that activity. As mentioned before, the boundaries of the sensor-trigger sequences for an activity are specified in the ATD (see Fig. 3). In step 3, the frequent sequential patterns (FSPs) of the triggered sensors are extracted for each set of sequences. These patterns are representative of the sensor triggers for each activity. In step 4, the FSPs that meet the minimum support min_sup are kept as the final FSPs for an activity, and in step 5 they are stored in FSP_DB_ai for activity i.
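The first part of the algorithm can be illustrated with a small, single-machine sketch. It is not the authors' implementation: the miner below is a simplified PrefixSpan-style routine for sequences of single sensor names, the minimum support is assumed to be a percentage of the sequences of one activity, and the names `prefixspan`, `build_fsp_db`, and the sample sequences are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_count):
    """Return {pattern tuple: support count} for sequences of single items.

    Simplified PrefixSpan-style recursion: each frequent item extends the
    current prefix, and every sequence is projected to the suffix that
    follows the first occurrence of that item.
    """
    patterns = {}

    def mine(prefix, projected):
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):  # count each sequence at most once
                counts[item] += 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in sequences])
    return patterns


def build_fsp_db(sequences_by_activity, min_sup_percent):
    """Mine the frequent sequential patterns of every activity separately."""
    fsp_db = {}
    for activity, seqs in sequences_by_activity.items():
        # Integer ceiling of min_sup_percent % of the number of sequences.
        min_count = (min_sup_percent * len(seqs) + 99) // 100
        fsp_db[activity] = prefixspan(seqs, min_count)
    return fsp_db


# Hypothetical training sequences for one activity.
training = {
    "Meal_Preparation": [
        ["M018", "M019", "M015"],
        ["M018", "M019", "M015"],
        ["M018", "M015"],
        ["M019", "M018", "M015"],
    ]
}
fsp_db_70 = build_fsp_db(training, min_sup_percent=70)
fsp_db_50 = build_fsp_db(training, min_sup_percent=50)
print(fsp_db_70["Meal_Preparation"])
```

In this toy example the ordered pattern (M018, M019, M015) occurs in only two of the four sequences, so it is kept with a support of 50 but discarded with a support of 70, whereas (M018, M015) occurs in all four sequences and survives both thresholds.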
Then, the discovered FSPs in the previous part are used for activity recognition in the second part of the algorithm (Algorithm 1. FSP-AR-part2). In this section, at first, the ATD (test) is converted into sequences, for all of the activities in step 1. In steps 2–4, each sequence is compared against the FSPs of all of the activities, placed in FSP_DB_a in the first part of the algorithm. A score will be computed for each pair of sequence and activity class. This score determines how similar the triggered sensors of the input sequence are with the triggered sensors of the discovered FSPs for each activity. The class of activity with the maximum score will be selected for that sequence, in steps 5 to 8.
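The scoring and selection steps can be sketched as follows. The exact scoring function is not spelled out in this section, so the sketch assumes one simple possibility: the score of an activity is the best normalized longest common subsequence (LCS) length between the input sequence and any FSP of that activity. The function names and the two sample patterns are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common (order-preserving) subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def activity_score(sequence, patterns):
    """Best normalized LCS between the sequence and any pattern (assumed score)."""
    return max(
        (lcs_length(sequence, p) / max(len(sequence), len(p)) for p in patterns),
        default=0.0,
    )


def classify(sequence, fsp_db):
    """Return the activity with the maximum score, together with all scores."""
    scores = {activity: activity_score(sequence, pats) for activity, pats in fsp_db.items()}
    return max(scores, key=scores.get), scores


# Hypothetical FSPs: both activities trigger the same sensors in a different order.
fsp_db = {
    "Meal_Preparation": [("M018", "M019", "M015")],
    "Wash_Dishes": [("M015", "M019", "M018")],
}
label, scores = classify(["M018", "M019", "M015"], fsp_db)
print(label, scores)
```

Because the score is order-sensitive, the two activities receive different scores even though they trigger the same set of sensors, which is the kind of conflict that a purely frequency-based comparison cannot separate.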
In order to evaluate the method, a smart environment dataset from the CASAS research group was adopted. The dataset [34] contains sensor data collected from the home of a volunteer adult. The resident was a woman whose children and grandchildren visited her on a regular basis. The data were produced by the motion and closure sensors installed in the environment. The deployment of the sensors in the environment is portrayed in Fig. 2.
The dataset was in the form of ATD and annotated by the class label of activities. The dataset had sensor readings for 11 activities, but some of these activities had a small number of instances. Therefore, the latter group of activities were ignored in our experiments. The considered activities and the number of instances for each activity are listed in Table 1.
We used 70% of the dataset for each activity in the training process and for the extraction of the FSPs, and the remaining 30% for the testing phase.
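A per-activity split of this kind can be sketched as below. How the instances were ordered or sampled before splitting is not stated, so the random shuffle with a fixed seed is an assumption, and the function name and sample data are hypothetical.

```python
import random


def split_per_activity(instances_by_activity, train_fraction=0.7, seed=0):
    """Split the instances of each activity into training and test parts.

    Assumption: instances are shuffled before the split; a chronological
    split would simply skip the shuffle.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for activity, instances in instances_by_activity.items():
        shuffled = list(instances)
        rng.shuffle(shuffled)
        cut = round(train_fraction * len(shuffled))
        train[activity] = shuffled[:cut]
        test[activity] = shuffled[cut:]
    return train, test


# Hypothetical instance identifiers for two activities.
data = {"Sleeping": list(range(10)), "Eating": list(range(20))}
train, test = split_per_activity(data)
print({a: (len(train[a]), len(test[a])) for a in data})
```

Splitting each activity separately keeps the 70/30 proportion within every class, which matters here because the number of instances differs between activities.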
The activities in the dataset
Sensor deployment in the smart environment.
In order to evaluate the efficiency of the proposed approach, we considered an experimental environment with six nodes, having exactly the same specifications, with Intel Core i3-4160 CPU 3.60 GHz*4, with 6 GB RAM. One node was considered as a master and 5 nodes were considered as the slaves. The nodes in the clusters were connected through 802.11 WiFi interface (WLP3s0) with 72 Mb/s. The PrefixSpan algorithm was used as a base approach for the frequent sequential pattern discovery in the proposed approach in order to generate the FSPs of the triggered sensors. Therefore, the efficiency of the approach was mainly dependent on the efficiency of a parallel implementation of the PrefixSpan on the Spark platform, exploited as the experimental environment. The processing time dramatically increases, when the minimum support drops. The processing time used in all nodes was much less than the processing time in one node. However, increasing the number of nodes from 4 to 6 did not clearly improve the processing performance.
We compared the accuracy of our approach with other typical activity recognition algorithms: Random Forest (RF), k-Nearest Neighbor (k-NN), Naïve Bayes (NB), Decision Tree (DT), and Frequent Pattern-based approaches (FP). Random forests, or random decision forests [Ho, 1995], are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and, for classification, outputting the class that is the mode of the classes predicted by the individual trees. The k-NN algorithm [Altman, 1992] is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The k-NN algorithm is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The NB approach is a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features [Russell, 2003]. The DT method is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. The FP approaches find the frequent patterns and provide association rules that represent the relationships between the sensors’ frequencies and the activity class labels.
Since the number of instances per activity was not balanced, we applied the recall measure to evaluate each activity separately and the accuracy measure to compare the methods over all the instances of all activities. Precision and recall are widely used measures for classification. Precision can be thought of as a measure of exactness (i.e., what percentage of the tuples labeled as positive are actually positive), whereas recall is regarded as a measure of completeness (i.e., what percentage of the positive tuples are labeled as such). The recall and precision measures are depicted in Eq. (4). Precision (also called positive predictive value) is the fraction of relevant instances (TP) among the retrieved instances (TP
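As a small illustration of the two measures used in this evaluation, the sketch below computes the recall of each activity and the overall accuracy from lists of true and predicted labels, using the standard definitions recall = TP / (TP + FN) and accuracy = correct predictions / all predictions. The function names and the sample labels are hypothetical and are not results from the paper.

```python
def per_class_recall(y_true, y_pred):
    """Recall of each class: TP / (TP + FN), computed per true label."""
    recall = {}
    for label in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        recall[label] = tp / (tp + fn)
    return recall


def accuracy(y_true, y_pred):
    """Fraction of instances, over all classes, that are labeled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for an unbalanced toy test set.
y_true = ["Sleeping", "Sleeping", "Sleeping", "Eating"]
y_pred = ["Sleeping", "Sleeping", "Eating", "Sleeping"]
recalls = per_class_recall(y_true, y_pred)
overall = accuracy(y_true, y_pred)
print(recalls, overall)
```

The toy labels show why per-activity recall is reported alongside accuracy: the rare class has zero recall even though half of all instances are classified correctly.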
Figure 3 presents the recall of the experimented approaches for the activity numbers 1 to 7 (A1–A7). The suffixes 70 and 90 in the names of the approaches indicate that the experiments were run with support 70 and 90, respectively. The FSP-AR70 and FSP-AR90 were the best of the experimented approaches and achieved the highest average recall. The FSP-AR70 was more robust than the FSP-AR90. The recall values of these approaches for each activity are presented in Table 2.
The measures of recall for the experimented approaches
The recall for the experimented approaches.
The RF method had the largest number of activities with zero recall (0%), namely Eating, Work, and Bed_to_Toilet. Although the FSP-AR70 was more robust than the FSP-AR90, the FSP-AR90 achieved complete recall (100%) for more activities than any other approach. The results demonstrated that the FSP-AR was the best of the experimented approaches. It can be observed that support 70 provided better results than support 90 in our approach, because the order of the sensor events is not so strict for the activities in smart environments. Figure 4 presents the average accuracy of the experimented methods, considering all the activities. Table 3 shows the exact accuracy values of the approaches. The findings confirmed that the average accuracies of our approach for support 70 and 90 are equal and are higher than those of the other approaches.
The accuracy of the experimented approaches, considering all activities
The average accuracy of the experimented methods.
Various sensors are embedded in different places of smart environments, such as parking lots and streets, to monitor and collect data about the status of the environments. Mining such data to extract valuable knowledge can be beneficial: it enriches the quality of life by improving the efficiency of services and meeting the residents’ needs. Extracting this knowledge requires recognizing the activities and situations from the raw sensor data in the smart environment. Activity recognition is of great interest to researchers in context-aware computing. In this paper, we proposed a new activity recognition approach that addresses the correlations between the sensors triggered for the activities and their frequent patterns. The algorithm addresses the problems of activity recognition based on the frequently activated sensors by considering the sequences of activation and the time durations between the triggered sensors. In the suggested algorithm, a sequence matching-based approach was employed for activity classification, using the frequent sequential patterns extracted for each activity. This resolved the conflicts between activities with commonly triggered sensors. Furthermore, the accuracy of the algorithm improved over the traditional machine learning-based methods as well as over recent works that enhanced the accuracy of the traditional activity recognition approaches by considering the frequency of sensors.
