Abstract
Ambient Assisted Living (AAL) systems are increasingly being deployed in real-world environments and over long periods of time. This poses a significant challenge to current approaches, which require substantial setup investment and cannot account for frequent, unpredictable changes in human behaviours, health conditions, and sensor deployments. The state-of-the-art methodology in human activity recognition has been developed through short-term lab or testbed experiments, i.e., it relies on well-annotated sensor data and assumes that activity models do not change. This paper proposes a technique, EMILEA, to evolve an activity model over time with new types of activities. The technique integrates, in a novel way, two recent advances in continual learning: Net2Net – expanding the architecture of a model while transferring the knowledge from the previous model to the new model, and Gradient Episodic Memory – controlling the update of the model parameters to maintain the performance on recognising previously learnt activities. The technique has been evaluated on two real-world, third-party datasets and shows promising results in enhancing the learning capacity to accommodate new activities that are incrementally introduced to the model, without compromising the accuracy on old activities.
Introduction
Ambient Assisted Living (AAL) refers to sensing, communication, and intelligence technologies deployed in living environments with the aim of improving quality of life [22]. Recently, AAL has made great progress through the use of emerging sensing, machine learning (especially deep learning), and robotic technologies. It spans a wide range of applications; here we take personal healthcare in smart home environments as an example. A home can be equipped with passive infrared motion sensors to track users’ whereabouts, RFID sensors to detect users’ interactions with everyday objects, and resource monitoring sensors to monitor the consumption of water, electricity, and gas. These sensor data are collected and analysed to predict users’ activities, which can in turn be used for health tracking and disease diagnosis.
With existing AAL systems, we can monitor and recognise people’s daily activities [32], track their health [30], and assist them in completing daily activities [27]. This success is enabling the move towards large-scale, in-the-wild, and long-term deployment of AAL systems. This move, however, comes with its own challenges, notably that neither the sensing technologies being deployed nor people’s activity routines or health conditions remain constant. This creates a need for continual learning in AAL systems. Continual learning is a subfield of machine learning concerned with the ability to learn continually over time, accommodating new knowledge while retaining previous knowledge [20].
Most existing activity models built on supervised learning classifiers do not support this ability, as they need to be retrained with all the data. For example, a system may collect sensor data on two activities such as ‘watch TV’ and ‘sleep’, and train a classifier on these data to recognise them. After a while, the model might need to be extended to recognise a new activity, ‘do rehabilitation exercise’. One option is to retrain the classifier with all the data on these three activities, which becomes undesirable as the number of activities grows large [1]. Alternatively, we might use an updatable classifier, so that new instances can be used to train the model incrementally and iteratively; the problem is that such a classifier may suffer from catastrophic forgetting, where the performance on recognising previous activities is compromised by the update [19].
In this paper, we present EMILEA, which evolves an activity model incrementally with new types of activities and is built on two recent advances in continual learning: Net2Net [8] and Gradient Episodic Memory (GEM) [19]. The former provides operations to extend the architecture of a neural network iteratively, enhancing its learning capacity for an increasing number of activities. The latter mitigates catastrophic forgetting by controlling the gradient update while taking into account the activities that have been learnt before. We conduct a comprehensive evaluation of the proposed technique on two real-world datasets to assess the strengths and limitations of EMILEA, which sheds light on the future design of continual learning techniques for human activity recognition.
The rest of the paper is structured as follows. Section 2 reviews the literature on evolving activity models in human activity recognition and on continual learning. Section 3 describes the approach, including the problem definition, workflow, and key components. Section 4 introduces the evaluation methodology, and Section 5 presents the results and discusses the limitations of EMILEA. The paper concludes in Section 6.
Related work
In this section, we review recent work on discovering and recognising new activities, with a particular focus on how the models are evolved, and briefly look into continual learning techniques in machine learning.
New activity discovery and recognition
In recent years, a growing body of work has been devoted to discovering and recognising new activities. Clustering and one-class classifiers are the most popular approaches; the idea is to add a new cluster or classifier for each new type of activity. Gjoreski et al. use an agglomerative clustering technique to enable real-time clustering of streaming data [13]. To validate clusters for new activities, they propose two temporal assumptions on human activities: an activity usually lasts for a certain period of time, and transitions between activities should not be frequent. With these assumptions, they filter out short outliers and discover meaningful clusters more accurately.
Ye et al. use distance-based clustering to incrementally learn and recognise new daily routine activities, such as preparing breakfast or sleeping, from binary sensors embedded in a smart home [31]. An activity profile is built for each pre-defined activity from training data and is modelled as a cluster. Sufficient statistics, whose sufficiency is mathematically proven, are maintained for each cluster so that the model can evolve without storing any historical sensor data. Each incoming sensor event is then assessed against every activity profile; if it does not match any existing profile, i.e., it does not fall into any corresponding cluster, it is considered abnormal and stored in a candidate pool. A clustering technique runs continuously on the candidate pool to identify converged clusters, whose sufficient statistics no longer change significantly. Once such a cluster is identified, its centre node is used for an annotation query, and a new activity profile is built on this cluster.
Shin et al. use Support Vector Data Description (SVDD) with a Gaussian kernel to detect abnormal activities of elderly people, such as weakness or falls, based on features extracted from infra-red motion sensor data collected in houses [26]. The idea is to find the minimum-volume hypersphere that encloses all positive instances; an instance that falls outside the hypersphere is considered anomalous.
For clustering techniques and one-class classifiers, accommodating a new activity is relatively straightforward, as it is simply another cluster (or set of clusters) or another classifier. However, simply adding a new cluster or a new one-class classifier makes the model fragile; for example, clusters may overlap, which often requires the model to be rebuilt. To tackle this issue, Fang et al. propose a hierarchical mixture model in which each sub-model, built on a conditionally independent von Mises–Fisher distribution, corresponds to one type of activity [12]. When a new activity is discovered, a sub-model is created and added to the hierarchy, and the contributing parameters of each sub-model are then updated.
Cheng et al. adapt a zero-shot learner to recognise new activities with limited training data [10]. A knowledge-driven model encodes the semantic relationships between high-level activities and low-level sensor attributes derived from accelerometer data. To include a new activity, domain experts and developers need to manually add new attributes and update the activity-attribute matrix with manually specified relationships between the attributes and the new activity.
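To make the idea of summarising a cluster by sufficient statistics concrete, the following Python sketch keeps only a count, a per-feature sum, and a per-feature sum of squares, from which the mean and variance can be recovered without storing raw sensor data. The class name `ClusterProfile` and the k-standard-deviation membership test are our own illustrative assumptions, not the statistics or matching rule actually used by Ye et al. [31].

```python
import math


class ClusterProfile:
    """Hypothetical activity profile kept only as sufficient statistics."""

    def __init__(self, dim):
        self.n = 0                # number of absorbed observations
        self.s = [0.0] * dim      # per-feature sum
        self.ss = [0.0] * dim     # per-feature sum of squares

    def update(self, x):
        # Absorb a new observation; the observation itself is not stored.
        self.n += 1
        for i, v in enumerate(x):
            self.s[i] += v
            self.ss[i] += v * v

    def mean(self):
        return [si / self.n for si in self.s]

    def variance(self):
        # E[x^2] - E[x]^2, clipped at 0 to absorb floating-point error.
        return [max(ssi / self.n - (si / self.n) ** 2, 0.0)
                for si, ssi in zip(self.s, self.ss)]

    def matches(self, x, k=3.0):
        # Assumed rule: x belongs to the cluster if every feature lies
        # within k standard deviations of the cluster mean.
        return all(abs(v - m) <= k * math.sqrt(var) + 1e-9
                   for v, m, var in zip(x, self.mean(), self.variance()))
```

An event for which `matches` is False for every profile would then be placed in the candidate pool for later clustering.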
Continual learning
In this section, we describe recent advances in continual learning in machine learning, including dynamic architectures, regularisation approaches, and memory replay, which can be applied to mitigate catastrophic forgetting [20].
Dynamic architectures
One way to alleviate forgetting is to change the architecture of the network when new information is received. The resulting model, which has a different number of neurons or layers from the previous model, is then retrained.
Rusu et al. [25] introduce a progressive network that freezes the existing network and expands it by adding a new sub-network for the new data. The main idea is to keep a pool of models pre-trained with previous knowledge and to add lateral connections from them to the sub-network for the new task [9]. To mitigate forgetting, the parameters (
Aljundi et al. [1] introduce ExpertGate, which consists of a network of experts, where each expert is a model trained on a specific task. New tasks are added sequentially, and knowledge is transferred from the previously trained models. A gating mechanism decides which expert to activate. This removes the need to load all models, which saves memory, as each model can be computationally intensive [9].
Chen et al. [8] propose Net2Net, in which a network can be widened (by adding neurons) or deepened (by adding layers), and the knowledge of the previous network is transferred to, or preserved in, the newly constructed network.
Regularisation approaches
This subsection introduces approaches that enforce constraints when the neurons’ weights are updated, such as Learning without Forgetting and Elastic Weight Consolidation.
Li and Hoiem [18] propose an approach called Learning without Forgetting (LwF), where the shared parameters (
Kirkpatrick et al. [16] propose Elastic Weight Consolidation (EWC), which slows down learning on the weights that are important for previous tasks so that the expertise on those tasks is retained. EWC is evaluated on the MNIST dataset [17], where each new task is created by permuting the order of the input pixels of the images. The results are promising and indicate that EWC can mitigate catastrophic forgetting [9].
Another model, Gradient Episodic Memory (GEM), introduced by Lopez-Paz et al. [19], supports continual learning by using an episodic memory and by allowing backward transfer of knowledge to previous tasks. The episodic memory stores a subset of data from previous tasks, which is used to prevent GEM from increasing the loss on those tasks when training on the current task. Parisi et al. [20] note that GEM requires more memory during training than regularisation techniques such as EWC.
Rebuffi et al. [23] propose incremental Classifier and Representation Learning (iCaRL), which makes use of stored exemplars for old tasks. Exemplars represent the most important information on the tasks: for each class that is introduced, a set of exemplars dedicated to that class is created [9]. iCaRL assigns a new sample to the class whose exemplars are most similar to it, i.e., the class with the nearest mean of exemplars [9]. iCaRL also replays the stored data during training, which mitigates forgetting [28]. In this approach, resource usage grows slowly with the number of classes introduced [9].
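For reference, the EWC objective adds a quadratic penalty to the new-task loss, (λ/2) Σ_i F_i (θ_i − θ*_i)², where θ* are the parameters learnt on the old task and F is the diagonal of the Fisher information. The NumPy sketch below computes this penalty under the assumption that the Fisher diagonal has already been estimated (for example, from squared gradients on old-task data); the function names are hypothetical.

```python
import numpy as np


def ewc_penalty(theta, theta_old, fisher, lam):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    theta, theta_old, fisher = map(np.asarray, (theta, theta_old, fisher))
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_old) ** 2))


def ewc_loss(new_task_loss, theta, theta_old, fisher, lam):
    # Total objective when training on a new task after an old one:
    # weights with a large Fisher value are pulled back towards theta_old.
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)
```

Parameters with a near-zero Fisher value contribute almost nothing to the penalty and remain free to change for the new task.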
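A minimal sketch of nearest-mean-of-exemplars classification, assuming the exemplars are plain feature vectors. iCaRL itself computes the (L2-normalised) means in the feature space of a learnt network, which is omitted here, and the dictionary layout `exemplars[label] -> list of vectors` is a hypothetical choice.

```python
import numpy as np


def nearest_mean_of_exemplars(x, exemplars):
    """Return the label whose exemplar mean is closest to x."""
    def normalise(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    x = normalise(np.asarray(x, dtype=float))
    best, best_dist = None, np.inf
    for label, ex in exemplars.items():
        # Class prototype: normalised mean of the normalised exemplars.
        mu = normalise(normalise(np.asarray(ex, dtype=float)).mean(axis=0))
        dist = np.linalg.norm(x - mu)
        if dist < best_dist:
            best, best_dist = label, dist
    return best
```

Because only the stored exemplars are needed to form each prototype, adding a class means adding one entry to `exemplars` rather than retraining a classifier head.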

EMILEA workflow.
EMILEA is built on two of the above techniques, Net2Net and GEM, and stores examples from learnt classes, an idea borrowed from iCaRL. This allows the network to expand to learn more activities while mitigating catastrophic forgetting by controlling the gradient updates with the stored examples. Net2Net and GEM target continual learning from different aspects: GEM updates the parameters of the original network to optimise learning of new classes without increasing the loss on old classes, while Net2Net creates networks with a different structure from the original. EMILEA integrates the two to expand the network and adjust its parameters at the same time.
We define continual learning in activity recognition as continually and incrementally learning activities in a sequential manner. Let
Figure 1 presents the workflow of EMILEA. It starts at the time
In this process, we need to address two questions: (1) how to extend the model

An example to illustrate catastrophic forgetting.
To tackle these two questions, we integrate two recent continual learning techniques from the machine learning community: Net2Net [8] and Gradient Episodic Memory (GEM) [19]. Net2Net introduces operations to extend a network with more neurons and layers, enhancing its learning capacity to classify an increasing number of classes. GEM alleviates forgetting by controlling the gradient updates to balance the performance on old and new classes. In the following, we describe these two techniques in detail and present an algorithm that integrates them to tackle continual learning in activity recognition.
Net2Net introduces two operations to extend a network: Net2WiderNet, which adds neurons to a hidden layer, and Net2DeeperNet, which adds layers to the network. Here we focus on Net2WiderNet, as sensor data are often simpler than images (i.e., they have fewer dimensions and fewer correlations between dimensions). For simplicity, we refer to Net2WiderNet as Net2Net.
The principle of Net2Net is to add neurons to a hidden layer and redistribute the weights and biases of that layer and the layer after. Let layer l and
It starts with randomly sampling q neurons from the original p neurons at layer l:
Then the new weight matrix
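Because the equations of this step are only partially preserved here, the following NumPy sketch illustrates the Net2WiderNet operation as described by Chen et al. [8], in our own notation rather than the paper's: each new unit copies the incoming weights and bias of a randomly chosen original unit, and the outgoing weights of every original unit are divided by the number of its copies, so the widened network computes the same function. The function name `net2wider` and argument `n_new` are hypothetical, and the small symmetry-breaking noise that Net2Net optionally adds is left out.

```python
import numpy as np


def net2wider(W1, b1, W2, n_new, rng):
    """Widen hidden layer l (weights W1: in x p, bias b1: p) by n_new units.

    W2 (p x out) holds the outgoing weights of layer l into layer l + 1.
    """
    p = W1.shape[1]
    # Mapping g: original units keep their index; each new unit
    # replicates a randomly sampled original unit.
    g = np.concatenate([np.arange(p), rng.integers(0, p, size=n_new)])
    # Number of units (original + replicas) sharing each original unit.
    counts = np.bincount(g, minlength=p)
    W1_new = W1[:, g]                         # copy incoming weights
    b1_new = b1[g]                            # copy biases
    W2_new = W2[g, :] / counts[g][:, None]    # split outgoing weights among replicas
    return W1_new, b1_new, W2_new
```

Since every replica produces the same activation as its source unit and the outgoing weights are shared out evenly, the output of layer l + 1 is unchanged for any element-wise activation such as ReLU; training then starts from the old model's function rather than from scratch.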
The extension of the model often starts from the last layer (the output layer), to accommodate new classes, and/or from the second last layer, to enhance the capability to discriminate a larger set of classes. Net2Net supports expanding the model at multiple layers; in that case, the expansion starts from the second last layer and gradually moves towards earlier layers, and the weights are initialised iteratively, layer by layer. Algorithm 1 illustrates the process.
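Extending the output layer for new classes can be sketched as appending one output unit per new activity while leaving the existing class logits untouched. The small random initialisation of the new weights, the zero bias, and the name `add_output_units` are our own assumptions; the paper does not state how the new output units are initialised.

```python
import numpy as np


def add_output_units(W_out, b_out, n_new_classes, rng, scale=0.01):
    """Append output units for new classes; old class logits stay unchanged."""
    n_hidden = W_out.shape[0]
    # Assumed initialisation: small Gaussian weights and zero bias.
    W_extra = rng.normal(0.0, scale, size=(n_hidden, n_new_classes))
    b_extra = np.zeros(n_new_classes)
    return np.hstack([W_out, W_extra]), np.concatenate([b_out, b_extra])
```

When combined with widening the second last layer, this step would be applied to the output weights after they have been redistributed across the replicated hidden units.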

Neural network expansion
Now we describe how GEM is used to mitigate forgetting. GEM assumes a memory space that holds examples from previously learnt (old) classes. Learning new classes then amounts to minimising the loss function on both old and new classes;
To prevent forgetting, GEM ensures that the loss on previous tasks does not increase after each parameter update. That is, when observing a new training example
To assess whether a proposed update will increase the loss, GEM uses the examples held in the memory and computes the angle between their loss gradient vector and the proposed update; that is, equation (4) above is rephrased as follows:
If all the inequality constraints in equation (5) are satisfied, the proposed gradient update g is unlikely to increase the loss on previous classes. Otherwise, at least one previous class would experience an increase in loss after the update. In this situation, the proposed gradient g is projected to the closest gradient
The projection is formulated as a Quadratic Program (QP) with inequality constraints, whose primal is described as:
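GEM's original formulation solves the dual of this QP, minimise over v ≥ 0 the objective ½ vᵀGGᵀv + gᵀGᵀv, and recovers the projected gradient as g̃ = Gᵀv + g, where the rows of G are the memory gradients g_k; the dual has one variable per previous task rather than one per network parameter. The NumPy sketch below follows that formulation but, instead of a dedicated QP solver, uses projected coordinate descent on the dual, which is our own simplification. The name `gem_project` is hypothetical.

```python
import numpy as np


def gem_project(g, G, n_iter=500, eps=1e-12):
    """Project g so that <g_tilde, g_k> >= 0 for every memory gradient g_k (rows of G)."""
    g = np.asarray(g, dtype=float)
    G = np.atleast_2d(np.asarray(G, dtype=float))
    if np.all(G @ g >= 0):       # no constraint violated: keep the proposed update
        return g
    Q = G @ G.T                  # t x t Gram matrix of memory gradients
    c = G @ g
    v = np.zeros(len(G))         # dual variables, one per previous task
    # Projected coordinate descent on: min_v 0.5 v^T Q v + c^T v, subject to v >= 0
    for _ in range(n_iter):
        for k in range(len(v)):
            if Q[k, k] > eps:
                v[k] = max(0.0, v[k] - (Q[k] @ v + c[k]) / Q[k, k])
    return g + G.T @ v           # primal solution recovered from the dual
```

For a single violated constraint this reduces to removing the component of g that points against the memory gradient; for example, g = (1, −1) with one memory gradient (0, 1) is projected to (1, 0).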
The overall algorithm of EMILEA is presented in Algorithm 2.

EMILEA
The objective of the evaluation is to assess the performance of EMILEA in recognising both old and new activities over time as new activities are incrementally introduced.
Evaluation process
The evaluation process works as follows. EMILEA is initially trained on 2 randomly sampled activities, using 50% of their training data. We then gradually extend the model with one new activity at a time, randomly sampled from the remaining activities. For all the learnt activities, we hold out
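The protocol can be sketched as follows, assuming that activities are identified by label and that the episodic memory is a random `pct` fraction of each learnt activity's training examples (the exact holdout rule is not fully preserved above). `make_schedule` and `sample_holdout` are hypothetical helpers, and rounding with a floor of one example per activity is our own choice.

```python
import random


def make_schedule(activities, n_base=2, seed=0):
    """Randomly pick n_base initial activities; the rest are added one at a time."""
    rng = random.Random(seed)
    order = list(activities)
    rng.shuffle(order)
    return order[:n_base], order[n_base:]


def sample_holdout(train_by_activity, learnt, pct, seed=0):
    """Keep a random pct fraction of each learnt activity's training examples."""
    rng = random.Random(seed)
    memory = {}
    for a in learnt:
        examples = train_by_activity[a]
        k = max(1, round(pct * len(examples)))   # assumed: at least one example
        memory[a] = rng.sample(examples, k)
    return memory
```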
At each expansion, we evaluate four types of accuracy:
New – accuracy on the test data of the new activities;
Old – accuracy on the test data of the old activities that have been learnt;
All – accuracy on the test data of the old and new activities;
Base – accuracy on the test data of the initially sampled activities.
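The four measures differ only in which activities' test examples are scored. A minimal sketch, assuming predictions and ground-truth labels are given as parallel lists; the function name `accuracy_types` is hypothetical.

```python
def accuracy_types(y_true, y_pred, new, old, base):
    """Compute New, Old, All, and Base accuracy from parallel label lists."""
    def acc(classes):
        # Score only the test examples whose true label is in `classes`.
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if t in classes]
        return sum(t == p for t, p in pairs) / len(pairs) if pairs else float("nan")

    new, old, base = set(new), set(old), set(base)
    return {"New": acc(new), "Old": acc(old), "All": acc(new | old), "Base": acc(base)}
```

For example, with base and old activities {sit, run}, a new activity row, true labels [sit, sit, run, run, row], and predictions [sit, run, run, run, sit], New is 0.0, Old and Base are 0.75, and All is 0.6.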
Comparison techniques
We compare EMILEA with baseline approaches and with variants of its configuration. More specifically,
Selected datasets
We use two publicly available, third-party datasets to evaluate the performance of EMILEA. The first is PAMAP2, the Physical Activity Monitoring Dataset [24]. It contains 12 activities, including lying, standing, sitting, ironing, and house cleaning; these activities and their distribution are shown in Fig. 3. The sensor data are collected from 9 subjects, each wearing 3 inertial measurement units on the dominant arm, chest, and dominant-side ankle.

Activity distribution on PAMAP2.
The other dataset is DSADS, the Daily and Sports Activities Dataset [2–4]. It contains 19 activities, including sitting, running on a treadmill, exercising on a stepper, and rowing. Each activity is performed by 8 subjects for 5 minutes, and the sensor data are collected with 5 inertial sensor units placed on each subject’s torso, right arm, left arm, right leg, and left leg.
As feature extraction is not the focus of this paper, we do not work on the raw accelerometer data of these two datasets but on the extracted feature datasets [29]. For each sensor, 27 features are extracted, including the mean, standard deviation, and correlations between axes. The statistics of the two datasets are listed in Table 1. Both datasets are well suited to validating EMILEA, as they contain a large number of activities and have a high-dimensional feature space, which increases the challenge of continual learning.
Dataset description
As EMILEA expands the network as the number of activity classes increases, the initial network should be small to avoid overfitting and reduce computational cost [8]. To decide the initial configuration, we consider two designs, with 2 and 3 layers respectively. For the number of neurons, we run a grid search over different numbers of neurons and choose the architecture that achieves the best accuracy on the 2 randomly selected base activities. In the end, we settled on 2 hidden layers, with 20 neurons per hidden layer for PAMAP2 and 40 neurons per hidden layer for DSADS.
We set the learning rate to 0.001 and the batch size to 16. The learning rate is chosen by a grid search that balances accuracy against computational cost, searching from 0.005 down to 0.001 with a step size of 0.001. The value 0.001 is chosen because the model is already initialised with the previous model’s weights, so the gradient update benefits from a small learning rate.
To determine the number of training epochs, we run an experiment to see at which epoch the accuracy averaged over the activities learnt so far stabilises. We use the Net2Net+GEM setting with a holdout percentage of 5%, and train the model for 20 epochs on each new activity. Figure 4 presents the accuracy on old, new, and all activities over time on the PAMAP2 dataset. For the majority of activities, the accuracy on all activities stabilises after 10 epochs. We also observe that the accuracy on new activities is often high but occasionally drops significantly (e.g., on the 8th activity). This is caused by the variability between the current activity and the previously learnt activities: the order in which new classes are learnt matters in continual learning. Ideally, the model gains higher performance if it is trained on easier classes first and then gradually learns classes of increasing difficulty [5,11].

Accuracy on old, new, and all activities over time when introducing one new activity at a time on PAMAP2.
This section presents the results and discusses the limitations of EMILEA.
Summary result on PAMAP2
Summary result on DSADS
Tables 2 and 3 present the accuracy of different design variants and baseline approaches on the PAMAP2 and DSADS datasets. We run each setting 10 times and report the mean and standard deviation of each type of accuracy. We consider holdout percentages of 1% and 5%, which are kept low to reduce the memory cost. On the PAMAP2 dataset, we vary the number of neurons added at each expansion from 1 to 3. Because the initial network on the DSADS dataset is larger, we vary it from 2 to 8 in order to achieve an observable enhancement of learning capacity.
The Base accuracy measures the accuracy on the first input activities

Accuracy on recognising base activities on PAMAP2.
Figure 5 presents the trend of base accuracy (with a 1% holdout percentage) over time on the PAMAP2 dataset. Net2Net+GEM maintains more consistent performance after 9 activities have been added, whereas GEM’s performance drops after 5 activities. The base accuracy of Net2Net stays low, i.e., between 20% and 40%. Clearly, the naive approach cannot recognise any base activities at all.
When the holdout percentage increases to 5%, the gap in base accuracy between GEM and Net2Net narrows, with GEM 3% higher than Net2Net. Net2Net+GEM still achieves the best accuracy, at 70%.
On the DSADS dataset, we observe a similar benefit from combining Net2Net and GEM. With 1% holdout data (which corresponds to 2 examples per activity type), Net2Net+GEM achieves a base accuracy of 47%, which is 9%, 14%, and 41% higher than Net2Net, GEM, and the naive approach, respectively. With 5% holdout data, Net2Net+GEM achieves a base accuracy of 77%, which is 2%, 12%, and 71% higher than Net2Net, GEM, and the naive approach, respectively.
Increasing the holdout data from 1% to 5% has a more significant impact on the DSADS dataset than on the PAMAP2 dataset, increasing the base accuracy of Net2Net+GEM, Net2Net, and GEM by 30%, 37%, and 32% respectively. The reason is that DSADS has more activity types (19 in DSADS versus 12 in PAMAP2) and its activities are also harder to discriminate (an offline accuracy of 68% on DSADS versus 78% on PAMAP2), so the additional holdout data can help optimise towards the base activities. Figure 6 presents the trend of base accuracy (with a 1% holdout percentage) on the DSADS dataset. The accuracy of Net2Net+GEM on DSADS drops much earlier than on PAMAP2, after 5 activities, and dips to around 30% after 9 activities. GEM and Net2Net both keep the accuracy between 30% and 40% over time.

Accuracy on recognising base activities on DSADS.

Accuracy on recognising new activities on both PAMAP2 and DSADS.
The New accuracy is the accuracy on recognising new activities
Figure 7 presents the trend of new accuracy on both the PAMAP2 and DSADS datasets. On DSADS, Net2Net and Net2Net+GEM maintain high accuracy on recognising new activities. On PAMAP2, however, the new accuracy of Net2Net+GEM drops significantly. Inspecting the inference results shows that some activities in PAMAP2 are too similar to each other to distinguish. For example, the new accuracy drops from 77% to 58%, 49%, and then 17% when learning the following activities one by one: ironing, descending stairs, standing, and ascending stairs. However, the new accuracy on ascending stairs can reach 100% when it is learnt after vacuum cleaning and rope jumping. This shows that, in continual learning, the challenge in activity recognition lies not only in the activity itself but also in the learning sequence. This differs from offline learning, where the model optimises its parameters to discriminate all the classes at once. In continual learning, the model optimises its parameters to discriminate the classes at hand and only adjusts them to accommodate new classes. If distinguishing a new class from the old classes requires a drastic update of the parameters, learning will not be effective, leading to low accuracy on new activities.
Every time after learning a new activity, we test on all the activities that have been learnt so far (

Accuracy on recognising old activities on PAMAP2.

Accuracy on recognising old activities on DSADS.
On the DSADS dataset, with a 1% holdout percentage, Net2Net+GEM again achieves the best All accuracy, 59%, which is 14%, 18%, and 40% higher than Net2Net, GEM, and the naive approach, respectively. With a 5% holdout percentage, it achieves an accuracy of 75%, which exceeds the offline accuracy of 68% and is 5%, 15%, and 46% higher than Net2Net, GEM, and the naive approach. Figure 9 presents the trend of accuracy on recognising old activities; here Net2Net+GEM consistently outperforms the other alternatives. At the beginning, the accuracy on DSADS is higher than that of the offline approach, because learning a few activities (2 or 4) is easier than learning all 19 activities together.
Continual learning in activity recognition can be a challenging task. In this paper, we propose a novel combination of Net2Net and GEM that extends the model to learn an increasing number of activities over time. The approach achieves much better accuracy in recognising old and new activities than Net2Net alone, GEM alone, and the naive approach.
Computation cost
Comparison of training time on PAMAP2
GEM works well in tackling the forgetting effect, maintaining consistent accuracy on base and old activities. However, its computation cost is high. Table 4 shows the training time averaged per epoch on a modest computer.¹
Due to the large number of experiments, we were unable to run them one by one on a GPU machine. Instead, we ran the experiments in parallel on cluster nodes hosted in our school. These nodes are computing resources shared among all researchers, so the computation time can fluctuate due to competition for memory and computation with other tasks running at the same time. The computation times reported here are therefore only indicative.
GEM takes the longest to train, reaching 15 hours with a 5% holdout percentage on the DSADS dataset. This is because DSADS has 19 activities, and at each iteration GEM needs to ensure that the loss on each previous activity does not increase. Therefore, the more activities there are, the more checks are needed and the longer gradient updates take. Figure 10 shows the increase in training time (on a logarithmic scale) as each new activity is added on the DSADS dataset. One way to reduce the computation time is to relax the constraint, i.e., to no longer require that the accuracy on the holdout data of every previous task is not compromised [7].
An interesting observation is that Net2Net+GEM takes about 20 times less training time than GEM alone. On investigation, we find that it is difficult for GEM to guarantee no increase in the loss on old classes, especially when the number of old classes is large. Extending the network with Net2Net, i.e., adding new neurons and redistributing the weights, appears to ease this difficulty and makes the inequality constraints in Equation (5) easier to satisfy.
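As an illustration of such a relaxation, A-GEM (Chaudhry et al.) replaces the per-task constraints with a single constraint against the average gradient over the memory, which admits a closed-form projection instead of a QP. We do not claim that this is the method of [7]; the sketch below, with the hypothetical name `agem_project`, only shows why the projection cost no longer grows with the number of learnt activities.

```python
import numpy as np


def agem_project(g, g_ref):
    """Single-constraint projection against the average memory gradient g_ref."""
    g = np.asarray(g, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    dot = float(g @ g_ref)
    if dot >= 0:          # update does not conflict with the memory on average
        return g
    # Remove the conflicting component: one dot product instead of a t-variable QP.
    return g - dot / float(g_ref @ g_ref) * g_ref
```

The trade-off is that an individual old activity may still see its loss increase, as long as the average over the memory does not.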

Comparison of execution time (in logarithm) between GEM and Net2Net+GEM after each training.
Net2Net alone takes less time than the naive approach and about the same as the offline approach. The question is then whether we could achieve similar accuracy by increasing the holdout percentage for Net2Net alone. To answer this, we run another set of experiments that increase the holdout percentage on both datasets to find when Net2Net reaches an All accuracy comparable to Net2Net+GEM with a 5% holdout percentage. On DSADS (Fig. 11), when the holdout percentage increases to 10%, Net2Net achieves similar accuracy to Net2Net+GEM. Compared to 0.8 hours for Net2Net+GEM and 15 hours for GEM, the training time per epoch for Net2Net is only 0.5 seconds, which is more affordable for resource-constrained devices and for real-time training in human activity recognition. The increased holdout percentage only requires holding 228 more examples in memory. On PAMAP2 (Fig. 11), the holdout percentage needs to increase to 20% to reach accuracy comparable to Net2Net+GEM, which means that the memory needs to hold 548 more examples. Again, the gain in training time is significant: 0.35 seconds, compared to 0.5 hours for Net2Net+GEM and 1.9 hours for GEM.
Since increasing the holdout data improves the accuracy of continual learning, one future direction is adaptive holdout data management. For example, as the number of activities grows, the system might not be able to keep the same amount of holdout data for each learnt task. The questions are then: can we dynamically reduce the holdout data of some old classes to accommodate data from new classes? If so, we can look into selecting holdout data according to different criteria, including recency – how recent the data is, diversity – whether the selected data covers the sensor feature space, and difficulty-to-learn – whether the data leads to high loss in training, indicating the complexity of tasks.
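One simple form of such management is a fixed memory budget shared evenly across all learnt classes, trimming old classes whenever a new class arrives. The sketch below is hypothetical: `rebalance_memory` and the even split are our own assumptions, and it applies only the recency criterion by assuming each class's examples are stored oldest first and keeping the newest ones.

```python
def rebalance_memory(memory, new_class, new_examples, budget):
    """Share `budget` memory slots evenly over all learnt classes (recency-based trim)."""
    memory = {c: list(ex) for c, ex in memory.items()}   # do not mutate the caller's dict
    memory[new_class] = list(new_examples)
    per_class = budget // len(memory)
    # Examples are assumed to be stored oldest-first, so keep the tail of each list.
    return {c: (ex[-per_class:] if per_class > 0 else []) for c, ex in memory.items()}
```

Diversity or difficulty-based criteria would replace the tail slice with a different selection rule while keeping the same per-class quota.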

Comparison of all accuracy between Net2Net and Net2Net+GEM.
In Tables 2 and 3, we also observe that adding more neurons to the network enhances the learning capacity, but not significantly: the improvement between different numbers of neurons is within 3%. We have only attempted one way of extending the network, namely widening the second last layer, and there are many other options to explore, including adding more layers or adding neurons to different layers. However, determining the optimal expansion strategy is difficult. It would be desirable to depend on the new data
Accuracy improvement
After investigating our approach through an extensive set of experiments, we have built a comprehensive performance profile of EMILEA and identified the following areas for improvement. First, we can increase the number of training epochs, although this incurs a higher computation cost; we might train on a powerful GPU-powered workstation and deploy the learnt model on a resource-constrained device for activity recognition. Second, we currently sample training data randomly as holdout data for each activity, but these holdout data might not be representative. We will look into clustering techniques to select centroid examples, and will also consider more advanced techniques to assess the diversity of these examples so as to cover the whole input space [14].
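Selecting centroid examples could, for instance, run k-means on one activity's feature vectors and keep the real example nearest to each centroid. The NumPy sketch below uses plain Lloyd iterations with random initialisation; the name `select_centroid_examples` and the choice of k-means are our own assumptions, not a technique evaluated in this paper.

```python
import numpy as np


def select_centroid_examples(X, k, n_iter=20, seed=0):
    """Return indices of the k examples of X nearest to k-means centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each example to its nearest centroid (squared Euclidean distance).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):          # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Pick the real example closest to each centroid (may repeat if clusters coincide).
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=0)
```

Returning real examples rather than the centroids themselves keeps the memory consistent with actual sensor readings.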
Conclusion and future work
In this paper, we present EMILEA, an evolving model for incrementally learning emerging activities, which is one of the very first attempts to apply continual learning techniques to human activity recognition. Through extensive experiments on two real-world datasets, we have demonstrated the advantage of EMILEA, especially of Net2Net, in learning new activities over time. GEM helps mitigate catastrophic forgetting, but its computation cost is too high, which might make it infeasible for sensor-based human activity recognition. Our follow-up experiments on increasing the holdout percentage for Net2Net show that, if the system can afford more memory to hold more examples, Net2Net alone is a better option: it achieves comparable accuracy, is more computationally efficient, and is thus affordable in real-world sensor-based human activity recognition systems. In terms of deployment and application, EMILEA can be employed in conjunction with new activity discovery techniques; that is, once a new set of activities is discovered, EMILEA is activated to learn them.
In the future, we will extend the current technique to tackle evolving feature space; that is, when a new sensor is deployed, the input feature space will be changed.
