Abstract
In this paper, we present an architecture for time-constrained ontology evolution comprising two tools: J2OIM (JSON to Ontology Instance Mapper), which uses JavaScript Object Notation (JSON) objects to populate an ontology, and TICO (Time Constrained instance-guided Ontology evolution), which analyses streams or batches of instances as they are generated and attempts to identify potential changes to their definitions that may trigger evolutionary processes. These tools help compensate for gaps identified in the literature on instance mapping and modular versioning. The case study for these tools involves a predictive maintenance (PdM) scenario in which near real-time sensor data, enriched by contextual data, is continuously transformed into ontology individuals that trigger ontology evolution mechanisms. Results show that it is possible to use the instance mapping mechanisms in an incremental fashion while ensuring that no duplicates are generated and that similar information from distinct data points is aggregated into intervals. Furthermore, they show how the ontology evolution processes effectively detect variations in ontology individuals, generating new concepts and roles and updating existing ones.
Introduction
Among other things, ontologies can be used to establish and enforce comprehensible logical relationships between data points, and to clearly define and classify them. With the use of reasoners, ontologies can uncover implicit knowledge in data, adding new relationships and allowing for even further classification that can be used for many purposes (e.g. providing additional input features for machine learning and data mining algorithms). Data sources, however, often provide information in unstructured or semi-structured form, using formats such as JavaScript Object Notation (JSON), Comma-Separated Values (CSV) or even plain text, which do not carry explicit semantics. While tools to automatically populate ontologies from such sources exist [1, 2], they only allow one mapping to take place at a time, leaving the task of combining the results to the end user and adding extra steps to the ontology population process. This makes the tools harder to use in complex scenarios where automation is necessary, which is key when dealing with real-time data. As part of the solutions presented in this paper, the proposed architecture integrates a tool that transforms JSON data into ontology individuals and that allows not only for the incremental population of repositories, but also for multiple mappers to be used simultaneously. Furthermore, this tool takes into consideration the time-sensitive nature of timeseries data, using known reification patterns for the representation of time and making sure no duplicate or redundant time information is generated. It also aggregates multiple identical readings during a period into the same ontology individual, associated with only one interval.
Information provided in real time through streams – such as real-time sensor readings or even financial data – whenever constituting a timeseries, carry an implicit or explicit time dimension [3, 4] and are often used by machine learning and data mining algorithms for predictive purposes [5]. This data is extremely time-sensitive, and the time dimension needs to be addressed for the semantization to be used to its full potential. Similarly, any outcomes of the algorithms and models applied will also have a temporal validity associated, and the insights they provide may vary over time – meaning that any acquired knowledge that could be incorporated in the ontology should have a temporal dimension as well.
Data scientists often run several experiments while exploring datasets, applying different machine learning and data mining algorithms so they can understand and utilize the data. This includes finding the most expressive set of features to use for the models, which can involve engineering additional features [6]. The nature of the experiments depends on the goals of the data scientist and the nature of the tasks at hand [6, 7], and therefore it is not reasonable to expect any ontology to cover all possibilities pre-emptively. To ensure the ontology can describe the new features as they are needed, it must be updated over time, preferably in an automatic way. While ontology evolution [8] is a field focused on identifying and materializing changes in the domain, existing approaches often fail to account for the time-sensitive nature of said changes, and approach evolution as going from one version of the entire ontology to the next [9]. This means that if a version of a given concept is necessary to describe a particular part of the data, it cannot be used simultaneously with a description present in a different version of the same ontology (e.g. when answering a query). Furthermore, identifying which version of the ontology to use to answer a specific query is left out of the question, and usually only the last version is kept in use. This paper addresses this gap in existing versioning tools and ontology evolution practices by giving each ontology concept its own, independent version history that can be accessed at any time, allowing a reasoner to decide which versions of the concepts must be combined depending on the desired outcome. A framework for ontology evolution is thus proposed: this process is guided by incoming instances that deviate from their original domain description because of data transformation and feature engineering processes.
The work presented here builds and expands upon that presented in [10], in which a prototype version of a JSON to ontology instance tool was first introduced. The version described in this paper improves on it by extending the data transformations services and by introducing a second tool that utilizes the data to guide ontology evolution processes.
The purpose of this work is two-fold. The first goal (1) is to identify existing sources that provide timeseries data in semi-structured form and to represent that information by means of ontologies that fully describe the temporal factors implied. To achieve this, the data must be continuously and incrementally transformed from JSON into ontology individuals to populate the ontology. The second goal (2) is to use the now semantically described ontology individuals to guide an ontology evolution process that reflects the changes in the domain that may have been brought about by feature engineering experiments and by variations in data sources over time.
The ontology and architecture presented here have been developed in the scope of the Predictive and Prescriptive Automation in Smart Manufacturing (PIANiSM) project [11], which aims to facilitate the implementation of prescriptive and predictive maintenance approaches in the plastic extrusion industry.
The rest of the paper is organized as follows: (2) Background, in which the knowledge domains and related works are explored, (3) Architecture, in which the developed architecture is presented, along with a brief explanation of its two main components and the application use-case, (4) Experiments and Results, which will describe the application of the architecture to a real-world scenario and analyse and discuss the results and (5) Conclusions and Future Work.
Background
The main objective of the PIANiSM project [11] is to facilitate the application of predictive and prescriptive maintenance techniques, allowing for a better optimization of end-to-end value chains. The Portuguese use-case, which serves as the application scenario for the work described in this paper, deals exclusively with the plastic extrusion industry. Achieving the goals of the PIANiSM project requires the incorporation of different fields of knowledge, technologies and data sources, whose domains range from industrial maintenance strategies and industrial Internet of Things (IoT) to data science. Ontologies have been used in PdM, for example, in the representation and integration of knowledge from heterogeneous sources [3, 4], for reasoning processes that generate diagnostics and discern between types of failures, and for the description of machine learning and data science algorithms [12, 13]. In this scenario, real-time data is acquired from sensors installed in industrial equipment [14], serialized in JSON, subsequently processed according to the temporal representation and constraints defined by the ontology, and finally used to generate predictions. In this latter step, machine learning and data mining algorithms have been used to process and analyse temporally represented data and identify irregular or atypical patterns [15], which can be symptoms of abnormal component behaviours that may lead to potential failures [16].
For a better grasp of the architecture proposed in this paper, it is important to understand the choices involved in its development. As such, the following sections provide some context on the four main topics that underpin the architecture, namely: (1) the ontologies chosen to describe the domain; (2) the temporal representation pattern applied; (3) an assessment of whether existing tools for mapping JSON entities to ontological ones are sufficient for this particular scenario; and, finally, (4) a brief outlook on the processes of ontology evolution that allow for the description of domain changes over time, establishing whether existing tools can be applied to our scenario.
Ontologies for predictive maintenance
The distinct data sources utilized in this work are characterized by individual ontologies, representing time, sensors, manufacturing equipment, among others. These ontologies are encompassed by an ontology linking all the selected ontologies, which is described in detail in [15]. This ontology bridges several existing ones for the representation of the different domains that are part of the use-case presented in this paper, namely: (1) CDM-Core (and SSN) for the description of System and Sensor data, (2) Onto-DM for the description of machine learning and data mining algorithms, their execution flows and results, (3) ExtrudOnt for the description of extrusion machines and (4) a novel representation of ERP data, including descriptions of manufacturing orders, materials, resources and machines [15]. Transforming temporal data from multiple and diversified sources into semantically structured data, despite the existence of a significant number of tools, is nonetheless a complex task, with no industry standards defined. Furthermore, while time can be represented semantically through several popular ontologies, such as [17], and through different reification approaches [18, 19], there is still much work to be done when it comes to representing the temporal aspects of data provided in real time via streams [9].
Time-representation
The change in classes and properties over time is an important factor in the representation of temporal entities in ontologies. A property may have different values at different time points, which often requires some level of reification. Additionally, properties must maintain the original semantics when converted to temporal relations. CHRONOS [19] is a Protégé [20] plug-in that facilitates the transformation of static entities into dynamic temporal entities and follows a W3C recommendation for time representation. While this tool easily adds a temporal dimension to data, it can currently only be used with specific, older versions of Protégé – which is not the ideal scenario for transformations that should occur in an automatic process, and not manually generated in an ontology development environment. As such, the temporal transformation processes applied to static data originating from plastic extrusion processes represented in this work had to be re-implemented, but were modelled corresponding to the approach explored in CHRONOS and its subsequent reasoning tool [21]. To do so, the 4D-Fluents pattern is used to represent the persistence of objects through time [18] by reifying the temporal part of an object. The changes in objects’ qualities over time are suitable for describing the time-sensitive data acquired from sensors in this scenario. Considering that sensor data changes at each period
Temporal object property between entities.
For the remainder of this document and for the sake of simplicity, please consider the full lines to represent OWL Classes and Object properties, and the dashed ones to represent Datatype properties and Literal values.
Event represents the temporal part of Machine, connecting Machine and Machine Operation during a specific Time Interval. As such, the same Machine may be related to a set of Events that take place at different times. Following the representation pattern of CHRONOS, the same property, hasOperation, is used both to connect Machine to Event and Event to Machine Operation. Event takes place during a specific TimeInterval, with starting and ending Instants (using the Object Properties hasBeginning and hasEnd), which have their own individual time stamps, represented through the Datatype Property hasTimeStamp and connecting them to the XSD:DateTime value.
A semantic representation of data is proposed in [1], applying a set of transformation rules to extract an OWL2 ontology from JSON documents. This approach can only transform a single JSON document or file, and does not consider joining or establishing relationships between data stored in several files within a single ontology. Furthermore, the automatic transformation rules are based on the JSON's structure, the key labels, and the nesting level of the document. This approach does not contemplate configurations or templates and does not require human intervention in the transformation process: the structure of the produced ontology is always the one provided by the input JSON file. In [2], the semantic transformation of data also starts from a single JSON source. Here, the goal is to transform JSON documents into a predefined ontology structure in terms of OWL classes, object and data properties, without the need to perform any changes to the original JSON Schema. To achieve this, an intermediary JSON Schema serves as a template to map the original JSON properties to existing OWL classes, object and data properties. This template therefore allows for the automation of the mapping process needed to perform the semantic transformations. However, much like in [1], the JSON Schema mapping only works with a single JSON document and does not allow for the interrelation of several JSON document sources within the same ontology, making it harder to incrementally add data to it over time in an automatic fashion.
Ontology evolution
Ontology evolution is the process through which an ontology changes over time, and it encompasses the identification, description and application of changes [8].
Two main approaches to the identification of change can be found in literature, namely: (1) comparing the concepts and roles in the ontology with those in external corpora – establishing if new ones are necessary, if existing ones need change, and which ones have become obsolete (as is the case in [22]) – and (2) comparing different versions of the same ontology and identifying existing modifications in its structure (e.g. [23, 24, 25]).
In [24], the authors describe two main ways changes can be applied to an ontology: (1) naïve approaches, in which change is materialized into a new ontology version and all copies are fully stored (e.g. [26]), or (2) change-based approaches, in which a reference version is maintained, and individual changes are formalized into computable actions that can be applied to the reference version in order to produce the modified ontology version. For this purpose, it is necessary to properly describe each evolutionary action in some formal fashion, and [24, 25] use ontologies to this end. In [27], the Open Provenance Model is applied to show the changes between two versions of a given ontology, making it possible to identify how, when and why a specific evolutionary action was taken. In [28], this idea is applied to stream data, under the assumption that all axioms may eventually be replaced with new ones as time progresses. Similarly, [29] introduces a framework that identifies when changes occurred and that can be queried.
It should be noted, however, that the approaches mentioned here consider the ontology a monolithic structure: each step of the evolution gives rise to a new version of the ontology, and it is not possible to combine definitions from different versions when executing a query. The architecture described next aims to fill this gap in the literature by allowing each entity to have its own version history and all of them accessible at once – leaving the task of choosing which is the correct one to the reasoner.
Architecture
Data flow, from the individual sensor and software sources to the triple store, including feature engineering processes and identification of new ontological entities.
In the PdM field, data is subject to changes over time. Detecting those changes is at the core of many predictive processes, as they can be a symptom of component wear and potential future failure states. Not only can these processes generate new insights and add new knowledge to the known domain, but they can also demand the execution of different experiments – which often include feature engineering – that may change the initial conceptualization of the domain.
The proposed architecture comprises a set of tools and ontologies, which obtain data directly from sensors, translate it from JSON to ontology entities and then use those as a potential trigger for ontology evolution. This flow is depicted in Fig. 2. The scenario considers two main sources of data: three plastic extruders, providing information regarding each extrusion head via several sensors, and the ERP software, which adds to this information with contextual data regarding equipment definitions, manufacturing orders and occurrences, among others. This data is stored in a relational database and made available by means of a webservice, in JSON format. The webservice also allows some queries to be customized, for example by restricting them to specific time intervals and by selecting which parameters are queried for each machine (as they have different sensor types). The webservice contains a total of 20 endpoints, 5 of which pertain to information provided by the ERP, with the remainder corresponding to different sensor data from two different extrusion machines.
To make use of the ontological definitions provided by the ontologies employed, the data acquired by the sensors must be semanticized through some process. This is where J2OIM (JSON to Ontology Instance Mapper) comes into play, by establishing a set of mappings between JSON structures and ontology entities. One of the main objectives of its implementation was the representation of time-sensitive data related to extruder machines supplied by sensors in temporal N-ary relations whenever applicable.
The resulting ontology individuals can then be stored in a triple store and be used by subsequent processes and experiments. In order to generate good predictive models for PdM, it is necessary to engage in data analysis, which often includes feature engineering. Different experiments may require a number of different features to be engineered, and it is important to keep track of how and when those were created. However, since said features will highly depend on the context they were created for – and the experiment they are part of – it is quite difficult to maintain an ontology that would describe all of them beforehand. The data analyst will generate new features and add them to the dataset they are working with before reflecting those changes in the ontology. In order to make it easier to assess when these changes have occurred, the TICO (TIme Constrained instance-guided Ontology evolution) tool analyses the ontology's individuals, determines in which ways they differ from the definitions of the concepts they belong to, and creates new classes in the ontology that reflect those changes. The new concepts are reified through different frames, each of them representing a definition of the concept during a specific interval, allowing an ontologist to potentially make further changes to the ontology that make use of the new features more easily, and to have those changes be used only in the adequate timeframe. Because the ontology changes are described as new time slices of existing concepts with clearly defined beginnings and endings and no overlaps, there is only the need to add axioms to the ontology – for example, removing a property would basically entail adding an ending time to that specific property's timeframe – meaning changes can simply be added to the triple store as well.
Finally, it is also important to note that while TICO makes changes to the ontology, J2OIM always maps JSON instances to the first known version of the ontology, as the mapping configurations are not automatically updated by TICO and that is not the focus of this work. As such, it is not possible to update the format of new instances to match the changes in the ontology, meaning the ontology itself, by means of a reasoner, must be able to properly infer which version of the concepts should be used to classify the instances.
To achieve a semantic representation of data that can be used for reasoning processes to support PdM, we propose J2OIM: an architecture that supports the transformation of the JSON data acquired from the different sources into instances represented through ontologies. In this context, the ontology described in [15] is used for the representation of Machine and ERP data for predictive maintenance, in particular that of plastic extrusion machines, with time-sensitive data representation following a 4D-fluents pattern [18].
It is worth noting this does not correspond to the total data produced by the machines or available through the ERP, as some of it was kept confidential. These two data sources were considered for their relevance in PdM processes – while real-time data obtained from the machine’s sensors may show variations that are compatible with equipment wear and indicative of potential failure, the data from the ERP can further contextualize it. For example, the machine’s noise and vibration patterns when executing one type of operation with a specific material may be different from those obtained under different circumstances. By using the ERP information about manufacturing orders and materials, the reason for the variation in the patterns is established and they will not be considered a symptom of potential failure.
Total records of sample data
In total, the webservices were queried and the results converted into files containing 5 types of process data and representing data from 15 different sensors. These must be aggregated, related, and transformed into a concise semantic model. The data samples used in this work consisted of 3 machines, with 24 variables each, describing the machine, sensors, occurrences, and manufacturing operations. Table 1 lists the types of data and the total records found in the sample.
The two types of data are structured differently. Records provided by sensors always take the form of a tuple (
Overall, the obtained data is not normalized and has no inter-file relations, which makes it difficult to interpret. To cope with this problem, a JSON mapping file was developed to map each property of each file to a subject-predicate-object triple. Once a property is mapped, it can be transformed into an individual with object properties and data properties, which allows its insertion into an ontology file and a triple store.
In J2OIM, mappings are executed by a function map(OT,
OT is the target Ontology Model;
Finally, OI is the resulting set of ontology individuals described by OT.
J2OIM has three main modules that perform the transformation processes, namely: (1) the Main module, which orchestrates the transformation process, (2) the Data Loader module, which loads the data according to the defined Model and (3) the Data Writer module, which persists the data as defined in the mapping configuration.
Component diagram of J2OIM’s architecture.
Following Fig. 3, the Main module launches the J2OIM component to orchestrate the process. First, a controller is launched to execute the loading process of the configuration and data files based on the defined models. Afterwards, the loaded data is forwarded via a Batch Loader to an Observable interface.
The Data Loader is responsible for importing data and handling errors. Both are driven by different configuration files, which specify how the import process takes place, where the source and mapping files are located and how to condense and aggregate data. The module guarantees that all machine data has been properly read, processed and sorted.
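The condensing and aggregation step can be illustrated with a minimal Python sketch. It assumes that readings are (timestamp, value) pairs and that an interval closes at the timestamp of the last identical reading; the text does not specify how J2OIM delimits intervals, so the names `Interval` and `aggregate_readings` and this boundary rule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass
class Interval:
    """A run of identical readings (hypothetical structure, not J2OIM's own)."""
    value: float
    start: datetime
    end: datetime


def aggregate_readings(readings: Iterable[Tuple[datetime, float]]) -> List[Interval]:
    """Merge consecutive identical values into a single interval each."""
    intervals: List[Interval] = []
    # Sort by timestamp so that runs of identical values are contiguous in time.
    for timestamp, value in sorted(readings, key=lambda reading: reading[0]):
        if intervals and intervals[-1].value == value:
            # Same value as the open interval: extend it instead of duplicating.
            intervals[-1].end = timestamp
        else:
            # Value changed: start a new interval at this timestamp.
            intervals.append(Interval(value, timestamp, timestamp))
    return intervals
```

Under these assumptions, four readings with two distinct consecutive values collapse into two intervals, each of which would then yield a single ontology individual.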
In the Data Writer module, the Ontology and Triple Store Writers are observing the Batch Loader, and for each record passed to the Observable interface, the writers are triggered and persist the data as defined in the mapping configuration file.
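The interaction between the Batch Loader and the two writers follows the observer pattern. The sketch below is only an illustration of that pattern in Python: the class and method names are assumptions, the writers are reduced to plain lists, and the record is made-up example data.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Observer = Callable[[Record], None]


class BatchLoader:
    """Hypothetical observable that pushes each loaded record to its observers."""

    def __init__(self) -> None:
        self._observers: List[Observer] = []

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    def load(self, batch: Iterable[Record]) -> None:
        # Every record is forwarded to every subscribed writer, in order.
        for record in batch:
            for observer in self._observers:
                observer(record)


# Stand-ins for the Ontology Writer and the Triple Store Writer.
ontology_file: List[Record] = []
triple_store: List[Record] = []

loader = BatchLoader()
loader.subscribe(ontology_file.append)
loader.subscribe(triple_store.append)
loader.load([{"sensor": "Motor Current", "head": "D"}])  # made-up record
```

Decoupling the loader from the writers in this way means an additional persistence target can be attached without changing the loading code.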
While data can be consumed in near real-time and in batches – meaning it is possible for data to arrive late – J2OIM only attempts to correlate individuals within the same batch. However, the UUIDs of individuals and Intervals are generated by means of a function that guarantees the same UUID is generated for any Intervals with the same starting and ending dates. Still, in the unlikely scenario that late-arriving data concerning a new Interval has the same beginning and ending dates as an existing one, J2OIM will replace the previously stored value. A similar UUID generation process guarantees that any two identical individuals of a particular class will have the same UUID, ensuring no repeated values are stored – only the latest-arriving version of the data.
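One way to obtain this behaviour is a name-based UUID, which is a pure function of its input. The following Python sketch uses the standard `uuid.uuid5` function; the namespace, the key format and the dictionary standing in for the triple store are assumptions, since the text does not describe the generation function actually used by J2OIM.

```python
import uuid
from typing import Dict

# Hypothetical namespace; any fixed UUID works as long as it never changes.
INTERVAL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/interval")


def interval_uuid(start_iso: str, end_iso: str) -> uuid.UUID:
    """Derive the same UUID for any Interval with the same start and end."""
    return uuid.uuid5(INTERVAL_NAMESPACE, f"{start_iso}|{end_iso}")


# A dictionary keyed by UUID stands in for the triple store.
store: Dict[uuid.UUID, dict] = {}


def upsert_interval(start_iso: str, end_iso: str, payload: dict) -> uuid.UUID:
    """Insert the Interval, replacing any earlier one with identical bounds."""
    key = interval_uuid(start_iso, end_iso)
    store[key] = payload  # the latest-arriving version wins
    return key
```

Because the identifier depends only on the bounds, re-processing a batch or receiving the same Interval late cannot create a second copy; it can only overwrite the stored one.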
Equipment behaviour is bound to change over time, and thus the definitions of what is considered normal and abnormal regarding it evolve as well. Furthermore, as new experiments are performed over data, new features may need to be computed, either by extending or combining existing ones. Because of this concept drift, the history of previous states cannot be queried using the latest version of the ontology; at any given historical moment, it is important to assess the version of the ontology that must be considered by applying only the changes that are relevant for the considered interval. This also means that many of the existing concepts of the ontology should also take on a time dimension. While, as previously mentioned, reification is a common approach in a scenario such as this, the particular reification approach depends mainly on whether the quality assignments, which are only valid for a given time interval, can overlap with other quality assignments, and on whether such an overlap is always complete (with the same starting and ending timepoints) or partial. TICO is a novel framework that aims to incorporate and represent these changes in knowledge by means of ontology evolution practices.
Component diagram of TICO’s architecture.
TICO is guided by incoming ontology individuals. It analyses their structure – namely which roles and concepts are used to describe said individuals – and verifies whether that structure matches the definitions provided by the ontology. Should enough individuals display the same differences, they become potential evolutionary points for the ontology – e.g., if a role shows up associated with all individuals of a given concept, it is considered a potential restriction to add to either the necessary or necessary and sufficient conditions of said role –, and depending on configurations set up by the user, evolutionary actions that modify the ontology will be triggered. Because these individuals have been previously reified into 4D-Fluents by J2OIM, it is easy to query their time dimension, which can then be used to establish the Instant when a pattern arose and when it stopped being significant. Therefore, TICO can create a time-frame in which the evolutionary action deployed is valid, and reifies this through 4D-Fluents as well: by creating a TimeSlice of the concept that is added to the ontology and has a starting and, potentially, an ending date.
TICO has three main modules it uses to generate and execute evolutionary actions, namely: (1) the Comparator, which reads the inputs and orchestrates the data flow, (2) the Metrics module, which reads user configurations and stores metrics relative to ontology structures and (3) Evolutionary Actions, which creates said actions and generates TimeSlices for concepts.
As pictured in Fig. 4, the Comparator module kickstarts the process by having the similarly-named component read both a seed ontology – the Original Ontology Model – and a second one containing the individuals that would prompt modifications – the Ontology Individuals Model. It then triggers the Diff Operator component, which is responsible for analysing whether the original version of an individual’s describing concept matches with the individual’s definition and properties. To do so, it triggers an additional set of comparators, which will assess, among others, the differences in restrictions, class definitions and roles employed. The metrics and configurations employed to assess these differences are provided by the components of the Metrics module, which reads configuration files and calculates and stores different metrics across sets of individuals. Here, a user-defined threshold is used to determine whether enough individuals (either by percentage or in absolute numbers) with the same structure have been identified, and if their structure is different from the one in the ontology. The result of the application of the Diff Operators will be a set of one or more Evolutionary Actions, which are created by an Evolutionary Action Factory and stored through means of a Composite. These actions include, for the time being, the addition of roles (properties) and concepts (classes), along with TimeSlices of new or existing concepts and restrictions. Finally, when the Composite executes its Evolutionary Actions set, the resulting modified ontology is stored in an additional model, here described as Evolved Ontology Model.
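The threshold check can be sketched in Python as follows. This is a simplified illustration that only considers roles: each individual is reduced to the set of role names it uses, and the function name, parameters and the way the relative threshold is computed are assumptions, not TICO's actual interface.

```python
from collections import Counter
from typing import List, Set


def candidate_roles(
    individuals: List[Set[str]],
    defined_roles: Set[str],
    threshold: float,
    relative: bool = True,
) -> Set[str]:
    """Return roles used by enough individuals but absent from the concept.

    `threshold` is a fraction of the individuals when `relative` is True,
    and an absolute count of individuals otherwise.
    """
    # Count, per undefined role, how many individuals use it.
    counts = Counter(
        role for roles in individuals for role in roles - defined_roles
    )
    required = threshold * len(individuals) if relative else threshold
    return {role for role, seen in counts.items() if seen >= required}
```

Each role returned here would correspond to one potential Evolutionary Action, for example adding the role to the concept within a new TimeSlice.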
Relations between subjects
The TimeSlices created by TICO do not overlap: when a new TimeSlice is discovered, the previous one is updated with an ending date corresponding to the start date of the new one. This ensures that an individual cannot be classified as belonging to two different TimeSlices, preventing potential inconsistencies, as their definitions will be different. Because the individuals arrive sequentially, as provided by the webservice and J2OIM, there is no expectation that incoming data will produce TimeSlices that overlap previously existing ones, although such a possibility will be considered in the future to account for potentially late-arriving data.
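The non-overlap rule can be expressed compactly. The Python sketch below assumes half-open slices (the start is included, the end excluded) and rejects late-arriving slices outright; both choices, as well as the names `TimeSlice`, `add_time_slice` and `slice_at`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TimeSlice:
    """One temporal version of a concept (hypothetical structure)."""
    concept: str
    roles: frozenset
    start: datetime
    end: Optional[datetime] = None  # None means the slice is still open


def add_time_slice(history: List[TimeSlice], new_slice: TimeSlice) -> None:
    """Append a slice, closing the previous one at the new slice's start."""
    if history:
        previous = history[-1]
        if new_slice.start < previous.start:
            raise ValueError("late-arriving slices are not handled in this sketch")
        previous.end = new_slice.start
    history.append(new_slice)


def slice_at(history: List[TimeSlice], moment: datetime) -> Optional[TimeSlice]:
    """Return the single slice valid at `moment`, if any."""
    for time_slice in history:
        if time_slice.start <= moment and (
            time_slice.end is None or moment < time_slice.end
        ):
            return time_slice
    return None
```

Since only the previous slice's ending date is set and nothing is deleted, the update stays additive, which is consistent with changes being appended to the triple store.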
The experimentation process described next can be separated into three main parts: (1) ontology instantiation, in which the JSON instances are converted to ontology individuals, (2) feature engineering, in which new features are created by data scientists and incorporated into the dataset and, finally, (3) ontology evolution, in which all individuals, including the ones created in the second step, are analysed to fuel ontology evolution processes and generate either new concepts or new versions of existing concepts.
Ontology instantiation
In this phase, the webservices are queried for the data collected from sensors and management software and transformed from JSON into ontology individuals.
All the data had to be tagged with an identifier or reference to establish relationships via object or datatype properties. For the present implementation, each data record read from the source JSON files was tagged with a UUID generated in runtime during the transformation phase. To provide readable information regarding each individual or subject created in the ontology and triple store, each subject reference is composed of the subjectUri defined in the mapping configuration, appended with a UUID based on its data contents. This ensures that any subject, even if inserted or referenced multiple times, maintains the same reference.
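A minimal Python sketch of this tagging scheme is given below. The mapping structure, the example URIs and the separator between the subjectUri and the UUID are hypothetical, as the actual J2OIM configuration format is not shown in the text; only the idea of deriving the UUID from the record's contents is taken from it.

```python
import json
import uuid
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping entry: one subjectUri plus a predicate per JSON property.
MAPPING = {
    "subjectUri": "http://example.org/pdm#Machine",
    "properties": {
        "name": "http://example.org/pdm#hasName",
        "line": "http://example.org/pdm#hasLine",
    },
}


def subject_reference(subject_uri: str, record: Dict[str, object]) -> str:
    """Build a reference that depends only on the record's contents."""
    # Sorted keys make the serialization, and thus the UUID, order-independent.
    canonical = json.dumps(record, sort_keys=True)
    return f"{subject_uri}_{uuid.uuid5(uuid.NAMESPACE_URL, canonical)}"


def map_record(record: Dict[str, object], mapping: dict) -> List[Triple]:
    """Turn each mapped JSON property into a subject-predicate-object triple."""
    subject = subject_reference(mapping["subjectUri"], record)
    return [
        (subject, predicate, str(record[key]))
        for key, predicate in mapping["properties"].items()
        if key in record
    ]
```

Calling `map_record` twice with records that have the same contents yields triples with the same subject, which is what allows a subject to be inserted or referenced repeatedly without creating a second individual.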
Events occurring in sensors.
Events occurring in machines.
Not all entities identified in the data pertain to time-sensitive data; some entities have non-temporal relations with others. The temporal relations identified for the extruder machine were manufacturing operations and manufacturing occurrences, while the temporal relations associated with a sensor were all the sensed output values that occurred over time. For each type of temporal relation, a new Event individual is created, which is then linked to the original individual: the Event is thus a relationship between two individuals that occurs within a specific TimeInterval. Table 2 presents an overview of all the individuals (or subjects), in the form of triples, differentiating those involved in non-temporal and temporal relations present in the plastic extrusion use-case. Here, the initial non-temporal relation Sensor-hasOutput-Value is now represented as a temporal relation Sensor-includesEvent-Event, Event-hasOutput-Value, Event-during-TimeInterval.
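The rewriting of Sensor-hasOutput-Value into its temporal form can be sketched with plain tuples standing in for triples. The role names come from the text and the seed ontology; the `reify_reading` helper and the individual naming scheme are hypothetical.

```python
def reify_reading(sensor, value, begin, end, event_id):
    """Expand one sensor reading into the triples of its reified, temporal form."""
    event = f"Event_{event_id}"
    interval = f"TimeInterval_{event_id}"
    # Instants are named after their timestamp, so equal timestamps share one Instant.
    begin_instant = f"Instant_{begin}"
    end_instant = f"Instant_{end}"
    return [
        (sensor, "includesEvent", event),
        (event, "hasOutput", value),
        (event, "during", interval),
        (interval, "hasBeginning", begin_instant),
        (interval, "hasEnd", end_instant),
        (begin_instant, "hasTimeStamp", begin),
        (end_instant, "hasTimeStamp", end),
    ]


# Illustrative values only; they are not taken from the case-study data.
triples = reify_reading(
    "Sensor_MotorCurrent", 41.5,
    "2021-01-01T10:00:00", "2021-01-01T10:01:00", "0001",
)
```

A single datatype assertion therefore becomes seven triples and four additional individuals, which is the cost of the reification pattern discussed later in the text.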
Figure 5 shows the different ontology individuals created when a JSON sensor reading is processed by J2OIM. The machine is now related to the sensors present in the head, and sensor data is related to a precise time interval, with a beginning and ending Instant, during which the reading provided by the hasOutput datatype property is valid. Figure 6, on the other hand, shows how the several Events observed on a particular Machine are now all interrelated and temporally represented. In this case, a ManufacturingOperation (executing a specific ManufacturingOrder) and an Occurrence are associated with the TimeIntervals in which they occur. Additionally, it should be noted that the individual Instants and Events are ordered – i.e., they use the roles before and after to ensure the direction of time between them.
Once inserted in the ontology, extrusion head sensor instances have temporal relations that represent the output values occurring over time. In Table 5, an individual representing an extrusion head sensor has two temporal events (extEvent), representing the readings obtained by the sensor “Motor Current” of the head “D” in extruder machine “VK07” at two different, sequential intervals. Each TimeInterval is delimited by the Instants represented by the timestamps.
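The ordering of Instants through the before and after roles can be sketched as below. Linking only consecutive Instants is an assumption, as are the `order_instants` helper and the naming scheme; the text does not say whether J2OIM asserts the roles between every pair.

```python
def order_instants(timestamps):
    """Link consecutive, de-duplicated instants with before/after triples."""
    # ISO 8601 strings in a single format sort chronologically as plain strings.
    ordered = sorted(set(timestamps))
    triples = []
    for earlier, later in zip(ordered, ordered[1:]):
        triples.append((f"Instant_{earlier}", "before", f"Instant_{later}"))
        triples.append((f"Instant_{later}", "after", f"Instant_{earlier}"))
    return triples


# Illustrative timestamps; the repeated value produces a single Instant.
links = order_instants([
    "2021-01-01T10:02:00",
    "2021-01-01T10:00:00",
    "2021-01-01T10:00:00",
])
```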
Individual of class Extrusion head Sensor (MotorCurrent) and sample output Events
Total resulting instances
The resulting instances (data properties, in the case of sensor data) obtained from the data transformation are listed in Table 4. These show that non-temporal data from the ERP resulted in the same number of instances as the initial data, as expected, while sensor readings resulted in about half of the initial values – from 1,001,965 shown in Table 1 to 91,055 seen in Table 4 – due to the aggregation of repeating values over the same time interval and the averaging of values over each minute. On the other hand, because of the reification approach employed, 252,945 instances were created to describe Intervals and Instants.
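The two reduction steps mentioned above, per-minute averaging and the merging of repeating values into one interval, can be sketched as follows. This is an illustration under stated assumptions rather than J2OIM's code: the helper names, the exact-equality test between averaged values and the choice of ending each interval at the end of its last minute are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def average_per_minute(readings):
    """Average (timestamp, value) readings that fall within the same minute."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.replace(second=0, microsecond=0)].append(value)
    return [(minute, mean(values)) for minute, values in sorted(buckets.items())]


def merge_repeats(per_minute):
    """Merge consecutive minutes that carry the same value into (start, end, value)."""
    intervals = []
    for minute, value in per_minute:
        minute_end = minute + timedelta(minutes=1)
        # Extend the last interval only if it is contiguous and has the same value.
        if intervals and intervals[-1][2] == value and intervals[-1][1] == minute:
            start = intervals[-1][0]
            intervals[-1] = (start, minute_end, value)
        else:
            intervals.append((minute, minute_end, value))
    return intervals


# Illustrative readings only; they are not taken from the case-study data.
readings = [
    (datetime(2021, 1, 1, 10, 0, 5), 2.0),
    (datetime(2021, 1, 1, 10, 0, 35), 4.0),
    (datetime(2021, 1, 1, 10, 1, 10), 3.0),
    (datetime(2021, 1, 1, 10, 2, 0), 5.0),
]
intervals = merge_repeats(average_per_minute(readings))
```

In this example four raw readings collapse into two intervals: one covering 10:00 to 10:02 with value 3.0 and one covering 10:02 to 10:03 with value 5.0.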
Using “Speed” and “Throughput” output Events to create a VirtualEvent “Feed Rate”.
The instances generated by this process can then be used for different data science experiments, as seen in Fig. 2. In our case-study, sensor data is aggregated by the minute and new features are generated following the experiments described in [30]. As a result, two new features have been devised: the running average (with sample size of 5) of the Motor Current feature, and Feed Rate, which is given by combining values of Throughput and Speed. For the purposes of this work, consider Motor Current’s running average as an extension of the existing Motor Current generated Events, and Feed Rate as a new type of Event – a Virtual Event – that correlates the values of two Events generated by two existing Sensors at a given time. Figures 7 and 8 highlight the generated data and how it relates to the existing structures.
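As an illustration of the first derived feature, a running average with a sample size of 5 could be computed as sketched below. The handling of the first four values (averaging over whatever is available so far) is an assumption, since the text does not specify it. The Feed Rate computation is not sketched because the formula combining Throughput and Speed is not given here.

```python
from collections import deque


def running_average(values, window=5):
    """Trailing running average; early entries average over the values seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # silently drops the oldest value once full
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Illustrative per-minute Motor Current values, not taken from the dataset.
smoothed = running_average([1, 2, 3, 4, 5, 6])
```

With these inputs the last entry averages the five most recent values (2 to 6), giving 4.0.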
Extending the Motor Current sensor’s Events.
Since these two new features were created on demand, they are not part of the initial ontology used to describe the data, and cannot be used in subsequent reasoning processes. These new instances are then fed to TICO, which tries to identify both the new ontological structures and changes to existing ones, and update the ontology accordingly in an automatic fashion. Furthermore, these experiments mean that, potentially, there may be different versions of the ontology at hand: the original one, described in [15] and representing the instances as they are output by J2OIM; a second version which includes a new type of Event – the VirtualEvent and the derivedFrom role; and a third version that also describes the hasRunningAverage role and how it relates to Motor Current generated Events.
Evolutionary Actions triggered by the first batch of instances (Round 1)
In order to simplify the comprehension of the ontology evolution process undertaken by TICO and the following examples, a seed ontology was generated, which contains only the minimal definitions used to describe the initial sensor data dataset and that may, therefore, be affected by evolutionary processes, as illustrated in Fig. 9.
The “seed” ontology.
The seed ontology is a subset of the ontology described in [15], including only four concepts (the Sensor, Event, TimeInterval and Instant classes) and ten roles, namely: six object properties (before, after, during, hasBeginning, hasEnd and includesEvent) and four datatype properties (hasTimeStamp, hasOutput, hasSensorName and hasSensorID). It was created for this experiment with the purpose of keeping the number of concepts low, so as to allow us to keep better track of the evolutionary processes and their results.
The alterations to the data fed to TICO are distributed according to three different experiment moments, or three rounds, each comprised of a sample of 2188 Events. The first round was used in a scenario in which the Feed Rate feature was required; the second one includes Motor Current’s running average; and the third round uses both new features simultaneously.
The batches were supplied sequentially to the framework, which allowed it to generate reports regarding the evolutionary actions created and executed following each round. Tables 5, 6 and 7 summarize the reports for each round, detailing the evolutionary actions that were effectively triggered. As seen in Table 5, a total of 19 Evolutionary Actions were triggered when comparing the seed ontology to the structures present in the first batch of instances.
Evolutionary Actions triggered by the second batch of instances (Round 2)
Evolutionary Actions triggered by the third batch of instances (Round 3)
Two object properties were created to accommodate the roles present in the VirtualEvent class and its relationship with existing Events, and three TimeSlices were created in total. The first one pertains to the VirtualEvent class, which is effectively new and must be added to the ontology, along with its first TimeSlice. The necessary conditions for this class are assumed to be the roles that appear in all individuals of the class analyzed in this batch. Because derivedFrom always appears twice per individual and its range is consistently an individual of class Event, two restrictions are created (Cardinality and SomeValuesFrom restrictions). Since there are no changes between the original version of VirtualEvent just discovered and its first TimeSlice, they are effectively copies of each other – the only difference being that the first TimeSlice has a HasValue restriction on its list of necessary and sufficient conditions. Note that VirtualEvent could potentially be considered a subclass of Event, but TICO does not have evidence to support such a relationship.
A TimeSlice for the Event class is also created, to accommodate the appearance of the hasSensorCorrelation role. As before, since it consistently appears in relation to the Sensor class, the AllValuesFrom restriction was created with that range. However, while TICO managed to detect that this role only appears once per Event, it failed to specify the range of the Cardinality restriction it created. The same could have been achieved by making the hasSensorCorrelation role functional. While it may seem that TICO has opted for the least elegant option, it is important to note that it is not known whether the property may be used for a different cardinality restriction later, which could potentially conflict with a functional definition.
As for the existence of the remaining TimeSlice – of the Sensor class – while Fig. 9 may suggest that no changes to this class should have occurred, it is important to note that neither of the roles represented in the figure was part of the class’s definition. As such, and because they were always identified in conjunction with Sensor, a TimeSlice adding them to the necessary conditions of said class has been created.
It is also interesting to note here that, since neither Sensor nor VirtualEvent is temporally reified the way Event is, the framework could not identify a starting point for the generated TimeSlices that it could add to the classes’ necessary and sufficient conditions.
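The heuristic described for VirtualEvent (roles shared by every individual become necessary conditions, a constant count yields a Cardinality restriction, and a consistent filler class yields a SomeValuesFrom restriction) can be sketched as follows. The data layout and the `infer_restrictions` helper are hypothetical simplifications and do not reproduce TICO's implementation.

```python
def infer_restrictions(individuals):
    """Infer candidate restrictions from individuals of a single class.

    Each individual is a dict mapping a role name to the list of classes of
    its fillers, e.g. {"derivedFrom": ["Event", "Event"]}.
    """
    if not individuals:
        return []
    # Only roles present in every individual are considered necessary.
    shared_roles = set(individuals[0])
    for individual in individuals[1:]:
        shared_roles &= set(individual)

    restrictions = []
    for role in sorted(shared_roles):
        counts = {len(individual[role]) for individual in individuals}
        if len(counts) == 1:
            # The role occurs the same number of times in every individual.
            restrictions.append(("Cardinality", role, counts.pop()))
        filler_classes = {cls for individual in individuals for cls in individual[role]}
        if len(filler_classes) == 1:
            # Every filler belongs to the same class.
            restrictions.append(("SomeValuesFrom", role, filler_classes.pop()))
    return restrictions


# Illustrative batch of VirtualEvent individuals; "comment" is a made-up role.
batch = [
    {"derivedFrom": ["Event", "Event"]},
    {"derivedFrom": ["Event", "Event"], "comment": ["Literal"]},
]
found = infer_restrictions(batch)
```

For this batch the sketch yields a Cardinality restriction of 2 and a SomeValuesFrom restriction on Event for derivedFrom, while the role that appears in only one individual is ignored.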
The second round, summarized in Table 6, resulted in a much smaller number of Evolutionary Actions, with a total of seven. TICO correctly identifies hasRunningAverage as a Datatype property, but fails to infer the details of its type, going for the broader type XSD:Literal. Because in this round the individuals of Event all display the new role, a new TimeSlice of the class is created with it in its list of necessary conditions. Similar to the previous round, the new TimeSlice is equivalent to the original class, as long as it takes place after the Instant that marks the moment when the framework initially identified the pattern. The major change concerns the TimeSlice created in the previous round, which now has both a beginning and an ending date: it is equivalent to the intersection of the previously defined after Instant and the new before Instant, the same one that marks the beginning of the new TimeSlice.
Since no new individuals of VirtualEvent have been found, no modifications to its TimeSlices have been produced, and the same happens for the Sensor class, since all its individuals maintain the same structure.
Finally, Table 7 describes the Evolutionary Actions triggered by the last batch of individuals, which include both new features. The most desirable, and potentially most efficient, scenario would be editing the existing definition of the first TimeSlice of Event to ensure it covers the two different intervals, since the definition of Event is effectively the same now as it was when the first TimeSlice was created. However, TICO is unable to identify this and generates a third TimeSlice for Event, whose definition is similar to the first one but with a different starting date. It also edits the second TimeSlice to have a corresponding ending date, which may not be ideal, as it assumes there is no overlap between the two TimeSlices and that no two different definitions of Event should exist simultaneously.
At the end of the round execution, the ontology has three TimeSlices that represent conceptualizations of the Event concept at different times: the Event concept was first modified from its original definition to include hasSensorCorrelation as part of its necessary and sufficient conditions during the first period, then received a new definition with different conditions (including the hasRunningAverage role), and finally returned to a state in which hasSensorCorrelation is required again. The VirtualEvent concept is also discovered and associated with two derivedFrom roles, with no known ending date.
J2OIM’s main emphasis is on the creation of the instances required for temporal representation, events, and intervals. About a million new instances were created to temporally represent sensor readings, occurrences, and manufacturing operations, which is likely to impact future reasoning processes that may need to be applied; future tests will be able to determine the full impact of the chosen approaches. On the other hand, it is important to note that J2OIM ensures Instants are never duplicated, and aggregates continuous sensor readings into the same interval.
TICO makes use of these temporal representations to correctly identify new classes and properties in incoming instances and analyse their structures to define potential necessary conditions for classes during a specific time interval. As such, whenever experiments are made that modify the data – either by extending existing features or including new ones, which is often a requirement when analysing and experimenting for PdM – the structure of the ontology is modified to accommodate them. This allows a user not only to visualize the modifications but to use a reasoner to automatically classify instances according to the concept versions that were valid at the time of the experiment. TICO achieves this by supplying a number of Evolutionary Actions with specific triggers, which range from the creation of classes to properties, time slices and restrictions.
While TICO is thus far able to infer Evolutionary Actions by comparing the ontology individuals to known descriptions of concepts, it does show some limitations that should be addressed in future work. First of all, as is evident from the results of Round 3, the framework is unable to identify that two versions of the same class may occur simultaneously. This is a direct result of including roles seen in individuals of a class into its necessary conditions – which may not always be a good solution, and may even be overly restrictive. Efforts should also be made to ensure that no excess TimeSlices are created, especially if existing TimeSlices can be edited to include more than one interval. The reification approaches applied in this work already multiply the number of individuals and concepts, and this should not be exaggerated, both for computational and human readability reasons.
The VirtualEvent concept is an interesting case: it is not itself reified to have starting and ending dates, but links to an Event that is. This points to a potential improvement for TICO, which could analyze not just the direct descendants of a concept for an Instant to use as pivot, but also check the concepts the initial one relates to, and decide which Instant to use based on those.
The topic of automatic identification of subclasses (and by extension, superclasses) is yet another interesting case that must be studied with future experiments. Additionally, for the purposes of the work described here, not much attention was given to how to provide a time dimension to properties. Therefore, these future experiments will also include more work on the identification of domains and ranges of properties and their cardinalities, and other potential attributes (e.g. reflexivity and transitivity).
All changes considered thus far have been additive only – a new version of a concept is added to the ontology as enough changes are detected to trigger it – meaning there is no need to make subtractions to the ontology: it is possible to merely declare that the concept has reached the end of the period in which it is valid. However, future work will involve a deeper and more systematic approach to restriction creation and the identification of potential inconsistencies that may arise from that process and, as such, subtraction methods may become a necessary feature of TICO for inconsistency resolution.
Conclusions and future work
This paper presented an architecture for the transformation of real-time timeseries data from semi-structured sources into ontology individuals that trigger ontology evolution processes. To achieve this, it introduced two main tools, the JSON to Ontology Instance Mapper (J2OIM) and the TIme Constrained instance-guided Ontology evolution tool (TICO), and demonstrated their results in a predictive maintenance case-study featuring real data from sensors and management software.
J2OIM allows for the parameterizable transformation of JSON instances into different ontological instances, which can be either temporally dynamic or static. This paper follows how this tool is used to transform the data acquired through different webservice endpoints and how the generated ontology individuals are used to populate ontologies and triple-store repositories. While the reification approach chosen for temporal representation generates additional ontology individuals that can affect the efficiency of reasoning processes, the aggregation of similar captures into intervals and the removal of duplicate entries, such as instants with the same timestamp, mitigate said effects.
TICO, as an ontology evolution tool, uses ontology individuals and analyses their features – namely their restrictions and roles – to establish whether the original definitions in the ontology need to be updated. To represent the changes in concepts over time, it creates new concepts when needed and TimeSlices with clearly defined beginning and ending instants for each different definition. By using ordered 4D-Fluents for the description of the time information associated with the ontology individuals, reasoners can use the TimeSlices to guarantee that the correct version of each concept is used to classify individuals. While some restriction types are created with the current version of TICO, more complex ones, and their respective impact on ontology consistency, will be the subject of future work.
Acknowledgments
The authors would like to thank Marta Fernandes for her valuable insights and proofreading, which undoubtedly improved this paper immensely.
The present work has been developed under the EUREKA – ITEA3 Project PIANISM (Itea-17008), PIANISM (ANI
