Abstract
This article presents a framework that detects potential ontology building errors in order to improve ontology quality. In the literature, such potential errors are called ontology pitfalls. This work extends the existing set of ontology pitfalls and suggests new solutions for ontology repair. We have also developed a Java implementation that detects the proposed pitfalls, and we used it to evaluate them on well-known ontologies.
1. Introduction
Ontologies have become a popular tool for knowledge sharing and reuse, and with their popularity, measuring ontology quality has become an important research area. Ontology evaluation is one of the key stages of ontology development: an ontology must be correct and return correct results to its users, and the accuracy of those results cannot be guaranteed unless the quality of the ontology is guaranteed. Ontology quality evaluation approaches can be classified into two groups [1]:
Syntactic quality evaluation approaches: these evaluate the structural aspects of an ontology on the basis of ontology development guidelines, common pitfalls, structural metrics and so on. Examples of this approach include [2–10].
Semantic quality evaluation approaches: these evaluate the semantic validity of the concepts and relationships in an ontology on the basis of approval by domain experts [11]. Examples of this approach include [12–15].
In this article, ontology evaluation is carried out using the syntactic quality evaluation approach. The first study to use the syntactic quality approach is OntoClean [2], a methodology that evaluates the accuracy and adequacy of the taxonomic relationships in an ontology. To evaluate the correctness of these relationships, it uses concepts that originate in philosophy, such as essence, identity and unity.
OntoQA (metric-based ontology quality analysis) [3] is another method that follows the syntactic quality evaluation approach. It evaluates a given ontology on the basis of schema metrics and instance metrics. OntoMetrics [10] is an online platform for calculating ontology metrics; it uses the metrics proposed in OntoQA [3].
Gangemi et al. [4] propose structural, functional and usability-related measures for ontology evaluation and validation. Burton-Jones et al. [5] propose an ontology evaluation model based on semiotic theory. The Semiotic-based Ontology Evaluation Tool (S-OntoEval) [6] assesses ontology quality with several metrics that cover its syntactic, semantic and pragmatic aspects. AKTiveRank [7] is a system based on a set of metrics that measure the extent to which the concepts of a selected domain are represented by an ontology. It assumes that users search with an ontology search engine such as Swoogle [16], which finds the concepts that match the user’s query. AKTiveRank then applies metrics that calculate how well the query results and their neighbouring concepts are represented in the ontology. DoORS (Domain Ontology Ranking System) [9] is another ontology evaluation tool that uses metrics drawn from semiotic theory.
OOPS! (OntOlogy Pitfall Scanner!) [8] is a live and online catalogue of common pitfalls in ontology development. It introduces 40 pitfalls, extracted from the literature and from the manual analysis of ontologies. OOPS! has been used over 2000 times by users from over 50 countries [17]. This article extends the OOPS! ontology pitfall catalogue, one of the most common and accessible ontology evaluation tools, with seven new pitfalls.
Section 2 introduces the pitfalls in the OOPS! catalogue. Section 3 introduces the new pitfall set, including the pitfall definitions, examples and solutions. Section 4 describes the implementation of the testing framework. Section 5 presents and evaluates the test results. Finally, Section 6 concludes the article and reviews potential future work.
2. OOPS! Ontology pitfall catalogue
OOPS! [8] is a live and online catalogue of common pitfalls in ontology development. Its pitfalls were compiled from a literature review and from the manual analysis of ontologies. For each pitfall, OOPS! also records an importance level (critical, important or minor) assigned by ontology experts.
The current version of the catalogue consists of 40 pitfalls and their descriptions (Table 1). OOPS! is also supported by a web-based tool, independent of any ontology development environment, that detects potential pitfalls which could lead to modelling errors in OWL (Web Ontology Language) ontologies. OOPS! can semi-automatically detect the pitfalls whose codes are shown in black in Table 1, but it cannot yet detect those whose codes are shown in grey.
Table 1. OOPS! ontology pitfall catalogue.
OWL: Web Ontology Language; URI: Uniform Resource Identifier.
3. New pitfall set
This work introduces seven new pitfalls (Table 2) as an extension to the OOPS! pitfall catalogue (Table 1). The OOPS! catalogue was originally proposed in the PhD thesis of Villalón [18], which defines three importance levels as follows:
Critical (1): it is crucial to correct the pitfall. Otherwise, it could affect the consistency, reasoning and applicability of the ontology, among other characteristics.
Important (2): although not critical for the functioning of the ontology, it is important to correct this type of pitfall.
Minor (3): it does not represent a problem. However, correcting it makes the ontology better organised and easier to understand.
Table 2. Extended pitfall list.
These levels do not have clear boundaries: the level assigned to a particular pitfall could be debated depending on (1) modelling decisions, (2) ontology requirements and (3) the context in which an application uses the ontology. Within this framework, ontology developers with different levels of expertise were asked to rate the importance of each proposed pitfall as ‘critical’ (3 points), ‘important’ (2 points), ‘minor’ (1 point) or ‘not important’ (0 points). The importance level of each pitfall was then determined from the weighted average of the answers, where each answer is weighted by the respondent’s expertise level (‘expert’ 3, ‘medium confidence’ 2 and ‘low confidence’ 1).
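The scoring procedure above amounts to a weighted mean, score = Σ wᵢ·pᵢ / Σ wᵢ, where pᵢ is the number of points in answer i and wᵢ is the weight of the respondent’s expertise level. The Java sketch below computes it. The class and method names are hypothetical, the answers in `main` are made up for illustration (they are not the study’s data), and the rule that maps the average back to a level (rounding to the nearest rating) is our assumption, since no such rule is stated here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the weighted-average importance score described in the text.
// Assumption: the average is mapped back to a level by rounding to the nearest rating.
class ImportanceScore {
    // Points a respondent can give to a pitfall.
    enum Rating {
        NOT_IMPORTANT(0), MINOR(1), IMPORTANT(2), CRITICAL(3);
        final int points;
        Rating(int points) { this.points = points; }
    }

    // Weight of an answer, by the respondent's expertise level.
    enum Expertise {
        LOW_CONFIDENCE(1), MEDIUM_CONFIDENCE(2), EXPERT(3);
        final int weight;
        Expertise(int weight) { this.weight = weight; }
    }

    static final class Answer {
        final Expertise expertise;
        final Rating rating;
        Answer(Expertise expertise, Rating rating) { this.expertise = expertise; this.rating = rating; }
    }

    // score = sum(weight_i * points_i) / sum(weight_i)
    static double weightedAverage(List<Answer> answers) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (Answer a : answers) {
            weightedSum += a.expertise.weight * a.rating.points;
            totalWeight += a.expertise.weight;
        }
        if (totalWeight == 0) throw new IllegalArgumentException("no answers");
        return weightedSum / totalWeight;
    }

    // Hypothetical mapping: round the score to the nearest rating.
    static Rating toLevel(double score) {
        long rounded = Math.round(score);
        for (Rating r : Rating.values()) {
            if (r.points == rounded) return r;
        }
        throw new IllegalArgumentException("score out of range: " + score);
    }

    public static void main(String[] args) {
        // Made-up answers from a 10/8/2 panel, for illustration only.
        List<Answer> answers = new ArrayList<>();
        answers.addAll(Collections.nCopies(10, new Answer(Expertise.EXPERT, Rating.CRITICAL)));
        answers.addAll(Collections.nCopies(8, new Answer(Expertise.MEDIUM_CONFIDENCE, Rating.IMPORTANT)));
        answers.addAll(Collections.nCopies(2, new Answer(Expertise.LOW_CONFIDENCE, Rating.MINOR)));
        double score = weightedAverage(answers);
        System.out.println(score + " -> " + toLevel(score));
    }
}
```

With the made-up answers in `main` (all ten experts choose ‘critical’, the eight medium-confidence developers ‘important’ and the two low-confidence developers ‘minor’), the score is 124/48 ≈ 2.58, which this sketch rounds to ‘critical’.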
In this study, this method was applied to 20 participants of varying expertise (10 experts, 8 with medium confidence and 2 with low confidence). When the results are examined, the importance level of
The remainder of this section gives the definitions, examples and solutions of the newly proposed pitfalls:
1. Class definition with inadequate detail
Example: the definition of the ‘RedWine’ class contains only the ‘rdfs:subClassOf’ relationship that binds it to its super-class:
    <owl:Class rdf:ID="RedWine">
      <rdfs:subClassOf rdf:resource="#Wine"/>
    </owl:Class>
Solution: extend the class definition with properties, or with relationships to other classes, beyond ‘rdfs:subClassOf’ and ‘owl:equivalentClass’. Alternatively, consider removing the class.
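One way to detect this pitfall is to look for named classes whose only statements are their class declaration and ‘rdfs:subClassOf’ or ‘owl:equivalentClass’ links to other named classes. The sketch below does this over a hypothetical in-memory triple list (prefixed names, blank nodes written as ‘_:…’) instead of a Jena model, so it is an illustration under stated assumptions rather than the implementation described in Section 4. An ‘rdfs:subClassOf’ link to an anonymous class expression, such as an ‘owl:Restriction’, counts as real detail.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: detect "class definition with inadequate detail".
// Assumption: the ontology is a list of (subject, predicate, object) triples using
// prefixed names; blank nodes start with "_:". A real tool would load OWL with Jena.
class InadequateClassDetail {
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Predicates that only place a class in the hierarchy without describing it.
    private static final Set<String> HIERARCHY_ONLY =
            Set.of("rdf:type", "rdfs:subClassOf", "owl:equivalentClass");

    static Set<String> findUnderspecified(List<Triple> triples) {
        // 1. Collect the named classes.
        Set<String> classes = new TreeSet<>();
        for (Triple t : triples) {
            if (t.p.equals("rdf:type") && t.o.equals("owl:Class") && !t.s.startsWith("_:")) {
                classes.add(t.s);
            }
        }
        // 2. A subject is "detailed" if it has any other kind of statement, or if it is
        //    linked to an anonymous class expression (e.g. an owl:Restriction).
        Set<String> detailed = new HashSet<>();
        for (Triple t : triples) {
            if (!HIERARCHY_ONLY.contains(t.p) || t.o.startsWith("_:")) {
                detailed.add(t.s);
            }
        }
        // 3. Named classes without any detail are reported.
        classes.removeAll(detailed);
        return classes;
    }

    public static void main(String[] args) {
        List<Triple> wine = List.of(
                new Triple("Wine", "rdf:type", "owl:Class"),
                new Triple("Wine", "rdfs:subClassOf", "_:r1"), // a restriction on Wine
                new Triple("_:r1", "rdf:type", "owl:Restriction"),
                new Triple("RedWine", "rdf:type", "owl:Class"),
                new Triple("RedWine", "rdfs:subClassOf", "Wine"));
        System.out.println(findUnderspecified(wine)); // prints [RedWine]
    }
}
```

Under this sketch any other statement, including an annotation such as ‘rdfs:comment’, counts as detail; a stricter check could ignore annotation properties.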
2. Properties with similar names but missing structural relationships
Properties with similar names that are not structurally related may indicate that:
the same property has redundant definitions,
the relationship between the two properties is missing, or
there are unnecessary repetitions in the naming convention.
An example of each situation follows:
Example 1: the ‘front_axle’ property of the ‘Car’ class is defined twice in the ontology, as ‘front_axle_cm’ and ‘front_axle_inches’:
    <owl:DatatypeProperty rdf:ID="front_axle_cm">
      <rdfs:domain rdf:resource="#Car"/>
      <rdfs:range rdf:resource="&xsd;positiveInteger"/>
    </owl:DatatypeProperty>
    <owl:DatatypeProperty rdf:ID="front_axle_inches">
      <rdfs:domain rdf:resource="#Car"/>
      <rdfs:range rdf:resource="&xsd;positiveInteger"/>
    </owl:DatatypeProperty>
Solution: keep only one of the properties and remove the unit from its name. If necessary, unit conversion should be handled at the application level rather than in the data layer.
Example 2: the ontology has two properties, ‘dimension’ and ‘dimension_width’, that are not structurally related:
    <owl:DatatypeProperty rdf:ID="dimension">
      <rdfs:domain rdf:resource="#Car"/>
      <rdfs:range rdf:resource="&xsd;positiveInteger"/>
    </owl:DatatypeProperty>
    <owl:DatatypeProperty rdf:ID="dimension_width">
      <rdfs:domain rdf:resource="#Car"/>
      <rdfs:range rdf:resource="&xsd;positiveInteger"/>
    </owl:DatatypeProperty>
Solution: add the missing structural relationship by defining ‘dimension_width’ as an ‘rdfs:subPropertyOf’ of ‘dimension’.
Example 3: there is a series of properties that are not structurally related to each other, although all of their names contain ‘paper_’:
    <owl:DatatypeProperty rdf:ID="paper_weight"/>
    <owl:DatatypeProperty rdf:ID="paper_dimensions"/>
    <owl:DatatypeProperty rdf:ID="paper_color"/>
    <owl:DatatypeProperty rdf:ID="paper_quality"/>
Solution: create a new property with a name such as ‘paper_properties’ and define it as the super-property of all the properties above. Then remove the repeated ‘paper_’ prefix from the names of these four properties.
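A simple name-based heuristic can flag candidate pairs for this pitfall: treat two snake_case property names as similar when they share the first ‘_’-separated token, and report the pair unless one is a direct super-property of the other or both already share a direct super-property. The heuristic, the class name and the map layout in this sketch are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heuristic for "properties with similar names but missing structural relationships".
// Assumptions: names are snake_case; "similar" means sharing the first '_'-separated token;
// subPropertyOf maps each property to its asserted direct super-properties.
class SimilarPropertyNames {
    static String firstToken(String name) {
        int i = name.indexOf('_');
        return i < 0 ? name : name.substring(0, i);
    }

    static Set<String> parents(String p, Map<String, Set<String>> subPropertyOf) {
        return subPropertyOf.getOrDefault(p, Set.of());
    }

    static List<String> findUnrelatedPairs(List<String> properties,
                                           Map<String, Set<String>> subPropertyOf) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < properties.size(); i++) {
            for (int j = i + 1; j < properties.size(); j++) {
                String a = properties.get(i);
                String b = properties.get(j);
                if (!firstToken(a).equals(firstToken(b))) continue;
                // Related if one is a direct super-property of the other,
                // or if both already share a direct super-property.
                boolean related = parents(a, subPropertyOf).contains(b)
                        || parents(b, subPropertyOf).contains(a)
                        || !Collections.disjoint(parents(a, subPropertyOf), parents(b, subPropertyOf));
                if (!related) pairs.add(a + " ~ " + b);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> props = List.of("dimension", "dimension_width",
                "front_axle_cm", "front_axle_inches", "color");
        // prints [dimension ~ dimension_width, front_axle_cm ~ front_axle_inches]
        System.out.println(findUnrelatedPairs(props, Map.of()));
        // prints [front_axle_cm ~ front_axle_inches]
        System.out.println(findUnrelatedPairs(props, Map.of("dimension_width", Set.of("dimension"))));
    }
}
```

On the three examples above the heuristic flags ‘dimension’/‘dimension_width’, ‘front_axle_cm’/‘front_axle_inches’ and every pair of ‘paper_’ properties. A common leading token such as ‘has’ (e.g. ‘has_color’ and ‘has_owner’) would produce false positives, so the output is best treated as a list of candidates for manual review.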
3. Classes with similar names but missing structural relationships
Example: the example ontology below has two classes, ‘Mammals’ and ‘MarineMammals’. The name ‘Mammals’ is contained in the name ‘MarineMammals’, so ‘MarineMammals’ is probably a subclass of ‘Mammals’. However, the two classes are not hierarchically related:
    <owl:Class rdf:ID="Mammals"/>
    <owl:Class rdf:ID="MarineMammals"/>
Solution: define an ‘rdfs:subClassOf’ relation between the two classes, with the class whose name is contained in the other class’s name as the super-class:
    <owl:Class rdf:ID="Mammals"/>
    <owl:Class rdf:ID="MarineMammals">
      <rdfs:subClassOf rdf:resource="#Mammals"/>
    </owl:Class>
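The check for this pitfall can be sketched as follows. As an assumption, the sketch uses a narrower test than ‘contained in’: a class name that ends with another class name (its head noun, as in ‘MarineMammals’ and ‘Mammals’), which avoids flagging pairs such as ‘Wine’ and a hypothetical ‘WineRegion’. It follows ‘rdfs:subClassOf’ links transitively before reporting a missing relation. Class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for "classes with similar names but missing structural relationships".
// Assumption: "similar" means one class name ends with the other (shared head noun).
class SimilarClassNames {
    // True if `sub` reaches `sup` by following rdfs:subClassOf links transitively.
    static boolean isSubClassOf(String sub, String sup, Map<String, Set<String>> superClasses) {
        Deque<String> todo = new ArrayDeque<>(List.of(sub));
        Set<String> seen = new HashSet<>();
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!seen.add(c)) continue; // guard against cycles
            for (String parent : superClasses.getOrDefault(c, Set.of())) {
                if (parent.equals(sup)) return true;
                todo.push(parent);
            }
        }
        return false;
    }

    static List<String> findMissingSubclassLinks(Collection<String> classes,
                                                 Map<String, Set<String>> superClasses) {
        List<String> issues = new ArrayList<>();
        for (String general : classes) {
            for (String specific : classes) {
                if (!general.equals(specific) && specific.endsWith(general)
                        && !isSubClassOf(specific, general, superClasses)) {
                    issues.add(specific + " should be rdfs:subClassOf " + general);
                }
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("Mammals", "MarineMammals", "Wine");
        // prints [MarineMammals should be rdfs:subClassOf Mammals]
        System.out.println(findMissingSubclassLinks(classes, Map.of()));
    }
}
```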
4. Super-class with direct instances
Example: in the example below, the ‘Wine’ class has two subclasses, ‘WhiteWine’ and ‘RedWine’. The individual ‘VasseFelixFiliusChardonnay’ belongs to ‘WhiteWine’ and the individual ‘RadiusCabernet’ belongs to ‘RedWine’, whereas the individual ‘VieViteRose2018’ belongs directly to ‘Wine’:
    <owl:Class rdf:ID="Wine"/>
    <owl:Class rdf:ID="WhiteWine">
      <rdfs:subClassOf rdf:resource="#Wine"/>
    </owl:Class>
    <owl:Class rdf:ID="RedWine">
      <rdfs:subClassOf rdf:resource="#Wine"/>
    </owl:Class>
    <WhiteWine rdf:ID="VasseFelixFiliusChardonnay"/>
    <RedWine rdf:ID="RadiusCabernet"/>
    <Wine rdf:ID="VieViteRose2018"/>
Solution: create a new subclass of the super-class (‘Wine’ in the example) and move its directly asserted individuals to the new subclass. If this is not possible, check whether each such individual belongs to an existing subclass (‘WhiteWine’ or ‘RedWine’ in the example).
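A sketch of the check for this pitfall: collect every class that is the super-class of some other class, then report the individuals whose asserted type is one of those classes. It assumes asserted rather than inferred types, since after reasoning every ‘RedWine’ individual would also be a ‘Wine’, and for brevity it assumes one asserted type per individual. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for "super-class with direct instances".
// assertedType: individual -> the class it is directly asserted to belong to (one type assumed).
// superClasses: class -> its direct super-classes.
class SuperClassWithInstances {
    static Set<String> find(Map<String, String> assertedType, Map<String, Set<String>> superClasses) {
        // Classes that are the parent of at least one other class.
        Set<String> hasSubclasses = new HashSet<>();
        for (Set<String> parents : superClasses.values()) hasSubclasses.addAll(parents);

        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, String> e : assertedType.entrySet()) {
            if (hasSubclasses.contains(e.getValue())) flagged.add(e.getValue() + ": " + e.getKey());
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, String> types = Map.of(
                "VasseFelixFiliusChardonnay", "WhiteWine",
                "RadiusCabernet", "RedWine",
                "VieViteRose2018", "Wine");
        Map<String, Set<String>> supers = Map.of(
                "WhiteWine", Set.of("Wine"),
                "RedWine", Set.of("Wine"));
        System.out.println(find(types, supers)); // prints [Wine: VieViteRose2018]
    }
}
```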
5. Data property with more than one value
Example: in the example below, the ‘Lion’ individual ‘Alex’ has two values for ‘weight’. The two numbers actually express the same weight in different units of measure (the first in kg, the second in lb):
    <Lion rdf:ID="Alex">
      <weight rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">175</weight>
      <weight rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">385.808</weight>
    </Lion>
Solution: remove the redundant value and express the value in one and only one unit of measurement.
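Detecting this pitfall reduces to grouping data property assertions by individual and property and keeping the groups with more than one value. The assertion type and method below are hypothetical; they do not attempt to recognise units, they only surface candidates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch for "data property with more than one value".
class MultiValuedDataProperty {
    static final class Assertion {
        final String individual, property, value;
        Assertion(String individual, String property, String value) {
            this.individual = individual;
            this.property = property;
            this.value = value;
        }
    }

    // Groups values by "individual.property" and keeps only groups with two or more values.
    static Map<String, List<String>> find(List<Assertion> assertions) {
        Map<String, List<String>> values = new TreeMap<>();
        for (Assertion a : assertions) {
            values.computeIfAbsent(a.individual + "." + a.property, k -> new ArrayList<>()).add(a.value);
        }
        values.values().removeIf(v -> v.size() < 2);
        return values;
    }

    public static void main(String[] args) {
        List<Assertion> data = List.of(
                new Assertion("Alex", "weight", "175"),
                new Assertion("Alex", "weight", "385.808"),
                new Assertion("Alex", "name", "Alex"));
        System.out.println(find(data)); // prints {Alex.weight=[175, 385.808]}
    }
}
```

Some data properties legitimately take several values (alternative names, for example), so the output needs review. Declaring a data property as ‘owl:FunctionalProperty’ is another way to let a reasoner report an individual that has two distinct values for it.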
6. Empty class defined as the range of an object property
Example: in the example below, the range of the ‘hasColor’ object property is the ‘Color’ class. However, the ‘Color’ class has no individuals:
    <owl:ObjectProperty rdf:ID="hasColor">
      <rdfs:domain rdf:resource="#Car"/>
      <rdfs:range rdf:resource="#Color"/>
    </owl:ObjectProperty>
Solution: populate the ‘Color’ class with the appropriate individuals.
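The following sketch flags object properties whose range class has no individuals, counting individuals asserted for any (transitive) subclass of the range as members. This subclass handling is our assumption about how ‘no individuals’ should be read; the maps and all names, including ‘AcmeMotors’, are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch for "empty class defined as the range of an object property".
// rangeOf: object property -> range class; subClasses: class -> direct subclasses;
// individuals: class -> individuals asserted to belong to it.
class EmptyRangeClass {
    // True if `cls` or any of its transitive subclasses has at least one individual.
    static boolean hasIndividuals(String cls, Map<String, Set<String>> subClasses,
                                  Map<String, Set<String>> individuals, Set<String> seen) {
        if (!seen.add(cls)) return false; // guard against subclass cycles
        if (!individuals.getOrDefault(cls, Set.of()).isEmpty()) return true;
        for (String sub : subClasses.getOrDefault(cls, Set.of())) {
            if (hasIndividuals(sub, subClasses, individuals, seen)) return true;
        }
        return false;
    }

    static Set<String> find(Map<String, String> rangeOf, Map<String, Set<String>> subClasses,
                            Map<String, Set<String>> individuals) {
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, String> e : rangeOf.entrySet()) {
            if (!hasIndividuals(e.getValue(), subClasses, individuals, new HashSet<>())) {
                flagged.add(e.getKey() + " -> " + e.getValue());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, String> rangeOf = Map.of("hasColor", "Color", "hasMaker", "Company");
        Map<String, Set<String>> subClasses = Map.of("Company", Set.of("CarMaker"));
        Map<String, Set<String>> individuals = Map.of("CarMaker", Set.of("AcmeMotors"));
        System.out.println(find(rangeOf, subClasses, individuals)); // prints [hasColor -> Color]
    }
}
```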
7. Insufficient external links with other ontologies
Solution: create more external links to other domain ontologies.
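No threshold for ‘insufficient’ is defined in this section, so the sketch below only computes one possible indicator: the share of referenced resource IRIs that lie outside the ontology’s own namespace, ignoring the RDF, RDFS, OWL and XSD vocabularies. The metric, the class name and the example IRIs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical indicator for "insufficient external links with other ontologies".
class ExternalLinks {
    // Built-in vocabularies are not links to other domain ontologies, so they are ignored.
    static final List<String> STANDARD_VOCABULARIES = List.of(
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "http://www.w3.org/2000/01/rdf-schema#",
            "http://www.w3.org/2002/07/owl#",
            "http://www.w3.org/2001/XMLSchema#");

    // Fraction of referenced IRIs (outside the standard vocabularies) not in the base namespace.
    static double externalLinkRatio(String baseNamespace, Collection<String> referencedIris) {
        List<String> relevant = new ArrayList<>();
        for (String iri : referencedIris) {
            if (STANDARD_VOCABULARIES.stream().noneMatch(iri::startsWith)) relevant.add(iri);
        }
        if (relevant.isEmpty()) return 0.0;
        long external = relevant.stream().filter(iri -> !iri.startsWith(baseNamespace)).count();
        return (double) external / relevant.size();
    }

    public static void main(String[] args) {
        double ratio = externalLinkRatio("http://example.org/cars#", List.of(
                "http://example.org/cars#Car",
                "http://example.org/cars#Color",
                "http://example.com/vehicles#Automobile",
                "http://www.w3.org/2002/07/owl#Thing"));
        System.out.println(ratio); // prints 0.3333333333333333
    }
}
```

An ontology could then be flagged when this ratio falls below a chosen threshold; an appropriate value would depend on the domain.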
4. Implementation of the testing framework
The proposed pitfalls were implemented in Java using the Jena ontology library [20]. The application was tested on the following 10 widely known ontologies:
Pizza [21]: a well-known ontology in the semantic web community, developed for educational purposes by the University of Manchester.
MarineTLO [22]: a top-level ontology about marine species that supports ongoing research on biodiversity.
GoodRelations [23]: a lightweight ontology for exchanging e-commerce information, namely data about products, offers, points of sale, prices, and terms and conditions.
PROV (Provenance Ontology, or PROV-O) [24]: provides a set of classes, properties and restrictions that can be used to represent and interchange provenance information generated in different systems and under different contexts.
Wine [25]: an ontology for wines and the wine industry, including concepts such as vintages, wine regions, wineries and grape varieties.
VIVO [26]: provides a set of types (classes) and relationships (properties) to represent researchers and the full context in which they work.
Symp (Symptom Ontology) [27]: an ontology of disease symptoms, encompassing perceived changes in function, sensations or appearance reported by a patient and indicative of a disease.
DRPSNPTO (Dementia-Related Psychotic Symptoms Non-Pharmacological Treatment Ontology) [28]: represents the domain knowledge specific to the non-pharmacological treatment of psychotic symptoms in dementia in the long-term care setting.
OM (Ontology of units of Measure) [29]: provides classes, instances and properties that represent the different concepts used for defining and using measures and units.
SOSA (Sensor, Observation, Sample and Actuator Ontology) [30]: a lightweight vocabulary for describing sensors, actuators and samplers, and the acts they can perform.
Statistics of the selected ontologies are presented in Table 3, which shows the numbers of classes, object properties, data properties, class individuals and total axioms in each ontology. Data (datatype) properties relate individuals to literal data (e.g. strings, numbers and date-times), whereas object properties relate individuals to other individuals. Axioms are the statements asserted to be true in the domain being described; the total axiom count is therefore the number of all statements expressed in the ontology.
Table 3. Statistics of the selected ontologies.
PROV: provenance ontology; DRPSNPTO: dementia-related psychotic symptoms non-pharmacological treatment ontology; OM: ontology of units of measure; SOSA: sensor, observation, sample and actuator ontology.
The proposed pitfall set was implemented using the queries below, which are defined in the Semantic Web Rule Language (SWRL) [31] and its built-ins (swrlb):
5. Evaluation
Table 4 shows the test results.
Table 4. The incidence of pitfalls in the selected ontologies.
PROV: provenance ontology; DRPSNPTO: dementia-related psychotic symptoms non-pharmacological treatment ontology; OM: ontology of units of measure; SOSA: sensor, observation, sample and actuator ontology.

Knowledge organisation systems according to their semantic expressivity.

Example VIVO properties that are not structurally related to each other.
As mentioned above, since the domain ontologies do not contain instance knowledge, we manually populated the Sensor, Observation, Sample and Actuator (SOSA) ontology with the following example individual:
    <owl:NamedIndividual rdf:about="http://www.w3.org/ns/sosa/observation1">
      <hasFeatureOfInterest rdf:resource="http://www.w3.org/ns/sosa/HurricaneMariaAPSample"/>
      <observedProperty rdf:resource="http://www.w3.org/ns/sosa/AtmosphericPressure"/>
      <hasSimpleResult>101003 Pa</hasSimpleResult>
      <hasSimpleResult>29.826245135 inHg</hasSimpleResult>
      <resultTime>2017-09-19T23:00:01Z</resultTime>
    </owl:NamedIndividual>
6. Conclusion and future work
This study proposes a new pitfall set that extends the OOPS! ontology evaluation catalogue, one of the most common and accessible ontology evaluation tools. A Java application using the Jena library was developed to test the proposed pitfalls, which were applied to ten well-known ontologies, and the results were evaluated. One direction for future work is to extend the new pitfall set further. Rather than fundamental or syntactic errors, further pitfalls will focus on the quality of an ontology’s external links and on how well they cover the modelled domain.
Another direction is to develop a new ontology evaluation tool that combines pitfall-based and metric-based approaches. Bringing both kinds of evaluation method into a single tool will make ontology evaluation more efficient. Scalability will be considered in the design of the tool so that large-scale ontologies can be supported efficiently.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
