Abstract
With the development of the Semantic Web, more and more semantic data including many useful knowledge bases has been published on the Web. Such knowledge bases always lack expressive schema information, especially disjointness axioms and subclass axioms. This makes it difficult to perform many critical Semantic Web tasks like ontology reasoning, inconsistency handling and ontology mapping. To deal with this problem, a few approaches have been proposed to generate terminology axioms. However, they often adopt the closed world assumption which is opposite to the assumption adopted by the semantic data. This may lead to a lot of noisy negative examples so that existing learning approaches fail to perform well on such incomplete data. In this paper, a novel framework is proposed to automatically obtain disjointness axioms and subclass axioms from incomplete semantic data. This framework first obtains probabilistic type assertions by exploiting a type inference algorithm. Then a mining approach based on association rule mining is proposed to learn high-quality schema information. To address the incompleteness problem of semantic data, the mining model introduces novel definitions to compute the support and confidence for pruning false axioms. Our experimental evaluation shows promising results over several real-life incomplete knowledge bases like DBpedia and LUBM by comparing with existing relevant approaches.
Introduction
With the development of the Semantic Web [1], especially the promotion of Linked Open Data project[2]1
Existing approaches to generate disjointness axioms and subclass axioms can be divided into two categories. One category is to obtain such axioms by labeling relations between two concepts manually. For instance, the work in [15] manually generate disjointness axioms. Obviously, it would be very tedious and even impossible when dealing with large-scale KBs. The other category is to learn terminology axioms automatically. Lehmann and his colleagues have developed a tool called DL-Learner [16] to learn possible candidate concept expressions which are scored by positive examples and negative examples [17]. For the approaches proposed in [10, 11], the confidence is measured by the number of the positive examples and negative examples [18]. This may lead to incorrect terminology axioms learned with a high confidence. The work in [19, 9] consider missing information as negative examples and their work has implemented systems BelNet
CWA assumes the truth value of the specified and derivable statements is true, and false otherwise. The main difference between CWA and OWA is that the latter makes the assumption that the truth value of an underivable statement is unknown (i.e., neither true nor false).
In this paper, we propose a novel framework which aims on Semantic Induction From Semantic data (SIFS) based on association rules to automatically generate disjointness axioms and subclass axioms for addressing the problem of incomplete semantic data under OWA. This framework consists of two phases. The first phase generates negative examples by a type inference algorithm [20]. The algorithm computes a set of type assertions by assigning a probability
The main contributions in this paper are listed as follows:
A novel schema induction approach based on association rule mining techniques is proposed to automatically generate high-quality disjointness axioms and subclass axioms from incomplete KBs. This approach exploits type inference to deal with the incompleteness of the semantic data under OWA. To improve the quality of learning results, new definitions of support and confidence are given by using negative examples as constraints. Based on the definitions, specific algorithms to generate disjointness axioms and subclass axioms are described. Our experiments are conducted on several real-life KBs. An exhaustive analysis of the experimental results is provided by comparing our approach with existing relevant ones.
The rest of this paper is organized as follows. Section 2 introduces the background knowledge. The overall process of SIFS is described in Section 3 and more details about the proposed mining approach are presented in Section 4 separately. Section 5 provides the details of our experimental settings like data sets and evaluation measures. In Section 6, the experimental results are analysed thoroughly by comparing our approach with existing ones. The related work is given in Section 7 and Section 8 concludes the paper.
In this section, we first introduce the background knowledge about knowledge base, association rule mining and type inference.
Knowledge base
In this paper, we focus on those knowledge bases expressed with RDF statements. Each RDF statement is a triple in the the form
Usually, a KB consists of an ABox and a TBox. The ABox includes a set of facts describing instances and their relations. The TBox defines the schema information like the relations between classes, domains and ranges of properties. Our task of this paper is to mine TBox information from an ABox. It is noted that, we use the terms knowledge base and ontology interchangeable.
Association rule mining
Association rule mining[21] is a rule-based method for discovering relations between variables in large databases by constructing transaction tables. Let
To decide whether an association rule
Usually, the support of
From the equations we can see that, the problem of mining association rules can be reduced to that of mining frequent itemsets.
Type inference
A RDF KB contains million facts, but it is still an incomplete KB in most cases. [20] is an statistic method to learn missing type assertions from a noisy RDF KB. For each property in a RDF KB, there is a characteristic distribution of types for both the subject and the object. For example the property dbpedia-owl:location is used in 247,601 triples in DBpedia. Eighty-seven percent objects of these triples are belong to dbpedia-owl:Place. Furthermore, it uses statistic method to assign a certain weigh for each each property, which reflects its capability of predicting the type. Finally, the missing type can be inferred by the weight. because each inferred type assertion have a probability, in this paper, the inferred type assertions are named probabilistic type assertions (namely PTA).
The overall process of schema induction from incomplete semantic data.
In this section, we show the overall process to mine disjointness axioms and subclass axioms from incomplete semantic data. Figure 1 illustrates the process consisting of four steps: terminology acquisition, type inference, transaction table construction and axiom generation. More details about each step are provided as follows.
Terminology acquisition
The process of SIFS starts with obtaining terminology information required by our type inference algorithm. The terminology information includes the type assertions which indicates a instance belongs to a concept and those relations between a pair of individuals. We store the downloaded KB in a KB persistence tool, the required terminology information could be obtained directly by a SPARQL endpoint. In such cases, two SPARQL queries are used to obtain the terminology.
PTA computation
Based on the obtained semantic data, the type inference algorithm given in [20] can be applied to enrich an existing knowledge base
Because KB does not includes negative examples, in order to obtain them we use a threshold to measure a instance
If
Here,
.
Suppose we obtain the following triples by a SPARQL endpoint:
After applying the type inference algorithm, the following PTAs can be obtained:
Assume the threshold is set to be 0.5 (see Section 5 for more details about how to select a threshold). Then the instance La_Fontaine is taken as a negative example of concept Place and a positive example of concept Person. According to this, the following PTAs can be generated for constructing the transaction table:
Note that, Person(La_Fontaine) is considered as a new type assertion and will be added to the original knowledge base.
Transaction table construction
Transaction table in SIFS
Transaction table in SIFS
With the generated PTAs, a transaction table can be constructed (take Table 1 as an example). In the table, each column (namely an item) presents a concept defined in the original KB or its negation and each row (namely a transaction) corresponds to one individual. Each cell in the table is a value between 0 and 1, which indicates the probability of a type assertion or a probabilistic type assertion. The association rules to be mined are of the form
This is quite different from a traditional transaction table used in association rule mining techniques whose value of a cell is either 0 or 1. Also, in the approach given in the work to learn disjointness axioms [8], the value of a cell is 1 when the corresponding type assertion explicitly stated in the original knowledge base and 0 otherwise.
.
Assume the following type assertions are explicitly stated in an original knowledge base
Besides, the following type assertions are inferred from
Based on these assertions, Table 1 can be constructed by setting the threshold be 0.5 to judge whether an instance is positive or not. For instance, Lake_Davos is a negative example of Person since
Table 2 presents the transaction table constructed in a traditional way according to
Transaction table in CWA
For mining rules from a transaction table in SIFS, the traditional association rule mining algorithms cannot be applied directly due to the difference between the tables in SIFS and those in CWA. To deal with this problem, novel association rule mining algorithms need to be proposed. It is well known that, the support and confidence are two commonly used measures of rule strength in the field of data mining. Therefore, novel definitions of support and confidence (see Section 4 for details) need to be given before proposing new association rule mining algorithms.
Inspired by the work given in [9], the rules in the form of
Association rule mining in SIFS
To generate axioms from a transaction table in SIFS, support and confidence have to be redefined. In the following, we present the novel definitions of support and confidence respectively. Then specific algorithms are proposed to mine those rules that are used for generating disjointness axioms and subclass axioms.
Definition of support
In SIFS, the support of 1-itemset needs to be calculated first, which indicates how many instances belong to a concept. The support of 1-itemset can be defined as below according to the absolute support of an itemset.
In the equation,
In order to keep downward closure property of association rule mining, the support must decrease monotonically if more concepts are added into a rule. Thus, the support of a rule like
In the equation,
Right now, the support of a rule in SIFS is still an absolute value. This means that a user who assigns a threshold for support has to know the absolute size of the original knowledge base. To deal with this problem, we give a definition of the proportional support. In a naive way, the absolute value of the support of a 1-itemset could be used (see Definition 2) as a denominator.
It is noted that, a concept may not have many (probabilistic) type assertions due to the incompleteness of the original knowledge base. For such concepts, we prefer to ignore them which is a commonly used strategy in association rule mining. This will speed up the mining process in SIFS since the searching space could be largely reduced.
In the context of closed world assumption, the influence of missing data cannot be ignored when defining the confidence of a rule. It is because not considering mising data may make the confidence become larger than the actual one (see the definition of confidence given in Section 2). Thus, wrong rules could be obtained according to those confidences. Take the transaction table given in Table 2 as an example. The confidence of the rule
To deal with this problem, we define the following negative rules which can be used to compute how many (probabilistic) type assertions violate those rules to be mined:
It is reasonable to use these negative rules since it is unknown whether a missing instance violates the rules to be mined or not. The proportional support can imply the possibility of the negative rules.
When defining confidence of a rule, it is required to combine the support of the rule and that of its negative rule. The support of a negative rule serves as the constraint to reduce the confidence. For the rule to generate subclass axioms, Eqs (4) and (5) are given. Since a disjointness axiom is symmetric, the support of the rule to generate disjointness axioms should be different with that of the rule to generate subclass axioms. We use the operator
It is noted that,
Based on the equations from Eqs (4) to (7), the confidence of a rule
In the equation,
InputInput OutputOutput A transaction table
column
Our goal is to mine disjointness axioms and subclass axioms from an incomplete KB under OWA. The mining algorithm (see Algorithm 4.2) takes a transaction table
Algorithm 4.2 first computes the absolute support for all 1-itemsets in the input transaction table (see lines 3 and 4). From line 5 to line 8, the algorithm calculates the absolute support for those 2-itemsets whose 1-itemsets own the support greater than the threshold
Experimental settings
The dbpedia is larger than 20 GB and to accelerate computing speed, we do all the calculating in memory. Our experiments are performed on a server with Intel Xeon E5-2670 CPU and 128 GB of RAM using 64 bit operating system Ubuntu 14.04. The maximal heap space is set to be 60 GB. In the following, we introduce the data sets, evaluation measures and the strategies to select thresholds in detail.
Data sets
The statistics of the gold standards
The statistics of the gold standards
The KBs used in the experiments include DBpedia,3
http://wiki.dbpedia.org/Downloads351.
For these KBs, we manually constructed gold standards based on their ontologies. To be specific, a set of disjointness axioms are added to each ontology by assuming the siblings are disjoint. The statistics of the gold standards can be seen in Table 3. This table presents he number of all concepts and those that have no type assertions explicitly declared in the corresponding ABox in the original KB. It also shows the number of subclass axioms, disjointness axioms and equivalent axioms. From the table we can observe that NTN is the most sparse KB since 84% concepts have no instances in the gold standard. The performance of SIFS could be better reflected over such a highly incomplete KB.
In order to measure the performance of the implemented systems to generate schema information, the traditional precision and recall are used. Precision is used to measure how many selected items are relevant and recall is to measure how many relevant items are retrieved. See below for the definitions of these measures in the context of generating schema information.
In the equations, precision is the fraction of generated axioms that are correct according to the corresponding gold standard and recall is the fraction of correct axioms that are generated.
In this section, the parameter setting depends on the experiment and do not need any domain knowledge. A threshold is an important factor in finding negative examples (see Section 3.2) and may effect the performance of SIFS. In order to obtain the most suitable threshold, we randomly selected 500 instances whose types are unknown and manually checked the correctness of the results. Figure 2 demonstrates more information. From the figure we can observe that, 0.5 should be the best threshold to differentiate positive examples from negative ones.
Precision and recall of SIFS-S and SIFS-P with different thresholds
Precision and recall of SIFS-S and SIFS-P with different thresholds
Experimental results of threshold value.
For SIFS, the default threshold of support 0.7 is used to choose the candidate association rules. That is, if the proportional support of a rule generated by SIFS exceeds 0.7, it would be regarded as a candidate rule. This threshold corresponds to
When deciding a suitable threshold for filtering rules according to their confidences, three possible values are used. The threshold corresponds to
The evaluation of our algorithm to generate axioms can be divided into four experiments. The first experiment tests the influence of standard support and proportional one to SIFS. The second experiment compares SIFS with GoldMiner to see their performance when adopting different world assumptions. The third experiment compares SIFS with BelNet
It is noted that, we use SIFS-S and SIFS-P to indicate our system using standard support and that using proportional support separately. SIFS-P can be seen as an implementation of Algorithm 4.2. SIFS-S is actually the implementation of Algorithm 4.2 by ignoring the computation of proportional support in line 10 and modifying the threshold and proportional support in lines 11–12 and 18–19 for adapting to the change.
SIFS-S vs. SIFS-P
In this section, we compare SIFS-S with SIFS-P by using different thresholds to filter candidate association rules. Table 4 presents precision and recall of both systems to generate subclass axioms and disjointness axioms respectively. This experiment is conducted over DBpedia which is the most challenging KB in our dataset due to its huge number of facts.
From the precision and recall to generate subclass axioms we can observe that, both systems perform similarly. When generating disjointness axioms, SIFS-P performs better than SIFS-S. For example, when
The reason why SIFS-S could find more disjointness axioms but obtain lower precision and recall is that, SIFS-S always omits the big difference between the number of instances of the concepts in a candidate rule (or axiom). Take the disjointness axiom
Precision (left) and recall (right) of systems to generate subclass axioms.
Precision (left) and recall (right) of systems to generate disjointness axioms.
In this section, we compare the systems SIFS-P and GoldMiner. Both systems adopt association rule mining to generate axioms while the difference is the fact that GoldMiner uses different world assumptions to generate negative examples. The precision and recall of the systems to generate subclass and disjointness axioms can be seen in Figs 3 and 4 respectively. In these figures, 0.7, 0.8 and 0.9 in the horizontal axis indicate the thresholds to filter those rules with lower confidences.
This reflects the advantage of using type inference. That is, more type assertions recommended by type inference can increase the support of a rule and thus more correct subclass axioms could be generated. Besides, comparing with the precision and recall of systems to generate disjointness axioms, those to generate subclass axioms are higher, especially with respect to recall (see the left figures in Figs 3 and 4). Because lacking negative examples generating disjointness axioms is much more challenging than generating subclass axioms.
When mining disjointness axioms, the advantage of using type inference becomes more obvious, especially for DBpedia. When the recalls of SIFS-P and GoldMiner are quite similar, the precisions of SIFS-P are much better. For example, from Fig. 4 about DBpedia we can observe that, the difference between the precisions of both systems at any threshold is about 0.1 when their recalls are quite similar. This means that SIFS-P has found less disjointness axioms while containing similar number of correct disjointness axioms. In the case of threshold 0.7 for DBpedia, SIFS-P found about 4000 axioms less than GoldMiner.
SIFS-P performs better than GoldMiner on recall for subclass, and precision on DBPedia for disjointness when generating disjointness axioms. The reason are in such two aspects: (1) One aspect is that more noisy negative examples may be introduced when adopting CWA due to the imbalance of the number of instances of different concepts. For example, the confidence of
SIFS-P vs. BelNet
In this section, SIFS-P is compared with BelNet
Obviously, the precisions of BelNet
Application to detect inconsistencies
In this section, the experiment is conducted to verify the usefulness of our algorithm for detecting logical inconsistencies in DBpedia. For each learned disjointness axiom of the form
Table 5 presents the experimental result of detected inconsistencies. From the table we can see that, many logical inconsistencies have been found in DBpedia 3.5.1 by our algorithm. In particular, 99 pairs of disjoint concepts share only 1 to 9 common instances and 4 pairs of disjoint concepts share no less than 10 but less than 50 instances. For example, we can find two type assertions Person(Nuri) and Place(Nuri) from DBpedia. And we check the data from dbpedia, the description of Person(Nuri) is same as Place(Nuri). In dbpedia, Nuri is an same instance of both concept Person and Place which are obviously mutually disjoint. Thus, a logical inconsistency is caused. Before learning disjointness axioms, such inconsistencies cannot be detected due to the incompleteness of schema information.
The statistics of the inconsistencies
The statistics of the inconsistencies
Based on the analysis of the above experiments, we obtain the following main observations:
The adoption of type inference provides a promising solution to deal with the imbalance problem and also generates higher quality of negative examples than those generated under CWA. The proportional support remedies the disadvantage of using absolute values and improves the quality of learning results. SIFS-P perform better than the existing systems to learn schema information from incomplete data in most cases. High-quality disjointness axioms could help a reasoner to automatically find more contradictions in KBs.
The most relevant work to ours is the approach given in [9] which uses association rule mining to mine schema information including disjointness and subclass axioms. Their algorithms have been implemented in the system GoldMiner. This approach is extended in [8] by considering naïve negative association rule mining to learn disjointness axioms. Our systems SIFS-S and SIFS-P can be seen as an extension of GoldMiner by considering type inference and proportional support. However, as our experiments have shown, GoldMiner has difficulties to deal with the incompleteness problem and the standard support used by GoldMiner may filter out those concepts with few instances.
Another relevant work is given in [18] which proposes an ontology learning approach by integrating the probabilistic inference capability of Bayesian Networks with the logical formalism of Description Logics. The approach considers the incompleteness of KBs and uses inference in a Bayesian network that leverages the structure in the data to solve the noisy issue brought by CWA. This work considers the schema learning as instance classification. To be consistent with OWA of the data in the semantic web, the traditional confusion matrix is extended for considering unknown results. The corresponding algorithms have been implemented in BelNet
Our approach is inspired by the rule mining model given in [22], which conforms to the OWA. This work introduces novel definitions of support and confidence. In order to acquire the unknown facts, the notion of functionality is applied to acquire negative examples of a binary relation. Since the goal of this approach is to mine horn logical rules and a type assertion is not functional, it is not suitable to generate axioms from an incomplete KB.
Inductive logic programming (ILP), which marries machine learning and data mining, is often used to learn schema information. In order to induce concept definition axioms from existing instances, Lehmann and his colleagues propose an approach in [23] to generate candidate concept descriptions by a downward refinement operator. Later on, this approach is extended to deal with very large dataset in [24]. All the relevant algorithms have been implemented in the tool DL-Learner [16]. However, noisy negative examples will influence the performance of such approaches.
There are also other works to generate schema information. In [10], all concepts are mapped to vectors with the same dimension. If the similarity of two concepts is lower than a threshold, their work is regarded as disjoint. The authors of [12] proposes an appropriate heuristic rule for learning disjointness axioms. This heuristic rule assumes that all sibling concepts are disjoint. A light-weight approach to enrich knowledge is presented in [25], which uses SPARQL queries to learn axioms. However, it needs an artificial process to optimize the quality of disjointness axioms. The approach given in [26] is a supervised learning approach, whose training set needs to be constructed manually. It uses various kinds of external resources like ontology, texts, and other softwares (e.g., the label of Del.icio.us, WordNet [27] and OntoClean [28]).
Conclusion
In this paper, we proposed a novel approach to learning disjointness and subclass axioms from incomplete semantic data under OWA. We first applied the type inference algorithm to generate new probabilistic type assertions, which provides a promising solution to deal with the incompleteness or imbalance problem. We then introduced a mining model consisting of novel definitions of support and confidence using negative examples as constraints. We conducted extensive experiments to compare proportional support with standard support, and our system with existing systems. The experimental results showed that the proportional support performs better than the standard one and SIFS-P outperforms GoldMiner and BelNet
In the future, we plan to extend the SIFS-P to learn more kinds of axioms such as the axioms with existential restriction, universal restriction and the limited extensional quantification. Besides, more datasets will be considered to test the performance of SIFS-P.
