Conjunctive query answering over unrestricted OWL 2 ontologies

Abstract

Conjunctive Query (CQ) answering is a primary reasoning task over knowledge bases. However, when considering expressive description logics, query answering can be computationally very expensive; reasoners for CQ answering, although heavily optimized, often sacrifice expressive power of the input ontology or completeness of the computed answers in order to achieve tractability and scalability for the problem. In this work, we present a hybrid query answering architecture that combines various services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used, to ensure maximum performance. Otherwise, the tractable services are used to compute lower and upper bound approximations. The union of the lower bounds and the intersection of the upper bounds are then compared. If the bounds do not coincide, then the “gap” answers are checked using the “full” service. These techniques led to the development of two new systems: (i) RSAComb, an efficient implementation of a new tractable answering service for RSA (role safety acyclic) (ii) ACQuA, a reference implementation of the proposed hybrid architecture combining RSAComb, PAGOdA, and HermiT to provide a CQ answering service for OWL. Our extensive evaluation shows how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we show how ACQuA can reliably match the performance of PAGOdA, a state-of-the-art CQ answering system that uses a similar approach, and can significantly improve performance when PAGOdA extensively relies on the underlying fully-fledged reasoner.

Keywords

CQ answering OWL 2 ontology approximation RSA combined approach

1. Introduction

Conjunctive Query (CQ) answering over Knowledge Bases (KBs) is a crucial reasoning task for many applications. However, when considering expressive Description Logic (DL) languages, query answering is computationally very expensive, even when considering only complexity w.r.t. the size of the data (data complexity) [62]. Fully-fledged reasoners oriented towards CQ answering over unrestricted OWL 2 ontologies exist but, although heavily optimized, they are only effective on small to medium datasets. In order to achieve tractability and scalability for the problem, we need to rely on limiting the expressive power of the input ontology or sacrifice the completeness of the computed answers.

Query answering procedures have been developed for several fragments of OWL 2 for which CQ answering is tractable with respect to data complexity [9]. Three such fragments have been standardized as OWL 2 profiles, and CQ answering techniques for these fragments have been shown to be highly scalable at the expense of expressive power [10,46,51,68,72,73]. An interesting fragment of OWL 2, tractable for standard reasoning tasks, is RSA, an ontology language that subsumes all the OWL 2 profiles, first presented by Carral et al. [14] and for which a CQ answering algorithm based on the combined approach technique [46,47] was proposed by Feier et al. [22].

In order to deal with more expressive ontologies, several techniques have been proposed to compute a sound subset of answers to a given CQ. One such technique is to approximate the input ontology to a tractable fragment, so a tractable algorithm can then be used to answer CQs over the approximated ontology. A particularly interesting approach to CQ answering over unrestricted OWL 2 ontologies, using a combination of the aforementioned techniques, is adopted by PAGOdA [85]. Its “pay-as-you-go” approach uses a Datalog reasoner to handle the bulk of the computation, computing lower and upper approximations of the answers to a query, while relying on a fully-fledged OWL 2 reasoner (HermiT [24]) only as necessary to fully answer the query.

While PAGOdA is able to avoid the use of a fully-fledged OWL 2 reasoner in some cases (i.e., when the lower and upper answer approximations coincide), its performance rapidly deteriorates when the input query requires (extensive) use of the underlying OWL 2 reasoner. The computation of lower and upper bounds is achieved by under- and over-approximating the ontology into the $RL$ profile of OWL 2 so that a tractable reasoner can be used for CQ answering. The tractability of $RL$ is achieved by avoiding problematic interactions between axioms that can cause an exponential blow-up of the computation (so-called and-branching). As it turns out, this elimination of problematic interactions between axioms is rather coarse, and PAGOdA ends up falling back to the underlying OWL 2 reasoner even when it is not really needed.

This work borrows from this “pay-as-you-go” technique and builds upon existing CQ answering techniques over OWL 2 ontologies. We propose a new hybrid query answering architecture that combines black-box1

¹
By “black-box” we mean that the details of the reasoning process are not relevant, and indeed any reasoner providing comparable reasoning services could be used. However, the semantics are completely transparent, and the results could be interpreted/explained using a wide variety of techniques from the extensive literature on this topic (e.g., [3]). Note that this differs from the definition of the term that is often used, e.g., in the context of Machine Learning (ML), where the semantics of the system is opaque to the user.

services to provide a CQ answering service for OWL. Specifically, it combines scalable CQ answering services for tractable languages with a CQ answering service for a more expressive language approaching the full OWL 2. If the query can be fully answered by one of the tractable services, then that service is used. Otherwise, the tractable services are used to compute lower and upper bound approximations, taking the union of the lower bounds and the intersection of the upper bounds. If the bounds do not coincide, then the “gap” answers are checked using the “full” service. When considering ontology approximations “from below”, we introduce a novel algorithm to compute a lower bound to the answers to an input query by means of approximation to RSA. This is done by ensuring that all the constraints for the RSA language are satisfied in the input KB. Similarly, we propose an algorithm to compute an approximation “from above” targetting

{RSA}^{+}

, an extension of RSA, for which the combined approach for CQ answering for RSA is still complete. The combined approach for RSA can then be used in both cases to compute the answer bounds. These techniques led to the development of two new system: RSAComb and ACQuA (Answering Conjunctive Queries using Approximation).

RSAComb

An efficient implementation [39,41] of the combined approach algorithm for RSA [22], reorganized to fit the new implementation design and the integration of RDFox [54,55,57,60] as a backend Datalog reasoner. We streamlined the execution of the algorithm by factoring out those steps in the combined approach that are query independent to make answering multiple queries over the same knowledge base more efficient. In addition, we included an improved version of the filtering step for the combined approach. The system accepts any OWL 2 KB and includes a customizable approximation step to languages compatible with the RSA combined approach. The system is further extended with a reference implementation of the novel approximation algorithms for the computation of answer bounds mentioned above.

ACQuA

A reference implementation [42] of the hybrid architecture mentioned above, combining RSAComb, PAGOdA [85], and HermiT [24] to provide a CQ answering service for OWL. The resulting system ensures the same “pay-as-you-go” capabilities of the systems it is based on. The system has been designed to accommodate a high degree of modularity; the services it is built upon can be potentially substituted or augmented with more capable ones to improve the overall performance.

We carried out an extensive evaluation both for RSAComb, as a standalone tool, and for ACQuA, to assess their effectiveness, and compare our results with PAGOdA, aiming, primarily, at improving some shortcomings of the latter tool. Our experimental results show that the new technique yields significant performance improvements in several important application scenarios. Both ACQuA2

https://github.com/KRR-Oxford/ACQuA

and RSAComb3

https://github.com/KRR-Oxford/RSAComb

have been released as free and open source software. Source code and documentation are available online.

The present paper includes some previously published work:

The algorithm for the approximation of an unrestricted OWL 2 ontology to RSA, sound for CQ answering, was presented in [40].

A full description of RSAComb system was presented in [39].

2. Preliminaries

We assume familiarity with standard concepts of first-order logic (FOL) such as term, variable, constant, predicate, atom, literal; refer to [1,5] for a formal introduction to these concepts.

We define a rule as an expression of the form $φ (\vec{x}, \vec{y}) \to ψ (\vec{x})$ , with $φ (\vec{x}, \vec{y})$ a conjunction of literals over variables $\vec{x} \cup \vec{y}$ and $ψ (\vec{x})$ a non-empty conjunction of atoms over $\vec{x}$ . Given a role r, we denote $head (r)$ the set of atoms in $ψ (\vec{x})$ , and ${body}^{+} (r)$ (resp. ${body}^{-} (r)$ ) the set of positive (resp. negative) literals in $φ (\vec{x}, \vec{y})$ .

We will call a rule definite without negation in its body, and Datalog a function-free definite rule. A Datalog rule is disjunctive if it admits disjunction in the head. A fact is a Datalog rule with an empty body. The definition can be trivially extended to sets of rules.

A program Π is a set of rules. Let $pred (X)$ be the set of predicates in X (with X being either a set of atoms, a rule, or a program). A stratification of a program Π is a function $δ : pred (Π) \to {1, \dots, k}$ with $k ⩽ | pred (Π) |$ , s.t. for every rule $r \in Π$ and $p \in pred (head (r))$ it holds:

for every $q \in pred ({body}^{+} (r))$ , $δ (q) ⩽ δ (p)$ ;

for every $q \in pred ({body}^{-} (r))$ , $δ (q) < δ (p)$ ;

The stratification partition of Π induced by δ is the sequence $(Π_{1}, \dots, Π_{k})$ with each $Π_{i}$ be the set of rules $r \in Π$ s.t. ${max}_{a \in head (r)} (δ (pred (a))) = i$ . Programs $Π_{i}$ are called strata of Π. A program is stratified if it admits a stratification. All definite programs are stratified.

A stratified program Π has a least Herbrand model (LHM), which is constructed using the immediate consequence operator $T_{Π}$ . Let $H_{U}$ and $H_{B}$ be the Herbrand universe and the Herbrand base of Π. Let $S \in H_{B}$ , then, $T_{Π} (S)$ consists of all facts in $head (r) σ$ with $r \in Π$ and σ a substitution for the variables in r to $H_{U}$ satisfying ${body}^{+} (r) σ \subseteq S$ and ${body}^{-} (r) σ \cap S = \emptyset$ . The powers of $T_{Π}$ are defined as follows: (i) $T_{Π}^{0} (S) = S$ , (ii) $T_{Π}^{n + i} (S) = T_{Π} (T_{Π}^{n} (S))$ , (iii) $T_{Π}^{ω} (S) = ⋃_{i = 0}^{\infty} T_{Π}^{i} (S)$ . Let $(Π_{1}, \dots, Π_{k})$ be a stratification partition for Π. We define $U_{1} = T_{Π_{1}}^{ω} (\emptyset)$ and for each $1 ⩽ i < k$ , $U_{i + 1} = T_{Π_{i + 1}}^{ω} (U_{i})$ Then, the LHM of Π is $U_{k}$ and is denoted as $M [Π]$ .

Given a stratified program Π, we define $Π^{\approx, ⊤}$ the program extended with standard axiomatization rules for equality (≈) and truth value (⊤) [22].

2.1. Ontologies and conjunctive query answering

Next we give a brief overview of the description logic languages used in the paper. We will define them as restrictions of $SROIQ$ [36], the description logic underpinning the OWL 2 ontology language [58], standardized by W3C. An ontology signature is a triple $⟨ N_{C}, N_{R}, N_{I} ⟩$ of computable disjoint sets of concept names, role names and individuals respectively. Two special concepts are provided: ⊥ (bottom concept) and ⊤ (top concept). We define a role as an element of $N_{R} \cup {R^{-} ∣ R \in N_{R}}$ , where $R^{-}$ is called inverse role. We also introduce a function $Inv (\cdot)$ closed for roles s.t. $\forall R \in N_{R} : Inv (R) = R^{-}, Inv (R^{-}) = R$ . An RBox $R$ is a finite set of axioms of type (R2)–(R4) in Table 1 with $R, S, T$ roles. We denote $⊑_{R}^{*}$ as a minimal relation over roles closed by reflexivity and transitivity s.t. $R ⊑_{R}^{*} S$ , $Inv (R) ⊑_{R}^{*} Inv (S)$ hold if $R ⊑ S \in R$ . A role R is transitive if a role T exists such that $T ⊑_{R}^{*} R$ , $R ⊑_{R}^{*} T$ and either $T \circ T ⊑ T \in R$ or $T^{-} \circ T^{-} ⊑ T^{-} \in R$ . A TBox $T$ is a set of axioms of type (T1)–(T7) where $A, B \in N_{C}$ , $a \in N_{I}$ , R is a role and $Self (\cdot)$ denotes the local reflexivity of a role. An ABox $A$ is a finite set of assertions of type (A1)–(A2) with $A \in N_{C}$ , $a, b \in N_{I}$ and $R \in N_{R}$ . A $SROIQ$ ontology is a set of axioms $O = T \cup R$ .4

⁴
In order to achieve decidability of reasoning, $SROIQ$ ontologies must satisfy certain additional requirements. These, however, do not affect the technical results reported in this paper.

An ontology is

{SHOIQ}^{+}

if we restrict axioms (R4) to role transitivity (i.e.,

R = S = T

). An ontology is

SHOIQ

if we further exclude axioms of type (T6), (T7) and (R3). An

{ALCHOIQ}^{+}

ontology (resp.

ALCHOIQ

) is obtained from

{SHOIQ}^{+}

(resp.

SHOIQ

) by disallowing (R4) axioms altogether. A Horn-

{ALCHOIQ}^{+}

ontology (resp. Horn-

ALCHOIQ

) is obtained from

{ALCHOIQ}^{+}

(resp.

ALCHOIQ

) by restricting

m = 1

in axioms (T1) and (T4). Finally, given an ontology language

L

, we define an

L

knowledge base as a couple

K = ⟨ O, A ⟩

comprising an

L

ontology

O = T \cup R

and an ABox

A

Table 1

Normalized $SROIQ$ axioms and their translation into logic rules

Axiom / Role / Assertion α		Definite rules $π (α)$
(R1)	$R^{-}$	$R (x, y) \to R^{-} (y, x) R^{-} (y, x) \to R (x, y)$
(R2)	$R ⊑ S$	$R (x, y) \to S (x, y)$
(R3)	$R ⊓ S ⊑ ⊥$	$R (x, y) \land S (x, y) \to ⊥$
(R4)	$R \circ S ⊑ T$	$R (x, y) \land S (y, z) \to T (x, z)$
(T1)	$⊓_{i = 1}^{n} A_{i} ⊑ ⨆_{i = 1}^{m} B_{i}$	$⋀_{i = 1}^{n} A_{i} (x) \to ⋁_{i = 1}^{m} B_{i} (x)$
(T2)	$A ⊑ {a}$	$A (x) \to x \approx a$
(T3)	$\exists R . A ⊑ B$	$R (x, y) \land A (y) \to B (x)$
(T4)	$A ⊑ ⩽ m R . B$	$A (x) \land ⋀_{i = 1}^{m + 1} [R (x, y_{i}) \land B (y_{i})] \to ⋁_{1 ⩽ i < j ⩽ m + 1} y_{i} \approx y_{j}$
(T5)	$A ⊑ \exists R . B$	$A (x) \to R (x, f_{R, B}^{A} (x)) \land B (f_{R, B}^{A} (x))$
(T6)	$A ⊑ Self (R)$	$A (x) \to R (x, x)$
(T7)	$Self (R) ⊑ A$	$R (x, x) \to A (x)$
(A1)	$A (a)$	$\to A (a)$
(A2)	$R (a, b)$	$\to R (a, b)$

Table 1 also introduces a normal form for each of these description logic languages, and w.l.o.g. we assume that any ontology introduced is restricted to these axioms. Each axiom in Table 1 corresponds to a single logic rule, provided on the right. We define $π (\cdot)$ as the translation function from axioms to logic rules; the function can be trivially extended to sets of axioms by mapping π over the set and to knowledge bases (i.e., $π (K) = {π (α) ∣ α \in A \cup T \cup R}$ ). Furthermore, we define $Π_{K} = π {(K)}^{⊥, \approx}$ as the logic program derived from $K$ extended with bottom and equality axiomatization rules (as defined in [14]).

OWL 2 profiles [53] have been defined as fragments of OWL 2, designed to provide a desirable balance between computational complexity of standard reasoning tasks and expressiveness of the ontology language. We will define these standard profiles as fragments of Horn- $ALCHOIQ$ .5

⁵

Property chain and transitivity axioms are not taken into consideration here to keep definitions compatible with [22].

OWL 2 EL is based on the ${EL}^{+ +}$ DL language [4,6]; it does not contain inverse roles (R1) and axioms of type (T4);

OWL 2 RL is inspired by Description Logic Programs [29] and corresponds to a subset of Datalog; it does not contain axioms of type (T5);

OWL 2 QL is based on the DL-Lite $_{R}$ DL language [10]; it does not contain axioms (T2), (T4), axioms (T1) satisfy $n = 1$ and axioms (T3) satisfy $A = ⊤$ .

Note that, when not considering transitive roles, the definition of

{EL}^{+ +}

, the logic underpinning OWL 2 EL, matches

{ELHO}_{⊥}^{r}

[72,85].

A conjunctive query (CQ) q is a formula $\exists \vec{y} . ψ (\vec{x}, \vec{y})$ with $ψ (\vec{x}, \vec{y})$ a conjunction of function–free atoms over $\vec{x} \cup \vec{y}$ , and $\vec{x}$ , $\vec{y}$ are called answer variables and bounded variables, respectively. We call boolean conjunctive query (BCQ) a query where $| \vec{x} | = 0$ (i.e., the set of answer variable is empty). A query is atomic if $ψ (\vec{x}, \vec{y})$ consists of a single atom and $| \vec{y} | = 0$ .

A knowledge base $K = ⟨ T \cup R, A ⟩$ is satisfiable if $Π_{K} ⊭ \exists y . ⊥ (y)$ . A tuple of constants $\vec{a}$ is a possible answer to q w.r.t. $K$ if $| \vec{a} | = | \vec{x} |$ and each constant in $\vec{a}$ occurs in $K$ . $\vec{a}$ is a certain answer to q w.r.t. $K$ if $K$ is unsatisfiable or $\vec{a}$ is a possible answer and $Π_{K} ⊧ \exists \vec{y} . ψ (\vec{a}, \vec{y})$ . The set of certain answers to a query q is denoted by $cert (q, K)$ . We say that a possible answer $\vec{a}$ is a ground answer to q w.r.t. a satisfiable knowledge base $K$ if a tuple of constants $\vec{e}$ in $K$ exists such that $| \vec{y} | = | \vec{e} |$ and $Π_{K} ⊧ ψ (\vec{a}, \vec{e})$ . We denote the set of ground answers with $ground (q, K)$ . It is straightforward to see that $ground (q, K) \subseteq cert (q, K)$ .

CQs can be alternatively represented as Datalog rules. Let $P_{q}$ a fresh predicate of arity $| \vec{x} |$ uniquely associated with q. Then let $q_{r} = P_{q} (\vec{x}) \leftarrow ψ (\vec{x}, \vec{y})$ be the Datalog rule representing q. This allows to characterize certain answers by means of entailment of a single fact, i.e., $\vec{a} \in cert (q, K)$ iff $Π_{K} \cup {q_{r}} ⊧ P_{q} (\vec{a})$ .

We say that q is internalizable if it can be turned into an ontology axiom. This process is known as rolling-up [36] and is implemented in some solvers (e.g., Pellet [71]) to provide sound and complete CQ answering over OWL 2 DL over certain answer semantics when considering internalizable queries.

Given a knowledge base $K$ and a CQ q, we call conjunctive query answering the reasoning problem of computing all certain answers of q w.r.t. $K$ . The decision problem associated with CQ answering is called conjunctive query entailment (CQE). Given a knowledge base $K$ , a CQ q and a possible answer $\vec{a}$ , CQE is the problem of deciding whether $Π_{K} ⊧ \exists \vec{y} ψ (\vec{a}, \vec{y})$ .

2.2. PAGOdA

PAGOdA is a hybrid reasoner for sound and complete CQ answering over OWL 2 KBs, adopting a “pay-as-you-go” technique to compute the certain answers to a given query. The idea is to compute lower/upper bound approximations to the answers to a query by approximating the input ontology into a less expressive language and possibly provide a fallback (more expensive) algorithm to process the answers in the gap between the bounds; to achieve this, it uses a combination of a Datalog reasoner and a fully-fledged OWL 2 reasoner. PAGOdA treats the two systems as black boxes and tries to offload the bulk of the computation to the former and relies on the latter only when necessary.

The capabilities, performance, and scalability of PAGOdA inherently depend on the ability of the fully-fledged OWL 2 reasoner in use, and the ability to delegate the workload to a given Datalog reasoner. In the best scenario, with an OWL 2 reasoner, PAGOdA is able to answer internalisable queries [36] under certain answer semantics [85] over OWL 2 DL.

In the following is a high level description of the procedure adopted by PAGOdA to compute the answers to a query. This will prove useful to understand how this approach will be later integrated in our system, ACQuA. For a more in-depth description of the algorithm and heuristics in use, we refer the reader to [84].

Given a KB $K = ⟨ T \cup R, A ⟩$ 6

⁶
In the following we consider the input KB to be consistent and normalized. This is ensured by PAGOdA’s preprocessing step.

and a query q, PAGOdA executes the following steps in order to compute the answers to q w.r.t.

K

the Datalog reasoner is exploited to compute a lower bound $L^{q}$ and an upper bound $U^{q}$ to the answers to the query q. This is achieved by approximating the input KB $K$ into a tractable language to be handled by the Datalog reasoner. Depending on the approximation procedure, running the query over the approximated ontology will result in either a lower or an upper bound of the certain answers to the query. The lower bound $L^{q}$ is obtained as follows:

the disjunctive Datalog subset of the input ontology, denoted with $K^{D D}$ , is computed by dropping any axiom that does not correspond to a disjunctive Datalog rule;

using a variant of shifting [18], $K^{D D}$ is polynomially transformed in order to eliminate disjunction in the head. The resulting ontology $shift (K^{D D})$ is sound but not necessarily complete for CQ answering;

a first materialization is performed, i.e., $M_{1} = M [shift (K^{D D})]$ , and the resulting facts are added back to the input knowledge base to obtain $K^{'} = ⟨ T \cup R, A \cup M_{1} ⟩$ ;

the ${ELHO}_{⊥}^{r}$ [72] subset of $K^{'}$ is computed, denoted $K_{E L}^{'}$ , dropping any axiom that is not in ${ELHO}_{⊥}^{r}$ ;

the combined approach for ${ELHO}_{⊥}^{r}$ [51,73] is used to compute the answers to the query q over $K_{E L}^{'}$ .

The upper bound

U^{q}

is computed by executing the query over the ontology, modified as follows:

the ⊥ concept is substituted with a fresh concept name $⊥_{s}$ to avoid directly deriving falsehood;

disjuncts in the head of an axiom are reduced to a single disjunct. The “most favourable” disjunct is chosen according to a polynomial choice function that reasons over the dependency graph of the input ontology;

existential axioms of type (T5) are constant Skolemized.

If lower and upper bound coincide (i.e., $L^{q} = U^{q}$ ) then the Datalog reasoner was able to provide a sound and complete set of answers to the input query. The computation terminates;

otherwise, the “gap” between the upper and lower bound (i.e., $G^{q} = U^{q} ∖ L^{q}$ ) is a set of answers that need to be verified against the KB using a fully-fledged OWL 2 reasoner. The Datalog reasoner is again exploited for this step to compute a subset $K^{q}$ of the KB $K$ that is enough to check whether the answers in $G^{q}$ are certain or spurious;

for each $\vec{a} \in G^{q}$ , the fully-fledged reasoner is used to check whether $K^{q} ⊧ q (\vec{a})$ . This process is further optimized by reducing the number of answers in $G^{q}$ that need to be checked by means of summarization [16];

once all spurious answers have been removed from $G^{q}$ , $L^{q} \cup G^{q}$ is returned.

Let’s take the lower bound computation as an example: the two performed approximations (i.e., to disjunctive Datalog and to ${ELHO}_{⊥}^{r}$ ) are handled independently, by means of materialization in the first case, and the combined approach in the second; this allows PAGOdA to avoid having to deal with and-branching and the resulting intractability of most reasoning problems (see Definition 2.1). In fact, OWL 2 RL (Datalog) and ${ELHO}_{⊥}^{r}$ are used by PAGOdA to eliminate all interactions between axioms (T5) and either axioms (T4) or axioms (T3) and (R1).7

⁷

OWL 2 RL does not allow axioms (T5) and $EL$ (which contains ${ELHO}_{⊥}^{r}$ ) does not allow axioms (T4) or inverse roles (R1).

However, not all such interactions cause an exponential jump in complexity, and PAGOdA’s filtering of such cases is unnecessarily coarse. We will see in the next sections, how this procedure can be improved by introducing an alternative approximation algorithm.

PAGOdA’s reference implementation8

⁸

https://github.com/KRR-Oxford/PAGOdA

uses RDFox as a Datalog reasoner and HermiT as the underlying fully-fledged reasoner. It accepts any OWL 2 DL ontology as input, alongside a dataset in Turtle format and CQs in SPARQL [33].

PAGOdA ensures that the returned answers are always complete under ground semantics, while being ultimately limited by the capabilities of HermiT when considering the returned answers under certain answer semantics. HermiT does not natively support CQ answering and the process needs to be reduced to fact entailment first. This is possible when the input query is internalisable, i.e., the query can be rolled-up into a set of DL concept assertions. In this scenario PAGOdA returns a set of answers that is sound and complete under certain answers semantics if the bounds match or the query can be internalized into a DL concept. Otherwise, PAGOdA will return a sound set of answers (complete under ground semantics) and a bound on the incompleteness of the computed answers (under certain answers semantics).

2.3. The RSA ontology language

As we mentioned above, ACQuA combines multiple ontology approximation algorithms to compute the answers to an input query. As part of this work, we propose novel approximation algorithms that target RSA (and its extension ${RSA}^{+}$ ). In this section, we provide a brief introduction to the RSA ontology language [14], along with a description of a combined approach for CQ answering [22].

RSA (role safety acyclic) is a class of ontologies designed to subsume all OWL 2 profiles, while maintaining tractability of standard reasoning tasks. The RSA ontology language is designed to avoid interactions between axioms that can result in the ontology being satisfied only by exponentially large (and potentially infinite) models. This problem is often called and-branching and can be caused by interactions between axioms of type (T5) with either axioms (T3) and (R1), or axioms (T4), in Table 1.

Fig. 1.

Example of exponential model enumerating numbers from 0 to $2^{n} - 1$ for $n = 3$ . The KB is polynomial in n.

Example 2.1.

Interaction between existential quantifiers (axioms of type (T5)) and universal quantifiers (encoded by axioms of type (T3) and (R1)) can lead to an ontology that may only be satisfied by models of exponential size.

Consider the following knowledge base with ABox $A = {(\neg A_{0} ⊓ \dots ⊓ \neg A_{n - 1}) (a)}$ for some n, and a TBox containing the following axioms, for $0 ⩽ i < n$ : $\begin{array}{l} (1) & \neg A_{i} ⊓ ⊓_{j < i} A_{j} ⊑ B_{i} ⊓ \exists R . A_{i} ⊓ \forall R . (⊓_{j < i} \neg A_{j}) \\ (2) & \forall_{j > i} (B_{i} ⊓ A_{j} ⊑ \forall R . A_{j}) \\ (3) & \forall_{j > i} (B_{i} ⊓ \neg A_{j} ⊑ \forall R . \neg A_{j}) \end{array}$ The size of the knowledge base is polynomial w.r.t. n. It can be shown that this knowledge base enforces a chain of individual of length $2^{n}$ where each individual represents a number from 0 to $2^{n} - 1$ encoded in binary (i.e., each $A_{i}$ represents a bit in position i, where an $A_{i}$ encodes a 1 and a $\neg A_{i}$ encodes a 0). Figure 1 shows an example of the exponentially large model for $n = 3$ .

RSA is based on the Horn- $ALCHOIQ$ ontology language, restricting the interaction between axioms to ensure a polynomial bound on model size [14]. For the following section we will consider a generic Horn- $ALCHOIQ$ knowledge base $K = ⟨ T \cup R, A ⟩$ over the signature $Σ_{K} = ⟨ N_{C}, N_{R}, N_{I} ⟩$ .

Definition 2.1 ([22], Definition 1).

A role R in $K$ is unsafe if it occurs in axioms (T5), and there is a role S s.t. either of the following holds:

$R ⊑_{R}^{*} Inv (S)$ and S occurs in an axiom (T3) with left-hand side concept $\exists S . A$ where $A \neq ⊤$ ;

S is in an axiom (T4) and $R ⊑_{R}^{*} S$ or $R ⊑_{R}^{*} Inv (S)$ .

A role R in

K

is safe if it is not unsafe.

Note that, all OWL 2 profiles ( $RL$ , $EL$ and $QL$ ) as defined in Section 2.1, contain only safe roles.

Example 2.2.
In Example 2.1, R is unsafe. In fact, even for $n = 1$ , we have that $\begin{matrix} (4) & \neg A_{0} ⊑ B_{0} ⊓ \exists R . A_{0} B_{0} ⊓ A_{1} ⊑ \forall R . A_{1} \end{matrix}$ are part of the program and can be rewritten as $\begin{matrix} (5) & \neg A_{0} ⊑ B_{0} \neg A_{0} ⊑ \exists R . A_{0} B_{0} ⊓ A_{1} ⊑ X \exists R^{-} . X ⊑ A_{1} \end{matrix}$ using some standard normalization and the fact that $A ⊑ \forall R . B \equiv \exists R^{-} A ⊑ B$ .

Then, R appears in an axiom of type (T5), $S \equiv R^{-}$ , $R ⊑_{R}^{*} Inv (S)$ and S occurs in an axiom (T3) with left-hand side concept $\exists S . A$ where $A \neq ⊤$ ;

The distinction between safe and unsafe roles makes it possible to strengthen the translation π from Table 1 while preserving satisfiability and entailment of unary facts.
Definition 2.2 ([22], Definition 2).

Let $v_{R, B}^{A}$ be a fresh constant for each pair of concepts A, B and each safe role R in $K$ . The function $π_{safe}$ is defined for each axiom α in $K$ : $\begin{matrix} (6) & π_{safe} (α) = \{\begin{matrix} A (x) \to R (x, v_{R, B}^{A}) \land B (v_{R, B}^{A}) & if α is of type (T5) and R safe \\ π (α) & otherwise . \end{matrix} \end{matrix}$ Let $P = {π_{safe} (α) ∣ α \in K}$ and $P_{K} = P^{\approx, ⊤}$ .

Theorem 2.1 ([14], Theorem 2).

A Horn- $ALCHOIQ$ knowledge base $K$ is satisfiable iff $P_{K} ⊭ ⊥$ . If $K$ is satisfiable, then, $K ⊧ A (c)$ iff $A (c) \in M [P_{K}]$ for each unary predicate A and constant c in $K$ .

Note that if $K$ contains unsafe roles, the model $M [P_{K}]$ might be exponentially large or infinite.

Potential bad interactions between unsafe roles can be avoided by detecting any cyclic or diamond-shape materialization involving unsafe roles. This check is performed by constant Skolemizing all existential axioms with an unsafe role, and building a graph representing the materialization process, projected on unsafe interaction. Checking that the graph is an oriented forest, along with some additional conditions which preclude harmful interactions between equality-generating axioms and inverse roles, leads to the definition of RSA.

Definition 2.3 ([22], Definition 3).

Let $PE$ and $E$ be fresh binary predicates, let $U$ be a fresh unary predicate, and let $u_{R, B}^{A}$ be a fresh constant for each concept $A, B \in N_{C}$ and each role $R \in N_{R}$ . Then, for each axiom α in $K$ $\begin{matrix} (7) & π_{RSA} (α) = \{\begin{matrix} A (x) \to R (x, u_{R, B}^{A}) \land B (u_{R, B}^{A}) \land PE (x, u_{R, B}^{A}) & if α is of type (T5) \\ π (α) & otherwise . \end{matrix} \end{matrix}$ The program $P_{RSA}$ consists of $π_{RSA} (α)$ for each $α \in K$ , rule $U (x) \land PE (x, y) \land U (y) \to E (x, y)$ and facts $U (u_{R, B}^{A})$ for each $u_{R, B}^{A}$ , with R unsafe.

Let $M_{RSA}$ be the LHM of $P_{RSA}^{\approx, ⊤}$ . Then, $G_{K}$ is the digraph with an edge $(c, d)$ for each $E (c, d)$ in $M_{RSA}$ . Knowledge base $K$ is equality-safe if:

for each pair of atoms $w \approx t$ (with w and y distinct) and $R (t, u_{R, B}^{A})$ in $M_{RSA}$ and each role S s.t. $R ⊑ Inv (S)$ , it holds that S does not occur in an axiom (T4), and

for each pair of atoms $R (a, u_{R, B}^{A}), S (u_{R, B}^{A}, a)$ in $M_{RSA}$ with $a \in N_{I}$ , there is no role T such that both $R ⊑_{R}^{*} T$ and $S ⊑_{R}^{*} Inv (T)$ hold.

We say that $K$ is RSA if it is equality-safe and $G_{K}$ is an oriented forest.9

⁹
An oriented forest is a directed acyclic graph whose underlying undirected graph is a forest.

Definition 2.4.

With reference to Definition 2.3, let $K$ be a Horn- ${ALCHOIQ}^{+}$ knowledge base. Then, $K$ is ${RSA}^{+}$ if it is equality-safe and $G_{K}$ is an oriented forest.

The fact that $G_{K}$ is a DAG ensures that the LHM $M [P_{K}]$ is finite, whereas the lack of “diamond-shaped” subgraphs in $G_{K}$ guarantees polynomiality of $M [P_{K}]$ . The definition gives us a programmatic procedure to determine whether a Horn- $ALCHOIQ$ (resp. Horn- ${ALCHOIQ}^{+}$ ) KB is RSA (resp. ${RSA}^{+}$ ).

Theorem 2.2 ([14], Theorem 3).

If $K$ is RSA, then the size of $M [P_{K}]$ is polynomial in the size of $K$ .

Tractability of standard reasoning tasks for RSA ontologies follows from Theorem 2.1 and Theorem 2.2.

2.3.1. Combined approach for RSA

The combined approach for RSA consists of two main steps to be offloaded to a Datalog reasoner able to handle stratified negation and function symbols.

The first step computes the canonical model of an RSA ontology over an extended signature (introduced to deal with inverse roles and directionality of newly generated binary atoms). The computed canonical model is not universal and, as such, might lead to spurious answers in the evaluation of CQs.

The second step of the computation performs a filtration of the computed answers to identify only the certain answers to the input query.

Canonical model computation The computation of the canonical model for a knowledge base $K$ is performed by computing the LHM of the definite program obtained by translating $K$ according to the rules in Table 2. The translation is an enhanced version of the translation given in Table 1 where axioms of type (T5) are Skolemized if the role involved is unsafe, and constant Skolemized otherwise. Constant Skolemization of some axioms can introduce forks in the canonical model that can lead to spurious answers. In order to keep track of these forks, directionality is taken into account when Skolemizing an axiom; roles are annotated with the direction in which they are “generated” (during the materialization process), and the annotation is propagated according to axioms in the RBox. This is still not enough to detect spurious forks in the canonical model and, in particular, cycles of length one (self-loops) or two can be the source of ambiguity during materialization. In order to solve the ambiguity of the canonical model, cycles of length one and two are unfolded into cycles of length three and four, respectively.

First we define the Datalog program $E_{K}$ used to compute the canonical model for $K$ .

Table 2
Translation of Horn- $ALCHOIQ$ axioms to build $E_{K}$

Axioms in $K$ LP rules

non-(T5) axiom α $π (α)$

$R ⊑ S$ , $* \in {f, b}$ $R^{} (x, y) \to S^{} (x, y)$

R role, $* \in {f, b}$ $R^{*} (x, y) \to R (x, y)$

$R^{f} (x, y) \to Inv {(R)}^{b} (y, x)$

$R^{b} (x, y) \to Inv {(R)}^{f} (y, x)$

(T5) axiom, R unsafe $A (x) \to R^{f} (x, f_{R, B}^{A} (x)) \land B (f_{R, B}^{A} (x))$

(T5) axiom, R safe $A (x) \land notIn (x, unfold (A, R, B)) \to R^{f} (x, v_{R, B}^{A, 0}) \land B (v_{R, B}^{A, 0})$

$A (v_{R, B}^{A, i}) \to R^{f} (v_{R, B}^{A, i}, v_{R, B}^{A, i + 1}) \land B (v_{R, B}^{A, i + 1})$ , if $R \in confl (R)$ , for $i = 0, 1$

$A (x) \to R^{f} (x, v_{R, B}^{A, 1}) \land B (v_{R, B}^{A, 1})$ , for every $x \in cycle (A, R, B)$

Axioms in $K$	LP rules
non-(T5) axiom α	$π (α)$
$R ⊑ S$ , $* \in {f, b}$	$R^{} (x, y) \to S^{} (x, y)$
R role, $* \in {f, b}$	$R^{*} (x, y) \to R (x, y)$
$R^{f} (x, y) \to Inv {(R)}^{b} (y, x)$
$R^{b} (x, y) \to Inv {(R)}^{f} (y, x)$
(T5) axiom, R unsafe	$A (x) \to R^{f} (x, f_{R, B}^{A} (x)) \land B (f_{R, B}^{A} (x))$
(T5) axiom, R safe	$A (x) \land notIn (x, unfold (A, R, B)) \to R^{f} (x, v_{R, B}^{A, 0}) \land B (v_{R, B}^{A, 0})$
$A (v_{R, B}^{A, i}) \to R^{f} (v_{R, B}^{A, i}, v_{R, B}^{A, i + 1}) \land B (v_{R, B}^{A, i + 1})$ , if $R \in confl (R)$ , for $i = 0, 1$
$A (x) \to R^{f} (x, v_{R, B}^{A, 1}) \land B (v_{R, B}^{A, 1})$ , for every $x \in cycle (A, R, B)$

Definition 2.5 ([22], Definition 4).

Let $confl (R)$ be the set of roles S s.t. $R ⊑_{R}^{*} T$ and $S ⊑_{R}^{*} Inv (T)$ for some T. Let ≺ be a strict total order on triples $(A, R, B)$ , with R safe and $A, B$ concept names in $K$ . For each $(A, R, B)$ , let $v_{R, B}^{A, 0}$ , $v_{R, B}^{A, 1}$ and $v_{R, B}^{A, 2}$ be fresh constants; let $self (A, R, B)$ be the smallest set containing $v_{R, B}^{A, 0}$ and $v_{R, B}^{A, 1}$ if $R \in confl (R)$ ; and let $cycle (A, R, B)$ be the smallest set of terms containing, for each $S \in confl (R)$ ,

$v_{S, C}^{D, 0}$ if $(A, R, B) ≺ (D, S, C)$ ;

$v_{S, C}^{D, 1}$ if $(D, S, C) ≺ (A, R, B)$ ;

$f_{S, C}^{D} (v_{S, C}^{D, 0})$ and each $f_{T, E}^{F} (v_{S, C}^{D, 0})$ s.t. $u_{S, C}^{D} \approx u_{T, E}^{F}$ is in $M_{RSA}$ , if S is unsafe.

Finally,

unfold (A, R, B) = self (A, R, B) \cup cycle (A, R, B)

Let $R^{f}$ and $R^{b}$ be fresh binary predicates for each role R in $K$ , let $NI$ be a fresh unary predicate, and $notIn$ be a built-in predicate which holds when the first argument is not an element of the set given as the second element. Let $P$ be the smallest program with a rule $\to NI (a)$ for each constant a and all rules in Table 2. We define $E_{K} = P^{\approx, ⊤}$ .

The set $confl (R)$ intuitively contains the roles that are source of ambiguity in conjunction with R and hence need to be potentially unfolded if part of a loop; the arbitrary order ≺ determines the direction in which the loops are unfolded. Since the input ontology is RSA, there is no loop introduced by unsafe roles, and hence axiom of type (T5) involving unsafe roles don’t need to be constant Skolemized.

The canonical model for an RSA input ontology is defined as $M [E_{K}]$ .

Theorem 2.3 ([22],Theorem 3).

The following holds:

$M [E_{K}]$ is polynomial in $| K |$ ;

$K$ is satisfiable iff $E_{K} ⊭ \exists y . ⊥ (y)$ ;

if $K$ is satisfiable, $K ⊧ A (c)$ iff $A (c) \in M [E_{K}]$ ;

there are no terms $s, t$ and role R s.t. $E_{K} ⊧ R^{f} (s, t) \land R^{b} (s, t)$ .

Filtering spurious answers For the filtering step, a query dependent logic program $P_{q}$ is introduced to filter out all spurious answers to an input query q over the extended canonical model $M [E_{K}]$ computed in the previous section. The program identifies and discards any match that cannot be enforced by a TBox alone and hence correspond to spurious answers introduced by the canonical model. This includes the task of detecting forks and cycles in the model and answers that contain anonymous terms (i.e., functional terms and constants introduced as part of the canonical model program). Rules for the filtering program are provided in Table 3.

Table 3
Rules in $P_{q}$ . Variables u, v, w from U are distinct

(1) $ψ (\vec{x}, \vec{y}) \to QM (\vec{x}, \vec{y})$

(2) $\to named (a)$ for each constant a in $O$

(3a) $QM (\vec{x}, \vec{y}), not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, i)$ for each $1 ⩽ i ⩽ | \vec{y} |$

(3b) $id (\vec{x}, \vec{y}, u, v) \to id (\vec{x}, \vec{y}, v, u)$

(3c) $id (\vec{x}, \vec{y}, u, v), id (\vec{x}, \vec{y}, v, w) \to id (\vec{x}, \vec{y}, u, w)$

(4a) for all $R (s, y_{i})$ , $S (t, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$

$R^{f} (s, y_{i}) \land S^{f} (t, y_{j}) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$

(4b) for all $R (s, y_{i})$ , $S (y_{j}, t)$ in q with $y_{i}, y_{j} \in \vec{y}$

$R^{f} (s, y_{i}) \land S^{b} (y_{j}, t) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$

(4c) for all $R (y_{i}, s)$ , $S (y_{j}, t)$ in q with $y_{i}, y_{j} \in \vec{y}$

$R^{b} (y_{i}, s) \land S^{b} (y_{j}, t) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$

for all $R (y_{i}, y_{j})$ , $S (y_{k}, y_{l})$ in q with $y_{i}, y_{j}, y_{k}, y_{l} \in \vec{y}$

(5a) $R^{f} (y_{i}, y_{j}) \land S^{f} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, j, l) \land y_{i} \approx y_{k} \land not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, k)$

(5b) $R^{f} (y_{i}, y_{j}) \land S^{b} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, j, k) \land y_{i} \approx y_{l} \land not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, l)$

(5c) $R^{b} (y_{i}, y_{j}) \land S^{b} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, i, k) \land y_{j} \approx y_{l} \land not NI (y_{j}) \to id (\vec{x}, \vec{y}, j, l)$

(6) for each $R (y_{i}, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$ and $* \in {f, b}$

$R^{} (y_{i}, y_{j}) \land id (\vec{x}, \vec{y}, i, v) \land id (\vec{x}, \vec{y}, j, w) \to {AQ}^{} (\vec{x}, \vec{y}, v, w)$

for each $* \in {f, b}$

(7a) ${AQ}^{} (\vec{x}, \vec{y}, u, v) \to {TQ}^{} (\vec{x}, \vec{y}, u, v)$

(7b) ${AQ}^{} (\vec{x}, \vec{y}, u, v) \land {TQ}^{} (\vec{x}, \vec{y}, v, w) \to {TQ}^{} (\vec{x}, \vec{y}, u, w)$

(8a) $QM (\vec{x}, \vec{y}) \land not named (x) \to sp (\vec{x}, \vec{y})$ for each $x \in \vec{x}$

(8b) $fk (\vec{x}, \vec{y}) \to sp (\vec{x}, \vec{y})$

(8c) ${TQ}^{} (\vec{x}, \vec{y}, v, v) \to sp (\vec{x}, \vec{y})$ for each $* \in {f, b}$

(9) $QM (\vec{x}, \vec{y}) \land not sp (\vec{x}, \vec{y}) \to Ans (\vec{x})$

Filtering program $P_{q}$ and its extension $P_{q, K}$ with $E_{K}$ from Def. 2.5 are defined as follows.

Definition 2.6 ([22], Definition 5).

Let $q = \exists \vec{y} . ψ (\vec{x}, \vec{y})$ be a CQ, let QM, sp, and fk be fresh predicates of arity $| \vec{x} | + | \vec{y} |$ , let id, ${AQ}^{*}$ , ${TQ}^{*}$ with $* \in {f, b}$ be fresh predicates of arity $| \vec{x} | + | \vec{y} | + 2$ , let Ans be a fresh predicate of arity $| \vec{x} |$ , let named be a fresh unary predicate, and let U be a set of fresh variables s.t. $| U | ⩾ | \vec{y} |$ . Then, $P_{q}$ is the smallest program with all rules in Table 3, and $P_{q, K}$ is defined as $E_{K} \cup P_{q}$ .

Theorem 2.4 ([22], Theorem 4).

Let $P_{q}$ be the filtering program for q, and $P_{q, K} = E_{K} \cup P_{q}$ . It holds that [ 22 ]:

$P_{q, K}$ is stratified;

$M [P_{q, K}]$ is polynomial in $| K |$ and exponential in $| q |$ ;

if $K$ is satisfiable, $\vec{x} \in cert (q, K)$ iff $P_{q, K} ⊧ Ans (\vec{x})$ .

We can then build a worst-case exponential algorithm that, given an ontology $K$ and a CQ q, it materializes $P_{q, K}$ and returns all instances of predicate Ans. We obtain a “guess and check” algorithm that leads to an NP–completeness result for BCQs [22]. The algorithm first materializes $E_{K}$ in polynomial time and then guesses a match σ to q over the materialization; finally it computes $(P_{q, K}) σ$ .

Theorem 2.5 ([22], Theorem 5).

Checking whether $K ⊧ q (\vec{x}, \vec{y})$ with $K$ an RSA ontology and $q (\vec{x}, \vec{y})$ a BCQ is NP –complete in combined complexity. 10

¹⁰
It is worth noting that, to the best of our knowledge, at the moment a similar bound for ${RSA}^{+}$ is not know.

3. Overview

We propose a hybrid query answering architecture which provides CQ answering capabilities for OWL 2 by means of combining different answering services treated as black-boxes. In particular, we combine scalable CQ answering services targeting tractable ontology languages with answering services for more expressive languages approaching the full OWL 2.

Given an input CQ over a certain knowledge base, we process the query using the tractable services; if the query can be fully answered by one of these tractable services, we simply provide the resulting answers to the user. Otherwise, we compute multiple lower and upper bounds to the answers to the query by approximating the knowledge base “from above” and “from below” and taking the union of the lower bounds and the intersection of the upper bounds. Finally, if the bounds do not coincide, the “gap” answers are validated by using the “full” service.

As part of this work, we introduce a novel algorithm to compute a lower bound to the answers to an input query by means of approximation to RSA. Similarly, we propose an algorithm to compute an approximation “from above” targetting ${RSA}^{+}$ .

The reference implementation ACQuA is built on top of the following tools:

RSAComb, a novel system for CQ answering over RSA ontologies, based on the combined approach, extended with algorithms to computes bounds of the answers to a query via approximation of the input KB to RSA;

PAGOdA, providing lower and upper bounds to the answers to a query and techniques to further refine these bounds to provide CQ answering capability over OWL 2 DL;

a fully-fledged reasoner (such as HermiT) for CQ answering over a certain ontology language.

These tools allowed us to build a fine-grained “pay-as-you-go” approach, offering suitable, performant solutions depending on the inputs to the system; overall, this results in a lower complexity of the answer computation, when support for high expressivity is not needed. Note that we included both RSAComb and PAGOdA in ACQuA because, in general, the bounds computed by RSAComb are incomparable with the ones produced by PAGOdA, as we will show in Sections 4 and 5. However, any of these components could be potentially substituted or augmented with more capable ones; in particular, any relevant service mentioned above could be used in ACQuA (e.g., the fully-fledged reasoner HermiT could be substituted with Konclude).

Fig. 2.

Workflow of the ACQuA system.

Given a generic KB $K = ⟨ T \cup R, A ⟩$ and a CQ $q (\vec{x}) = \exists \vec{y} φ (\vec{x}, \vec{y})$ containing only symbols in $K$ , the combination of RSAComb, PAGOdA, and HermiT performs the following steps to compute the full set of answers to $q (\vec{x})$ over $K$ (see Fig. 2).

A preliminary satisfiability check is performed over the input knowledge base $K$ . The procedure terminates if $K$ is unsatisfiable.

If $K$ is either $RL$ or ${ELHO}_{⊥}^{r}$ return the answers provided by the lower bound algorithm in PAGOdA.11

¹¹

In this case, $K$ falls in one of the profiles for which the lower bound computation in PAGOdA is sound and complete for CQ answering.

Otherwise, proceed to step 3.

If $K$ is RSA, return the full set of answers computed by RSAComb.12

¹²

In this case, RSAComb provides a sound and complete algorithm for CQ answering over $K$ .

Otherwise, proceed to step 4.

Compute the bounds to the answers to q as $L^{q} = L_{P}^{q} \cup L_{R}^{q}$ and $U^{q} = U_{P}^{q} \cap U_{R}^{q}$ , with $⟨ L_{P}^{q}, U_{P}^{q} ⟩$ and $⟨ L_{R}^{q}, U_{R}^{q} ⟩$ the lower and upper bounds computed by PAGOdA and RSAComb, respectively.

If $G^{q} = U^{q} ∖ L^{q} = 0$ , return $L^{q}$ . Otherwise, proceed to step 6.

Compute $K^{q}$ , a subset of $K$ , relevant for the answering of $q (\vec{x})$ .

Use HermiT on $K^{q}$ , to check the entailment of the answers in $G^{q}$ and remove any remaining spurious answer.

Return $L^{q} \cup G^{q}$ .

The choice of fully-fledged reasoner ultimately determines the class of ontologies for which CQ answering is sound and complete under ground and/or certain answer semantics for the overall system. Thanks to RSAComb, ACQuA is sound and complete for CQ answering under certain answer semantics for ontologies in RSA [22]. With the integration of PAGOdA, and a suitable fully-fledged reasoner, like HermiT, ACQuA is able to answer internalisable queries [36] over OWL 2 DL under certain answer semantics [85].

Steps 2,6,7 and the computation of $L_{P}^{q}, U_{P}^{q}$ in step 4 are offloaded to PAGOdA; we refer the reader to [85] for more details. We will instead focus our attention on the underlying RSAComb reasoner; in particular we dedicate Sections 4–5 to the description of the novel algorithms used in step 4 to compute $L_{R}^{q}, U_{R}^{q}$ via approximation to RSA. In Section 6 we provide more details on the design and architecture of ACQuA and RSAComb (both as a standalone system and its integration in ACQuA).

To help the reader follow along with the description of the proposed techniques, we consider the following running example.

Example 3.1.

Consider the KB $K_{e x} = ⟨ T_{e x} \cup R_{e x}, A_{e x} ⟩$ and the CQs $Q_{e x}$ , with ABox $A_{e x}$ containing assertions (a1)–(a10), TBox $T_{e x}$ containing axioms (t1)–(t9), RBox $R_{e x}$ containing axioms (r1)–(r2), and $Q_{e x}$ containing queries (q1)–(q2) in Table 4.

Table 4

Running example $K_{e x}$

(r1)	$published \equiv {publishedBy}^{-}$	(a1)	$PhDStudent (bart)$
(r2)	$accepts ⊑ reviews$	(a2)	$Researcher (lisa)$
		(a3)	$Journal (journal 1)$
(t1)	$PhDStudent ⊑ Student ⊓ Researcher$	(a4)	$Journal (journal 2)$
(t2)	$JournalPaper ⊓ Thesis ⊑ ⊥$	(a5)	$Journal (journal 3)$
(t3)	$Report ⊑ Paper ⊔ Thesis$	(a6)	$writes (bart, work 1)$
(t4)	$Journal ⊑ \exists published . Paper$	(a7)	$writes (lisa, work 1)$
(t5)	$Researcher ⊑ \exists writes . Paper$	(a8)	$published (journal 1, work 1)$
(t6)	$Paper ⊑ \exists presentedAt . Conference$	(a9)	$JournalPaper (work 1)$
(t7)	$Paper ⊑ ⩽ 1 presentedAt . Conference$	(a10)	$Report (work 1)$
(t8)	$Conference ⊑ \exists accepts . Paper$
(t9)	$\exists {reviews}^{-} . Conference ⊑ ConferencePaper$

(q1)	$q_{1} (x_{2}) = publishedBy (x_{1}, x_{2})$
(q2)	$q_{2} (x_{1}, x_{2}) = published (x_{1}, x_{3}) \land published (x_{2}, x_{3}) \land x_{1} \neq x_{2}$

Intuitively, the ABox contains a collection of statements about researchers and their research outputs; on top of that, the ontology (RBox and TBox) models additional information about the relationships between different types of papers and their publications processes. The CQs ask about venues and works published in multiple venues. For reader’s convenience, a visual representation of the ABox $A_{e x}$ is shown in Fig. 3.

Fig. 3.

Graphical representation of the ABox $A_{e x}$ .

Some axioms are not expressed in normal form (see Table 1) and can be further normalized as follows: axiom (t1) can be rewritten as $\begin{matrix} (t1bis) & PhDStudent ⊑ Student PhDStudent ⊑ Researcher \end{matrix}$ while axiom (r1) becomes $\begin{matrix} (r1bis) & published ⊑ {publishedBy}^{-} publishedBy ⊑ {published}^{-} \end{matrix}$ Note that $K_{e x}$ is not in Horn- $ALCHOIQ$ because of axiom (t3), and hence it is neither RSA nor ${RSA}^{+}$ .

4. Lower bound computation

In this section we present a technique to compute a lower bound to the answers to an input query, by means of approximating the input KB to RSA.

RSA is not purely syntactically defined, and instead introduces a set of constraints over the ontology language Horn- $ALCHOIQ$ ; as such, the naïve approximation that consists in discarding any axiom type which is not in the target approximation language does not work. Instead, we split our approximation algorithm in three sub-steps, each building on top of the previous one:

From a generic $SROIQ$ ontology to $ALCHOIQ$ by discarding any axiom that is not in the target language;

From $ALCHOIQ$ to Horn- $ALCHOIQ$ by means of program shifting;

From Horn- $ALCHOIQ$ to RSA by modifying the input KB in order to enforce the constraints imposed by the RSA definition.

4.1. Approximation to $ALCHOIQ$

This first step is performed by discarding any axiom that is not in $ALCHOIQ$ , namely axioms of type (R3)–(R4) and (T6)–(T7) in Table 1.

Let $K^{'}$ be the $ALCHOIQ$ restriction of a KB $K$ . By the monotonicity of FOL all certain answers w.r.t. $K^{'}$ are also certain answers w.r.t. $K$ ; in other words, discarding axioms in a KB will lead to having more models, which, in turn, leads to fewer answers, under certain answer semantics. Moreover, if $K^{'}$ is unsatisfiable, so is $K$ .

In Example 3.1, $K_{e x}$ is in $ALCHOIQ$ , so no axioms are discarded.

4.2. Approximation to Horn- $ALCHOIQ$

We will now describe how to reduce an $ALCHOIQ$ ontology to Horn- $ALCHOIQ$ . This involves the approximation of axioms of type (T1), (T4) by eliminating the disjunction in the head of the axioms.13

¹³
While axioms of type (T4) do not use disjunction explicitly, their translation into definite rules involve disjunction in the head of the rule.

Simply discarding them is not desirable, especially when considering that disjunctive axioms are quite common in practice.

To address this and improve the approximation to Horn- $ALCHOIQ$ we rely on a technique known as program shifting [18] to convert disjunctive Datalog rules into Datalog. Program shifting is a polynomial compilation of disjunctive logic rules into Datalog rules that preserve soundness of CQ answering, and acts on the translation $π (\cdot)$ of the axioms into definite rules.

Example 4.1.

In Example 3.1, we know that assertions $Report (work 1)$ and $JournalPaper (work 1)$ hold (because of assertions (a10), (a9)). Moreover, by (t2), we know that $Thesis (work 1)$ does not hold. Using this information, along with (t3), we can derive $Paper (work 1)$ . This derivation is deterministic and can be captured by Datalog rules. To make this reasoning explicit, we introduce a fresh atom $\overline{Thesis}$ that intuitively represents the complement of $Thesis$ , and add the following axioms to $K_{e x}$ : $\begin{matrix} (8) & JournalPaper ⊑ \overline{Thesis} Report ⊓ \overline{Thesis} ⊑ Paper \end{matrix}$ These rules can be used to derive $Paper (work 1)$ .

Program shifting is formally defined as follows.

Definition 4.1 ([85], Def. 4.3).

Let r be a normalized disjunctive Datalog rule. For each predicate P in r, let $\overline{P}$ be a fresh predicate of the same arity. The shifting of r, denoted $shift (r)$ , is the following set of rules:

if r is of the form $\begin{matrix} (9) & β_{1} \land \dots \land β_{n} \to ⊥ \end{matrix}$ then $\begin{matrix} (10) & shift (r) = {r} \cup {β_{1} \land \dots \land β_{i - 1} \land β_{i + 1} \land \dots \land β_{n} \to {\bar{β}}_{i} ∣ 1 ⩽ i ⩽ n} \end{matrix}$

if r is of the form $\begin{matrix} (11) & β_{1} \land \dots \land β_{n} \to γ_{1} \lor \dots \lor γ_{m} \end{matrix}$ then $shift (r)$ consists of the following rules: $\begin{array}{l} (12) & β_{1} \land \dots \land β_{n} \land {\bar{γ}}_{1} \land \dots \land {\bar{γ}}_{m} \to ⊥ \\ (13) & β_{1} \land \dots \land β_{i} \land {\bar{γ}}_{1} \land \dots \land {\bar{γ}}_{j - 1} \land {\bar{γ}}_{j + 1} \land \dots {\bar{γ}}_{m} \to γ_{j} for 1 ⩽ j ⩽ m \\ (14) & β_{1} \land \dots \land β_{i - 1} \land β_{i + 1} \land \dots \land β_{n} \land {\bar{γ}}_{1} \land \dots {\bar{γ}}_{m} \to {\bar{β}}_{i} for 1 ⩽ i ⩽ n \end{array}$

This can be generalized to sets of rules Σ as follows:

\begin{matrix} (15) & shift (Σ) = ⋃_{r \in Σ} shift (r) \end{matrix}

We apply this technique to our $ALCHOIQ$ KB in order to reduce ourselves to a Horn KB. This procedure guarantees to produce a polynomial approximation of the input KB which is sound (but not necessarily complete) w.r.t. CQ answering. For r a disjunctive Datalog rule with n atoms in the body and m atoms in the head, $shift (r)$ contains $n + m + 1$ rules.

Theorem 4.1.
Let $K^{'} = ⟨ O^{'}, A ⟩$ be the $ALCHOIQ$ restriction of the KB $K = ⟨ O, A ⟩$ . Moreover, let $K^{″} = ⟨ shift (O^{'}), A ⟩$ . Then $cert (q, K^{″}) \subseteq cert (q, K^{'})$ .
Proof.
See the Appendix. □

4.3. Approximation to RSA

In this section we provide a description of an algorithm to approximate the Horn- $ALCHOIQ$ KB $K$ obtained in the previous step into an RSA KB $K^{'}$ such that $cert (q, K^{'}) \subseteq cert (q, K)$ for any KB q. Given a Horn- $ALCHOIQ$ KB $K$ , checking if $K$ is RSA consists of the following steps (see Def. 2.3):

checking whether $G_{K}$ is an oriented forest;

checking whether $K$ is equality safe.

We first consider step 1. If $G_{K}$ is not an oriented forest, then its underlying undirected graph has a cycle. In order to make $G_{K}$ an oriented forest we want to detect these cycles, break them and propagate the changes back to $K$ .

Cycles can be broken by removing nodes from $G_{K}$ . Nodes in $G_{K}$ are of the form $u_{R, B}^{A}$ , paired with a corresponding existential axiom $A ⊑ \exists R . B \in K$ of type (T5). The action of deleting a node from the graph can be propagated back to $K$ by removing the corresponding (T5) axiom. Due to monotonicity of FOL, deleting axioms from $K$ produces a lower bound approximation of $K$ w.r.t. CQ answering.

Lemma 4.1.
Let $K = ⟨ O, A ⟩$ be a Horn- $ALCHOIQ$ KB, $G_{K}$ be its dependency graph as defined in Def. 2.3 and $u_{R, B}^{A}$ a node in $G_{K}$ . The dependency graph $G_{K^{'}}$ corresponding to $K^{'} = ⟨ O ∖ {A ⊑ \exists R . B}, A ⟩$ does not contain $u_{R, B}^{A}$ .
Proof.
This is proven by observing that, by definition of dependency graph, constant $u_{R, B}^{A}$ can solely be introduced by the corresponding axiom $A ⊑ \exists R . B$ , and hence, removing the axiom from $O$ will remove the node from $G_{K^{'}}$ . □

Using the Datalog reasoner, we compute $M_{RSA}$ from the program $P_{RSA}^{\approx, ⊤}$ obtained from $K$ , and retrieve all instances of role E to build $G_{K}$ . Finally, we use the modified DFS visit of the graph shown in Algorithm 1 to detect any cycle in $G_{K}$ . During the visit, the algorithm determines a representative node for each cycle, which will be the node selected to be removed. In order to keep the visit as efficient as possible we determine these nodes eagerly, by selecting the last processed node when a cycle is detected. Let D be this set of nodes, then for every $u_{R, B}^{A} \in D$ we remove the corresponding axiom $A ⊑ \exists R . B$ in $K$ . Note that D is, in general, not unique and different such sets might lead to different lower bounds.

Algorithm 1:
Cycle detection in $G_{K}$

Next, we need to deal with equality safety (step 2). According to the definition of RSA, the following steps can be performed to ensure this property:

delete any (T4) axiom that involves a role S such that there exists $w \approx t$ (with w and t distinct) and $R (t, u_{R, B}^{A})$ in $M_{RSA}$ and $R ⊑ Inv (S)$ ;

if there is a pair of atoms $R (a, u_{R, B}^{A}), S (u_{R, B}^{A}, a)$ in $M_{RSA}$ with $a \in N_{I}$ and a role T such that both $R ⊑_{R}^{} T$ and $S ⊑_{R}^{} Inv (T)$ hold, then remove some axiom of type (R2) to break the derivation chain that deduces either $R ⊑_{R}^{} T$ or $S ⊑_{R}^{} Inv (T)$ .

Again, by removing some selected axioms we are able to force the input Horn- $ALCHOIQ$ ontology to satisfy RSA additional constraints. In the following, we summarize steps 1–2 described above with the function $lower (\cdot)$ from KBs to KBs.
Theorem 4.2.
Let $K$ be a $SROIQ$ KB, and $K^{'}$ its syntactic restriction to $ALCHOIQ$ . Then, we have that $cert (q, lower (shift (K^{'}))) \subseteq cert (q, K)$ .
Proof.
By Section 4.1 and Theorem 4.1 we know that $\begin{matrix} (16) & cert (q, shift (K^{'})) \subseteq cert (q, K^{'}) \subseteq cert (q, K) \end{matrix}$ Moreover, we can observe that $lower (\cdot)$ only removes axioms from the input ontology; by monotonicity of FOL we have that $cert (q, lower (shift (K^{'}))) \subseteq cert (q, shift (K^{'})) \subseteq cert (q, K)$ . □

As mentioned in Section 2, PAGOdA uses a similar approach to compute a lower bound by approximating the input ontology first to disjunctive Datalog and then to Datalog; this is done by discarding any axiom that is not in the language, while introducing some additional heuristics to handle specifically disjunctive and existential axioms.

Note that, in general, the lower bound resulting from the algorithm proposed here is incomparable with the one produced by PAGOdA.

The next example shows a scenario in which the lower bound computed by our algorithm is tighter than the one produced by PAGOdA. On the other hand, the RSA language fully captures OWL 2 RL (used internally by PAGOdA) only when not considering property chain axioms.

Both approximation techniques will be later combined into ACQuA.
Example 4.2.
Consider our running Example 3.1 and $K_{e x}^{'} = shift (K_{e x})$ . Then
$presentedAt$ is unsafe because of axioms (t6), (t7);

$accepts$ is unsafe because of axioms (t8), (t9) and (r2);
whereas all other roles are safe.

Now, let $u_{i}$ , for $1 ⩽ i ⩽ 4$ be unique, fresh constants, and $P_{RSA}^{e x}$ be the logic program (according to Def. 2.3), corresponding to $K_{e x}^{'}$ . In particular $\begin{array}{l} (17) & Journal (x) \to published (x, u_{1}) \land PE (x, u_{1}) \land Paper (u_{1}) \\ (18) & Researcher (x) \to writes (x, u_{2}) \land PE (x, u_{2}) \land Paper (u_{2}) \\ (19) & Paper (x) \to presentedAt (x, u_{3}) \land PE (x, u_{3}) \land Conference (u_{3}) \land U (u_{3}) \\ (20) & Conference (x) \to accepted (x, u_{4}) \land PE (x, u_{4}) \land Paper (u_{4}) \land U (u_{4}) \end{array}$ is the translation (according to Def. 2.3) of (t4), (t5), (t6), and (t8). Finally, $M_{RSA}^{e x}$ is the LHM of $P_{RSA}^{e x}$ .

The dependency graph $G_{K_{e x}^{'}}$ is shown in Fig. 4. $G_{K_{e x}^{'}}$ is not an oriented forest and as such we need to detect a set of nodes whose removal will turn the graph in Fig. 4 into a tree. Let’s assume Algorithm 1 returns the set ${u_{4}}$ to be removed from $G_{K_{e x}^{'}}$ . We propagate this change to $K_{e x}^{'}$ by removing axiom (t8). $K_{e x}^{'}$ was already equality safe. We denote the KB resulting from this process with $K_{e x}^{″} = lower (K_{e x}^{'})$ .
Fig. 4.
Graphical representation of $G_{K_{e x}}$ .

If we consider the query (q1), then: $\begin{matrix} (21) & cert (q_{1}, K_{e x}^{″}) = {journal 1, journal 2, journal 3} . \end{matrix}$ It can be verified that the lower bound computed by PAGOdA is not as tight and results in the set of answers containing only $journal 1$ .
5. Upper bound computation

We will now look at the problem of approximating a generic input KB $K$ to a KB $K^{'}$ from above, such that answering an input query over the approximated KB will return an upper bound to the answers. More formally, given an input KB $K$ , we want to find a KB $K^{'}$ s.t. $cert (q, K) \subseteq cert (q, K^{'})$ for any CQ q. We initially consider ${ALCHOIQ}^{+}$ as the source ontology language, not taking property chain axioms (T4) into account, and approximate the ontology to ${RSA}^{+}$ . Some additional comments on how to handle axioms (T4) will also be provided.

We adopt a similar approach to the one used in the lower bound computation and divide the procedure in steps. Given an ${ALCHOIQ}^{+}$ KB, we proceed as follows

replace any occurrence of ⊥ in the knowledge base with a fresh nullary predicate $⊥_{f}$ with no special meaning;

approximate disjunctive rules by removing all but one disjunct from the head of the rule. For each rule, the selected disjunct is chosen deterministically using an efficient choice function;

enforce the constraints that define the RSA ontology language on the Horn- ${ALCHOIQ}^{+}$ KB obtained in the previous step.

5.1. ⊥ substitution

As described above, ACQuA performs a preliminary satisfiability check on the input KB; in spite of this, while strengthening the KB, we might cause the KB to become unsatisfiable.

In order to provide a meaningful upper bound even in cases where the approximation leads to an unsatisfiable KB, we adopt an approach initially proposed in PAGOdA. The idea is to substitute every occurrence of ⊥ with a fresh nullary predicate $⊥_{f}$ with no predefined meaning; by doing so we avoid the derivation of the entire Herbrand base, ignoring the fact that the final KB approximation might be unsatisfiable. Note that, despite the fact that ⊥ is stripped of its built-in semantics in FOL, weakening the KB, it can be shown (see [85, Lemma 5.4,Theorem 5.5]) that we can still compute a meaningful upper bound for any input query.

This step has been included purely for theoretical purposes. RDFox, used in the implementation of the approximation algorithm, will not explicitly check for satisfiability during query answering, making it possible to consider correct the answers to a query even when the KB is unsatisfiable.

5.2. Approximation of disjunctive rules

According to Table 1, axioms of type (T1) and (T4) can introduce disjunction in the head of rules. This usually results in non-determinism in the answering process and a corresponding jump in computational complexity. In order to rewrite these axioms and avoid the introduction of this operator, we borrow a technique used in a similar fashion in PAGOdA. The approach consists in replacing any disjunction in the head of a rule with one of the disjuncts. It is easy to see that this strengthens the KB and eliminates any non-determinism introduced by the disjunction. The surviving disjunct is chosen deterministically using an efficient choice function; the idea is to analyse the dependency graph of the KB and choose a disjunct which does not eventually lead to a contradiction. To this end, a standard dependency graph of the KB is built and disjuncts are ordered according to their distance from $⊥_{f}$ ; The ordering is used to select those disjuncts that are less likely to lead to a contradiction, by picking (one of) the furthest from $⊥_{f}$ . For more details on the definition of a choice function, see [85, Section 8.2].

Given a choice function ch that returns a concept name out of an input set, we define the process of eliminating disjunction from the head of a rule as follows

Definition 5.1.
Let δ be a function from axioms to axioms eliminating disjunction from the head of any axiom of type (T1) and (T4): $\begin{matrix} (22) & δ (α) = \{\begin{matrix} ⊓_{i = 1}^{n} A_{i} ⊑ ch ({B_{j} ∣ 1 ⩽ j ⩽ m}) & if α \equiv ⊓_{i = 1}^{n} A_{i} ⊑ ⨆_{j = 1}^{m} B_{j} \\ A ⊑ ⩽ 1 R . B & if α \equiv A ⊑ ⩽ m R . B \\ α & otherwise \end{matrix} \end{matrix}$ The definition of δ can be trivially extended to sets of axioms and KBs.
Example 5.1.
Consider axioms (t3) from our running example. Let $ch$ be a choice function, such that $\begin{array}{l} (23) & ch ({Paper, Thesis}) = Paper \end{array}$ Then $δ (t3)$ is $\begin{matrix} (t3’) & Report ⊑ Paper \end{matrix}$

5.3. From Horn- ${ALCHOIQ}^{+}$ to ${RSA}^{+}$

The final step of the approximation process consists in enforcing the additional constraints that the RSA language introduces on top of Horn- $ALCHOIQ$ . We apply these constraints on top of the KB obtained in the previous step, which is Horn- ${ALCHOIQ}^{+}$ , obtaining an ${RSA}^{+}$ KB. We will later prove that the algorithm for the combined approach for RSA applied to an ${RSA}^{+}$ ontology is complete w.r.t. CQ answering.

Given $K$ a Horn- ${ALCHOIQ}^{+}$ KB and $G_{K}$ its dependency graph as defined in Def. 2.3, checking if $K$ is ${RSA}^{+}$ consists of:

checking whether $G_{K}$ is an oriented forest;

checking whether $K$ is equality safe.

In order to ensure equality safety we proceed similarly to the lower bound case. For any pair of atoms $w \approx t, R (t, u_{R, B}^{A}) \in M_{R S A}$ and role S s.t. $R ⊑ Inv (S)$ , if S occurs in an axiom $α \equiv C ⊑ ⩽ 1 S . D$ of type (T4), we convert α into the axiom $C ⊓ \exists S . D ⊑ ⊥$ . It is easy to see that, for any $C, D \in N_{C}$ and S role, ${C ⊓ \exists S . D ⊑ ⊥} ⊧ C ⊑ ⩽ 1 S . D$ and hence the rewriting is a strengthening of the KB.

On the other hand, for each pair of atoms $R (a, u_{R, B}^{A}), S (u_{R, B}^{A}, a) \in M_{R S A}$ , with $a \in N_{I}$ and role T such that $R ⊑_{R}^{*} T$ and $S ⊑_{R}^{*} Inv (T)$ , we know that term $u_{R, B}^{A}$ was introduced by an axiom $A ⊑ \exists R . B$ of type (T5). In order to satisfy the constraint, we mark this axiom for constant Skolemization, meaning that when translated into a logic rule this axiom will be translated into $A (x) \to R (x, c) \land B (c)$ for some unique fresh constant c.14

¹⁴
This is equivalent to rewriting the axiom as $A ⊑ \exists R . {c}, {c} ⊑ B$ .

Moreover, we assume to have a boolean function

marked (α)

over axioms that returns true if α is a marked axiom.

Finally, we reduce $G_{K}$ to an oriented forest. We proceed similarly to the lower bound computation described in Section 4.3. In fact, we can reuse Algorithm 1 to gather a possible set of nodes D, whose removal would render the dependency graph an oriented forest. As explained before, each node $u_{R, B}^{A}$ uniquely identifies an axiom $A ⊑ \exists R . B$ of type (T5) in the input KB. In order to break the cycles while strengthening the KB we mark the axioms in D for constant Skolemization.

These steps can be summarized in the definition of $δ^{'}$ : Definition 5.2.

We define $δ^{'}$ as a function from axioms to sets of axioms. $\begin{matrix} (24) & δ^{'} (α) = \{\begin{matrix} {C ⊓ \exists S . D ⊑ ⊥_{f}} & if α \equiv C ⊑ ⩽ 1 S . D and \exists R unsafe s.t. R ⊑ Inv (S) and \\ w \approx t, R (t, u_{R, B}^{A}) \in M_{R S A} with w, t distinct \\ {A ⊑ \exists R . {b_{R, B}^{A}}, {b_{R, B}^{A}} ⊑ B} & if α \equiv A ⊑ \exists R . B and marked (α) \\ {α} & otherwise \end{matrix} \end{matrix}$ where $b_{R, B}^{A}$ is a fresh constant, unique to axiom $A ⊑ \exists R . B$ .

Finally, given $K = ⟨ O, A ⟩$ , we define $upper (K) = ⟨ ⋃_{α \in O} δ^{'} (α), A ⟩$ .

Theorem 5.1.

Let $K$ be a satisfiable ${ALCHOIQ}^{+}$ KB and $K^{'} = upper (δ (K))$ . Moreover, let $q (\vec{x}) = \exists \vec{y} φ (\vec{x}, \vec{y})$ be a CQ. Then,

$K^{'}$ is ${RSA}^{+}$ ,

$cert (q, K) \subseteq cert (q, K^{'})$ ,

if $\vec{x} \in cert (q, K)$ then $P_{K^{'}, q} ⊧ Ans (\vec{x})$ .

Proof.

See the Appendix. □

Example 5.2.

Consider, again, our running example (Example 3.1) and $K_{e x}^{'} = δ (K_{e x})$ . Let $P_{RSA}^{e x}$ be its translation into logic rules (according to Def. 2.3) and $M_{RSA}^{e x}$ its LHM, as in Example 4.2. We know that the dependency graph $G_{K_{e x}}$ (shown in Fig. 4) is not an oriented forest. Let’s assume, again, that Algorithm 1 returns ${u_{4}}$ . We mark axiom (t8) for constant Skolemization by $δ^{'}$ , instead of the standard Skolemization that would be applied by Def. 2.5 ( $accepts$ is unsafe). We denote the KB resulting from this process with $K_{e x}^{″} = upper (K_{e x}^{'})$ .

If we consider the query (q2), then: $\begin{matrix} (25) & cert (q_{2}, K_{e x}^{″}) = \emptyset . \end{matrix}$ It can be verified that the upper bound computed by PAGOdA is not as tight and results in the following set of answers $\begin{matrix} (26) & {⟨ journal 2, journal 3 ⟩, ⟨ journal 3, journal 2 ⟩} \end{matrix}$

5.4. Property chain axioms

Our tests show that, general property chain axioms (axioms of type (R4) in Table 1) are quite uncommon in practice. Transitive property axioms, on the other hand, are a specialization of (R4) that can be easily found in common ontologies. While we ignored the presence of these axioms so far, it can be shown that completeness is still guaranteed when including them in the language [14, Theorem 2,Proposition 1]. Intuitively, due to monotonicity of FOL, including more axioms in the computation of the canonical model will lead to a strengthening of the KB. Furthermore, the computational complexity for the computation of the canonical model is still bound by the translation of the problem into Datalog, for which new heuristics have being recently proposed to efficiently handle transitive closure of roles [38]. Note that, in this case, we are not modifying the filtration step, which will then only be able to detect a fraction of the spurious answers, effectively computing an upper bound of the certain answers.

6. Design and architecture

We proposed a new framework to compute CQ answering over unrestricted OWL 2 ontologies by using answer bounds and further refinement steps. The approach has been implemented in a system called ACQuA [42], which, as discussed in the previous sections, offloads different steps in the computation to a selection of underlying systems used as black boxes, i.e., RSAComb, PAGOdA and HermiT.

ACQuA is inspired by the “pay-as-you-go” philosophy that drove the development of PAGOdA and as such shares similarities and capabilities with the latter tool. The idea is to take different steps depending on how the input KB is classified. The input KB needs to go through a consistency check and normalization procedure first. If the normalized KB is inside one of the two ontology languages for which PAGOdA provides full support (i.e., OWL 2 RL and ${ELHO}_{⊥}^{r}$ ), we use the PAGOdA lower bound algorithm to compute the answers to the query. This check is purely syntactic over the normalized ontology and can be performed by leveraging the OWLAPI [35] interface for OWL 2 profile checking. If the first check fails (i.e., the ontology is not in any of the aforementioned ontology languages), we check whether the ontology is in RSA using RSAComb. If the input ontology is RSA we use the RSAComb algorithm directly (described in Section 2.3.1) and collect the full set of answers to the query. If none of the tractable services for CQ answering are able to capture the KB, we use them to compute lower and upper bound approximations, taking the union of the lower bounds and the intersection of the upper bounds. If the combined bounds match, we have computed a sound and complete set of answers for the input query. If, however, this is not the case, we use PAGOdA’s algorithm to compute a subset of the input KB relevant to answer the query, and fall back to HermiT to filter any spurious answers from the gap between the bounds. A summary of these steps was provided in Section 3, along with a visual representation in Fig. 2.

In this section we will describe some design and implementation details that led to the development of ACQuA. In particular, we will focus our attention on RSAComb, a novel implementation of the RSA combined approach for CQ answering, and how the tool can be used to compute lower and upper bounds to the answers of an input query.

6.1. RSAComb

RSAComb [41] is an optimized implementation of the combined approach for CQ answering in RSA. We streamlined and reorganized the algorithm to make the different steps either ontology or query independent. On top of that we designed and implemented an API to introduce approximation capabilities in the system; RSAComb is able to take an unrestricted ontology as input and potentially apply an approximation algorithm (targeting RSA or ${RSA}^{+}$ ) before computing the answers to a query. The system ships with reference implementations of the algorithms for the computation of answer bounds introduced in Sections 4–5.

The system is written in Scala and uses the OWLAPI [35] to interface with the input ontology and manipulate OWL 2 axioms. RDFox is used as an underlying Datalog reasoner; RSAComb has been designed to maximize the amount of computation to be offloaded to RDFox, by redefining problems in terms of queries over a materialized RDF store.

RDFox is used as a black box, and RSAComb can be adapted to use any Datalog reasoner with support for stratified negation and Skolemization. Nonetheless, the use of RDFox allowed us to introduce some optimizations based on particular features provided by the tool.

These are:

a SKOLEM operator,15

¹⁵
https://docs.oxfordsemantic.tech/tuple-tables.html#rdfox-skolem

which provides a way to uniquely associate a sequence of terms with a fresh term;

support for named graphs to isolate and cache partial computation;

support for “TBox reasoning” in order to reason directly on the structure of an ontology even when outside the supported OWL fragment.

We designed and built RSAComb around these general principles:

Modularity

The code should be modular and different steps in the algorithm should be as independent of each other as possible. It should be easy to reimplement (or enhance) an intermediate step of the algorithm as long as the signature and the interface with the system as a whole remain unaltered. We achieved this by an extensive use of Scala traits, building a collection of interfaces that describe the behaviour of the different actors that take part in the execution of the combined approach for RSA. As explained in the following sections, the integration with RDFox was also key to providing a good level of modularity to the systems.

Scalability

The system has to be able to scale efficiently even for large amounts of data. Partial results are computed when needed and reused whenever possible. A more detailed analysis on the performance and scalability of the system is provided in Section 8.

Integration

It should be equally possible to use the system as a self-contained application or integrate it in another system. As such, our software presents a simple but effective command line interface alongside a well-structured set of classes exposing all the necessary tools to work with RSA ontologies, while hiding unnecessary implementation details. The different steps can also be disabled for user convenience.

We will first provide a description of RSAComb as an implementation of the RSA combined approach and then go into details on how lower and upper bound algorithms are implemented in the system.

6.1.1. Overview

Fig. 5.

Workflow of the RSAComb system.

Figure 5 summarizes the workflow of RSAComb:

the approximation steps take an unrestricted OWL 2 KB as input and approximate it to a target language handled by the RSA combined approach;

the canonical model for the resulting RSA KB is computed by materializing the data against a logic program derived from the input ontology;

a filtering program is derived from the input query and is combined with the canonical model to produce the set of certain answers to the input query over the approximated knowledge base.

The process of importing the input ontology (TBox, RBox) into the system is performed using the OWLAPI. Since importing large amounts of data (ABox) into the system might be expensive, data files are read and data is loaded on demand and reused whenever possible to maximize performance.

As mentioned above, two approximation algorithms ship with the system. The first approximation algorithm is an implementation of the algorithm presented in Section 4; it targets the RSA ontology language and maintains soundness w.r.t. CQ answering, i.e., answers to a CQ are a lower bound to the answers to the query over the original KB. A copy of the ontology, translated into Datalog according to Def. 2.3, is imported into RDFox along with the data and materialized by the reasoner. The dependency graph and equality safety checks (see Definition 2.3) are implemented as queries over the RDF store exposed by RDFox; the original knowledge base is altered accordingly. The second approximation is an implementation of the algorithm introduced in Section 5; it targets ${RSA}^{+}$ and maintains completeness w.r.t. CQ answering, i.e., answers to a CQ are an upper bound to the answers to the query over the original knowledge base.

The canonical model is computed for the knowledge base in Step (ii); this is done by converting each axiom in the KB into a logic rule according to Def. 2.5 and uploading it into RDFox. Note that the translation from axioms into logic rules is different from the one in Step (i), hence the need to reload them into RDFox. The data, on the other hand, is safely reused. Finally, the potentially spurious answers to the input query introduced during the canonical model computation are filtered out in Step (iii). It is worth noting that, in this scenario, steps (i),(ii) are query independent, while step (iii) is ontology independent. As such, when multiple queries are submitted over the same KB, steps (i-ii) are performed “on-demand” and only once, while the third step is performed for each input query.

6.1.2. Canonical model computation

Fig. 6.

RSAComb: canonical model computation.

The computation of the canonical model (Figure 6) involves the conversion of the input RSA ontology into logic rules as described in Def. 2.5, and where function symbols are simulated using RDFox’s built-in Skolemization feature.

Example 6.1.

A Skolemized rule derived from an existential axiom (T5) $\begin{matrix} (27) & A (x) \to R (x, f_{R, B}^{A} (x)) \land B (f_{R, B}^{A} (x)) \end{matrix}$ can be turned into the following RDFox – compatible rule

where the built-in operator SKOLEM binds ?Y to a unique value generate from the string “A,R,B” and term ?X.

The system performs the conversion and then offloads the materialization of the rules, combined with the input data, to RDFox.

Since the canonical model is query independent, this process can be performed once and the result cached and reused for every subsequent query over the same input ontology. We achieve this using RDFox’s support for RDF named graphs, which enables us to perform operations on specific “named” subsets of the data. Further operations on the graph operate and produce additional data in different named graphs, leaving the materialized canonical model intact.

Axiomatization of ⊤ and ≈ RDFox has built-in support for ⊤ (owl:Thing) and equality (owl:sameAs), so that ⊤ automatically subsumes any new class introduced within an RDF triple, and equality between terms is always consistent with its semantics.

In both cases we are not able to use these features directly: in the case of top axiomatization, we import axioms as Datalog rules, which are not taken into consideration when RDFox derives new ⊤ subsumptions;16

¹⁶

RDFox accepts both OWL 2 axioms encoded as RDF triples and Datalog rules; these are very different entities in the system and the semantics of special concepts/roles (like ⊤ and ≈) is applied to the former.

in the case of equality axiomatization, the feature cannot be enabled along other features like aggregates and negation-as-failure (with the latter extensively used in our system).

To work around this, we introduce the axiomatization for both predicates explicitly. For every concept name $C \in N_{C}$ and for every role name $R \in N_{R}$ in the input ontology, we add the following rules to RDFox:

This gives us the correct semantics for owl:Thing.

Similar rules are introduced to axiomatize equality. We make the role reflexive, symmetric and transitive:

and introduce substitution rules to complete the axiomatization. For every concept name $C \in N_{C}$ and for every role name $R \in N_{R}$ in the input ontology, we add:

The notIn and named predicates Our work also includes a few clarifications on theoretical definitions and their implementation. In the canonical model computation [22], the notIn predicate is introduced to simulate the semantics of set membership and in particular the meaning of notIn[a, b] is “a is not in set b”. During the generation of the canonical model program performed by RSAComb, we have complete knowledge of any set that might be used in a notIn atom. For each such set S, and for each element $a \in S$ , we introduce the fact in[a,S] in the canonical model. We then replace any occurrence of notIn[?X,?Y] in the original program $E_{K}$ with NOT in[?X, ?Y], where NOT is the operator for negation-as-failure in RDFox. This is possible because we know that $E_{K}$ is stratified; moreover the negated predicate introduced in these rules is fully instantiated at program generation and does not appear in the head of any rules, maintaining the stratified structure of the program.

We generate the instances of the predicate NI, representing the set of non-anonymous terms in the materialized canonical model, with the following rule:

where rsa:named is a predicate representing the set of constants in the original KB.

A final improvement has been made to the computation of the cycle function used during the generation of the canonical model program performed by RSAComb. The original definition involved a search over all possible triples $(A, R, B)$ where $A, B \in N_{C}$ and $R \in N_{R}$ in the original ontology. We realized that traversing the whole space would significantly slow down the computation, and is not necessary; we instead restrict our search over all $(A, R, B)$ triples that appear in a (T5) axiom $A ⊑ \exists R . B$ in the original normalized ontology.

6.1.3. Filtering program and answer computation

Fig. 7.

RSAComb: answer filtering.

As depicted in Fig. 7, answer filtration involves the computation of the filtering program from the input query, the filtering of the materialized canonical model and the final process of gathering the answers.

RSAComb performs the translation of the query into a set of logic rules. This step was modified w.r.t. the original definition [22] to be completely ontology independent by moving the generation of rsa:named instances to the canonical model computation step. Furthermore, we redesigned the filtering step to restrict ourselves to use only unary and binary predicates and, as a result, keep the filtering somewhat closer to the realm of description logics (and to the language supported by RDFox). Filtering rules are then greatly simplified by making extensive use of the Skolemization operator provided by RDFox, hence avoiding some expensive joins that would result from a standard reification process.

Example 6.2.

Let $q (\vec{x}) = ψ (\vec{x}, \vec{y})$ be a CQ with $\vec{x} = x_{1}, \dots, x_{m}$ , $\vec{y} = y_{1}, \dots, y_{n}$ . Rule (3c) of the filtering program (see Table 3) computes the transitive closure of the predicate $id$ , keeping track of identity between anonymous terms w.r.t. a specific match for the input query. $\begin{matrix} (28) & id (\vec{x}, \vec{y}, u, v), id (\vec{x}, \vec{y}, v, w) \to id (\vec{x}, \vec{y}, u, w) \end{matrix}$ A standard technique to reduce the arity of predicates is reification. Provided we have access to a function KEY to compute a new term that uniquely identifies a tuple of terms, we can reify any n-ary atom into a set of n atoms of arity 2. E.g., an atom $P (x, y, z)$ becomes $P_{1} (k, x), P_{2} (k, y), P_{3} (k, z)$ , where $k = KEY (x, y, z)$ and $P_{n}$ , for $1 ⩽ n ⩽ arity (P)$ , are fresh predicates of arity 2. Rule (3c) can be reified as: $\begin{array}{l} (29) & \begin{aligned} {id}_{1} (k, x_{1}), \dots, {id}_{m + n} (k, y_{n}), {id}_{m + n + 1} (k, u), {id}_{m + n + 2} (k, v), \\ {id}_{1} (j, x_{1}), \dots, {id}_{m + n} (j, y_{n}), {id}_{m + n + 1} (j, v), {id}_{m + n + 2} (j, w), \\ l : = KEY (\vec{x}, \vec{y}, u, w) \to {id}_{1} (l, x_{1}), \dots, {id}_{m + n} (l, y_{n}), {id}_{m + n + 1} (l, v), {id}_{m + n + 2} (l, w) \end{aligned} \end{array}$ The problem with this approach is that it increases the number of joins to be performed to match the body of the rule.

Using the SKOLEM functionality in RDFox, we are able to reduce the arity of a predicate P ( $id$ in this example) without having to introduce $arity (P)$ fresh predicates (see (30)). The SKOLEM predicate associates a list of terms with a unique blank node; the list of terms and the variable that will be bound to the blank node are passed to the SKOLEM predicate as a single list of arguments. To this end, an atom $id (\vec{x}, \vec{y}, u, v)$ in the original rule becomes an atom $id (k, j)$ of arity 2 where $SKOLEM (\vec{x}, \vec{y}, k)$ and $SKOLEM (\vec{x}, \vec{y}, u, v, j)$ hold, and k and j are bound to two blank nodes uniquely associated with the sequences of terms $⟨ \vec{x}, \vec{y} ⟩$ and $⟨ \vec{x}, \vec{y}, u, v ⟩$ , respectively. Joins over multiple terms ( $id$ joining over $(\vec{x}, \vec{y})$ in (28)) can now be rewritten into simpler joins ( $id$ joining over a single term k):17

¹⁷

Rule 30 showcases how the SKOLEM predicate can be used in both directions: given a sequence of terms, we can pack them into a single fresh term; given a previously Skolemized term, we can unpack it to retrieve the corresponding sequence of terms.

\begin{matrix} (30) & id (k, j), SKOLEM (\vec{x}, \vec{y}, u, v, j), id (k, l), SKOLEM (\vec{x}, \vec{y}, v, w, l), SKOLEM (\vec{x}, \vec{y}, u, w, t) \to id (k, t) \end{matrix}

The complete rewriting of the filtering program is provided in Table 5. According to the documentation18

¹⁸

https://web.archive.org/web/20230124163101/https://docs.oxfordsemantic.tech/5.5/tuple-tables.html#rdfox-skolem

for the SKOLEM operator in RDFox, it can be easily shown that the rewriting is not changing the semantics of the rules, but instead packs and unpacks subsets of variables in order to make rule matching more efficient.

Table 5

Improved rules for filtering step for the RSA combined approach

(1)	$ψ (\vec{x}, \vec{y}), SKOLEM (\vec{x}, \vec{y}, s) \to QM (s)$
(2)	rsa:named instances computed in the canonical model step.
(3a)	$QM (s), SKOLEM (\vec{x}, \vec{y}, s), not NI (y_{i}), SKOLEM (i, i, k) \to id (s, k)$ for each $1 ⩽ i ⩽ \| \vec{y} \|$
(3b)	$id (s, k_{1}), SKOLEM (u, v, k_{1}), SKOLEM (v, u, k_{2}) \to id (s, k_{2})$
(3c)	$id (s, k_{1}), SKOLEM (v, u, k_{1}), id (s, k_{2}), SKOLEM (u, w, k_{2}), SKOLEM (v, w, k_{3}) \to id (s, k_{3})$
(4a)	for all $R (a, y_{i})$ , $S (b, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{f} (a, y_{i}), S^{f} (b, y_{j}), SKOLEM (i, j, k), id (s, k), SKOLEM (\vec{x}, \vec{y}, s), not a \approx b \to fk (s)$
(4b)	for all $R (a, y_{i})$ , $S (y_{j}, b)$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{f} (a, y_{i}), S^{b} (y_{j}, b), SKOLEM (i, j, k), id (s, k), SKOLEM (\vec{x}, \vec{y}, s), not a \approx b \to fk (s)$
(4c)	for all $R (y_{i}, a)$ , $S (y_{j}, b)$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{b} (y_{i}, a), S^{b} (y_{j}, b), SKOLEM (i, j, k), id (s, k), SKOLEM (\vec{x}, \vec{y}, s), not a \approx b \to fk (s)$
	for all $R (y_{i}, y_{j})$ , $S (y_{m}, y_{n})$ in q with $y_{i}, y_{j}, y_{m}, y_{n} \in \vec{y}$
(5a)	$R^{f} (y_{i}, y_{j}), S^{f} (y_{m}, y_{n}), SKOLEM (j, n, k_{1}), id (s, k_{1}), SKOLEM (\vec{x}, \vec{y}, s)$ ,
	$y_{i} \approx y_{m}, not NI (y_{i}), SKOLEM (i, m, k_{2}) \to id (s, k_{2})$
(5b)	$R^{f} (y_{i}, y_{j}), S^{b} (y_{m}, y_{n}), SKOLEM (j, m, k_{1}), id (s, k_{1}), SKOLEM (\vec{x}, \vec{y}, s)$ ,
	$y_{i} \approx y_{n}, not NI (y_{i}), SKOLEM (i, n, k_{2}) \to id (s, k_{2})$
(5c)	$R^{b} (y_{i}, y_{j}), S^{b} (y_{m}, y_{n}), SKOLEM (i, m, k_{1}), id (s, k_{1}), SKOLEM (\vec{x}, \vec{y}, s)$ ,
	$y_{j} \approx y_{n}, not NI (y_{j}), SKOLEM (j, n, k_{2}) \to id (s, k_{2})$
(6)	for each $R (y_{i}, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$ and $* \in {f, b}$
	$R^{*} (y_{i}, y_{j}), id (s, k_{1}), id (s, k_{1}), SKOLEM (i, v, k_{1}), SKOLEM (\vec{x}, \vec{y}, s)$ ,
	$id (s, k_{2}), SKOLEM (j, w, k_{2}), SKOLEM (v, u, k_{3}) \to {AQ}^{*} (s, k_{3})$
	for each $* \in {f, b}$
(7a)	${AQ}^{} (s, k) \to {TQ}^{} (s, k)$
(7b)	${AQ}^{} (s, k_{1}), SKOLEM (u, v, k_{1}), {TQ}^{} (s, k_{2}), SKOLEM (v, w, k_{2}), SKOLEM (u, w, k_{3}) \to {TQ}^{*} (s, k_{3})$
(8a)	$QM (s), SKOLEM (\vec{x}, \vec{y}, s), not named (x) \to sp (s)$ for each $x \in \vec{x}$
(8b)	$fk (s) \to sp (s)$
(8c)	${TQ}^{} (s, k), SKOLEM (v, v, k) \to sp (s)$ for each $ \in {f, b}$
(9)	$QM (s), not sp (s), SKOLEM (\vec{x}, \vec{y}, s), SKOLEM (\vec{x}, k) \to Ans (k)$

The filtering program is, then, loaded into RDFox and the materialization is updated taking into account the newly introduced rules. The triples produced by this materialization update are stored in a separate named graph to keep the product of filtration separate from the canonical model. This is possible because the signature of the atoms in the head of rules introduced by the filtering program is separate from the signature of the canonical model. When processing a new query, the only step we need to take is to drop the named graph associated with the filtration from the previous query, leaving unaltered all other triples. Better yet, here we have the possibility to execute queries in parallel, each one associated with a separate filtering program and hence storing their derivations in different named graphs. The materialization update for each of the queries is isolated and does not interfere with the other processes.

At this point, the task of gathering the answers to the query over the input KB is reduced to querying a materialized named graph for the atoms representing the certain answers.

Example 6.3.

Given a query $q (\vec{x}) = \exists φ (\vec{x}, \vec{y})$ , with $\vec{x} = ⟨ x_{1}, x_{2}, x_{3} ⟩$ , we can retrieve the answers to q with the following query

where we first collect all instances ?K of the class rsa:Ans, and then we unpack them at line ?? using the custom RDFox syntax for the SKOLEM operator, to retrieve the actual answers. When answering BCQs, we only need to check for an rsa:Ans witness, i.e., an instance of rsa:Ans in the RDF store:

6.2. Lower bound approximation to RSA

As described in Section 4, we propose a novel algorithm for approximating an unrestricted input KB to RSA. The procedure is composed of 3 main steps:

Approximation to $ALCHOIQ$ via axiom filtering;

Approximation to Horn- $ALCHOIQ$ via program shifting;

Approximation to RSA by reducing the ontology dependency graph to an oriented forest and ensuring equality safety properties.

The first two steps are entirely carried out by RSAComb in a straightforward way. The knowledge base is first filtered by axiom type and then program shifting (Def. 4.1) is applied to all relevant axioms. The last step is designed to partially offload the task to RDFox; this involves:

building and reasoning over a custom dependency graph derived from the materialization of the input data over a Horn- $ALCHOIQ$ KB;

reasoning over the knowledge base itself, and in particular performing some RBox reasoning task.

We first translate the axioms in the knowledge base according to Definition 2.3, and import them, along with the data, into RDFox. The imported data and its materialization contain all instances of the atom E, used to build the dependency graph for the input ontology. After retrieving all instances of E, querying the RDFox triple store with the following query

RSAComb builds the dependency graph for the input KB. Using Algorithm 1 we detect and break cycles by iteratively removing nodes. The existential axioms corresponding to the nodes returned by the visit are removed from the input ontology.

For the equality safety check we need to reason over the ontology itself and in particular perform some reasoning over its RBox. Regardless of the support offered by the Datalog reasoner for this task, axioms in a knowledge base can be encoded as RDF triples.19

¹⁹
https://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/

RDFox supports importing OWL 2 axioms and the conversion into RDF triples is performed automatically. RBox reasoning (Listing 1) is then achieved by importing the following rules into the RDF store.

Listing 1.

Rules for role subsumption reasoning

These encode reflexivity and transitivity of sub-role axioms (R2) (lines 7–9), taking into account inverse (lines 3–5) and equivalent roles (line 1), as well.

Once both the data and the axioms have been imported and materialized according to their respective rules, the equality safety condition (i) of Definition 2.3 can be formulated as a query (see Listing 2).

For each pair of atoms $w \approx t$ , with w and t distinct, and $R (t, u_{R, B}^{A})$ (lines 2–4) in $M_{RSA}$ and each role S s.t. $R ⊑ Inv (S)$ (lines 5–6), we query for the tuple $⟨ A, S, B ⟩$ such that $A ⊑ ⩽ 1 S . B$ is part of the input KB (lines 7–11). For each triple $⟨ A, S, B ⟩$ returned by the query we can remove the corresponding axiom (T4) from the input ontology.

Listing 2.

Condition 1 of equality safety in RSA definition

Similarly, condition (ii) can be formulated as a query (see Listing 3).

Listing 3.

Condition 2 of equality safety in RSA definition

For each pair of atoms $R (a, u_{R, B}^{A}), S (u_{R, B}^{A}, a)$ in $M_{RSA}$ with $a \in N_{I}$ (lines 2–5), we detect roles $R, S$ such that there exists a role T for which $R ⊑_{R}^{*} T$ (lines 6–8) and $S ⊑_{R}^{*} Inv (T)$ (lines 9–10). Note that, when detecting $R ⊑_{R}^{*} T$ we “isolate” the first step of the subPropertyOf chain (line 6) and query for that couple of roles $⟨ R, P ⟩$ . In this case the returned couple $⟨ R, P ⟩$ , identifies an axiom of type (T2) whose removal will break a chain of sub-properties from R to T, making the knowledge base equality safe.

6.3. Upper bound approximation to RSA

The approximation algorithm proposed in Section 5 is implemented in a similar way. Again, the procedure is divided into the following steps:

rewriting of ⊥ into a new nullary predicate $⊥_{f}$ with no predefined meaning,

rewriting of disjunctive rules to eliminate disjunction, and

approximation to ${RSA}^{+}$ .

As discussed before, the first step is not performed in practice. During the computation of the KB approximation and the upper bound set of answers, we simply ignore the satisfiability of the KB. Note that, even if ⊥ is derived during the process of materialization, RDFox will not derive the entire Herbrand base, to keep the operation as efficient as possible. We can use this to our advantage and still compute a meaningful upper bound approximation.

The rewriting of disjunctive rules is also straightforward, and is performed directly by RSAComb. The choice function is implemented as in PAGOdA, in order to avoid the derivation of ⊥ (see Section 5.2).

Finally, the third step involves the same framework introduced in the previous section for the lower bound computation, and in particular the construction of the dependency graph and role subsumption reasoning are performed in the same way. Both the enforcing of equality safety and the reduction of the dependency graph to a forest involve a rewriting of the KB according to Def. 5.2, and are implemented directly in RDFox.

Finally, RSAComb can be used to run the combined approach algorithm on the resulting ${RSA}^{+}$ KB. According to Theorem 5.1 the answers produced by RSAComb are an upper bound to the answers to the query.

7. Related work

Conjunctive query answering over knowledge bases is one of the foundational problems when reasoning over ontologies. This, along with its corresponding decision problem (conjunctive query entailment, CQE) have been analysed both from the theoretical point of view (with extensive research on its computational complexity) and from the practical point of view, leading to a number of algorithms and their implementation in various reasoning systems. For a summary of the complexity results on decidability of CQE for some of the ontology languages mentioned in this work, see Table 6.

Table 6
Decidability of CQE for a selection of ontology languages

Ontology language Combined complexity Data complexity

${ELHO}_{⊥}^{r}$ NP–complete [72] P–complete [69,72]

OWL 2 QL NP–complete [10,12] AC $^{0}$ [10,12]

OWL 2 EL PSpace–complete [48,72] P–complete [48,72]

OWL 2 RL NP–complete [1,29] P–complete [1,29]

RSA NP–complete [22] P–complete [22]

Horn- $ALCHOIQ$ ExpTime–complete [13] P–complete [13]

$SHOIQ$ open open

$SROIQ$ open open

Ontology language	Combined complexity	Data complexity
${ELHO}_{⊥}^{r}$	NP–complete [72]	P–complete [69,72]
OWL 2 QL	NP–complete [10,12]	AC $^{0}$ [10,12]
OWL 2 EL	PSpace–complete [48,72]	P–complete [48,72]
OWL 2 RL	NP–complete [1,29]	P–complete [1,29]
RSA	NP–complete [22]	P–complete [22]
Horn- $ALCHOIQ$	ExpTime–complete [13]	P–complete [13]
$SHOIQ$	open	open
$SROIQ$	open	open

We will next provide an overview of the techniques proposed in the literature to perform CQ answering and in particular look at those tools that make use of bounds computation in order to drive the query execution. A closer comparison of ACQuA with these tools is also provided.

7.1. Query answering techniques

Support for CQ answering is offered natively by several existing reasoners. Some of them achieve this by ensuring sound and complete answers for a specific semantics over a certain family of ontology languages, while others limit the language in which the queries can be expressed. We will now give an overview of the several CQ answering techniques present in the literature.

7.2. Reduction to entailment checking

The first technique we are going to discuss is based on the reduction of CQ answering to entailment checking. Tableau–based DL reasoners like Pellet [71], HermiT [24], RacerPro [31] construct a finite structure that represents a model for the input KB and use blocking conditions to ensure the termination of the procedure. These reasoners usually target standard reasoning tasks and only offer limited support for CQ answering. Still, internalisable CQs can be rolled-up and included in the KB, effectively reducing CQ answering to entailment checking of a fresh concept entailed by the rolled-up query.

Pellet [71] provides support for CQ answering under ground semantics and supports CQ answering under certain answer semantics limited to tree-shaped queries (which can be internalized using the rolling-up technique).

HermiT [24] is a fully-fledged reasoner for OWL 2, based on the hypertableau calculus [59]; it does not provide a direct interface to answer CQs but reduction to entailment checking can be manually performed as a preliminary step.

RacerPro is a tableau–based system for the $SHIQ$ DL language; it implements a technique for instance retrieval, called filter and refine [82] and is tailored towards KBs with large ABoxes. The idea behind this technique is to first determine obvious (non-)solutions to a concept description (filter) and subsequently perform an optimized instance check (using ABox locality properties) for the remaining individuals (refine). It supports a superset of CQs under ground semantics.

Another tableau-based reasoner, Konclude [77], has been recently adapted to perform CQ answering over expressive ontologies [75], using an absorption-based technique [74,76]. The assertional part of a KB is divided into small packets used to parallelize the model construction of the tableau algorithm. This parallelizable approach, along with the use of caching to avoid the need of synchronization mechanisms between workers, can be used to derive possible answers to a CQ. Candidate answers are then checked using entailment checking, where bindings for the answer variables are restricted to individuals appearing in the possible answers. According to [75], the approach works best when considering ground queries, while the presence of existential variables can require a substantial amount of additional computation.

Overall, the systems described in this section are not primarily designed for CQ answering under certain answer semantics and instead target other reasoning tasks. The technique of reducing CQ answering to entailment checking is supported for expressive ontology languages but may not scale as well as other approaches. Optimizations have been proposed to further limit performance issues; examples are query execution order, based on the input KB [45] and data summarization [17].

When considering the development of fully-fledged reasoners targetting OWL 2, such as HermiT and Konclude, improvements on these reasoners can translate into improvements for hybrid systems like ACQuA and PAGOdA, which directly use these tools as black boxes.

7.3. Materialisation-based reasoners

Materialization-based reasoners are also widely in use and implement the forward chaining algorithm on top of (some fragment of) Datalog. Materialization-based systems are often built on top of RDF management systems; i.e., data management systems based on the Resource Description Framework representing knowledge as statements in the form of triples.

Triple stores like Jena [52], Sesame [8] and Virtuoso [21] offer query answering capabilities over RDBMS and support the RDFS description language. OWLim [7] provides support for OWL 2 RL ontologies. A materialisation-based reasoner extensively used in this work is RDFox [60], an RDF store supporting arbitrary Datalog rules over unary and binary predicates. The nature of the tool allows for important optimizations, e.g., incremental updates, and parallel materialization, at the expense of a limited expressivity in the supported description logic language [55–57,60]. RDFox covers most SPARQL 1.1 over an extension of Datalog. There are several other engines that support CQ answering over (extensions of) Datalog; among them, it is worth mentioning DLV [49], which provides support for CQ answering over an extension of disjunctive Datalog.

Although OWL 2 RL is expressive enough to cover a large portion of practical use cases, it lacks some common patterns like disjunctive knowledge or existentially quantified knowledge, that would potentially render the materialization process either non-deterministic or infinite. Typically, materialisation-based reasoners can still process ontologies outside OWL 2 RL, ignoring axioms that do not fall into the language. Answers to queries are still sound, but might not be complete, effectively providing a lower bound to the set of certain answers. This technique is used in the system PAGOdA [85] to effectively compute a sound lower bound to the set of certain answers to a CQ.

7.4. Ontology-mediated query rewriting

DLs are often used to model the domain of interest as collections of concepts and roles. In this sense, ontologies offer a great tool to build high-level semantics on top of some structured data (e.g., relational database).

OBDA directly applies this principle, creating a layer of abstraction on top of an existing data store; an ontology becomes an entry point for the user to access the underlying data via query answering. Another advantage of this approach is that it can rely on the underling data store (e.g., a RDBMS) to carry out the reasoning tasks. The OBDA framework [83] uses an ontology to rewrite an input query (i.e., expanding it by incorporating parts of the ontology). It then uses a set of mappings20

²⁰
Often expressed in the W3C standard R2RML language [15].

to transform the rewritten query into a query over the underlying relational data sources. The process is called perfect reformulation [66] and ensures that the answers to the query over the dataset and the ontology are the same as the answers to the rewritten query over the dataset alone.

It is worth noting that, since the query addresses the data source(s) indirectly, any updates made to the source are immediately reflected into the system. This is in contrast with the materialisation-based approach, where updates in the source require the recomputation (or the update) of the materialized dataset.

The OBDA approach is based on ontologies that fall into the DL-Lite family of DL languages, and hence the OWL 2 QL profile, for which the rewriting of CQs into unions of first-order queries is guaranteed to exist [10]. Perfect reformulation is implemented in QuOnto [2] and further integrated into the MASTRO system [11]. Unfortunately, the query rewriting process can lead to an exponentially larger first-order query [10] and polynomial rewriting is guaranteed only for small fragments of OWL 2, such as OWL 2 QL. For a more in-depth analysis on the performance of the OBDA approach we refer the reader to the Optique project and their work with Equinor [37,44].

The query rewriting technique has been applied to $EL$ and $ELH$ [69], showing that UCQs can be rewritten into a Datalog query. The same result does not hold for ${EL}^{+}$ and ${EL}^{+ +}$ . REQUIEM [64,65] implements a resolution-based query rewriting technique for ${ELHIO}^{\neg}$ , a DL covering both DL-Lite and $EL$ . The rewriting is based on the resolution calculus to saturate the set of rules in the ontology and subsequently filter out those containing functional terms. However, the introduction of inverse roles leads to a significant jump in complexity: CQE for $EL$ and $ELH$ is NP–complete, whereas it becomes ExpTime–complete for ${ELHIO}^{\neg}$ . Depending on the language of the input ontology the rewriting can be a UCQ or a (linear) Datalog query.

We briefly mention the work done in Clipper [19,20] which implements a query rewriting technique for Horn- $SHIQ$ . The rewriting differs from the ones mentioned above since Clipper modifies the dataset as well. A set of inference rules are used to saturate the input ontology and the data is materialized against the Datalog rules in the saturation. The query is rewritten against the subset of existentially quantified rules in the ontology and evaluated against the augmented dataset. The saturated ontology and the rewriting might be exponential in size w.r.t. the input ontology and query. Query rewriting has also been applied to linear ${Datalog}^{\pm}$ ontologies (see [61]).

A different approach involves the manipulation and rewriting of the input query [25,28]. The authors propose a decision procedure for CQE for $SHIQ$ and $SHOQ$ based on the rewriting of the query into a forest shape. By applying the rolling-up technique [36], the problem is reduced to testing the consistency of an extended KB.

7.4.1. Combined approaches

The combined approach is another widely known technique for computing a sound and complete set of answers to a CQ. In this scenario the dataset is first augmented by materializing entailed facts w.r.t. the ontology in order to build a model for the input knowledge base. This process is usually query-independent and performed in polynomial time. Spurious answers are then systematically identified by means of a filtration step or by rewriting the query [47,51] in order to derive the certain answers to the input query. An example is the combined approach for CQ answering over RSA ontologies (see Section 2.3.1), widely exploited in this work. The technique has been applied to several description logics in the $EL$ family, such as the extension of $ELH$ with ⊥ and range axioms [51] and ${ELHO}_{⊥}^{r}$ [73], as well as in the DL-Lite family, e.g., DL-Lite_horn with number restriction [47] and DL-Lite $_{R}$ [50]. More recently the combined approach has been applied to Horn- $ALCHOIQ$ [13], the ontology language underlying RSA.

In general these techniques are designed for a specific ontology language and do not support unrestricted OWL 2 ontology. On the other hand, as shown in this work and in PAGOdA, the combined approach can be easily used as an intermediate step in the computation.

7.4.2. Hybrid approaches

We will now look at tools that combine more than one technique described above to implement CQ answering.

Hydrowl [78] is a reasoner for CQ answering combining an OWL 2 RL reasoner, a query rewriting system and a fully-fledged OWL 2 reasoner. Hydrowl uses a repairing strategy [79] (limited to those ontologies for which a repairing exists) and query rewriting to answer an input query q. First a query base, i.e., a set of atomic queries that can be answered using the OWL 2 RL reasoner, is derived from the query. It is checked whether the query base “covers” the query, and in that case the OWL 2 RL reasoner is used to answer the query; otherwise the tool falls back to the fully-fledged reasoner. Further investigation on the computation of the query base [85] shows that the algorithm is not always able to automatically extract a set of atomic queries, thus compromising the correctness of the approach.

Absorption-based query entailment checking [74] (inspired by the absorption technique introduced by Steigmiller et al. [76]) also falls into the category of hybrid approaches. An input query is rewritten in order to make its entailment more efficiently detected by the model constructed using an extended version of the tableau algorithm. In this sense, the rewritten query is used to identify the individuals that are involved in the entailment of the query and, at the same time, to guide the construction of the model in the tableau algorithm. The technique is sound for CQE for expressive ontology languages, such as $SHIQ$ and $SHOQ$ .

PAGOdA [85] uses a hybrid technique to compute the answers to a query, mixing different approaches. The idea is to compute lower/upper bound approximations to the answers to a query by approximating the input ontology into a less expressive language and possibly provide a fallback (more expensive) algorithm to process the answers in the gap between the bounds. For a more detailed description of the approach see Section 2.2 and [85]. ACQuA builds on top of these techniques.

7.4.3. Ontology approximation

The idea of approximating an expressive language into less expressive (but more tractable) languages has been exploited before. This was first introduced by Selman and Kautz [70] and Del Val [81] in the context of logic theories (both propositional and FOL) and has been applied in the context of ontologies and CQ answering as well. Besides PAGOdA, some of the systems that use ontology approximation to explore and restrict the set of answers to a given CQ are SCREECH [34], TrOWL [80] and SHER [17].

The SCREECH system [34] is able to compute an (unsound or incomplete) approximation of the answers to a query under ground semantics. It achieves that by performing a query dependent (and possibly exponential) rewriting of the input $SHIQ$ ontology to disjunctive Datalog first, and then further to Datalog. Compared to ACQuA, SCREECH can only handle CQ answering under ground semantics over $SHIQ$ ontologies.

TrOWL [80] is a system providing CQ answering capabilities over OWL 2 DL. It uses a semantic approximation [63] technique to transform an OWL 2 DL ontology into OWL 2 QL for CQ answering and a syntactic approximation [67] from OWL 2 DL to OWL 2 EL for TBox reasoning. While being sound and complete for CQ answering, approximations steps in TrOWL are ontology and query dependent, making in harder to reuse partial results in the computation. Moreover, the semantic approximation requires the use of a fully-fledged reasoner to compute a KB approximation whose axioms are valid w.r.t. the input ontology.

The SHER [17] system is a tableau-based reasoner for $SHIN$ which provides instance retrieval capabilities. The system uses a summarization technique to compute an upper bound to the answers to an instance query. Spurious answers are then filtered out by a following relaxation step [16,17]. Again, this system is sound and complete for instance CQ answering for ontologies within the $SHIN$ DL language.

In addition, a way to approximate an OWL 2 ontology into an OWL 2 QL ontology maintaining completeness for instance queries is proposed as part of the filter and refine technique presented by Wandelt et al. [82]. The idea is to transform every axiom $C ⊑ D$ in an OWL 2 ontology into a stronger OWL 2 QL axiom $C^{'} ⊑ D^{'}$ such that C subsumes $C^{'}$ and $D^{'}$ subsumes D. The technique is, however, non-deterministic in nature and the approximation can sometimes lead to an unsatisfiable ontology.

Under the umbrella of approximate reasoning for CQ answering, the query extension technique [26,27] is of particular relevance. This algorithm aims at improving the bounds of the answers by extending the query with additional atoms obtained analysing the input ontology. The resulting query can then be used to restrict the bounds of subqueries of the initial query.

Finally, different notions of approximation for ontology-mediated queries over a selection of expressive languages like $ALC$ and $ALCI$ have been explored [32]. The authors aim at designing polynomial time approximations towards tractable languages like $ELI$ or some restricted classes of TGDs “from below and from above” (lower and upper bounds) with respect to CQs (and other query formalisms as well).

8. Evaluation

We provide here an extensive evaluation over a range of benchmark ontologies. We start by looking at some performance results for RSAComb [41], our implementation of the combined approach for RSA, followed by a comparison of our system ACQuA [42] with PAGOdA.21

²¹
https://github.com/KRR-Oxford/PAGOdA (commit 8651164c).

Section 8.1 provides an in-depth description of the benchmarks used for the evaluation. Ontologies, data, queries, and scripts used to run tests and generate the graphs shown in this section are freely available online [43].

All experiments were performed on an Intel(R) Xeon(R) CPU E5-2640 v3 (2.60GHz) with 16 real cores, extended via hyper-threading to 32 virtual cores, 512 GB of RAM and running Fedora 33, kernel version 5.10.8-200.fc33.x86_64. We were able to make use of the multicore CPU and distribute the computation across cores, especially for intensive tasks offloaded to RDFox.

8.1. Benchmarks and tools

We use two different sets of benchmark ontologies:

the PAGOdA batch mimics the evaluation process originally performed for PAGOdA [85];

the Oxford Ontology Repository (OOR) batch is a subset of the Oxford Ontology Repository,22

²²
http://krr-nas.cs.ox.ac.uk/ontologies/UID/

and it’s used to provide a broader evaluation on a wide range of ontology benchmarks.

The PAGOdA batch consists of a selection of ontologies and benchmark data that comes with the PAGOdA distribution.23

²³

https://www.cs.ox.ac.uk/isg/tools/PAGOdA/2015/jair/

These resources include ontology, data, and queries for:

LUBM and UOBM, standard benchmarks with a data generator (depending on a numerical parameter) and sample queries. When referring to a dataset generated for a particular parameter we will use LUBM(n) and UOBM(n) for some number n. PAGOdA provides an additional set of queries more challenging for the tool.

Reactome and Uniprot, realistic ontologies for which both data and relevant queries are provided. To test scalability, the datasets of these ontologies have been sampled in subsets of increasing size.

A summary of the statistics regarding each of these ontologies can be found in Table 7 where n is the parameter passed to the data generator for LUBM and UOBM.

Table 7

Benchmarks statistics, with LUBM/UOBM data generators depending on a parameter n

	# Axioms	# Facts	# Queries
LUBM(n)	93	$n \times 10^{5}$	35
UOBM(n)	186	$2.6 n \times 10^{5}$	20
Reactome	559	$1.2 \times 10^{7}$	130
Uniprot	442	$1.2 \times 10^{8}$	240

This batch aims at testing the system with a set of well–established benchmark queries. These queries have been defined to stress a system on a broad range of aspects of an answering routine, and vary a lot in terms of complexity and number of answers.

For the OOR batch we selected 126 ontology from the repository with non-empty ABoxes. A summary of the statistics of the ontologies in the repository can be found online.24

²⁴

http://krr-nas.cs.ox.ac.uk/ontologies/readme.htm

Since the Oxford Ontology Repository does not provide any test queries, we generated, for each ontology, a set of sample queries by extracting atomic concept, atomic role and existential patterns from the structure of the ontology. In order to generate a suitable number of queries we used the following step:

Import the ontology into RDFox as RDF triples.

Query for a specific pattern in the ontology, e.g., to retrieve all existential axioms in the ontology.

Convert those patterns into queries.

Using this method, we extracted 14135 concept atomic queries, 4434 role atomic queries and 3893 existential queries for a total of 22462 queries over 126 ontologies. Apart from the basic atomic patterns, we included existential queries of the form

By doing so, we aimed for an empirical confirmation of our ability to produce the correct set of answers under certain answer semantics, for a set of queries that potentially provide different results when considering them under ground semantics.

While the generated queries have a limited number of atoms, the primary aim of this batch is to stress the system with a high number of queries per ontology; this allows us to draw conclusions on the behaviour of the system on a wider variety of test cases.

The collection of SPARQL queries and the scripts to generate them are part of our benchmark distribution [43].

8.2. PAGOdA batch

We now present the test result obtained using the first set of benchmarks. We first tested RSAComb as a standalone system, in order to evaluate its performance and scalability. Later we compare the performance of ACQuA against the original PAGOdA. This is particular usefully since we were able to draw a very close comparison between the two tools and improve upon the observations provided by PAGOdA [85]. This also helped us identify how and when ACQuA’s algorithm outperformed PAGOdA, solving some performance issues with the latter tool.

8.2.1. RSAComb

As part of this work, we introduced RSAComb, an improved implementation of the combined approach algorithm for RSA, released as free and open source software [39]. Given that the original reference implementation [22] was not available when we started this work, and some details about the testing process are not provided, we will not try to draw a comparison between the results provided here and those provided in the original paper.

Our implementation is written in Scala and uses RDFox25

²⁵
https://www.oxfordsemantic.tech/product

as the underlying Datalog reasoner. At the time of writing, development and testing have been carried out using Scala v2.13.5 and RDFox v5.2. We can easily interface Scala with Java libraries and in particular the OWLAPI [35] for easy ontology manipulation. Thanks to the Java wrapper API provided with RDFox we were able to take advantage of a tight integration with the tool and simplify the following integration into ACQuA.

In the following we provide tests result of our system against LUBM [30] and Reactome26

²⁶

https://elixir-europe.org/platforms/data/core-data-resources

using the set of queries originally used in the testing of the combined approach for RSA [22]. All results provided below are averages of at least 3 measurements.

Fig. 8.

Scalability of approximation to RSA and canonical model computation in RSAComb.

In Fig. 8 we show the scalability of our algorithm for the lower bound approximation to RSA and the computation of the canonical model for the approximated ontology. The two steps are query independent and the trend appears to be linear w.r.t. the dataset size, both in LUBM and Reactome; this can be explained by observing that the approximation algorithm involves the materialization of the input dataset against a modified version of the ontology, hence depending on the size of the whole KB;

Fig. 9.

RSAComb answer filtering in Reactome.

The filtering process is instead less dependent on the size of the data and more dependent on its composition and distribution. As such, a bigger dataset does not necessarily correspond to a greater amount of filtering, as shown in Fig. 9, where we reported the execution time for query 1 and 2 in Reactome. This figure also shows how the filtering depends on the data distribution; both queries take longer on a 50% sample of the data than on other datasets (even larger ones) due to its specific content. In general, we noticed that the time spent by the system on the filtering step is considerably lower than the time spent on the canonical model computation (as described below, and shown in Fig. 11).

Fig. 10.

Answer filtering in Query 2 in LUBM.

This unpredictability of the filtering step can “backfire” when a huge amount of filtering is involved. In Fig. 10 we show the filtering time for query 2 in LUBM along with the amount of unfiltered answers that the filtering program needs to process. It is worth noting that less than 1‰ of the unfiltered answers are found to be part of the certain answers. Figure 10 confirms the previous claims that the filtering step grows proportionally to the amount of filtering that is needed for a particular query. Finally, this figure shows how our system is able to handle a gigantic filtering step, processing hundreds of millions of facts in a reasonable amount of time.

Fig. 11.

Percent time distribution of canonical model computation (at the bottom, in blue) and answer filtering (at the top, in yellow) in Reactome.

Finally, Fig. 11 shows how execution time is distributed among the two main tasks of the combined approach. Filtering takes consistently less that 20% of the total execution time, when considering bigger datasets. As mentioned before, we can limit the impact of the canonical model computation by computing it “offline” whenever we find ourselves in a scenario in which we need to perform query answering over a fixed ontology.

8.2.2. ACQuA

We will now provide test results for the PAGOdA batch against ACQuA; this will allow us to draw a direct comparison between the tool and PAGOdA [85], for which the same benchmarks were used during the evaluation phase. During our tests we were able to reproduce the results provided in the original paper except for UOBM, for which PAGOdA does not terminate with a timeout of 10h and outputs no relevant information.

We chose this as a first set of benchmarks because we were able to use the extensive analysis on PAGOdA’s performance to guide our research and easily detect those cases that our system could improve. PAGOdA initially divided its test results into three groups:

queries for which the bounds match;

queries with a non-empty gap, but for which summarization is able to filter out all remaining spurious answers;

queries where HermiT is called on at least one of the test datasets.

When considering RSAComb and PAGOdA, the building blocks of ACQuA, separately, efficiency in the two tools mainly depends on the input ontology and the type of query answered, with PAGOdA showing worse performance when heavily relying on HermiT. On the other hand, when combining the tools into ACQuA, RSAComb is able to further limit the occurrence of these cases, providing better performance overall.

In general ACQuA is able to match PAGOdA’s results in all queries in the (G1-2) groups. This should not come as a surprise, since the already excellent results from PAGOdA weren’t leaving much room for improvement and were showing that more complex CQ answering techniques were not needed for these families of queries. In particular, for query in the G1 group, ACQuA does not perform any additional step other than PAGOdA’s computation of lower and upper bounds (avoiding the use of HermiT altogether).

For this reason we will be focusing on those queries falling in the (G3) group, for which PAGOdA’s performance does not scale well.

Fig. 12.

Scalability of query processing times for LUBM, UOBM and Reactome in ACQuA vs PAGOdA.

According to [85, Section 10.3.2], and partially confirmed by our tests, PAGOdA falls back to HermiT in the following queries to compute the correct set of answers: queries 32 and 34 in LUBM, query 18 in UOBM (for some data sizes) and query 65 in Reactome. Figure 12 sums up the results for our tests. Pre-processing times for the ontology are not taken into account here since the process is common to both tools and, in general, can be computed offline.

LUBM For both queries we were able to compute matching bounds, skipping the call(s) to HermiT altogether. This resulted in a significant improvement on the query processing time (Fig. 12a). It is interesting to notice that in this case the nature of the data and the queries seem to lead to a linear growth with respect to the size of the data.

Additionally, while testing the tools, we noticed that PAGOdA was having some difficulties returning a sound set of certain answers to queries involving existential knowledge, potentially falling back to ground semantics instead.

Example 8.1.

LUBM TBox contains the following axiom describing the fact that each research assistant works for at least one research group $\begin{matrix} (31) & ResearchAssistant ⊑ \exists worksFor . ResearchGroup \end{matrix}$ The following query

should return all 39 instances of $ResearchAssistant$ contained in LUBM(1), but PAGOdA returns 0 answers (which is only correct under ground semantics).

UOBM We could not perform a direct comparison with UOBM since we were unable to reproduce PAGOdA’s results [85], with the tool timing out on a 10h run with no relevant output, even for the smaller datasets. Regardless, we were able to observe a recognizable pattern in the results for query 18. In Fig. 12b, we report our results against an estimate of PAGOdA’s performance; the estimation was carried out by looking at the graphs of the original paper and considering the closest values that the resolution of images allowed us to read. Even in this case we were able to avoid the use of HermiT, consequently improving the query processing time overall.

Reactome We were able to answer query 65 with matching bounds, avoiding again the use of HermiT. This resulted in an improvement of almost 600 seconds for the full Reactome dataset.

Furthermore, we found that the answers returned by PAGOdA for some of the queries in LUBM were only correct if considering CQ answering under ground semantics. Examples of these are query 15–16 from the PAGOdA benchmarks, for which PAGOdA was able to return only an incomplete set of answers.27

²⁷

This is most likely due to a bug in the PAGOdA codebase.

In these cases ACQuA was able to fix the issue and compute the sound and complete set of answers under certain answer semantics, by avoiding the use of PAGOdA overall.

8.3. OOR batch

For the second batch of benchmarks executed on the Oxford Ontology Repository, we were able to identify a set of queries for which PAGOdA requires the use of HermiT for the full computation of the query answers.

Table 8
PAGOdA and ACQuA statistics on OOR batch (over 126 ontologies and 22462 queries)

Ontologies processed Queries executed Non-empty queries

PAGOdA 103 18235 1455

ACQuA 126 22462 2256

	Ontologies processed	Queries executed	Non-empty queries
PAGOdA	103	18235	1455
ACQuA	126	22462	2256

As shown in Table 8, PAGOdA was able to process 103 out of 126 ontologies considered, executing around 81% of the generated queries; ACQuA, on the other hand, was able to process the entire set of ontologies, answering the full suite of generated queries. Around 10% of the queries have a non-empty answer set, and while ACQuA was able to answer all of them, PAGOdA can reliably answer only $\sim 65 %$ .

We identified a set of 18 queries (role atomic queries) over DOLCE [23] for which PAGOdA required the use of HermiT. In these cases, only the lower bound computed by PAGOdA is exact, while ACQuA was able to compute a matching upper bound. This was detected in two different fragments of DOLCE from the repository, corresponding to ontology 14 and 24. Ontology 24 corresponds to the full DOLCE ontology, while ontology 14 is a fragment of 24 partially restricting the ABox. Both ontologies are classified as $SHOIN (D)$ .

In Fig. 13 we provide quantitative and performance results for the queries over ontology 24, where we denote the lower bound, upper bound and query processing time for the corresponding tools with L, U and T respectively. We omit ontology 14 since the results are similar to the ones reported for ontology 24.

In the first 16 queries we obtained comparable results (see Fig. 13). This is understandable since DOLCE is a relatively small ontology (with a very small ABox) and this ended up hiding the performance differences that would potentially appear with larger datasets. Moreover, it should be noted that PAGOdA is able to deal with the larger upper bound by performing a limited amount of calls to HermiT (up to 8). The number of calls increases to 19 in the last two queries; these are also the cases in which we can observe a greater gain in performance using ACQuA.

Fig. 13.

Execution time (T) on DOLCE queries in PAGOdA (red) vs ACQuA (orange), alongside quantitative results for the lower (L) and upper bounds (U) computed by the two tools.

Finally, we found a set of 23 queries across multiple ontologies for which PAGOdA returned an unsound set of answers. This is the whole set of queries that PAGOdA was unable to process. This behaviour was not observed in ACQuA, which was able to compute the correct answers to the queries, without calling PAGOdA.

For the rest of the tested queries PAGOdA and ACQuA had comparable performance and were able to compute matching bounds.

To conclude this section, we provide a list of performance results and improvements highlighted by our evaluation:

RSAComb shows linear scalability for preprocessing and canonical model computation steps. Moreover, the filtering time is lower on average than the canonical model computation;

RSAComb handles huge amounts of data in a reasonable amount of time;

the additional logic introduced by ACQuA is able to outperform PAGOdA in a variety of test cases, improving both the lower and upper bounds;

ACQuA is able to fix some performance issues present in PAGOdA, by computing matching and hence further limiting the use of HermiT.

9. Discussion and conclusions

In this work, we presented a new hybrid query answering architecture that combines black-box services to provide a CQ answering system for OWL. Our system builds upon scalable CQ answering services for tractable restrictions of OWL, combining them with a CQ answering routine for a more expressive language. The technique is based on the computation of lower and upper bounds to the answers to a CQ and their progressive refinement to compute the full set of certain answers. We proposed two novel algorithms to compute lower and upper bounds to the answers to a query via approximation to RSA and ${RSA}^{+}$ , respectively, along with reference implementations, used in two new systems:

RSAComb, an efficient implementation of the combined approach for RSA [22], introducing a new design and the use of RDFox as the underlying Datalog reasoner. The system improves upon the theoretical contribution of the original work by introducing several heuristics, in order to render the algorithm more efficient in practice. The system accepts any OWL 2 KB and includes a customizable approximation step to languages compatible with the RSA combined approach. To this end, we included a reference implementation of the novel approximation algorithms for the computation of answer bounds mentioned above.

ACQuA, a reference implementation of the hybrid architecture combining RSAComb, PAGOdA [85], and HermiT [24] to provide a CQ answering service for OWL. By building ACQuA, we showed how the novel idea of chaining multiple services to refine answer approximations is feasible in practice. Ideally, the services it is built upon can be substituted or paired with more capable ones to improve the system performance–wise.

We provided an extensive evaluation of the systems, first testing scalability and performance of RSAComb as a standalone system and then, comparing ACQuA against PAGOdA. We showed how the additional computational cost introduced by reasoning over a more expressive language like RSA can still provide a significant improvement compared to relying on a fully-fledged reasoner. Additionally, we showed how ACQuA can reliably match PAGOdA’s performance, and further limit performance issues originally present in PAGOdA, especially when the tool has to extensively rely on HermiT.

We intend to further extend this work in a few different directions. The RSAComb-based algorithms for the computation of answer bounds depend on a cycle-detection procedure over a KB dependency graph. We think that altering the traversal of the graph and adopting (query dependent) heuristics in the cycle-detection algorithm could improve the quality of the computed bounds.

Moreover, ACQuA mostly focuses on ontology manipulation for computing bounds and further processing gap answers. While query independent processes can be cached or computed offline, a different, complementary, approach would be to study the problem of computing answer bounds from a query perspective. An example of such a technique for computing bounds to answers to SPARQL queries has been presented by Glimm et al. [27].

To conclude, this work led us to believe that relying on hybrid frameworks and leveraging existing systems for CQ answering is a winning strategy that can render the problem more viable in practice. Thanks to its modularity, this approach can benefit from the broader research in the area of knowledge representation, description logics, and CQ answering.

Footnotes

Proofs

This chapter provides proofs for lemmas and theorems used in Sections 4–5.28

²⁸

These proofs are the result of several discussions with the authors of [22].

In the following we will consider either an RSA or an ${RSA}^{+}$ KB $K = ⟨ O, A ⟩$ (and explicitly state when some result holds only for one of the two languages) and a CQ $q (\vec{x}) = \exists \vec{y} . ψ (\vec{x}, \vec{y})$ . For $P_{K, q}$ , $E_{K}$ and $π {(K)}^{\approx, ⊤}$ , we will refer to their LHMs as $M$ , $M_{c}$ (canonical) and $M_{u}$ (universal), respectively. Note that, by definition of $P_{K, q}$ , it is the case that $M_{c} \subseteq M$ .

We start with the notations concerning terms and atoms. For terms s and t, we write $s ⩽ t$ ( $s < t$ ) iff s is (strictly) contained in t. The root of a term t is its non-functional part, i.e., $root (f_{1} (f_{2} (\dots (f_{n} (a)) \dots))) = a$ . We say that a term t has type $(A, R, B)$ if t is either of the form $v_{R, B}^{A, i}$ or of the form $f_{R, B}^{A} (\cdot)$ .

The derivation level of a ground atom $a = P (\vec{t}) \in M [Π]$ with Π a stratified program, is denoted by $level (a, M [Π])$ and is a pair of natural numbers $(k, l)$ where k denotes the stratum of P and l is the smallest number such that $a \in T_{Π_{k}}^{l} (U)$ , where $U = \emptyset$ , if $k = 1$ , and $U = T_{Π_{k - 1}}^{ω} (U_{i})$ , otherwise. The derivation level of a ground term $t \in terms (M [Π])$ , where Π is a stratified program, is denoted as $level (t, M [Π])$ and is a pair of natural numbers $(k, l)$ , such that t occurs in an atom $a \in M [Π]$ s.t. $level (a, M [Π]) = (k, l)$ but t does not occur in any atom $a \in M [Π]$ such that $level (a, M [Π]) = (k^{'}, l^{'})$ and $k^{'} < k$ , or $k^{'} = k$ and $l^{'} < l$ . When a program Π has only one stratum k, the stratum is dropped from the derivation level of the corresponding atom/term.

We next relate terms in $M_{c}$ and $M_{u}$ to terms in $M_{R S A}$ .

Acknowledgements

This work was supported by the AIDA project (Alan Turing Institute, EP/N510129/1), the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project number 237889), Samsung Research UK, Siemens AG, and the EPSRC projects AnaLOG (EP/P025943/1), OASIS (EP/S032347/1) and UK FIRES (EP/S019111/1). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript (AAM) version arising from this submission.

References

Abiteboul,

Hull and

Vianu, Foundations of Databases, Addison-Wesley, 1995, http://webdam.inria.fr/Alice/ . ISBN 0-201-53771-0.

Acciarri,

Calvanese,

G.D.

Giacomo,

Lembo,

Lenzerini,

Palmieri and

Rosati, QuOnto: Querying ontologies, in: Proceedings, the Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, Pittsburgh, Pennsylvania, USA, July 9–13, 2005,

M.M.

Veloso and

Kambhampati, eds, AAAI Press / The MIT Press, 2005, pp. 1670–1671, http://www.aaai.org/Library/AAAI/2005/isd05-001.php .

Alrabbaa,

Borgwardt,

Koopmann and

Kovtunova, Explaining ontology-mediated query answers using proofs over universal models, in: Rules and Reasoning – 6th International Joint Conference on Rules and Reasoning, RuleML+RR 2022, Proceedings, Berlin, Germany, September 26–28, 2022,

Governatori and

Turhan, eds, Lecture Notes in Computer Science, Vol. 13752, Springer, 2022, pp. 167–182. doi:10.1007/978-3-031-21541-4_11.

Baader,

Brandt and

Lutz, Pushing the EL envelope, in: IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Scotland, UK, July 30–August 5, 2005,

L.P.

Kaelbling and

Saffiotti, eds, Professional Book Center, 2005, pp. 364–369, http://ijcai.org/Proceedings/05/Papers/0372.pdf .

Baader,

Horrocks,

Lutz and

Sattler, An Introduction to Description Logic, Cambridge University Press, 2017.

Baader,

Lutz and

Brandt, Pushing the EL envelope further, in: Proceedings of the Fourth OWLED Workshop on OWL: Experiences and Directions, Washington, DC, USA, 1–2 April 2008,

Clark and

P.F.

Patel-Schneider, eds, CEUR Workshop Proceedings, Vols 496, CEUR-WS.org, 2008, http://ceur-ws.org/Vol-496/owled2008dc_paper_3.pdf .

Bishop,

Kiryakov,

Ognyanoff,

Peikov,

Tashev and

Velkov, OWLIM: A family of scalable semantic repositories, Semantic Web Journal2(1) (2011), 33–42. doi:10.3233/SW-2011-0026.

Broekstra,

Kampman and

van Harmelen, Sesame: A generic architecture for storing and querying RDF and RDF schema, in: The Semantic Web – ISWC 2002, First International Semantic Web Conference, Proceedings, Sardinia, Italy, June 9–12, 2002,

Horrocks and

J.A.

Hendler, eds, Lecture Notes in Computer Science, Vol. 2342, Springer, 2002, pp. 54–68. doi:10.1007/3-540-48005-6_7.

Calvanese,

De Giacomo,

Lembo,

Lenzerini and

Rosati, Data complexity of query answering in description logics, in: Proceedings, Tenth International Conference on Principles of Knowledge Representation and Reasoning, Lake District of the United Kingdom, June 2–5, 2006, AAAI Press, 2006, pp. 260–270.

10.

Calvanese,

De Giacomo,

Lembo,

Lenzerini and

Rosati, Tractable reasoning and efficient query answering in description logics: The DL-lite family, Journal of Automated Reasoning39(3) (2007), 385–429. doi:10.1007/s10817-007-9078-x.

11.

Calvanese,

G.D.

Giacomo,

Lembo,

Lenzerini,

Poggi,

Rodriguez-Muro,

Rosati,

Ruzzi and

D.F.

Savo, The MASTRO system for ontology-based data access, Semantic Web Journal2(1) (2011), 43–53. doi:10.3233/SW-2011-0029.

12.

Calvanese,

G.D.

Giacomo,

Lembo,

Lenzerini and

Rosati, Data complexity of query answering in description logics, Artificial Intelligence195 (2013), 335–360. doi:10.1016/j.artint.2012.10.003.

13.

Carral,

Dragoste and

Krötzsch, The combined approach to query answering in horn-ALCHOIQ, in: Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference, KR 2018, Tempe, Arizona, 30 October–2 November 2018,

Thielscher,

Toni and

Wolter, eds, AAAI Press, 2018, pp. 339–348, https://aaai.org/ocs/index.php/KR/KR18/paper/view/18076 .

14.

Carral,

Feier,

B.C.

Grau,

Hitzler and

Horrocks, Pushing the boundaries of tractable ontology reasoning, in: Proceedings, Part II, The Semantic Web – ISWC 2014 – 13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, 2014, Lecture Notes in Computer Science, Vol. 8797, Springer, 2014, pp. 148–163.

15.

Das,

Sundara and

Cyganiak, R2RML: RDB to RDF Mapping Language, W3C Recommendation, W3C, 2012, https://www.w3.org/TR/2012/REC-r2rml-20120927/.

16.

Dolby,

Fokoue,

Kalyanpur,

Kershenbaum,

Schonberg,

Srinivas and

Ma, Scalable semantic retrieval through summarization and refinement, in: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, Vancouver, British Columbia, Canada, July 22–26, 2007, AAAI Press, 2007, pp. 299–304, http://www.aaai.org/Library/AAAI/2007/aaai07-046.php .

17.

Dolby,

Fokoue,

Kalyanpur,

Schonberg and

Srinivas, Scalable highly expressive reasoner (SHER), Journal of Web Semantics7(4) (2009), 357–361. doi:10.1016/j.websem.2009.05.002.

18.

Eiter,

Fink,

Tompits and

Woltran, On eliminating disjunctions in stable logic programming, in: Principles of Knowledge Representation and Reasoning: Proceedings of the Ninth International Conference (KR2004), Whistler, Canada, June 2–5, 2004,

Dubois,

C.A.

Welty and

Williams, eds, AAAI Press, 2004, pp. 447–458, http://www.aaai.org/Library/KR/2004/kr04-047.php .

19.

Eiter,

Ortiz,

Simkus,

Tran and

Xiao, Towards practical query answering for horn-SHIQ, in: Proceedings of the 2012 International Workshop on Description Logics, DL-2012, Rome, Italy, June 7–10, 2012,

Kazakov,

Lembo and

Wolter, eds, CEUR Workshop Proceedings, Vols 846, CEUR-WS.org, 2012, http://ceur-ws.org/Vol-846/paper_20.pdf .

20.

Eiter,

Ortiz,

Simkus,

Tran and

Xiao, Query rewriting for horn-SHIQ plus rules, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, Toronto, Ontario, Canada, July 22–26, 2012,

Hoffmann and

Selman, eds, AAAI Press, 2012, http://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4931 .

21.

Erling and

Mikhailov, Virtuoso: RDF support in a native RDBMS, in: Semantic Web Information Management – A Model-Based Perspective,

R.D.

Virgilio,

Giunchiglia and

Tanca, eds, Springer, 2009, pp. 501–519. doi:10.1007/978-3-642-04329-1_21.

22.

Feier,

Carral,

Stefanoni,

B.C.

Grau and

Horrocks, The combined approach to query answering beyond the OWL 2 profiles, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, 2015, AAAI Press, 2015, pp. 2971–2977.

23.

Gangemi,

Guarino,

Masolo,

Oltramari and

Schneider, Sweetening ontologies with DOLCE, in: Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, 13th International Conference, EKAW 2002, Proceedings, Siguenza, Spain, October 1–4, 2002,

Gómez-Pérez and

V.R.

Benjamins, eds, Lecture Notes in Computer Science, Vol. 2473, Springer, 2002, pp. 166–181. doi:10.1007/3-540-45810-7_18.

24.

Glimm,

Horrocks,

Motik,

Stoilos and

Wang, HermiT: An OWL 2 reasoner, Journal of Automated Reasoning53(3) (2014), 245–269. doi:10.1007/s10817-014-9305-1.

25.

Glimm,

Horrocks and

Sattler, Conjunctive query entailment for SHOQ, in: Proceedings of the 2007 International Workshop on Description Logics (DL2007), Brixen-Bressanone, Near Bozen-Bolzano, Italy, 8–10 June, 2007,

Calvanese,

Franconi,

Haarslev,

Lembo,

Motik,

Turhan and

Tessaris, eds, CEUR Workshop Proceedings, Vols 250, CEUR-WS.org, 2007, http://ceur-ws.org/Vol-250/paper_63.pdf .

26.

Glimm,

Kazakov,

Kollia and

G.B.

Stamou, OWL query answering based on query extension, in: Proceedings of the 11th International Workshop on OWL: Experiences and Directions (OWLED 2014) Co-Located with 13th International Semantic Web Conference on (ISWC 2014), Riva del Garda, Italy, October 17–18, 2014,

C.M.

Keet and

V.A.M.

Tamma, eds, CEUR Workshop Proceedings, Vol. 1265, CEUR-WS.org, 2014, pp. 1–12, http://ceur-ws.org/Vol-1265/owled2014_submission_1.pdf .

27.

Glimm,

Kazakov,

Kollia and

G.B.

Stamou, Lower and upper bounds for SPARQL queries over OWL ontologies, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015,

Bonet and

Koenig, eds, AAAI Press, 2015, pp. 109–115, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9715 .

28.

Glimm,

Lutz,

Horrocks and

Sattler, Conjunctive query answering for the description logic SHIQ, Journal of Artificial Intelligence Research31 (2008), 157–204. doi:10.1613/jair.2372.

29.

B.N.

Grosof,

Horrocks,

Volz and

Decker, Description logic programs: Combining logic programs with description logic, in: Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, May 20–24, 2003, ACM, 2003, pp. 48–57.

30.

Guo,

Pan and

Heflin, LUBM: A benchmark for OWL knowledge base systems, Journal of Web Semantics3(2–3) (2005), 158–182. doi:10.1016/j.websem.2005.06.005.

31.

Haarslev,

Hidde,

Möller and

Wessel, The RacerPro knowledge representation and reasoning system, Semantic Web3(3) (2012), 267–277. doi:10.3233/SW-2011-0032.

32.

Haga,

Lutz,

Sabellek and

Wolter, How to approximate ontology-mediated queries, in: Proceedings of the 18th International Conference on Principles of Knowledge Representation and Reasoning, KR 2021, Online Event, November 3–12, 2021, 2021, pp. 323–333. doi:10.24963/kr.2021/31.

33.

Harris and

Seaborne, SPARQL 1.1 Query Language, W3C Recommendation, W3C, 2013, https://www.w3.org/TR/2013/REC-sparql11-query-20130321/.

34.

Hitzler,

Krötzsch,

Rudolph and

Tserendorj, Approximate OWL instance retrieval with SCREECH, in: Logic and Probability for Scene Interpretation, 24.02.–29.02.2008,

A.G.

Cohn,

D.C.

Hogg,

Möller and

Neumann, eds, Dagstuhl Seminar Proceedings, Vol. 08091, Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2008, http://drops.dagstuhl.de/opus/volltexte/2008/1615/ .

35.

Horridge and

Bechhofer, The OWL API: A Java API for OWL ontologies, Semantic Web2(1) (2011), 11–21. doi:10.3233/SW-2011-0025.

36.

Horrocks and

Tessaris, A conjunctive query language for description logic aboxes, in: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, Texas, USA, July 30–August 3, 2000,

H.A.

Kautz and

B.W.

Porter, eds, AAAI Press / The MIT Press, 2000, pp. 399–404, http://www.aaai.org/Library/AAAI/2000/aaai00-061.php .

37.

Hovland,

Kontchakov,

M.G.

Skjæveland,

Waaler and

Zakharyaschev, Ontology-based data access to slegge, in: Proceedings, Part II, The Semantic Web – ISWC 2017 – 16th International Semantic Web Conference, Vienna, Austria, October 21–25, 2017, Lecture Notes in Computer Science, Vol. 10588, Springer, 2017, pp. 120–129.

38.

Hu,

Motik and

Horrocks, Modular materialisation of datalog programs, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, the Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, the Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27–February 1, 2019, AAAI Press, 2019, pp. 2859–2866. doi:10.1609/aaai.v33i01.33012859.

39.

Igne,

Germano and

Horrocks, RSAComb: Combined approach for CQ answering in RSA, in: Proceedings of the 34th International Workshop on Description Logics (DL 2021) Part of Bratislava Knowledge September (BAKS 2021), Bratislava, Slovakia, September 19th to 22nd, 2021,

Homola,

Ryzhikov and

R.A.

Schmidt, eds, CEUR Workshop Proceedings, Vols 2954, CEUR-WS.org, 2021, http://ceur-ws.org/Vol-2954/paper-18.pdf .

40.

Igne,

Germano and

Horrocks, Computing CQ lower-bounds over OWL 2 through approximation to RSA, in: The Semantic Web – ISWC 2021 – 20th International Semantic Web Conference, ISWC 2021, Proceedings, Virtual Event, October 24–28, 2021,

Hotho,

Blomqvist,

Dietze,

Fokoue,

Ding,

P.M.

Barnaghi,

Haller,

Dragoni and

Alani, eds, Lecture Notes in Computer Science, Vol. 12922, Springer, 2021, pp. 200–216. doi:10.1007/978-3-030-88361-4_12.

41.

Igne,

Germano and

Horrocks, RSAComb – Combined approach for Conjunctive Query answering in RSA, Zenodo, 2022. doi:10.5281/zenodo.6564261.

42.

Igne,

Germano and

Horrocks, ACQuA – A hybrid framework providing a CQ answering service for OWL, Zenodo, 2022. doi:10.5281/zenodo.6564388.

43.

Igne,

Germano and

Horrocks, Benchmarks and scripts for ACQuA and RSAComb, Zenodo (2022). doi:10.5281/zenodo.6564995.

44.

Kharlamov,

Hovland,

Jiménez-Ruiz,

Lanti,

Lie,

Pinkel,

Rezk,

M.G.

Skjæveland,

Thorstensen,

Xiao,

Zheleznyakov and

Horrocks, Ontology based access to exploration data at statoil, in: 11–15, 2015, Proceedings, Part II, The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference, Lecture Notes in Computer Science, Vol. 9367, Springer, Bethlehem, PA, USA, 2015, pp. 93–112.

45.

Kollia and

Glimm, Optimizing SPARQL query answering over OWL ontologies, Journal of Artificial Intelligence Research48 (2013), 253–303. doi:10.1613/jair.3872.

46.

Kontchakov,

Lutz,

Toman,

Wolter and

Zakharyaschev, The combined approach to query answering in DL-lite, in: Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010, Toronto, Ontario, Canada, May 9–13, 2010, AAAI Press, 2010.

47.

Kontchakov,

Lutz,

Toman,

Wolter and

Zakharyaschev, The combined approach to ontology-based data access, in: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16–22, 2011,

Walsh, ed., IJCAI/AAAI, 2011, pp. 2656–2661. doi:10.5591/978-1-57735-516-8/IJCAI11-442.

48.

Krötzsch,

Rudolph and

Hitzler, Conjunctive Queries for a Tractable Fragment of OWL 1.1, 2007, pp. 310–323. ISBN 978-3-540-76297-3.

49.

Leone,

Pfeifer,

Faber,

Eiter,

Gottlob,

Perri and

Scarcello, The DLV system for knowledge representation and reasoning, ACM Transactions on Computational Logic7(3) (2006), 499–562. doi:10.1145/1149114.1149117.

50.

Lutz,

Seylan,

Toman and

Wolter, The combined approach to OBDA: taming role hierarchies using filters, in: The Semantic Web – ISWC 2013 – 12th International Semantic Web Conference, Proceedings, Part I, Sydney, NSW, Australia, October 21–25, 2013,

Alani,

Kagal,

Fokoue,

Groth,

Biemann,

J.X.

Parreira,

Aroyo,

N.F.

Noy,

Welty and

Janowicz, eds, Lecture Notes in Computer Science, Vol. 8218, Springer, 2013, pp. 314–330. doi:10.1007/978-3-642-41335-3_20.

51.

Lutz,

Toman and

Wolter, Conjunctive query answering in the description logic EL using a relational database system, in: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11–17, 2009,

Boutilier, eds, 2009, pp. 2070–2075, http://ijcai.org/Proceedings/09/Papers/341.pdf.

52.

McBride, Jena: Implementing the RDF model and syntax specification, in: Proceedings of the Second International Workshop on the Semantic Web – SemWeb’2001, Hongkong, China, May 1, 2001,

Decker,

D.A.

Fensel,

A.P.

Sheth and

Staab, eds, CEUR Workshop Proceedings, Vols 40, CEUR-WS.org, 2001, http://CEUR-WS.org/Vol-40/mcbride.pdf .

53.

Motik,

B.C.

Grau,

Horrocks,

Wu,

Fokoue and

Lutz, OWL 2 Web Ontology Language Profiles (Second Edition), W3C Recommendation, W3C, 2012, https://www.w3.org/TR/2012/REC-owl2-profiles-20121211/.

54.

Motik,

Nenov,

Piro and

Horrocks, Handling owl: SameAs via rewriting, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015,

Bonet and

Koenig, eds, AAAI Press, 2015, pp. 231–237.

55.

Motik,

Nenov,

Piro and

Horrocks, Incremental update of datalog materialisation: The backward/forward algorithm, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015,

Bonet and

Koenig, eds, AAAI Press, 2015, pp. 1560–1568.

56.

Motik,

Nenov,

Piro and

Horrocks, Combining rewriting and incremental materialisation maintenance for datalog programs with equality, in: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25–31, 2015,

Yang and

M.J.

Wooldridge, eds, AAAI Press, 2015, pp. 3127–3133.

57.

Motik,

Nenov,

Piro,

Horrocks and

Olteanu, Parallel materialisation of datalog programs in centralised, main-memory RDF systems, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, Québec, Canada, July 27–31, 2014,

C.E.

Brodley and

Stone, eds, AAAI Press, 2014, pp. 129–137.

58.

Motik,

Patel-Schneider and

Parsia, OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax (Second Edition), W3C Recommendation, W3C, 2012, https://www.w3.org/TR/2012/REC-owl2-syntax-20121211/.

59.

Motik,

Shearer and

Horrocks, Hypertableau reasoning for description logics, Journal of Artificial Intelligence Research36 (2009), 165–228. doi:10.1613/jair.2811.

60.

Nenov,

Piro,

Motik,

Horrocks,

Wu and

Banerjee, RDFox: A highly-scalable RDF store, in: 11–15, 2015, Proceedings, Part II, The Semantic Web – ISWC 2015 – 14th International Semantic Web Conference,

Arenas,

Ó.

Corcho,

Simperl,

Strohmaier,

d’Aquin,

Srinivas,

Groth,

Dumontier,

Heflin,

Thirunarayan and

Staab, eds, Lecture Notes in Computer Science, Vol. 9367, Springer, Bethlehem, PA, USA, 2015, pp. 3–20.

61.

Orsi and

Pieris, Optimizing query answering under ontological constraints, Proceedings of the VLDB Endowment4(11) (2011), 1004–1015, http://www.vldb.org/pvldb/vol4/p1004-orsi.pdf . doi:10.14778/3402707.3402737.

62.

Ortiz and

Simkus, Reasoning and query answering in description logics, in: Reasoning Web. Semantic Technologies for Advanced Query Answering – 8th International Summer School 2012, Proceedings, Vienna, Austria, September 3–8, 2012,

Eiter and

Krennwallner, eds, Lecture Notes in Computer Science, Vol. 7487, Springer, 2012, pp. 1–53. doi:10.1007/978-3-642-33158-9_1.

63.

J.Z.

Pan and

Thomas, Approximating OWL-DL ontologies, in: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, Vancouver, British Columbia, Canada, July 22–26, 2007, AAAI Press, 2007, pp. 1434–1439.

64.

Pérez-Urbina,

Horrocks and

Motik, Efficient query answering for OWL 2, in: The Semantic Web – ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25–29, 2009, Proceedings, Lecture Notes in Computer Science, Vol. 5823, Springer, 2009, pp. 489–504. doi:10.1007/978-3-642-04930-9_31.

65.

Pérez-Urbina,

Motik and

Horrocks, Tractable query answering and rewriting under description logic constraints, J. Appl. Log.8(2) (2010), 186–209. doi:10.1016/j.jal.2009.09.004.

66.

Poggi,

Lembo,

Calvanese,

G.D.

Giacomo,

Lenzerini and

Rosati, Linking data to ontologies, J. Data Semant.10 (2008), 133–173. doi:10.1007/978-3-540-77688-8_5.

67.

Ren,

Gröner,

Lemcke,

Rahmani,

Friesen,

Zhao,

J.Z.

Pan and

Staab, Validating process refinement with ontologies, in: Proceedings of the 22nd International Workshop on Description Logics (DL 2009), Oxford, UK, July 27–30, 2009,

B.C.

Grau,

Horrocks,

Motik and

Sattler, eds, CEUR Workshop Proceedings, Vol. 477, CEUR-WS.org, http://ceur-ws.org/Vol-477/paper_59.pdf.

68.

Ren,

J.Z.

Pan,

Guclu and

M.J.

Kollingbaum, A combined approach to incremental reasoning for EL ontologies, in: Web Reasoning and Rule Systems – 10th International Conference, RR 2016, Aberdeen, UK, September 9–11, 2016, Proceedings, Lecture Notes in Computer Science, Vol. 9898, Springer, 2016, pp. 167–183. doi:10.1007/978-3-319-45276-0_13.

69.

Rosati, On conjunctive query answering in EL, in: Proceedings of the 2007 International Workshop on Description Logics (DL2007), Brixen-Bressanone, Near Bozen-Bolzano, Italy, 8–10 June, 2007,

Calvanese,

Franconi,

Haarslev,

Lembo,

Motik,

Turhan and

Tessaris, eds, CEUR Workshop Proceedings, Vols 250, CEUR-WS.org, 2007, http://ceur-ws.org/Vol-250/paper_83.pdf .

70.

Selman and

H.A.

Kautz, Knowledge compilation and theory approximation, Journal of the ACM43(2) (1996), 193–224. doi:10.1145/226643.226644.

71.

Sirin,

Parsia,

B.C.

Grau,

Kalyanpur and

Katz, Pellet: A practical OWL-DL reasoner, Journal of Web Semantics5(2) (2007), 51–53. doi:10.1016/j.websem.2007.03.004.

72.

Stefanoni and

Motik, Answering conjunctive queries over EL knowledge bases with transitive and reflexive roles, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, Texas, USA, January 25–30, 2015,

Bonet and

Koenig, eds, AAAI Press, 2015, pp. 1611–1617, http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9310 .

73.

Stefanoni,

Motik and

Horrocks, Introducing nominals to the combined query answering approaches for EL, in: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, Bellevue, Washington, USA, July 14–18, 2013,

desJardins and

M.L.

Littman, eds, AAAI Press, 2013, http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6156 .

74.

Steigmiller and

Glimm, Absorption-based query entailment checking for expressive description logics, in: Proceedings of the 32nd International Workshop on Description Logics, Oslo, Norway, June 18–21, 2019,

Simkus and

G.E.

Weddell, eds, CEUR Workshop Proceedings, Vols 2373, CEUR-WS.org, 2019, http://ceur-ws.org/Vol-2373/paper-25.pdf .

75.

Steigmiller and

Glimm, Parallelised ABox reasoning and query answering with expressive description logics, in: The Semantic Web – 18th International Conference, ESWC 2021, Virtual Event, Proceedings, June 6–10, 2021,

Verborgh,

Hose,

Paulheim,

Champin,

Maleshkova,

Ó.

Corcho,

Ristoski and

Alam, eds, Lecture Notes in Computer Science, Vol. 12731, Springer, 2021, pp. 23–39. doi:10.1007/978-3-030-77385-4_2.

76.

Steigmiller,

Glimm and

Liebig, Reasoning with nominal schemas through absorption, Journal of Automated Reasoning53(4) (2014), 351–405. doi:10.1007/s10817-014-9310-4.

77.

Steigmiller,

Liebig and

Glimm, Konclude: System description, Journal of Web Semantics (JWS)27 (2014), 78–85. doi:10.1016/j.websem.2014.06.003.

78.

Stoilos, Hydrowl: A hybrid query answering system for OWL 2 DL ontologies, in: Web Reasoning and Rule Systems – 8th International Conference, RR 2014, Proceedings, Athens, Greece, September 15–17, 2014,

Kontchakov and

Mugnier, eds, Lecture Notes in Computer Science, Vol. 8741, Springer, 2014, pp. 230–238. doi:10.1007/978-3-319-11113-1_20.

79.

Stoilos, Ontology-based data access using rewriting, OWL 2 RL systems and repairing, in: The Semantic Web: Trends and Challenges – 11th International Conference, ESWC 2014, Proceedings, Anissaras, Crete, Greece, May 25–29, 2014,

Presutti,

d’Amato,

Gandon,

d’Aquin,

Staab and

Tordai, eds, Lecture Notes in Computer Science, Vol. 8465, Springer, 2014, pp. 317–332. doi:10.1007/978-3-319-07443-6_22.

80.

Thomas,

J.Z.

Pan and

Ren, TrOWL: Tractable OWL 2 reasoning infrastructure, in: Proceedings, Part II, The Semantic Web: Research and Applications, 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30–June 3, 2010,

Aroyo,

Antoniou,

Hyvönen,

ten Teije,

Stuckenschmidt,

Cabral and

Tudorache, eds, Lecture Notes in Computer Science, Vol. 6089, Springer, 2010, pp. 431–435. doi:10.1007/978-3-642-13489-0_38.

81.

A.D.

Val, First order LUB approximations: Characterization and algorithms, Artif. Intell.162(1–2) (2005), 7–48. doi:10.1016/j.artint.2004.01.003.

82.

Wandelt,

Möller and

Wessel, Towards scalable instance retrieval over ontologies, International Journal of Software and Informatics4(3) (2010), 201–218, http://www.ijsi.org/ch/reader/view_abstract.aspx?file_no=i59 .

83.

Xiao,

Calvanese,

Kontchakov,

Lembo,

Poggi,

Rosati and

Zakharyaschev, Ontology-based data access: A survey, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, July 13–19, 2018, ijcai.org, 2018, pp. 5511–5519.

84.

Zhou, PAGOdA: Pay-as-you-go ontology query answering using a datalog reasoner, PhD thesis, University of Oxford, UK, 2015, https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.711817.

85.

Zhou,

B.C.

Grau,

Nenov,

Kaminski and

Horrocks, PAGOdA: Pay-as-you-go ontology query answering using a datalog reasoner, Journal of Artificial Intelligence Research54 (2015), 309–367. doi:10.1613/jair.4757.

(1)	$ψ (\vec{x}, \vec{y}) \to QM (\vec{x}, \vec{y})$
(2)	$\to named (a)$ for each constant a in $O$
(3a)	$QM (\vec{x}, \vec{y}), not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, i)$ for each $1 ⩽ i ⩽ \| \vec{y} \|$
(3b)	$id (\vec{x}, \vec{y}, u, v) \to id (\vec{x}, \vec{y}, v, u)$
(3c)	$id (\vec{x}, \vec{y}, u, v), id (\vec{x}, \vec{y}, v, w) \to id (\vec{x}, \vec{y}, u, w)$
(4a)	for all $R (s, y_{i})$ , $S (t, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{f} (s, y_{i}) \land S^{f} (t, y_{j}) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$
(4b)	for all $R (s, y_{i})$ , $S (y_{j}, t)$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{f} (s, y_{i}) \land S^{b} (y_{j}, t) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$
(4c)	for all $R (y_{i}, s)$ , $S (y_{j}, t)$ in q with $y_{i}, y_{j} \in \vec{y}$
	$R^{b} (y_{i}, s) \land S^{b} (y_{j}, t) \land id (\vec{x}, \vec{y}, i, j) \land not s \approx t \to fk (\vec{x}, \vec{y})$
	for all $R (y_{i}, y_{j})$ , $S (y_{k}, y_{l})$ in q with $y_{i}, y_{j}, y_{k}, y_{l} \in \vec{y}$
(5a)	$R^{f} (y_{i}, y_{j}) \land S^{f} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, j, l) \land y_{i} \approx y_{k} \land not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, k)$
(5b)	$R^{f} (y_{i}, y_{j}) \land S^{b} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, j, k) \land y_{i} \approx y_{l} \land not NI (y_{i}) \to id (\vec{x}, \vec{y}, i, l)$
(5c)	$R^{b} (y_{i}, y_{j}) \land S^{b} (y_{k}, y_{l}) \land id (\vec{x}, \vec{y}, i, k) \land y_{j} \approx y_{l} \land not NI (y_{j}) \to id (\vec{x}, \vec{y}, j, l)$
(6)	for each $R (y_{i}, y_{j})$ in q with $y_{i}, y_{j} \in \vec{y}$ and $* \in {f, b}$
	$R^{} (y_{i}, y_{j}) \land id (\vec{x}, \vec{y}, i, v) \land id (\vec{x}, \vec{y}, j, w) \to {AQ}^{} (\vec{x}, \vec{y}, v, w)$
	for each $* \in {f, b}$
(7a)	${AQ}^{} (\vec{x}, \vec{y}, u, v) \to {TQ}^{} (\vec{x}, \vec{y}, u, v)$
(7b)	${AQ}^{} (\vec{x}, \vec{y}, u, v) \land {TQ}^{} (\vec{x}, \vec{y}, v, w) \to {TQ}^{*} (\vec{x}, \vec{y}, u, w)$
(8a)	$QM (\vec{x}, \vec{y}) \land not named (x) \to sp (\vec{x}, \vec{y})$ for each $x \in \vec{x}$
(8b)	$fk (\vec{x}, \vec{y}) \to sp (\vec{x}, \vec{y})$
(8c)	${TQ}^{} (\vec{x}, \vec{y}, v, v) \to sp (\vec{x}, \vec{y})$ for each $ \in {f, b}$
(9)	$QM (\vec{x}, \vec{y}) \land not sp (\vec{x}, \vec{y}) \to Ans (\vec{x})$

Conjunctive query answering over unrestricted OWL 2 ontologies

Abstract

Keywords

1. Introduction

2.1. Ontologies and conjunctive query answering

4 In order to achieve decidability of reasoning, SROIQ ontologies must satisfy certain additional requirements. These, however, do not affect the technical results reported in this paper.

6 In the following we consider the input KB to be consistent and normalized. This is ensured by PAGOdA’s preprocessing step.

Theorem 2.1 ([14], Theorem 2).

Definition 2.3 ([22], Definition 3).

9 An oriented forest is a directed acyclic graph whose underlying undirected graph is a forest.

2.3.1. Combined approach for RSA

Theorem 2.3 ([22],Theorem 3).

Theorem 2.4 ([22], Theorem 4).

Theorem 2.5 ([22], Theorem 5).

10 It is worth noting that, to the best of our knowledge, at the moment a similar bound for RSA + is not know.

4.1. Approximation to ALCHOIQ

4.2. Approximation to Horn- ALCHOIQ

13 While axioms of type (T4) do not use disjunction explicitly, their translation into definite rules involve disjunction in the head of the rule.

Theorem 4.1. Let K ′ = ⟨ O ′ , A ⟩ be the ALCHOIQ restriction of the KB K = ⟨ O , A ⟩ . Moreover, let K ″ = ⟨ shift ( O ′ ) , A ⟩ . Then cert ( q , K ″ ) ⊆ cert ( q , K ′ ) . Proof. See the Appendix. □ 4.3. Approximation to RSA

5.1. ⊥ substitution

5.2. Approximation of disjunctive rules

14 This is equivalent to rewriting the axiom as A ⊑ ∃ R . { c } , { c } ⊑ B .

6. Design and architecture

6.1. RSAComb

15 https://docs.oxfordsemantic.tech/tuple-tables.html#rdfox-skolem

19 https://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/

7. Related work

7.2. Reduction to entailment checking

7.3. Materialisation-based reasoners

7.4. Ontology-mediated query rewriting

20 Often expressed in the W3C standard R2RML language [15].

7.4.2. Hybrid approaches

7.4.3. Ontology approximation

8. Evaluation

21 https://github.com/KRR-Oxford/PAGOdA (commit 8651164c).

22 http://krr-nas.cs.ox.ac.uk/ontologies/UID/

8.2.1. RSAComb

25 https://www.oxfordsemantic.tech/product

Table 8 PAGOdA and ACQuA statistics on OOR batch (over 126 ontologies and 22462 queries) Ontologies processed Queries executed Non-empty queries PAGOdA 103 18235 1455 ACQuA 126 22462 2256

Footnotes

Proofs

Acknowledgements

References

⁴
In order to achieve decidability of reasoning, $SROIQ$ ontologies must satisfy certain additional requirements. These, however, do not affect the technical results reported in this paper.

⁶
In the following we consider the input KB to be consistent and normalized. This is ensured by PAGOdA’s preprocessing step.

⁹
An oriented forest is a directed acyclic graph whose underlying undirected graph is a forest.

¹⁰
It is worth noting that, to the best of our knowledge, at the moment a similar bound for ${RSA}^{+}$ is not know.

4.1. Approximation to $ALCHOIQ$

4.2. Approximation to Horn- $ALCHOIQ$

¹³
While axioms of type (T4) do not use disjunction explicitly, their translation into definite rules involve disjunction in the head of the rule.

Theorem 4.1.
Let $K^{'} = ⟨ O^{'}, A ⟩$ be the $ALCHOIQ$ restriction of the KB $K = ⟨ O, A ⟩$ . Moreover, let $K^{″} = ⟨ shift (O^{'}), A ⟩$ . Then $cert (q, K^{″}) \subseteq cert (q, K^{'})$ .
Proof.
See the Appendix. □

4.3. Approximation to RSA

¹⁴
This is equivalent to rewriting the axiom as $A ⊑ \exists R . {c}, {c} ⊑ B$ .

¹⁵
https://docs.oxfordsemantic.tech/tuple-tables.html#rdfox-skolem

¹⁹
https://www.w3.org/TR/2012/REC-owl2-mapping-to-rdf-20121211/

²⁰
Often expressed in the W3C standard R2RML language [15].

²¹
https://github.com/KRR-Oxford/PAGOdA (commit 8651164c).

²²
http://krr-nas.cs.ox.ac.uk/ontologies/UID/

²⁵
https://www.oxfordsemantic.tech/product

Table 8
PAGOdA and ACQuA statistics on OOR batch (over 126 ontologies and 22462 queries)

Ontologies processed Queries executed Non-empty queries

PAGOdA 103 18235 1455

ACQuA 126 22462 2256