Distributed and consistent multi-image feature matching via QuickMatch

Abstract

In this work, we consider the multi-image object matching problem in distributed networks of robots. Multi-image feature matching is a keystone of many applications, including Simultaneous Localization and Mapping, homography, object detection, and Structure from Motion. We first review the QuickMatch algorithm for multi-image feature matching. We then present NetMatch, an algorithm for distributing sets of features across computational units (agents) that largely preserves feature match quality and minimizes communication between agents (avoiding, in particular, the need to flood all data to all agents). Finally, we present an experimental application of both QuickMatch and NetMatch on an object matching test with low-quality images. The QuickMatch and NetMatch algorithms are compared with other standard matching algorithms in terms of preservation of match consistency. Our experiments show that QuickMatch and Netmatch can scale to larger numbers of images and features, and match more accurately than standard techniques.

Keywords

Computer vision feature matching object matching distributed matching multi-image matching

1. Introduction

Matching is a fundamental operation common in robotics and computer vision algorithms such as Structure from Motion (SfM), Simultaneous Localization and Mapping (SLAM), and object detection and tracking. Matching in these domains is typically performed on feature descriptors (such as Scale Invariant Feature Transform (SIFT), Oriented FAST and Rotated BRIEF (ORB), and Speeded Up Robust Features (SURF) by Lowe (2004), Rublee et al. (2011), and Bay et al. (2008), respectively) which extract discriminative characteristics from high-dimensional data (i.e., an image). At a fundamental level, these features are used to associate unique objects in the universe to their appearance in multiple views. These different views might be acquired by a geographically dispersed camera or robotic network, which can produce large amounts of data. It is therefore important to have the ability to match features across multiple images (i.e., perform multi-image matching) efficiently, consistently, accurately, and, ideally, in a distributed manner. For instance, match quality is key in settings such as SfM and SLAM, because the quality of the reconstruction is a direct result of how well the features match across images. The speed of image matching is extremely critical in applications such as real-time object detection. Matching consistency of features across many images is also critical for SfM and SLAM since feature points must be tracked not just between images, but across multi-image sequences for the best results. The pervasiveness of graphical processing units (GPUs), networked systems, distributed robotics, cloud computing, and multi-core processors has made the distributability of computer vision solutions relevant to their applicability. These tools have also shown great advantages in speed and bandwidth for other classic algorithms (Garcia et al., 2010; Warn et al., 2009). To this end, we introduce a novel version of our matching algorithm that distributes the computational load of the feature matching problem over multiple agents; the resulting algorithm can handle considerably more features, with only a slight performance loss, while minimizing the amount of necessary communications across nodes.

1.1. Problem overview and contributions

We propose a solution to the following problem: given a set of images taken from a team of robots or a camera network, match unique object features in a distributed manner, as they appear and disappear in the images taken from multiple perspectives. This problem arises often in object detection, localization, and tracking (Bradski, 2000; Cunningham et al., 2012; Zhou et al., 2015), homography estimation (Szeliski, 2010), SfM (Hartley and Ziserman, 2017), and formation control (Montijano et al., 2016). Solutions to this problem are traditionally computationally complex, and often mismatch features when considering more than two images (Bradski, 2000; Lowe, 2004), by either missing matches between two or more views of the same entity in the universe, or by introducing associations between separate universe entities. Multi-image correspondences allow for greater match reliability and a more accurate representation of objects in the universe. Our proposed solution leverages a relatively recent algorithm, QuickMatch (Tron et al., 2017), to quickly and reliably discover correspondences across multiple images. The experiments presented in this article benchmark QuickMatch’s performance by tracking a target object and estimating its trajectory under realistic conditions (i.e., over images with clutter, repeated structures, and poor image quality). We then propose NetMatch, an extension of QuickMatch to a distributed computation setting, which largely preserves match quality while minimizing communication across agents.

1.2. Related work

Feature matching is a basic step in many computer vision algorithms. Pairwise matching is the classical approach to this task, where feature descriptors between two images are compared based on a distance metric (e.g., Euclidean or Manhattan distance), and declared a match if this distance is below some threshold (Lowe, 2004; Vedaldi and Fulkerson, 2008). Pairwise matching is often posed as a nearest-neighbor search problem (Dollar and Zitnick, 2013; Garcia et al., 2010). This method is used in two standard algorithms, Brute Force (BF) matching (Bradski, 2000), and Fast Library for Approximate Nearest Neighbors (FLANN) matching (Muja and Lowe, 2014). Pairwise matching has difficulties consistently matching entities with repetitive structure or similar appearance (e.g., adjacent windows on a building facade) because the distance metric alone does not consider the distinctiveness (smallest distance between features from the same image) of the features. Including distinctiveness of features during matching has been shown to be beneficial (Lowe, 2004). For multi-image matching, pairwise matches scale poorly with the number of images and across multiple images, and match correspondences often do not belong to the same ground-truth object (and, hence, link together correspondences between different entities). Graph matching has also been used for pairwise matching. This approach attempts to match vertices (features) and edges (matches) simultaneously to determine better pairwise matches shown by Yan et al. (2015a) and Yan et al. (2015b), but it cannot handle the multi-image setting for the same reasons as above.

Beyond pairwise matching, a number of other approaches exist for multi-image matching (where multiple images are directly considered) that are based on optimization, graphs, and clustering. Optimization-based approaches are based upon non-convex problems where optimization constraints must often be relaxed to reliably obtain solutions (Chen et al., 2014; Leonardos et al., 2017; Oliveira et al., 2005; Pachauri et al., 2013; Yan et al., 2015a). Moreover, these approaches typically require a priori knowledge the number of objects, which is often not available, and they do not consider distinctiveness of the features. Approaches based on the explicit consideration of cycles in graphs are early predecessors to the QuickMatch algorithm and have largely been used to remove inconsistent matches as shown by Huang and Guibas (2013). Cycle consistency has also recently been used in a distributed manner to perform matching by Hu et al. (2018) and Aragues et al. (2011). Clustering can be cast as finding clusters of similar features. Algorithms such as k-means (Mackay, 2003) and spectral clustering (Ng et al., 2002) have been explored to this end, but also often require a predefined number of objects, and do not consider that a unique feature only occurs once in an image, meaning repeated structures are also often called a match.

Pairwise matching can also be approached as an unsupervised learning problem, and therefore there are a number of distributed approaches to the classic algorithms, especially for the k-nearest neighbor problem. The k-nearest neighbor problem has been extensively explored in settings such as friend suggestion on Facebook, image classification, and recommendation systems. For this reason, many parallelized approaches have been introduced by Johnson et al. (2019), Andre et al. (2017), Gieseke et al. (2014), and others. In a feature matching setting, a CUDA-based distributed approach to the BF algorithm has been proposed that is approximately 100× faster than its centralized counterpart. Beyond matching, parallel computing has also been applied to feature extraction algorithms such as SIFT (Warn et al., 2009), offering speed increases along the entire feature matching pipeline. These approaches still do not consider the multi-image matching problem, however they offer a strong motivation for distributing existing multi-image matching techniques. Machine learning techniques have also been used to extract more meaningful features that are characteristic of objects in general, which adds a layer of semantic understanding to the multi-image matching problem (Hariharan et al., 2015; Novotny et al., 2017; Wang et al., 2018).

QuickMatch and NetMatch are grounded in density-based clustering algorithms (see the work by Fukunaga and Hostetler (1975), Vedaldi and Soatto (2008), Ester et al. (1996), and Hu et al. (2017), for examples), which find clusters by estimating a non-parametric density distribution of data (Parzen, 1962; Rosenblatt, 1956). These approaches do not require prior knowledge of the number or shape of clusters, and can be modified to include feature distinctiveness by construction.

This article is an extension of previous work presented at the International Symposium of Experimental Robotics (Serlin et al., 2018) and the International Conference on Computer Vision (Tron et al., 2017). The primary extensions from this prior work are as follows.

We introduce a method for distributing the matching problem across multiple agents based on a feature space partition. While this method is agnostic to the choice of the local clustering method, in this article we propose NetMatch, which extends QuickMatch to the distributed setting.

We experimentally demonstrate that QuickMatch and NetMatch outperform traditional methods in a realistic robotic application (tracking). Moreover, we show that the performance of NetMatch is comparable with its centralized counterpart, while using minimal network communications.

1.3. Outline

The remainder of this article is structured as follows. We begin by introducing preliminary concepts for features, feature space partitions, and multi-image matching. We then present the multi-image matching problem. Next, we present first the centralized, and then decentralize solutions to the presented problem, and we detail the theory of the approach preformed in the experiment. We then present simulations to compare the centralized and distributed solutions. Finally, we present the experiment preformed and its results.

2. Preliminaries

2.1. Images and features

We consider a set of images $I = {1, \dots, i, \dots, N}$ . From each image $i \in I$ , we extract a set of features $K_{i}$ with a single feature vector denoted $x_{ik} \in R^{F}$ , where F is the dimension of the feature space (e.g., for SIFT, $F = 128$ ), i is the image index, and $k \in K_{i}$ is the feature index in that image. We denote the set of all features as $K_{I}$ and the cardinality of a set as $| K_{I} |$ .

2.2. Agents and feature space partitions

In the distributed setting, we consider a set of computational units, or agents, $A = {1, 2, \dots, a, \dots, m}$ where $a \in A$ denotes a single agent. The feature space is denoted as $Y = R^{F}$ , and is partitioned into m convex Voronoi tessellation regions, where each region is assigned to a single agent as $Y_{a} \subseteq Y$ with $⋃_{a \in A} Y_{a} = Y$ . The Voronoi partition center or seed of each $Y_{a}$ is given as $P_{a} \in R^{F}$ . We define a labeling function $ℓ : x_{ik} \to a$ that maps a feature $x_{ik}$ to a given agent depending on which $Y_{a}$ it exists in. This labeling function has a set-valued inverse, $ℓ^{- 1} : a \to {x_{ik}}$ , which returns the set of features contained in any given agent region.

2.3. Multi-image matching

The multi-image matching problem presented here is summarized from Tron et al. (2017), and is posed in the light of both cycle consistency in a graph of features and clustering. We refer the reader to the original paper for detailed proofs regarding the equivalence between these two interpretations.

We begin by assuming that the feature space $R^{F}$ has a distance metric defined as $d (x_{ik}, x_{i' k'}) \to R \geq 0$ between two features. We denote the set of all features $X = {x_{ik}}_{k \in 1 \dots N}^{i \in I}$ . When two features are a match, i.e., $x_{i_{1} k_{1}} \to x_{i_{2} k_{2}}$ , $i_{1} \neq i_{2}$ , we mean that the two features represent the same entity. Based on this type of correspondence, we define a directed graph of the matches as $G = (V, E)$ with $V = {x_{ik}}$ and $E \subset V \times V$ is the set of matches $x_{i_{1} k_{1}} \to x_{i_{2} k_{2}} \in E$ .

We define multi-image matches as subsets of the features in $X$ . We therefore define, given a subset of features $C \subset V$ , a subgraph of $G$ restricted to $C$ as $G |_{C} = (C, E')$ with $E' = {x_{i_{1} k_{1}} \to x_{i_{2} k_{2}} \in E : x_{i_{1} k_{1}}, x_{i_{2} k_{2}} \in C}$ . A graph $G = (V, E)$ is connected if for any $x_{i_{1} k_{1}}, x_{i_{n} k_{n}} \in V$ there exist a sequence $x_{i_{1} k_{1}}, x_{i_{2} k_{2}}, \dots, x_{i_{n} k_{n}}$ , called a path, such that $x_{i_{j}, k_{j}} \to x_{i_{j + 1}, k_{j + 1}} \in E \forall j \in {1, \dots, n - 1}$ , and a cycle is a path where the start and end features are the same, or $x_{i_{1} k_{1}} = x_{i_{n} k_{n}}$ . We define a clique as a graph $G$ with $x_{i_{1} k_{1}} \to x_{i_{2} k_{2}} \in E \forall x_{i_{1} k_{1}}, x_{i_{n} k_{n}} \in V$ .

A set of multi-image matches is a set $M = {C_{c}}$ of clusters $C_{c} = {x_{i_{1} k_{1}}, \dots, x_{i_{n} k_{n}}}$ , where each one corresponds to a single entity in the universe, such that:

(C1) $M$ is a partition of $X$ (i.e., each feature $x_{ik}$ appears exactly in one set $C_{c}$ );

(C2) each set $C_{c}$ has at most one feature per image;

(C3) there is an induced directed graph $G_{M} = (X, E_{M})$ of pairwise matches such that, for every $C_{c} \in M$ , the subgraph $G_{ℳ} | C_{c}$ is a clique (i.e., $G_{M}$ is a union of cliques).

Conditions (C1) and (C2) ensure that the same entity cannot appear in multiple clusters, and an entity cannot appear more than once per image. Condition (C3) requires that features from the same cluster always match.

Given the above statements, three properties are implied about $G_{M}$ .

(P1) Symmetry: $x_{i_{1} k_{1}} \to x_{i_{2} k_{2}} \in E_{M}$ implies $x_{i_{2} k_{2}} \to x_{i_{1} k_{1}} \in E_{M}$ .

(P2) Cycle constraint: Given a path $x_{i_{1} k_{1}}, \dots, x_{i_{n} k_{n}}$ in $G_{M}$ , having $i_{1} = i_{n}$ implies $k_{1} = k_{n}$ meaning the path is a cycle.

(P3) Single match: A feature cannot correspond to two different features in another image, meaning that if $x_{i_{1} k_{1}} \to x_{i_{2} k_{2}}$ and $x_{i_{1} k_{1}} \to x_{i_{2} {k'}_{2}}$ belong to $G_{M}$ then $k_{2} = k'_{2}$ .

3. Problem statement

Given a set of images $I = {1, 2, \dots, i, \dots, N}$ and a set of $K_{i}$ feature vectors, $x_{ik}$ , extracted from each image, determine matches ( $x_{i_{1} k_{1}} \leftrightarrow x_{i_{2} k_{2}} : i_{1} \neq i_{2}$ ) between features from separate images, such that matched features represent the same point in the scene, and build a set of multi-image matches $M$ .

4. Multi-image matching via QuickMatch and NetMatch

We present both a centralized and decentralized solution to the stated problem. It is important to note that this problem cannot be solved with pairwise matches alone, as they are not capable of considering cliques directly; specifically, they cannot prevent the violation of (C3). The centralized solution introduces the QuickMatch algorithm and demonstrates its effectiveness with a two-stage, offline, centralized implementation on a system of distributed ground robots and a central computer. Features are first extracted using off-the-shelf feature extraction methods (SIFT), and the features are then matched using the QuickMatch algorithm to find a given reference object. These matches are used to perform homography estimation between the reference object and the camera network to generate target trajectories. The decentralized solution, NetMatch, then augments the QuickMatch algorithm with a framework for distributing the features across multiple computational units while maintaining match consistency. The same framework could be used for other matching algorithms as well, however in this article we compare its results with the centralized solution to demonstrate near equivalence in terms of accuracy.

4.1. Feature extraction

Feature extraction aims to find and describe representative points from high-dimensional data, such as an image (Bay et al., 2008; Lowe, 2004; Rublee et al., 2011; Zhou et al., 2015). Features themselves are also high-dimensional vectors, but typically of much lower dimension than the original data. In this experiment, the SIFT feature is used, which extracts a set $K_{i}$ of 128-dimensional vectors that represent the appearance of each feature point. See Lowe (2004), and Vedaldi and Fulkerson (2008) for more details on this standard feature extraction algorithm. Other feature types can be used, and we also tested with ORB features and SURF, however SIFT was qualitatively the most reliable.

4.2. QuickMatch

The QuickMatch algorithm begins by calculating the distance between all features (here we use the Euclidean distance). For each image, the minimum distance, $σ_{i}$ , between any two features is used as the distinctiveness of features for that image. As defined above, $x_{ik}$ is a point in the high-dimensional feature space. The feature density $D (x_{ik})$ is then calculated for each point using the formula

D (x) = \sum_{ik \in K_{I}} h (x, x_{ik}; σ_{i})

(1)

h (x_{1}, x_{2}; σ_{i}) = \exp (- \frac{∥ x_{1} - x_{2} ∥}{2 σ_{i}^{2}})

(2)

σ_{i} = min_{\forall k \in i} d (x_{ik}, x_{ik'})

(3)

where h is a non-negative kernel function, and $σ_{i}$ is the distinctiveness of image i. With this feature density, the features are organized into a tree structure, with parent nodes being the nearest feature with a higher density,

T : parent (x_{ik}) = \underset{x_{i' k'} \in J}{\arg \min} d (x_{ik}, x_{i' k'})

(4)

J = {i' k' \in K_{I} : k \neq k', D (x_{i' k'}) > D (x_{ik})}

(5)

Edges are directed to parents along the gradient of feature density, and ultimately toward the center of the parent cluster or to another distant cluster. Once the tree has been constructed, edges are broken if either of two criteria are met (based on (C1)–(C3) above).

If parent and child groups have nodes from the same image (i.e., $i_{1} = i_{2}$ ).

If the edge is larger than a user-defined threshold ( $ρ$ ) times the distinctiveness $σ_{i}$ (i.e., $d (x_{ik}, parent (x_{ik})) \geq ρ σ_{i}$ ).

This method results in a forest of trees ( $M$ ), where each tree is a cluster ( $C_{c}$ ) representing a unique entity in the universe. In practice, each tree represents a point that is common among images, meaning the algorithm discovers common features among very similar objects. Feature discovery will be explored further in the results section, where groups of matching points are employed for object detection and homography transformations.

Algorithm 1. QuickMatch
Input: $K, ρ$
Output: Clusters $C_{c}$ ( $M$ )
for all $x_{ik}, x_{i' k'}$ do
Compute $h (x_{ik}, x_{i' k'}, σ)$
for all $x_{ik}$ do
Compute $D (x_{ik})$
for all $x_{ik}$ do
Compute $parent (x_{ik})$ to build $T$
for edges in T do ▹ From shortest to longest
$x_{ik} \in C_{c}, x_{i' k'} \in C_{c'}$ are ends of edge
if $C_{c} \cap C_{c'} = \emptyset$ and $d (x_{ik}, x_{i' k'}) \leq ρ \underset{σ}{\arg \min} (C_{c}, C_{c'})$ then
Merge $C_{c}, C_{c'}$
else
Remove edge
return $M$

4.3. NetMatch

The centralized QuickMatch algorithm is limited in the number of features it can process, because a single computational unit must handle all of the processing. This limitation motivates distributing the computation across many computational nodes in order to increase the number of features considered. A similar situation occurs when the data is acquired and naturally distributed across different nodes. To this end, we introduce the NetMatch algorithm, which distributes features over a network of computational nodes such that each can run QuickMatch (or another equivalent algorithm) on only a subset of the feature space.

NetMatch follows three steps.

Each node transmits its computed features in such a way that one node receives all the features belonging to a given partition of the feature space; for instance, in a one-dimensional feature space, if a node is assigned the partition element $[1, 5]$ , and another node had a feature $x = 3$ , the feature would be transmitted from the latter to the former.

Each node performs QuickMatch independently on all received features in its assigned partition, stopping, by default, at the tree-building phase (i.e., without breaking the trees into clusters).

Computational nodes identify clusters falling near the edges of the partition. We refer to such clusters as contested.

If necessary, the contested clusters are transferred to other nodes for re-processing.

At a high level, each node needs to process only a subset of the entire dataset, and, more importantly, each point is typically transferred only once, without having to flood the entire network with the entire dataset (additional transmissions are required only for edge cases, which are usually a minority, see also Table 1 for a quantitative analysis in our experiments). We present below the details of each step.

Table 1.

Comparison of QuickMatch with NetMatch by agent number on the Graffiti dataset

Number of agents	1	2	3	4	5	6	7	8	9	10	15	20	25
Compute time per agent (s)	1.37	21.53	24.62	25.64	27.72	27.93	29.40	29.06	29.29	30.00	31.21	30.59	30.68
Post-QP compute time per agent (s)	1.37	5.14	3.81	2.25	2.53	1.95	1.85	1.05	1.12	1.18	0.77	0.72	0.67
QP time per agent (s)	NA	16.39	20.81	23.39	25.18	25.98	27.53	27.46	27.76	28.59	30.08	29.82	30.00
Contested clusters (%)	0	18.21	28.62	27.51	21.43	29.24	35.42	46.34	40.13	39.78	42.87	48.71	52.16
Number of clusters found	1,320	1,313	1,314	1,278	1,265	1,264	1,281	1,258	1,269	1,264	1,246	1,259	1,309
Contested features missed (%)	NA	0.20	0.13	0.13	0.17	0.36	0.10	0.23	0.09	0.37	0.43	0.76	0.90

4.3.1. Feature space partitioning

The goal of partitioning the feature space is to split the computation load of QuickMatch among m agents, ultimately allowing for more features to be matched, while approximately maintaining multi-image match quality. We begin by partitioning the feature space and send features to their nearest region center for processing. This initial partition can split a cluster between multiple regions. Finding and merging these split clusters will be the focus of the following sections.

We modify the labeling function presented in the preliminary section to denote the round being considered as $ℓ (x_{ik}, r) \to a$ and $ℓ^{- 1} (a, r) \to {x_{ik}}$ . In practice, we consider only $r = {0, 1}$ , where $0$ is the initial partitioning, and $1$ is the labeling after partition reassignment because only one round is required by the NetMatch algorithm. The algorithm begins with a naive partitioning of the features

ℓ (x_{ik}, 0) = \underset{a \in A}{\arg \min} d (x_{ik}, P_{a}) \forall x_{ik}, ik \in K_{I}

(6)

where the set of Voronoi seeds $P_{A} = {P_{a}}_{a \in A}$ can be found using, for instance, an unsupervised k-means algorithm (MacQueen, 1967), either on a training set, or on the actual data with a distributed version of k-means (the latter, under our all-to-all communication model, can be implemented with a small amount of communications and, more importantly, without sharing all the data across all the nodes).

As an alternative, in larger datasets, and in a more distributed setting, the choice of $P_{A}$ can be done at random (using a common seed for a pseudorandom generator, meaning all of the nodes can agree on the same set of $P_{A}$ without explicit communication); however, the random partition seeding procedure can have an effect on the overall match quality (see also Figure 7 for a quantitative analysis with our experimental setup).

Example 1. Given a set of features shown in Figure 1 (in gray), we generate three Voronoi seeds, $P_{0}, P_{1}, P_{2}$ (in blue) as shown. The Voronoi partition boundary is represented by the dashed lines.

Fig. 1.

Illustration of Voronoi boundary distance problem.

4.3.2. Tree building

Once $P_{A}$ is determined, we send features to their respective agents based on $ℓ (x_{ik}, 0)$ , and at each agent. We then calculate the density of each feature as in Equation (1); unlike QuickMatch, however, we propose to explicitly require the use of a density kernel with finite support, in order to limit interaction of features across the partition boundaries. Specifically, in our implementation we use a truncated quadratic kernel,

h (x_{1}, x_{2}; σ) = {\begin{matrix} - \frac{{∥ x_{1} - x_{2} ∥}^{2}}{σ} + 1 & ∥ x_{1} - x_{2} ∥ < σ \\ 0 & otherwise \end{matrix}

(7)

Once the density is calculated, we build a tree $T_{a}$ for each agent $a \in A$ according to

T_{a} : parent (x_{ik}) = \underset{x_{i' k'} \in J}{\arg \min} d (x_{ik}, x_{i' k'})

(8)

J = {i' k' : k \neq k', D (x_{i' k'}) > D (x_{ik}), x_{ik} \in Y_{a}}

(9)

Similarly to the distinctiveness in QuickMatch, we now introduce an agent-specific distinctiveness metric, defined as

σ_{a} = max_{p \in K_{I} \cap ℓ^{- 1} (a)} (σ_{p}) \forall i \in ℓ^{- 1} (a, 0)

(10)

where

σ_{p} = d (x_{p}, parent (x_{p}))

(11)

(in practice, p indexes the features in a given partition).

We next use ${σ_{a}}$ to determine whether individual features may belong to clusters artificially split by the boundaries of the Voronoi partition, as detailed in the next step.

Note (NetMatch Lite). If the NetMatch algorithm is stopped at this point, and the trees built ( $T_{a}$ ) are broken based on the rules in QuickMatch above, this algorithm is denoted NetMatch Lite. We use this algorithm in our experiments to benchmark the improvements gained by evaluating contested features (i.e., the next steps) in the full NetMatch algorithm. The NetMatch Lite algorithm can be considered as naively running k-means to partition the feature space, and then running QuickMatch individually in each partition.

Example 1 (Continued). With the features (in gray) in Figure 1, three trees are constructed (one in each partition) based on the density of features. This means, in this example, the features nearest the seed points would have a higher density and would therefore be closer to the root of the tree.

4.3.3. Partition boundary distance

To find the minimum distance between a feature and the boundaries that are generated by the Voronoi partition, we formulate the following problem: Consider a set of root nodes ${P_{0}, \dots, P_{m}}$ and their corresponding regions ${Y_{0}, \dots, Y_{m}}$ , given a test node $x_{t} \in Y_{t}$ and evaluation node $P_{e} \in Y_{e}$ , where $Y_{e} \subset Y \ Y_{t}$ , find the minimum distance $d_{\min}$ , and the corresponding hyper-plane, to $x_{t}$ . A graphical illustration of this problem can be found in Figure 1.

Given the test feature node $x_{t}$ , its current root node $P_{t}$ and the root node that we want to evaluate $P_{e}$ , we formulate a quadratic program (QP) as follows:

\begin{matrix} min_{x \in R^{F}} & {(x_{t} - x)}^{T} (x_{t} - x) \\ s . t . & {(\frac{P_{e} - P_{t}}{∥ P_{e} - P_{t} ∥})}^{T} (x - P_{t}) - \frac{d (P_{t}, P_{e})}{2} \geq 0 \end{matrix}

(12)

where $d (P_{t}, P_{e})$ is the pairwise distance between $P_{e}$ and $P_{t}$ . We obtain the closet point $x_{\min}$ on the Voronoi boundary that has the minimum distance to $x_{t}$ . By iterating through all possible root nodes $P_{e}, \forall e \neq t$ , we find the global minimum distance and its corresponding root node and its partition index

d_{ika} = d (x_{ika}, x_{\min})

(13)

The boundary distance calculation could be simplified by considering only the partition boundary between the two nearest points in $P_{A}$ since the boundary is a Voronoi partition (however, this optimization is not present in our current implementation for the experiments in Sections 5 and 6).

Example 1 (Continued). Feature $x_{t}$ in Figure 1 is singled out to determine its minimum distance to a boundary. That distance, $d_{\min}$ is shown, along with the nearest point on the boundary $x_{\min}$ , which is used for computing $d_{ika}$ .

4.3.4. Contested feature re-assignment

Once we are able to determine the distance between a feature and the partition boundary, we can then determine the contested features in each region. These features are those that may require reassignment to another partition’s tree and are in danger of being mismatched due to the feature space partitions splitting their cluster. This contested set of features $S_{a}$ is defined as

\begin{matrix} S_{a} = {a' \in A \ a : (d_{ika} + d_{aa'}) < σ_{p}} \\ \forall x_{i, k} \in ℓ^{- 1} (a, r) \forall a \in A \end{matrix}

(14)

where

d_{aa'} = min_{x_{ik} \in ℓ^{- 1} (a', 0)} d_{ika'}

(15)

\begin{matrix} d_{ika'} = d (x_{ika'}, Y_{a}) \\ a' \neq ℓ (x_{ik}, 0) \end{matrix}

(16)

where $d_{ika}$ is the distance between each feature $x_{ika}$ and the boundary of the partition of $a'$ , and $d_{aa'}$ is the minimum distance between the feature in $a'$ and the $Y_{a}$ boundary.

Intuitively, if the inequality in Equation (14) is true, there could be a feature in $a'$ that could belong to the same cluster as $x_{ik}$ . Note that this test is conservative, because it considers a worst-case analysis, but it also requires only a single value $d_{aa'}$ be communicated between agents. Figure 2 illustrates an example of this check; in this case, $x_{ika}$ is not contested with $a'$ because the inequality in Equation (14) is false (i.e., $(d_{ika} + d_{aa'}) \geq σ_{p}$ ).

Fig. 2.

Example of a comparison to determine whether a point is contested.

Example 1 (Continued). For each feature, we check the inequality $(d_{ika} + d_{aa'}) \geq σ_{p}$ , which indicates whether there could be a nearer point on the other side of the boundary than to the feature’s current parent feature. This comparison is illustrated in Figure 2.

Once the contested features are determined, we break the agent tree into a forest of trees (clusters) in the same manner as QuickMatch, except that the distinctiveness $σ_{i}$ is calculated over only the features in the agent’s own partition.

With the clusters formed, each agent then determines which clusters have contested points. Formally, we check if $S_{a} \cap C_{c} \neq \emptyset$ (i.e., the intersection between a cluster and the contested points is not empty). For each cluster with a contested point we determine the minimum partition index among the contested points, and then send the cluster to that agent, now denoted as $a'$ .

Upon arrival at $a'$ , that agent checks whether

(\underset{x_{ik} \in a'}{\arg \min} d (x_{ik}, x_{i' k'}) \forall x_{i' k'} \in C_{c}) \in C_{a'}

(17)

where $C_{c}$ is the newly acquired cluster. If the nearest point to $C_{c} \in Y_{a'}$ is also contested, both $C_{c}$ and the cluster containing that nearest point from Equation (17) ( $C_{c'}$ ) are sent to the agent with the lowest contested partition index. This lower index requirement prevents switching incomplete clusters repeatedly between agents. This process is performed in the order of decreasing agent index, reducing communication requirements, and the transfer of clusters to at most once per agent.

Example 1 (Continued). Once the partition tree is broken into a forest, for each new cluster tree we check whether any features in that cluster violated the $(d_{ika} + d_{aa'}) \geq σ_{p}$ inequality. If any feature has, the cluster is marked for transfer and sent to the nearest agent of lower index. Each agent will do the same, meaning the transferred features can only go to a lower-indexed agent.

Upon receiving a new cluster, an agent checks whether the cluster contains any points that it would consider contested. If so, the agent sends the whole cluster to the nearest agent of lower index, if one exists.

The final step in reassignment is to rerun the tree building and breaking routines from above to rebuild all of the clusters. Although this is computationally intensive, it does not require any further inter-agent communication. It was also found to perform better than only reassigning the contested points to the appropriate tree and breaking it accordingly. For a simple example of the complete NetMatch framework, see the Appendix.

5. Simulations

This article looks to compare primarily QuickMatch with NetMatch in simulation, and with off-the-shelf matching tools in an experimental setting. For a detailed comparison of the performance of QuickMatch with other state-of-the-art matching methods (particularly multi-image matching methods), see Tron et al. (2017).

We begin with an illustrative example of NetMatch on a synthetic two-dimensional dataset shown in Figures 3 and 8. Our test case includes 4 agents and 250 feature points. There are 25 underlying clusters, generated with random Gaussian distributions around 25 evenly spaced points, each with 10 sample features. Figure 3 illustrates many of the steps in Algorithm 2. Figure 3(a) shows the $ℓ^{- 1} (a, 0)$ labels for all of the agents. The Voronoi partition seeds for the partition are shown as triangles. Note that the clusters along the boundaries have features of different colors, meaning these clusters would be split and improperly matched with just the naive partitioning approach.

Fig. 3.

(a) Synthetic dataset features with naive k-means partition assignments. (b) Synthetic dataset with $x_{\min}$ boundary points plotted to highlight partition. (c) QuickMatch solution for each agent with k-means partition. (d) Initial proposed feature switches of contested clusters. Note the values on the highlighted features denote the proposed $a'$ partition index. (e) Final proposed switch assignments after partitioning section of NetMatch is complete.

The partition generated by the Voronoi seeds can be seen in Figure 3(b). Here, the $x_{\min}$ point is plotted for each feature. It can be seen that at least three of the clusters are split by this partition. This is further shown in Figure 3(c), which shows the result of QuickMatch being run on each agent partition individually. Each marker style represents a different cluster membership (i.e., points with the same marker belong to the same cluster). Most notably, the central cluster is split into three clusters.

Algorithm 2. NetMatch
Input: $K, ρ, \| A \|$
Output: Clusters $C_{c}$
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Compute $P_{A}$ given K
Compute $ℓ (x_{ik}, 0)$ ▹ Equation (6)
for all $a \in A$ do
for all $x_{ika}, x_{i' k' a}$ do
Compute $h (x_{ika}, x_{i' k' a}, σ_{i})$ ▹ Equation (7)
for all $x_{ika}$ do
Compute $D (x_{ika})$ ▹ Equation (1)
for all $x_{ik}$ do
Compute $parent (x_{ika})$ to build $T_{a}$ ▹ Equation (8)
for all $x_{ik}$ do
Compute $d_{ika}$ ▹ Equation (13)
for all $a \in A$ do
for all $a' \in A, a' \neq a$ do
Compute $d_{aa'}$ ▹ Equation (15)
for all $x_{ika}$ do
if $d_{ijk} + d_{aa'} \leq σ_{p}$ then
$x_{ika} \in S_{a}$ ▹ Equation (14)
for all edges in $T_{a}$ do
if $C_{c} \cap C_{c'} = \emptyset$ and $d (x_{ika}, x_{i' k' a}) \leq ρ \underset{σ}{\arg \min} (C_{c}, C_{c'})$ then
Merge $C_{c}, C_{c'}$
else
Remove edge from $T_{a}$
for all $C_{c} \in Y_{a}$ do
if $C_{c} \cap S_{a} \neq \emptyset$ then
Compute $min index (a')$ of $C_{c} \cap S_{a}$
if $index (a') < index (a)$ then
Send $C_{c}$ to $a'$
for all $a \in A$ do ▹ From high to low {\rm index}
if $C_{c}$ received from any $a'$ then
if $(\underset{x_{ika} \in a}{\arg \min} d (x_{ik}, x_{i' k'}) \forall x_{i' k'} \in C_{c}) \in S_{a}$ then
Compute $min index (a')$ of $C_{c} \cap S_{a}$
if $index (a') < index (a)$ then
Send $C_{c}$ to $a'$
for all $a \in A$ do
for all $x_{ika}, x_{i' k' a}$ do
Compute $h (x_{ika}, x_{i' k' a}, σ_{i})$ ▹ Equation (7)
for all $x_{ika}$ do
Compute $D (x_{ika})$ ▹ Equation (1)
for all $x_{ik}$ do
Compute $parent (x_{ika})$ to build $T_{a}$ ▹ Equation (8)
for all edges in $T_{a}$ do
if $C_{c} \cap C_{c'} = \emptyset$ and $d (x_{ika}, x_{i' k' a}) \leq ρ \underset{σ}{\arg \min} (C_{c}, C_{c'})$ then
Merge $C_{c}, C_{c'}$
else
Remove edge from $T_{a}$
return All $C_{c} (M)$

Figure 3(d) shows the clusters staged for initial agent switching. Each feature that is switched has a label of where it is being sent. Note that the central cluster has multiple labels, meaning even after the switch, the cluster will be segmented. In addition, note that whole clusters are staged for transfer, and that the contested region is very conservative, since clusters even somewhat far from the boundary are being swapped. After the agents are switched, the agents check whether the switched clusters are nearest to other contested points. Figure 3(e) shows the reassignment of the clusters after this check is performed. Note that after the first swap, the points in the central cluster are all moving toward $a_{0}$ . This highlights one drawback to this approach, if the boundaries are drawn such that higher index agents have many contested clusters, the lower index agents end up getting assigned more features.

6. Experiments

6.1. Homography and localization

Homography is a projective transformation between two perspective images of a planar scene that can also be used to determine the relative pose of an object with respect to a given reference image. The study of homography and localization of images and objects is textbook material; however, a brief overview is provided below as this process is used to experimentally test QuickMatch’s performance. More information on both homography and object localization can be found in Hartley and Zisserman (2017) and Sankaranarayanan et al. (2008).

Given the image coordinates $\tilde{x}$ (expressed in homogeneous coordinates) of a point belonging to a planar surface as seen in a reference view, the image of the same point in a novel view can be found given the homography matrix H as $H \tilde{x}$ (again, expressed using homogeneous coordinates). The $H$ matrix can be estimated with a set of known relative points (or matched features) between two views. To improve the estimate of $H$ , random sample consensus (RANSAC) is used to remove match outliers by randomly sampling the matches, finding a fit of the data, and then removing any matches that fall outside of a user defined region (see Szeliski, 2010).

Given $H$ , it is also possible to recover the relative pose between the two views; this can then be used to indirectly localize (recover the translation and rotation) of an approximately planar object in a relative coordinate system up to a distance scale factor, as shown in Figure 4(a). Given some known size of the target object (e.g., height), the scale ambiguity can be resolved, recovering the full object relative position. In our applications, where each image is taken together with images from other cameras in the network, and where the pose of each camera is known, the target object can be accurately positioned in the global reference frame, allowing for the generation of a target’s trajectory (e.g., Figure 7).

Fig. 4.

(a) Overhead view of experimental area with trajectory of the target object, position of the robots, and the approximate field of view for the camera network (shown in yellow). (b) Prospective view of experimental area with modified iRobot Create2 platform, target object, and overhead OptiTrack motion capture system.

As localization using homography is limited to approximately planar surfaces, multiple reference images (as used here) are required to identify different sides of an object. Second, despite the use of robust estimation (RANSAC), inaccurate matches are still possible, resulting in outlier measurements in distance and bearing. These inaccuracies are amplified by the sensitivity to errors in the estimate of the object’s height when calculating the target distance. To account for these errors, in practice, multiple measurements can be used to estimate each position, and then a filter can be used to smooth the target’s trajectory (e.g., a Kalman filter).

6.2. Experimental setup

The experiment consists of a team of five iRobot Create2 ground robots, each with a forward-facing camera, distributed throughout the experimental area shown in Figure 4. Each camera has a $62 \times 48$ field of view, and takes a $640 \times 480$ px image at 2 Hz. Through the center of the area, the target object is driven along the trajectory shown in Figure 4(a) over approximately 30 seconds. All cameras are triggered simultaneously and the images are sent to a central computer for feature extraction and matching. The central computer has an Intel i7-7800x 3.5 GHz processor, and runs Ubuntu 16.04 LTS and ROS Kinetic. Features are extracted using SIFT with an octave layer of 6, a contrast threshold of 0.10, an edge threshold of 15, and sigma of 1.0. The matches from QuickMatch (using $ρ = 1.1$ ) and NetMatch (using $ρ = 1.1$ and $m = 6$ ) are used to determine which cameras observe the target object at each time step, based on the number of matches with a target image (in this experiment 10 matches are required). The matches between each reference images and the current images are used to determine the homography between them, using RANSAC with a threshold of 10.0. The homography is used to generate a bounding box around the target object using a perspective transformation on the target image corners. The relationship between pixel height of this box and distance from the camera is calibrated beforehand using an object of known size (in this case a checkerboard pattern of known dimensions). The localization points are recorded to build a target trajectory, which is then compared with ground-truth measurements from an OptiTrack motion capture system (Figure 4(b)). In an offline setting, the recorded images are fed into the NetMatch algorithm to compare its performance to QuickMatch in the experimental case. Once the feature matches are generated, the same pipeline is used from above to generate target positions.

7. Results

QuickMatch and NetMatch are evaluated in two ways: pure matching performance, and in the context of a target localization application. The algorithms are first compared with standard matching algorithms in the OpenCV Software Package (Bradski, 2000), BF, and FLANN. These algorithms use the Euclidean distance metric and a threshold match distance of 0.75 (Bradski, 2000; Lowe, 2004). Unlike QuickMatch and NetMatch, these algorithms cannot consider matches across more than two images but do have very low execution times.

QuickMatch is implemented in Python and takes 5.6 seconds to find matches between 6,254 SIFT features (from 115 images), whereas BF and FLANN are both implemented in C++, and both take approximately 0.05 seconds to find the matches between the reference image features, and the same 6,254 features. This time difference arises from two factors: the inherently slower run time of Python compared with C++ (Fourment and Gillings, 2008), and the extra comparisons done by QuickMatch to solve the entire multi-match problem. If BF and FLANN compared all images with all other images combinatorially (as QuickMatch implicitly does) their computation times would be $~ 5.75$ s, which is comparable with QuickMatch’s slower Python implementation. This time also does not account for the post-processing time necessary to reconcile inconsistent matches from both BF and FLANN, which is not required in QuickMatch. NetMatch’s computation time is typically one to two orders of magnitude larger than QuickMatch because it is currently being run as a single thread process.

7.1. Precision versus recall

Although QuickMatch and NetMatch are slower, they outperform both BF and FLANN in the number of matches correctly found, and generally in terms of precision versus recall (PR) and precision–recall area under the curve (PRAUC), which are common metrics for evaluating matching algorithms (Ting, 2011). Figure 6(a) shows the precision (fraction of correctly matched images) versus recall (fraction of possible matches found) curves for QuickMatch, NetMatch, BF, and FLANN. For any recall level, QuickMatch and NetMatch maintain a higher precision level than either BF or FLANN. Both BF and FLANN have terminations before a recall of 0.9 because at that level of discrimination, they are unable to find any matches in the data. QuickMatch and NetMatch on the other hand are still able to find some matches. These curves are non-monotonic because mismatched features appear at a higher rate than correctly matched features at higher thresholds. Note that NetMatch has higher precision than QuickMatch at lower recall levels; we hypothesize that this is due to the fact that the introduction of partitions in the dataset reduces the interference between modes of different clusters in the feature density. At higher recalls, however, the partitioning limits its ability to find all of the possible matches. PRAUC is a threshold agnostic metric used for comparing overall performance of matching algorithms (Ting, 2011). In terms of PRAUC, QuickMatch achieves 0.64, whereas BF and FLANN reach 0.49 and 0.45, respectively. The overall increase in precision stems for QuickMatch’s ability to consider more instances of the reference object, by matching cycles of features across multiple images. It is therefore able to find the reference object not only more consistently, but with many more matched features. An example of these matches is shown in Figure 5.

Fig. 5.

(a) Example image matches between the reference object image (left) and an experimental image (right): Circles represent features, and lines indicate matches. (b) Homography and localization of car with prospective transform of bounding box.

NetMatch and NetMatch Lite (see the note in Section 4.3.2 for details) are also evaluated on the same dataset as described previously. There are variables in NetMatch (random seed positions and number of agents) that result in a variable PR curve and therefore PRAUC. Figure 6(b) shows the effect of these variables on the PRAUC curve. First, note that as the number of computational agents increases, the PRAUC for NetMatch decreases slightly due to losses from split clusters. Second, note that the error bars (which represent one standard deviation of PRAUC over 10 trials with different partition seeds) show that the random seed position has a large effect on match quality. Specifically, a good partition can increase the PRAUC by as much as 13% in the best case and conversely a poor partition can decrease the PRAUC by as much as 11% compared with QuickMatch. The NetMatch Lite formulation (removing the contested feature swap) does not greatly reduce PRAUC (on average 3.5%) because many matches are still found (since the percentage of contested clusters is small). However, the match quality of NetMatch Lite is shown in the following to be much worse than NetMatch in an experimental setting (see Section 6). In addition, the standard deviations show that NetMatch Lite has a generally lower PRAUC and may be more susceptible to poor Voronoi seeding.

Fig. 6.

(a) Precision versus recall curves for the QuickMatch, NetMatch ( $m = 6$ ), BF, and FLANN algorithms. All algorithms are run on the same feature vectors. A match is considered to exist if the number of matched features is above a threshold. NetMatch is shown using six computation units. (b) PRAUC calculation for NetMatch and NetMatch Lite with increasing computation unit distribution. Error bars represent one standard deviation values over 10 trials.

7.2. Homography and localization

In order to further demonstrate the utility of the algorithms, matches are used to localize a target object in relation to the camera network, and then estimate its global trajectory. This was done using all of the above algorithms with again an identical set of SIFT features. QuickMatch and NetMatch consider multi-image matches between the set of target images and the set of five robot images at each time step, whereas BF and FLANN consider matches between each target image and the robot image individually. Once feature matches are generated, RANSAC is used to estimate the homography matrix $H$ for each pair of images while also removing outliers from the matches. The homography between the reference image and each robot image is used to generate a bounding box around the target in the robot image as shown in Figure 5 (b). This bounding box, given a known camera calibration, provides bearing and height information for the target. The target height is known and is used to find the relate distance to the target with the bounding box height. With these two values, a distance and a bearing, the object can be localized with respect to each robot.

These steps are performed using the match data from each of the above algorithms. Figure 7(a)–(e) show the results of the localization estimation for each algorithm. Red points are estimate target poses for each time step, blue points denote the ground truth measurements, black octagons are the camera positions, and the green regions are the one standard deviation error between all localization estimates at each time step. The localization error was found by taking the absolute distance between the estimated and ground-truth position at each time step. The average error is partly a factor of object height estimation, and the variance is indicative of the match quality. QuickMatch had an error of $0.2118 \pm 0.4254$ meters, BF had an error of $0.2349 \pm 0.4027$ meters, NetMatch had an error of $0.1167 \pm 0.8461$ meters, FLANN had an error of $0.6232 \pm 1.1722$ meters, and NetMatch Lite had an error of $0.1324 \pm 1.7818$ meters. QuickMatch outperforms NetMatch, NetMatch Lite, and FLANN in terms of variance, which is indicative of its higher match quality. The BF matcher also performs well and maintains a low variance, but does not find as many matches. FLANN is the worst performing of the standard algorithms, and has a number of extremely erroneous estimates. In general, monocular camera distance measurements are very sensitive to match errors, meaning target localization error is an indirect method for testing the overall accuracy of each method. Figure 7(c) shows a histogram of the localization error, which is found by comparing the localization estimate to the ground-truth pose at each time step. The histogram makes it clear that QuickMatch maintains a higher number of accurate matches and has a small number of highly erroneous estimates. It also shows that NetMatch has a large number of estimates around zero, and that NetMatch Lite does very poorly when estimating the target object’s position. In practical applications, a Kalman filter would be employed to smooth the estimates, but the values are left unaltered here to demonstrate the algorithm’s output.

Fig. 7.

(a) QuickMatch trajectory estimates. (b) NetMatch trajectory estimates with six agents. (c) Histogram of estimate error for each algorithm. (d) FLANN trajectory estimates. (e) BruteForce trajectory estimates. (f) NetMatch Lite trajectory estimates.

7.3. Centralized versus distributed comparison

To test the difference between the distributed and centralized approaches, we look at how many clusters are split by naive partitions in the feature space to determine whether the distribution scheme above is even warranted. This test is performed on images from the Graffiti dataset (http://www.robots.ox.ac.uk/~vgg/data/data-aff.htm), in order to simulate realistic conditions where the ground truth is unknown. Given the clusters of features determined by QuickMatch, $C_{c} \in M$ , we define the count of each agent membership in the cluster as

q_{a} (C_{c}) = | C_{c} \cap ℓ^{- 1} (a) |

(18)

where $q_{a}$ is the number of features in $C_{c}$ with label a. With this we can define the split quality Q of a cluster $C_{c}$ as

Q (C_{c}) = \frac{max_{a \in A} (q_{a})}{| C_{c} |}

(19)

With this quality metric, we can quantify the number of contested clusters in a given partition as $C_{c} : Q (C_{c}) < 1$ and the percent of contested clusters as

p_{contested} = \frac{| C_{c} \in M : Q (C_{c}) < 1 |}{| C_{c} |}

(20)

With this metric $p_{contested}$ , we evaluate both two methods for creating the initial seeds for the Voronoi partition, as well as the ability for NetMatch to find and appropriately reassign contested clusters. For the synthetic dataset, the ground truth is known, however for the Graffiti dataset, we assume that the QuickMatch clusters are the ground truth.

To determine how many contested features, and hence split clusters, are missed by NetMatch, we calculate the following precision

p_{split} = \frac{| S_{A} |}{\sum_{C_{c} : Q (C_{c}) < 1} | C_{c} |}

(21)

where $S_{A}$ is the set of all contested features in $K_{I}$ . In other words, we can consider NetMatch to be in part as a classifier that needs to detect which features are contested; then, $p_{split}$ represents the precision of the classifier (the number of features that are declared as contested over the number of features that ought to be declared).

The results of these tests are listed in Table 1. The first column in Table 1 lists the results of the centralized QuickMatch algorithm. Row five shows that as the number of agents increases, intuitively, the percentage of contested clusters also increases. At the same time, however, the post-QP computation time decreases as agent number decreases, because each agent has to work on considerably fewer features. The largest computational requirement in NetMatch is finding the boundary distances with the QP. The time reported for this is per-agent, however it is worth noting the implementation of this QP is sub-optimal. In the QP formulation, we consider each partition individually, meaning we solve m QPs for each feature. In the future, we plan to reduce this to combining all of the constraints to solve a single QP and this is a focus of future work. One key aspect of NetMatch to note is that it finds around 99% of all contested points, meaning it is very good at finding split clusters. One interesting result of NetMatch is that it matches the features into fewer, larger clusters owing to the use of its finite density kernel. This is a counter-intuitive result and will be a subject of future study for this algorithm.

The final clustering results from the synthetic dataset in the Simulation section are shown in Figure 8. Figure 8(a) shows the result of the centralized QuickMatch algorithm run on the synthetic data, whereas Figure 8(b) shows the final result from NetMatch. Note that many of the features from the higher index agents have been shifted to the lower index agents, but ultimately each cluster has features belonging to only one agent. Ultimately both algorithms are able to cluster all of the features correctly.

Fig. 8.

(a) Clustering result from the centralized QuickMatch algorithm on the synthetic dataset. Feature color denotes cluster membership. (b) Clustering result from the NetMatch algorithm on the synthetic dataset. Feature color denotes agent membership.

7.4. Feature discovery

The QuickMatch and NetMatch algorithms implicitly discovers common features among images by creating clusters of similar features. These clusters correspond to specific locations in the universe, and therefore can be used to find both targets and landmarks across images. Landmarks, although not used in this article, are points that occur commonly across all images (except when occluded), and are useful for multi-agent localization tasks. In the experimental images collected, landmarks were the clusters with the largest number of features, because many of this images did not contain the target object. An example landmark cluster is shown in Figure 9(a). Features belonging to the target object are generally smaller than the landmark clusters, but can still be extracted, and show key features of the target. Figure 9(b) shows one such cluster, which is the front hood of the car model. Feature discovery is one attribute of QuickMatch and NetMatch that does not exist in either BF or FLANN and can be useful for discerning what features are most descriptive of images from the network.

Fig. 9.

(a) Landmark feature cluster. (b) Target feature cluster.

8. Conclusion

This article has highlighted the utility of QuickMatch multi-image matching for object matching and presented the NetMatch algorithm. QuickMatch was able to find many more object feature matches than standard methods by considering matches across all images, not just pairwise matches. The presented experiment has tested the QuickMatch algorithm in an experimental setting with realistic conditions, and shown that multi-image matching is superior to standard methods at matching the reference object (even as it enters and exits images across the entire camera network). QuickMatch has also been tested with a target object localization and again outperforms both the BF and FLANN algorithms. Beyond testing QuickMatch, we have demonstrated the capabilities of NetMatch as a distributed solution to the same problem. We have also demonstrated QuickMatch’s feature discovery ability by showing a characteristic landmark and target feature cluster from the test images. This approach is the precursor to an online and decentralized approach. Our future work will focus on the online version of object discovery and localization and multi-camera homography. We also plan to decrease NetMatch’s QP implementation computation time. Overall, QuickMatch has been shown to be a versatile multi-feature matching algorithm that outperforms standard pairwise matching algorithms, and NetMatch offers an avenue for the QuickMatch framework to handle a large volume of features with minimal inter-agent communication.

Footnotes

Appendix

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation (Grant Nos. NRI-1734454 and IIS-1717656).

ORCID iD

Zachary Serlin

Supplemental material

Supplemental material for this article is available online.

References

Andre

Kermarrec

Le Scouarnec

(2017) Accelerated nearest neighbor search with quick ADC. In: The 2017 ACM on International Conference on Multimedia Retrieval. New York: ACM Press, pp. 159–166.

Aragues

Montijano

Sagues

(2011) Consistent data association in multi-robot systems with limited communications. In: Robotics: Science and Systems, pp. 97–104.

Bay

Ess

Tuytelaars

Gool

(2008) Speededup robust features (SURF). Computer Vision and Image Understanding 110(3): 346–359.

Bradski

(2000) The OpenCV Library. Dr. Dobb’s Journal of Software Tools. https://github.com/opencv/opencv/wiki/CiteOpenCV

Chen

Guibas

Huang

(2014) Near-optimal joint object matching via convex relaxation. In: International Conference on Machine Learning.

Cunningham

Wurm

Burgard

Dellaert

(2012) Fully distributed scalable smoothing and mapping with robust multi-robot data association. In: IEEE International Conference on Robotics and Automation, pp. 1093–1100.

Dollar

Zitnick

(2013) Structured forests for fast edge detection. In: International Conference on Computer Vision.

Ester

Kriegel

Sander

(1996) A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. Portland, OR: AAAI Press, pp. 226–231.

Fourment

Gillings

(2008) A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9: 82.

10.

Fukunaga

Hostetler

(1975) The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory 21(1): 32–40.

11.

Garcia

Debreuve

Nielsen

Barlaud

(2010) K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching. In: IEEE International Conference on Image Processing, Hong Kong, 2010, pp. 3757–3760.

12.

Gieseke

Heinermann

Oancea

Igel

(2014) Buffer k-d trees: Processing massive nearest neighbor queries on GPUs. In: 31st International Conference on International Conference on Machine Learning (ICML’14), Vol. 32. JMLR.org I-172–I-180.

13.

Hariharan

Arbelaez

Girshick

Malik

(2015) Hyper-columns for object segmentation and fine-grained localization. In: IEEE Conference on Computer Vision and Pattern Recognition.

14.

Hartley

Zisserman

(2017) Multiple View Geometry in Computer Vision. 2nd ed. Cambridge: Cambridge University Press.

15.

Huang

Thibert

Guibas

(2018) Distributable consistent multi-object matching. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 2463–2471.

16.

Huang

Qiu

Chen

Chu

(2017) PS-DBSCAN: An efficient parallel DBSCAN algorithm based on Platform of AI (PAI). arXiv preprint arXiv:1711.01034.

17.

Huang

Guibas

(2013) Consistent shape maps via semidefinite programming. Computer Graphics Forum 32(5): 177–186.

18.

Johnson

Douze

Jégou

(2019) Billion-scale similarity search with GPUs. IEEE Transactions on Big Data. https://ieeexplore.ieee.org/document/8733051

19.

Leonardos

Zhou

Daniilidis

(2017) Distributed consistent data association via permutation synchronization. In: IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017, pp. 2645–2652.

20.

Lowe

(2004) Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2): 91–110.

21.

MacKay

(2003) Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press.

22.

MacQueen

(1967) Some methods for classification and analysis of multivariate observations. In: The Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. Berkeley, CA: University of California Press, pp. 281–297.

23.

Montijano

Cristofalo

Zhou

Schwager

Sagues

(2016) Vision-based distributed formation control without an external positioning system. IEEE Transactions on Robotics 32(2): 339–351.

24.

Muja

Lowe

(2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(11): 2227–2240.

25.

Jordan

Weiss

(2002) On spectral clustering: Analysis and an algorithm. Neural Information Processing Systems 2: 849–856.

26.

Novotny

Larlus

Vedaldi

(2017) Anchornet: A weakly supervised network to learn geometry-sensitive features for semantic matching. In: IEEE International Conference on Computer Vision and Pattern Recognition.

27.

Oliveira

Costeira

Xavier

(2005) Optimal point correspondence through the use of rank constraints. In: IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 1016–1021.

28.

Pachauri

Kondor

Singh

(2013) Solving the multi-way matching problem by permutation synchronization. In: Twenty-seventh Conference on Neural Information Processing Systems.

29.

Parzen

(1962) On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33(3): 1065–1076.

30.

Rosenblatt

(1956) Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics 27(3): 832–837.

31.

Rublee

Rabaud

Konolige

Bradski

(2011) ORB: An efficient alternative to SIFT or SURF. In: IEEE International Conference on Computer Vision, Barcelona, Spain, pp. 2564–2571.

32.

Sankaranarayanan

Veeraraghavan

Chellappa

(2008) Object detection, tracking and recognition for multiple smart cameras. Proceedings of the IEEE 96(10): 1606–1624.

33.

Serlin

Sookraj

Belta

Tron

(2018) Consistent multi-robot object matching via QuickMatch. In: The International Symposium of Experimental Robotics.

34.

Szeliski

(2010) Computer Vision: Algorithms and Applications. Springer Science & Business Media.

35.

Ting

(2011) Precision and recall. In: Sammut

Webb

(eds) Encyclopedia of Machine Learning. Boston, MA: Springer.

36.

Tron

Zhou

Esteves

Daniilidis

(2017) Fast multi-image matching via density-based clustering. In: The IEEE International Conference on Computer Vision.

37.

Vedaldi

Fulkerson

(2008) VLFeat: An open and portable library of computer vision algorithms. http://www.vlfeat.org/.

38.

Vedaldi

Soatto

(2008) Quick shift and kernel methods for mode seeking. In: IEEE European Conference on Computer Vision. Berlin: Springer, pp. 705–718.

39.

Wang

Zhou

Daniilidis

(2018) Multi-image semantic matching by mining consistent features. In: IEEE International Conference on Computer Vision and Pattern Recognition.

40.

Warn

Emeneker

Cothren

Apon

(2009) Accelerating SIFT on parallel architectures. In: IEEE International Conference on Cluster Computing and Workshops, New Orleans, LA, 2009, pp. 1–4.

41.

Yan

Cho

Zha

Yang

Chu

(2015a) Multi-graph matching via affinity optimization with graduated consistency regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://ieeexplore.ieee.org/document/7254216

42.

Yan

Wang

Zha

Yang

Chu

(2015b) Consistency driven alternating optimization for multigraph matching: A unified approach. IEEE Transactions on Image Processing 24(3): 994–1009.

43.

Zhou

Zhu

Daniilidis

(2015) Multi-image matching via fast alternating minimization. In: The IEEE International Conference on Computer Vision.