Robot@Home, a robotic dataset for semantic mapping of home environments

Abstract

This paper presents the Robot-at-Home dataset (Robot@Home), a collection of raw and processed sensory data from domestic settings aimed at serving as a benchmark for semantic mapping algorithms through the categorization of objects and/or rooms. The dataset contains 87,000+ time-stamped observations gathered by a mobile robot endowed with a rig of four RGB-D cameras and a 2D laser scanner. Raw observations have been processed to produce different outcomes also distributed with the dataset, including 3D reconstructions and 2D geometric maps of the inspected rooms, both annotated with the ground truth categories of the surveyed rooms and objects. The proposed dataset is particularly suited as a testbed for object and/or room categorization systems, but it can also be exploited for a variety of tasks, including robot localization, 3D map building, SLAM, and object segmentation. Robot@Home is publicly available to the research community at http://mapir.isa.uma.es/work/robot-at-home-dataset .

Keywords

Benchmark, contextual information, domestic robots, home environment, mobile robots, object categorization/recognition, robotic dataset, room categorization/recognition, semantic mapping

1. Introduction

The extraction and representation of semantic knowledge of the world is a crucial step toward achieving intelligent robots (Pronobis et al., 2010). Semantic maps enrich traditional metric and topological maps with high-level information, which enables the robot to process commands such as “go to the bedroom and stop the alarm” (Galindo and Saffiotti, 2013). To do so, the robot has to create and manage its own internal representation of the world incorporating the needed semantic knowledge, for example, that this room is a bedroom and contains an alarm clock placed on a night table. Two major problems arise in the extraction of this information: object categorization, that is, the labeling of parts of the robot sensory data as belonging to a certain object class (bed, night table, alarm clock, etc.), and room categorization, that is, the classification of areas of the environment as rooms of a certain type (kitchen, bedroom, living room, etc.).

In order to cope with this categorization problem,¹ a large number of sample data are needed to test, validate and compare different solutions. Considering this, the research community has released a number of public repositories on the internet, e.g. PASCAL (Everingham et al., 2010), NYUv2 (Silberman et al., 2012), ImageNet (Russakovsky et al., 2014), and SUN3D (Xiao et al., 2013). However, these datasets exhibit shortcomings when used by cutting-edge categorization techniques leveraging contextual information (Anand et al., 2013; Ruiz-Sarmiento et al., 2015b). Synthetic data could be used instead under specific circumstances (Ruiz-Sarmiento et al., 2015a), although real sensory datasets are preferred in most cases.

In this work we present the Robot-at-Home dataset (Robot@Home), a compilation of raw and processed data gathered by a mobile robot in different domestic settings. This dataset is unique in three aspects: the sensory system employed for its gathering, the diversity and amount of provided data, and the availability of dense ground truth information. Data collection followed a place-centric perspective (Xiao et al., 2013), and comprises 87,000+ timestamped observations as sequences of RGB-D images and 2D laser scans taken in five apartments. These raw data fully cover the common challenges to be faced by a robotic categorization system, such as changing lighting conditions, occlusions, viewpoint variations, or cluttered room layouts. On the other hand, the processed data include:

per-pixel labeling (ground truth information) of every RGB-D observation, along with the category of the room containing them;

3D reconstructions in the form of colored point maps and 2D geometric maps of the inspected rooms;

per-point object labeling of the 3D reconstructed rooms along with their room category;

topology of each apartment, stating the connectivity of the rooms within them.

During the data collection, a total of 36 rooms were completely inspected, so the dataset is rich in contextual information of objects and rooms. This is a valuable feature, missing in most of the state-of-the-art datasets, which can be exploited by, for instance, semantic mapping systems that leverage relationships such as “pillows are usually on beds” or “ovens are not in bathrooms”. Robot@Home is publicly available and is accompanied by the software application employed for its processing, named the Object Labeling Toolkit (OLT) (Ruiz-Sarmiento et al., 2015c).

The sensory system comprises a rig of four RGB-D cameras and a radial laser scanner (see Figure 1). The rig covers ~180° horizontally and ~58° vertically, which permits the user to simulate the performance of sensors with different fields of view, a valuable feature during the design of a robotic sensing system (de la Puente et al., 2014). Sensors have been intrinsically and extrinsically calibrated with state-of-the-art algorithms (Fernandez-Moral et al., 2014; Gómez-Ojeda et al., 2015; Teichman et al., 2013). It is worth mentioning that a number of distinctive patterns and objects have been strategically added to the apartments for possible exploitation of the dataset in robotic competitions, such as those in RoboCup@Home (Almeida et al., 2016) or RobotVision (Martinez-Gomez et al., 2014), where robots need to detect predefined patterns in the environment to accomplish certain challenging missions: to explore specific areas, to efficiently find a particular object, and so on.

Fig. 1.

Robotic platform employed to collect the dataset along with details of the sensors mounted on it.

In summary, this dataset contributes a repository suitable for a variety of robotic tasks such as object/room categorization or recognition,² object segmentation, 2D/3D map building, and robot localization among others.

The next section contrasts Robot@Home with other datasets also applicable to the categorization problem. Section 3 presents the robotic platform used and the methodology followed for gathering the raw data, while Section 4 describes the dataset content and some use cases. Finally, Section 5 summarizes the paper.

2. Related datasets

Mobile robots have traditionally resorted to intensity images to categorize objects and/or rooms, which motivated the collection of datasets providing this kind of information (Everingham et al., 2010; Russakovsky et al., 2014; Russell et al., 2008). Nowadays, the tendency is for the datasets to also include depth information (Anand et al., 2013; Janoch et al., 2011; Lai et al., 2011), given the proven benefits of exploiting morphological and spatial information in assisting categorization methods (Ruiz-Sarmiento et al., 2014). These datasets can be roughly classified as: object-centric, view-centric, and place-centric.

Object-centric datasets, such as ACCV (Hinterstoisser et al., 2013), RGBD Dataset (Lai et al., 2014, 2011), KIT object models (Kasper et al., 2012), or BigBIRD (Singh et al., 2014), provide RGB-D observations in which a single object spans each image. The exploitation of these images for categorization exhibits some drawbacks: (1) they are not representative of the typical images gathered by a robot in a real environment; (2) they prevent the utilization of valuable contextual information of objects; and (3) they are not suitable for the room categorization problem.

On the other hand, view-centric datasets such as Berkeley-3D (Janoch et al., 2011), Cornell-RGBD (Anand et al., 2013), UMA-Offices (Ruiz-Sarmiento et al., 2015a), NYU (Silberman and Fergus, 2011; Silberman et al., 2012), TUW (Aldoma et al., 2014), or UBC VRS (Meger and Little, 2012), consist of isolated RGB-D images, or a sequence of them, which cover a partial view of the working environment. This information permits the exploitation of contextual relations but only from a local, reduced perspective, since information of the entire scene is not collected. Therefore, their use for the categorization problem is still limited.

Finally, place-centric datasets such as SUN3D (Xiao et al., 2013) provide comprehensive information from the inspected room, or even the entire work environment, typically through the registration of RGB-D images. This type of dataset provides the best option to take advantage of both depth and contextual information in the categorization problem, although, unfortunately, the number of such datasets is quite limited. A dataset worth mentioning at this point is ViDRILO (Martinez-Gomez et al., 2015), which comprises five sequences of RGB-D observations of two office buildings collected by a robot combining object and environment-centric perspectives. This dataset annotates each observation with its room type and the objects found within it, although this labeling is not per-pixel and the number of object categories is reduced. Table 1 shows a summary of datasets applicable to the categorization problem and their characteristics, including the novel, place-centric Robot@Home dataset.

Table 1.

Summary of related datasets.

Dataset	CR	DT	EOC	ERC	#Obs./Size (GB)
ACCV Hinterstoisser et al. (2013)		object-centric			18,000 / 3.6
Berkeley-3D Janoch et al. (2011)		view-centric	✓(local)	✓(limited)	849 / 0.8
UMA-Offices Ruiz-Sarmiento et al. (2015a)		view-centric	✓(local)	✓(limited)	25 / 0.01
BigBIRD Singh et al. (2014)		object-centric			150,000 / 2,625
Cornell-RGBD Anand et al. (2013)	✓	view-centric	✓(local)	✓(limited)	207 / 0.1
KIT object models Kasper et al. (2012)		object-centric			163,188 / –
Multi-sensor 3D Object Dat. Garcia-Garcia et al. (2016)		object-centric			1,792 / 0.84
NYUv1 Silberman and Fergus (2011)		view-centric	✓(local)	✓(limited)	51,000 / 90
NYUv2 Silberman et al. (2012)		view-centric	✓(local)	✓(limited)	408,000 / 428
RGBD Dataset Lai et al. (2011)		object-centric			– / 84
RGBD Dataset 2 Lai et al. (2014)		view-centric			11,427 / 5.5
TUW Aldoma et al. (2014)	✓	view-centric	✓(local)	✓(limited)	124 / 0.43
SUN3D Xiao et al. (2013)		place-centric	✓	✓	– / –
UBC VRS Meger and Little (2012)	✓	view-centric	✓(local)		1,082 / –

Robot@Home	✓	place-centric	✓	✓	87,891 / 9.6

Note: CR: collected by a robot; DT: dataset type; EOC: enables object context exploitation; ERC: enables room categorization.

3. Data collection

3.1. Robotic platform

The Robot@Home dataset has been collected using the commercial robot Giraff (Giraff Technologies AB, 2015), which consists of a motorized wheeled platform endowed with a videoconferencing set. The robot is controlled by a low-cost onboard computer running Windows 7, with an Intel® Core™2 T7200 CPU at 2 GHz, 1 GB of RAM, and a 160 GB hard disk. This platform has been enhanced with the following sensors:

four Asus XTion Pro Live RGB-D cameras (ASUS, 2015) with a 58° × 45° field of view (FOV). These devices can provide synchronized intensity and depth images at VGA (640 × 480) or QVGA (320 × 240) resolutions;

a Hokuyo laser scanner model URG-04LX-UG01 (Hokuyo Automatic Co., 2015), a device that surveys 2D planes with a FOV of 240° and an angular resolution of 0.352°.

The four RGB-D devices have been mounted vertically on an octagonal rig, which sets a radial configuration of the cameras' optical axes with an angular difference of 45° between adjacent devices (see Figure 1). The rig is placed in the front part of the robot, at a height of ~0.92 m,³ and the devices are connected to the onboard computer using a PCIe card with four USB 2.0 ports. Notice that the rig could hold up to eight RGB-D cameras, but we considered four to be enough for the purposes of this dataset. This setup yields two important advantages: first, there is no overlap among the FOVs of the four units, which avoids possible sensor interference, and second, the combination of the outputs of the devices produces RGB-D observations with ~180° of horizontal FOV (see Figure 2).
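The coverage figures quoted for the rig follow from simple arithmetic on the values given in the text. The sketch below is only illustrative: it infers that a vertically mounted 58° × 45° camera spans 45° horizontally, and it assumes (the text does not state this) that the fan of optical axes is centred on the robot's forward direction. All function and constant names are hypothetical.

```python
# Horizontal layout of the four-camera rig, using nominal figures from the text.
NUM_CAMERAS = 4
AXIS_SPACING_DEG = 45.0   # angular difference between adjacent optical axes
CAMERA_HFOV_DEG = 45.0    # inferred: 58 x 45 deg camera mounted vertically
CAMERA_VFOV_DEG = 58.0


def camera_yaws(num_cameras=NUM_CAMERAS, spacing=AXIS_SPACING_DEG):
    """Yaw of each optical axis, ASSUMING the fan is centred on 0 deg (forward)."""
    offset = (num_cameras - 1) * spacing / 2.0
    return [i * spacing - offset for i in range(num_cameras)]


def horizontal_intervals(yaws, hfov=CAMERA_HFOV_DEG):
    """Angular interval (min, max) covered horizontally by each camera."""
    return [(yaw - hfov / 2.0, yaw + hfov / 2.0) for yaw in yaws]


yaws = camera_yaws()
intervals = horizontal_intervals(yaws)

# Total horizontal span and accumulated overlap between neighbouring cameras.
total_coverage = intervals[-1][1] - intervals[0][0]
overlap = sum(
    max(0.0, intervals[i][1] - intervals[i + 1][0])
    for i in range(len(intervals) - 1)
)

print("yaws (deg):", yaws)
print("coverage (deg):", total_coverage, "x", CAMERA_VFOV_DEG)
print("overlap (deg):", overlap)
```

Under these nominal values the four intervals tile 180° exactly with no overlap, which is consistent with the two advantages stated above; the real devices will deviate slightly, hence the "~" in the text.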

Fig. 2.

RGB and depth images from the four RGB-D devices mounted on the robot in two locations: a kitchen and a bedroom.

The 2D laser scanner is mounted on the front part of the robot base (see Figure 1), at a height of ~0.31 m. In this position the sensor surveys a plane parallel to the floor at its maximum FOV without perceiving any part of the robot.

3.2. Sensor calibration

In order to provide accurate information within the Robot@Home dataset, the sensors mounted on the robot must be calibrated both intrinsically and extrinsically. The locations of the devices mounted on the robot, that is, their extrinsic parameters w.r.t. the robot frame,⁴ have been computed in a three-step process. First, the RGB-D devices were calibrated with respect to one another following the technique in Fernandez-Moral et al. (2014). Then, the relative pose between the RGB-D devices and the laser scanner was obtained by the procedure presented in Gómez-Ojeda et al. (2015). Finally, the position of the RGB-D rig in the robot frame was computed by minimizing the error of fitting planes to the walls and the floor of a room using RANSAC (Fischler and Bolles, 1981) while the robot is turning on the spot, and imposing vertical and horizontal conditions respectively. At this point every sensor is accurately related to the robot frame.
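To make the last calibration step more concrete, the following is a minimal sketch of RANSAC plane fitting on a synthetic floor. It is not the calibration code used for the dataset: it covers only the plane-fitting ingredient, the subsequent minimization with vertical and horizontal constraints is omitted, and the threshold, iteration count, and test data are assumptions chosen for illustration.

```python
import random


def plane_from_points(p, q, r):
    """Plane through three points as (unit normal n, offset d) with n.x + d = 0.

    Returns None when the points are (numerically) collinear.
    """
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (nx * nx + ny * ny + nz * nz) ** 0.5
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    d = -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])
    return n, d


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Keep the candidate plane supported by the largest number of inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [
            pt for pt in points
            if abs(n[0] * pt[0] + n[1] * pt[1] + n[2] * pt[2] + d) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic data (illustrative only): a 1 m x 1 m floor patch plus a few outliers.
floor = [(0.1 * i, 0.1 * j, 0.0) for i in range(10) for j in range(10)]
outliers = [(0.5, 0.5, 0.3), (0.2, 0.7, 0.8), (0.9, 0.1, 0.5),
            (0.4, 0.4, 1.2), (0.3, 0.8, 0.6)]
model, inliers = ransac_plane(floor + outliers)
normal, offset = model
print("floor normal:", normal, "inliers:", len(inliers))
```

In a setting like the one described, the recovered floor normal would be compared against the vertical axis of the robot frame, and wall normals against the horizontal plane, to constrain the pose of the rig.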

Regarding the sensors’ internal parameters, for the correction of the depth images from the RGB-D devices we have resorted to the CLAMS framework (Teichman et al., 2013), while for the RGB and the laser scanner data we have relied on the factory values given their good outcome.

3.3. Software for the collection of data

Data streams coming from the five devices, that is, four RGB-D cameras and the laser scanner, must be conveniently managed and stored. For that, in this work we have opted for the rawlog-grabber application from the Mobile Robot Programming Toolkit project (Blanco Claraco, 2015), which provides mechanisms to collect and save sensory data from different sources into a file. In a nutshell, this software launches a dedicated thread for each sensor that time-stamps and saves the collected data to a compressed binary file in the Rawlog common robotic dataset format,⁵ which is automatically translated to human-readable information (plain text files and PNG images). Sensory observations have been saved at a frequency ranging from 1.25 Hz up to 10 Hz for the 2D laser scanner, and from 1 Hz up to 11 Hz for each RGB-D camera. These values are limited by the computational performance of the onboard computer, which has been compensated for by reducing the robot speed (maximum of 0.1 m/s and 10 deg/s for linear and angular speeds respectively), ensuring in this way a good coverage of the inspected areas.
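The effect of the speed limits on coverage can be illustrated with a short calculation. The sketch below uses only the speed caps and acquisition rates quoted above and computes the worst-case displacement of the robot between two consecutive observations of one sensor; the function name is hypothetical.

```python
MAX_LINEAR_SPEED = 0.1    # m/s, from the text
MAX_ANGULAR_SPEED = 10.0  # deg/s, from the text


def worst_case_step(rate_hz):
    """Largest translation (m) and rotation (deg) between consecutive observations."""
    return MAX_LINEAR_SPEED / rate_hz, MAX_ANGULAR_SPEED / rate_hz


# Lowest and highest rates reported for the laser (1.25-10 Hz) and cameras (1-11 Hz).
for rate in (1.0, 1.25, 10.0, 11.0):
    step_m, step_deg = worst_case_step(rate)
    print(f"{rate:5.2f} Hz -> {step_m:.3f} m, {step_deg:.2f} deg per observation")
```

Even at the slowest reported camera rate of 1 Hz, consecutive frames are at most 0.1 m and 10° apart, so successive views overlap substantially.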

3.4. Collection methodology

The data provided by Robot@Home have been collected within five dwelling apartments, named anto, alma, pare, rx2, and sarmis. For illustrative purposes, Figure 3 depicts their geometric maps, showing the annotations for the room categories in one of them. Raw data were collected in different sessions, each one containing a number of sequences of RGB-D observations and laser scans. These sequences were gathered by teleoperating the robot to fully inspect each individual room. Figure 3 shows an example of the path followed by the robot while collecting a sequence of the sarmis house.

Fig. 3.

(Left) Example of a 2D geometric map of the sarmis house, annotated with the type of the inspected rooms (orange boxes). The black dots represent the path followed by the robot during the inspection of the house, starting at the green triangle (livingroom) and ending at the red one (corridor). (Right) Examples of geometric maps of the remaining domestic settings. For a better understanding of the color-based descriptions, the reader is referred to the online version of this work.

A total of seven sessions were conducted, three in the sarmis house and one in each of the remaining settings. During the data collection, special attention was paid to conveniently steer the robot in order to provide different viewpoints of the objects in the scene, so they can appear partially or totally occluded. As an example, Figure 4 shows a pencil case that is fully visible in the first and third images, although showing a different pose, while it is partially occluded in the second one, and totally disappears in the fourth image.

Fig. 4.

(Top row) Different viewpoints from a sequence of cropped intensity images of the same set of objects, and (bottom row) their associated depth images. Notice that throughout the sequence some objects are totally or partially occluded by others. Numbers indicate the order of the viewpoint within the sequence.

Moreover, a number of particular characteristics have been intentionally included in each scenario to provide additional data for testing different object recognition algorithms and techniques, specifically the following:

Inclusion of distinctive objects. A number of patterns/objects have been placed in different rooms within these houses, specifically: teddies in alma, fruits in anto, numerical patterns in pare (see top row of Figure 5), and geometric patterns in rx2 (see bottom row in Figure 5).

Varying lighting conditions. Each of the three sessions in sarmis house was conducted at a different time of the day, which means that the objects were visualized under different lighting conditions.

Varying sets of objects. In those three sessions, the set of objects placed in each room differs from session to session, with objects appearing, disappearing, and being moved (see Figure 6).

Fig. 5.

(Top row) Numerical patterns in pare house. (Bottom row) Geometric patterns in rx2 house.

Fig. 6.

Intensity images of a bedroom from the same RGB-D sensor illustrating the change of lighting conditions during the three sessions conducted at the sarmis house. The overlaid numbers represent the identifier of the session. It can also be observed how the set of visible objects differs from session to session.

4. Dataset description

4.1. Raw data

The Robot@Home dataset comprises ~75 min of recorded data from a total of 83 sequences collected in the aforementioned sessions. These raw data include:

laser scanner data: 2D observations from the laser scanner (see first row in Figure 8) captured in the inspected rooms;

RGB-D data: observations from the four RGB-D cameras, including intensity images, depth images, and 3D point clouds (see second and third row in Figure 8);

topological information of the rooms connectivity, stating the rooms that are reachable by the robot from a certain location.
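As an illustration of how room connectivity could be consumed, the sketch below stores a topology as an adjacency dictionary and finds a route between two rooms with breadth-first search. The room names and links are made up for the example and do not describe any apartment in the dataset, and the on-disk format of the topology files is not assumed here.

```python
from collections import deque

# HYPOTHETICAL connectivity, for illustration only (not taken from the dataset).
topology = {
    "corridor": ["livingroom", "kitchen", "bedroom", "bathroom"],
    "livingroom": ["corridor", "kitchen"],
    "kitchen": ["corridor", "livingroom"],
    "bedroom": ["corridor"],
    "bathroom": ["corridor"],
}


def route(graph, start, goal):
    """Shortest sequence of rooms from start to goal (BFS), or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(route(topology, "bedroom", "kitchen"))
```

A semantic mapping system could use such a graph, for example, to reason that a room reachable only through a bedroom is more likely a bathroom or dressing room than a kitchen.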

Table 2 shows a summary of the information gathered from each apartment, including the number of sequences, rooms inspected, number of 2D laser scans and RGB-D observations, as well as the time spent in their collection.

Table 2.

Number of sequences, rooms, and observations per house and time spent collecting them.

House	alma	anto	pare	rx2	sarmis (1st S.)	sarmis (2nd S.)	sarmis (3rd S.)	Dataset
# Sequences	6	10	11	5	17	17	15	81
# Rooms	10	18	20	8	25	25	23	129
# Observations	15,535	22,301	26,506	9,906	4,939	4,218	4,486	87,891
# Laser scans	3,100	4,407	5,291	2,016	1,311	1,146	1,224	18,310
# RGBD obs.	12,435	17,894	21,215	7,890	3,866	3,276	3,519	69,581
Time (min)	5.12	7.53	8.48	3.22	17.47	15.25	16.30	74.57
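The totals in the Dataset column of Table 2 can be re-derived from the per-session columns. The short check below does so for the observation and time rows. It assumes that the time values are written as minutes.seconds (so 5.12 means 5 min 12 s), an interpretation not stated in the table but under which the per-session times add up to the reported total.

```python
# Per-session values copied from Table 2
# (alma, anto, pare, rx2, sarmis sessions 1 to 3).
observations = [15535, 22301, 26506, 9906, 4939, 4218, 4486]
times = ["5.12", "7.53", "8.48", "3.22", "17.47", "15.25", "16.30"]


def to_seconds(min_sec):
    """Convert a 'minutes.seconds' string (ASSUMED format) to seconds."""
    minutes, seconds = min_sec.split(".")
    return int(minutes) * 60 + int(seconds)


total_observations = sum(observations)
total_seconds = sum(to_seconds(t) for t in times)
total_min_sec = f"{total_seconds // 60}.{total_seconds % 60:02d}"

# Mean acquisition rate over all five sensors combined.
mean_rate_hz = total_observations / total_seconds

print(total_observations, total_min_sec, round(mean_rate_hz, 2))
```

The resulting average of roughly 19.5 observations per second is the combined rate of the four cameras and the laser scanner.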

The surveyed scenarios include a total of 36 rooms (some of them visited several times), divided into eight categories, that contain ~1900 object instances belonging to 157 categories. An exhaustive list of the categories of the objects and rooms appearing in the dataset can be consulted at the dataset website.

4.2. Processed data

The raw data have been processed in order to enrich the dataset with the following information:

2D geometric maps of each inspected room/house, built by registering the observations from the laser scanner;

3D colored point maps: 3D reconstructions of rooms based on the registration of the collected RGB-D data;

labeled 3D point maps including per-point object and room labels (category and instance) within the reconstructed rooms;

labeled RGB-D observations including per-pixel object labels (category and instance) within each RGB-D observation, i.e., both intensity and depth images, and per-point labels within their respective point clouds.

Processed data have been produced employing two software tools, namely the aforementioned Mobile Robot Programming Toolkit (MRPT), and the Object Labeling Toolkit (OLT) (Ruiz-Sarmiento et al., 2015c). OLT comprises a set of public tools⁶ aimed at helping in the management and labeling of sequential RGB-D observations. The next sections describe in more detail the applications and methodologies followed to process the raw data.

4.2.1. 2D geometric maps

The ICP-slam application within MRPT has been used to register sequences of laser scans for building 2D geometric maps. Thereby, the Robot@Home dataset contains a total of 41 geometric maps, one per inspected room and a global map for each house (see Figures 3 and 8, fourth row). These maps are distributed along with the logs produced during the SLAM process, which include additional information such as the estimated path followed by the robot, snapshots of the scans’ registration over time, and so on.

4.2.2. 3D colored point maps

We have used the Mapping tool from OLT in order to produce aligned 3D representations of the recorded RGB-D data. This software registers sequences of RGB-D observations using the generalized iterative closest point technique (GICP) (Segal et al., 2009). This ICP variant requires an initial pose estimation to accurately align RGB-D observations, which in our case is obtained using visual odometry (Jaimez and González-Jiménez, 2015). Some examples of the provided reconstructions of rooms are shown in Figure 8 (fifth row).

4.2.3. Labeled 3D point maps

Each reconstructed room has been labeled with the Label scene tool from OLT. This tool allows us to easily set bounding boxes to the objects appearing in the point cloud reconstruction, and include annotations with the ground truth information about their category, for example, counter, book, couch, shelf, as well as an object id to identify the particular instance, that is, counter-1, book-3, and so on. Figure 8 (sixth row) illustrates some examples of annotations, while Figure 7 shows a snapshot of the labeling process.
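The idea of turning bounding-box annotations into per-point labels can be sketched as follows. This is a simplified illustration, not the OLT implementation: it assumes axis-aligned boxes given by two opposite corners, whereas the boxes set with the tool may be oriented, and all coordinates are invented for the example.

```python
def label_points(points, boxes, unlabeled="unlabeled"):
    """Assign to each 3D point the instance id of the first box containing it."""
    labels = []
    for x, y, z in points:
        label = unlabeled
        for instance_id, (lo, hi) in boxes.items():
            if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1] and lo[2] <= z <= hi[2]:
                label = instance_id
                break
        labels.append(label)
    return labels


def category_of(instance_id):
    """Strip the instance suffix: 'book-3' -> 'book'."""
    return instance_id.rsplit("-", 1)[0]


# HYPOTHETICAL annotations: instance id -> (min corner, max corner), in metres.
boxes = {
    "counter-1": ((0.0, 0.0, 0.0), (2.0, 0.6, 0.9)),
    "book-3": ((0.5, 0.1, 0.91), (0.7, 0.3, 0.95)),
}
points = [(1.0, 0.3, 0.5), (0.6, 0.2, 0.92), (3.0, 3.0, 3.0)]

labels = label_points(points, boxes)
print(labels)
print([category_of(label) for label in labels])
```

Keeping the instance id as the primary label and deriving the category from it mirrors the category/instance distinction described above.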

Fig. 7.

Snapshot of a kitchen from the alma house during its labeling process through the Label scene OLT component.

4.2.4. Labeled RGB-D observations

Each RGB-D observation within the collected sequences has been also labeled with the category/instance of their contained objects through the Label rawlog application within OLT. This tool is fed with both the recorded sequence and the labeled, reconstructed map (obtained as described in the previous section) in order to automatically propagate the ground truth information to the RGB-D observations. The outcome of this process is a per-pixel labeling of the intensity and depth images within each observation, as well as a per-point labeling of its point cloud data (please refer to Ruiz-Sarmiento et al. (2015c) for further information). The last row of Figure 8 depicts depth images colored according to the propagated ground truth labels.
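One plausible way to picture the propagation step is to back-project every depth pixel to a 3D point and look up which annotated box contains it. The sketch below does this with a pinhole camera model; it is an assumption-laden illustration rather than the Label rawlog algorithm. In particular, the intrinsics are toy values, boxes are axis-aligned, and the boxes are assumed to be already expressed in the camera frame, so the camera pose is omitted.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth to camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def label_pixels(depth_image, intrinsics, boxes):
    """Per-pixel instance labels; None where depth is invalid or no box matches."""
    fx, fy, cx, cy = intrinsics
    labeled = []
    for v, row in enumerate(depth_image):
        out_row = []
        for u, depth in enumerate(row):
            label = None
            if depth > 0:  # zero depth is treated as an invalid measurement
                point = backproject(u, v, depth, fx, fy, cx, cy)
                for instance_id, (lo, hi) in boxes.items():
                    if all(lo[i] <= point[i] <= hi[i] for i in range(3)):
                        label = instance_id
                        break
            out_row.append(label)
        labeled.append(out_row)
    return labeled


# Toy 2 x 2 depth image (metres) and HYPOTHETICAL intrinsics (fx, fy, cx, cy).
depth_image = [[2.0, 2.0],
               [0.0, 4.0]]
intrinsics = (2.0, 2.0, 1.0, 1.0)
boxes = {"sofa_1": ((-1.5, -1.5, 1.5), (0.5, -0.5, 2.5))}

pixel_labels = label_pixels(depth_image, intrinsics, boxes)
print(pixel_labels)
```

The same lookup applied to the points of an observation's point cloud would yield the per-point labeling mentioned above.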

Fig. 8.

Excerpts of the information provided by Robot@Home. From top to bottom, examples of 2D laser scans from three different rooms, RGB and depth images gathered from them, their built 2D geometric maps and 3D reconstructions, the labels in such reconstructions as boxes where colors stand for different object categories, and, finally, the labeled depth information.

4.3. Usage

All the raw and processed data within Robot@Home have been conveniently structured into data types and sessions at its site, so the interested user can download chunks of information according to his/her needs (see Figure 9). The data are available in (human readable) plain text files⁷ and PNG images. Some of their immediate applications are listed below.

Fig. 9.

Tree structure of the data provided in the dataset webpage. Notice that the different types of data are available to the user either split into sessions or all together. The topology of the houses and the 2D geometric maps have particular, more convenient download options.

Semantic mapping. The Robot@Home dataset is especially suited as a benchmark for algorithms aimed at robotic semantic mapping through the categorization of objects and/or rooms, given its collection by a mobile robot and the inclusion of annotated 3D reconstructions and sequences of RGB-D observations (Oliveira et al., 2015; Ruiz-Sarmiento et al., 2016). It can also be considered for testing recognition algorithms (Bo et al., 2013), since the provided ground truth information also includes the instance of the object/room to which each label belongs, for example, sofa_1, bottle_3, bathroom_1, and so on.

Robot@Home also enables the benchmarking of categorization systems relying on different kinds of information, namely: (a) exclusively using laser scans, RGB, depth, or RGB-D observations, (b) employing a stream of data from a sequence, (c) resorting to partial registrations of such a sequence, or (d) exploiting the resultant whole registered scene.

From a semantic point of view, the compiled data are useful as input for modern categorization systems leveraging contextual information within domestic settings (Anand et al., 2013; Ruiz-Sarmiento et al., 2015a). This enables, for example, the exploitation of typical objects’ and rooms’ configurations such as beds are in bedrooms, microwaves are not in bathrooms, or cushions are on couches, in order to enhance the categorization performance.

An additional feature worth mentioning is that Robot@Home is ready to be used by the Benchmark rawlog application from OLT. This software compares two sequences of labeled RGB-D observations and computes the similarity of their annotations. In other words, it permits us to compare a sequence from the dataset including ground truth annotations, with the same sequence labeled by a categorization algorithm, retrieving information about the performance of such an algorithm. Thereby, a common benchmarking frame for the comparison of algorithms that exploit the Robot@Home dataset can be easily set.
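To give an idea of what comparing two labeled sequences involves, the sketch below computes a simple per-pixel agreement between a ground truth label image and a predicted one. The metric used by the Benchmark rawlog application is not specified here, so this accuracy-style measure and the toy label images are assumptions made purely for illustration.

```python
def label_agreement(ground_truth, predicted, ignore=None):
    """Fraction of annotated ground-truth pixels whose predicted label matches."""
    matches = 0
    total = 0
    for gt_row, pred_row in zip(ground_truth, predicted):
        for gt, pred in zip(gt_row, pred_row):
            if gt == ignore:  # pixels without ground truth are not scored
                continue
            total += 1
            if gt == pred:
                matches += 1
    return matches / total if total else 0.0


# Toy label images (illustrative values only).
ground_truth = [["bed", "bed", None],
                ["pillow", "floor", "floor"]]
predicted = [["bed", "couch", "bed"],
             ["pillow", "floor", "floor"]]

print(label_agreement(ground_truth, predicted))  # 4 of 5 annotated pixels match -> 0.8
```

Averaging such a score over all observations of a sequence would give one number per algorithm, which is the kind of common frame for comparison described above.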

In order to standardize the dataset usage for categorization/recognition purposes, we encourage the utilization of a leave-one-out cross-validation procedure (Arlot and Celisse, 2010), where the data from one apartment are employed for testing and those from the remaining ones for training. This process is repeated five times, changing the testing home, and the individual results are finally averaged.
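The suggested protocol can be written down in a few lines. In the sketch below the apartment names come from the text, while `evaluate` is a hypothetical callback standing in for training and testing a categorization system, and the scores in the usage example are placeholders rather than results.

```python
APARTMENTS = ["anto", "alma", "pare", "rx2", "sarmis"]


def leave_one_apartment_out(apartments):
    """Yield (training apartments, test apartment) for each of the folds."""
    for test_home in apartments:
        training_homes = [home for home in apartments if home != test_home]
        yield training_homes, test_home


def cross_validate(apartments, evaluate):
    """Average the score returned by evaluate(training_homes, test_home) over all folds."""
    scores = [evaluate(train, test) for train, test in leave_one_apartment_out(apartments)]
    return sum(scores) / len(scores)


# Usage with PLACEHOLDER scores (not experimental results).
placeholder_scores = {"anto": 0.5, "alma": 0.75, "pare": 0.25, "rx2": 1.0, "sarmis": 0.5}
folds = list(leave_one_apartment_out(APARTMENTS))
mean_score = cross_validate(APARTMENTS, lambda train, test: placeholder_scores[test])

for train, test in folds:
    print("test on", test, "| train on", ", ".join(train))
print("mean score:", mean_score)
```

Splitting by apartment rather than by sequence keeps all three sarmis sessions in the same fold, so a system is never tested on a home it has already seen during training.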

Object/room segmentation. Segmentation or clustering algorithms (Carreira and Sminchisescu, 2012; Mura et al., 2014) can also be benchmarked given the per-pixel and per-point labeling of the dataset's reconstructed rooms and RGB-D sequences, which sets the extent and boundaries of the objects and rooms appearing in the dataset.

Simulation of virtual sensors. The coverage provided by the rig of RGB-D sensors, that is, ~180° horizontally and ~58° vertically, enables the simulation of virtual sensors with different fields of view. This is a valuable feature in the design phase of a robotic sensing system (de la Puente et al., 2014), since it permits the dataset user to test different sensing configurations in order to find the most convenient one for his/her purposes.
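A virtual sensor can be emulated by keeping only the 3D points that fall inside its field of view. The following sketch shows one way to do this; the frame convention (x forward, y left, z up), the default FOV values, and the function name are assumptions for illustration and are not prescribed by the dataset.

```python
import math


def in_virtual_fov(point, yaw_deg=0.0, hfov_deg=60.0, vfov_deg=45.0):
    """True if a point (ASSUMED frame: x forward, y left, z up) is seen by the virtual sensor."""
    x, y, z = point
    azimuth = math.degrees(math.atan2(y, x)) - yaw_deg
    azimuth = (azimuth + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return abs(azimuth) <= hfov_deg / 2.0 and abs(elevation) <= vfov_deg / 2.0


# Illustrative points in the rig frame (metres).
cloud = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0)]

narrow = [p for p in cloud if in_virtual_fov(p)]
wide = [p for p in cloud if in_virtual_fov(p, hfov_deg=100.0)]
sideways = [p for p in cloud if in_virtual_fov(p, yaw_deg=90.0)]
print(narrow, wide, sideways, sep="\n")
```

Sweeping `yaw_deg`, `hfov_deg`, and `vfov_deg` over the merged point clouds of the four cameras is one way to compare candidate sensing configurations before building them.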

Data compression/transmission. Many robotic platforms have limited resources, which are typically shared among a number of software processes. In these cases efficient compression/transmission algorithms for dense sensory information are a plus (Kammerl et al., 2012; Mekuria and Cesar, 2016), for which the amount of data provided within Robot@Home can be a useful testbed for checking their performance.

Other. Finally, the provided data can also be exploited for addressing typical robotic problems such as 3D map building, localization (Castellanos and Tardos, 2012) or SLAM (Cadena et al., 2016), since the robot’s poses can be accurately estimated from the sequence of 2D scans.

5. Summary

In this work we have presented the Robot@Home dataset, a collection of data gathered by a mobile robot in domestic settings, publicly available at http://mapir.isa.uma.es/work/robot-at-home-dataset , the main purpose of which is to serve as a testbed for semantic mapping algorithms through the categorization of objects and/or rooms. The robot has been endowed with a rig of four RGB-D devices and a 2D laser scanner, which have been extrinsically and intrinsically calibrated employing state-of-the-art algorithms. Robot@Home comprises (a) sequences of RGB-D observations and 2D laser scans from five home environments, (b) topological information about the connectivity of the rooms in those homes, (c) 2D geometric maps of the inspected rooms/homes, and (d) 3D reconstructions. Ground truth information about the categories of the observed objects and rooms is available in the form of (e) annotated bounding boxes over the reconstructed rooms and (f) labeled sequences of RGB-D observations.

The surveyed scenarios include characteristics that turn the dataset into a sandbox to test robotic categorization systems dealing with issues such as changing lighting conditions, cluttered room layouts, occlusions, or changing viewpoints. Additionally, a number of distinctive patterns and objects have been intentionally placed in these scenarios to enable their exploitation in robotic competitions. Although Robot@Home is especially suited as a benchmark tool for object and/or room categorization systems taking advantage of contextual relations among objects and rooms, its possible usages are diverse, for example, object/room instance recognition, object segmentation, data compression/transmission, and so on.

Acknowledgements

We are very grateful to our colleague E. Fernandez-Moral for his support for the extrinsic calibration of the RGB-D devices, and to Mariano Tarifa Jaimez for his valuable advice.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish projects “IRO: Improvement of the sensorial and autonomous capability of robots through olfaction” (2012-TEP-530) and “PROMOVE: Advances in mobile robotics for promoting independent life of elders” (DPI2014-55826-R), and the European project “MoveCare: Multiple-actors virtual empathic caregiver for the elder” (Call: H2020-ICT-2016-1, contract number: 732158).

References

Aldoma, Faulhammer, Vincze (2014) Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In: IEEE/RSJ international conference on intelligent robots and systems, Chicago, IL, pp. 5016–5023. DOI: 10.1109/IROS.2014.6943275.

Almeida, Steinbauer, et al. (2016) RoboCup 2015: Robot World Cup XIX (Lecture Notes in Computer Science, vol. 9513). Cham: Springer. ISBN 978-3-319-29338-7.

Anand, Koppula, Joachims, et al. (2013) Contextually guided semantic labeling and search for three-dimensional point clouds. International Journal of Robotics Research 32(1): 19–34.

Arlot, Celisse (2010) A survey of cross-validation procedures for model selection. Statistics Surveys 4: 40–79.

ASUS (2015) Xtion PRO LIVE. Available at: http://www.asus.com/Multimedia/Xtion_PRO_LIVE/ (accessed 13 December 2016).

Blanco Claraco (2015) Mobile Robot Programming Toolkit (MRPT). Available at: http://www.mrpt.org (accessed 28 April 2015).

Bo, Ren, Fox (2013) Unsupervised feature learning for RGB-D based object recognition. In: The 13th international symposium on experimental robotics, Part VI, Springer Tracts in Advanced Robotics, pp. 387–402.

Cadena, Carlone, Carrillo, et al. (2016) Simultaneous localization and mapping: Present, future, and the robust-perception age. arXiv Preprint 1606.05830.

Carreira, Sminchisescu (2012) CPMC: Automatic object segmentation using constrained parametric min-cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7): 1312–1328.

Castellanos, Tardos (2012) Mobile Robot Localization and Map Building: A Multisensor Fusion Approach. Springer Science & Business Media. DOI: 10.1007/978-1-4615-4405-0.

de la Puente, Bajones, Einramhof, et al. (2014) RGB-D sensor setup for multiple tasks of home robots and experimental results. In: IEEE/RSJ international conference on intelligent robots and systems, Chicago, IL, pp. 2587–2594. DOI: 10.1109/IROS.2014.6942915.

Everingham, van Gool, Williams, et al. (2010) The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision 88(2): 303–338.

Fernandez-Moral, González-Jiménez, Rives, et al. (2014) Extrinsic calibration of a set of range cameras in 5 seconds without pattern. In: Proceedings of the 2014 IEEE/RSJ international conference on intelligent robots and systems (IROS 2014), Chicago, IL, pp. 429–435. DOI: 10.1109/IROS.2014.6942595.

Fischler, Bolles (1981) Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6): 381–395.

Galindo, Saffiotti (2013) Inferring robot goals from violations of semantic knowledge. Robotics and Autonomous Systems 61(10): 1131–1143.

Garcia-Garcia, Orts-Escolano, Oprea, et al. (2016) Multisensor 3D object dataset for object recognition with full pose estimation. Neural Computing and Applications 1–12. DOI: 10.1007/s00521-016-2224-9.

Giraff Technologies AB (2015) Giraff robot. Available at: http://www.giraff.org/ (accessed 13 December 2015).

Gómez-Ojeda, Briales, Fernández-Moral, et al. (2015) Extrinsic calibration of a 2D laser-rangefinder and a camera based on scene corners. In: IEEE international conference on robotics and automation (ICRA), Seattle, WA, pp. 3611–3616. DOI: 10.1109/ICRA.2015.7139700.

19.

Hinterstoisser

Lepetit

Ilic

. (2013) Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Proceedings of the 11th Asian conference on computer vision (ACCV’12), Volume 1, pp.548–562. Berlin, Heidelberg: Springer-Verlag.

20.

Hokuyo Automatic Co (2015) Hokuyo URG-04LX-UG01. Available at: http://www.hokuyo-aut.jp (accessed 6 April 2015).

21.

Jaimez

Blanco

González-Jiménez

(2015) Efficient reactive navigation with exact collision determination for 3D robot shapes. International Journal of Advanced Robotic Systems 12(5): 63.

22.

Jaimez

González-Jiménez

(2015) Fast visual odometry for 3-D range sensors. IEEE Transactions on Robotics 31(4): 809–822.

23.

Janoch

Karayev

Jia

. (2013) A category-level 3-D object dataset: Putting the Kinect to work. In: Proceedings of the 1st workshop on consumer depth cameras for computer vision (ICCV workshop), London: Springer, pp. 141–165. DOI: 10.1007/978-1-4471-4640-7_8.

24.

Kammerl

Blodow

Rusu

. (2012) Real-time compression of point cloud streams. In: Proceedings of the 2012 IEEE international conference on robotics and automation, Saint Paul, MN, pp. 778–785. DOI: 10.1109/ICRA.2012.6224647.

25.

Kasper

Xue

Dillmann

(2012) The KIT object models database: An object model database for object recognition, localization and manipulation in service robotics. The International Journal of Robotics Research 31(8): 927–934.

26.

Kiselev

Kristoffersson

Melendez-Fernandez

. (2015) Evaluation of using semi-autonomy features in mobile robotic telepresence systems. In: Proceedings of the 7th IEEE international conference on cybernetics and intelligent systems (CIS) and the 7th IEEE international conference on robotics, automation and mechatronics (RAM), Siem Reap, Cambodia, pp. 147–152.

27.

Lai

Fox

(2014) Unsupervised feature learning for 3D scene labeling. In: IEEE international conference on robotics and automation (ICRA), Hong Kong, pp. 3050–3057. DOI: 10.1109/ICRA.2014.6907298.

28.

Lai

Ren

. (2011) A large-scale hierarchical multiview RGB-D object dataset. In: Proceedings of the 2011 IEEE international conference on robotics and automation (ICRA), Shanghai, pp. 1817–1824. DOI: 10.1109/ICRA.2011.5980382.

29.

Martinez-Gomez

Cazorla

Garcia-Varea

. (2014) Overview of the ImageCLEF 2014 Robot Vision Task. In: CLEF 2014 evaluation labs and workshop, online working notes, pp. 296–307.

30.

Martinez-Gomez

Cazorla

Garcia-Varea

. (2015) ViDRILO: The visual and depth robot indoor localization with objects information dataset. International Journal of Robotics Research 34(14): 1681–1687. DOI: 10.1177/0278364915596058

31.

Meger

Little

(2012) The UBC visual robot survey: A benchmark for robot category recognition. In: Proceedings of experimental robotics—The 13th international symposium on experimental robotics (ISER 2012), Canada, 18–21 June 2012, pp.979–991. Québec City: Springer International Publishing. DOI: 10.1007/978-3-319-00065-7_65.

32.

Mekuria

Cesar

(2016) MP3DG-PCC, open source software framework for implementation and evaluation of point cloud compression. In: Proceedings of the 2016 ACM on multimedia conference (MM ’16). New York, NY, USA: ACM, pp.1222–1226. DOI: 10.1145/2964284.2973806.

33.

Melendez-Fernandez

Galindo

González-Jiménez

(2016) An assisted navigation method for telepresence robots. In: Proceedings of the 10th international conference on ubiquitous computing and ambient intelligence, Gran Canaria, Spain, pp. 463–468. Cham: Springer International Publishing. DOI: 10.1007/978-3-319-48746-5_47.

34.

Mura

Mattausch

Villanueva

. (2014) Automatic room detection and reconstruction in cluttered indoor environments with complex room layouts. Computers & Graphics 44: 20–32.

35.

Oliveira

Lopes

Lim

. (2015) Concurrent learning of visual codebooks and object categories in open-ended domains. In: Proceedings of the 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), Hamburg, pp. 2488–2495. DOI: 10.1109/IROS.2015.7353715.

36.

Pronobis

Jensfelt

Sjöö

. (2010) Semantic modelling of space. In: Christensen

Kruijff

GJM

Wyatt

(eds) Cognitive Systems (Cognitive Systems Monographs, volume 8). Berlin Heidelberg: Springer, pp.165–221.

37.

Ruiz-Sarmiento

Galindo

González-Jiménez

(2014) Mobile robot object recognition through the synergy of probabilistic graphical models and semantic knowledge. In: Workshop on cognitive robotics (CogRob), European conference on artificial intelligence, Pargue, Czech Republic.

38.

Ruiz-Sarmiento

Galindo

González-Jiménez

(2015a) Exploiting semantic knowledge for robot object recognition. Knowledge-Based Systems 86: 131–142.

39.

Ruiz-Sarmiento

Galindo

González-Jiménez

(2015b) Joint categorization of objects and rooms for mobile robots. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS), Hamburg, pp. 2523–2528. DOI: 10.1109/IROS.2015.7353720.

40.

Ruiz-Sarmiento

Galindo

González-Jiménez

(2015c) OLT: A toolkit for object labeling applied to robotic RGB-D datasets. In: Proceedings of the European conference on mobile robots (ECMR), Lincoln, pp. 1–6. DOI: 10.1109/ECMR.2015.7324214.

41.

Ruiz-Sarmiento

Galindo

González-Jiménez

(2016) Building multiversal semantic maps for mobile robot operation. Knowledge-Based Systems 119: 257–272. DOI: 10.1016/j.knosys.2016.12.016.

42.

Russakovsky

Deng

. (2014) ImageNet large scale visual recognition challenge. CoRR abs/1409.0575.

43.

Russell

Torralba

Murphy

. (2008) LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision 77(1–3): 157–173.

44.

Segal

Haehnel

Thrun

(2009) Generalized-ICP. In: Proceedings of robotics: Science and systems, Volume 2, Issue 4, Seattle, WA. DOI: 10.15607/RSS.2009.V.021.

45.

Silberman

Fergus

(2011) Indoor scene segmentation using a structured light sensor. In: Proceedings of the international conference on computer vision-Workshop on 3D representation and recognition, Barcelona, pp. 601–608. DOI: 10.1109/ICCVW.2011.6130298.

46.

Silberman

Hoiem

Kohli

. (2012) Indoor segmentation and support inference from RGBD images. In: Proceedings of the 12th European conference on computer vision (ECCV 2012), Florence, Italy, pp.746–760. DOI:10.1007/978-3-642-33715-4_54

47.

Singh

Sha

Narayan

. (2014) BigBIRD: A large-scale 3D database of object instances. In: Proceedings of the 2014 IEEE international conference on robotics and automation (ICRA), Hong Kong, pp. 509–516. DOI: 10.1109/ICRA.2014.6906903.

48.

Teichman

Miller

Thrun

(2013) Unsupervised intrinsic calibration of depth sensors via SLAM. In: Proceedings of robotics: Science and systems, vol. 248, Berlin, Germany. DOI: 10.15607/RSS.2013.IX.027.

49.

Xiao

Owens

Torralba

(2013) SUN3D: A database of big spaces reconstructed using SfM and object labels. In: Proceedings of the 2013 IEEE international conference on computer vision (ICCV), Sydney, NSW, pp. 1625–1632. DOI: 10.1109/ICCV.2013.458.