Abstract
This paper presents the Robot-at-Home dataset (Robot@Home), a collection of raw and processed sensory data from domestic settings aimed at serving as a benchmark for semantic mapping algorithms through the categorization of objects and/or rooms. The dataset contains 87,000+ time-stamped observations gathered by a mobile robot endowed with a rig of four RGB-D cameras and a 2D laser scanner. Raw observations have been processed to produce different outcomes also distributed with the dataset, including 3D reconstructions and 2D geometric maps of the inspected rooms, both annotated with the ground truth categories of the surveyed rooms and objects. The proposed dataset is particularly suited as a testbed for object and/or room categorization systems, but it can also be exploited for a variety of tasks, including robot localization, 3D map building, SLAM, and object segmentation. Robot@Home is publicly available for the research community at
1. Introduction
The extraction and representation of semantic knowledge of the world is a crucial step toward achieving intelligent robots (Pronobis et al., 2010). Semantic maps enrich traditional metric and topological maps with high-level information, which enables the robot to process commands such as “go to the bedroom and stop the alarm” (Galindo and Saffiotti, 2013). To do so, the robot has to create and manage its own internal representation of the world incorporating the needed semantic knowledge, for example, this room is a bedroom and contains an alarm clock placed on a night table. Two major problems arise in the extraction of this information: object categorization, that is, the labeling of parts of the robot sensory data as belonging to a certain object class (bed, night table, alarm clock, etc.), and room categorization, that is, the classification of areas of the environment as rooms of a certain type (kitchen, bedroom, living room, etc.).
In order to cope with this categorization problem, a large amount of sample data is needed to test, validate, and compare different solutions. Considering this, the research community has released a number of public repositories on the internet, e.g. PASCAL (Everingham et al., 2010), NYUv2 (Silberman et al., 2012), ImageNet (Russakovsky et al., 2014), and SUN3D (Xiao et al., 2013). However, these datasets exhibit shortcomings when used by cutting-edge categorization techniques leveraging contextual information (Anand et al., 2013; Ruiz-Sarmiento et al., 2015b). Synthetic data could be used instead under specific circumstances (Ruiz-Sarmiento et al., 2015a), although real sensory datasets are preferred in most cases.
In this work we present the Robot-at-Home dataset (Robot@Home), a compilation of raw and processed data gathered by a mobile robot in different domestic settings. This dataset is unique in three aspects: the sensory system employed for its gathering, the diversity and amount of provided data, and the availability of dense ground truth information. Data collection followed a place-centric perspective (Xiao et al., 2013), and comprises 87,000+ timestamped observations as sequences of RGB-D images and 2D laser scans taken in five apartments. These raw data fully cover the common challenges to be faced by a robotic categorization system, such as changing lighting conditions, occlusions, viewpoint variations, or cluttered room layouts. On the other hand, the processed data include:
per-pixel labeling (ground truth information) of every RGB-D observation, along with the category of the room containing them;
3D reconstructions in the form of colored point maps and 2D geometric maps of the inspected rooms;
per-point object labeling of the 3D reconstructed rooms along with their room category;
topology of each apartment, stating the connectivity of the rooms within them.
During the data collection, a total of 36 rooms were completely inspected, so the dataset is rich in contextual information of objects and rooms. This is a valuable feature, missing in most of the state-of-the-art datasets, which can be exploited by, for instance, semantic mapping systems that leverage relationships such as pillows are usually on beds or ovens are not in bathrooms. Robot@Home is publicly available and is accompanied by the software application employed for its processing, named the Object Labeling Toolkit (OLT) (Ruiz-Sarmiento et al., 2015c).
The sensory system comprises a rig of four RGB-D cameras and a radial laser scanner (see Figure 1). The rig covers ~180° horizontally and ~58° vertically, which permits the user to simulate the performance of sensors with different fields of view, a valuable feature during the design of a robotic sensing system (de la Puente et al., 2014). Sensors have been intrinsically and extrinsically calibrated with state-of-the-art algorithms (Fernandez-Moral et al., 2014; Gómez-Ojeda et al., 2015; Teichman et al., 2013). It is worth mentioning that a number of distinctive patterns and objects have been strategically added to the apartments for possible exploitation of the dataset in robotic competitions, such as those in RoboCup@Home (Almeida et al., 2016) or RobotVision (Martinez-Gomez et al., 2014), where robots need to detect predefined patterns in the environment to accomplish certain challenging missions: to explore specific areas, to efficiently find a particular object, and so on.

Figure 1. Robotic platform employed to collect the dataset along with details of the sensors mounted on it.
In summary, this dataset contributes a repository suitable for a variety of robotic tasks such as object/room categorization or recognition, object segmentation, 2D/3D map building, and robot localization, among others.
The next section contrasts Robot@Home with other datasets also applicable to the categorization problem. Section 3 presents the robotic platform used and the methodology followed for gathering the raw data, while Section 4 describes the dataset content and some use cases. Finally, Section 5 summarizes the paper.
2. Related datasets
Mobile robots have traditionally resorted to intensity images to categorize objects and/or rooms, which motivated the collection of datasets providing this kind of information (Everingham et al., 2010; Russakovsky et al., 2014; Russell et al., 2008). Nowadays, the tendency is for the datasets to also include depth information (Anand et al., 2013; Janoch et al., 2011; Lai et al., 2011), given the proven benefits of exploiting morphological and spatial information in assisting categorization methods (Ruiz-Sarmiento et al., 2014). These datasets can be roughly classified as: object-centric, view-centric, and place-centric.
Object-centric datasets, such as ACCV (Hinterstoisser et al., 2013), RGBD Dataset (Lai et al., 2014, 2011), KIT object models (Kasper et al., 2012), or BigBIRD (Singh et al., 2014), provide RGB-D observations in which a single object spans each image. The exploitation of these images for categorization exhibits some drawbacks: (1) they are not representative of the typical images gathered by a robot in a real environment; (2) they prevent the utilization of valuable contextual information of objects; and (3) they are not suitable for the room categorization problem.
On the other hand, view-centric datasets such as Berkeley-3D (Janoch et al., 2011), Cornell-RGBD (Anand et al., 2013), UMA-Offices (Ruiz-Sarmiento et al., 2015a), NYU (Silberman and Fergus, 2011; Silberman et al., 2012), TUW (Aldoma et al., 2014), or UBC VRS (Meger and Little, 2012), consist of isolated RGB-D images, or a sequence of them, which cover a partial view of the working environment. This information permits the exploitation of contextual relations but only from a local, reduced perspective, since information of the entire scene is not collected. Therefore, their use for the categorization problem is still limited.
Finally, place-centric datasets such as SUN3D (Xiao et al., 2013) provide comprehensive information from the inspected room, or even the entire work environment, typically through the registration of RGB-D images. This type of dataset provides the best option to take advantage of both depth and contextual information in the categorization problem, although, unfortunately, few such datasets exist. A dataset worth mentioning at this point is ViDRILO (Martinez-Gomez et al., 2015), which comprises five sequences of RGB-D observations of two office buildings collected by a robot combining object and environment-centric perspectives. This dataset annotates each observation with its room type and the objects found within it, although this labeling is not per-pixel and the number of object categories is reduced. Table 1 shows a summary of datasets applicable to the categorization problem and their characteristics, including the novel, place-centric Robot@Home dataset.
Table 1. Summary of related datasets.
Note: CR: collected by a robot; DT: dataset type; EOC: enables object context exploitation; ERC: enables room categorization.
3. Data collection
3.1. Robotic platform
The Robot@Home dataset has been collected using the commercial robot Giraff (Giraff Technologies AB, 2015), which consists of a motorized wheeled platform endowed with a videoconferencing set. The robot is controlled by a low-cost onboard computer running Windows 7, with an Intel® Core™2 T7200 CPU at 2 GHz, 1 GB of RAM, and a 160 GB hard disk. This platform has been enhanced with the following sensors:
four Asus XTion Pro Live RGB-D cameras (ASUS, 2015) with a 58° × 45° field of view (FOV). These devices can provide synchronized intensity and depth images at VGA (640 × 480) or QVGA (320 × 240) resolutions;
a Hokuyo laser scanner model URG-04LX-UG01 (Hokuyo Automatic Co., 2015), a device that surveys 2D planes with a FOV of 240° and an angular resolution of 0.352°.
The four RGB-D devices have been mounted vertically on an octagonal rig, which sets a radial configuration of the cameras' optical axes, with an angular difference of 45° between adjacent devices (see Figure 1). The rig is placed in the front part of the robot, at a height of ~0.92 m, and the devices are connected to the onboard computer using a PCIe card with four USB 2.0 ports. Notice that the rig could hold up to eight RGB-D cameras, but we considered that the utilization of four slots was enough for the purposes of this dataset. This setup yields two important advantages: first, there is no overlap among the FOVs of the four units, which avoids possible sensor interferences, and second, the combination of the output of the devices produces RGB-D observations with ~180° of horizontal FOV (see Figure 2).
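The coverage figures quoted for the rig follow from simple angular arithmetic. The sketch below reproduces them under the assumption that mounting a camera vertically simply swaps its native 58° × 45° field of view; the function and parameter names are illustrative and are not part of the dataset tooling.

```python
def rig_coverage(num_cameras, native_fov_deg, axis_spacing_deg, mounted_vertically=True):
    """Return (horizontal coverage, vertical coverage, neighbour overlap) in degrees."""
    native_h, native_v = native_fov_deg
    # Assumption: rotating a camera by 90 degrees swaps its horizontal and vertical FOV.
    cam_h, cam_v = (native_v, native_h) if mounted_vertically else (native_h, native_v)
    # Angular span between the outermost optical axes plus half a FOV on each side.
    horizontal = (num_cameras - 1) * axis_spacing_deg + cam_h
    # Positive when adjacent views overlap, negative when there is a blind gap.
    overlap = cam_h - axis_spacing_deg
    return horizontal, cam_v, overlap


# Four cameras, 58 x 45 degree native FOV, optical axes 45 degrees apart.
horizontal, vertical, overlap = rig_coverage(4, (58.0, 45.0), 45.0)
print(horizontal, vertical, overlap)
```

With these inputs the adjacent views neither overlap nor leave a gap, which is consistent with the two advantages stated above.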

Figure 2. RGB and depth images from the four RGB-D devices mounted on the robot in two locations: a kitchen and a bedroom.
Concerning the 2D laser scanner, it is mounted at the front part of the robot base (see Figure 1), at a height of ~0.31 m. In this position the sensor cannot perceive any part of the robot while it surveys a plane horizontal to the floor at its maximum FOV.
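As an illustration of how such a scan relates to the robot's surroundings, the following sketch converts a list of range readings into 3D points lying on the scanned plane. It assumes beams evenly spaced over the 240° FOV, symmetric about the sensor's forward axis, and a sensor placed at the horizontal origin; the actual beam layout and sensor pose should be taken from the dataset files, and `scan_to_points` is a hypothetical helper.

```python
import math


def scan_to_points(ranges, fov_deg=240.0, sensor_height=0.31):
    """Convert range readings (metres) into 3D points on the scanned horizontal plane."""
    if len(ranges) < 2:
        raise ValueError("need at least two beams")
    # Assumption: beams are evenly spaced and centred on the forward (x) axis.
    step = math.radians(fov_deg) / (len(ranges) - 1)
    start = -math.radians(fov_deg) / 2.0
    points = []
    for i, r in enumerate(ranges):
        angle = start + i * step
        points.append((r * math.cos(angle), r * math.sin(angle), sensor_height))
    return points


# Toy scan with three beams at -120, 0 and +120 degrees.
for point in scan_to_points([1.0, 1.0, 1.0]):
    print(point)
```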
3.2. Sensor calibration
In order to provide accurate information within the Robot@Home dataset, the sensors mounted on the robot must be calibrated both intrinsically and extrinsically. The locations of the devices mounted on the robot, that is, their extrinsic parameters w.r.t. the robot frame, have been computed in a three-step process. First, the RGB-D devices were calibrated with respect to each other following the technique in Fernandez-Moral et al. (2014). Then, the relative pose between the RGB-D devices and the laser scanner was obtained by the procedure presented in Gómez-Ojeda et al. (2015). Finally, the position of the RGB-D rig in the robot frame was computed by minimizing the error of fitting planes to the walls and the floor of a room using RANSAC (Fischler and Bolles, 1981) while the robot was turning on the spot, and imposing vertical and horizontal conditions respectively. At this point every sensor is accurately related to the robot frame.
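The plane-fitting ingredient of the last step can be sketched as follows. This is a generic RANSAC plane fit on synthetic points, not the authors' calibration code: the function names, the iteration count, and the inlier tolerance are all assumptions, and the full procedure additionally optimizes the rig pose, which is not shown here.

```python
import math
import random


def plane_from_points(p, q, r):
    """Return (unit normal, offset d) of the plane through three points, or None if collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm < 1e-12:
        return None
    n = (nx / norm, ny / norm, nz / norm)
    return n, -(n[0] * p[0] + n[1] * p[1] + n[2] * p[2])


def fit_plane_ransac(points, n_iters=200, inlier_tol=0.02, seed=0):
    """Keep the candidate plane supported by the largest number of inliers."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(n_iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate (collinear) sample
        (nx, ny, nz), d = plane
        inliers = [p for p in points if abs(nx * p[0] + ny * p[1] + nz * p[2] + d) < inlier_tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def tilt_from_vertical_deg(normal):
    """Angle between a plane normal and the z axis; 0 for a perfectly horizontal floor."""
    return math.degrees(math.acos(min(1.0, abs(normal[2]))))


# Synthetic floor (z = 0) plus a few outliers above it.
floor = [(0.1 * x, 0.1 * y, 0.0) for x in range(10) for y in range(10)]
outliers = [(0.5, 0.5, 0.3 + 0.1 * k) for k in range(10)]
(normal, d), inliers = fit_plane_ransac(floor + outliers)
print(len(inliers), tilt_from_vertical_deg(normal))
```

The tilt of the fitted floor normal is the kind of quantity a horizontality condition would drive toward zero; an analogous check against the horizontal plane would apply to walls.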
Regarding the sensors’ internal parameters, for the correction of the depth images from the RGB-D devices we have resorted to the CLAMS framework (Teichman et al., 2013), while for the RGB and the laser scanner data we have relied on the factory values given their good outcome.
3.3. Software for the collection of data
Data streams coming from the five devices, that is, four RGB-D cameras and the laser scanner, must be conveniently managed and stored. For that, in this work we have opted for the rawlog-grabber application from the Mobile Robot Programming Toolkit project (Blanco Claraco, 2015), which provides mechanisms to collect and save sensory data from different sources into a file. In a nutshell, this software launches a dedicated thread for each sensor that time-stamps and saves the collected data to a compressed binary file in the Rawlog common robotic dataset format, which is automatically translated to human-readable information (plain text files and PNG images). Sensory observations have been saved at a frequency ranging from 1.25 Hz up to 10 Hz for the 2D laser scanner, and from 1 Hz up to 11 Hz for each RGB-D camera. These values are limited by the computational performance of the onboard computer, which has been compensated for by reducing the robot speed (maximum of 0.1 m/s and 10 deg/s for linear and angular speeds respectively), ensuring in this way a good coverage of the inspected areas.
3.4. Collection methodology
The data provided by Robot@Home have been collected within five dwelling apartments, named anto, alma, pare, rx2, and sarmis. For illustrative purposes, Figure 3 depicts their geometric maps, showing the annotations for the room categories in one of them. Raw data were collected in different sessions, each one containing a number of sequences of RGB-D observations and laser scans. These sequences were gathered by teleoperating the robot to fully inspect each individual room. Figure 3 shows an example of the path followed by the robot while collecting a sequence of the sarmis house.

Figure 3. (Left) Example of a 2D geometric map of the sarmis house, annotated with the type of the inspected rooms (orange boxes). The black dots represent the path followed by the robot during the inspection of the house, starting at the green triangle (livingroom) and ending at the red one (corridor). (Right) Examples of geometric maps of the remaining domestic settings. For a better understanding of the descriptions that rely on color, the reader is referred to the online version of this work.
A total of seven sessions were conducted, three in the sarmis house and one in each of the remaining settings. During the data collection, special attention was paid to conveniently steer the robot in order to provide different viewpoints of the objects in the scene, so they can appear partially or totally occluded. As an example, Figure 4 shows a pencil case that is fully visible in the first and third images, although showing a different pose, while it is partially occluded in the second one, and totally disappears in the fourth image.

Figure 4. (Top row) Different viewpoints from a sequence of cropped intensity images of the same set of objects, and (bottom row) their associated depth images. Notice that throughout the sequence some objects are totally or partially occluded by others. Numbers indicate the order of the viewpoint within the sequence.
Moreover, a number of particular characteristics have been intentionally included in each scenario to provide additional data for testing different object recognition algorithms and techniques, specifically the following:
Inclusion of distinctive objects. A number of patterns/objects have been placed in different rooms within these houses, specifically: teddies in alma, fruits in anto, numerical patterns in pare (see top row of Figure 5), and geometric patterns in rx2 (see bottom row of Figure 5).
Varying lighting conditions. Each of the three sessions in sarmis house was conducted at a different time of the day, which means that the objects were visualized under different lighting conditions.
Varying sets of objects. In those three sessions, the set of objects placed in each room from session to session differs, with objects dis/appearing as well as being moved (see Figure 6).

Figure 5. (Top row) Numerical patterns in pare house. (Bottom row) Geometric patterns in rx2 house.

Figure 6. Intensity images of a bedroom from the same RGB-D sensor illustrating the change of lighting conditions during the three sessions conducted at sarmis house. The overlaid numbers represent the identifier of the session. It can also be observed how the set of visible objects differs from session to session.
4. Dataset description
4.1. Raw data
The Robot@Home dataset comprises ~75 min of recorded data from a total of 83 sequences collected in the aforementioned sessions. These raw data include:
Table 2 shows a summary of the information gathered from each apartment, including the number of sequences, rooms inspected, number of 2D laser scans and RGB-D observations, as well as the time spent in their collection.
Table 2. Number of sequences, rooms, and observations per house and time spent collecting them.
The surveyed scenarios include a total of 36 rooms (some of them visited several times), divided into eight categories, that contain ~1900 object instances belonging to 157 categories. An exhaustive list of the categories of the objects and rooms appearing in the dataset can be consulted at the dataset website.
4.2. Processed data
The raw data have been processed in order to enrich the dataset with the following information:
Processed data have been produced employing two software tools, namely the aforementioned Mobile Robot Programming Toolkit (MRPT), and the Object Labeling Toolkit (OLT) (Ruiz-Sarmiento et al., 2015c). OLT comprises a set of public tools aimed at helping in the management and labeling of sequential RGB-D observations. The next sections describe in more detail the applications and methodologies followed to process the raw data.
4.2.1. 2D geometric maps
The ICP-slam application within MRPT has been used to register sequences of laser scans for building 2D geometric maps. Thereby, the Robot@Home dataset contains a total of 41 geometric maps, one per inspected room and a global map for each house (see Figures 3 and 8, fourth row). These maps are distributed along with the logs produced during the SLAM process, which include additional information such as the estimated path followed by the robot, snapshots of the scans’ registration over time, and so on.
4.2.2. 3D colored point maps
We have used the Mapping tool from OLT in order to produce aligned 3D representations of the recorded RGB-D data. This software registers sequences of RGB-D observations using the generalized iterative closest point technique (GICP) (Segal et al., 2009). This ICP variant requires an initial pose estimation to accurately align RGB-D observations, which in our case is obtained using visual odometry (Jaimez and González-Jiménez, 2015). Some examples of the provided reconstructions of rooms are shown in Figure 8 (fifth row).
4.2.3. Labeled 3D point maps
Each reconstructed room has been labeled with the Label scene tool from OLT. This tool allows us to easily set bounding boxes to the objects appearing in the point cloud reconstruction, and include annotations with the ground truth information about their category, for example, counter, book, couch, shelf, as well as an object id to identify the particular instance, that is, counter-1, book-3, and so on. Figure 8 (sixth row) illustrates some examples of annotations, while Figure 7 shows a snapshot of the labeling process.
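To illustrate how box annotations translate into per-point labels, the sketch below assigns each point of a reconstruction the id of the first box that contains it. It is a simplification that assumes axis-aligned boxes stored as `min`/`max` corners; the box representation used by OLT may differ, and `label_points` is a hypothetical helper rather than an OLT function.

```python
def label_points(points, boxes, unlabeled="unlabeled"):
    """Assign to each 3D point the id of the first bounding box that contains it."""
    labels = []
    for x, y, z in points:
        label = unlabeled
        for box in boxes:
            (x0, y0, z0), (x1, y1, z1) = box["min"], box["max"]
            if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1:
                label = box["id"]
                break
        labels.append(label)
    return labels


# Hypothetical annotations: instance ids combine the category and an instance number.
boxes = [
    {"id": "counter-1", "category": "counter", "min": (0.0, 0.0, 0.0), "max": (2.0, 0.6, 0.9)},
    {"id": "book-3", "category": "book", "min": (0.5, 0.1, 0.9), "max": (0.7, 0.4, 1.0)},
]
points = [(1.0, 0.3, 0.5), (0.6, 0.2, 0.95), (3.0, 3.0, 1.0)]
print(label_points(points, boxes))
```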

Figure 7. Snapshot of a kitchen from the alma house during its labeling process through the Label scene OLT component.
4.2.4. Labeled RGB-D observations
Each RGB-D observation within the collected sequences has also been labeled with the category/instance of the objects it contains through the Label rawlog application within OLT. This tool is fed with both the recorded sequence and the labeled, reconstructed map (obtained as described in the previous section) in order to automatically propagate the ground truth information to the RGB-D observations. The outcome of this process is a per-pixel labeling of the intensity and depth images within each observation, as well as a per-point labeling of its point cloud data (please refer to Ruiz-Sarmiento et al. (2015c) for further information). The last row of Figure 8 depicts depth images colored according to the propagated ground truth labels.

Figure 8. Excerpts of the information provided by Robot@Home. From top to bottom, examples of 2D laser scans from three different rooms, RGB and depth images gathered from them, their built 2D geometric maps and 3D reconstructions, the labels in such reconstructions as boxes where colors stand for different object categories, and, finally, the labeled depth information.
4.3. Usage
All the raw and processed data within Robot@Home have been conveniently structured into data types and sessions at its site, so the interested user can download chunks of information according to his/her needs (see Figure 9). The data are available in (human readable) plain text files and PNG images. Some of their immediate applications are listed below.

Figure 9. Tree structure of the data provided on the dataset webpage. Notice that the different types of data are available to the user both split into sessions and all together. The topology of the houses and the 2D geometric maps have particular, more convenient download options.
Robot@Home also enables the benchmarking of categorization systems relying on different kinds of information, namely: (a) exclusively using laser scans, RGB, depth, or RGB-D observations, (b) employing a stream of data from a sequence, (c) resorting to partial registrations of such a sequence, or (d) exploiting the resultant whole registered scene.
From a semantic point of view, the compiled data are useful as input for modern categorization systems leveraging contextual information within domestic settings (Anand et al., 2013; Ruiz-Sarmiento et al., 2015a). This enables, for example, the exploitation of typical objects’ and rooms’ configurations such as beds are in bedrooms, microwaves are not in bathrooms, or cushions are on couches, in order to enhance the categorization performance.
An additional feature worth mentioning is that Robot@Home is ready to be used by the Benchmark rawlog application from OLT. This software compares two sequences of labeled RGB-D observations and computes the similarity of their annotations. In other words, it permits us to compare a sequence from the dataset including ground truth annotations, with the same sequence labeled by a categorization algorithm, retrieving information about the performance of such an algorithm. Thereby, a common benchmarking frame for the comparison of algorithms that exploit the Robot@Home dataset can be easily set.
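The kind of comparison described above can be illustrated with a simple per-pixel agreement score between two labeled sequences. The source does not state which similarity measure the Benchmark rawlog application computes, so the metric below (fraction of matching pixels) and the nested-list representation of label images are assumptions made for illustration only.

```python
def labeling_agreement(ground_truth_seq, predicted_seq):
    """Fraction of pixels whose label matches across two sequences of label images."""
    if len(ground_truth_seq) != len(predicted_seq):
        raise ValueError("sequences must contain the same number of observations")
    matches = total = 0
    for gt_img, pred_img in zip(ground_truth_seq, predicted_seq):
        for gt_row, pred_row in zip(gt_img, pred_img):
            if len(gt_row) != len(pred_row):
                raise ValueError("label images must have the same size")
            matches += sum(g == p for g, p in zip(gt_row, pred_row))
            total += len(gt_row)
    return matches / total if total else 0.0


# Two tiny 2x2 label images per sequence (toy data, not from the dataset).
ground_truth = [[["bed", "bed"], ["pillow", "floor"]], [["floor", "floor"], ["bed", "bed"]]]
predicted = [[["bed", "bed"], ["bed", "floor"]], [["floor", "floor"], ["bed", "bed"]]]
print(labeling_agreement(ground_truth, predicted))
```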
In order to standardize the dataset usage for categorization/recognition purposes, we encourage the utilization of a leave-one-out cross-validation procedure (Arlot and Celisse, 2010), where the data from one apartment are employed for testing and those from the remaining ones for training. This process is repeated five times, changing the testing home, and the individual results are finally averaged.
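A minimal sketch of this protocol is given below. The house names come from the dataset; `evaluate` stands for any user-supplied function that trains on the listed houses and returns a score on the held-out one, and the toy scores in the usage example are placeholders rather than results.

```python
HOUSES = ["anto", "alma", "pare", "rx2", "sarmis"]


def leave_one_home_out(houses):
    """Yield (training houses, test house) pairs, holding out one house at a time."""
    for test_house in houses:
        yield [h for h in houses if h != test_house], test_house


def cross_validate(houses, evaluate):
    """Average the score returned by evaluate(train_houses, test_house) over all folds."""
    scores = [evaluate(train, test) for train, test in leave_one_home_out(houses)]
    return sum(scores) / len(scores)


# Placeholder evaluation: a real one would train and test a categorization system.
toy_scores = {"anto": 0.6, "alma": 0.7, "pare": 0.8, "rx2": 0.9, "sarmis": 1.0}
print(cross_validate(HOUSES, lambda train, test: toy_scores[test]))
```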
5. Summary
In this work we have presented the Robot@Home dataset, a collection of data gathered by a mobile robot in domestic settings, publicly available at
The surveyed scenarios include characteristics that turn the dataset into a sandbox to test robotic categorization systems dealing with issues such as changing lighting conditions, cluttered room layouts, occlusions, or changing viewpoints. Additionally, a number of distinctive patterns and objects have been intentionally placed in these scenarios to enable their exploitation in robotic competitions. Although Robot@Home is especially suited as a benchmark tool for object and/or room categorization systems taking advantage of contextual relations among objects and rooms, its possible usages are diverse, for example, object/room instance recognition, object segmentation, data compression/transmission, and so on.
Acknowledgements
We are very grateful to our colleague E. Fernandez-Moral for his support for the extrinsic calibration of the RGB-D devices, and to Mariano Tarifa Jaimez for his valuable advice.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish projects “IRO: Improvement of the sensorial and autonomous capability of robots through olfaction” (2012-TEP-530) and “PROMOVE: Advances in mobile robotics for promoting independent life of elders” (DPI2014-55826-R), and the European project “MoveCare: Multiple-actors virtual empathic caregiver for the elder” (Call: H2020-ICT-2016-1, contract number: 732158).
