No Arabic abstract
In this article we present a novel underwater dataset collected from several field trials within the EU FP7 project Cognitive autonomous diving buddy (CADDY), where an Autonomous Underwater Vehicle (AUV) was used to interact with divers and monitor their activities. To our knowledge, this is one of the first efforts to collect a large dataset in underwater environments targeting object classification, segmentation and human pose estimation tasks. The first part of the dataset contains stereo camera recordings (~10K) of divers performing hand gestures to communicate and interact with an AUV in different environmental conditions. These gestures samples serve to test the robustness of object detection and classification algorithms against underwater image distortions i.e., color attenuation and light backscatter. The second part includes stereo footage (~12.7K) of divers free-swimming in front of the AUV, along with synchronized IMUs measurements located throughout the divers suit (DiverNet) which serve as ground-truth for human pose and tracking methods. In both cases, these rectified images allow investigation of 3D representation and reasoning pipelines from low-texture targets commonly present in underwater scenarios. In this paper we describe our recording platform, sensor calibration procedure plus the data format and the utilities provided to use the dataset.
Underwater object detection for robot picking has attracted a lot of interest. However, it is still an unsolved problem due to several challenges. We take steps towards making it more realistic by addressing the following challenges. Firstly, the currently available datasets basically lack the test set annotations, causing researchers must compare their method with other SOTAs on a self-divided test set (from the training set). Training other methods lead to an increase in workload and different researchers divide different datasets, resulting there is no unified benchmark to compare the performance of different algorithms. Secondly, these datasets also have other shortcomings, e.g., too many similar images or incomplete labels. Towards these challenges we introduce a dataset, Detecting Underwater Objects (DUO), and a corresponding benchmark, based on the collection and re-annotation of all relevant datasets. DUO contains a collection of diverse underwater images with more rational annotations. The corresponding benchmark provides indicators of both efficiency and accuracy of SOTAs (under the MMDtection framework) for academic research and industrial applications, where JETSON AGX XAVIER is used to assess detector speed to simulate the robot-embedded environment.
Robots are becoming an important way to deliver health care, and personality is vital to understanding their effectiveness. Despite this, there is a lack of a systematic overarching understanding of personality in health care human robot interaction (H-HRI). To address this, the authors conducted a review that identified 18 studies on personality in H-HRI. This paper presents the results of that systematic literature review. Insights are derived from this review regarding the methodologies, outcomes, and samples utilized. The authors of this review discuss findings across this literature while identifying several gaps worthy of attention. Overall, this paper is an important starting point in understanding personality in H-HRI.
With the end goal of selecting and using diver detection models to support human-robot collaboration capabilities such as diver following, we thoroughly analyze a large set of deep neural networks for diver detection. We begin by producing a dataset of approximately 105,000 annotated images of divers sourced from videos -- one of the largest and most varied diver detection datasets ever created. Using this dataset, we train a variety of state-of-the-art deep neural networks for object detection, including SSD with Mobilenet, Faster R-CNN, and YOLO. Along with these single-frame detectors, we also train networks designed for detection of objects in a video stream, using temporal information as well as single-frame image information. We evaluate these networks on typical accuracy and efficiency metrics, as well as on the temporal stability of their detections. Finally, we analyze the failures of these detectors, pointing out the most common scenarios of failure. Based on our results, we recommend SSDs or Tiny-YOLOv4 for real-time applications on robots and recommend further investigation of video object detection methods.
In this paper, we propose StereoPIFu, which integrates the geometric constraints of stereo vision with implicit function representation of PIFu, to recover the 3D shape of the clothed human from a pair of low-cost rectified images. First, we introduce the effective voxel-aligned features from a stereo vision-based network to enable depth-aware reconstruction. Moreover, the novel relative z-offset is employed to associate predicted high-fidelity human depth and occupancy inference, which helps restore fine-level surface details. Second, a network structure that fully utilizes the geometry information from the stereo images is designed to improve the human body reconstruction quality. Consequently, our StereoPIFu can naturally infer the human bodys spatial location in camera space and maintain the correct relative position of different parts of the human body, which enables our method to capture human performance. Compared with previous works, our StereoPIFu significantly improves the robustness, completeness, and accuracy of the clothed human reconstruction, which is demonstrated by extensive experimental results.
This work describes a new human-in-the-loop (HitL) assistive grasping system for individuals with varying levels of physical capabilities. We investigated the feasibility of using four potential input devices with our assistive grasping system interface, using able-bodied individuals to define a set of quantitative metrics that could be used to assess an assistive grasping system. We then took these measurements and created a generalized benchmark for evaluating the effectiveness of any arbitrary input device into a HitL grasping system. The four input devices were a mouse, a speech recognition device, an assistive switch, and a novel sEMG device developed by our group that was connected either to the forearm or behind the ear of the subject. These preliminary results provide insight into how different interface devices perform for generalized assistive grasping tasks and also highlight the potential of sEMG based control for severely disabled individuals.