Planning accurate manipulation for deformable objects requires prediction of their state. The prediction is often complicated by a loss of stability that may result in collapse of the deformable object. In this work, stability of a fabric strip folding performed by a robot is studied. We show that there is a static instability in the folding process. This instability is detected in a physics-based simulation and the position of the instability is verified experimentally by real robotic manipulation. Three state-of-the-art methods for folding are assessed in the presence of static instability. It is shown that one of the existing folding paths is suitable for folding of materials with internal friction such as fabrics. Another folding path that utilizes dynamic motion exists for ideal elastic materials without internal friction. Our results show that instability needs to be considered in planning to obtain accurate manipulation of deformable objects.
Accurate manipulation of a deformable body such as a piece of fabric is difficult because of its many degrees of freedom and unobservable properties affecting its dynamics. To alleviate these challenges, we propose the application of feedback-based control to robotic fabric strip folding. The feedback is computed from the low dimensional state extracted from a camera image. We trained the controller using reinforcement learning in simulation which was calibrated to cover the real fabric strip behaviors. The proposed feedback-based folding was experimentally compared to two state-of-the-art folding methods and our method outperformed both of them in terms of accuracy.
Developing autonomous assistants to help with domestic tasks is a vital topic in robotics research. Among these tasks, garment folding is one of them that is still far from being achieved mainly due to the large number of possible configurations that a crumpled piece of clothing may exhibit. Research has been done on either estimating the pose of the garment as a whole or detecting the landmarks for grasping separately. However, such works constrain the capability of the robots to perceive the states of the garment by limiting the representations for one single task. In this paper, we propose a novel end-to-end deep learning model named GarmNet that is able to simultaneously localize the garment and detect landmarks for grasping. The localization of the garment represents the global information for recognising the category of the garment, whereas the detection of landmarks can facilitate subsequent grasping actions. We train and evaluate our proposed GarmNet model using the CloPeMa Garment dataset that contains 3,330 images of different garment types in different poses. The experiments show that the inclusion of landmark detection (GarmNet-B) can largely improve the garment localization, with an error rate of 24.7% lower. Solutions as ours are important for robotics applications, as these offer scalable to many classes, memory and processing efficient solutions.
Learning-based 3D object reconstruction enables single- or few-shot estimation of 3D object models. For robotics, this holds the potential to allow model-based methods to rapidly adapt to novel objects and scenes. Existing 3D reconstruction techniques optimize for visual reconstruction fidelity, typically measured by chamfer distance or voxel IOU. We find that when applied to realistic, cluttered robotics environments, these systems produce reconstructions with low physical realism, resulting in poor task performance when used for model-based control. We propose ARM, an amodal 3D reconstruction system that introduces (1) a stability prior over object shapes, (2) a connectivity prior, and (3) a multi-channel input representation that allows for reasoning over relationships between groups of objects. By using these priors over the physical properties of objects, our system improves reconstruction quality not just by standard visual metrics, but also performance of model-based control on a variety of robotics manipulation tasks in challenging, cluttered environments. Code is available at github.com/wagnew3/ARM.
In this paper, we study how to fold a specified origami crease pattern in order to minimize the impact of paper thickness. Specifically, origami designs are often expressed by a mountain-valley pattern (plane graph of creases with relative fold orientations), but in general this specification is consistent with exponentially many possible folded states. We analyze the complexity of finding the best consistent folded state according to two metrics: minimizing the total number of layers in the folded state (so that a flat folding is indeed close to flat), and minimizing the total amount of paper required to execute the folding (where thicker creases consume more paper). We prove both problems strongly NP-complete even for 1D folding. On the other hand, we prove the first problem fixed-parameter tractable in 1D with respect to the number of layers.
Robotic fabric manipulation has applications in home robotics, textiles, senior care and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We build upon the Visual Foresight framework to learn fabric dynamics that can be efficiently reused to accomplish different sequential fabric manipulation tasks with a single goal-conditioned policy. We extend our earlier work on VisuoSpatial Foresight (VSF), which learns visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. In this earlier work, we evaluated VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. A key finding was that depth sensing significantly improves performance: RGBD data yields an 80% improvement in fabric folding success rate in simulation over pure RGB data. In this work, we vary 4 components of VSF, including data generation, visual dynamics model, cost function, and optimization procedure. Results suggest that training visual dynamics models using longer, corner-based actions can improve the efficiency of fabric folding by 76% and enable a physical sequential fabric folding task that VSF could not previously perform with 90% reliability. Code, data, videos, and supplementary material are available at https://sites.google.com/view/fabric-vsf/.