Why is it that we can recognize object identity and 3D shape from line drawings, even though line drawings do not occur in the natural world? This paper hypothesizes that the human visual system perceives line drawings as if they were approximately realistic images, and that the techniques of line drawing have been chosen to convey shape accurately to a human observer. Several implications and variants of this hypothesis are explored.
We present a new method for the vectorization of technical line drawings, such as floor plans, architectural drawings, and 2D CAD images. Our method includes (1) a deep learning-based cleaning stage that eliminates the background and imperfections in the image and fills in missing parts, (2) a transformer-based network that estimates vector primitives, and (3) an optimization procedure that produces the final primitive configurations. We train the networks on synthetic data, renderings of vector line drawings, and manually vectorized scans of line drawings. Our method quantitatively and qualitatively outperforms a number of existing techniques on a collection of representative technical drawings.
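For intuition, the following is a minimal PyTorch sketch of such a three-stage pipeline: an image-to-image cleaning network, a transformer that predicts line segments from patch tokens, and a gradient-based refinement of the predicted segments against the cleaned raster via a soft rasterizer. All module sizes, the rasterizer, and the names used here are illustrative assumptions rather than the architecture proposed in the paper.

```python
import torch
import torch.nn as nn

class CleaningNet(nn.Module):
    """Stage 1 (illustrative): remove background/noise from a grayscale raster scan."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class PrimitiveTransformer(nn.Module):
    """Stage 2 (illustrative): predict K line segments (x1, y1, x2, y2) from patch tokens."""
    def __init__(self, n_primitives=16, d_model=128):
        super().__init__()
        self.patch_embed = nn.Conv2d(1, d_model, kernel_size=8, stride=8)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.queries = nn.Parameter(torch.randn(n_primitives, d_model))
        self.head = nn.Linear(d_model, 4)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, d)
        memory = self.encoder(tokens)
        attn = torch.softmax(self.queries @ memory.transpose(1, 2), dim=-1)
        return torch.sigmoid(self.head(attn @ memory))            # (B, K, 4) in [0, 1]

def soft_render(segments, size=64, sharpness=50.0):
    """Differentiable rasterizer: ink falls off with distance to the nearest segment."""
    ys, xs = torch.meshgrid(torch.linspace(0, 1, size),
                            torch.linspace(0, 1, size), indexing="ij")
    p = torch.stack([xs, ys], dim=-1).reshape(1, 1, -1, 2)        # (1, 1, P, 2)
    a, b = segments[..., 0:2].unsqueeze(2), segments[..., 2:4].unsqueeze(2)
    ab = b - a
    t = (((p - a) * ab).sum(-1) / (ab * ab).sum(-1).clamp(min=1e-8)).clamp(0, 1)
    dist = (p - (a + t.unsqueeze(-1) * ab)).norm(dim=-1)          # (B, K, P)
    return torch.exp(-sharpness * dist).max(dim=1).values.reshape(-1, 1, size, size)

def refine(segments, cleaned, steps=200, lr=1e-2):
    """Stage 3 (illustrative): adjust segment endpoints to better match the cleaned raster."""
    params = segments.clone().requires_grad_(True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((soft_render(params, size=cleaned.shape[-1]) - cleaned) ** 2).mean()
        loss.backward()
        opt.step()
    return params.detach()

# Usage sketch on a dummy 64x64 scan:
scan = torch.rand(1, 1, 64, 64)
cleaned = CleaningNet()(scan)
segments = PrimitiveTransformer()(cleaned)
final_segments = refine(segments.detach(), cleaned.detach())
```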
State-of-the-art supervised deep networks have recently been shown to perform puzzlingly poorly at spatial reasoning on multi-view line drawings in the SPARE3D dataset. To investigate the reasons behind this low performance and to deepen our understanding of these tasks, we design controlled experiments on both the input data and the network design. Guided by insights from these experiments, we propose a simple contrastive learning approach, along with other network modifications, to improve the baseline performance. Our approach uses a self-supervised binary classification network to compare the line drawing differences between various views of any two similar 3D objects. It enables deep networks to effectively learn detail-sensitive yet view-invariant line drawing representations of 3D objects. Experiments show that our method significantly improves the baseline performance on SPARE3D, while several popular self-supervised learning methods do not.
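As a rough illustration of this comparison idea, the sketch below (PyTorch) uses a shared encoder for stacks of view drawings and a binary head that decides whether two view sets depict the same 3D object. The encoder, feature sizes, and pairing strategy shown are assumptions made for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ViewEncoder(nn.Module):
    """Encodes a stack of view drawings (e.g., front/top/side) into one feature vector."""
    def __init__(self, n_views=3, dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(n_views, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, views):
        return self.backbone(views)

class SameObjectClassifier(nn.Module):
    """Binary head: do two view sets depict the same 3D object?"""
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = ViewEncoder(dim=dim)
        self.head = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, views_a, views_b):
        za, zb = self.encoder(views_a), self.encoder(views_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)  # logit

# Training-step sketch: positives pair views of the same object,
# negatives pair views of two similar but distinct objects.
model = SameObjectClassifier()
views_a = torch.rand(8, 3, 64, 64)
views_b = torch.rand(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,)).float()
loss = nn.functional.binary_cross_entropy_with_logits(model(views_a, views_b), labels)
loss.backward()
```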
Spatial reasoning is an important component of human intelligence. We can imagine the shapes of 3D objects and reason about their spatial relations merely by looking at their three-view line drawings in 2D, with different levels of competence. Can deep networks be trained to perform spatial reasoning tasks? How can we measure their spatial intelligence? To answer these questions, we present the SPARE3D dataset. Informed by cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks of increasing difficulty: view consistency, camera pose, and shape generation. We then design a method to automatically generate a large number of challenging questions with ground-truth answers for each task, which provide supervision for training our baseline models built on state-of-the-art architectures such as ResNet. Our experiments show that although convolutional networks have achieved superhuman performance on many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guessing. We hope SPARE3D will stimulate new problem formulations and network designs for spatial reasoning, empowering intelligent robots to operate effectively in the 3D world via 2D sensors. The dataset and code are available at https://ai4ce.github.io/SPARE3D.
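To make the multiple-choice setup concrete, here is a hedged sketch of how one such task (e.g., view consistency) could be scored by a convolutional baseline: a shared ResNet-18 embeds the question drawing and each candidate answer, a linear head scores each pair, and the highest-scoring candidate is selected. The input encoding (the given views composited into a single image) and the scoring head are assumptions for illustration, not the exact baselines of the paper.

```python
import torch
import torch.nn as nn
import torchvision

class ChoiceScorer(nn.Module):
    """Scores (question drawing, candidate drawing) pairs with a shared ResNet-18 backbone."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()                  # keep the 512-dim pooled features
        self.features = backbone
        self.score = nn.Linear(2 * 512, 1)

    def forward(self, question, candidate):
        fq = self.features(question)                 # (B, 512)
        fc = self.features(candidate)                # (B, 512)
        return self.score(torch.cat([fq, fc], dim=-1)).squeeze(-1)

scorer = ChoiceScorer()
# Assume the given views are composited into one 3-channel image per question.
question = torch.rand(1, 3, 224, 224)
candidates = torch.rand(4, 3, 224, 224)              # four answer options
scores = scorer(question.expand(4, -1, -1, -1), candidates)
predicted_answer = scores.argmax().item()
```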
Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings. Recent work has primarily focused on explaining models for tasks like image classification or visual question answering. In this paper, we introduce Salient Attributes for Network Explanation (SANE) to explain image similarity models, where a model's output is a score measuring the similarity of two inputs rather than a classification score. In this setting, an explanation depends on both input images, so standard methods do not apply. Our SANE explanation pairs a saliency map identifying important image regions with an attribute that best explains the match. We find that our explanations provide additional information not typically captured by saliency maps alone, and can also improve performance on the classic task of attribute recognition. Our approach's ability to generalize is demonstrated on two datasets from diverse domains, Polyvore Outfits and Animals with Attributes 2. Code is available at: https://github.com/VisionLearningGroup/SANE
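For intuition on the saliency-map half of such an explanation, below is a minimal occlusion-style sketch in PyTorch: it masks one patch of the query image at a time and records how much the similarity to the reference image drops, so larger drops mark more important regions. This is a generic occlusion saliency, not necessarily the map generator used by SANE, and `embed` is a placeholder for any image-embedding function (an assumption, not from the paper); SANE additionally pairs the resulting map with the attribute that best explains the match.

```python
import torch

def occlusion_saliency(embed, query, reference, patch=16):
    """embed: function mapping a (1, C, H, W) image to a (1, D) feature vector.
    Assumes H and W are divisible by `patch`."""
    with torch.no_grad():
        base = torch.cosine_similarity(embed(query), embed(reference)).item()
        _, _, H, W = query.shape
        saliency = torch.zeros(H // patch, W // patch)
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                occluded = query.clone()
                occluded[:, :, i:i+patch, j:j+patch] = 0.0   # black out one patch
                s = torch.cosine_similarity(embed(occluded), embed(reference)).item()
                saliency[i // patch, j // patch] = base - s  # large drop => important region
    return saliency

# Usage with any feature extractor returning (1, D) embeddings (hypothetical names):
# sal = occlusion_saliency(model.embed, query_img, ref_img)
```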
We consider straight-line drawings of a planar graph $G$ with possible edge crossings. The \emph{untangling problem} is to eliminate all edge crossings by moving as few vertices as possible to new positions. Let $fix(G)$ denote the maximum number of vertices that can be left fixed in the worst case. In the \emph{allocation problem}, we are given a planar graph $G$ on $n$ vertices together with an $n$-point set $X$ in the plane and have to draw $G$ without edge crossings so that as many vertices as possible are located in $X$. Let $fit(G)$ denote the maximum number of points fitting this purpose in the worst case. As $fix(G)\le fit(G)$, we are interested in upper bounds for the latter and lower bounds for the former parameter. For each $\epsilon>0$, we construct an infinite sequence of graphs with $fit(G)=O(n^{\sigma+\epsilon})$, where $\sigma<0.99$ is a known graph-theoretic constant, namely the shortness exponent for the class of cubic polyhedral graphs. To the best of our knowledge, this is the first example of graphs with $fit(G)=o(n)$. On the other hand, we prove that $fix(G)\ge\sqrt{n/30}$ for all $G$ with tree-width at most 2. This extends the lower bound obtained by Goaoc et al. [Discrete and Computational Geometry 42:542-569 (2009)] for outerplanar graphs. Our upper bound for $fit(G)$ is based on the fact that the constructed graphs can have only a few collinear vertices in any crossing-free drawing. To prove the lower bound for $fix(G)$, we show that graphs of tree-width 2 admit drawings that have large sets of collinear vertices with some additional special properties.
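For reference, the inequality $fix(G)\le fit(G)$ follows from a short argument, sketched below in LaTeX; this reasoning is reconstructed for the reader's convenience and is not quoted from the paper.

```latex
% Why $fix(G) \le fit(G)$: given any $n$-point set $X$, place the vertices of $G$
% bijectively on the points of $X$, obtaining a (possibly crossing) straight-line
% drawing $D$. By the definition of $fix$, $D$ can be untangled while keeping at
% least $fix(G)$ vertices at their original positions; each such fixed vertex still
% lies on a point of $X$ in the resulting crossing-free drawing, so at least
% $fix(G)$ vertices fit $X$, for every choice of $X$.
\[
  fit(G) \;\ge\; \#\{\text{vertices kept fixed when untangling } D\} \;\ge\; fix(G).
\]
```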