ﻻ يوجد ملخص باللغة العربية
In this paper, we propose an effective method for fast and accurate scene parsing called Bidirectional Alignment Network (BiAlignNet). Previously, one representative work BiSeNet~cite{bisenet} uses two different paths (Context Path and Spatial Path) to achieve balanced learning of semantics and details, respectively. However, the relationship between the two paths is not well explored. We argue that both paths can benefit each other in a complementary way. Motivated by this, we propose a novel network by aligning two-path information into each other through a learned flow field. To avoid the noise and semantic gaps, we introduce a Gated Flow Alignment Module to align both features in a bidirectional way. Moreover, to make the Spatial Path learn more detailed information, we present an edge-guided hard pixel mining loss to supervise the aligned learning process. Our method achieves 80.1% and 78.5% mIoU in validation and test set of Cityscapes while running at 30 FPS with full resolution inputs. Code and models will be available at url{https://github.com/jojacola/BiAlignNet}.
In this paper, we propose a novel scene text detection method named TextMountain. The key idea of TextMountain is making full use of border-center information. Different from previous works that treat center-border as a binary classification problem,
Large geometry (e.g., orientation) variances are the key challenges in the scene text detection. In this work, we first conduct experiments to investigate the capacity of networks for learning geometry variances on detecting scene texts, and find tha
Video scene parsing is a long-standing challenging task in computer vision, aiming to assign pre-defined semantic labels to pixels of all frames in a given video. Compared with image semantic segmentation, this task pays more attention on studying ho
Multiple human parsing aims to segment various human parts and associate each part with the corresponding instance simultaneously. This is a very challenging task due to the diverse human appearance, semantic ambiguity of different body parts, and co
Scene parsing is challenging as it aims to assign one of the semantic categories to each pixel in scene images. Thus, pixel-level features are desired for scene parsing. However, classification networks are dominated by the discriminative portion, so