Generating font glyphs of consistent style from one or a few reference glyphs, i.e., font completion, is an important task in typographical design. Because the problem is more well-defined than general image style transfer, it has received interest from both the vision and machine learning communities. Existing approaches address this problem as a direct image-to-image translation task. In this work, we instead explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation so that more intrinsic graphic properties of font styles can be captured. Specifically, we formulate a cross-modality cycled image-to-image model with a graph constructor between an image encoder and an image renderer. The novel graph constructor maps a glyph's latent code to a graph representation that matches expert knowledge and is trained to help the translation task. Our model generates better results than both an image-to-image baseline and previous state-of-the-art methods for glyph completion. Furthermore, the graph representation output by our model provides an intuitive interface for users to perform local editing and manipulation. Our proposed cross-modality cycled representation learning has the potential to be applied to other domains with prior knowledge from different data modalities. Our code is available at https://github.com/VITA-Group/Font_Completion_Graph.
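To make the encoder-graph-renderer cycle concrete, below is a minimal PyTorch sketch, not the released implementation: all module choices, the number of graph nodes, and the 64x64 rendering head are illustrative assumptions.

    # Sketch of the cross-modality cycle: image -> latent -> glyph graph -> image.
    import torch
    import torch.nn as nn

    class GlyphCycle(nn.Module):
        def __init__(self, latent_dim=256, num_nodes=32, node_dim=16):
            super().__init__()
            # image encoder: glyph image -> latent code
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, latent_dim),
            )
            # graph constructor: latent code -> node features of the glyph graph
            self.graph_constructor = nn.Linear(latent_dim, num_nodes * node_dim)
            self.num_nodes, self.node_dim = num_nodes, node_dim
            # renderer: graph -> reconstructed glyph image (64x64, illustrative)
            self.renderer = nn.Sequential(
                nn.Linear(num_nodes * node_dim, 64 * 64), nn.Sigmoid(),
            )

        def forward(self, img):
            z = self.encoder(img)
            graph = self.graph_constructor(z).view(-1, self.num_nodes, self.node_dim)
            recon = self.renderer(graph.flatten(1)).view(-1, 1, 64, 64)
            # graph can be supervised against expert annotations, recon against targets
            return graph, recon

The key design point is that the graph sits on the gradient path between encoder and renderer, so both the image reconstruction loss and any expert-knowledge graph loss shape the same latent code.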
This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS), with high performance, low cost, and in-depth interpretation. NAS has been studied intensively to automate the discovery of top-performing neural networks, but it suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations. Recent NAS works have started to explore indicators that can predict a network's performance without training. However, they either leverage limited properties of deep networks, or the benefits of their training-free indicators are not applied to more extensive search methods. Through rigorous correlation analysis, we present a unified framework to understand and accelerate NAS by disentangling the TEG characteristics of searched networks - Trainability, Expressivity, Generalization - all assessed in a training-free manner. The TEG indicators can be scaled up and integrated with various NAS search methods, including both supernet and single-path approaches. Extensive studies validate the effective and efficient guidance of our TEG-NAS framework, leading to both improved search accuracy and an over 2.3x reduction in search time. Moreover, we visualize search trajectories on the three landscapes of the TEG characteristics and observe that, while a good local minimum is easier to find on NAS-Bench-201 given its simple topology, balancing the TEG characteristics is much harder on the DARTS search space due to its complex landscape geometry. Our code is available at https://github.com/VITA-Group/TEGNAS.
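As a flavor of what a training-free trainability indicator can look like, here is a rough sketch of the condition number of the empirical neural tangent kernel (NTK) on a small batch, a proxy used in this line of work; the batch size, toy network, and use of the raw eigenvalue ratio are assumptions, not the paper's exact recipe.

    # Trainability proxy: condition number of the empirical NTK Gram matrix.
    import torch
    import torch.nn as nn

    def ntk_condition_number(net, x):
        grads = []
        for i in range(x.size(0)):
            net.zero_grad()
            out = net(x[i:i + 1]).sum()
            out.backward()
            # per-sample gradient of the output w.r.t. all parameters
            g = torch.cat([p.grad.flatten() for p in net.parameters()])
            grads.append(g.clone())
        J = torch.stack(grads)              # (batch, num_params)
        ntk = J @ J.t()                     # empirical NTK on this batch
        eigs = torch.linalg.eigvalsh(ntk)   # ascending eigenvalues
        return (eigs[-1] / eigs[0]).item()  # larger -> worse conditioned -> harder to train

    net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
    print(ntk_condition_number(net, torch.randn(8, 16)))

No labels or training steps are needed, which is what lets such indicators rank thousands of candidate architectures cheaply.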
Semantic segmentation for scene understanding is in wide demand, raising significant challenges for algorithm efficiency, especially in applications on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (the data level) and suffer from the expensive computation arising from the required multi-scale aggregation (the network level). On both levels, the computational and energy costs of training and inference are notable due to the often desired large input resolutions and the heavy computational burden of segmentation models. To this end, we propose DANCE, a general automated DAta-Network Co-optimization for Efficient segmentation model training and inference. Distinct from existing efficient segmentation approaches that focus merely on lightweight network design, DANCE distinguishes itself as an automated, simultaneous data-network co-optimization via both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming, which adaptively downsamples or drops input images and controls their corresponding contribution to the training loss, guided by the images' spatial complexity. Such a downsampling operation, in addition to directly slimming down the cost associated with the input size, also shrinks the dynamic range of input object and context scales, motivating us to adaptively slim the network to match the downsampled data. Extensive experiments and ablation studies (on four SOTA segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE achieves an all-win towards efficient segmentation: reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU).
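The data-slimming idea can be illustrated with a short sketch: estimate each image's spatial complexity (here a simple gradient-energy proxy; DANCE's actual criterion may differ) and downsample low-complexity images more aggressively while scaling their loss weight. Thresholds and scales below are placeholders.

    # Complexity-guided input downsampling with per-image loss weights.
    import torch
    import torch.nn.functional as F

    def slim_input(img, lo=0.5, hi=1.0, thresh=0.1):
        # img: (B, C, H, W); gradient energy as a crude spatial-complexity score
        dx = (img[..., :, 1:] - img[..., :, :-1]).abs().mean(dim=(1, 2, 3))
        dy = (img[..., 1:, :] - img[..., :-1, :]).abs().mean(dim=(1, 2, 3))
        complexity = dx + dy                               # (B,)
        scale = torch.where(complexity > thresh,
                            torch.full_like(complexity, hi),
                            torch.full_like(complexity, lo))
        outs, weights = [], []
        for i in range(img.size(0)):
            s = scale[i].item()
            outs.append(F.interpolate(img[i:i + 1], scale_factor=s,
                                      mode='bilinear', align_corners=False))
            weights.append(s)  # simpler, more aggressively downsampled images weigh less
        return outs, torch.tensor(weights)

Because downsampling shrinks object scales, the matching network-slimming step in DANCE then removes capacity (e.g., large-scale branches) that the slimmed data no longer exercises.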
Training on synthetic data can be beneficial in label- or data-scarce scenarios. However, synthetically trained models often generalize poorly to real domains due to domain gaps. In this work, we make the key observation that the diversity of the learned feature embeddings plays an important role in generalization performance. To this end, we propose contrastive synthetic-to-real generalization (CSG), a novel framework that leverages pre-trained ImageNet knowledge to prevent overfitting to the synthetic domain, while promoting the diversity of feature embeddings as an inductive bias to improve generalization. In addition, we enhance the proposed CSG framework with attentional pooling (A-pool) to let the model focus on semantically important regions and further improve its generalization. We demonstrate the effectiveness of CSG on various synthetic training tasks, exhibiting state-of-the-art performance on zero-shot domain generalization.
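A minimal sketch of attentional pooling as described, where spatial positions are weighted by a learned attention map before pooling so that semantically important regions dominate the pooled embedding; the 1x1-convolution parameterization here is an assumption, not necessarily CSG's exact form.

    # A-pool sketch: attention-weighted spatial pooling of a feature map.
    import torch
    import torch.nn as nn

    class APool(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.attn = nn.Conv2d(channels, 1, kernel_size=1)

        def forward(self, feat):               # feat: (B, C, H, W)
            b, c, h, w = feat.shape
            a = self.attn(feat).view(b, 1, h * w)
            a = torch.softmax(a, dim=-1)       # attention over spatial positions
            pooled = (feat.view(b, c, h * w) * a).sum(dim=-1)
            return pooled                      # (B, C) embedding for the contrastive loss

    pooled = APool(64)(torch.randn(2, 64, 7, 7))

The pooled embeddings would then feed the contrastive objective that pulls synthetic features toward their frozen ImageNet-pretrained counterparts while pushing apart embeddings of different images.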
Learning to optimize (L2O) is an emerging approach that leverages machine learning to develop optimization methods, aiming to reduce the laborious iterations of hand engineering. It automates the design of an optimization method based on its performance on a set of training problems. This data-driven procedure generates methods that can efficiently solve problems similar to those seen in training. In sharp contrast, typical traditional designs of optimization methods are theory-driven, so they obtain performance guarantees over the classes of problems specified by the theory. This difference makes L2O suitable for repeatedly solving a certain type of optimization problem over a specific distribution of data, while it typically fails on out-of-distribution problems. The practicality of L2O depends on the type of target optimization, the chosen architecture of the method to learn, and the training procedure. This new paradigm has motivated a community of researchers to explore L2O and report their findings. This article is poised to be the first comprehensive survey and benchmark of L2O for continuous optimization. We set up taxonomies, categorize existing works and research directions, present insights, and identify open challenges. We also benchmark many existing L2O approaches on a few representative optimization problems. For reproducible research and fair benchmarking, we have released our software implementation and data in the Open-L2O package at https://github.com/VITA-Group/Open-L2O.
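For readers new to the paradigm, here is a bare-bones sketch of the classic model-based L2O recipe surveyed in the article (an LSTM that maps gradients to coordinate-wise updates, meta-trained by unrolling on sample problems); it mirrors the "learning to learn by gradient descent" setup rather than any specific Open-L2O implementation, and the quadratic objective and unroll length are arbitrary.

    # Meta-training a learned optimizer on a random quadratic f(x) = ||Ax - b||^2.
    import torch
    import torch.nn as nn

    class LSTMOptimizer(nn.Module):
        def __init__(self, hidden=20):
            super().__init__()
            self.cell = nn.LSTMCell(1, hidden)   # one coordinate per "batch" row
            self.head = nn.Linear(hidden, 1)

        def forward(self, grad, state):          # grad: (num_params, 1)
            h, c = self.cell(grad, state)
            return self.head(h), (h, c)

    opt_net = LSTMOptimizer()
    meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
    A, b = torch.randn(10, 5), torch.randn(10)
    x = torch.zeros(5, requires_grad=True)
    state = (torch.zeros(5, 20), torch.zeros(5, 20))
    meta_loss = 0.0
    for _ in range(20):                          # truncated unroll of the optimizee
        loss = ((A @ x - b) ** 2).sum()
        grad, = torch.autograd.grad(loss, x, create_graph=True)
        update, state = opt_net(grad.unsqueeze(1), state)
        x = x + update.squeeze(1)                # apply the learned update
        meta_loss = meta_loss + loss             # meta-objective: sum of losses
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()

The meta-gradient flows through the unrolled trajectory into the optimizer's weights, which is precisely why the learned method excels on the training distribution of problems yet can fail out of distribution.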
Neural Architecture Search (NAS) has been studied intensively to automate the discovery of top-performing neural networks. Current works require heavy training of a supernet or intensive architecture evaluations, thus suffering from heavy resource consumption and often incurring search bias due to truncated training or approximations. Can we select the best neural architectures without any training and eliminate a drastic portion of the search cost? We provide an affirmative answer by proposing a novel framework called training-free neural architecture search (TE-NAS). TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space. Both are motivated by recent theoretical advances in deep networks and can be computed without any training or labels. We show that: (1) these two measurements imply the trainability and expressivity of a neural network; and (2) they strongly correlate with the network's test accuracy. We further design a pruning-based NAS mechanism to achieve a more flexible and superior trade-off between trainability and expressivity during the search. In the NAS-Bench-201 and DARTS search spaces, TE-NAS completes a high-quality search at a cost of only 0.5 and 4 GPU hours with one 1080Ti on CIFAR-10 and ImageNet, respectively. We hope our work inspires more attempts to bridge the theoretical findings on deep networks and practical impacts in real NAS applications. Code is available at: https://github.com/VITA-Group/TENAS.
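The second measurement can be illustrated with a toy sketch: estimate expressivity by counting distinct ReLU activation patterns (each pattern corresponds to one linear region) over random unlabeled inputs. The sampling scheme and tiny network below are simplifications of the paper's procedure.

    # Expressivity proxy: distinct ReLU activation patterns on random inputs.
    import torch
    import torch.nn as nn

    def count_activation_patterns(layers, x):
        patterns = set()
        for i in range(x.size(0)):
            h, bits = x[i:i + 1], []
            for layer in layers:
                h = layer(h)
                if isinstance(layer, nn.ReLU):
                    bits.append((h > 0).flatten())   # which units are active
            code = torch.cat(bits).to(torch.int8)
            patterns.add(tuple(code.tolist()))       # one pattern == one linear region
        return len(patterns)

    net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 32), nn.ReLU())
    print(count_activation_patterns(list(net), torch.randn(500, 8)))

Like the NTK spectrum, this needs no labels and no gradient steps, so both indicators can score every candidate during the pruning-based search at negligible cost.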
We present Sandwich Batch Normalization (SaBN), an embarrassingly easy improvement of Batch Normalization (BN) requiring only a few lines of code changes. SaBN is motivated by addressing the inherent feature distribution heterogeneity that can be identified in many tasks, which may arise from data heterogeneity (multiple input domains) or model heterogeneity (dynamic architectures, model conditioning, etc.). Our SaBN factorizes the BN affine layer into one shared sandwich affine layer, cascaded by several parallel independent affine layers. Concrete analysis reveals that, during optimization, SaBN promotes balanced gradient norms while still preserving diverse gradient directions: a property that many application tasks seem to favor. We demonstrate the prevailing effectiveness of SaBN as a drop-in replacement in four tasks: $\textbf{conditional image generation}$, $\textbf{neural architecture search}$ (NAS), $\textbf{adversarial training}$, and $\textbf{arbitrary style transfer}$. Leveraging SaBN immediately achieves better Inception Score and FID on CIFAR-10 and ImageNet conditional image generation with three state-of-the-art GANs; significantly boosts the performance of a state-of-the-art weight-sharing NAS algorithm on NAS-Bench-201; substantially improves the robust and standard accuracies for adversarial defense; and produces superior arbitrary stylized results. We also provide visualizations and analysis to help understand why SaBN works. Code is available at https://github.com/VITA-Group/Sandwich-Batch-Normalization.
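A compact sketch following the description above: standard BN statistics, one shared "sandwich" affine, then one of several parallel independent affines selected by a condition or domain index. Shapes and the index-based selection are straightforward readings of the abstract, not the exact released code.

    # SaBN sketch: BN -> shared affine -> per-condition independent affine.
    import torch
    import torch.nn as nn

    class SandwichBN2d(nn.Module):
        def __init__(self, num_features, num_conditions):
            super().__init__()
            self.bn = nn.BatchNorm2d(num_features, affine=False)
            # shared sandwich affine layer
            self.shared_gamma = nn.Parameter(torch.ones(1, num_features, 1, 1))
            self.shared_beta = nn.Parameter(torch.zeros(1, num_features, 1, 1))
            # parallel independent affine layers, one per condition/domain
            self.gammas = nn.Parameter(torch.ones(num_conditions, num_features))
            self.betas = nn.Parameter(torch.zeros(num_conditions, num_features))

        def forward(self, x, cond):             # cond: (B,) long tensor of condition ids
            h = self.bn(x) * self.shared_gamma + self.shared_beta
            g = self.gammas[cond].unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
            b = self.betas[cond].unsqueeze(-1).unsqueeze(-1)
            return h * g + b

    out = SandwichBN2d(16, num_conditions=4)(torch.randn(8, 16, 5, 5),
                                             torch.randint(0, 4, (8,)))

The shared affine couples all conditions (balancing gradient norms), while the independent affines keep condition-specific scaling (preserving diverse gradient directions), matching the analysis in the abstract.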
We present AutoPose, a novel neural architecture search (NAS) framework capable of automatically discovering multiple parallel branches of cross-scale connections towards accurate and high-resolution 2D human pose estimation. Recently, high-performance hand-crafted convolutional networks for pose estimation have shown growing demands for multi-scale fusion and high-resolution representations. However, current NAS works exhibit limited flexibility in scale searching, as they dominantly adopt simplified search spaces of single-branch architectures. Such simplification limits the fusion of information at different scales and fails to maintain high-resolution representations. The presented AutoPose framework is able to search for multi-branch scales and network depth, in addition to the cell-level microstructure. Motivated by this search space, a novel bi-level optimization method is presented, where the network-level architecture is searched via reinforcement learning and the cell-level search is conducted by a gradient-based method. Within 2.5 GPU days, AutoPose is able to find very competitive architectures on the MS COCO dataset that are also transferable to the MPII dataset. Our code is available at https://github.com/VITA-Group/AutoPose.
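The bi-level loop can be sketched schematically: a REINFORCE-style controller samples the network-level topology, while relaxed cell-level architecture parameters are updated by gradient descent, DARTS-style. Everything below (the toy controller, reward, and parameter shapes) is a stand-in for the real search space, not the AutoPose code.

    # Toy bi-level search: RL for the network level, gradients for the cell level.
    import torch
    import torch.nn as nn

    controller = nn.Linear(1, 4)                     # logits over 4 toy topologies
    cell_alpha = torch.zeros(3, requires_grad=True)  # cell-level mixing weights
    ctrl_opt = torch.optim.Adam(controller.parameters(), lr=1e-2)
    cell_opt = torch.optim.Adam([cell_alpha], lr=1e-2)

    def proxy_reward(topology, alpha):
        # placeholder for the validation accuracy of the sampled architecture
        return -(alpha.softmax(0) ** 2).sum() - 0.1 * topology

    for step in range(100):
        logits = controller(torch.ones(1, 1)).squeeze(0)
        dist = torch.distributions.Categorical(logits=logits)
        topo = dist.sample()                         # network level: RL sampling
        reward = proxy_reward(topo.item(), cell_alpha)
        ctrl_loss = -dist.log_prob(topo) * reward.detach()
        ctrl_opt.zero_grad(); ctrl_loss.backward(); ctrl_opt.step()
        cell_loss = -reward                          # cell level: gradient descent
        cell_opt.zero_grad(); cell_loss.backward(); cell_opt.step()

The split reflects the search space: branch/depth choices are discrete and non-differentiable (hence RL), while cell operations admit a continuous relaxation (hence gradients).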
The compression of Generative Adversarial Networks (GANs) has lately drawn attention, due to the increasing demand for deploying GANs on mobile devices for numerous applications such as image translation, enhancement, and editing. However, compared to the substantial efforts devoted to compressing other deep models, research on compressing GANs (usually the generators) remains in its infancy. Existing GAN compression algorithms are limited to handling specific GAN architectures and losses. Inspired by the recent success of AutoML in deep compression, we introduce AutoML to GAN compression and develop the AutoGAN-Distiller (AGD) framework. Starting from a specifically designed efficient search space, AGD performs an end-to-end discovery of new efficient generators, given the target computational resource constraints. The search is guided by the original GAN model via knowledge distillation, thereby fulfilling the compression. AGD is fully automatic, standalone (i.e., needing no trained discriminators), and generically applicable to various GAN models. We evaluate AGD on two representative GAN tasks: image translation and super-resolution. Without bells and whistles, AGD yields remarkably lightweight yet competitive compressed models that largely outperform existing alternatives. Our code and pretrained models are available at https://github.com/TAMU-VITA/AGD.
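A simplified sketch of the distillation signal that guides the search: a candidate compact generator is trained to match the original (teacher) generator's outputs, with a resource cost term used when comparing candidates. The loss weights, toy networks, and parameter-count cost model are illustrative assumptions.

    # Discriminator-free distillation of a teacher generator into a compact student.
    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(64, 3, 3, padding=1)).eval()
    student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(16, 3, 3, padding=1))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    def param_cost(net):   # crude resource proxy; used to rank candidate architectures
        return sum(p.numel() for p in net.parameters()) / 1e6

    for _ in range(10):
        x = torch.randn(4, 3, 32, 32)
        with torch.no_grad():
            target = teacher(x)              # teacher output; no discriminator needed
        loss = nn.functional.mse_loss(student(x), target) + 0.01 * param_cost(student)
        opt.zero_grad(); loss.backward(); opt.step()

Because the supervision comes entirely from the frozen teacher's outputs, this is what makes the framework standalone: no trained discriminator or original training data pipeline is required.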
We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods. Utilizing neural architecture search (NAS), FasterSeg is discovered from a novel and broader search space integrating multi-resolution branches, which have recently been found vital in manually designed segmentation models. To better calibrate the balance between the goals of high accuracy and low latency, we propose a decoupled and fine-grained latency regularization that effectively overcomes the phenomenon we observed of searched networks collapsing to low-latency yet poor-accuracy models. Moreover, we seamlessly extend FasterSeg to a new collaborative search (co-searching) framework, simultaneously searching for a teacher and a student network in a single run. The teacher-student distillation further boosts the student model's accuracy. Experiments on popular segmentation benchmarks demonstrate the competitiveness of FasterSeg. For example, FasterSeg runs over 30% faster than the closest manually designed competitor on Cityscapes, while maintaining comparable accuracy.
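The latency-regularized objective can be sketched as follows: each candidate operation has a profiled latency, the relaxed architecture's expected latency is the probability-weighted sum, and it is added to the task loss. The "decoupled, fine-grained" scheduling of the regularization weight is simplified to a constant here, and the latency table is made up.

    # Differentiable expected-latency regularization over relaxed op choices.
    import torch

    op_latency = torch.tensor([0.8, 1.5, 3.2])     # ms per candidate op (profiled)
    alpha = torch.zeros(5, 3, requires_grad=True)  # 5 layers x 3 candidate ops

    def expected_latency(alpha):
        probs = alpha.softmax(dim=-1)              # continuous relaxation per layer
        return (probs * op_latency).sum()          # differentiable latency estimate

    task_loss = torch.tensor(1.0)                  # placeholder segmentation loss
    lam = 0.01
    total = task_loss + lam * expected_latency(alpha)
    total.backward()                               # gradients steer alpha toward fast ops

With a single global weight, the latency term can dominate and collapse the search to trivially fast, inaccurate networks; decoupling and fine-graining the regularization is the paper's remedy for exactly that failure mode.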