An important step in neural network design tasks such as hyper-parameter optimization (HPO) and neural architecture search (NAS) is the evaluation of a candidate model's performance. Given fixed computational resources, one can either invest more time training each model to obtain more accurate estimates of its final performance, or spend more time exploring a greater variety of models in the configuration space. In this work, we aim to optimize this exploration-exploitation trade-off in the context of HPO and NAS for image classification by accurately approximating a model's maximal performance early in the training process. In contrast to recent accelerated NAS methods customized for certain search spaces, e.g., those requiring the search space to be differentiable, our method is flexible and imposes almost no constraints on the search space. Our method uses the evolution history of a network's features during the early stages of training to build a proxy classifier that matches the peak performance of the network under consideration. We show that our method can be combined with multiple search algorithms to find better solutions to a wide range of tasks in HPO and NAS. Using a sampling-based search algorithm and parallel computing, our method finds an architecture that outperforms DARTS while reducing wall-clock search time by 80%.
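The abstract leaves the construction of the proxy classifier unspecified. As a rough illustration of the stated idea, fitting a cheap classifier on the concatenated feature snapshots collected during early training and using its held-out accuracy to rank candidates, consider the following Python sketch; the logistic-regression proxy, the fit/score split, and the snapshot layout are illustrative assumptions, not the authors' design.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def proxy_score(feature_snapshots, labels):
        """Proxy for a candidate's peak accuracy, fit on its early feature evolution.

        feature_snapshots: list of (N, D) arrays holding a probe set's
        penultimate-layer features, recorded at a few early epochs.
        labels: (N,) ground-truth labels for the probe set.
        """
        history = np.concatenate(feature_snapshots, axis=1)  # (N, D * T) evolution history
        n = len(labels) // 2                                 # simple fit/score split
        clf = LogisticRegression(max_iter=1000).fit(history[:n], labels[:n])
        # The held-out accuracy of this cheap classifier stands in for the
        # candidate network's (unobserved) converged accuracy when ranking models.
        return clf.score(history[n:], labels[n:])

Because the proxy needs only a handful of early epochs per candidate, many candidates can be scored independently, which is consistent with the parallel, sampling-based search described above.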
For artificial intelligence-based image analysis methods to reach clinical applicability, the development of high-performance algorithms is crucial. For example, existing segmentation algorithms developed for natural images are neither efficient in their
Gradient estimation and vector space projection have been studied as two distinct topics. We aim to bridge the gap between the two by investigating how to efficiently estimate gradients based on a projected low-dimensional space. We first provide lowe
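The abstract is cut off before the estimator is described. As a generic illustration of estimating a gradient through a low-dimensional projection (not this paper's construction), one can apply finite differences along a random orthonormal basis of a k-dimensional subspace and lift the directional derivatives back:

    import numpy as np

    def subspace_grad_estimate(f, x, k=10, eps=1e-4, seed=0):
        """Finite-difference estimate of grad f(x), restricted to a random k-dim subspace."""
        rng = np.random.default_rng(seed)
        d = x.size
        P, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal subspace basis, (d, k)
        fx = f(x)
        # One extra function evaluation per basis direction instead of d of them.
        coeffs = np.array([(f(x + eps * P[:, i]) - fx) / eps for i in range(k)])
        return P @ coeffs  # projection of the full gradient onto the subspace

For f(v) = sum(v**2) with d = 100 and k = 10, the estimate recovers the projection of the true gradient 2v onto the sampled subspace at the cost of k + 1 function evaluations.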
Many recently proposed methods for Neural Architecture Search (NAS) can be formulated as bilevel optimization. For efficient implementation, solving this bilevel problem requires second-order approximations. In this paper, we demonstrate that gradient err
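For reference, the bilevel formulation the abstract refers to is the standard one used in differentiable NAS (e.g., DARTS): architecture parameters \alpha are tuned on the validation loss subject to the network weights w being optimal for the training loss,

    \min_{\alpha} \; \mathcal{L}_{\mathrm{val}}\bigl(w^{*}(\alpha), \alpha\bigr)
    \quad \text{s.t.} \quad
    w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{\mathrm{train}}(w, \alpha).

In practice, w^{*}(\alpha) is approximated by a single gradient step, w^{*}(\alpha) \approx w - \xi \nabla_{w} \mathcal{L}_{\mathrm{train}}(w, \alpha), and differentiating through that step introduces the second-order terms, and hence the gradient error, that this abstract discusses.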
In this paper, we introduce adaptive Wasserstein curvature denoising (AWCD), an original processing approach for point cloud data. By collecting curvature information from the Wasserstein distance, AWCD considers more precise structures of the data and p
Neural network (NN) models are increasingly used in scientific simulations, AI, and other high-performance computing (HPC) fields to extract knowledge from datasets. Each dataset requires a tailored NN model architecture, but designing structures by ha