As neural networks are increasingly employed in machine learning practice, efficiently sharing limited training resources among a diverse set of model training tasks becomes a crucial issue. To achieve better utilization of the shared resources, we explore the idea of jointly training multiple neural network models on a single GPU. We realize this idea through a primitive called pack. We further present a comprehensive empirical study of pack, together with end-to-end experiments that suggest significant improvements for hyperparameter tuning. The results suggest: (1) packing two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases as more models are packed; (2) the benefit of the pack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-aware Hyperband is up to 2.7x faster than the original Hyperband, and this improvement grows with memory size and, consequently, with the density of packed models.
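The pack primitive itself is only described at a high level above; as a rough illustration of the underlying idea, the sketch below jointly trains two models on the same device and the same input batch within a single training step, assuming PyTorch. The actual primitive presumably fuses the packed models more tightly into one execution than this simplified loop does, so this is a conceptual sketch rather than the paper's implementation.

```python
# Minimal sketch of "packing" two models onto one GPU, assuming PyTorch.
# This is illustrative only; the pack primitive is not a public API here.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def make_model(hidden):
    return nn.Sequential(nn.Linear(784, hidden), nn.ReLU(), nn.Linear(hidden, 10)).to(device)

# Two configurations from a tuning job, packed onto the same device.
models = [make_model(128), make_model(256)]
optims = [torch.optim.SGD(m.parameters(), lr=0.01) for m in models]
loss_fn = nn.CrossEntropyLoss()

def packed_step(x, y):
    """One training step that updates every packed model on the same batch."""
    x, y = x.to(device), y.to(device)
    for model, opt in zip(models, optims):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Toy usage with a random batch standing in for real training data.
packed_step(torch.randn(64, 784), torch.randint(0, 10, (64,)))
```

Sharing the batch and the device across models is what saves data-loading and memory overhead relative to running each configuration in its own process.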
We study a budgeted hyper-parameter tuning problem, where we optimize the tuning result under a hard resource constraint. We propose to solve it as a sequential decision-making problem, such that we can use the partial training progress of configurations to guide these decisions.
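As a rough illustration of using partial training progress under a budget (not the paper's actual algorithm), the sketch below follows a successive-halving style rule; the configuration format and the train_for_epochs callback are hypothetical placeholders.

```python
# Illustrative sketch: budget-constrained selection driven by partial
# training progress, in the spirit of successive halving.
def halving_search(configs, train_for_epochs, eta=2, initial_epochs=1):
    """Keep the best 1/eta configurations after each partial-training round."""
    budget = initial_epochs
    while len(configs) > 1:
        # Partial training: score every surviving configuration cheaply.
        scores = {i: train_for_epochs(cfg, budget) for i, cfg in enumerate(configs)}
        # Lower validation loss is better; survivors get a larger budget next round.
        ranked = sorted(range(len(configs)), key=scores.get)
        configs = [configs[i] for i in ranked[: max(1, len(configs) // eta)]]
        budget *= eta
    return configs[0]

# Toy usage: a stand-in objective favouring learning rates near 0.01.
best = halving_search(
    configs=[{"lr": lr} for lr in (0.1, 0.03, 0.01, 0.003)],
    train_for_epochs=lambda cfg, epochs: abs(cfg["lr"] - 0.01) + 1.0 / epochs,
)
print(best)
```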
Machine learning is a powerful method for modeling in different fields such as education. Its capability to accurately predict students' success makes it an ideal tool for decision-making tasks related to higher education. The accuracy of machine learning models, however, depends heavily on how their hyper-parameters are chosen.
This paper proposes the first algorithmic framework for tuning the hyper-parameters of stochastic optimization algorithms based on reinforcement learning. Hyper-parameters have a significant influence on the performance of stochastic optimization algorithms.
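As a heavily simplified stand-in for the controller idea described above (not the paper's reinforcement learning framework), the sketch below uses an epsilon-greedy bandit that picks a learning rate for each short training segment and is rewarded by the observed loss decrease; run_segment is a hypothetical callback.

```python
import random

# Illustrative stand-in for controller-style hyper-parameter tuning: an
# epsilon-greedy bandit chooses a learning rate for each short training
# segment and is rewarded by the loss decrease that segment achieves.
def bandit_tune(run_segment, learning_rates, segments=20, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    totals = {lr: 0.0 for lr in learning_rates}   # cumulative reward per arm
    counts = {lr: 0 for lr in learning_rates}     # pulls per arm

    def mean(lr):
        return totals[lr] / counts[lr] if counts[lr] else 0.0

    for _ in range(segments):
        if rng.random() < epsilon:
            lr = rng.choice(learning_rates)        # explore
        else:
            lr = max(learning_rates, key=mean)     # exploit
        reward = run_segment(lr)                   # observed loss decrease
        totals[lr] += reward
        counts[lr] += 1
    return max(learning_rates, key=mean)

# Toy usage: the stand-in reward peaks for learning rates near 0.01.
best_lr = bandit_tune(lambda lr: 1.0 - abs(lr - 0.01) * 50, [0.1, 0.03, 0.01, 0.003])
print(best_lr)
```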
Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this process.
Hyper-parameters of time series models play an important role in time series analysis. Slight differences in hyper-parameters might lead to very different forecast results for a given model, and therefore selecting good hyper-parameter values is indispensable.