
Enhancing accuracy of deep learning algorithms by training with low-discrepancy sequences

Added by T. Konstantin Rusch
Publication date: 2020
Research language: English





We propose a deep supervised learning algorithm based on low-discrepancy sequences as the training set. By a combination of theoretical arguments and extensive numerical experiments we demonstrate that the proposed algorithm significantly outperforms standard deep learning algorithms that are based on randomly chosen training data, for problems in moderately high dimensions. The proposed algorithm provides an efficient method for building inexpensive surrogates for many underlying maps in the context of scientific computing.
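In practice, the change amounts to drawing the training points from a quasi-Monte Carlo (low-discrepancy) sequence such as a Sobol sequence instead of from a pseudo-random generator. A minimal sketch of that comparison, assuming a SciPy/scikit-learn setup and a placeholder target map g; it illustrates the sampling idea only, not the authors' implementation:

    # Sketch: train the same small network on Sobol vs. pseudo-random points.
    # The target g and all hyper-parameters below are illustrative assumptions.
    import numpy as np
    from scipy.stats import qmc
    from sklearn.neural_network import MLPRegressor

    d, n_train, n_test = 8, 2048, 10_000
    g = lambda x: np.exp(-np.sum(x**2, axis=1))            # placeholder "underlying map"

    x_qmc = qmc.Sobol(d=d, scramble=True, seed=0).random(n_train)   # low-discrepancy set
    x_mc = np.random.default_rng(0).random((n_train, d))            # i.i.d. random set
    x_test = np.random.default_rng(1).random((n_test, d))

    def fit_and_test(x):
        net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
        net.fit(x, g(x))
        return np.mean((net.predict(x_test) - g(x_test)) ** 2)

    print("test MSE with Sobol points :", fit_and_test(x_qmc))
    print("test MSE with random points:", fit_and_test(x_mc))

The more even coverage of the input domain by the low-discrepancy points is the intuition behind the accuracy gains reported in the paper.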

Related research

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference, not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, which limits statistical accuracy. To address this issue, we describe a simple low-precision stochastic gradient descent variant called HALP. HALP converges at the same theoretical rate as full-precision algorithms despite the noise introduced by using low precision throughout execution. The key idea is to use SVRG to reduce gradient variance, and to combine this with a novel technique called bit centering to reduce quantization error. We show that on the CPU, HALP can run up to $4\times$ faster than full-precision SVRG and can match its convergence trajectory. We implemented HALP in TensorQuant, and show that it exceeds the validation performance of plain low-precision SGD on two deep learning tasks.
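The bit-centering idea can be caricatured in a few lines: the low-precision variable stores only a small fixed-point offset around a full-precision centre, and the centre and quantization range are refreshed at each outer, SVRG-style epoch so that the representable range shrinks as the iterates converge. A conceptual toy under those assumptions, not the paper's implementation:

    import numpy as np

    def quantize(delta, scale, bits=8):
        # Round to a signed fixed-point grid with the given bit width and range +-scale.
        levels = 2 ** (bits - 1) - 1
        step = scale / levels
        return np.clip(np.round(delta / step), -levels, levels) * step

    rng = np.random.default_rng(0)
    w_center = rng.normal(size=10)       # full-precision centre (outer loop)
    offset = np.zeros_like(w_center)     # low-precision part (inner loop)
    scale = 0.1                          # dynamic range, shrunk as training converges

    for _ in range(100):                 # inner low-precision steps
        grad = 2.0 * (w_center + offset)             # toy gradient of ||w||^2
        offset = quantize(offset - 0.01 * grad, scale)

    w_center = w_center + offset         # re-centre before the next outer epoch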
Timely completion of design cycles for complex systems ranging from consumer electronics to hypersonic vehicles relies on rapid simulation-based prototyping. The latter typically involves high-dimensional spaces of possibly correlated control variables (CVs) and quantities of interest (QoIs) with non-Gaussian and possibly multimodal distributions. We develop a model-agnostic, moment-independent global sensitivity analysis (GSA) that relies on differential mutual information to rank the effects of CVs on QoIs. The data requirements of this information-theoretic approach to GSA are met by replacing computationally intensive components of the physics-based model with a deep neural network surrogate. Subsequently, the GSA is used to explain the network predictions, and the surrogate is deployed to close design loops. Viewed as an uncertainty quantification method for interrogating the surrogate, this framework is compatible with a wide variety of black-box models. We demonstrate that the surrogate-driven mutual information GSA provides useful and distinguishable rankings on two applications of interest in energy storage. Consequently, our information-theoretic GSA provides an outer loop for accelerated product design by identifying the most and least sensitive input directions and performing subsequent optimization over appropriately reduced parameter subspaces.
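Once a cheap surrogate is available, the moment-independent rankings can be estimated directly from surrogate samples with an off-the-shelf mutual-information estimator. A rough sketch, using a hypothetical surrogate and scikit-learn's k-NN based estimator purely for illustration:

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    # Hypothetical surrogate standing in for the expensive physics-based model.
    def surrogate(x):
        return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.01 * x[:, 2]

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(5000, 3))    # cheap samples of the control variables
    y = surrogate(X)                          # quantity of interest from the surrogate

    mi = mutual_info_regression(X, y, random_state=0)   # k-NN mutual information estimate
    ranking = np.argsort(mi)[::-1]
    print("CV sensitivity ranking (most to least influential):", ranking, mi[ranking])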
Learning Rate (LR) is an important hyper-parameter to tune for effective training of deep neural networks (DNNs). Even for the baseline of a constant learning rate, it is non-trivial to choose a good constant value for training a DNN. Dynamic learning rates involve multi-step tuning of LR values at various stages of the training process and offer high accuracy and fast convergence, but they are much harder to tune. In this paper, we present a comprehensive study of 13 learning rate functions and their associated LR policies by examining their range parameters, step parameters, and value update parameters. We propose a set of metrics for evaluating and selecting LR policies, including classification confidence, variance, cost, and robustness, and implement them in LRBench, an LR benchmarking system. LRBench can assist end-users and DNN developers in selecting good LR policies and avoiding bad ones for training their DNNs. We tested LRBench on Caffe, an open-source deep learning framework, to showcase the tuning and optimization of LR policies. Through extensive experiments, we attempt to demystify the tuning of LR policies by identifying good LR policies with effective LR value ranges and step sizes for LR update schedules.
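The LR policies studied are, in essence, functions of the training step parameterized by a value range, a step size, and an update rule. A small illustrative sketch of three such policies (the specific functions and constants below are assumptions, not LRBench's catalogue):

    import math

    def fixed(t, lr=0.01):
        # constant baseline policy
        return lr

    def step_decay(t, lr0=0.1, gamma=0.5, step=30):
        # multiply the LR by gamma every `step` iterations
        return lr0 * gamma ** (t // step)

    def cosine(t, lr_min=1e-4, lr_max=0.1, period=100):
        # cycle between lr_max and lr_min with a cosine shape
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * (t % period) / period))

    for t in (0, 30, 60, 90):
        print(t, fixed(t), step_decay(t), round(cosine(t), 4))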
Liang Chen, Lesley Tan (2021)
In this paper, we investigate data-driven parameterized modeling of insertion loss for transmission lines with respect to design parameters. We first show that direct application of neural networks can lead to non-physical models with negative insertion loss. To mitigate this problem, we propose two deep learning solutions. The first adds a regularization term representing the passive condition to the final loss function, penalizing negative values of the insertion loss, as sketched below. The second first defines a third-order polynomial expression, which ensures positiveness, to approximate the insertion loss, and then employs the DeepONet neural network structure, recently proposed for function and system modeling, to predict the coefficients of the polynomial. Experimental results on an open-sourced SI/PI database of a PCB design show that both methods ensure positiveness of the insertion loss. Furthermore, both methods achieve similar prediction accuracy, while the polynomial-based DeepONet method requires less training time than the plain DeepONet-based method.
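The first solution can be sketched as an ordinary regression loss plus a penalty that is non-zero only where the predicted insertion loss goes negative; the penalty weight and tensor names are assumptions for illustration, not the paper's exact formulation:

    import torch

    def training_loss(pred_il, target_il, lam=10.0):
        # data-fit term
        mse = torch.mean((pred_il - target_il) ** 2)
        # passivity penalty: positive only where the predicted insertion loss is negative
        passivity_penalty = torch.mean(torch.relu(-pred_il))
        return mse + lam * passivity_penalty

    pred = torch.tensor([0.3, -0.1, 0.7])
    tgt = torch.tensor([0.25, 0.05, 0.8])
    print(training_loss(pred, tgt))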
We present a deep learning algorithm for the numerical solution of parametric families of high-dimensional linear Kolmogorov partial differential equations (PDEs). Our method is based on reformulating the numerical approximation of a whole family of Kolmogorov PDEs as a single statistical learning problem using the Feynman-Kac formula. Successful numerical experiments are presented, which empirically confirm the functionality and efficiency of our proposed algorithm in the case of heat equations and Black-Scholes option pricing models parametrized by affine-linear coefficient functions. We show that a single deep neural network trained on simulated data is capable of learning the solution functions of an entire family of PDEs on a full space-time region. Most notably, our numerical observations and theoretical results also demonstrate that the proposed method does not suffer from the curse of dimensionality, distinguishing it from almost all standard numerical methods for PDEs.
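The reformulation can be sketched for the simplest member of the family, the heat equation parametrized by its diffusion coefficient: by the Feynman-Kac formula the solution is a conditional expectation of the initial datum evaluated at a Brownian endpoint, so L2-regression of simulated endpoints onto a network of space point and parameter recovers the solution over the whole domain. A minimal PyTorch sketch under illustrative choices of architecture, initial datum, and sampling (not the authors' setup):

    import torch
    import torch.nn as nn

    # Heat equation u_t = 0.5 * sigma^2 * Laplace(u), u(x, 0) = phi(x), in d dimensions.
    # Feynman-Kac: u(x, T) = E[phi(x + sigma * sqrt(T) * Z)], Z ~ N(0, I), so the
    # L2-minimizer over (x, sigma) of the regression below is the solution u(., T).
    d, T = 5, 1.0
    phi = lambda x: torch.sum(x ** 2, dim=1, keepdim=True)   # illustrative initial datum

    net = nn.Sequential(nn.Linear(d + 1, 64), nn.Tanh(),
                        nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for it in range(2000):
        x = torch.empty(256, d).uniform_(-1.0, 1.0)       # space points
        sigma = torch.empty(256, 1).uniform_(0.1, 1.0)    # PDE parameter
        x_T = x + sigma * torch.sqrt(torch.tensor(T)) * torch.randn(256, d)
        loss = torch.mean((net(torch.cat([x, sigma], dim=1)) - phi(x_T)) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # net(x, sigma) now approximates u(x, T) for the whole family of heat equations.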
