
Sharpening Occam's Razor

Posted by: Paul Vitanyi
Publication date: 2002
Paper language: English
Authors: Ming Li





We provide a new representation-independent formulation of Occam's razor theorem, based on Kolmogorov complexity. This new formulation allows us to: (i) obtain better sample complexity than both length-based and VC-based versions of Occam's razor theorem, in many applications; and (ii) achieve a sharper reverse of Occam's razor theorem than previous work.
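For context, a hedged reminder of the length-based bound this is compared against (standard PAC-learning material, not taken from the paper): an $(n^c, \alpha)$-Occam algorithm, with $c \ge 1$ and $0 \le \alpha < 1$, outputs from $m$ labeled examples a consistent hypothesis of bit-length at most $n^c m^{\alpha}$, and the length-based Occam's razor theorem then guarantees error at most $\epsilon$ with probability at least $1-\delta$ once
$m \ge a\left(\frac{1}{\epsilon}\log\frac{1}{\delta} + \left(\frac{n^c}{\epsilon}\right)^{1/(1-\alpha)}\right)$
for a suitable constant $a$. The Kolmogorov-complexity formulation above is aimed at improving on this sample complexity in many applications.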



Read also

We discuss the neutrino mass matrix based on the Occam's Razor approach in the framework of the seesaw mechanism. We impose four zeros in the Dirac neutrino mass matrix, which give the minimum number of parameters needed for the observed neutrino masses and lepton mixing angles, while the charged lepton mass matrix and the right-handed Majorana neutrino mass matrix are taken to be real diagonal ones. The low-energy neutrino mass matrix has only seven physical parameters. We show successful predictions for the mixing angle $\theta_{13}$ and the CP violating phase $\delta_{CP}$ with the normal mass hierarchy of neutrinos by using the experimental data on the neutrino mass squared differences and the mixing angles $\theta_{12}$ and $\theta_{23}$. The most favored region of $\sin\theta_{13}$ is around $0.13 \sim 0.15$, which is completely consistent with the observed value. The CP violating phase $\delta_{CP}$ is favored to be close to $\pm \pi/2$. We also discuss the Majorana phases as well as the effective neutrino mass for the neutrinoless double-beta decay, $m_{ee}$, which is around $7 \sim 8$ meV. It is remarkable that a complete experiment can determine the low-energy neutrino mass matrix, since we have only seven physical parameters in the neutrino mass matrix. In particular, the two CP violating phases in the neutrino mass matrix are directly given by the two CP violating phases at high energy. Thus, assuming leptogenesis, we can determine the sign of the cosmic baryon asymmetry in the universe from low-energy experiments on the neutrino mass matrix.
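As background (the standard type-I seesaw relation, not specific to this work), the light neutrino mass matrix follows from the Dirac matrix $m_D$ and the right-handed Majorana matrix $M_R$ as
$m_\nu \simeq -\, m_D\, M_R^{-1}\, m_D^{T}$,
and the "four zeros" are texture zeros imposed on $m_D$: a $3\times 3$ Dirac matrix with four vanishing entries keeps only five nonzero elements, for example (a purely illustrative placement of the zeros, not the one used in the paper) $m_D = \begin{pmatrix} 0 & a & 0 \\ b & c & 0 \\ 0 & d & e \end{pmatrix}$.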
In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to *global optimality* with runtime polynomial in the data size, albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) for approximating a ReLU deep net function by a shallower ReLU net. Our gap theorems hold for smoothly parametrized families of hard functions, contrary to the countable, discrete families known in the literature. An example consequence of our gap theorems is the following: for every natural number $k$ there exists a function representable by a ReLU DNN with $k^2$ hidden layers and total size $k^3$, such that any ReLU DNN with at most $k$ hidden layers will require at least $\frac{1}{2}k^{k+1}-1$ total nodes. Finally, for the family of $\mathbb{R}^n \to \mathbb{R}$ DNNs with ReLU activations, we show a new lower bound on the number of affine pieces, which is larger than previous constructions in certain regimes of the network architecture; most distinctively, our lower bound is demonstrated by an explicit construction of a *smoothly parameterized* family of functions attaining this scaling. Our construction utilizes the theory of zonotopes from polyhedral theory.
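For reference (the standard definition, not a result of the paper), a ReLU DNN with $k$ hidden layers computes a function of the form
$f(x) = W_{k+1}\,\sigma\big(W_k\,\sigma(\cdots\sigma(W_1 x + b_1)\cdots) + b_k\big) + b_{k+1}$, with $\sigma(z) = \max(z, 0)$ applied componentwise,
so every such $f$ is continuous and piecewise linear; the gap theorems above quantify how many more nodes (equivalently, affine pieces) a shallower net needs to reproduce a function that a deeper net represents compactly.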
We demonstrate how graph neural networks can be used to solve combinatorial optimization problems. Our approach is broadly applicable to canonical NP-hard problems in the form of quadratic unconstrained binary optimization problems, such as maximum cut, minimum vertex cover, maximum independent set, as well as Ising spin glasses and higher-order generalizations thereof in the form of polynomial unconstrained binary optimization problems. We apply a relaxation strategy to the problem Hamiltonian to generate a differentiable loss function with which we train the graph neural network, and apply a simple projection to integer variables once the unsupervised training process has completed. We showcase our approach with numerical results for the canonical maximum cut and maximum independent set problems. We find that the graph neural network optimizer performs on par with or outperforms existing solvers, with the ability to scale beyond the state of the art to problems with millions of variables.
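A minimal sketch of this relax-train-project pipeline for Max-Cut as a QUBO, assuming a toy setting in which plain gradient descent over per-node parameters stands in for the trained graph neural network (the graph, the learning rate, and the helper name relaxed_loss_and_grad are illustrative assumptions, not the paper's implementation):

import numpy as np

# Toy graph: a 5-cycle. Each edge (i, j) contributes 2*x_i*x_j - x_i - x_j
# to the Max-Cut QUBO Hamiltonian, so minimizing H maximizes the cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
n = 5

def relaxed_loss_and_grad(theta):
    """Relax binary x_i to p_i = sigmoid(theta_i); return (loss, dloss/dtheta)."""
    p = 1.0 / (1.0 + np.exp(-theta))
    loss = sum(2 * p[i] * p[j] - p[i] - p[j] for i, j in edges)
    grad_p = np.zeros(n)
    for i, j in edges:
        grad_p[i] += 2 * p[j] - 1
        grad_p[j] += 2 * p[i] - 1
    return loss, grad_p * p * (1 - p)   # chain rule through the sigmoid

# Plain gradient descent on the differentiable relaxed loss.
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=n)
for _ in range(500):
    _, grad = relaxed_loss_and_grad(theta)
    theta -= 0.5 * grad

# Simple projection back to integer variables, as described in the abstract.
x = (1.0 / (1.0 + np.exp(-theta)) > 0.5).astype(int)
cut = sum(1 for i, j in edges if x[i] != x[j])
print("assignment:", x, "cut size:", cut)

In the paper's setting the node probabilities come from a graph neural network trained on node embeddings rather than from free parameters, but the relaxed loss and the final rounding step play the same roles.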
We provide a construction for categorical representation learning and introduce the foundations of the $\textit{categorifier}$. The central theme in representation learning is the idea of $\textbf{everything to vector}$. Every object in a dataset $\mathcal{S}$ can be represented as a vector in $\mathbb{R}^n$ by an $\textit{encoding map}$ $E: \mathcal{O}bj(\mathcal{S}) \to \mathbb{R}^n$. More importantly, every morphism can be represented as a matrix $E: \mathcal{H}om(\mathcal{S}) \to \mathbb{R}^{n}_{n}$. The encoding map $E$ is generally modeled by a $\textit{deep neural network}$. The goal of representation learning is to design appropriate tasks on the dataset to train the encoding map (assuming that an encoding is optimal if it universally optimizes the performance on various tasks). However, the latter is still a $\textit{set-theoretic}$ approach. The goal of the current article is to promote representation learning to a new level via a $\textit{category-theoretic}$ approach. As a proof of concept, we provide an example of a text translator equipped with our technology, showing that our categorical learning model outperforms current deep learning models by 17 times. The content of the current article is part of a recent US patent proposal (patent application number: 63110906).
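An illustrative gloss (our reading, not a claim made in the article): encoding objects as vectors and morphisms as matrices suggests a functor-like constraint on the trained encoder, namely that for composable morphisms $f$ and $g$,
$E(g \circ f) \approx E(g)\,E(f)$ and $E(\mathrm{id}_X) \approx I_n$,
i.e. composition in the dataset should map to matrix multiplication; this is the kind of compositional requirement a category-theoretic objective can impose and a purely set-theoretic "everything to vector" objective does not.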
Flow-based generative models have become an important class of unsupervised learning approaches. In this work, we incorporate the key idea of renormalization group (RG) and sparse prior distribution to design a hierarchical flow-based generative model, called RG-Flow, which can separate information at different scales of images with disentangled representations at each scale. We demonstrate our method mainly on the CelebA dataset and show that the disentangled representations at different scales enable semantic manipulation and style mixing of the images. To visualize the latent representations, we introduce receptive fields for flow-based models and find that the receptive fields learned by RG-Flow are similar to those in convolutional neural networks. In addition, we replace the widely adopted Gaussian prior distribution by a sparse prior distribution to further enhance the disentanglement of representations. From a theoretical perspective, the proposed method has $O(\log L)$ complexity for image inpainting compared to previous generative models with $O(L^2)$ complexity.
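For background (the standard normalizing-flow identity, not a contribution of this paper): a flow-based model learns an invertible map $f$ from images $x$ to latents $z = f(x)$ and is trained by maximizing the exact log-likelihood
$\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|$.
RG-Flow's contribution is to organize $f$ hierarchically so that different latent dimensions capture different length scales, and to choose a sparse rather than Gaussian prior $p_Z$ to sharpen the disentanglement.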
