No Arabic abstract
Convex relaxations have emerged as a promising approach for verifying desirable properties of neural networks like robustness to adversarial perturbations. Widely used Linear Programming (LP) relaxations only work well when networks are trained to facilitate verification. This precludes applications that involve verification-agnostic networks, i.e., networks not specially trained for verification. On the other hand, semidefinite programming (SDP) relaxations have successfully be applied to verification-agnostic networks, but do not currently scale beyond small networks due to poor time and space asymptotics. In this work, we propose a first-order dual SDP algorithm that (1) requires memory only linear in the total number of network activations, (2) only requires a fixed number of forward/backward passes through the network per iteration. By exploiting iterative eigenvector methods, we express all solver operations in terms of forward and backward passes through the network, enabling efficient use of hardware like GPUs/TPUs. For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on their energy efficiency, inference time and deployment on hardware. Although post-training quantization is well-studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractable. In this work, we introduce a convex optimization strategy to train quantized NNs with polynomial activations. Our method leverages hidden convexity in two-layer neural networks from the recent literature, semidefinite lifting, and Grothendiecks identity. Surprisingly, we show that certain quantized NN problems can be solved to global optimality in polynomial-time in all relevant parameters via semidefinite relaxations. We present numerical examples to illustrate the effectiveness of our method.
The tendency of semidefinite programs to compose perfectly under product has been exploited many times in complexity theory: for example, by Lovasz to determine the Shannon capacity of the pentagon; to show a direct sum theorem for non-deterministic communication complexity and direct product theorems for discrepancy; and in interactive proof systems to show parallel repetition theorems for restricted classes of games. Despite all these examples of product theorems--some going back nearly thirty years--it was only recently that Mittal and Szegedy began to develop a general theory to explain when and why semidefinite programs behave perfectly under product. This theory captured many examples in the literature, but there were also some notable exceptions which it could not explain--namely, an early parallel repetition result of Feige and Lovasz, and a direct product theorem for the discrepancy method of communication complexity by Lee, Shraibman, and Spalek. We extend the theory of Mittal and Szegedy to explain these cases as well. Indeed, to the best of our knowledge, our theory captures all examples of semidefinite product theorems in the literature.
In this paper we examine a symmetric tensor decomposition problem, the Gramian decomposition, posed as a rank minimization problem. We study the relaxation of the problem and consider cases when the relaxed solution is a solution to the original problem. In some instances of tensor rank and order, we prove generically that the solution to the relaxation will be optimal in the original. In other cases, we present interesting examples and approaches that demonstrate the intricacy of this problem.
Deep neural networks (DNN) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to tremendous gaps between the bandwidth/power/latency of electrical memory and computing cores. Previous solutions fail to fully-leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations, directly translating to performance improvement. We are the first to jointly explore the intrinsic correlations and bit-level redundancy within DNN kernels and propose a multi-level in situ generation mechanism with mixed-precision bases to achieve on-the-fly recovery of high-resolution parameters with minimum hardware overhead. Extensive experiments demonstrate that our proposed joint method can boost the memory efficiency by 10-20x with comparable accuracy over four state-of-the-art designs, when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 with various tasks.
We show a new way to round vector solutions of semidefinite programming (SDP) hierarchies into integral solutions, based on a connection between these hierarchies and the spectrum of the input graph. We demonstrate the utility of our method by providing a new SDP-hierarchy based algorithm for constraint satisfaction problems with 2-variable constraints (2-CSPs). More concretely, we show for every 2-CSP instance I a rounding algorithm for r rounds of the Lasserre SDP hierarchy for I that obtains an integral solution that is at most eps worse than the relaxations value (normalized to lie in [0,1]), as long as r > kcdotrank_{geq theta}(Ins)/poly(e) ;, where k is the alphabet size of I, $theta=poly(e/k)$, and $rank_{geq theta}(Ins)$ denotes the number of eigenvalues larger than $theta$ in the normalized adjacency matrix of the constraint graph of $Ins$. In the case that $Ins$ is a uniquegames instance, the threshold $theta$ is only a polynomial in $e$, and is independent of the alphabet size. Also in this case, we can give a non-trivial bound on the number of rounds for emph{every} instance. In particular our result yields an SDP-hierarchy based algorithm that matches the performance of the recent subexponential algorithm of Arora, Barak and Steurer (FOCS 2010) in the worst case, but runs faster on a natural family of instances, thus further restricting the set of possible hard instances for Khots Unique Games Conjecture. Our algorithm actually requires less than the $n^{O(r)}$ constraints specified by the $r^{th}$ level of the Lasserre hierarchy, and in some cases $r$ rounds of our program can be evaluated in time $2^{O(r)}poly(n)$.