No Arabic abstract
Generative adversarial networks (GANs) have promoted remarkable advances in single-image super-resolution (SR) by recovering photo-realistic images. However, high memory consumption of GAN-based SR (usually generators) causes performance degradation and more energy consumption, hindering the deployment of GAN-based SR into resource-constricted mobile devices. In this paper, we propose a novel compression framework textbf{M}ulti-scale textbf{F}eature textbf{A}ggregation Net based textbf{GAN} (MFAGAN) for reducing the memory access cost of the generator. First, to overcome the memory explosion of dense connections, we utilize a memory-efficient multi-scale feature aggregation net as the generator. Second, for faster and more stable training, our method introduces the PatchGAN discriminator. Third, to balance the student discriminator and the compressed generator, we distill both the generator and the discriminator. Finally, we perform a hardware-aware neural architecture search (NAS) to find a specialized SubGenerator for the target mobile phone. Benefiting from these improvements, the proposed MFAGAN achieves up to textbf{8.3}$times$ memory saving and textbf{42.9}$times$ computation reduction, with only minor visual quality degradation, compared with ESRGAN. Empirical studies also show $sim$textbf{70} milliseconds latency on Qualcomm Snapdragon 865 chipset.
With the development of deep learning, the single super-resolution image reconstruction network models are becoming more and more complex. Small changes in hyperparameters of the models have a greater impact on model performance. In the existing works, experts have gradually explored a set of optimal model parameters based on empirical values or performing brute-force search. In this paper, we introduce a new super-resolution image reconstruction generative adversarial network framework, and a Bayesian optimization method used to optimizing the hyperparameters of the generator and discriminator. The generator is made by self-calibrated convolution, and discriminator is made by convolution lays. We have defined the hyperparameters such as the number of network layers and the number of neurons. Our method adopts Bayesian optimization as a optimization policy of GAN in our model. Not only can find the optimal hyperparameter solution automatically, but also can construct a super-resolution image reconstruction network, reducing the manual workload. Experiments show that Bayesian optimization can search the optimal solution earlier than the other two optimization algorithms.
Image bitmaps have been widely used in in-memory applications, which consume lots of storage space and energy. Compared with legacy DRAM, non-volatile memories (NVMs) are suitable for bitmap storage due to the salient features in capacity and power savings. However, NVMs suffer from higher latency and energy consumption in writes compared with reads. Although compressing data in write accesses to NVMs on-the-fly reduces the bit-writes in NVMs, existing precise or approximate compression schemes show limited performance improvements for data of bitmaps, due to the irregular data patterns and variance in data. We observe that the data containing bitmaps show the pixel-level similarity due to the analogous contents in adjacent pixels. By exploiting the pixel-level similarity, we propose SimCom, an efficient similarity-aware compression scheme in hardware layer, to compress data for each write access on-the-fly. The idea behind SimCom is to compress continuous similar words into the pairs of base words with runs. With the aid of domain knowledge of images, SimCom adaptively selects an appropriate compression mode to achieve an efficient trade-off between image quality and memory performance. We implement SimCom on GEM5 with NVMain and evaluate the performance with real-world workloads. Our results demonstrate that SimCom reduces 33.0%, 34.8% write latency and saves 28.3%, 29.0% energy than state-of-the-art FPC and BDI with minor quality loss of 3%.
GPUs offer orders-of-magnitude higher memory bandwidth than traditional CPU-only systems. However, GPU device memory tends to be relatively small and the memory capacity can not be increased by the user. This paper describes Buddy Compression, a scheme to increase both the effective GPU memory capacity and bandwidth while avoiding the downsides of conventional memory-expanding strategies. Buddy Compression compresses GPU memory, splitting each compressed memory entry between high-speed device memory and a slower-but-larger disaggregated memory pool (or system memory). Highly-compressible memory entries can thus be accessed completely from device memory, while incompressible entries source their data using both on and off-device accesses. Increasing the effective GPU memory capacity enables us to run larger-memory-footprint HPC workloads and larger batch-sizes or models for DL workloads than current memory capacities would allow. We show that our solution achieves an average compression ratio of 2.2x on HPC workloads and 1.5x on DL workloads, with a slowdown of just 1~2%.
This work considers a super-resolution framework for overcomplete tensor decomposition. Specifically, we view tensor decomposition as a super-resolution problem of recovering a sum of Dirac measures on the sphere and solve it by minimizing a continuous analog of the $ell_1$ norm on the space of measures. The optimal value of this optimization defines the tensor nuclear norm. Similar to the separation condition in the super-resolution problem, by explicitly constructing a dual certificate, we develop incoherence conditions of the tensor factors so that they form the unique optimal solution of the continuous analog of $ell_1$ norm minimization. Remarkably, the derived incoherence conditions are satisfied with high probability by random tensor factors uniformly distributed on the sphere, implying global identifiability of random tensor factors.
Generative adversarial networks (GANs) have gained increasing popularity in various computer vision applications, and recently start to be deployed to resource-constrained mobile devices. Similar to other deep models, state-of-the-art GANs suffer from high parameter complexities. That has recently motivated the exploration of compressing GANs (usually generators). Compared to the vast literature and prevailing success in compressing deep classifiers, the study of GAN compression remains in its infancy, so far leveraging individual compression techniques instead of more sophisticated combinations. We observe that due to the notorious instability of training GANs, heuristically stacking different compression techniques will result in unsatisfactory results. To this end, we propose the first unified optimization framework combining multiple compression means for GAN compression, dubbed GAN Slimming (GS). GS seamlessly integrates three mainstream compression techniques: model distillation, channel pruning and quantization, together with the GAN minimax objective, into one unified optimization form, that can be efficiently optimized from end to end. Without bells and whistles, GS largely outperforms existing options in compressing image-to-image translation GANs. Specifically, we apply GS to compress CartoonGAN, a state-of-the-art style transfer network, by up to 47 times, with minimal visual quality degradation. Codes and pre-trained models can be found at https://github.com/TAMU-VITA/GAN-Slimming.