Variable and higher pulse repetition frequencies (PRFs) are increasingly being used to meet the stricter requirements and complexities of current airborne and spaceborne synthetic aperture radar (SAR) systems associated with higher-resolution and wider-area products. POLYPHASE, the proposed resampling scheme, downsamples and unifies variable PRFs within a single-look complex (SLC) SAR acquisition and across a repeat-pass sequence of acquisitions down to an effective lower PRF. A sparsity condition on the received SAR data ensures that the uniformly resampled data approximates the spectral properties of a decimated, densely sampled version of the received SAR data. Experiments conducted with both synthetically generated and real airborne SAR data show that POLYPHASE achieves image quality comparable to the state-of-the-art BLUI scheme, while a polyphase filter-based implementation of POLYPHASE offers significant computational savings for arbitrary (not necessarily periodic) input PRF variations, thus allowing fully on-board, in-place, and real-time implementation.
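To make the source of the computational saving concrete, the following Python/NumPy sketch implements generic polyphase decimation by a fixed integer factor M. It only illustrates the standard polyphase filtering idea; the POLYPHASE scheme itself handles arbitrary, not necessarily periodic, PRF variations, which this sketch does not attempt to reproduce.

import numpy as np

def direct_decimate(x, h, M):
    """Reference: FIR-filter at the high rate, then keep every M-th sample."""
    return np.convolve(x, h)[::M]

def polyphase_decimate(x, h, M):
    """Same output, but every polyphase branch filters an already M-fold
    decimated stream, so the per-output work drops by roughly a factor of M."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    L = len(x) + len(h) - 1                     # length of the full convolution
    n_out = (L + M - 1) // M                    # number of kept (low-rate) samples
    y = np.zeros(n_out)
    xp = np.concatenate([np.zeros(M - 1), x])   # pad so x[r*M - m] is defined for r >= 0
    for m in range(M):
        hm = h[m::M]                            # m-th polyphase component of the filter
        if hm.size == 0:
            continue
        xm = xp[M - 1 - m::M]                   # input delayed by m samples, then decimated
        ym = np.convolve(xm, hm)
        y[:len(ym)] += ym[:n_out]
    return y

The two routines should agree to floating-point precision, e.g. np.allclose(direct_decimate(x, h, 4), polyphase_decimate(x, h, 4)), while the polyphase version performs roughly a quarter of the multiply-accumulate operations.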
This paper presents an efficient gradient projection-based method for structural topology optimization problems characterized by a nonlinear objective function minimized over a feasible region defined by bilateral bounds and a single linear equality constraint. The special structure of the constraints, together with heuristic engineering experience, is exploited to improve the scaling scheme, the projection, and the search step. In particular, gradient clipping and a modified projection of the search direction under certain conditions are used to improve the efficiency of the proposed method. In addition, an analytical solution is proposed to approximate this projection with negligible computation and memory cost. Furthermore, the calculation of the search step is greatly simplified. Benchmark problems, including the MBB beam, the force-inverter mechanism, and the 3D cantilever beam, are used to validate the effectiveness of the method. The proposed method is implemented in MATLAB and open-sourced for educational use.
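One building block the abstract alludes to, the Euclidean projection onto bilateral bounds combined with a single linear equality constraint (e.g., a volume constraint), can be computed by bisection on the Lagrange multiplier of the equality. The Python sketch below illustrates only that generic sub-problem, assuming elementwise positive constraint coefficients and a feasible box; it is not the authors' analytical approximation or their MATLAB code.

import numpy as np

def project_box_linear(y, a, b, lo, hi, tol=1e-10, max_iter=200):
    """Project y onto {x : lo <= x <= hi, a @ x = b}, assuming a > 0 elementwise
    and a @ lo <= b <= a @ hi (otherwise the feasible set is empty)."""
    # For a multiplier t, the box-constrained minimizer is clip(y - t*a, lo, hi);
    # g(t) = a @ clip(y - t*a, lo, hi) - b is non-increasing in t, so bisect on t.
    def g(t):
        return a @ np.clip(y - t * a, lo, hi) - b
    t_lo, t_hi = -1.0, 1.0
    while g(t_lo) < 0:          # expand the bracket until g(t_lo) >= 0 >= g(t_hi)
        t_lo *= 2.0
    while g(t_hi) > 0:
        t_hi *= 2.0
    for _ in range(max_iter):
        t_mid = 0.5 * (t_lo + t_hi)
        if g(t_mid) > 0:
            t_lo = t_mid
        else:
            t_hi = t_mid
        if t_hi - t_lo < tol:
            break
    return np.clip(y - 0.5 * (t_lo + t_hi) * a, lo, hi)

In density-based topology optimization, a projection of this kind clips the design update into [0, 1] while enforcing the volume constraint; the bisection above is the textbook way to compute it, whereas the paper pursues cheaper, tailored alternatives.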
Motivation: The ability to generate massive amounts of sequencing data continues to overwhelm the processing capability of existing algorithms and compute infrastructures. In this work, we explore the use of hardware/software co-design and hardware acceleration to significantly reduce the execution time of short sequence alignment, a crucial step in analyzing sequenced genomes. We introduce Shouji, a highly parallel and accurate pre-alignment filter that remarkably reduces the need for computationally costly dynamic programming algorithms. The first key idea of our proposed pre-alignment filter is to provide high filtering accuracy by correctly detecting all common subsequences shared between two given sequences. The second key idea is to design a hardware accelerator that exploits modern FPGA (Field-Programmable Gate Array) architectures to further boost the performance of our algorithm. Results: Shouji significantly improves the accuracy of pre-alignment filtering by up to two orders of magnitude compared with the state-of-the-art pre-alignment filters, GateKeeper and SHD. Our FPGA-based accelerator is up to three orders of magnitude faster than the equivalent CPU implementation of Shouji. Using a single FPGA chip, we benchmark the benefits of integrating Shouji with five state-of-the-art sequence aligners designed for different computing platforms. The addition of Shouji as a pre-alignment step reduces the execution time of the five state-of-the-art sequence aligners by up to 18.8x. Shouji can be adapted for any bioinformatics pipeline that performs sequence alignment for verification. Unlike most existing methods that aim to accelerate sequence alignment, Shouji does not sacrifice any of the aligner's capabilities, as it does not modify or replace the alignment step. Availability: https://github.com/CMU-SAFARI/Shouji
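As a rough illustration of what a pre-alignment filter does, the Python sketch below accepts a sequence pair only if, allowing shifts up to the edit-distance threshold, few enough positions remain unmatched. It is closer in spirit to the shifted-bit-vector filters SHD and GateKeeper named above than to Shouji's sliding-window common-subsequence algorithm or its FPGA design, and all names and parameters are illustrative assumptions.

import numpy as np

def prealign_filter(read, ref, edit_threshold):
    """Toy pre-alignment filter: pass the pair on to full alignment only if the
    sequences look close enough under up to `edit_threshold` positional shifts."""
    assert len(read) == len(ref)
    n = len(read)
    r = np.frombuffer(read.encode(), dtype=np.uint8)
    g = np.frombuffer(ref.encode(), dtype=np.uint8)
    covered = np.zeros(n, dtype=bool)
    # a read position counts as "covered" if it matches the reference under some shift
    for shift in range(-edit_threshold, edit_threshold + 1):
        match = np.zeros(n, dtype=bool)
        if shift >= 0:
            match[:n - shift] = r[:n - shift] == g[shift:]
        else:
            match[-shift:] = r[-shift:] == g[:n + shift]
        covered |= match
    mismatches = int(np.count_nonzero(~covered))
    return mismatches <= edit_threshold   # True: forward to dynamic-programming alignment

Only pairs accepted by such a filter are forwarded to the computationally costly dynamic-programming alignment step, which is where the end-to-end speedups reported above come from.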
The data-driven computing paradigm initially introduced by Kirchdoerfer and Ortiz (2016) enables finite element computations in solid mechanics to be performed directly from material data sets, without an explicit material model. From a computational effort point of view, the most challenging task is the projection of admissible states at material points onto their closest states in the material data set. In this study, we compare and develop several possible data structures for solving the nearest-neighbor problem. We show that approximate nearest-neighbor (ANN) algorithms can accelerate material data searches by several orders of magnitude relative to exact search algorithms. The approximations are suggested by, and adapted to, the structure of the data-driven iterative solver and result in no significant loss of solution accuracy. We assess the performance of the ANN algorithm with respect to material data set size with the aid of a 3D elasticity test case. We show that computations on a single processor with up to one billion material data points are feasible within a few seconds of execution time, with a speedup of more than 10^6 with respect to exact k-d trees.
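A minimal Python/SciPy sketch of the nearest-neighbor kernel, using a k-d tree for the exact search and its eps-approximate query as one possible ANN relaxation. The 1D linear-elastic data set, the sqrt(C) weighting that mimics an energetic distance, and all sizes are illustrative assumptions rather than the paper's actual setup or its preferred ANN data structure.

import numpy as np
from scipy.spatial import cKDTree

# Toy material data set: noisy (strain, stress) samples of a 1D linear-elastic law.
rng = np.random.default_rng(0)
strains = rng.uniform(-0.01, 0.01, size=1_000_000)
stresses = 200e9 * strains + rng.normal(0.0, 1e6, size=strains.size)

# Weight the axes so that Euclidean distance mimics an energetic distance.
C = 200e9
data = np.column_stack([np.sqrt(C) * strains, stresses / np.sqrt(C)])
tree = cKDTree(data)   # built once, reused in every data-driven iteration

# States coming from the solver at the material points (here: random placeholders).
queries = np.column_stack([np.sqrt(C) * rng.uniform(-0.01, 0.01, size=10_000),
                           rng.uniform(-2e9, 2e9, size=10_000) / np.sqrt(C)])

d_exact, i_exact = tree.query(queries, k=1)              # exact nearest neighbors
d_approx, i_approx = tree.query(queries, k=1, eps=0.5)   # neighbor within (1+eps) of optimal

With eps > 0 the tree can prune branches early, trading a bounded loss in neighbor quality for query time; the abstract's point is that, when such approximations are adapted to the data-driven iterative solver, they do not significantly degrade the converged solution.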
We introduce an FFT-based solver for the combinatorial continuous maximum flow discretization applied to computing the minimum cut through heterogeneous microstructures. Recently, computational methods were introduced for computing the effective crack energy of periodic and random media. These were based on the continuous minimum cut-maximum flow duality of G. Strang, and made use of discretizations based on trigonometric polynomials and finite elements. For maximum flow problems on graphs, node-based discretization methods avoid the metrication artifacts associated with edge-based discretizations. We discretize the minimum cut problem on heterogeneous microstructures by the combinatorial continuous maximum flow discretization introduced by Couprie et al. Furthermore, we introduce an associated FFT-based ADMM solver and provide several adaptive strategies for choosing the numerical parameters. We demonstrate the salient features of the proposed approach on problems of industrial scale.
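To make the "FFT-based ADMM" solver structure concrete, the following Python sketch applies the same pattern to a deliberately simple surrogate problem: 1D total-variation denoising with periodic boundary conditions, where the linear system in the first ADMM sub-step is circulant and hence solvable in O(n log n) by FFT. It is only a structural illustration under these assumptions, not the paper's solver for the combinatorial continuous maximum flow discretization, and it omits the adaptive parameter-choice strategies mentioned above.

import numpy as np

def admm_tv_denoise_1d(f, lam=1.0, rho=1.0, iters=200):
    """min_x 0.5*||x - f||^2 + lam*||D x||_1 with periodic forward differences D,
    solved by ADMM; the x-update is a circulant system handled with the FFT."""
    n = f.size
    k = np.arange(n)
    dhat = np.exp(2j * np.pi * k / n) - 1.0     # Fourier symbol of D
    denom = 1.0 + rho * np.abs(dhat) ** 2       # symbol of (I + rho * D^T D)

    def D(x):                                   # periodic forward difference
        return np.roll(x, -1) - x

    def Dt(y):                                  # its adjoint
        return np.roll(y, 1) - y

    x, z, u = f.copy(), D(f), np.zeros(n)
    for _ in range(iters):
        # x-update: (I + rho * D^T D) x = f + rho * D^T (z - u), solved via FFT
        rhs = f + rho * Dt(z - u)
        x = np.real(np.fft.ifft(np.fft.fft(rhs) / denom))
        # z-update: soft-thresholding (proximal step for lam * ||.||_1)
        v = D(x) + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual update
        u = u + D(x) - z
    return x

Each iteration thus costs one FFT-solvable linear system, one pointwise proximal step, and one dual update, which is the structure the proposed solver exploits at the scale of full 3D microstructures.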
To deploy a well-trained CNN model on low-end edge devices, one usually needs to compress or prune the model under a given computation budget (e.g., FLOPs). Current filter pruning methods mainly leverage feature maps to generate importance scores for filters and prune those with smaller scores, which ignores the variance across input batches and the resulting differences in sparse structure over filters. In this paper, we propose a data-agnostic filter pruning method that uses an auxiliary network, named the Dagger module, to induce pruning; it takes pretrained weights as input to learn the importance of each filter. In addition, to prune filters under a given FLOPs constraint, we leverage an explicit FLOPs-aware regularization to directly drive pruning toward the target FLOPs. Extensive experimental results on the CIFAR-10 and ImageNet datasets demonstrate the superiority of our method over other state-of-the-art filter pruning methods. For example, our 50%-FLOPs ResNet-50 achieves 76.1% Top-1 accuracy on ImageNet, surpassing many other filter pruning methods.
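A minimal PyTorch-style sketch of what an explicit FLOPs-aware regularizer can look like: per-filter gates and a per-filter FLOPs cost give an expected cost, and only the part above the budget is penalized. The gate tensors, the FLOPs bookkeeping, and every name below are illustrative assumptions, not the paper's Dagger module or its exact formulation.

import torch

def flops_regularizer(gates, flops_per_filter, target_flops):
    """gates: list of per-layer tensors of filter keep-probabilities in [0, 1];
    flops_per_filter: matching list with the FLOPs contributed by one kept filter.
    Returns the expected FLOPs in excess of the target (zero once under budget)."""
    expected_flops = sum((g * f).sum() for g, f in zip(gates, flops_per_filter))
    return torch.relu(expected_flops - target_flops)

# illustrative use inside a training step:
# loss = task_loss + lam * flops_regularizer(gates, flops_per_filter, target_flops)
# loss.backward()   # gradients push gates of costly filters toward zero when over budget

Because the penalty is differentiable in the gates, pruning is driven directly toward the target FLOPs during training rather than enforced only by post-hoc thresholding.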