
Multi-level analysis of compiler-induced variability and performance tradeoffs

Added by Ian Briggs
Publication date: 2018
Language: English





Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern when optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variability occurs are badly needed, both to plan for code ports and to root-cause errors due to variability when they happen in the field. In this work, we offer an enhanced version of the open-source testing framework FLiT to serve these roles. Key new features of FLiT include a suite of bisection algorithms that help locate the root causes of variability. Another added feature allows an analysis of the tradeoffs between performance and the degree of variability. Our new contributions also include a collection of case studies. Results on the MFEM finite-element library include variability/performance tradeoffs and the identification of a (hitherto unknown) abnormal level of result variability even under mild compiler optimizations. Results from studying the Laghos proxy application include identifying significantly divergent floating-point result variability and successfully root-causing it down to the problematic function in as few as 14 program executions. Finally, in an evaluation of 4,376 controlled injections of floating-point perturbations into the LULESH proxy application, we show that the FLiT framework achieves 100% precision and recall in discovering the file and function locations of the injections, each within an average of only 15 program executions.
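
To make the bisection idea concrete, the following is a minimal Python sketch of a FLiT-style search over program units. It is illustrative only: diverges() stands in for FLiT's compile-run-compare cycle (the real tool drives an actual compiler and the project's test executable), and the culprit file names are hypothetical.

    # Sketch of FLiT-style file bisection (hypothetical names; the real
    # FLiT tool compiles and runs the project under test).
    TRUE_CULPRITS = {"src/force.cpp", "src/qupdate.cpp"}   # toy ground truth

    def diverges(subset):
        """Toy stand-in for one compile+run+compare cycle: compile `subset`
        with the aggressive flags, the rest with baseline flags, run the
        test, and check whether the result differs from the baseline."""
        return bool(TRUE_CULPRITS.intersection(subset))

    def bisect_files(files):
        """Locate every culprit file with one binary search per culprit.
        Assumes monotonicity: any subset containing a culprit diverges.
        Each diverges() call is one program execution, so each culprit
        costs only about log2(len(files)) runs."""
        culprits, remaining = [], list(files)
        while diverges(remaining):               # another culprit remains
            lo, hi = 1, len(remaining)
            while lo < hi:                       # smallest diverging prefix
                mid = (lo + hi) // 2
                if diverges(remaining[:mid]):
                    hi = mid
                else:
                    lo = mid + 1
            culprits.append(remaining.pop(lo - 1))
        return culprits

    files = sorted(TRUE_CULPRITS.union(f"src/file{i}.cpp" for i in range(30)))
    print(bisect_files(files))                   # two culprits, ~5 runs each

The logarithmic cost per culprit is what makes root-causing feasible within the roughly 14-15 program executions reported above.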



Related research

Model transformation is increasingly assimilated into the software development process. However, systems developed with model transformation can grow large and complex, and the performance of the transformation tends to decrease accordingly; performance is therefore an important quality of model transformation. Current research on model transformation performance focuses on optimizing the engines internally, but no research activity yet supports transformation engineers in identifying performance bottlenecks in the transformation rules or in predicting the overall performance. In this paper we present our vision of a monitoring and profiling approach that identifies the root cause of performance issues in the transformation rules and predicts the performance of model transformation. This will enable software engineers to systematically identify performance issues as well as predict the performance of model transformation.
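
As a concrete illustration of the monitoring-and-profiling idea, the sketch below times individual transformation rules and reports per-rule call counts and cumulative time. It assumes rules are plain Python callables; a real transformation engine (e.g., ATL or QVT) would expose its own instrumentation hooks, and all names here are hypothetical.

    import time
    from collections import defaultdict

    rule_stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

    def profiled(rule):
        """Wrap a transformation rule with per-call timing."""
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return rule(*args, **kwargs)
            finally:
                stats = rule_stats[rule.__name__]
                stats["calls"] += 1
                stats["seconds"] += time.perf_counter() - start
        return wrapper

    @profiled
    def class_to_table(cls):                     # toy transformation rule
        return {"table": cls["name"], "columns": cls["attrs"]}

    class_to_table({"name": "Person", "attrs": ["id", "name"]})
    # Rules sorted by cumulative time expose the performance bottleneck.
    for name, s in sorted(rule_stats.items(), key=lambda kv: -kv[1]["seconds"]):
        print(f"{name}: {s['calls']} calls, {s['seconds']:.6f} s")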
Small cell networks with dynamic time-division duplex (D-TDD) have emerged as a potential solution to address the asymmetric traffic demands in 5G wireless networks. By allowing the dynamic adjustment of the cell-specific UL/DL configuration, D-TDD flexibly allocates a percentage of subframes to UL and DL transmissions to accommodate the traffic within each cell. However, the unaligned transmissions bring in extra interference, which degrades the potential gain achieved by D-TDD. In this work, we propose an analytical framework to study the performance of multi-antenna small cell networks with clustered D-TDD, where cell clustering is employed to mitigate the interference from the opposite transmission direction in neighboring cells. With tools from stochastic geometry, we derive explicit expressions and tractable tight upper bounds for the success probability and network throughput. The proposed analytical framework allows us to quantify the effect of key system parameters, such as UL/DL configuration, cluster size, antenna number, and SINR threshold. Our results show the superiority of clustered D-TDD over traditional D-TDD, and reveal that there exists an optimal cluster size for DL performance, while UL performance always benefits from a larger cluster.
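
The paper derives closed forms; as a rough illustration of the quantity being analyzed, the toy Monte Carlo sketch below estimates the downlink success probability P(SINR > theta) when every interfering cell independently picks its transmission direction, which is exactly the cross-direction interference that clustering mitigates. The model (fixed serving distance, Rayleigh fading, path-loss exponent alpha) and all parameter values are assumptions, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(0)

    def success_prob(lam=1e-5, p_dl=0.7, theta_db=0.0, alpha=4.0,
                     p_bs=1.0, p_ue=0.1, noise=1e-12, r0=50.0,
                     radius=3000.0, trials=20000):
        """Estimate P(SINR > theta) for a typical DL user (toy model)."""
        theta = 10.0 ** (theta_db / 10.0)
        hits = 0
        for _ in range(trials):
            n = rng.poisson(lam * np.pi * radius ** 2)  # interferer count
            r = radius * np.sqrt(rng.random(n))         # uniform in a disc
            r = r[r > r0]                               # beyond serving link
            dl = rng.random(r.size) < p_dl              # per-cell direction
            power = np.where(dl, p_bs, p_ue)            # BS (DL) vs UE (UL)
            fading = rng.exponential(1.0, r.size)       # Rayleigh fading
            interference = np.sum(power * fading * r ** -alpha)
            signal = p_bs * rng.exponential(1.0) * r0 ** -alpha
            hits += signal > theta * (interference + noise)
        return hits / trials

    print(f"DL success probability ~ {success_prob():.3f}")

In this toy model, clustering would correspond to forcing cells near the typical user to share its transmission direction, removing opposite-direction terms from the close-in, dominant part of the interference sum.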
OpenCL for FPGA enables developers to program FPGAs using a model similar to that for processors. Recent works have shown that code optimization at the OpenCL level is important for achieving high computational efficiency. However, existing works either focus primarily on optimizing single kernels or depend solely on channels to design multi-kernel pipelines. In this paper, we propose a source-to-source compiler framework, MKPipe, for optimizing multi-kernel workloads in OpenCL for FPGA. Besides channels, we propose new schemes to enable multi-kernel pipelines. Our optimizing compiler employs a systematic approach to explore the tradeoffs among these optimization methods. To enable more efficient overlap between kernel executions, we also propose a novel workitem/workgroup-id remapping technique. Furthermore, we propose new algorithms for throughput balancing and resource balancing to tune the optimizations applied to individual kernels in a multi-kernel workload. Our results show that our compiler-optimized multi-kernels achieve up to 3.6x (1.4x on average) speedup over the baseline, in which the kernels have already been optimized individually.
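
To illustrate the workitem/workgroup-id remapping idea, the toy sketch below emulates in Python how a single fused launch can serve two kernels by partitioning the group-id range and rebasing the ids seen by the second kernel. MKPipe's actual transformation is a source-to-source rewrite of OpenCL C; the kernel names and group counts here are hypothetical.

    # Two kernels share one NDRange launch: groups [0, ga) execute kernel A
    # with their original ids, groups [ga, ga + gb) execute kernel B with
    # ids rebased to start at 0, so the two kernels' groups can overlap.
    def fused_kernel(group_id, ga, gb):
        if group_id < ga:
            return ("kernel_A", group_id)        # A keeps its original id
        return ("kernel_B", group_id - ga)       # B sees a rebased id

    ga, gb = 4, 3                                # groups per kernel (toy)
    for g in range(ga + gb):
        print(g, "->", fused_kernel(g, ga, gb))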
In this paper, an analytical approach to the nonlinearly distorted bit error rate (BER) performance of optical orthogonal frequency division multiplexing (O-OFDM) with single photon avalanche diode (SPAD) receivers is presented. The major distortion effects of passive quenching (PQ) and active quenching (AQ) SPAD receivers are analysed in this study. Performance analyses of DC-biased O-OFDM and asymmetrically clipped O-OFDM with PQ and AQ SPAD receivers are derived. The comparison results show the maximum optical irradiance imposed by the nonlinear distortion, which limits the transmission power and bit rate. The theoretical maximum bit rate of SPAD-based OFDM is found to be up to 1 Gbit/s. This approach supplies a closed-form analytical solution for designing an optimal SPAD-based system.
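
The quenching-induced nonlinearity can be pictured with the standard dead-time count-rate models, which the sketch below evaluates: a paralyzable model for passive quenching and a non-paralyzable model for active quenching. This pairing and the dead-time value are illustrative assumptions; the paper's PQ/AQ analysis is more detailed.

    import numpy as np

    # Standard dead-time models for the detected count rate m given the
    # incident photon rate n and dead time tau:
    #   paralyzable (used here for PQ):      m = n * exp(-n * tau)
    #   non-paralyzable (used here for AQ):  m = n / (1 + n * tau)
    tau = 10e-9                              # dead time in seconds (assumed)
    n = np.logspace(6, 9, 7)                 # incident photon rates (1/s)

    m_pq = n * np.exp(-n * tau)              # saturates, then rolls off
    m_aq = n / (1.0 + n * tau)               # saturates at 1 / tau

    for ni, pq, aq in zip(n, m_pq, m_aq):
        print(f"incident {ni:.1e}/s -> PQ {pq:.2e}/s, AQ {aq:.2e}/s")

This count-rate saturation is what distorts the peaks of high-intensity OFDM samples and ultimately bounds the usable transmission power and bit rate.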
The performance analysis of a novel optical modulation scheme is presented in this paper. The basic concept is to transmit the signs of modulated optical orthogonal frequency division multiplexing (O-OFDM) symbols and the absolute values of the symbols separately via two information-carrying units: 1) the indices of two light-emitting diode (LED) transmitters, which represent positive and negative signs respectively; and 2) optical intensity symbols, which carry the absolute values of the signals. The new approach, referred to as non-DC-biased OFDM (NDC-OFDM), uses the optical spatial modulation (OSM) technique to eliminate the clipping distortion of DC-biased optical OFDM (DCO-OFDM). In addition, it can achieve advantages similar to those of the conventional unipolar modulation scheme, asymmetrically clipped optical OFDM (ACO-OFDM), without using additional subcarriers. In this paper, the analytical BER performance is compared with Monte Carlo results in order to verify the reliability of the new method. Moreover, the practical BER performance of NDC-OFDM is compared with that of DCO-OFDM and ACO-OFDM for different constellation sizes to verify the improvement of NDC-OFDM in spectral and power efficiency.
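
The sketch below illustrates the NDC-OFDM transmit mapping described above: Hermitian-symmetric subcarrier loading yields real bipolar time-domain samples, the sign of each sample selects one of the two LEDs, and the magnitude is sent as optical intensity. Pulse shaping, the cyclic prefix, and the channel are omitted, and all parameters are assumptions.

    import numpy as np

    rng = np.random.default_rng(1)

    def ndc_ofdm_symbol(n_sub=16):
        """Map random QPSK data to one NDC-OFDM symbol (toy transmitter)."""
        # Hermitian-symmetric loading so the IFFT output is real-valued.
        bits = rng.integers(0, 2, size=(n_sub // 2 - 1, 2))
        data = (2 * bits[:, 0] - 1 + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
        spectrum = np.zeros(n_sub, dtype=complex)
        spectrum[1:n_sub // 2] = data
        spectrum[n_sub // 2 + 1:] = np.conj(data[::-1])
        x = np.fft.ifft(spectrum).real           # bipolar real samples
        led_index = (x < 0).astype(int)          # sign selects LED 0 or 1
        intensity = np.abs(x)                    # magnitude drives the LED
        return led_index, intensity

    led, amp = ndc_ofdm_symbol()
    print("LED choice:", led)
    print("intensity :", np.round(amp, 3))

Because the sign travels in the spatial index, no DC bias is added and no negative samples are clipped, which is the source of the power-efficiency gain over DCO-OFDM.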
