
Acceleration of low-latency gravitational wave searches using Maxwell-microarchitecture GPUs

Published by Zhihui Du
Publication date: 2017
Language: English





Low-latency detection of gravitational waves (GWs) is crucial for enabling prompt follow-up observations of astrophysical transients by conventional telescopes. We have developed a low-latency pipeline using a technique called Summed Parallel Infinite Impulse Response (SPIIR) filtering, implemented on a Graphics Processing Unit (GPU). In this paper, we exploit the new Maxwell memory access architecture in NVIDIA GPUs, namely the read-only data cache, warp-shuffle, and cross-warp atomic techniques. We report a 3-fold speed-up over our previous implementation of this filtering technique. To tackle SPIIR with relatively few filters, we develop a new GPU thread configuration with a nearly 10-fold speed-up. In addition, we implement a multi-rate scheme of SPIIR filtering using Maxwell GPUs. We achieve more than 100-fold speed-up over a single-core CPU for the multi-rate filtering scheme. This results in an overall 21-fold reduction in CPU usage for the entire SPIIR pipeline.
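To make the idea of summed parallel IIR filtering concrete, here is a minimal CPU reference sketch in Python. It is not the paper's GPU code. The abstract does not state the filter form, so the first-order recursion `y[k] = a1 * y[k-1] + b0 * x[k - delay]` used here is an assumption, and the function name `spiir_filter` and the `(a1, b0, delay)` tuple layout are hypothetical.

```python
def spiir_filter(x, filters):
    """Sum the outputs of independent first-order IIR filters applied to x.

    Assumed filter form (not given in the abstract):
        y[k] = a1 * y[k-1] + b0 * x[k - delay]
    `filters` is a list of (a1, b0, delay) tuples; a1 and b0 may be complex.
    """
    n = len(x)
    total = [0j] * n
    # Each filter is independent of the others, which is what makes the
    # scheme a natural fit for parallel hardware; here they run in a loop.
    for a1, b0, delay in filters:
        y_prev = 0j
        for k in range(n):
            # Samples before the start of the input are treated as zero.
            x_delayed = x[k - delay] if k >= delay else 0.0
            y_prev = a1 * y_prev + b0 * x_delayed
            total[k] += y_prev
    return total


# Usage: impulse response of two toy filters summed together.
response = spiir_filter([1.0, 0.0, 0.0, 0.0], [(0.5, 1.0, 0), (0.25, 2.0, 1)])
```

The outer loop over filters carries no dependency between iterations, while the inner loop over samples is sequential because of the recursion. A GPU version would presumably assign filters to threads and combine their outputs with a reduction, but that mapping is not shown here.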




Read also

Searches for gravitational-wave counterparts have been ongoing in earnest since GW170817 and the discovery of AT2017gfo. Since then, the lack of detection of other optical counterparts connected to binary neutron star or black hole - neutron star candidates has highlighted the need for a better discrimination criterion to support this effort. At the moment, the low-latency gravitational-wave alerts contain preliminary information about the binary properties and, hence, on whether a detected binary might have an electromagnetic counterpart. The current alert method is a classifier that estimates the probability that there is a debris disc outside the black hole created during the merger, as well as the probability of a signal being a binary neutron star, a black hole - neutron star, a binary black hole, or of terrestrial origin. In this work, we expand upon this approach to predict the ejecta properties and to provide contours of potential lightcurves for these events in order to improve follow-up observation strategy. The various sources of uncertainty are discussed, and we conclude that our ignorance about the ejecta composition and the insufficient constraint of the binary parameters by the low-latency pipelines represent the main limitations. To validate the method, we test our approach on real events from the second and third Advanced LIGO-Virgo observing runs.
We investigate the use of the particle swarm optimization (PSO) algorithm for detection of gravitational-wave signals from compact binary coalescences. We show that PSO is fast and effective in searching for gravitational-wave signals. The PSO-based aligned-spin coincident multi-detector search recovers appreciably more gravitational-wave signals: for a signal-to-noise ratio (SNR) of 10, it recovers approximately 26% more events than the template-bank searches. The PSO-based aligned-spin coincident search uses 48k matched-filtering operations and provides better parameter estimation accuracy at the detection stage than the PyCBC template-bank search in LIGO's second observation run (O2) with 400k template points. We demonstrate an effective PSO-based precessing coincident search with 320k matched-filtering operations per detector. We present results of an all-sky aligned-spin coherent search with 576k matched-filtering operations per detector, for some examples of two-, three-, and four-detector networks consisting of the LIGO detectors in Hanford and Livingston, Virgo, and KAGRA. Techniques for background estimation that are applicable to real data for PSO-based coincident and coherent searches are also presented.
NaNet is an FPGA-based PCIe X8 Gen2 NIC supporting 1/10 GbE links and the custom 34 Gbps APElink channel. The design has GPUDirect RDMA capabilities and features a network stack protocol offloading module, making it suitable for building low-latency, real-time GPU-based computing systems. We provide a detailed description of the NaNet hardware modular architecture. Benchmarks for latency and bandwidth for GbE and APElink channels are presented, followed by a performance analysis on the case study of the GPU-based low-level trigger for the RICH detector in the NA62 CERN experiment, using either the NaNet GbE or APElink channels. Finally, we give an outline of the project's future activities.
With the detection of a binary neutron star system and its corresponding electromagnetic counterparts, a new window of transient astronomy has opened. Due to the size of the error regions, which can span hundreds to thousands of square degrees, there are significant benefits to optimizing tilings for these large sky areas. The rich science promised by gravitational-wave astronomy has led to the proposal for a variety of tiling and time allocation schemes, and for the first time, we make a systematic comparison of some of these methods. We find that differences of a factor of 2 or more in efficiency are possible, depending on the algorithm employed. For this reason, for future surveys searching for electromagnetic counterparts, care should be taken when selecting tiling, time allocation, and scheduling algorithms to maximize the probability of counterpart detection.
Compact binary systems emit gravitational radiation which is potentially detectable by current Earth-bound detectors. Extracting these signals from the instruments' background noise is a complex problem, and the computational cost of most current searches depends on the complexity of the source model. Deep learning may be capable of finding signals where current algorithms hit computational limits. Here we restrict our analysis to signals from non-spinning binary black holes and systematically test different strategies by which training data is presented to the networks. To assess the impact of the training strategies, we re-analyze the first published networks and directly compare them to an equivalent matched-filter search. We find that the deep learning algorithms can generalize low signal-to-noise ratio (SNR) signals to high SNR ones but not vice versa. As such, it is not beneficial to provide high SNR signals during training, and fastest convergence is achieved when low SNR samples are provided early on. During testing we found that the networks are sometimes unable to recover any signals when a false alarm probability $<10^{-3}$ is required. We resolve this restriction by applying a modification we call unbounded Softmax replacement (USR) after training. With this alteration we find that the machine learning search retains $\geq 97.5\%$ of the sensitivity of the matched-filter search down to a false-alarm rate of 1 per month.