
A data relocation approach for terrain surface analysis on multi-GPU systems: a case study on the total viewshed problem

Published by: Andres Jesus Sanchez Fernandez
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Digital Elevation Models (DEMs) are important datasets for modelling the line of sight, such as radio signals, sound waves and human vision. These are commonly analyzed using rotational sweep algorithms. However, such algorithms require large numbers of memory accesses to 2D arrays which, despite being regular, result in poor data locality in memory. Here, we propose a new methodology called skewed Digital Elevation Model (sDEM), which substantially improves the locality of memory accesses and increases the inherent parallelism involved in the computation of rotational sweep-based algorithms. In particular, sDEM applies a data restructuring technique before accessing the memory and performing the computation. To demonstrate the high efficiency of sDEM, we use the problem of total viewshed computation as a case study, considering different implementations for single-core, multi-core, single-GPU and multi-GPU platforms. We conducted two experiments to compare sDEM with (i) the most commonly used geographic information systems (GIS) software and (ii) the state-of-the-art algorithm. In the first experiment, sDEM is on average 8.8x faster than current GIS software, even though these tools, owing to their limitations, can only consider a small number of points. In the second experiment, sDEM is 827.3x faster than the state-of-the-art algorithm in the best case.
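
To make the data-relocation idea more concrete, here is a minimal NumPy sketch of the general skewing concept: for one sweep direction, each row of the grid is shifted so that samples lying on the same sweep line end up in the same column, and a transpose then makes each sweep line contiguous in memory. The function name, the wrap-around handling and the angle convention are illustrative assumptions, not the paper's actual sDEM layout.

import numpy as np

# Illustrative sketch only: relocate DEM samples so that, for one sweep
# direction, points along each sweep line become contiguous in memory.
# Border wrap-around and near-vertical angles are ignored for brevity.
def skew_for_sweep(dem, angle_deg):
    n_rows, _ = dem.shape
    slope = np.tan(np.radians(angle_deg))        # columns advanced per row
    skewed = np.empty_like(dem)
    for r in range(n_rows):
        # shift row r so the sample at (r, c0 + r*slope) moves back to column c0
        skewed[r] = np.roll(dem[r], -int(round(r * slope)))
    # after the transpose, each former sweep line is a contiguous row
    return np.ascontiguousarray(skewed.T)

dem = np.random.rand(2048, 2048).astype(np.float32)   # synthetic terrain
sweep_lines = skew_for_sweep(dem, 30.0)                # rows now follow the 30-degree sweep
profile = sweep_lines[0]                               # sequential access along one sweep line

On a GPU, the same kind of relocation lets consecutive threads read consecutive addresses (coalesced accesses), which is the locality benefit the abstract refers to.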




Read also

A. H. Hassan, C. J. Fluke, 2011
Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing the data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single-machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with the goal of providing data analysis and visualization tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure, handling both batched and real-time data analysis and visualization tasks. Offering such functionality in a software-as-a-service manner will reduce the total cost of ownership, provide an easy-to-use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
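
As a rough illustration of the batched, out-of-core style of processing such a framework automates (a generic sketch, not the framework's actual API; the file path, dtype and chunk size are placeholders), even a global minimum/maximum over a dataset larger than memory reduces to per-chunk reductions followed by a combine step:

import numpy as np

# Generic out-of-core reduction sketch; 'cube.bin' and the chunk size are placeholders.
def chunked_min_max(path, dtype=np.float32, chunk_elems=50_000_000):
    data = np.memmap(path, dtype=dtype, mode="r")   # never loads the whole file into RAM
    gmin, gmax = np.inf, -np.inf
    for start in range(0, data.size, chunk_elems):
        chunk = data[start:start + chunk_elems]
        gmin = min(gmin, float(chunk.min()))
        gmax = max(gmax, float(chunk.max()))
    return gmin, gmax
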
Motivated by scheduling in geo-distributed data analysis, we propose a target location problem for multi-commodity flow (LoMuF for short). Given commodities to be sent from their sources, LoMuF aims at locating their targets so that the multi-commodity flow is optimized in some sense. LoMuF is a combination of two fundamental problems, namely the facility location problem and the network flow problem. We study the hardness and algorithmic issues of the problem in various settings. The findings lie in three aspects. First, a series of NP-hardness and APX-hardness results are obtained, uncovering the inherent difficulty of solving this problem. Second, we propose an approximation algorithm for general undirected networks and an exact algorithm for undirected trees, which naturally induce efficient approximation algorithms on directed networks. Third, we observe separations between directed and undirected networks, indicating that imposing directions on edges makes the problem strictly harder. These results show the richness of the problem and pave the way to further studies.
We introduce a framework for statistical estimation that leverages knowledge of how samples are collected but makes no distributional assumptions on the data values. Specifically, we consider a population of elements $[n]=\{1,\ldots,n\}$ with corresponding data values $x_1,\ldots,x_n$. We observe the values for a sample set $A \subset [n]$ and wish to estimate some statistic of the values for a target set $B \subset [n]$, where $B$ could be the entire set. Crucially, we assume that the sets $A$ and $B$ are drawn according to some known distribution $P$ over pairs of subsets of $[n]$. A given estimation algorithm is evaluated based on its worst-case expected error, where the expectation is with respect to the distribution $P$ from which the sample set $A$ and target set $B$ are drawn, and the worst case is with respect to the data values $x_1,\ldots,x_n$. Within this framework, we give an efficient algorithm for estimating the target mean that returns a weighted combination of the sample values--where the weights are functions of the distribution $P$ and the sample and target sets $A$, $B$--and show that the worst-case expected error achieved by this algorithm is at most a multiplicative $\pi/2$ factor worse than the optimal such algorithm. The algorithm and proof leverage a surprising connection to the Grothendieck problem. This framework, which makes no distributional assumptions on the data values but rather relies on knowledge of the data collection process, is a significant departure from typical estimation and introduces a uniform algorithmic analysis for the many natural settings where membership in a sample may be correlated with data values, such as when sampling probabilities vary as in importance sampling, when individuals are recruited into a sample via a social network as in snowball sampling, or when samples have chronological structure as in selective prediction.
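
For concreteness, the evaluation criterion and estimator class described above can be written as follows (this formalization assumes squared error and data values normalized to $[-1,1]$; the paper's exact conventions may differ):

$$
\mathrm{err}(f) \;=\; \max_{x \in [-1,1]^n}\; \mathbb{E}_{(A,B)\sim P}\!\left[\left(f(A, x_A) \,-\, \frac{1}{|B|}\sum_{i \in B} x_i\right)^{2}\right],
\qquad
f(A, x_A) \;=\; \sum_{i \in A} w_i(P, A, B)\, x_i,
$$

where the weights $w_i$ may depend on $P$ and on the realized sets $A$ and $B$, but not on the hidden data values.
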
Frank Winter, 2012
Over the past years GPUs have been successfully applied to the task of inverting the fermion matrix in lattice QCD calculations. Even strong scaling to capability-level supercomputers, corresponding to O(100) GPUs or more, has been achieved. However, strong scaling a whole gauge field generation algorithm to this regime requires significantly more functionality than just having the matrix inverter utilize the GPUs, and has not yet been accomplished. This contribution extends QDP-JIT, the migration of SciDAC QDP++ to GPU-enabled parallel systems, to help strong-scale the whole Hybrid Monte Carlo algorithm to this regime. Initial results are shown for gauge field generation with Chroma simulating pure Wilson fermions on OLCF TitanDev.
Given a graph, the sparsest cut problem asks for a subset of vertices whose edge expansion (the normalized cut given by the subset) is minimized. In this paper, we study a generalization of this problem that seeks $k$ disjoint subsets of vertices (clusters) such that all of their edge expansions are small and, furthermore, the number of vertices remaining outside the subsets (outliers) is also small. We prove that although this problem is $NP$-hard for trees, it can be solved in polynomial time on all weighted trees, provided that we restrict the search space to subsets that induce connected subgraphs. The proposed algorithm is based on dynamic programming and runs in $O(k^2 n^3)$ time in the worst case, where $n$ is the number of vertices and $k$ is the number of clusters. It also runs in linear time when the number of clusters and the number of outliers are bounded by a constant.
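
For reference, one common way to write the edge expansion that the sparsest cut objective minimizes is the following (the paper may use a slightly different normalization, e.g. weighted degrees instead of vertex counts):

$$
\phi(S) \;=\; \frac{w\big(E(S, V \setminus S)\big)}{\min\{|S|,\, |V \setminus S|\}},
\qquad \emptyset \neq S \subsetneq V,
$$

where $w(E(S, V \setminus S))$ is the total weight of the edges crossing between $S$ and $V \setminus S$.
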