ترغب بنشر مسار تعليمي؟ اضغط هنا

Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics

61   0   0.0 ( 0 )
 نشر من قبل Wesley Darvin
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This work provides the first study to explore the interaction of update propagation with and without fine-grained synchronization (push vs. pull), emerging coherence protocols (GPU vs. DeNovo coherence), and software-centric consistency models (DRF0, DRF1, and DRFrlx) for graph workloads on emerging integrated GPU-CPU systems with native unified shared memory. We study 6 graph applications with 6 graph inputs for a total of 36 workloads running on 12 system (hardware+software) configurations reflecting the above design space of update propagation, coherence, and memory consistency. We make three key contributions. First, we show that there is no single best system configuration for all workloads, motivating systems with flexible coherence and consistency support. Second, we develop a model to accurately predict the best system configuration -- this model can be used by software designers to decide on push vs. pull and the consistency model and by flexible hardware to invoke the appropriate coherence and consistency configuration for the given workload. Third, we show that the design dimensions explored here are inter-dependent, reinforcing the need for software-hardware co-design in the above design dimensions. For example, software designers deciding on push vs. pull must consider the consistency model supported by hardware -- in some cases, push maybe better if hardware supports DRFrlx while pull may be better if hardware does not support DRFrlx.



قيم البحث

اقرأ أيضاً

We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state.We investigate the applicability of this push- pull dichotomy to various algorithms and its impact on complexity, performance, and the amount of used locks, atomics, and reads/writes. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The conducted analysis illustrates surprising differences between push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed up by performance data from hardware counters.We use these findings to illustrate which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively-parallel shared-memory machines as well as distributed-memory systems.
The trade-off between pull-based and push-based graph processing engines is well-understood. On one hand, pull-based engines can achieve higher throughput because their workloads are read-dominant, rather than write-dominant, and can proceed without synchronization between threads. On the other hand, push-based engines are much better able to take advantage of the frontier optimization, which leverages the fact that often only a small subset of the graph needs to be accessed to complete an iteration of a graph processing application. Hybrid engines attempt to overcome this trade-off by dynamically switching between push and pull, but there are two key disadvantages with this approach. First, applications must be implemented twice (once for push and once for pull), and second, processing throughput is reduced for iterations that run with push. We propose a radically different solution: rebuild the frontier optimization entirely such that it is well-suited for a pull-based engine. In doing so, we remove the only advantage that a push-based engine had over a pull-based engine, making it possible to eliminate the push-based engine entirely. We introduce Wedge, a pull-only graph processing framework that transforms the traditional source-oriented vertex-based frontier into a pull-friendly format called the Wedge Frontier. The transformation itself is expensive even when parallelized, so we introduce two key optimizations to make it practical. First, we perform the transformation only when the resulting Wedge Frontier is sufficiently sparse. Second, we coarsen the granularity of the representation of elements in the Wedge Frontier. These optimizations respectively improve Wedges performance by up to 5x and 2x, enabling it to outperform Grazelle, Ligra, and GraphMat respectively by up to 2.8x, 4.9x, and 185.5x.
Ajax applications are designed to have high user interactivity and low user-perceived latency. Real-time dynamic web data such as news headlines, stock tickers, and auction updates need to be propagated to the users as soon as possible. However, Ajax still suffers from the limitations of the Webs request/response architecture which prevents servers from pushing real-time dynamic web data. Such applications usually use a pull style to obtain the latest updates, where the client actively requests the changes based on a predefined interval. It is possible to overcome this limitation by adopting a push style of interaction where the server broadcasts data when a change occurs on the server side. Both these options have their own trade-offs. This paper explores the fundamental limits of browser-based applications and analyzes push solutions for Ajax technology. It also shows the results of an empirical study comparing push and pull.
Quantum optimal control involves setting up an objective function that evaluates the quality of an operator representing the realized process w.r.t. the target process. Here we propose a stronger objective function which incorporates not only the tar get operator but also a set of its orthogonal operators. We find significantly superior convergence of optimization routines with the combined influences of all the operators. We refer to this method as the $textit{push-pull}$ optimization. In particular, we describe adopting the push-pull optimization to a gradient based approach and a variational-principle based approach. We carry out extensive numerical simulations of the push-pull optimization of quantum controls on a pair of Ising coupled qubits. Finally, we demonstrate its experimental application by preparing a long-lived singlet-order in a two-qubit system using NMR techniques.
A classical result of Schubert calculus is an inductive description of Schubert cycles using divided difference (or push-pull) operators in Chow rings. We define convex geometric analogs of push-pull operators and describe their applications to the t heory of Newton-Okounkov convex bodies. Convex geometric push-pull operators yield an inductive construction of Newton-Okounkov polytopes of Bott-Samelson varieties. In particular, we construct a Minkowski sum of Feigin-Fourier-Littelmann-Vinberg polytopes using convex geometric push-pull operators in type A.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا