
Optimized Password Recovery for Encrypted RAR on GPUs

Published by Xiaojing An
Publication date: 2015
Research field: Informatics Engineering
Research language: English





RAR uses SHA-1 hashing together with the classic symmetric encryption algorithm AES, and the only method of password recovery is brute force, which is very time-consuming. In this paper, we present an approach that uses GPUs to speed up the password recovery process. Because the dominant and most time-consuming computation, SHA-1 hashing, is hard to parallelize internally, this paper adopts coarse-grained parallelism: one GPU thread is responsible for validating one password. We apply three main optimizations to this parallel version: asynchronous execution overlapping the CPU and GPU, reduction of redundant calculations and conditional statements, and optimized register usage. Experimental results show that the final version achieves a 43x to 57x speedup on an AMD FirePro W8000 GPU compared to a well-optimized serial version on an Intel Core i5 CPU.
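
A minimal sketch of the coarse-grained scheme described above: one GPU thread validates one candidate password. It is written in CUDA for brevity (the paper targets an AMD FirePro GPU, so its actual implementation would use a different toolchain such as OpenCL), and a toy FNV-1a hash stands in for the real SHA-1 key derivation and AES check; the kernel name, buffer layout, and constants are illustrative assumptions, not the paper's code.

// One-thread-per-password sketch; the toy hash is a placeholder for the
// expensive SHA-1/AES validation the paper describes.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

#define PW_LEN 6          // fixed candidate length, for the sketch only

__device__ __host__ uint64_t toy_hash(const char *pw, int len) {
    // Stand-in for SHA-1 key derivation + AES block check (FNV-1a).
    uint64_t h = 1469598103934665603ULL;
    for (int i = 0; i < len; ++i) {
        h ^= (uint64_t)(unsigned char)pw[i];
        h *= 1099511628211ULL;
    }
    return h;
}

__global__ void check_passwords(const char *candidates, int n,
                                uint64_t target, int *found_idx) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per password
    if (tid >= n) return;
    const char *pw = candidates + (size_t)tid * PW_LEN;
    if (toy_hash(pw, PW_LEN) == target)
        atomicExch(found_idx, tid);                    // report the matching candidate
}

int main() {
    const int n = 3;
    const char host_pw[n][PW_LEN + 1] = {"aaaaaa", "secret", "zzzzzz"};
    char flat[n * PW_LEN];                             // pack candidates, PW_LEN bytes each
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < PW_LEN; ++j) flat[i * PW_LEN + j] = host_pw[i][j];

    char *d_pw; int *d_found; int found = -1;
    cudaMalloc(&d_pw, sizeof(flat));
    cudaMalloc(&d_found, sizeof(int));
    cudaMemcpy(d_pw, flat, sizeof(flat), cudaMemcpyHostToDevice);
    cudaMemcpy(d_found, &found, sizeof(int), cudaMemcpyHostToDevice);

    // In practice the target comes from the encrypted archive; here it is the
    // toy hash of "secret".
    uint64_t target = toy_hash("secret", PW_LEN);

    check_passwords<<<(n + 255) / 256, 256>>>(d_pw, n, target, d_found);
    cudaMemcpy(&found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
    printf("matched candidate index: %d\n", found);
    cudaFree(d_pw); cudaFree(d_found);
    return 0;
}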




Read also

Yanhao Chen, 2019
A priority queue, often implemented as a heap, is an abstract data type used in many well-known applications such as Dijkstra's shortest path algorithm, Prim's minimum spanning tree, Huffman encoding, and the branch-and-bound algorithm. However, it is challenging to exploit the parallelism of the heap on GPUs since control divergence and memory irregularity must be taken into account. In this paper, we present a parallel generalized heap model that works effectively on GPUs. We also prove the linearizability of our generalized heap model, which enables us to reason about the expected results. We evaluate our concurrent heap thoroughly and show a maximum 19.49X speedup compared to the sequential CPU implementation and a 2.11X speedup compared with the existing GPU implementation. We also apply our heap to single-source shortest path with up to 1.23X speedup and to the 0/1 knapsack problem with up to 12.19X speedup.
Counting k-cliques in a graph is an important problem in graph analysis with many applications. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. An important optimization is to eliminate search tree branches that discover the same clique redundantly. Eliminating redundant clique discovery is typically done via graph orientation or pivoting. Parallel implementations for both of these approaches have demonstrated promising performance on CPUs. In this paper, we present our GPU implementations of k-clique counting for both the graph orientation and pivoting approaches. Our implementations explore both vertex-centric and edge-centric parallelization schemes, and replace recursive search tree traversal with iterative traversal based on an explicitly-managed shared stack. We also apply various optimizations to reduce memory consumption and improve the utilization of parallel execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation by a geometric mean speedup of 12.39x, 6.21x, and 18.99x for k = 4, 7, and 10, respectively. We also evaluate the impact of the choice of parallelization scheme and the incremental speedup of each optimization. Our code will be open-sourced to enable further research on parallelizing k-clique counting on GPUs.
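
As a rough illustration of the orientation idea mentioned above (not the paper's code), the hedged CUDA sketch below counts triangles (k = 3) on a tiny hard-coded graph: orienting every edge from the lower to the higher vertex id ensures each triangle is discovered exactly once, from its smallest vertex. The graph data, kernel name, and launch parameters are assumptions made for illustration.

// Vertex-centric triangle counting over an oriented CSR graph.
#include <cstdio>
#include <cuda_runtime.h>

// Oriented CSR of the 4-vertex graph {0-1, 0-2, 0-3, 1-2, 2-3}:
// each vertex keeps only neighbours with a larger id.
__constant__ int row_ptr[5] = {0, 3, 4, 5, 5};
__constant__ int col_idx[5] = {1, 2, 3, 2, 3};

__global__ void count_triangles(int num_vertices, unsigned long long *total) {
    int u = blockIdx.x * blockDim.x + threadIdx.x;   // vertex-centric scheme
    if (u >= num_vertices) return;
    unsigned long long local = 0;
    for (int i = row_ptr[u]; i < row_ptr[u + 1]; ++i) {
        int v = col_idx[i];
        // Every common out-neighbour of u and v closes a triangle {u, v, w}.
        for (int j = i + 1; j < row_ptr[u + 1]; ++j) {
            int w = col_idx[j];
            for (int k = row_ptr[v]; k < row_ptr[v + 1]; ++k)
                if (col_idx[k] == w) { ++local; break; }
        }
    }
    if (local) atomicAdd(total, local);
}

int main() {
    unsigned long long *d_total, total = 0;
    cudaMalloc(&d_total, sizeof(total));
    cudaMemcpy(d_total, &total, sizeof(total), cudaMemcpyHostToDevice);
    count_triangles<<<1, 32>>>(4, d_total);
    cudaMemcpy(&total, d_total, sizeof(total), cudaMemcpyDeviceToHost);
    printf("triangles: %llu\n", total);              // the graph above has 2 triangles
    cudaFree(d_total);
    return 0;
}
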
In this new version of ZMCintegral, we have added the functionality of multi-function integration, i.e. the ability to integrate more than $10^{3}$ different functions on GPUs. The Python API remains similar to the previous version.
Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of the proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of the thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA and related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Among our implementations, we achieve twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.
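
To make the stencil discussion concrete, here is a hedged, self-contained CUDA sketch of a radius-4 (8th-order) 1D stencil, far simpler than the 25-point seismic kernel the abstract refers to; it only illustrates the basic shape of a high-order stencil and the separate logic that boundary points require. The coefficients, grid size, and boundary treatment are illustrative assumptions.

// Radius-4 1D stencil with a trivial boundary branch (copy-through).
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 4

__constant__ float c[RADIUS + 1] = {-2.847f, 1.6f, -0.2f, 0.0254f, -0.0018f};

__global__ void stencil_1d(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i < RADIUS || i >= n - RADIUS) {   // boundary points need different logic
        out[i] = in[i];
        return;
    }
    float acc = c[0] * in[i];              // interior: symmetric high-order sum
    for (int r = 1; r <= RADIUS; ++r)
        acc += c[r] * (in[i - r] + in[i + r]);
    out[i] = acc;
}

int main() {
    const int n = 1 << 16;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));
    stencil_1d<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaDeviceSynchronize();
    printf("stencil applied to %d points\n", n);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
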
Load-balancing among the threads of a GPU for graph analytics workloads is difficult because of the irregular nature of graph applications and the high variability in vertex degrees, particularly in power-law graphs. We describe a novel load-balancing scheme to address this problem. Our scheme is implemented in the IrGL compiler to allow users to generate efficient load-balanced code for a GPU from high-level sequential programs. We evaluated several graph analytics applications on up to 16 distributed GPUs using IrGL to compile the code and the Gluon substrate for inter-GPU communication. Our experiments show that this scheme can achieve an average speedup of 2.2x on inputs that suffer from severe load imbalance problems when previous state-of-the-art load-balancing schemes are used.