Deep learning (DL) workloads are moving towards accelerators for faster processing and lower cost. Modern DL accelerators are good at handling the large-scale multiply-accumulate operations that dominate DL workloads; however, it is challenging to make full use of an accelerator's compute power because the data must be properly staged in a software-managed scratchpad memory, and failing to do so can cause significant performance loss. This paper proposes a systematic approach that leverages the polyhedral model to analyze all operators of a DL model together and minimize the number of memory accesses. Experiments show that our approach can substantially reduce the memory-access overhead of common neural-network models on Inferentia, AWS's custom machine-learning inference chip, which is available through Amazon EC2 Inf1 instances.
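To make the cost of poor data staging concrete, the following is a minimal back-of-the-envelope sketch, not the paper's polyhedral analysis: it estimates off-chip traffic for a matmul followed by an elementwise activation, comparing a schedule where the intermediate tensor round-trips through DRAM against one where each output tile is consumed while still resident in the scratchpad. The shapes, element size, and scratchpad capacity are illustrative assumptions, not Inferentia's actual parameters.

# Minimal sketch (illustrative only): DRAM traffic for a two-operator chain
# (matmul followed by an elementwise activation) with and without keeping the
# intermediate tensor in an on-chip scratchpad. All sizes are assumptions.

BYTES = 2                      # assume 16-bit elements
M, K, N = 1024, 1024, 1024     # hypothetical matmul shape: (M,K) @ (K,N)
SCRATCHPAD_BYTES = 24 * 2**20  # assumed on-chip scratchpad capacity

def unfused_traffic():
    """Matmul writes its full output to DRAM; the activation reads it back."""
    matmul_io = (M * K + K * N + M * N) * BYTES   # read A, B; write C
    act_io = (M * N + M * N) * BYTES              # read C; write relu(C)
    return matmul_io + act_io

def fused_traffic(tile_m=128, tile_n=128):
    """The activation is applied to each output tile while it is still in the
    scratchpad, so the intermediate tensor never touches DRAM."""
    tile_bytes = tile_m * tile_n * BYTES
    assert tile_bytes <= SCRATCHPAD_BYTES, "tile must fit in the scratchpad"
    return (M * K + K * N + M * N) * BYTES        # read A, B; write relu(C)

print(f"unfused DRAM traffic: {unfused_traffic() / 2**20:.1f} MiB")
print(f"fused DRAM traffic:   {fused_traffic() / 2**20:.1f} MiB")

In this toy setting the staged schedule cuts off-chip traffic from 10 MiB to 6 MiB; the paper's contribution is to derive such schedules automatically across all operators of a model using the polyhedral model, rather than hand-picking them per operator pair.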