ﻻ يوجد ملخص باللغة العربية
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented.
SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algor
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line pro
Persistent homology is a topological feature used in a variety of applications such as generating features for data analysis and penalizing optimization problems. We develop an approach to accelerate persistent homology computations performed on many
Tensors (also commonly seen as multi-linear operators or as multi-dimensional arrays) are ubiquitous in scientific computing and in data science, and so are the software efforts for tensor operations. Particularly in recent years, we have observed an
We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight v