Energy and power consumption are major limitations to the continued scaling of computing systems. Inexactness, where the quality of the solution can be traded for energy savings, has been proposed as an approach to overcoming those limitations. In the past, however, inexactness required highly customized or specialized hardware. The current evolution of commercial off-the-shelf (COTS) processors facilitates the use of lower-precision arithmetic in ways that reduce energy consumption. We study these new opportunities in this paper, using the example of an inexact Newton algorithm for solving nonlinear equations. Moreover, we have begun developing a set of techniques we call reinvestment that, paradoxically, use reduced precision to improve the quality of the computed result: they do so by reinvesting the energy saved by reduced precision.
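As a rough illustration of the idea (a minimal sketch under assumed interfaces, not the authors' implementation), one way to make a Newton iteration "inexact" on commodity hardware is to perform the expensive linear solve in single precision while the residual that decides convergence stays in double precision:

import numpy as np

def inexact_newton(f, jac, x0, tol=1e-10, max_iter=50):
    # Illustrative sketch only: the Jacobian solve, typically the dominant
    # cost, is done in float32, while the residual check that decides
    # convergence is kept in float64.
    x = np.asarray(x0, dtype=np.float64)
    for _ in range(max_iter):
        r = f(x)                                  # full-precision residual
        if np.linalg.norm(r) < tol:
            break
        a32 = jac(x).astype(np.float32)           # reduced-precision Jacobian
        b32 = r.astype(np.float32)
        dx = np.linalg.solve(a32, b32)            # cheaper low-precision solve
        x = x - dx.astype(np.float64)
    return x

# Hypothetical 2x2 test system: x^2 - 2 = 0 and y - x = 0.
f = lambda v: np.array([v[0]**2 - 2.0, v[1] - v[0]])
jac = lambda v: np.array([[2.0 * v[0], 0.0], [-1.0, 1.0]])
print(inexact_newton(f, jac, [1.0, 1.0]))

Because the error introduced by the single-precision step shrinks in proportion to the step itself, the iteration still converges to a double-precision-quality answer, which is the kind of trade-off the reinvestment idea exploits.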
The inability of Moore's Law and other figures of merit (FOMs) to accurately explain the technology development of the semiconductor industry demands a holistic metric to guide the industry. Here we introduce a FOM termed CLEAR that accurately postdicts technology developments from the 1940s until today, and predicts photonics as a logical extension to keep up the pace of information-handling machines. We show that CLEAR (Capability-to-Latency-Energy-Amount-Resistance) is multi-hierarchical, applying to the device, interconnect, and system levels. Being a holistic FOM, we show that empirical trends such as Moore's Law and Makimoto's Wave are special cases of the universal CLEAR metric. Looking ahead, photonic board- and chip-level technologies are able to continue the observed doubling of the CLEAR value every 12 months, while electronic technologies are unable to keep pace.
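The abstract spells out the acronym but not the metric's exact functional form; purely as an illustrative reading (an assumption, not the paper's definition), the five quantities can be thought of as combining into a capability-per-cost ratio of the form

    CLEAR ∼ Capability / (Latency × Energy × Amount × Resistance),

so that a technology scores higher the more capability it delivers per unit latency, energy, amount, and resistance; the precise weighting of the individual terms is defined in the paper itself.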
Scaling transistors dimensions has been the thrust for the semiconductor industry in the last 4 decades. However, scaling channel lengths beyond 10 nm has become exceptionally challenging due to the direct tunneling between source and drain which degrades gate control, switching functionality, and worsens power dissipation. Fortunately, the emergence of novel classes of materials with exotic properties in recent times has opened up new avenues in device design. Here, we show that by using channel materials with an anisotropic effective mass, the channel can be scaled down to 1nm and still provide an excellent switching performance in both MOSFETs and TFETs. In the case of TFETs, a novel design has been proposed to take advantage of anisotropic mass in both ON- and OFF-state of the TFETs. Full-band atomistic quantum transport simulations of phosphorene nanoribbon MOSFETs and TFETs based on the new design have been performed as a proof.
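A one-line estimate clarifies why the effective mass along the transport direction matters (this is the generic textbook WKB expression, with symbols chosen here for illustration rather than taken from the paper): the direct source-to-drain tunneling probability through a barrier of height Φ_B and length L falls off roughly as

    T ≈ exp( −2 L √(2 m* Φ_B) / ħ ),

so a material that presents a heavy effective mass m* along the channel suppresses this leakage exponentially even at 1 nm channel lengths, while a light mass along another crystal direction can be exploited where strong tunneling is actually wanted, as in the ON-state of a TFET.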
This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain-specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators, and optimizers at different levels of abstraction and across application domains, hardware targets, and execution environments. The contributions of this work include (1) a discussion of MLIR as a research artifact, built for extension and evolution, identifying the challenges and opportunities posed by this novel design point in design, semantics, optimization specification, system, and engineering; and (2) an evaluation of MLIR as a generalized infrastructure that reduces the cost of building compilers, describing diverse use cases to show research and educational opportunities for future programming languages, compilers, execution environments, and computer architecture. The paper also presents the rationale for MLIR, its original design principles, structures, and semantics.
General-purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce the risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute-force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.
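As a purely illustrative sketch of that brute-force philosophy (not code from the paper; the problem, kernel, and use of the third-party pyopencl package are assumptions), an O(N^2) nearest-neighbour search can be written directly as an OpenCL kernel and launched from Python, letting the GPU's raw throughput stand in for a cleverer algorithm:

import numpy as np
import pyopencl as cl

n = 2048
rng = np.random.default_rng(0)
x, y, z = (rng.random(n).astype(np.float32) for _ in range(3))

ctx = cl.create_some_context()            # any available OpenCL device
queue = cl.CommandQueue(ctx)

kernel_src = """
__kernel void min_sep(__global const float *x, __global const float *y,
                      __global const float *z, __global float *out, const int n)
{
    int i = get_global_id(0);
    float best = INFINITY;
    for (int j = 0; j < n; ++j) {          /* brute force: test every pair */
        if (j == i) continue;
        float dx = x[i] - x[j];
        float dy = y[i] - y[j];
        float dz = z[i] - z[j];
        best = fmin(best, sqrt(dx * dx + dy * dy + dz * dz));
    }
    out[i] = best;
}
"""
prg = cl.Program(ctx, kernel_src).build()

mf = cl.mem_flags
bufs = [cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a) for a in (x, y, z)]
out = np.empty(n, dtype=np.float32)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

prg.min_sep(queue, (n,), None, *bufs, out_buf, np.int32(n))
cl.enqueue_copy(queue, out, out_buf)
print(out[:5])                            # nearest-neighbour distance per particle

Because the kernel is portable OpenCL C, the same source runs on GPUs from different vendors, which is one of the risk-reduction arguments the paper makes for the open standard.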
Electronic performance predictions of modern nanotransistors require nonequilibrium Green's functions including incoherent scattering on phonons, as well as the inclusion of random alloy disorder and surface roughness effects. Solving for all these effects is numerically extremely expensive and has to be done on the world's largest supercomputers due to the large memory requirement and the high performance demands on the communication network between the compute nodes. In this work, it is shown that NEMO5 covers all required physical effects and their combination. Furthermore, it is also shown that NEMO5's implementation of the algorithm scales very well up to about 178,176 CPUs with a sustained performance of about 857 TFLOPS. Therefore, NEMO5 is ready to simulate future nanotransistors.