An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs


الملخص بالإنكليزية

Complex applications running on multicore processors show a rich performance phenomenology. The growing number of cores per ccNUMA domain complicates performance analysis of memory-bound code since system noise, load imbalance, or task-based programming models can lead to thread desynchronization. Hence, the simplifying assumption that all cores execute the same loop can not be upheld. Motivated by observations on plain and modifi

تحميل البحث