To operate efficiently across a wide range of workloads with varying power requirements, a modern processor applies different current management mechanisms, which briefly throttle instruction execution while they adjust voltage and frequency to accommodate power-hungry instructions (PHIs) in the instruction stream. Doing so 1) reduces the power consumption of non-PHI instructions in typical workloads and 2) optimizes the cost and area of the system's voltage regulators for the common use case while limiting current consumption when executing PHIs. However, these mechanisms may compromise a system's confidentiality guarantees. In particular, we observe that the multi-level side effects of PHI-related throttling mechanisms can be detected by two different software contexts (i.e., a sender and a receiver) running on 1) the same hardware thread, 2) co-located Simultaneous Multi-Threading (SMT) threads, and 3) different physical cores. Based on these new observations on current management mechanisms, we develop a new set of covert channels, IChannels, and demonstrate them on real modern Intel processors (which span more than 70% of the entire client and server processor market). Our analysis shows that IChannels provides more than 24x the channel capacity of state-of-the-art power management covert channels. We propose practical and effective mitigations to each covert channel in IChannels by leveraging the insights we gain through a rigorous characterization of real systems.
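As a hedged illustration of the sender/receiver scheme described above, and not the IChannels implementation itself, the following Python sketch encodes bits by alternating between heavy vectorized arithmetic (which tends to draw large currents and trigger PHI-related throttling) and idling, while a receiver in another context measures how much probe work it can complete per bit period; the bit period, array sizes, and thresholding are hypothetical, and sender and receiver would run as separate, externally synchronized processes.

```python
# Hypothetical sketch of a throttling-based covert channel (not the IChannels code).
# Assumes numpy's vectorized kernels exercise the wide (e.g., AVX) execution units,
# and that sender and receiver run in separate, externally synchronized processes.
import time
import numpy as np

BIT_PERIOD = 0.05                                   # seconds per bit (assumed)
HEAVY = np.random.rand(1 << 20).astype(np.float32)  # sender's power-hungry workload

def send(bits):
    """Sender: heavy vector work encodes '1', idling encodes '0'."""
    for b in bits:
        t_end = time.perf_counter() + BIT_PERIOD
        while time.perf_counter() < t_end:
            if b:
                np.multiply(HEAVY, HEAVY)           # power-hungry vector instructions
            # else: spin without vector work, drawing little current

def receive(n_bits):
    """Receiver: count probe iterations per bit period; throttling lowers the count."""
    probe = np.random.rand(1 << 14).astype(np.float32)
    counts = []
    for _ in range(n_bits):
        t_end = time.perf_counter() + BIT_PERIOD
        n = 0
        while time.perf_counter() < t_end:
            np.multiply(probe, probe)
            n += 1
        counts.append(n)
    threshold = sum(counts) / len(counts)           # naive threshold (assumed)
    return [1 if c < threshold else 0 for c in counts]
```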
The Channel State Information (CSI) of WiFi systems captures the channel response of the environment between the transmitter and the receiver, so people and objects in between, as well as their movement, can be sensed. To obtain CSI, the receiver performs channel estimation based on the known training field of the transmitted WiFi signal. CSI-based technology is useful in many scenarios, but it also raises privacy and security concerns. In this paper, we open-source a CSI fuzzer to enhance the privacy and security of WiFi CSI applications. It is built into the transmitter of openwifi, an open-source full-stack WiFi chip design, to prevent unauthorized sensing without sacrificing WiFi link performance. The CSI fuzzer imposes an artificial channel response on the signal before it is transmitted, so the CSI observed by the receiver reflects the actual channel response combined with the artificial response. Only an authorized receiver that knows the artificial response can recover the actual channel response and perform CSI sensing. Another potential application of the CSI fuzzer is covert channels based on a set of pre-defined artificial response patterns. Our work resolves the difficulty of implementing this anti-sensing idea on commercial off-the-shelf WiFi devices.
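As a minimal numerical sketch of this idea, under the assumption of a flat known training symbol and a noiseless frequency-domain channel (not the openwifi implementation), the following Python snippet shows how an artificial per-subcarrier response hides the actual channel from an unauthorized receiver, while an authorized receiver divides it back out; the subcarrier count and channel model are placeholders.

```python
# Hypothetical frequency-domain sketch of CSI fuzzing (not the openwifi code).
import numpy as np

N_SC = 64                                              # OFDM subcarriers (assumed)
rng = np.random.default_rng(0)

# True over-the-air channel and unit-gain artificial (fuzzing) response.
h_actual = (rng.normal(size=N_SC) + 1j * rng.normal(size=N_SC)) / np.sqrt(2)
h_fuzz = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, N_SC))

training = np.ones(N_SC, dtype=complex)                # pre-known training field (assumed flat)

tx = training * h_fuzz                                 # transmitter applies artificial response
rx = tx * h_actual                                     # propagation through actual channel (no noise)

csi_unauthorized = rx / training                       # sees h_actual * h_fuzz only
csi_authorized = csi_unauthorized / h_fuzz             # knows h_fuzz, recovers h_actual

assert np.allclose(csi_authorized, h_actual)
```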
The Internet of Things (IoT) paradigm is expected to bring ubiquitous intelligence through new applications that enhance living and other environments. Several research and standardization efforts currently focus on the Middleware level of the underlying communication system. At this level, several challenges need to be addressed, among them the Quality of Service (QoS) issue. The Autonomic Computing paradigm is now recognized as a promising approach for helping communication and other systems self-adapt when the context changes. With the aim of promoting the vision of autonomic Middleware-level QoS management for IoT-based systems, this paper proposes a set of QoS-oriented mechanisms that can be dynamically executed at the Middleware level to correct QoS degradation. The benefits of the proposed mechanisms are also illustrated for a concrete Enhanced Living Environment case.
Scientific computing sometimes involves computation on sensitive data. Depending on the data and the execution environment, the HPC (high-performance computing) user or data provider may require confidentiality and/or integrity guarantees. To study the applicability of hardware-based trusted execution environments (TEEs) for secure scientific computing, we deeply analyze the performance impact of AMD SEV and Intel SGX for diverse HPC benchmarks, including traditional scientific computing, machine learning, graph analytics, and emerging scientific computing workloads. We observe three main findings: 1) SEV requires careful memory placement on large-scale NUMA machines (1×–3.4× slowdown without, and 1×–1.15× slowdown with, NUMA-aware placement), 2) virtualization, a prerequisite for SEV, results in performance degradation for workloads with irregular memory accesses and large working sets (1×–4× slowdown compared to native execution for graph applications), and 3) SGX is inappropriate for HPC given its limited secure memory size and inflexible programming model (1.2×–126× slowdown over insecure execution). Finally, we discuss forthcoming new TEE designs and their potential impact on scientific computing.
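As a generic illustration of the NUMA-aware placement that the SEV finding points to, rather than the paper's own methodology, one might pin a benchmark's CPUs and memory to a single node with numactl; the node number and benchmark command below are placeholders.

```python
# Hypothetical launcher that binds a benchmark to one NUMA node via numactl.
import subprocess

NODE = 0                        # placeholder NUMA node
BENCH = ["./my_benchmark"]      # placeholder benchmark command

subprocess.run(
    ["numactl", f"--cpunodebind={NODE}", f"--membind={NODE}", *BENCH],
    check=True,                 # raise if the benchmark exits with an error
)
```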
A model for a new electron vortex beam production method is proposed and experimentally demonstrated. The technique relies on the controlled manipulation of the degrees of freedom of the lens aberrations to achieve a helical phase front. These degrees of freedom are accessible through the corrector lenses of a transmission electron microscope. The vortex beam is produced through a particular alignment of these lenses into a specifically designed astigmatic state and the application of an annular aperture in the condenser plane. Experimental results are found to be in good agreement with simulations.
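For reference, the helical phase front targeted by such a method has the standard vortex-beam form (a textbook expression, not taken from this abstract), with $\ell$ the integer topological charge and $\phi$ the azimuthal angle:

$$\psi(r,\phi) \;\propto\; A(r)\, e^{i\ell\phi}.$$

In the method described above, the designed astigmatic state of the corrector lenses is intended to approximate this azimuthal phase ramp over the annular aperture in the condenser plane.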
Energy proportionality is the key design goal followed by architects of modern multicore CPUs. One of its implications is that optimization of an application for performance also optimizes it for energy. In this work, we show that energy proportionality does not hold true for multicore CPUs. This finding creates the opportunity for bi-objective optimization of applications for performance and energy. We propose and study the first application-level method for bi-objective optimization of multithreaded data-parallel applications for performance and energy. The method uses two decision variables: the number of identical multithreaded kernels (threadgroups) executing the application and the number of threads in each threadgroup, with the workload always partitioned equally between the threadgroups. We experimentally demonstrate the efficiency of the method using four highly optimized multithreaded data-parallel applications: 2D fast Fourier transform based on FFTW and Intel MKL, and dense matrix-matrix multiplication using OpenBLAS and Intel MKL. Four modern multicore CPUs are used in the experiments. The experiments show that optimization for performance alone increases dynamic energy consumption by up to 89%, while optimization for dynamic energy alone degrades performance by up to 49%. By solving the bi-objective optimization problem, the method determines up to 11 globally Pareto-optimal solutions. Finally, we propose a qualitative dynamic energy model employing performance monitoring counters as parameters, which we use to explain the discovered energy nonproportionality and the Pareto-optimal solutions determined by our method. The model shows that the energy nonproportionality in our case is due to the activity of the data translation lookaside buffer (dTLB), which is disproportionately energy-expensive.
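As a hedged sketch of the bi-objective search space described above, not the authors' tool, the following Python snippet enumerates configurations of the two decision variables (number of threadgroups, threads per threadgroup), with the workload split equally across threadgroups, and keeps the globally Pareto-optimal (execution time, dynamic energy) points; the measurement function is a placeholder supplied by the user.

```python
# Hypothetical Pareto-front filter over (threadgroups, threads-per-group) configurations.
# measure() is a placeholder that runs the application with a given configuration and
# returns its execution time and dynamic energy consumption.
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int]          # (num_threadgroups, threads_per_group)
Point = Tuple[float, float]       # (time_seconds, dynamic_energy_joules)

def pareto_front(results: Dict[Config, Point]) -> List[Config]:
    """Keep configurations not strictly dominated in both time and energy."""
    front = []
    for cfg, (t, e) in results.items():
        dominated = any(
            (t2 <= t and e2 <= e) and (t2 < t or e2 < e)
            for c2, (t2, e2) in results.items() if c2 != cfg
        )
        if not dominated:
            front.append(cfg)
    return front

def sweep(measure: Callable[[Config], Point],
          max_groups: int, threads_per_group: List[int]) -> List[Config]:
    """Enumerate configurations; the workload is assumed to be split equally among threadgroups."""
    results = {(g, t): measure((g, t))
               for g in range(1, max_groups + 1)
               for t in threads_per_group}
    return pareto_front(results)
```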