Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). The out-of-core approach makes it possible to run large-scale stencil codes whose data exceed the limited capacity of GPU memory. However, the performance of GPU-based out-of-core stencil computation is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer, but the study of on-the-fly compression techniques is far from sufficient. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that resolves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining that overlaps CPU/GPU data transfer with GPU computation. Experimental results show that the proposed method achieved a speedup of 1.2x compared with the method without compression. Moreover, although the precision loss introduced by compression increased with the number of time steps, it remained trivial up to 4,320 time steps, demonstrating the usefulness of the proposed method.
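The block decomposition with halo overlap and on-the-fly compression described above can be illustrated with a CPU-only analogue. This is a hypothetical sketch, not the paper's implementation: zlib stands in for the GPU compression library, the "device transfer" is an in-process round-trip, and the block/halo sizes are arbitrary.

```python
# Illustrative sketch: out-of-core stencil with per-block compression.
# zlib is a stand-in for a GPU compression library; sizes are made up.
import zlib
import numpy as np

RADIUS = 1          # stencil radius: halo rows shared by contiguous blocks
BLOCK = 256         # interior rows per decomposed block

def stencil_step(field):
    """3-point averaging stencil along axis 0; boundary rows are copied."""
    out = field.copy()
    out[1:-1] = (field[:-2] + field[1:-1] + field[2:]) / 3.0
    return out

def run_out_of_core(data):
    n = data.shape[0]
    result = np.empty_like(data)
    for start in range(0, n, BLOCK):
        stop = min(start + BLOCK, n)
        # Extend the block by RADIUS halo rows on each side, so each block
        # can be compressed and computed independently of its neighbours.
        lo, hi = max(0, start - RADIUS), min(n, stop + RADIUS)
        payload = zlib.compress(data[lo:hi].tobytes())      # "host -> device"
        block = np.frombuffer(zlib.decompress(payload),
                              dtype=data.dtype).reshape(hi - lo, -1)
        updated = stencil_step(block)
        # Write back only the interior rows this block owns.
        result[start:stop] = updated[start - lo:start - lo + (stop - start)]
    return result
```

Because each block carries its neighbours' halo rows, the blocked result matches a whole-array stencil sweep exactly; a real pipeline would additionally overlap the compressed transfers with GPU computation on other blocks.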
Service reliability is one of the key challenges that cloud providers have to deal with. In cloud systems, unplanned service failures may cause severe cascading impacts on their dependent services, deteriorating customer satisfaction. Predicting the cascading impacts accurately and efficiently is critical to the operation and maintenance of cloud systems. Existing approaches identify whether one service depends on another via distributed tracing, but no prior work has focused on determining to what extent one cloud service depends on another. In this paper, we survey the outages and the procedure for failure diagnosis in two cloud providers to motivate the definition of the intensity of dependency. We define the intensity of dependency between two services as how much the status of the callee service influences the caller service. Then we propose AID, the first approach to predict the intensity of dependencies between cloud services. AID first generates a set of candidate dependency pairs from the spans. AID then represents the status of each cloud service with a multivariate time series aggregated from the spans. With the representation of services, AID calculates the similarities between the statuses of the caller and the callee of each candidate pair. Finally, AID aggregates the similarities to produce a unified value as the intensity of the dependency. We evaluate AID on the data collected from an open-source microservice benchmark and a cloud system in production. The experimental results show that AID can efficiently and accurately predict the intensity of dependencies. We further demonstrate the usefulness of our method in a large-scale commercial cloud system.
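The pipeline the abstract outlines (per-service multivariate time series, per-metric similarity between caller and callee, aggregation into one intensity value) can be sketched in a few lines. Pearson correlation and a plain mean are stand-in choices here, not necessarily the similarity and aggregation functions AID actually uses:

```python
# Toy sketch of caller/callee status-similarity aggregation.
# Metric names and the similarity/aggregation choices are illustrative.
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length 1-D series (0 if degenerate)."""
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float((x * y).sum() / denom) if denom else 0.0

def dependency_intensity(caller, callee):
    """caller/callee: dict mapping metric name -> 1-D time series.
    Averages absolute per-metric similarities over shared metrics."""
    sims = [abs(pearson(caller[m], callee[m])) for m in caller if m in callee]
    return sum(sims) / len(sims) if sims else 0.0
```

A strongly coupled pair (callee metrics tracking the caller's) scores near 1, while unrelated series score near 0, matching the intuition of "intensity" in the abstract.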
Optically trapped mixed-species single atom arrays with arbitrary geometries are an attractive and promising platform for various applications, because tunable quantum systems with multiple components provide extra degrees of freedom for experimental control. Here, we report the first demonstration of a two-dimensional $6\times4$ dual-species atom assembly with a filling fraction of 0.88 (0.89) for $^{85}$Rb ($^{87}$Rb) atoms. This mixed-species atomic assembly is achieved by rearranging initially randomly distributed atoms using a sorting algorithm (the heuristic heteronuclear algorithm) proposed for bottom-up atom assembly with both user-defined geometries and two-species atom number ratios. Our fully tunable hybrid-atom system, with its scalability advantages, is a good starting point for high-fidelity quantum logic, many-body quantum simulation and the formation of defect-free single-molecule arrays.
Demographic bias is a significant challenge in practical face recognition systems. Existing methods heavily rely on accurate demographic annotations. However, such annotations are usually unavailable in real scenarios. Moreover, these methods are typically designed for a specific demographic group and are not general enough. In this paper, we propose a false positive rate penalty loss, which mitigates face recognition bias by increasing the consistency of the instance False Positive Rate (FPR). Specifically, we first define the instance FPR as the ratio between the number of non-target similarities above a unified threshold and the total number of non-target similarities. The unified threshold is estimated for a given total FPR. Then, an additional penalty term, proportional to the ratio of the instance FPR to the overall FPR, is introduced into the denominator of the softmax-based loss. The larger the instance FPR, the larger the penalty. Through such unequal penalties, the instance FPRs are encouraged to be consistent. Compared with previous debiasing methods, our method requires no demographic annotations. Thus, it can mitigate the bias among demographic groups divided by various attributes, and these attributes need not be predefined before training. Extensive experimental results on popular benchmarks demonstrate the superiority of our method over state-of-the-art competitors. Code and trained models are available at https://github.com/Tencent/TFace.
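The instance-FPR statistic defined above can be written out numerically: the unified threshold is the quantile of all non-target similarities that yields the chosen total FPR, and each instance's FPR is its fraction of non-target similarities above that threshold. The wiring of the penalty ratio into the softmax denominator is omitted in this sketch:

```python
# Numeric sketch of the instance-FPR definition from the abstract.
# The penalty used in the loss is proportional to instance_fpr / total_fpr.
import numpy as np

def unified_threshold(all_non_target_sims, total_fpr):
    """Threshold tau such that a fraction `total_fpr` of all
    non-target similarities in the batch/dataset exceed it."""
    return float(np.quantile(np.asarray(all_non_target_sims), 1.0 - total_fpr))

def instance_fpr(non_target_sims, tau):
    """Fraction of one instance's non-target similarities above tau."""
    s = np.asarray(non_target_sims)
    return float((s > tau).mean())
```

An instance whose non-target similarities frequently exceed the unified threshold gets an instance FPR well above the total FPR, and hence (per the abstract) a larger penalty in the loss denominator.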
Yanbang Chu, Le Liu, Cheng Shen (2021)
Twisted double bilayer graphene (TDBG) is an electric-field-tunable moire system, exhibiting electron correlated states and related temperature-linear (T-linear) resistivity. The displacement field provides a new knob to in-situ tune the relative strength of electron interactions in TDBG, yielding not only a rich phase diagram but also the ability to investigate each phase individually. Here, we report a study of the carrier density (n), displacement field (D) and twist angle dependence of T-linear resistivity in TDBG. For a large twist angle of 1.5 degrees, where correlated insulating states are absent, we observe a T-linear resistivity (on the order of 10 Ohm per K) over a wide range of carrier density, and its slope decreases with increasing n before reaching the van Hove singularity, in agreement with an acoustic phonon scattering model. The slope of the T-linear resistivity depends non-monotonically on displacement field, with a single-peak structure closely connected to the single-particle van Hove singularity (vHS) in TDBG. For an optimal twist angle of ~1.23 degrees in the presence of correlated states, the slope of the T-linear resistivity is found to be maximal at the boundary of the correlated halo regime (on the order of 100 Ohm per K), resulting in an M-shaped displacement field dependence. This observation is beyond the phonon scattering model in the single-particle picture and instead suggests strange metal behavior. We interpret it as the result of a symmetry-breaking instability developed at quantum critical points where the electron degeneracy changes. Our results demonstrate that TDBG is an ideal system for studying the interplay between phonons and quantum criticality, and might help map out the evolution of the order parameters of the ground states.
Spherical signals exist in many applications, e.g., planetary data, LiDAR scans and digitization of 3D objects, calling for models that can process spherical data effectively. Simply projecting spherical data onto the 2D plane and then applying planar convolutional neural networks (CNNs) does not perform well, because of the distortion introduced by projection and the resulting loss of translation equivariance. In fact, good principles for designing spherical CNNs are to avoid distortion and to convert the shift-equivariance property of planar CNNs into rotation equivariance in the spherical domain. In this work, we use partial differential operators (PDOs) to design a spherical equivariant CNN, PDO-e$\text{S}^\text{2}$CNN, which is exactly rotation equivariant in the continuous domain. We then discretize PDO-e$\text{S}^\text{2}$CNNs and analyze the equivariance error resulting from discretization. This is the first time that the equivariance error has been theoretically analyzed in the spherical domain. In experiments, PDO-e$\text{S}^\text{2}$CNNs show greater parameter efficiency and significantly outperform other spherical CNNs on several tasks.
Cheng Shen, Wanli Xue (2021)
While the Internet of Things (IoT) can benefit from machine learning by outsourcing model training to the cloud, exposing user data to an untrusted cloud service provider can pose a threat to user privacy. Recently, federated learning has been proposed as an approach to privacy-preserving machine learning (PPML) for the IoT, but its practicability remains unclear. This work presents an evaluation of the efficiency and privacy performance of a readily available federated learning framework based on PySyft, a Python library for distributed deep learning. We observe that the training speed of the framework is significantly slower than that of the centralized approach due to communication overhead. Meanwhile, the framework bears some vulnerability to potential man-in-the-middle attacks at the network level. The report serves as a starting point for PPML performance analysis and suggests future directions for PPML framework development.
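The federated training pattern being evaluated (clients update a model on private data; only weights travel to the server, which averages them) can be sketched without the PySyft API. This minimal NumPy version illustrates the round structure only; PySyft wraps the same idea behind remote tensor pointers, which is where the communication overhead noted above arises:

```python
# Minimal federated-averaging sketch (illustration, not the PySyft API).
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=20):
    """Plain gradient descent on one client's private (X, y) for MSE loss."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    """Each client trains locally; only weights return to the server,
    which averages them into the next global model."""
    local_ws = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    return np.mean(local_ws, axis=0)
```

Raw data never leaves a client, but every round moves a full copy of the weights per client in each direction, which is exactly the communication cost that makes such frameworks slower than centralized training.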
In this paper, we propose a direct Eulerian generalized Riemann problem (GRP) scheme for a blood flow model in arteries. It is an extension of the Eulerian GRP scheme developed by Ben-Artzi et al. in J. Comput. Phys., 218 (2006). By using the Riemann invariants, we diagonalize the blood flow system into a weakly coupled system, which is used to resolve rarefaction waves. We also use the Rankine-Hugoniot condition to resolve the local GRP formulation. We pay special attention to the acoustic case as well as the sonic case. The extension to the two-dimensional case is carefully obtained by using the dimensional splitting technique. We verify that the derived GRP scheme is second-order accurate.
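For reference, a commonly used one-dimensional blood flow model of this kind and its Riemann invariants, which make the diagonalization mentioned above possible, are sketched below. This is the standard Formaggia-Sherwin form; the paper's exact pressure closure and notation may differ.

```latex
% 1-D blood flow in a compliant artery: cross-sectional area A(x,t),
% axial velocity u(x,t), blood density \rho, reference area A_0.
\begin{align*}
  &A_t + (Au)_x = 0, \qquad
  u_t + \Big(\frac{u^2}{2} + \frac{p(A)}{\rho}\Big)_x = 0, \qquad
  p(A) = p_{\mathrm{ext}} + \beta\big(\sqrt{A} - \sqrt{A_0}\big), \\[4pt]
  &\text{wave speed } c = \sqrt{\frac{\beta}{2\rho}}\, A^{1/4}, \qquad
  \text{Riemann invariants } W_{\pm} = u \pm 4\sqrt{\frac{\beta}{2\rho}}\, A^{1/4},
\end{align*}
```

The invariants $W_{\pm}$ are constant along the characteristics $dx/dt = u \pm c$, which is what turns the system into the weakly coupled form used to resolve rarefaction waves.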
Sorting atoms stochastically loaded into optical tweezer arrays via an auxiliary mobile tweezer is an efficient approach to preparing intermediate-scale defect-free atom arrays in arbitrary geometries. However, the high filling fraction of atom-by-atom assemblers is impeded by redundant sorting moves with imperfect atom transport, especially when scaling the system to larger atom numbers. Here, we propose a new sorting algorithm (heuristic cluster algorithm, HCA) which provides a near-minimal number of moves in our tailored atom assembler scheme, and experimentally demonstrate a $5\times6$ defect-free atom array with a 98.4(7)$\%$ filling fraction for one rearrangement cycle. The feature of HCA that the number of moves $N_{m}\approx N$ ($N$ is the number of defect sites to be filled) keeps the filling fraction uniform as the size of the atom assembler is enlarged. Our method is essential for scaling to hundreds of assembled atoms for bottom-up quantum computation, quantum simulation and precision measurement.
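HCA itself is the paper's contribution and is not reproduced here. As a point of comparison, a standard baseline for such rearrangement is a matching of loaded traps to target sites that minimizes total tweezer travel; the brute-force version below is a hypothetical illustration feasible only for small arrays, and it minimizes distance rather than the number of moves that HCA targets:

```python
# Baseline illustration: optimal loaded-trap -> target-site matching by
# exhaustive search (small arrays only; HCA instead aims at fewest moves).
from itertools import permutations
import numpy as np

def plan_moves(loaded, targets):
    """loaded: (k, 2) occupied-trap coords; targets: (m, 2) goal coords, m <= k.
    Returns (source, destination) moves; already-filled sites need no move."""
    loaded, targets = np.asarray(loaded), np.asarray(targets)
    m = len(targets)
    best, best_cost = None, np.inf
    for perm in permutations(range(len(loaded)), m):
        cost = sum(np.linalg.norm(loaded[p] - targets[i])
                   for i, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(tuple(loaded[p]), tuple(targets[i]))
            for i, p in enumerate(best)
            if not np.array_equal(loaded[p], targets[i])]
```

Skipping already-filled sites is what keeps the move count near the number of defect sites $N$; a practical assembler replaces the exhaustive search with a heuristic such as HCA.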
Neural networks have achieved remarkable successes in machine learning tasks. This has recently been extended to graph learning using neural networks. However, there is limited theoretical work on understanding how and when they perform well, especially relative to established statistical learning techniques such as spectral embedding. In this short paper, we present a simple generative model where the unsupervised graph convolutional network fails while the adjacency spectral embedding succeeds. Specifically, the unsupervised graph convolutional network is unable to look beyond the first eigenvector in certain approximately regular graphs, thus missing inference signals in non-leading eigenvectors. The phenomenon is demonstrated by visual illustrations and comprehensive simulations.
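The adjacency spectral embedding that the abstract contrasts with the GCN takes only a few lines: embed each node by the top-d scaled eigenvectors of the adjacency matrix, which retains exactly the non-leading eigenvector signal the abstract says the GCN misses. The graph below is an illustrative two-block example, not the paper's generative model:

```python
# Minimal adjacency spectral embedding (ASE) sketch.
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Rows of U |Lambda|^{1/2} for the d largest-magnitude eigenpairs
    of the symmetric adjacency (or expected-adjacency) matrix A."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:d]
    return vecs[:, order] * np.sqrt(np.abs(vals[order]))
```

On a two-block graph with equal expected degrees, the leading eigenvector is constant across nodes and carries no community signal; it is the second column of the embedding that separates the blocks, which is why a method confined to the first eigenvector can fail where ASE succeeds.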