Since the discovery of superconductivity in MgB2 (Tc ~ 39 K), the search for superconductivity in related materials with similar structures or ingredients has never stopped. Although about 100 binary borides have been explored, only a few of them show superconductivity, with relatively low Tc. In this work, we report the discovery of superconductivity up to 32 K in MoB2 under pressure, which is the highest Tc among transition-metal borides. Although the Tc can be well explained by theoretical calculations in the framework of electron-phonon coupling, the d-electrons and phonon modes of the transition-metal Mo atoms play vitally important roles in the emergence of superconductivity in MoB2, distinctly different from the case of the well-known MgB2. Our study sheds light on the exploration of high-Tc superconductors among transition-metal borides.
Attention is an effective mechanism for improving the capability of deep models. Squeeze-and-Excite (SE) introduces a light-weight attention branch to enhance the network's representational power. The attention branch is gated using the Sigmoid function and multiplied by the trunk branch's feature maps. This makes it overly sensitive when coordinating and balancing the contributions of the trunk and attention branches. To control the attention branch's influence, we propose a new attention method, called Shift-and-Balance (SB). Unlike Squeeze-and-Excite, the attention branch is regulated by a learned control factor to control the balance, and then added to the trunk branch's feature maps. Experiments show that Shift-and-Balance attention significantly improves accuracy compared to Squeeze-and-Excite when applied in more layers, adding more size and capacity to the network. Moreover, Shift-and-Balance attention achieves better or comparable accuracy compared to the state-of-the-art Dynamic Convolution.
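The contrast between the two mechanisms can be illustrated with a toy per-channel sketch. This is not the paper's implementation: the function names, the scalar control factor, and the flat per-channel representation are illustrative assumptions; in practice both branches operate on feature-map tensors inside a network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_attention(trunk, gate_logits):
    """SE-style gating (sketch): a Sigmoid-gated attention value
    multiplies each channel of the trunk branch."""
    return [t * sigmoid(g) for t, g in zip(trunk, gate_logits)]

def sb_attention(trunk, shift, control=0.5):
    """SB-style gating (sketch): a learned control factor scales the
    attention branch, which is then *added* to the trunk branch."""
    return [t + control * s for t, s in zip(trunk, shift)]

features = [1.0, -2.0, 3.0]
print(se_attention(features, [0.0, 0.0, 0.0]))  # sigmoid(0)=0.5 halves every channel
print(sb_attention(features, [0.2, 0.2, 0.2]))  # each channel shifted by 0.5*0.2
```

Note how a zero-logit SE gate still halves the trunk (multiplicative coupling), whereas a small SB shift perturbs it only slightly (additive coupling) — the sensitivity difference the abstract describes.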
Thanks to increasing amounts of data and compute resources, deep learning has achieved many successes in various domains. The application of deep learning on mobile and embedded devices is attracting more and more attention, and benchmarking and ranking the AI abilities of mobile and embedded devices has become an urgent problem to be solved. Considering model diversity and framework diversity, we propose a benchmark suite, AIoTBench, which focuses on evaluating the inference abilities of mobile and embedded devices. AIoTBench covers three typical heavy-weight networks: ResNet50, InceptionV3, and DenseNet121, as well as three light-weight networks: SqueezeNet, MobileNetV2, and MnasNet. Each network is implemented in three frameworks designed for mobile and embedded devices: Tensorflow Lite, Caffe2, and Pytorch Mobile. To compare and rank the AI capabilities of the devices, we propose two unified metrics as AI scores: Valid Images Per Second (VIPS) and Valid FLOPs Per Second (VOPS). Currently, we have compared and ranked 5 mobile devices using our benchmark. This list will be extended and updated regularly.
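The exact metric definitions are specified in the AIoTBench work itself; as one hedged reading, a "valid" throughput discounts raw throughput by model accuracy, so a fast but inaccurate model scores lower. The formulas and the example numbers below are assumptions for illustration only.

```python
def vips(images_per_second, accuracy):
    """Valid Images Per Second (assumed form): raw inference throughput
    discounted by the model's top-1 accuracy."""
    return images_per_second * accuracy

def vops(flops_per_image, images_per_second, accuracy):
    """Valid FLOPs Per Second (assumed form): useful compute throughput,
    i.e. model cost per image times the valid image rate."""
    return flops_per_image * vips(images_per_second, accuracy)

# Hypothetical device running a ResNet50-class model (~4.1 GFLOPs/image)
print(vips(50.0, 0.76))         # 38.0 valid images/s
print(vops(4.1e9, 50.0, 0.76))  # 155800000000.0 valid FLOPs/s
```

Under this reading, a device score is comparable across networks because both the accuracy and the per-image compute cost are folded into one number.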
Earlier-stage evaluations of a new AI architecture/system need affordable benchmarks. Using only a few AI component benchmarks like MLPerf alone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf-life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover, to the most considerable extent, the factor space that impacts the learning dynamics. After performing an exhaustive survey of Internet service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (RPR subset) and workload characterization (WC subset), we keep the two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerf Training (v0.7) in terms of the diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory access patterns, and hotspot functions; (2) against the full AIBench benchmarks, the RPR subset shortens the benchmarking cost by 64% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with the optimized TensorFlow framework performs better than GPUs, while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage https://www.benchcouncil.org/aibench-training/index.html.
Domain-specific software and hardware co-design is encouraging, as it is much easier to achieve efficiency for fewer tasks. Agile domain-specific benchmarking speeds up the process, as it provides not only relevant design inputs but also relevant metrics and tools. Unfortunately, modern workloads like Big Data, AI, and Internet services dwarf traditional ones in terms of code size, deployment scale, and execution path, and hence raise serious benchmarking challenges. This paper proposes an agile domain-specific benchmarking methodology. Together with seventeen industry partners, we identify ten important end-to-end application scenarios, from which sixteen representative AI tasks are distilled as AI component benchmarks. We propose permutations of essential AI and non-AI component benchmarks as end-to-end benchmarks. An end-to-end benchmark is a distillation of the essential attributes of an industry-scale application. We design and implement a highly extensible, configurable, and flexible benchmark framework, on the basis of which we propose a guideline for building end-to-end benchmarks and present the first end-to-end Internet service AI benchmark. The preliminary evaluation shows the value of our benchmark suite, AIBench, against MLPerf and TailBench for hardware and software designers, micro-architectural researchers, and code developers. The specifications, source code, testbed, and results are publicly available from http://www.benchcouncil.org/AIBench/index.html.
The transition temperature (Tc) between the normal and superconducting states usually exhibits a dramatic increase or decrease with increasing applied pressure. Here we present, in contrast, a new kind of superconductor that exhibits the exotic feature that Tc is robust against the large volume shrinkage induced by applied pressure (we name these RSAVS superconductors). Extraordinarily, the Tc in these materials stays almost constant over a large pressure range, e.g. over 136 GPa in the (TaNb)0.67(HfZrTi)0.33 high-entropy alloy and 141 GPa in the commercial NbTi alloy. We show that the RSAVS behavior also exists in another high-entropy alloy, (ScZrNbTa)0.6(RhPd)0.4, and in superconducting elemental Ta and Nb, indicating that this behavior, which has never previously been identified or predicted by theory, occurs universally in some conventional superconductors. Our electronic structure calculations indicate that although the electronic density of states (DOS) at the Fermi level in the RSAVS state is dominated by electrons from the degenerate dxy, dxz and dyz orbitals, these electrons decrease in influence with increasing pressure. In contrast, the contribution of the degenerate dx2-y2 and dz2 orbital electrons remains almost unchanged at the Fermi level, suggesting that these are the electrons that may play a crucial role in stabilizing the Tc in the RSAVS state.
This paper outlines BenchCouncil's view on the challenges, rules, and vision of benchmarking modern workloads like Big Data, AI or machine learning, and Internet services. We summarize the challenges of benchmarking modern workloads as FIDSS (Fragmented, Isolated, Dynamic, Service-based, and Stochastic), and propose the PRDAERS benchmarking rules: benchmarks should be specified in a paper-and-pencil manner, relevant, diverse, containing different levels of abstraction, specifying the evaluation metrics and methodology, repeatable, and scalable. We believe that proposing simple but elegant abstractions that help achieve both efficiency and general-purpose applicability is the final target of benchmarking in the future, though it may not be a pressing one. In the light of this vision, we briefly discuss BenchCouncil's related projects.
In this paper we study second-order master equations arising from mean field games with common noise over an arbitrary time duration. A classical solution typically requires the monotonicity condition (or a small time duration) and sufficiently smooth data. While keeping the monotonicity condition, our goal is to relax the regularity of the data, which is an open problem in the literature. In particular, we do not require any differentiability in terms of the measures, which prevents us from obtaining classical solutions. We propose three weaker notions of solutions, named good solutions, weak solutions, and viscosity solutions, respectively, and establish the well-posedness of the master equation under all three notions. We emphasize that, due to the game nature, one cannot expect a comparison principle even for classical solutions. The key to the global (in time) well-posedness is a uniform a priori estimate for the Lipschitz continuity of the solution in the measures. The monotonicity condition is crucial for this uniform estimate, and thus for the existence of the global solution, but it is not needed for uniqueness. To facilitate our analysis, we construct a smooth mollifier for functions on Wasserstein space, which is new in the literature and interesting in its own right. As an important application of our results, we prove the convergence of the Nash system, a high-dimensional system of PDEs arising from the corresponding N-player game, under mild regularity requirements. We also prove a propagation of chaos property for the associated optimal trajectories.
For the architecture community, reasonable simulation time is a strong requirement in addition to performance data accuracy. However, emerging big data and AI workloads are too large at the binary size level and prohibitively expensive to run on cycle-accurate simulators. The concept of a data motif, identified as a class of units of computation performed on initial or intermediate data, is the first step towards building proxy benchmarks that mimic real-world big data and AI workloads. However, there is no practical way to construct a proxy benchmark based on data motifs to support simulation-based research. In this paper, we embark on a study to bridge the gap between data motifs and a practical proxy benchmark. We propose a data motif-based proxy benchmark generating methodology using machine learning, which combines data motifs with different weights to mimic the big data and AI workloads. Furthermore, we implement various data motifs using light-weight stacks and apply the methodology to five real-world workloads to construct a suite of proxy benchmarks, considering the data types, patterns, and distributions. The evaluation results show that our proxy benchmarks shorten the execution time by hundreds of times on real systems while keeping the average system and micro-architecture performance data accuracy above 90%, even when changing the input data sets or cluster configurations. Moreover, the generated proxy benchmarks reflect consistent performance trends across different architectures. To facilitate the community, we will release the proxy benchmarks on the project homepage http://prof.ict.ac.cn/BigDataBench.
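The core idea of combining motif kernels with learned weights can be sketched as follows. This is a minimal illustration, not the paper's generator: the motif kernels are tiny stand-ins for the real light-weight implementations, and the per-step weighted sampling scheme is an assumption.

```python
import random

# Hypothetical stand-ins for three of the light-weight motif kernels.
def matrix_motif(n):
    """Matrix motif stand-in: n x n matrix multiplication."""
    a = [[1.0] * n for _ in range(n)]
    return [[sum(a[i][k] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sort_motif(n):
    """Sort motif stand-in: sort a reversed sequence."""
    return sorted(range(n, 0, -1))

def statistic_motif(n):
    """Statistic motif stand-in: mean of a sequence."""
    return sum(range(n)) / n

def run_proxy(weights, n=32, steps=100, seed=0):
    """Run motif kernels in proportion to the learned weights (assumed
    scheme): each step picks one motif with probability equal to its weight."""
    rng = random.Random(seed)
    motifs = {"matrix": matrix_motif, "sort": sort_motif, "statistic": statistic_motif}
    counts = {name: 0 for name in motifs}
    names, probs = zip(*weights.items())
    for _ in range(steps):
        name = rng.choices(names, weights=probs)[0]
        motifs[name](n)
        counts[name] += 1
    return counts

# A proxy whose time is split roughly 50/30/20 across the three motifs.
print(run_proxy({"matrix": 0.5, "sort": 0.3, "statistic": 0.2}))
```

In the real methodology the weights are fitted by machine learning so the mix reproduces the target workload's system and micro-architecture behavior; here they are fixed by hand for clarity.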
The complexity and diversity of big data and AI workloads make understanding them difficult and challenging. This paper proposes a new approach to modelling and characterizing big data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs. Each class of unit of computation captures the common requirements while being reasonably divorced from individual implementations, and hence we call it a data motif. For the first time, among a wide variety of big data and AI workloads, we identify eight data motifs that take up most of the run time of those workloads: Matrix, Sampling, Logic, Transform, Set, Graph, Sort, and Statistic. We implement the eight data motifs on different software stacks as the micro benchmarks of an open-source big data and AI benchmark suite, BigDataBench 4.0 (publicly available from http://prof.ict.ac.cn/BigDataBench), and perform a comprehensive characterization of those data motifs from the perspective of data sizes, types, sources, and patterns, as a lens towards fully understanding big data and AI workloads. We believe the eight data motifs are promising abstractions and tools not only for big data and AI benchmarking, but also for domain-specific hardware and software co-design.
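The "workload as a pipeline of motifs" view can be made concrete with a small sketch. The stage functions and the example decomposition below are hypothetical illustrations of the abstraction, not BigDataBench code.

```python
from functools import reduce

def transform(data):
    """Transform motif stand-in: element-wise scaling."""
    return [2 * x for x in data]

def sort(data):
    """Sort motif stand-in: order the elements."""
    return sorted(data)

def statistic(data):
    """Statistic motif stand-in: reduce to a summary value."""
    return [sum(data) / len(data)]

def run_pipeline(data, stages):
    """A workload modeled as a pipeline of motif units applied in order,
    each consuming the previous stage's intermediate data."""
    return reduce(lambda d, stage: stage(d), stages, data)

# A hypothetical workload decomposed into three motif stages.
print(run_pipeline([3.0, 1.0, 2.0], [transform, sort, statistic]))  # [4.0]
```

The point of the abstraction is that each stage captures a class of computation independent of any one implementation, so the same pipeline description can be mapped onto different software stacks.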