Today's cloud service architectures follow a one-size-fits-all deployment strategy in which the same service version instantiation is provided to the end users. However, consumers are diverse, and different applications have different accuracy and responsiveness requirements, which, as we demonstrate, renders the one-size-fits-all approach inefficient in practice. We use a production-grade speech recognition engine, which serves several thousand users, and an open-source computer-vision-based system to illustrate our point. To overcome the limitations of the one-size-fits-all approach, we recommend Tolerance Tiers, where each machine-learning-as-a-service (MLaaS) tier exposes an accuracy/responsiveness characteristic and consumers can programmatically select a tier. We evaluate our proposal on the CPU-based automatic speech recognition (ASR) engine and on cutting-edge neural networks for image classification deployed on both CPUs and GPUs. The results show that our proposed approach provides an MLaaS cloud service architecture that can be tuned by the end API user or consumer to outperform the conventional one-size-fits-all approach.
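The idea of programmatic tier selection can be illustrated with a minimal sketch. Everything here is hypothetical: the `Tier` type, the tier names, the accuracy and latency figures, and the `select_tier` helper are assumptions made for illustration, not the interface or measurements of the system described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One service tier and the accuracy/responsiveness it advertises."""
    name: str
    expected_accuracy: float    # fraction in [0, 1]; hypothetical value
    expected_latency_ms: float  # hypothetical value


# Hypothetical catalogue: faster tiers give up some accuracy.
TIERS = [
    Tier("fast", expected_accuracy=0.85, expected_latency_ms=40.0),
    Tier("balanced", expected_accuracy=0.92, expected_latency_ms=120.0),
    Tier("accurate", expected_accuracy=0.96, expected_latency_ms=400.0),
]


def select_tier(tiers, min_accuracy=0.0, max_latency_ms=float("inf")):
    """Return the lowest-latency tier that satisfies both constraints."""
    feasible = [
        t for t in tiers
        if t.expected_accuracy >= min_accuracy
        and t.expected_latency_ms <= max_latency_ms
    ]
    if not feasible:
        raise ValueError("no tier satisfies the requested constraints")
    return min(feasible, key=lambda t: t.expected_latency_ms)
```

Under these assumed figures, a consumer that needs at least 90% accuracy would be routed to the `balanced` tier, while one with no accuracy requirement would get the `fast` tier.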
As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (http://aix360.mybluemix.net/), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessibl
We provide a general framework for characterizing the trade-off between accuracy and robustness in supervised learning. We propose a method and define quantities to characterize this trade-off for a given architecture, and provide theoretical insight into it. Specifically, we introduce a simple trade-off curve and define and study an influence function that captures the sensitivity, under adversarial attack, of the optima of a given loss function. We further show how adversarial training regularizes the parameters in an over-parameterized linear model, recovering the LASSO and ridge regression as special cases, which also allows us to theoretically analyze the behavior of the trade-off curve. In experiments, we demonstrate the corresponding trade-off curves of neural networks and how they vary with respect to factors such as the number of layers and neurons, and across different network structures. Such information provides a useful guideline for architecture selection.
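The link between adversarial training and norm regularization in a linear model can be checked numerically. The sketch below assumes a squared loss and norm-bounded input perturbations, which is a standard textbook setting and not necessarily the exact formulation of the work above. In that setting the worst-case loss is $(|y - w^\top x| + \epsilon \|w\|_*)^2$, where $\|w\|_*$ is the $\ell_1$ norm for $\ell_\infty$-bounded attacks (a LASSO-like penalty) and the $\ell_2$ norm for $\ell_2$-bounded attacks (a ridge-like penalty). All function names are illustrative.

```python
import itertools

import numpy as np


def closed_form_worst_case(w, x, y, eps, norm="linf"):
    """Worst-case squared loss of a linear model under a norm-bounded attack."""
    residual = y - w @ x
    # Dual norm of w: l1 for l_inf attacks, l2 for l2 attacks.
    dual = np.sum(np.abs(w)) if norm == "linf" else np.linalg.norm(w)
    return (abs(residual) + eps * dual) ** 2


def brute_force_linf(w, x, y, eps):
    """Maximize the loss over the l_inf ball by enumerating its vertices.

    The loss is convex in the perturbation, so its maximum over the box
    is attained at one of the 2^d corners.
    """
    worst = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=len(w)):
        delta = eps * np.array(signs)
        worst = max(worst, (y - w @ (x + delta)) ** 2)
    return worst


def explicit_l2_attack(w, x, y, eps):
    """Loss at the l2 perturbation that pushes the residual away from zero."""
    residual = y - w @ x
    direction = 1.0 if residual >= 0 else -1.0
    delta = -direction * eps * w / np.linalg.norm(w)
    return (y - w @ (x + delta)) ** 2


# Small random instance (assumes w is not the zero vector).
rng = np.random.default_rng(0)
w, x = rng.normal(size=4), rng.normal(size=4)
y, eps = 0.7, 0.1
```

On this instance the brute-force maximum and the explicit attack should agree with the closed form for the $\ell_\infty$ and $\ell_2$ cases respectively, which is what makes adversarial training behave like a norm penalty on $w$.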
Context: Internal chemical mixing in intermediate- and high-mass stars represents an immense uncertainty in stellar evolution models. In addition to extending the main-sequence lifetime, chemical mixing also appreciably increases the mass of the stellar core. Several studies have made attempts to calibrate the efficiency of different convective boundary mixing mechanisms, with sometimes seemingly conflicting results. Aims: We aim to demonstrate that stellar models regularly under-predict the masses of convective stellar cores. Methods: We gather convective core mass and fractional core hydrogen content inferences from numerous independent binary and asteroseismic studies, and compare them to stellar evolution models computed with the MESA stellar evolution code. Results: We demonstrate that core mass inferences from the literature are ubiquitously more massive than predicted by stellar evolution models without or with little convective boundary mixing. Conclusions: Independent of the form of internal mixing, stellar models require an efficient mixing mechanism that produces more massive cores throughout the main sequence to reproduce high-precision observations. This has implications for the post-main-sequence evolution of all stars that have a well-developed convective core on the main sequence.
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
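The decomposition of robust error into natural error plus boundary error can be verified empirically in a simple case. The sketch below assumes a linear binary classifier $f(x) = w^\top x + b$ with labels in $\{-1, +1\}$ and $\ell_2$-bounded perturbations of radius $\epsilon$. This setting is chosen only because the worst-case margin then has the closed form $y f(x) - \epsilon \|w\|_2$. It does not implement TRADES itself, and the function name is illustrative.

```python
import numpy as np


def error_decomposition(w, b, X, y, eps):
    """Return (natural, boundary, robust) error rates of a linear classifier.

    natural:  the point itself is misclassified, y * f(x) <= 0.
    boundary: the point is correctly classified but lies within eps
              (l2 distance) of the decision boundary f(x) = 0.
    robust:   some point in the eps-ball around x is misclassified.
    """
    margin = y * (X @ w + b)             # y * f(x) for every sample
    slack = eps * np.linalg.norm(w)      # largest margin loss an attack can cause
    natural = margin <= 0
    boundary = (margin > 0) & (margin <= slack)
    robust = margin <= slack             # worst-case margin is margin - slack
    n = len(y)
    return natural.sum() / n, boundary.sum() / n, robust.sum() / n


# Synthetic data with noisy labels so that all three error types occur.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = np.where(X @ w_true + 0.3 * rng.normal(size=1000) >= 0, 1.0, -1.0)
nat, bdy, rob = error_decomposition(w_true, 0.0, X, y, eps=0.2)
```

Because the natural and boundary events are disjoint and together make up the robust event, `nat + bdy` equals `rob` on any dataset in this setting.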
Exploratory testing (ET) is a powerful and efficient way of testing software by integrating design, execution, and analysis of tests during a testing session. ET is often contrasted with scripted testing and seen as a black-and-white choice. We posit that there are different levels of exploratory testing, from fully exploratory to fully scripted, and propose a scale for the degree of exploration for ET. The degree is defined through levels of ET, which correspond to the way test charters are formulated. We have evaluated the classification through focus groups at four companies and identified factors that influence the level of exploratory testing. The results show that the proposed ET levels have distinguishing characteristics and that the levels can be used as a guide to structure test charters. Our study also indicates that applying a combination of ET levels can be beneficial in achieving effective testing.