Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization

Published by Paulito Palmes

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Paulito P. Palmes - Akihiro Kishimoto - Radu Marinescu

Machine Learning, Artificial Intelligence

Abstract

The pipeline optimization problem in machine learning requires simultaneous optimization of pipeline structures and parameter adaptation of their elements. An elegant way to express these structures reduces the complexity of managing and analyzing their performance across different choices of optimization strategy. With these issues in mind, we created the AutoMLPipeline (AMLP) toolkit, which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use the datamined features to speed up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP which, in less than 5 minutes of AMLP computation time, outperforms other AutoML approaches that are given a 4-hour time budget.
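To illustrate what "pipeline structures using simple expressions" can look like, here is a minimal Python sketch. It is not the AMLP API: the `Step` class, the `>>` (sequential) and `+` (feature concatenation) operators, and the toy transforms `scale`, `square`, and `total` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

Row = List[float]


@dataclass(frozen=True)
class Step:
    """A named pipeline element wrapping a row-wise transform (hypothetical)."""
    name: str
    fn: Callable[[Row], Row]

    def __rshift__(self, other: "Step") -> "Step":
        # a >> b : feed the output of a into b (sequential composition)
        return Step(f"({self.name} >> {other.name})",
                    lambda row: other.fn(self.fn(row)))

    def __add__(self, other: "Step") -> "Step":
        # a + b : run both on the same input and concatenate their features
        return Step(f"({self.name} + {other.name})",
                    lambda row: self.fn(row) + other.fn(row))

    def __call__(self, row: Row) -> Row:
        return self.fn(row)


# Toy elements standing in for real preprocessors and learners.
scale = Step("scale", lambda row: [x / 10.0 for x in row])
square = Step("square", lambda row: [x * x for x in row])
total = Step("total", lambda row: [sum(row)])

# One expression describes the whole structure; its name doubles as a
# readable signature that could be logged and compared across candidates.
pipeline = (scale + square) >> total
result = pipeline([1.0, 2.0])
```

Because every composite is itself a `Step`, candidate structures can be generated, named, and evaluated uniformly, which is the property an optimizer over pipeline structures would rely on.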

930 - Ilya Loshchilov 2013

This paper investigates the control of an ML component within the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) devoted to black-box optimization. A known weakness of CMA-ES is its sample complexity, i.e., the number of evaluations of the objective function needed to approximate the global optimum. This weakness is commonly addressed through surrogate optimization: learning an estimate of the objective function, a.k.a. a surrogate model, and replacing most evaluations of the true objective function with the (inexpensive) evaluation of the surrogate model. This paper presents a principled control of the learning schedule (when to relearn the surrogate model), based on the Kullback-Leibler divergence between the current search distribution and the training distribution of the former surrogate model. The experimental validation of the proposed approach shows significant performance gains on a comprehensive set of ill-conditioned benchmark problems, compared to the best state of the art, including the quasi-Newton high-precision BFGS method.
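The KL-based trigger described above can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes Gaussian distributions with diagonal covariance (CMA-ES maintains a full covariance matrix), it assumes the divergence is taken from the current search distribution to the surrogate's training distribution, and the function names and the fixed threshold are hypothetical.

```python
from math import log
from typing import Sequence


def kl_diag_gaussians(mean_p: Sequence[float], var_p: Sequence[float],
                      mean_q: Sequence[float], var_q: Sequence[float]) -> float:
    """KL(P || Q) for Gaussians with diagonal covariance (variances given)."""
    if not (len(mean_p) == len(var_p) == len(mean_q) == len(var_q)):
        raise ValueError("all inputs must have the same dimension")
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Per-dimension term: variance ratio + scaled mean shift - 1 + log ratio
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + log(vq / vp)
    return 0.5 * total


def should_relearn(kl: float, threshold: float) -> bool:
    """Assumed rule: relearn the surrogate once the search distribution has
    drifted further than `threshold` from the surrogate's training distribution."""
    return kl > threshold


# The search distribution has moved one unit along the single dimension.
drift = kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
relearn = should_relearn(drift, threshold=0.25)
```

The divergence is zero when the two distributions coincide and grows as the search moves away from the region the surrogate was trained on, which is what makes it a natural signal for scheduling relearning.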

Machine Learning, Artificial Intelligence

An ADMM Based Framework for AutoML Pipeline Configuration

65 - Sijia Liu, Parikshit Ram, Deepak Vijaykeerthy 2019

We study the AutoML problem of automatically configuring machine learning pipelines by jointly selecting algorithms and their appropriate hyper-parameters for all steps in supervised learning pipelines. This black-box (gradient-free) optimization with mixed integer and continuous variables is a challenging problem. We propose a novel AutoML scheme by leveraging the alternating direction method of multipliers (ADMM). The proposed framework is able to (i) decompose the optimization problem into easier sub-problems that have a reduced number of variables and circumvent the challenge of mixed variable categories, and (ii) incorporate black-box constraints alongside the black-box optimization objective. We empirically evaluate the flexibility (in utilizing existing AutoML techniques), effectiveness (against open source AutoML toolkits), and unique capability (of executing AutoML with practically motivated black-box constraints) of our proposed scheme on a collection of binary classification data sets from the UCI ML and OpenML repositories. We observe that, on average, our framework provides significant gains in comparison to other AutoML frameworks (Auto-sklearn and TPOT), highlighting the practical advantages of this framework.

Machine Learning

Surrogate Modelling for Injection Molding Processes using Machine Learning

396 - Arsenii Uglov, Sergei Nikolaev, Sergei Belov 2021

Injection molding is one of the most popular manufacturing methods for the modeling of complex plastic objects. Faster numerical simulation of the technological process would allow for faster and cheaper design cycles of new products. In this work, we propose a baseline for a data processing pipeline that includes the extraction of data from Moldflow simulation projects and the prediction of the fill time and deflection distributions over 3-dimensional surfaces using machine learning models. We propose algorithms for feature engineering, including information on the injector gate parameters that most affect the time for plastic to reach a particular point of the form for fill time prediction, and geometrical features for deflection prediction. We propose and evaluate baseline machine learning models for fill time and deflection distribution prediction and provide baseline values of the MSE and RMSE metrics. Finally, we measure the execution time of our solution and show that it is significantly faster than simulation with Moldflow software: approximately 17 times and 14 times faster for mean and median total times, respectively, when comparing the times of all analysis stages for deflection prediction. Our solution has been implemented in a prototype web application that was approved by the management board of Fiat Chrysler Automobiles and Illogic SRL. As one of the promising applications of this surrogate modelling approach, we envision the use of trained models as a fast objective function in the task of optimizing the technological parameters of the injection molding process (meaning optimal placement of gates), which could significantly aid engineers in this task, or even automate it.

Machine Learning

Robusta: Robust AutoML for Feature Selection via Reinforcement Learning

115 - Xiaoyang Wang, Bo Li, Yibo Zhang 2021

Several AutoML approaches have been proposed to automate the machine learning (ML) process, such as searching for ML model architectures and hyper-parameters. However, these AutoML pipelines focus only on improving the learning accuracy on benign samples while ignoring the robustness of the ML model under adversarial attacks. As ML systems are increasingly being used in a variety of mission-critical applications, improving their robustness has become of utmost importance. In this paper, we propose the first robust AutoML framework, Robusta, which is based on reinforcement learning (RL), to perform feature selection, aiming to select features that lead to both accurate and robust ML systems. We show that a variation of the 0-1 robust loss can be directly optimized via an RL-based combinatorial search in the feature selection scenario. In addition, we employ heuristics to accelerate the search procedure based on feature scoring metrics, namely mutual information scores, feature importance scores from tree-based classifiers, F scores, and Integrated Gradient (IG) scores, as well as their combinations. We conduct extensive experiments and show that the proposed framework is able to improve model robustness by up to 22% while maintaining competitive accuracy on benign samples compared with other feature selection methods.
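Of the feature scoring metrics listed above, mutual information is the simplest to show in isolation. The sketch below computes an empirical mutual information score for discrete features and ranks features by it. It does not reproduce Robusta's RL search or its robustness objective; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log
from typing import Dict, Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns: Dict[str, List[int]], labels: List[int]) -> List[str]:
    """Order feature names from most to least informative about the labels."""
    return sorted(columns,
                  key=lambda name: mutual_information(columns[name], labels),
                  reverse=True)


labels = [0, 0, 1, 1]
columns = {
    "noise": [0, 1, 0, 1],  # statistically independent of the labels
    "copy": [0, 0, 1, 1],   # identical to the labels
}
ranking = rank_features(columns, labels)
```

A ranking like this could serve as a prior for which features a combinatorial search tries first, which is one plausible way a scoring heuristic can shorten the search.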

Machine Learning, Artificial Intelligence

Improved Surrogate Modeling using Machine Learning for Industrial Civil Aircraft Aerodynamics

192 - Romain Dupuis, Jean-Christophe Jouhaud, Pierre Sagaut 2019

Predicting and simulating aerodynamic fields for civil aircraft over wide flight envelopes is a real challenge, mainly due to significant numerical costs and complex flows. Surrogate models and reduced-order models help to estimate aerodynamic fields from a few well-selected simulations. However, their accuracy decreases dramatically when different physical regimes are involved. Therefore, a method of local non-intrusive reduced-order models using machine learning, called the Local Decomposition Method, has been developed to mitigate this issue. This paper introduces several enhancements to this method and presents a complex application to an industrial-like three-dimensional aircraft configuration over a full flight envelope. The enhancements of the method cover several aspects: choosing the best number of models, estimating a priori errors, improving the adaptive sampling for parallel issues, and better handling the borders between local models. The application is supported by an analysis of the model behavior, with a focus on the machine learning methods and the local properties. The model achieves strong levels of accuracy, in particular with two sub-models: one for the subsonic regime and one for the transonic regime. These results highlight that local models and machine learning represent very promising solutions for surrogate modeling in aerodynamics.

Fluid Dynamics; Data Analysis, Statistics and Probability