A data-based comparative review and AI-driven symbolic model for longitudinal dispersion coefficient in natural streams

79 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yifeng Zhao

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية فيزياء

والبحث باللغة English

تأليف Yifeng Zhao - Zicheng Liu - Pei Zhang

التعلم الآلي تحليل البيانات والإحصاءات والاحتمال

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

A better understanding of dispersion in natural streams requires knowledge of longitudinal dispersion coefficient(LDC). Various methods have been proposed for predictions of LDC. Those studies can be grouped into three types: analytical, statistical and ML-driven researches(Implicit and explicit). However, a comprehensive evaluation of them is still lacking. In this paper, we first present an in-depth analysis of those methods and find out their defects. This is carried out on an extensive database composed of 660 samples of hydraulic and channel properties worldwide. The reliability and representativeness of utilized data are enhanced through the deployment of the Subset Selection of Maximum Dissimilarity(SSMD) for testing set selection and the Inter Quartile Range(IQR) for removal of the outlier. The evaluation reveals the rank of those methods as: ML-driven method > the statistical method > the analytical method. Whereas implicit ML-driven methods are black-boxes in nature, explicit ML-driven methods have more potential in prediction of LDC. Besides, overfitting is a universal problem in existing models. Those models also suffer from a fixed parameter combination. To establish an interpretable model for LDC prediction with higher performance, we then design a novel symbolic regression method called evolutionary symbolic regression network(ESRN). It is a combination of genetic algorithms and neural networks. Strategies are introduced to avoid overfitting and explore more parameter combinations. Results show that the ESRN model has superiorities over other existing symbolic models in performance. The proposed model is suitable for practical engineering problems due to its advantage in low requirement of parameters (only w and U* are required). It can provide convincing solutions for situations where the field test cannot be carried out or limited field information can be obtained.

قيم البحث

108 - Yifeng Zhao , Pei Zhang , S.A. Galindo-Torres 2021

Longitudinal Dispersion(LD) is the dominant process of scalar transport in natural streams. An accurate prediction on LD coefficient(Dl) can produce a performance leap in related simulation. The emerging machine learning(ML) techniques provide a self -adaptive tool for this problem. However, most of the existing studies utilize an unproved quaternion feature set, obtained through simple theoretical deduction. Few studies have put attention on its reliability and rationality. Besides, due to the lack of comparative comparison, the proper choice of ML models in different scenarios still remains unknown. In this study, the Feature Gradient selector was first adopted to distill the local optimal feature sets directly from multivariable data. Then, a global optimal feature set (the channel width, the flow velocity, the channel slope and the cross sectional area) was proposed through numerical comparison of the distilled local optimums in performance with representative ML models. The channel slope is identified to be the key parameter for the prediction of LDC. Further, we designed a weighted evaluation metric which enables comprehensive model comparison. With the simple linear model as the baseline, a benchmark of single and ensemble learning models was provided. Advantages and disadvantages of the methods involved were also discussed. Results show that the support vector machine has significantly better performance than other models. Decision tree is not suitable for this problem due to poor generalization ability. Notably, simple models show superiority over complicated model on this low-dimensional problem, for their better balance between regression and generalization.

الجيوفيزياء التعلم الآلي

AI-based Modeling and Data-driven Evaluation for Smart Manufacturing Processes

57 - Mohammadhossein Ghahramani , Yan Qiao , MengChu Zhou 2020

Smart Manufacturing refers to optimization techniques that are implemented in production operations by utilizing advanced analytics approaches. With the widespread increase in deploying Industrial Internet of Things (IIoT) sensors in manufacturing pr ocesses, there is a progressive need for optimal and effective approaches to data management. Embracing Machine Learning and Artificial Intelligence to take advantage of manufacturing data can lead to efficient and intelligent automation. In this paper, we conduct a comprehensive analysis based on Evolutionary Computing and Deep Learning algorithms toward making semiconductor manufacturing smart. We propose a dynamic algorithm for gaining useful insights about semiconductor manufacturing processes and to address various challenges. We elaborate on the utilization of a Genetic Algorithm and Neural Network to propose an intelligent feature selection algorithm. Our objective is to provide an advanced solution for controlling manufacturing processes and to gain perspective on various dimensions that enable manufacturers to access effective predictive technologies.

التعلم الآلي التعلم الالي

Deep learning-based statistical noise reduction for multidimensional spectral data

297 - Younsik Kim , Dongjin Oh , Soonsang Huh 2021

In spectroscopic experiments, data acquisition in multi-dimensional phase space may require long acquisition time, owing to the large phase space volume to be covered. In such case, the limited time available for data acquisition can be a serious con straint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise.

التعلم الآلي تحليل البيانات والإحصاءات والاحتمال

Data-Driven Wind Turbine Wake Modeling via Probabilistic Machine Learning

484 - S. Ashwin Renganathan , Romit Maulik , Stefano Letizia 2021

Wind farm design primarily depends on the variability of the wind turbine wake flows to the atmospheric wind conditions, and the interaction between wakes. Physics-based models that capture the wake flow-field with high-fidelity are computationally v ery expensive to perform layout optimization of wind farms, and, thus, data-driven reduced order models can represent an efficient alternative for simulating wind farms. In this work, we use real-world light detection and ranging (LiDAR) measurements of wind-turbine wakes to construct predictive surrogate models using machine learning. Specifically, we first demonstrate the use of deep autoencoders to find a low-dimensional emph{latent} space that gives a computationally tractable approximation of the wake LiDAR measurements. Then, we learn the mapping between the parameter space and the (latent space) wake flow-fields using a deep neural network. Additionally, we also demonstrate the use of a probabilistic machine learning technique, namely, Gaussian process modeling, to learn the parameter-space-latent-space mapping in addition to the epistemic and aleatoric uncertainty in the data. Finally, to cope with training large datasets, we demonstrate the use of variational Gaussian process models that provide a tractable alternative to the conventional Gaussian process models for large datasets. Furthermore, we introduce the use of active learning to adaptively build and improve a conventional Gaussian process model predictive capability. Overall, we find that our approach provides accurate approximations of the wind-turbine wake flow field that can be queried at an orders-of-magnitude cheaper cost than those generated with high-fidelity physics-based simulations.

التعلم الآلي تحليل البيانات والإحصاءات والاحتمال

A data-driven approach for identification of novel organic ferroelectrics

294 - Ayana Ghosh , Dennis P. Trujillo , Subhashis Hazarika 2021

Recent advances in the synthesis of polar molecular materials have produced practical alternatives to ferroelectric ceramics, opening up exciting new avenues for their incorporation into modern electronic devices. However, in order to realize the ful l potential of polar polymer and molecular crystals for modern technological applications, it is paramount to assemble and evaluate all the available data for such compounds, identifying descriptors that could be associated with an emergence of ferroelectricity. In this work, we utilized data-driven approaches to judiciously shortlist candidate materials from a wide chemical space that could possess ferroelectric functionalities. An importance-sampling based method was utilized to address the challenge of having a limited amount of available data on already known organic ferroelectrics. Sets of molecular- and crystal-level descriptors were combined with a Random Forest Regression algorithm in order to predict spontaneous polarization of the shortlisted compounds with an average error of ~20%.

علم المواد تحليل البيانات والإحصاءات والاحتمال