بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

An Efficient Search Strategy for Aggregation and Discretization of Attributes of Bayesian Networks Using Minimum Description Length

173 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jem Corcoran

تاريخ النشر 2014

مجال البحث الاحصاء الرياضي

والبحث باللغة English

تأليف Jem Corcoran - Daniel Tran - Nicholas Levine

التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Bayesian networks are convenient graphical expressions for high dimensional probability distributions representing complex relationships between a large number of random variables. They have been employed extensively in areas such as bioinformatics, artificial intelligence, diagnosis, and risk management. The recovery of the structure of a network from data is of prime importance for the purposes of modeling, analysis, and prediction. Most recovery algorithms in the literature assume either discrete of continuous but Gaussian data. For general continuous data, discretization is usually employed but often destroys the very structure one is out to recover. Friedman and Goldszmidt suggest an approach based on the minimum description length principle that chooses a discretization which preserves the information in the original data set, however it is one which is difficult, if not impossible, to implement for even moderately sized networks. In this paper we provide an extremely efficient search strategy which allows one to use the Friedman and Goldszmidt discretization in practice.

قيم البحث

اقرأ أيضاً

Minimum Description Length Revisited

67 - Peter Grunwald , Teemu Roos 2019

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning and pattern recognition. While MDL was origi nally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of {em MDL estimators}. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs BIC and cross-validation vs Bayes can, to a large extent, be viewed from a unified perspective.

المنهجية نظرية المعلومات التعلم الآلي

Substructure Discovery Using Minimum Description Length and Background Knowledge

71 - D. J. Cook , L. B. Holder 1994

The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length prin ciple. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUEs ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.

الذكاء الاصطناعي

Categorical anomaly detection in heterogeneous data using minimum description length clustering

227 - James Cheney , Xavier Gombau , Ghita Berrada 2020

Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets represen ting a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.

قواعد البيانات الذكاء الاصطناعي

Thermodynamics of the Minimum Description Length on Community Detection

78 - Juan Ignacio Perotti , Claudio Juan Tessone , Aaron Clauset andn Guido Caldarelli 2018

Modern statistical modeling is an important complement to the more traditional approach of physics where Complex Systems are studied by means of extremely simple idealized models. The Minimum Description Length (MDL) is a principled approach to stati stical modeling combining Occams razor with Information Theory for the selection of models providing the most concise descriptions. In this work, we introduce the Boltzmannian MDL (BMDL), a formalization of the principle of MDL with a parametric complexity conveniently formulated as the free-energy of an artificial thermodynamic system. In this way, we leverage on the rich theoretical and technical background of statistical mechanics, to show the crucial importance that phase transitions and other thermodynamic concepts have on the problem of statistical modeling from an information theoretic point of view. For example, we provide information theoretic justifications of why a high-temperature series expansion can be used to compute systematic approximations of the BMDL when the formalism is used to model data, and why statistically significant model selections can be identified with ordered phases when the BMDL is used to model models. To test the introduced formalism, we compute approximations of BMDL for the problem of community detection in complex networks, where we obtain a principled MDL derivation of the Girvan-Newman (GN) modularity and the Zhang-Moore (ZM) community detection method. Here, by means of analytical estimations and numerical experiments on synthetic and empirical networks, we find that BMDL-based correction terms of the GN modularity improve the quality of the detected communities and we also find an information theoretic justification of why the ZM criterion for estimation of the number of network communities is better than alternative approaches such as the bare minimization of a free energy.

الفيزياء والمجتمع الأنظمة المضطربة والشبكات العصبية الشبكات الاجتماعية والمعلومات

Optimal Work Extraction and the Minimum Description Length Principle

119 - Leo Touzo , Matteo Marsili , Neri Merhav 2020

We discuss work extraction from classical information engines (e.g., Szilard) with $N$-particles, $q$ partitions, and initial arbitrary non-equilibrium states. In particular, we focus on their {em optimal} behaviour, which includes the measurement of a set of quantities $Phi$ with a feedback protocol that extracts the maximal average amount of work. We show that the optimal non-equilibrium state to which the engine should be driven before the measurement is given by the normalised maximum-likelihood probability distribution of a statistical model that admits $Phi$ as sufficient statistics. Furthermore, we show that the minimax universal code redundancy $mathcal{R}^*$ associated to this model, provides an upper bound to the work that the demon can extract on average from the cycle, in units of $k_{rm B}T$. We also find that, in the limit of $N$ large, the maximum average extracted work cannot exceed $H[Phi]/2$, i.e. one half times the Shannon entropy of the measurement. Our results establish a connection between optimal work extraction in stochastic thermodynamics and optimal universal data compression, providing design principles for optimal information engines. In particular, they suggest that: (i) optimal coding is thermodynamically efficient, and (ii) it is essential to drive the system into a critical state in order to achieve optimal performance.

الميكانيكا الإحصائية نظرية المعلومات نظرية المعلومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة الموصل

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

An Efficient Search Strategy for Aggregation and Discretization of Attributes of Bayesian Networks Using Minimum Description Length

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً