أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Paul M.B. Vitanyi

A Fast Quartet Tree Heuristic for Hierarchical Clustering

47 - Rudi L. Cilibrasi 2014

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3{n choose 4}$ weighted quartet topologies on $n$ objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can b e the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly transforms a dendrogram, with all objects involved as leaves, achieving a monotonic approximation to the exact single globally optimal tree. The problem and the solution heuristic has been extensively used for general hierarchical clustering of nontree-like (non-phylogeny) data in various domains and across domains with heterogeneous data. We also present a greatly improved heuristic, reducing the running time by a factor of order a thousand to ten thousand. All this is implemented and available, as part of the CompLearn package. We compare performance and running time of the original and improv

التعلم الآلي الهندسة الحاسوبية، المالية،العلوم بنى وهياكل البيانات والخوارزميات

Algorithmic information theory

172 - Peter D. Grunwald 2008

We introduce algorithmic information theory, also known as the theory of Kolmogorov complexity. We explain the main concepts of this quantitative approach to defining `information. We discuss the extent to which Kolmogorovs and Shannons information t heory have a common purpose, and where they are fundamentally different. We indicate how recent developments within the theory allow one to formally distinguish between `structural (meaningful) and `random information as measured by the Kolmogorov structure function, which leads to a mathematical formalization of Occams razor in inductive inference. We end by discussing some of the philosophical implications of the theory.

نظرية المعلومات التعلم الآلي نظرية المعلومات

Normalized Information Distance

316 - Paul M.B. Vitanyi 2008

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the K olmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, expecially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.

استرجاع المعلومات الذكاء الاصطناعي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد