An Information-theoretic Perspective of Hierarchical Clustering


Added by Yicheng Pan

Publication date 2021

Field: Informatics Engineering

Research language: English

Authors: Yicheng Pan, Feng Zheng, Bingchen Fan

Machine Learning


A combinatorial cost function for hierarchical clustering was introduced by Dasgupta [dasgupta2016cost] and later generalized by Cohen-Addad et al. [cohen2019hierarchical] to a broader class of objectives called admissible functions. In this paper, we investigate hierarchical clustering from the information-theoretic perspective and formulate a new objective function. We also establish the relationship between these two perspectives. On the algorithmic side, we depart from the traditional top-down and bottom-up frameworks and propose a new one that recursively stratifies the sparsest level of a cluster tree, guided by our objective function. For practical use, our resulting cluster tree is not binary. Our algorithm, called HCSE, outputs a $k$-level cluster tree and chooses $k$ automatically through a novel and interpretable mechanism that requires no hyper-parameter. Experimental results on synthetic datasets show that HCSE has a clear advantage in finding the intrinsic number of hierarchy levels, and results on real datasets show that HCSE also achieves costs competitive with those of the popular algorithms LOUVAIN and HLP.
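For reference, Dasgupta's cost, as it is usually stated, charges every edge (i, j) of weight w_ij the number of leaves under the lowest common ancestor of i and j in the tree: cost(T) = Σ w_ij · |leaves(T[i ∨ j])|. Because the lowest common ancestor is defined for any rooted tree, the same formula applies to non-binary trees like the ones HCSE produces. The minimal Python sketch below assumes a tree encoded as nested tuples (leaves are non-tuple labels) and edge weights stored in a dict keyed by frozenset pairs. Both encodings and the function names are hypothetical choices for illustration. It computes only Dasgupta's combinatorial cost, not the paper's information-theoretic objective or the HCSE algorithm.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a node.

    Hypothetical encoding: a node is either a leaf label (any hashable
    value that is not a tuple) or a tuple of child nodes.
    """
    if not isinstance(tree, tuple):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out


def dasgupta_cost(tree, weights):
    """Dasgupta's cost: sum over edges of w(i, j) * |leaves(lca(i, j))|.

    `weights` maps frozenset({i, j}) -> edge weight, each edge stored once.
    """
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf is the LCA of no edge
    child_leaves = [leaves(c) for c in tree]
    size = sum(len(s) for s in child_leaves)
    # An edge whose endpoints fall under two different children of this
    # node has this node as its lowest common ancestor.
    split = 0.0
    for a, b in combinations(range(len(child_leaves)), 2):
        for u in child_leaves[a]:
            for v in child_leaves[b]:
                split += weights.get(frozenset((u, v)), 0.0)
    # Charge split edges here, then recurse into each subtree.
    return size * split + sum(dasgupta_cost(c, weights) for c in tree)
```

For example, with two tightly linked pairs joined by one weak edge, the two-level tree scores lower (better) than the flat, one-level tree:

```python
w = {frozenset(("a", "b")): 1.0, frozenset(("c", "d")): 1.0, frozenset(("b", "c")): 0.1}
print(dasgupta_cost((("a", "b"), ("c", "d")), w))  # about 4.4
print(dasgupta_cost(("a", "b", "c", "d"), w))       # about 8.4
```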


An Information-Theoretic Perspective on Overfitting and Underfitting

Daniel Bashir, George D. Montanez, Sonia Sehra (2020)

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.

Machine Learning, Artificial Intelligence, Information Theory

Completion Time in Two-user Channels: An Information-Theoretic Perspective

Yuanpeng Liu, Elza Erkip (2015)

In a two-user channel, completion time refers to the number of channel uses spent by each user to transmit a bit pool of a given size. In this paper, the information-theoretic formulation of completion time is based on the concept of constrained rates, where users are allowed to employ different numbers of channel uses for transmission, as opposed to the equal number of channel uses in the standard information-theoretic formulation. Analogous to the capacity region, the completion time region characterizes all possible trade-offs among the users' completion times. For a multi-access channel, it is shown that the completion time region is achieved by operating the channel in two independent phases: a multi-access phase, in which both users are transmitting, and a point-to-point phase, in which one user has finished and the other is still transmitting. Using a similar two-phase approach, the completion time region (or inner and outer bounds on it) is established for a Gaussian broadcast channel and a Gaussian interference channel. It is observed that, although it consists of two convex subregions, the completion time region may not be convex in general. Finally, an optimization problem of minimizing the weighted sum completion time for a Gaussian multi-access channel and a Gaussian broadcast channel is solved, demonstrating the utility of the completion time approach.

Information Theory
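To make the two-phase idea concrete, here is a minimal Python sketch for a real-valued Gaussian multi-access channel with transmit powers P1, P2 and noise variance N. It assumes the standard capacity C(x) = 0.5 log2(1 + x) bits per channel use, treats channel uses as continuous rather than integer, and assumes user 1 finishes first. It computes the completion times of one achievable two-phase scheme for a chosen rate pair; it does not compute the full completion time region characterized in the paper, and all function names are hypothetical.

```python
import math


def gaussian_capacity(snr: float) -> float:
    """Capacity of a real AWGN channel in bits per channel use."""
    return 0.5 * math.log2(1.0 + snr)


def in_mac_region(r1, r2, p1, p2, noise, tol=1e-12):
    """Check (r1, r2) against the Gaussian MAC capacity region constraints."""
    c = gaussian_capacity
    return (r1 <= c(p1 / noise) + tol
            and r2 <= c(p2 / noise) + tol
            and r1 + r2 <= c((p1 + p2) / noise) + tol)


def two_phase_completion_times(b1, b2, r1, r2, p2, noise):
    """Completion times (t1, t2), assuming user 1 finishes first.

    Phase 1 (multi-access): both users send at (r1, r2) until user 1's
    b1 bits are delivered, which takes t1 = b1 / r1 channel uses.
    Phase 2 (point-to-point): user 2 sends its remaining bits alone at
    the single-user capacity C(p2 / noise).
    """
    if r1 <= 0:
        raise ValueError("user 1 needs a positive rate")
    t1 = b1 / r1
    remaining = b2 - r2 * t1
    if remaining < 0:
        raise ValueError("user 2 would finish first; swap the users")
    t2 = t1 + remaining / gaussian_capacity(p2 / noise)
    return t1, t2
```

For example, with P1 = P2 = N = 1, 100 bits per user, and the corner point of the capacity region that gives user 1 its full single-user rate:

```python
p1 = p2 = noise = 1.0
r1 = gaussian_capacity(p1 / noise)                 # 0.5
r2 = gaussian_capacity((p1 + p2) / noise) - r1     # about 0.2925
assert in_mac_region(r1, r2, p1, p2, noise)
t1, t2 = two_phase_completion_times(100, 100, r1, r2, p2, noise)
print(t1, round(t2, 2))  # 200.0 283.01
```

Sweeping the rate pair over the capacity region (and swapping which user finishes first) traces out achievable completion-time pairs for this scheme.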

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Boxin Wang, Shuohang Wang, Yu Cheng (2020)

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy over several adversarial datasets on Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.

Computation and Language, Artificial Intelligence, Machine Learning

An Information-theoretic Approach to Distribution Shifts

Marco Federici, Ryota Tomioka, Patrick Forre (2021)

Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or in novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, and (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

Machine Learning, Information Theory

Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective

Ruixiang Zhang, Masanori Koyama, Katsuhiko Ishiguro (2020)

Learning controllable and generalizable representations of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extend the theory of the multivariate information bottleneck to enforce it. Our model provides a principled approach to learning a set of semantically meaningful latent factors that reflect various types of desired structures, such as capturing correlation or encoding invariance, while also offering the flexibility to automatically estimate the dependency structure from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks, including multi-modal data modeling, algorithmic fairness, and invariant risk minimization.

Machine Learning
