Do you want to publish a course? Click here

An Information-theoretic Perspective of Hierarchical Clustering

149   0   0.0 ( 0 )
 Added by Yicheng Pan
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

A combinatorial cost function for hierarchical clustering was introduced by Dasgupta cite{dasgupta2016cost}. It has been generalized by Cohen-Addad et al. cite{cohen2019hierarchical} to a general form named admissible function. In this paper, we investigate hierarchical clustering from the emph{information-theoretic} perspective and formulate a new objective function. We also establish the relationship between these two perspectives. In algorithmic aspect, we get rid of the traditional top-down and bottom-up frameworks, and propose a new one to stratify the emph{sparsest} level of a cluster tree recursively in guide with our objective function. For practical use, our resulting cluster tree is not binary. Our algorithm called HCSE outputs a $k$-level cluster tree by a novel and interpretable mechanism to choose $k$ automatically without any hyper-parameter. Our experimental results on synthetic datasets show that HCSE has a great advantage in finding the intrinsic number of hierarchies, and the results on real datasets show that HCSE also achieves competitive costs over the popular algorithms LOUVAIN and HLP.

rate research

Read More

We present an information-theoretic framework for understanding overfitting and underfitting in machine learning and prove the formal undecidability of determining whether an arbitrary classification algorithm will overfit a dataset. Measuring algorithm capacity via the information transferred from datasets to models, we consider mismatches between algorithm capacities and datasets to provide a signature for when a model can overfit or underfit a dataset. We present results upper-bounding algorithm capacity, establish its relationship to quantities in the algorithmic search framework for machine learning, and relate our work to recent information-theoretic approaches to generalization.
125 - Yuanpeng Liu , Elza Erkip 2015
In a two-user channel, completion time refers to the number of channel uses spent by each user to transmit a bit pool with some given size. In this paper, the information-theoretic formulation of completion time is based on the concept of constrained rates, where users are allowed to employ different numbers of channel uses for transmission as opposed to the equal channel use of the standard information-theoretic formulation. Analogous to the capacity region, the completion time region characterizes all possible trade-offs among users completion times. For a multi-access channel, it is shown that the completion time region is achieved by operating the channel in two independent phases: a multi-access phase when both users are transmitting, and a point-to-point phase when one user has finished and the other is still transmitting. Using a similar two-phase approach, the completion time region (or inner and outer bounds) is established for a Gaussian broadcast channel and a Gaussian interference channel. It is observed that although consisting of two convex subregions, the completion time region may not be convex in general. Finally an optimization problem of minimizing the weighted sum completion time for a Gaussian multi-access channel and a Gaussian broadcast channel is solved, demonstrating the utility of the completion time approach.
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable facing the threats of textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy over several adversarial datasets on Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
Safely deploying machine learning models to the real world is often a challenging process. Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from a novel information-theoretic perspective by (i) identifying and describing the different sources of error, (ii) comparing some of the most promising objectives explored in the recent domain generalization, and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.
Learning controllable and generalizable representation of multivariate data with desired structural properties remains a fundamental problem in machine learning. In this paper, we present a novel framework for learning generative models with various underlying structures in the latent space. We represent the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extend the theory of multivariate information bottleneck to enforce it. Our model provides a principled approach to learn a set of semantically meaningful latent factors that reflect various types of desired structures like capturing correlation or encoding invariance, while also offering the flexibility to automatically estimate the dependency structure from data. We show that our framework unifies many existing generative models and can be applied to a variety of tasks including multi-modal data modeling, algorithmic fairness, and invariant risk minimization.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا