
Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

Added by Wilson Wong
Publication date: 2008
Language: English





Most work related to unithood has been conducted as part of a larger effort to determine termhood. Consequently, the number of independent studies that examine the notion of unithood and produce dedicated techniques for measuring it is extremely small. We propose a new approach, independent of any influence of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our evaluations revealed a precision of 98.68% and a recall of 91.82%, with an accuracy of 95.42%, in measuring the unithood of 1,005 test cases.
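
As a rough sketch of how statistical evidence from search-engine counts can feed a mutual information measure (the paper's exact scoring functions are not reproduced here), the following computes pointwise mutual information for a two-word candidate from hypothetical page counts; N and every count below are invented for illustration.

    import math

    # Hypothetical page counts for quoted search-engine queries; all numbers
    # here are illustrative assumptions, not figures from the paper.
    N = 8_000_000_000        # assumed number of pages indexed by the engine
    count_pair = 120_000     # hits for the full candidate phrase
    count_w1 = 2_500_000     # hits for the first word alone
    count_w2 = 1_800_000     # hits for the second word alone

    # Pointwise mutual information: how much more often the two words
    # co-occur than independence would predict.
    pmi = math.log2((count_pair / N) / ((count_w1 / N) * (count_w2 / N)))

    # A PMI above some tuned threshold suggests the sequence behaves as a unit.
    print(f"PMI = {pmi:.2f} bits")

A high PMI alone does not settle unithood; the approach described above combines such statistical evidence with linguistic evidence from parsed text.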



Related research

Most research related to unithood has been conducted as part of a larger effort to determine termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work has been mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influence of termhood, that gathers linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.
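
For concreteness, a comparative evaluation of this kind typically tallies binary unit/non-unit decisions against gold labels; the minimal sketch below (function name and data invented) shows how precision, recall and accuracy fall out of those tallies.

    def evaluate(decisions):
        """decisions: (predicted_unit, actual_unit) boolean pairs, one per test case."""
        tp = sum(1 for p, a in decisions if p and a)
        fp = sum(1 for p, a in decisions if p and not a)
        fn = sum(1 for p, a in decisions if not p and a)
        tn = sum(1 for p, a in decisions if not p and not a)
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        accuracy = (tp + tn) / len(decisions)
        return precision, recall, accuracy

    # Toy run on four invented decisions; the study's 1,825 real test cases differ.
    print(evaluate([(True, True), (True, False), (False, True), (False, False)]))
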
Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
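
The core quantity can be illustrated on a single pair of alignment columns. The sketch below uses toy data; the actual method approximately maximizes mutual information over candidate pairings of paralogs, summing such terms over all column pairs.

    from collections import Counter
    import math

    def column_mi(col_a, col_b):
        """Empirical mutual information (bits) between one column of each
        alignment, with rows paired across the two protein families."""
        n = len(col_a)
        pa, pb = Counter(col_a), Counter(col_b)
        pab = Counter(zip(col_a, col_b))
        return sum((c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
                   for (a, b), c in pab.items())

    # Toy columns of amino-acid symbols, one entry per paired sequence.
    print(column_mi(list("AAILLVVA"), list("GGFMMWWG")))
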
The achievable error-exponent pairs for the type I and type II errors are characterized in a hypothesis testing setup where the observation consists of independent and identically distributed samples from either a known joint probability distribution or an unknown product distribution. The empirical mutual information test, the Hoeffding test, and the generalized likelihood-ratio test are all shown to be asymptotically optimal. An expression based on a Rényi measure of dependence is shown to be the Fenchel biconjugate of the error-exponent function obtained by fixing one error exponent and optimizing the other. An example is provided where the error-exponent function is not convex and thus not equal to its Fenchel biconjugate.
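
As a hedged illustration of the empirical mutual information test, the sketch below (data and threshold invented) estimates mutual information from the sample and thresholds it: under any product distribution the estimate concentrates near zero, so large values favor the known joint hypothesis.

    from collections import Counter
    import math

    def empirical_mi(samples):
        """Plug-in mutual information (nats) from i.i.d. (x, y) pairs over finite alphabets."""
        n = len(samples)
        px = Counter(x for x, _ in samples)
        py = Counter(y for _, y in samples)
        pxy = Counter(samples)
        return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
                   for (x, y), c in pxy.items())

    # Declare the known joint distribution when the estimate is large, an
    # unknown product distribution otherwise; tuning the threshold trades
    # one error exponent against the other.
    samples = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]
    print("joint" if empirical_mi(samples) > 0.05 else "product")
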
We derive independence tests by means of dependence-measure thresholding in a semiparametric context. Precisely, estimates of phi-mutual informations, associated with phi-divergences between a joint distribution and the product distribution of its margins, are derived through the dual representation of phi-divergences. The asymptotic properties of the proposed estimates are established, including consistency, asymptotic distributions and a large deviations principle. The obtained tests of independence are compared via their relative asymptotic Bahadur efficiency and numerical simulations. It follows that the proposed semiparametric Kullback-Leibler mutual information test is the optimal one. Moreover, the proposed approach provides a new method for estimating the Kullback-Leibler mutual information in a semiparametric setting, as well as a model selection procedure for a large class of dependency models, including semiparametric copulas.
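
In standard notation (a reconstruction from the usual definitions, not a formula quoted from the paper), the phi-mutual information being estimated is the phi-divergence between the joint law and the product of its margins:

    I_\phi(X;Y) \;=\; D_\phi\!\left(P_{XY} \,\middle\|\, P_X \otimes P_Y\right)
    \;=\; \int \phi\!\left(\frac{\mathrm{d}P_{XY}}{\mathrm{d}(P_X \otimes P_Y)}\right) \mathrm{d}(P_X \otimes P_Y),

and the choice \phi(t) = t \log t recovers the Kullback-Leibler mutual information.
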
The information theoretic quantity known as mutual information finds wide use in classification and community detection analyses to compare two classifications of the same set of objects into groups. In the context of classification algorithms, for instance, it is often used to compare discovered classes to known ground truth and hence to quantify algorithm performance. Here we argue that the standard mutual information, as commonly defined, omits a crucial term which can become large under real-world conditions, producing results that can be substantially in error. We demonstrate how to correct this error and define a mutual information that works in all cases. We discuss practical implementation of the new measure and give some example applications.
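
To make the object of the critique concrete: the quantity below is the standard mutual information between two labelings as commonly computed (here via scikit-learn, with invented data); this is the measure the authors argue omits a crucial term, and their corrected variant is not reproduced here.

    from sklearn.metrics import mutual_info_score

    # Two labelings of the same nine objects: assumed ground truth versus a
    # discovered clustering (both invented for illustration).
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    found = [0, 0, 1, 1, 1, 1, 2, 2, 0]

    # Standard mutual information (nats), the commonly used quantity.
    print(mutual_info_score(truth, found))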
