
The Correspondence Analysis Platform for Uncovering Deep Structure in Data and Information

Posted by: Fionn Murtagh
Publication date: 2008
Research field: Informatics Engineering
Language: English
Author: Fionn Murtagh





We study two aspects of information semantics: (i) the collection of all relationships, and (ii) the tracking and spotting of anomaly and change. The first is implemented by endowing all relevant information spaces with a Euclidean metric in a common projected space. The second is modelled by an induced ultrametric. A very general way to achieve a Euclidean embedding of different information spaces, based on cross-tabulation counts (and other input data formats), is provided by Correspondence Analysis. From there, the induced ultrametric that we are particularly interested in takes a sequential (e.g., temporal) ordering of the data into account. We employ this perspective to look at narrative, the flow of thought and the flow of language (Chafe). In an application to policy decision making, we show how analysis can be focused on a small number of dimensions.
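To make the two steps concrete, here is a minimal sketch in Python/NumPy, under stated assumptions. The function names are illustrative: correspondence_analysis computes standard CA factor coordinates from a contingency table via an SVD of the standardised residuals, and sequence_ultrametric induces an ultrametric by contiguity-constrained (sequence-respecting) agglomeration of those coordinates. The constrained complete-linkage rule is one plausible reading of the induced ultrametric, not necessarily the paper's exact construction.

```python
import numpy as np

def correspondence_analysis(N):
    """Correspondence Analysis of a contingency table N (rows x cols).
    Returns principal row/column coordinates in a common Euclidean
    factor space, plus the principal inertias."""
    P = N / N.sum()                          # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)      # row and column masses
    # SVD of the standardised residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]       # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]    # column principal coordinates
    return F, G, sv ** 2

def sequence_ultrametric(X):
    """Induce an ultrametric on sequentially ordered points X (n x d):
    only adjacent segments may merge, so the hierarchy respects the
    (e.g. temporal) ordering; the returned cophenetic levels satisfy
    the ultrametric inequality."""
    n = len(X)
    segs = [[i] for i in range(n)]           # singleton segments, in order
    D = np.zeros((n, n))
    level = 0.0
    while len(segs) > 1:
        # complete-linkage distance between each pair of adjacent segments
        gaps = [max(np.linalg.norm(X[a] - X[b])
                    for a in segs[k] for b in segs[k + 1])
                for k in range(len(segs) - 1)]
        k = int(np.argmin(gaps))
        level = max(level, gaps[k])          # keep merge levels monotone
        for a in segs[k]:
            for b in segs[k + 1]:
                D[a, b] = D[b, a] = level
        segs[k:k + 2] = [segs[k] + segs[k + 1]]
    return D
```

For example, the rows of a terms-by-segments cross-tabulation of a narrative could be embedded with correspondence_analysis, and the resulting factor coordinates, kept in their temporal order, passed to sequence_ultrametric to expose hierarchical structure, change and anomaly in the flow of the text.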


Read also

Fionn Murtagh (2008). Review of: Brigitte Le Roux and Henry Rouanet, Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis, Kluwer, Dordrecht, 2004, xi+475 pp.
Bayesian Networks (BNs) have become a powerful technology for reasoning under uncertainty, particularly in areas that require causal assumptions enabling us to simulate the effect of intervention. The graphical structure of these models can be determined by causal knowledge, learnt from data, or a combination of both. While it seems plausible that the best approach to constructing a causal graph involves combining knowledge with machine learning, this approach remains underused in practice. We implement and evaluate 10 knowledge approaches with application to different case studies and BN structure learning algorithms available in the open-source Bayesys structure learning system. The approaches enable us to specify pre-existing knowledge, obtainable from heterogeneous sources, to constrain or guide structure learning. Each approach is assessed in terms of structure learning effectiveness and efficiency, including graphical accuracy, model fitting, complexity, and runtime; this makes it the first paper to provide a comparative evaluation of a wide range of knowledge approaches for BN structure learning. Because the value of knowledge depends on what data are available, we illustrate the results with both limited and big data. While the overall results show that knowledge becomes less important with big data, since higher learning accuracy renders it less necessary, some of the knowledge approaches are actually found to be more important with big data. Amongst the main conclusions is the observation that a reduced search space obtained from knowledge does not always imply reduced computational complexity, perhaps because the relationships implied by the data and the knowledge are in tension.
We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable visualization and debugging, and an ecosystem of components that encapsulate a variety of perception and processing technologies. These assets jointly provide the means for rapidly constructing and refining multimodal, integrative-AI systems, while retaining the efficiency and performance characteristics required for deployment in open-world settings.
The relevance and importance of contextualizing data analytics is described. Qualitative characteristics might form the context of quantitative analysis. Topics at issue include: contrast, baselining, secondary data sources, supplementary data sources, and dynamic and heterogeneous data. In geometric data analysis, especially with the Correspondence Analysis platform, various case studies are both carried out and reviewed. In aspects such as the paradigms followed and the technical implementation, an important point made, both implicitly and explicitly, is the major relevance of such work for burgeoning analytical needs and for new analytical areas, including Big Data analytics. For the general reader, the aim is first to display and describe the analytical outcomes at issue here, and then to detail the more quantitative outcomes that fully support the analytics carried out.
Various kinds of data are routinely represented as discrete probability distributions. Examples include text documents summarized by histograms of word occurrences and images represented as histograms of oriented gradients. Viewing a discrete probability distribution as a point in the standard simplex of the appropriate dimension, we can understand collections of such objects in geometric and topological terms. Importantly, instead of using the standard Euclidean distance, we look into dissimilarity measures with information-theoretic justification, and we develop the theory needed for applying topological data analysis in this setting. In doing so, we emphasize constructions that enable usage of existing computational topology software in this context.
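As one concrete illustration of the setting described in that abstract (a sketch, not the paper's own code): the following computes pairwise Jensen-Shannon distances, one dissimilarity with information-theoretic justification, between histograms viewed as points in the simplex; the resulting matrix could then be handed to standard computational-topology software. The function name js_distance_matrix is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance_matrix(H):
    """Pairwise Jensen-Shannon distances between the rows of H,
    where each row is a histogram (e.g. word counts) normalised
    to a point in the standard probability simplex."""
    P = H / H.sum(axis=1, keepdims=True)   # project counts onto the simplex
    n = len(P)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = jensenshannon(P[i], P[j], base=2)
    return D   # e.g. input to a Vietoris-Rips filtration
```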
