
Data mining when each data point is a network

Published by Assimakis Kattis
Publication date: 2016
Language: English





We discuss the problem of extending data mining approaches to cases in which data points arise in the form of individual graphs. Being able to find the intrinsic low-dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on finding subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph generating algorithms, while the graphs in the third example are sampled as dynamic snapshots from an evolving network simulation. We further combine these approaches with equation-free techniques, demonstrating how such data mining approaches can enhance scientific computation of network evolution dynamics.
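As a rough illustration of the subgraph-density idea mentioned in the abstract, the sketch below summarizes each graph by two densities (edges and triangles) and compares graphs by the Euclidean distance between those summaries. The function names, the choice of subgraphs, and the distance are assumptions made for this example only; the paper's actual metric may differ.

```python
from itertools import combinations
from math import sqrt


def subgraph_densities(n, edges):
    """Return (edge density, triangle density) of a simple undirected graph.

    Nodes are assumed to be labelled 0..n-1 and n is assumed to be >= 3.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    num_edges = sum(len(nb) for nb in adj.values()) // 2
    num_triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    max_edges = n * (n - 1) / 2
    max_triangles = n * (n - 1) * (n - 2) / 6
    return (num_edges / max_edges, num_triangles / max_triangles)


def density_distance(g1, g2):
    """Euclidean distance between the density vectors of two graphs."""
    d1 = subgraph_densities(*g1)
    d2 = subgraph_densities(*g2)
    return sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


# Three small hypothetical graphs, each given as (node count, edge list).
complete4 = (4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
path4 = (4, [(0, 1), (1, 2), (2, 3)])
cycle4 = (4, [(0, 1), (1, 2), (2, 3), (3, 0)])

print(subgraph_densities(*path4))
print(density_distance(path4, cycle4), density_distance(path4, complete4))
```

In this toy setting the path and the cycle end up closer to each other than either is to the complete graph, which is the kind of pairwise information a dimensionality-reduction step could then consume.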




Read also

The widespread adoption of online courses opens opportunities for the analysis of learner behaviour and for the optimisation of web-based material adapted to observed usage. Here we introduce a mathematical framework for the analysis of time series collected from online engagement of learners, which allows the identification of clusters of learners with similar online behaviour directly from the data, i.e., the groups of learners are not pre-determined subjectively but emerge algorithmically from the analysis and the data. The method uses a dynamic time warping kernel to create a pairwise similarity between time series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to cluster groups of learners with similar patterns of behaviour. We showcase our approach on online engagement data of adult learners taking six web-based courses as part of a post-graduate degree at Imperial Business School. Our analysis identifies clusters of learners with statistically distinct patterns of engagement, ranging from distributed to massed learning, with different levels of adherence to pre-planned course structure and/or task completion, and also revealing outlier learners with highly sporadic behaviour. A posteriori comparison with performance showed that, although the majority of low-performing learners are part of the massed learning cluster, the high-performing learners are distributed across clusters with different traits of online engagement. We also show that our methodology is able to identify low-performing learners more accurately than common classification methods based on raw statistics extracted from the data.
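To make the dynamic time warping step concrete, here is a minimal sketch of the classic DTW distance between two numeric sequences, turned into a similarity by an exponential. The exponential form and the `gamma` parameter are assumptions for illustration; the abstract does not specify the kernel, and this simple form is not guaranteed to be positive definite.

```python
from math import exp, inf


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again (b stays)
                cost[i][j - 1],      # b[j-1] matched again (a stays)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def dtw_similarity(a, b, gamma=1.0):
    """Hypothetical similarity in (0, 1]: exp(-gamma * DTW distance)."""
    return exp(-gamma * dtw_distance(a, b))


# The second series repeats one value; warping absorbs the repetition.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_similarity([0, 0, 0], [1, 1, 1]))
```

Evaluating `dtw_similarity` on every pair of learners would give the pairwise similarity matrix that a graph clustering method could take as input.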
Numerous studies and anecdotes demonstrate the wisdom of the crowd, the surprising accuracy of a group's aggregated judgments. Less is known, however, about the generality of crowd wisdom. For example, are crowds wise even if their members have systematic judgmental biases, or can influence each other before members render their judgments? If so, are there situations in which we can expect a crowd to be less accurate than skilled individuals? We provide a precise but general definition of crowd wisdom: A crowd is wise if a linear aggregate, for example a mean, of its members' judgments is closer to the target value than a randomly, but not necessarily uniformly, sampled member of the crowd. Building on this definition, we develop a theoretical framework for examining, a priori, when and to what degree a crowd will be wise. We systematically investigate the boundary conditions for crowd wisdom within this framework and determine conditions under which the accuracy advantage for crowds is maximized. Our results demonstrate that crowd wisdom is highly robust: Even if judgments are biased and correlated, one would need to nearly deterministically select only a highly skilled judge before an individual's judgment could be expected to be more accurate than a simple averaging of the crowd. Our results also provide an accuracy rationale behind the need for diversity of judgments among group members. Contrary to folk explanations of crowd wisdom which hold that judgments should ideally be independent so that errors cancel out, we find that crowd wisdom is maximized when judgments systematically differ as much as possible. We re-analyze data from two published studies that confirm our theoretical results.
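The definition of crowd wisdom given in this abstract can be checked numerically on a toy example. The sketch below assumes absolute error as the accuracy measure and compares the error of the mean with the expected error of a sampled member; both choices, and the names `crowd_is_wise` and `sampling_probs`, are assumptions for illustration rather than the authors' formulation.

```python
def crowd_is_wise(judgments, target, sampling_probs=None):
    """Compare the crowd mean against a randomly sampled member.

    Returns (is_wise, crowd_error, expected_member_error), using absolute
    error. sampling_probs gives the probability of picking each member;
    None means uniform sampling.
    """
    n = len(judgments)
    crowd_error = abs(sum(judgments) / n - target)
    errors = [abs(x - target) for x in judgments]
    if sampling_probs is None:
        member_error = sum(errors) / n
    else:
        member_error = sum(p * e for p, e in zip(sampling_probs, errors))
    return crowd_error < member_error, crowd_error, member_error


# Hypothetical judgments scattered on both sides of the target value 100.
print(crowd_is_wise([90, 95, 110, 120], 100))
# Sampling only the most accurate judge (the one who said 95).
print(crowd_is_wise([90, 95, 110, 120], 100, sampling_probs=[0, 1, 0, 0]))
# All judges err in the same direction: the mean gains nothing.
print(crowd_is_wise([110, 120, 130], 100))
```

In the first two calls the mean still beats the sampled member, even when the best judge is always chosen. In the third, every error has the same sign, so averaging cancels nothing and the strict inequality fails.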
Ying-Cheng Lai, 2020
In applications of nonlinear and complex dynamical systems, a common situation is that the system can be measured but its structure and the detailed rules of dynamical evolution are unknown. The inverse problem is to determine the system equations and structure based solely on measured time series. Recently, methods based on sparse optimization have been developed. For example, the principle of exploiting sparse optimization such as compressive sensing to find the equations of nonlinear dynamical systems from data was articulated in 2011 by the Nonlinear Dynamics Group at Arizona State University. This article presents a brief review of the recent progress in this area. The basic idea is to expand the equations governing the dynamical evolution of the system into a power series or a Fourier series of a finite number of terms and then to determine the vector of the expansion coefficients based solely on data through sparse optimization. Examples discussed here include discovering the equations of stationary or nonstationary chaotic systems to enable prediction of dynamical events such as critical transition and system collapse, inferring the full topology of complex networks of dynamical oscillators and social networks hosting evolutionary game dynamics, and identifying partial differential equations for spatiotemporal dynamical systems. Situations where sparse optimization is effective and those in which the method fails are discussed. Comparisons with the traditional method of delay coordinate embedding in nonlinear time series analysis are given and the recent development of a model-free, data-driven prediction framework based on machine learning is briefly introduced.
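The power-series idea described in this abstract can be sketched on a one-dimensional toy problem. The example below does not use compressive sensing; it swaps in sequentially thresholded least squares as a simple stand-in for sparse optimization, and it uses noise-free synthetic derivative data from the assumed equation dx/dt = x - x^2. All function names and the threshold value are assumptions for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(columns, y):
    """Least-squares fit of y on the feature columns via normal equations."""
    k = len(columns)
    A = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
         for i in range(k)]
    rhs = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve_linear(A, rhs)


def sparse_fit(library, y, threshold=0.1, iterations=5):
    """Sequentially thresholded least squares over the library columns."""
    active = list(range(len(library)))
    coeffs = [0.0] * len(library)
    for _ in range(iterations):
        if not active:
            break
        fit = least_squares([library[i] for i in active], y)
        coeffs = [0.0] * len(library)
        for i, c in zip(active, fit):
            if abs(c) >= threshold:  # keep only sufficiently large terms
                coeffs[i] = c
        active = [i for i in active if coeffs[i] != 0.0]
    return coeffs


# Synthetic samples of the state and its derivative for dx/dt = x - x^2.
xs = [0.1 * k for k in range(1, 21)]
dxdt = [x - x * x for x in xs]
# Candidate power-series terms: 1, x, x^2, x^3.
library = [[x ** p for x in xs] for p in range(4)]
print(sparse_fit(library, dxdt))
```

With exact data the fit keeps only the x and x^2 terms, with coefficients close to 1 and -1. Real measurements would require estimating derivatives from noisy time series, which this sketch does not attempt.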
Head motion is inevitable in the acquisition of diffusion-weighted images, especially for certain motion-prone subjects and for data gathering of advanced diffusion models with prolonged scan times. Deficient accuracy of motion correction causes deterioration in the quality of diffusion model reconstruction, thus affecting the derived measures. This results in either loss of data, or the introduction of bias in outcomes from data of different motion levels, or both. Hence minimizing motion effects and reutilizing motion-contaminated data become vital to quantitative studies. We have previously developed a 3-dimensional hierarchical convolutional neural network (3D H-CNN) for robust diffusion kurtosis mapping from under-sampled data. In this study, we propose to extend this method to motion-contaminated data for robust recovery of diffusion model-derived measures with a process of motion assessment and corrupted volume rejection. We validate the proposed pipeline in two in-vivo datasets. Results from the first dataset of individual subjects show that all the diffusion tensor and kurtosis tensor-derived measures from the new pipeline are minimally sensitive to motion effects, and are comparable to the motion-free reference with as few as eight volumes retained from the motion-contaminated data. Results from the second dataset of a group of children with attention deficit hyperactivity disorder demonstrate the ability of our approach to ameliorate spurious group differences due to head motion. This method shows great potential for exploiting valuable but motion-corrupted DWI data which are likely to be discarded otherwise, and for application to data with different motion levels, thus improving their utilization and statistical power.
The Covid-19 pandemic has had a deep impact on the lives of the entire world population, prompting a widely participated societal debate. As in other contexts, the debate has been the subject of several d/misinformation campaigns; in a quite unprecedented fashion, however, the presence of false information has seriously put public health at risk. In this sense, detecting the presence of malicious narratives and identifying the kinds of users that are more prone to spread them represent the first step to limit the persistence of the former. In the present paper we analyse the semantic network observed on Twitter during the first Italian lockdown (induced by the hashtags contained in approximately 1.5 million tweets published between the 23rd of March 2020 and the 23rd of April 2020) and study the extent to which various discursive communities are exposed to d/misinformation arguments. As observed in other studies, the recovered discursive communities largely overlap with traditional political parties, even if the debated topics concern different facets of the management of the pandemic. Although the themes directly related to d/misinformation are a minority of those discussed within our semantic networks, their popularity is unevenly distributed among the various discursive communities.