أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Jefrey Lijffijt

Adversarial Robustness of Probabilistic Network Embedding for Link Prediction

346 - Xi Chen , Bo Kang , Jefrey Lijffijt 2021

In todays networked society, many real-world problems can be formalized as predicting links in networks, such as Facebook friendship suggestions, e-commerce recommendations, and the prediction of scientific collaborations in citation networks. Increa singly often, link prediction problem is tackled by means of network embedding methods, owing to their state-of-the-art performance. However, these methods lack transparency when compared to simpler baselines, and as a result their robustness against adversarial attacks is a possible point of concern: could one or a few small adversarial modifications to the network have a large impact on the link prediction performance when using a network embedding model? Prior research has already investigated adversarial robustness for network embedding models, focused on classification at the node and graph level. Robustness with respect to the link prediction downstream task, on the other hand, has been explored much less. This paper contributes to filling this gap, by studying adversarial robustness of Conditional Network Embedding (CNE), a state-of-the-art probabilistic network embedding model, for link prediction. More specifically, given CNE and a network, we measure the sensitivity of the link predictions of the model to small adversarial perturbations of the network, namely changes of the link status of a node pair. Thus, our approach allows one to identify the links and non-links in the network that are most vulnerable to such perturbations, for further investigation by an analyst. We analyze the characteristics of the most and least sensitive perturbations, and empirically confirm that our approach not only succeeds in identifying the most vulnerable links and non-links, but also that it does so in a time-efficient manner thanks to an effective approximation.

الشبكات الاجتماعية والمعلومات التعلم الآلي

CSNE: Conditional Signed Network Embedding

137 - Alexandru Mara , Yoosof Mashayekhi , Jefrey Lijffijt 2020

Signed networks are mathematical structures that encode positive and negative relations between entities such as friend/foe or trust/distrust. Recently, several papers studied the construction of useful low-dimensional representations (embeddings) of these networks for the prediction of missing relations or signs. Existing embedding methods for sign prediction generally enforce different notions of status or balance theories in their optimization function. These theories, however, are often inaccurate or incomplete, which negatively impacts method performance. In this context, we introduce conditional signed network embedding (CSNE). Our probabilistic approach models structural information about the signs in the network separately from fine-grained detail. Structural information is represented in the form of a prior, while the embedding itself is used for capturing fine-grained information. These components are then integrated in a rigorous manner. CSNEs accuracy depends on the existence of sufficiently powerful structural priors for modelling signed networks, currently unavailable in the literature. Thus, as a second main contribution, which we find to be highly valuable in its own right, we also introduce a novel approach to construct priors based on the Maximum Entropy (MaxEnt) principle. These priors can model the emph{polarity} of nodes (degree to which their links are positive) as well as signed emph{triangle counts} (a measure of the degree structural balance holds to in a network). Experiments on a variety of real-world networks confirm that CSNE outperforms the state-of-the-art on the task of sign prediction. Moreover, the MaxEnt priors on their own, while less accurate than full CSNE, achieve accuracies competitive with the state-of-the-art at very limited computational cost, thus providing an excellent runtime-accuracy trade-off in resource-constrained situations.

الشبكات الاجتماعية والمعلومات التعلم الآلي التعلم الالي

Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?

416 - Alexandru Mara , Jefrey Lijffijt , Tijl De Bie 2020

Network embedding methods map a networks nodes to vectors in an embedding space, in such a way that these representations are useful for estimating some notion of similarity or proximity between pairs of nodes in the network. The quality of these nod e representations is then showcased through results of downstream prediction tasks. Commonly used benchmark tasks such as link prediction, however, present complex evaluation pipelines and an abundance of design choices. This, together with a lack of standardized evaluation setups can obscure the real progress in the field. In this paper, we aim to shed light on the state-of-the-art of network embedding methods for link prediction and show, using a consistent evaluation pipeline, that only thin progress has been made over the last years. The newly conducted benchmark that we present here, including 17 embedding methods, also shows that many approaches are outperformed even by simple heuristics. Finally, we argue that standardized evaluation tools can repair this situation and boost future progress in this field.

الشبكات الاجتماعية والمعلومات الذكاء الاصطناعي التعلم الآلي

FONDUE: A Framework for Node Disambiguation Using Network Embeddings

100 - Ahmad Mel , Bo Kang , Jefrey Lijffijt 2020

Real-world data often presents itself in the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g. people, papers, protei ns, concepts) as nodes, and describe them in terms of their relations with other entities by means of edges between these nodes. This can be valuable for a range of purposes from the study of information diffusion to bibliographic analysis, bioinformatics research, and question-answering. The quality of networks is often problematic though, affecting downstream tasks. This paper focuses on the common problem where a node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate for ambiguous node identification compared to the existing state-of-the-art, at a comparable computational cost, while less optimal for determining the best way to split ambiguous nodes.

التعلم الآلي الحساب واللغة التعلم الالي

Block-Approximated Exponential Random Graphs

100 - Florian Adriaens , Alexandru Mara , Jefrey Lijffijt 2020

An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. By utilizing fast matrix block-approximation techniques, we propose an approximative framework to such non-trivial ERGs that r esult in dyadic independence (i.e., edge independent) distributions, while being able to meaningfully model both local information of the graph (e.g., degrees) as well as global information (e.g., clustering coefficient, assortativity, etc.) if desired. This allows one to efficiently generate random networks with similar properties as an observed network, and the models can be used for several downstream tasks such as link prediction. Our methods are scalable to sparse graphs consisting of millions of nodes. Empirical evaluation demonstrates competitiveness in terms of both speed and accuracy with state-of-the-art methods -- which are typically based on embedding the graph into some low-dimensional space -- for link prediction, showcasing the potential of a more direct and interpretable probabalistic model for this task.

الشبكات الاجتماعية والمعلومات التعلم الآلي التعلم الالي

ALPINE: Active Link Prediction using Network Embedding

71 - Xi Chen , Bo Kang , Jefrey Lijffijt 2020

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, consumer-product recommendations, and the identification of hidden interactions between actors in a crim e network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, the link status of a node pair can be queried, which can be used as additional information by the link prediction algorithm. Unfortunately, such queries can be expensive or time-consuming, mandating the careful consideration of which node pairs to query. In this paper we estimate the improvement in link prediction accuracy after querying any particular node pair, to use in an active learning setup. Specifically, we propose ALPINE (Active Link Prediction usIng Network Embedding), the first method to achieve this for link prediction based on network embedding. To this end, we generalized the notion of V-optimality from experimental design to this setting, as well as more basic active learning heuristics originally developed in standard classification settings. Empirical results on real data show that ALPINE is scalable, and boosts link prediction accuracy with far fewer queries.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Explainable Subgraphs with Surprising Densities: A Subgroup Discovery Approach

78 - Junning Deng , Bo Kang , Jefrey Lijffijt 2020

The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks for example, the probability of a friendship between two people depends on their attributes, such as their age, address, and hobbies. The con nectivity of a graph can thus possibly be understood in terms of patterns of the form the subgroup of individuals with properties X are often (or rarely) friends with individuals in another subgroup with properties Y. Such rules present potentially actionable and generalizable insights into the graph. We present a method that finds pairs of node subgroups between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. This interestingness is quantified subjectively, to contrast with prior information an analyst may have about the graph. This view immediately enables iterative mining of such patterns. Our work generalizes prior work on dense subgraph mining (i.e. subgraphs induced by a single subgroup). Moreover, not only is the proposed method more general, we also demonstrate considerable practical advantages for the single subgroup special case.

الشبكات الاجتماعية والمعلومات التعلم الآلي التعلم الالي

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد