Supervised, semi-supervised, and unsupervised learning estimate a function given input/output samples. Generalization of the learned function to unseen data can be improved by incorporating side information into learning. Side information is data that comes from neither the input space nor the output space of the function, yet carries useful information for learning it. In this paper we show that learning with side information subsumes a variety of related approaches, e.g., multi-task learning, multi-view learning, and learning using privileged information. Our main contributions are (i) a new perspective that connects these previously isolated approaches, (ii) insights into how these methods incorporate different types of prior knowledge and hence implement different patterns, (iii) facilitating the application of these methods to novel tasks, as well as (iv) a systematic experimental evaluation of these patterns on two supervised learning tasks.
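One common way to realize this idea is the multi-task pattern: predict the side information from a shared representation as an auxiliary training objective, so that it shapes the learned features but is not needed at test time. The sketch below illustrates this under assumed shapes, architecture, and loss weighting; it is not necessarily the paper's own formulation.

```python
# Minimal sketch of the "auxiliary task" pattern for side information.
# Shapes, the architecture, and the 0.1 loss weight are illustrative assumptions.
import torch
import torch.nn as nn

x = torch.randn(128, 10)                   # inputs
y = torch.randint(0, 2, (128,)).float()    # labels (main task)
z = torch.randn(128, 3)                    # side information: neither input nor output

shared = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
head_y = nn.Linear(32, 1)                  # predicts the label
head_z = nn.Linear(32, 3)                  # predicts the side information (training only)

opt = torch.optim.Adam(
    list(shared.parameters()) + list(head_y.parameters()) + list(head_z.parameters()),
    lr=1e-2,
)

for _ in range(100):
    h = shared(x)
    loss_y = nn.functional.binary_cross_entropy_with_logits(head_y(h).squeeze(1), y)
    loss_z = nn.functional.mse_loss(head_z(h), z)   # side-information loss
    loss = loss_y + 0.1 * loss_z                    # side info shapes the representation
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time only head_y is used; the side information z is not required.
```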
We give an online algorithm and prove novel mistake and regret bounds for online binary matrix completion with side information. The mistake bounds we prove are of the form $\tilde{O}(D/\gamma^2)$. The term $1/\gamma^2$ is analogous to the usual margin
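For context (the abstract is cut off here), the classical perceptron mistake bound that such margin terms are usually compared against is
$$ M \;\le\; \left(\frac{R}{\gamma}\right)^{2}, $$
where $M$ is the number of mistakes, $R$ bounds the norm of the instances, and $\gamma$ is the margin of a linear separator; the $1/\gamma^2$ factor in $\tilde{O}(D/\gamma^2)$ plays the role of the margin term in this classical bound.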
The use of drug combinations often leads to polypharmacy side effects (POSE). A recent method formulates POSE prediction as a link prediction problem on a graph of drugs and proteins, and solves it with Graph Convolutional Networks (GCNs). However, d
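As a rough, hedged illustration of the general recipe behind such GCN-based link prediction (a GCN encoder followed by a bilinear edge decoder), the sketch below uses a toy random graph, toy feature sizes, and a single hypothetical side-effect relation matrix; it is not the paper's exact model.

```python
# Hedged sketch: GCN propagation + bilinear decoder for scoring a drug-drug edge.
# Graph, feature sizes, and the relation matrix R are illustrative assumptions.
import torch

n, d, hdim = 6, 8, 16                        # toy graph with 6 drug/protein nodes
A = (torch.rand(n, n) < 0.3).float()
A = ((A + A.T) > 0).float()                  # symmetrize
A.fill_diagonal_(1.0)                        # add self-loops
deg_inv_sqrt = A.sum(1).pow(-0.5)
A_hat = deg_inv_sqrt[:, None] * A * deg_inv_sqrt[None, :]   # normalized adjacency

X = torch.randn(n, d)                        # node features
W = torch.randn(d, hdim, requires_grad=True)      # GCN weight
R = torch.randn(hdim, hdim, requires_grad=True)   # decoder for one side-effect type

H = torch.relu(A_hat @ X @ W)                # one GCN propagation step
score_01 = torch.sigmoid(H[0] @ R @ H[1])    # probability of a side-effect edge (0, 1)
print(score_01.item())
```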
This paper proposes a new meta-learning method, HARMLESS (HAwkes Relational Meta LEarning method for Short Sequences), for learning heterogeneous point process models from short event sequence data along with a relational network. Specifically
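For background, a minimal sketch of the exponential-kernel Hawkes intensity that such point process models build on is shown below; the parameter values are illustrative, not fitted.

```python
# Exponential-kernel Hawkes conditional intensity:
#   lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
# mu, alpha, beta below are illustrative assumptions, not learned parameters.
import numpy as np

def hawkes_intensity(t, events, mu=0.2, alpha=0.8, beta=1.5):
    """Conditional intensity at time t given past event times."""
    events = np.asarray(events, dtype=float)
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

print(hawkes_intensity(3.0, events=[0.5, 1.2, 2.8]))
```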
The explosion in workload complexity and the recent slowdown in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations, augmenting or
Pairwise clustering, in general, partitions a set of items via a known similarity function. In our treatment, clustering is modeled as a transductive prediction problem. Thus rather than beginning with a known similarity function, the function instead