ﻻ يوجد ملخص باللغة العربية
Background: The inception of next generations sequencing technologies have exponentially increased the volume of biological sequence data. Protein sequences, being quoted as the `language of life, has been analyzed for a multitude of applications and inferences. Motivation: Owing to the rapid development of deep learning, in recent years there have been a number of breakthroughs in the domain of Natural Language Processing. Since these methods are capable of performing different tasks when trained with a sufficient amount of data, off-the-shelf models are used to perform various biological applications. In this study, we investigated the applicability of the popular Skip-gram model for protein sequence analysis and made an attempt to incorporate some biological insights into it. Results: We propose a novel $k$-mer embedding scheme, Align-gram, which is capable of mapping the similar $k$-mers close to each other in a vector space. Furthermore, we experiment with other sequence-based protein representations and observe that the embeddings derived from Align-gram aids modeling and training deep learning models better. Our experiments with a simple baseline LSTM model and a much complex CNN model of DeepGoPlus shows the potential of Align-gram in performing different types of deep learning applications for protein sequence analysis.
Android malware has been on the rise in recent years due to the increasing popularity of Android and the proliferation of third party application markets. Emerging Android malware families are increasingly adopting sophisticated detection avoidance t
Network representation learning, as an approach to learn low dimensional representations of vertices, has attracted considerable research attention recently. It has been proven extremely useful in many machine learning tasks over large graph. Most ex
The numerical integration of an analytical function $f(x)$ using a finite set of equidistant points can be performed by quadrature formulas like the Newton-Cotes. Unlike Gaussian quadrature formulas however, higher-order Newton-Cotes formulas are not
We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. W
We show that the skip-gram formulation of word2vec trained with negative sampling is equivalent to a weighted logistic PCA. This connection allows us to better understand the objective, compare it to other word embedding methods, and extend it to higher dimensional models.