بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

FPScreen: A Rapid Similarity Search Tool for Massive Molecular Library Based on Molecular Fingerprint Comparison

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tianmou Liu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Lijun Wang - Jianbing Gong - Yingxia Zhang

علوم الكمبيوتر استرجاع المعلومات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We designed a fast similarity search engine for large molecular libraries: FPScreen. We downloaded 100 million molecules structure files in PubChem with SDF extension, then applied a computational chemistry tool RDKit to convert each structure file into one line of text in MACCS format and stored them in a text file as our molecule library. The similarity search engine compares the similarity while traversing the 166-bit strings in the library file line by line. FPScreen can complete similarity search through 100 million entries in our molecule library within one hour. That is very fast as a biology computation tool. Additionally, we divided our library into several strides for parallel processing. FPScreen was developed in WEB mode.

قيم البحث

224 - Hongwu Peng , Shiyang Chen , Zhepeng Wang 2021

Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficient accelerat ion of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of the Tanimoto coefficient in measuring the pairwise similarity between different molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design on exhaustive and approximate search algorithms. On exhaustive search using BitBound & folding, we analyze the similarity cutoff and folding level relationship with search speedup and accuracy, and propose a scalable on-the-fly query engine on FPGAs to reduce the resource utilization and pipeline interval. We achieve a 450 million compounds-per-second processing throughput for a single query engine. On approximate search using hierarchical navigable small world (HNSW), a popular algorithm with high recall and query speed. We propose an FPGA-based graph traversal engine to utilize a high throughput register array based priority queue and fine-grained distance calculation engine to increase the processing capability. Experimental results show that the proposed FPGA-based HNSW implementation has a 103385 query per second (QPS) on the Chembl database with 0.92 recall and achieves a 35x speedup than the existing CPU implementation on average. To the best of our knowledge, our FPGA-based implementation is the first attempt to accelerate molecular similarity search algorithms on FPGA and has the highest performance among existing approaches.

هندسة العتاد

A Novel Graph-based Approach for Determining Molecular Similarity

93 - Maritza Hernandez , Arman Zaribafiyan , Maliheh Aramon 2016

In this paper, we tackle the problem of measuring similarity among graphs that represent real objects with noisy data. To account for noise, we relax the definition of similarity using the maximum weighted co-$k$-plex relaxation method, which allows dissimilarities among graphs up to a predetermined level. We then formulate the problem as a novel quadratic unconstrained binary optimization problem that can be solved by a quantum annealer. The context of our study is molecular similarity where the presence of noise might be due to regular errors in measuring molecular features. We develop a similarity measure and use it to predict the mutagenicity of a molecule. Our results indicate that the relaxed similarity measure, designed to accommodate the regular errors, yields a higher prediction accuracy than the measure that ignores the noise.

بنى وهياكل البيانات والخوارزميات الأساليب الكمية فيزياء الكم

118 - Zijing Liu , Mauricio Barahona 2021

We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time s eries, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.

التعلم الآلي استرجاع المعلومات التعلم الالي

Aspect-based Document Similarity for Research Papers

79 - Malte Ostendorff , Terry Ruas , Till Blume 2020

Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recomm ender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity for research papers. Paper citations indicate the aspect-based similarity, i.e., the section title in which a citation occurs acts as a label for the pair of citing and cited paper. We apply a series of Transformer models such as RoBERTa, ELECTRA, XLNet, and BERT variations and compare them to an LSTM baseline. We perform our experiments on two newly constructed datasets of 172,073 research paper pairs from the ACL Anthology and CORD-19 corpus. Our results show SciBERT as the best performing system. A qualitative examination validates our quantitative results. Our findings motivate future research of aspect-based document similarity and the development of a recommender system based on the evaluated techniques. We make our datasets, code, and trained models publicly available.

الحساب واللغة استرجاع المعلومات

Improved Measurements of Molecular Cloud Distances Based on Global Search

179 - Qing-Zeng Yan , Ji Yang , Yang Su 2021

The principle of the background-eliminated extinction-parallax (BEEP) method is examining the extinction difference between on- and off-cloud regions to reveal the extinction jump caused by molecular clouds, thereby revealing the distance in complex dust environments. The BEEP method requires high-quality images of molecular clouds and high-precision stellar parallaxes and extinction data, which can be provided by the Milky Way Imaging Scroll Painting (MWISP) CO survey and the Gaia DR2 catalog, as well as supplementary AV extinction data. In this work, the BEEP method is further improved (BEEP-II) to measure molecular cloud distances in a global search manner. Applying the BEEP-II method to three regions mapped by the MWISP CO survey, we collectively measured 238 distances for 234 molecular clouds. Compared with previous BEEP results, the BEEP-II method measures distances efficiently, particularly for those molecular clouds with large angular size or in complicated environments, making it suitable for distance measurements of molecular clouds in large samples.

الفيزياء الفلكية من المجرات الفيزياء الفلكية الشمسية والنجوم

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

جامعة أسيوط

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

FPScreen: A Rapid Similarity Search Tool for Massive Molecular Library Based on Molecular Fingerprint Comparison

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً