Research papers, master's theses, and doctoral dissertations published by Marcin Cieslik

Abstractions, Algorithms and Data Structures for Structural Bioinformatics in PyCogent

168 - Marcin Cieslik, Zygmunt Derewenda, Cameron Mura 2014

To facilitate flexible and efficient structural bioinformatics analyses, new functionality for three-dimensional structure processing and analysis has been introduced into PyCogent -- a popular, feature-rich framework for sequence-based bioinformatics, but one which has lacked equally powerful tools for handling structural/coordinate-based data. Extensible Python modules have been developed, which provide object-oriented abstractions (based on a hierarchical representation of macromolecules), efficient data structures (e.g. kD-trees), fast implementations of common algorithms (e.g. surface-area calculations), read/write support for Protein Data Bank-related file formats, and wrappers for external command-line applications (e.g. Stride). Integration of this code into PyCogent is symbiotic, allowing sequence-based work to benefit from structure-derived data and, reciprocally, enabling structural studies to leverage PyCogent's versatile tools for phylogenetic and evolutionary analyses.
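To illustrate why a kD-tree suits coordinate data, here is a minimal pure-Python sketch of a fixed-radius neighbor search over 3D points. It does not use PyCogent and does not reflect its API; the names `build_kdtree` and `within_radius` and the sample coordinates are hypothetical, chosen only for this example.

```python
from math import dist


def build_kdtree(points, depth=0):
    """Recursively build a kD-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3  # cycle through the x, y, z axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def within_radius(node, center, radius, depth=0, found=None):
    """Collect every stored point lying within `radius` of `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % 3
    if dist(point, center) <= radius:
        found.append(point)
    delta = center[axis] - point[axis]
    # Always descend into the side that contains the query center; visit the
    # other side only if the splitting plane is within the search radius.
    near, far = (left, right) if delta < 0 else (right, left)
    within_radius(near, center, radius, depth + 1, found)
    if abs(delta) <= radius:
        within_radius(far, center, radius, depth + 1, found)
    return found


# Hypothetical atom coordinates, for illustration only.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
         (5.0, 5.0, 5.0), (1.0, 1.0, 1.0)]
tree = build_kdtree(atoms)
neighbors = sorted(within_radius(tree, (0.0, 0.0, 0.0), 1.8))
```

The pruning step is what makes the structure worthwhile: whole subtrees are skipped whenever the splitting plane lies farther from the query point than the search radius, so typical queries examine far fewer points than a brute-force scan.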

Biomolecules, Data Structures and Algorithms, Software Engineering

PaPy: Parallel and Distributed Data-processing Pipelines in Python

70 - Marcin Cieslik, Cameron Mura 2014

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written Python functions (nodes) connected by pipes (edges) into a directed acyclic graph. These functions are arbitrarily definable, and can make use of any Python modules or external binaries. Given a user-defined topology and collection of input data, functions are composed into nested higher-order maps, which are transparently and robustly evaluated in parallel on a single computer or on remote hosts. Local and remote computational resources can be flexibly pooled and assigned to functional nodes, thereby allowing facile load-balancing and pipeline optimization to maximize computational throughput. Input items are processed by nodes in parallel, and traverse the graph in batches of adjustable size -- a trade-off between lazy evaluation, parallelism, and memory consumption. The processing of a single item can be parallelized in a scatter/gather scheme. The simplicity and flexibility of distributed workflows using PaPy bridge the gap between desktop and grid computing, enabling this new computing paradigm to be leveraged in the processing of large scientific datasets.
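The batching trade-off described above can be sketched with the Python standard library alone. This is not PaPy's API: `batched`, `run_pipeline`, `parse`, and `square` are hypothetical names, the topology is simplified to a linear chain rather than a general directed acyclic graph, and a local thread pool stands in for pooled local and remote resources.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_pipeline(nodes, inputs, batch_size=4, workers=4):
    """Lazily push `inputs` through `nodes`, a linear chain of functions.

    Each batch is mapped in parallel over a worker pool, one node at a time,
    and results are yielded in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(inputs, batch_size):
            for node in nodes:
                batch = list(pool.map(node, batch))
            yield from batch


def parse(line):
    """First node: convert a text record to an integer."""
    return int(line)


def square(value):
    """Second node: a stand-in for some per-item computation."""
    return value * value


results = list(run_pipeline([parse, square], ["1", "2", "3", "4", "5"],
                            batch_size=2))
```

A larger `batch_size` exposes more items to the pool at once at the cost of holding more intermediate results in memory, while a batch size of 1 approaches fully lazy, item-by-item evaluation.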

Programming Languages, Quantitative Methods
