Do you want to publish a course? Click here

Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the potential of the mapper to select less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. Results: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-bp read in $O(x times L)$ operations on average and in $O(x times L^{2})$ operations in the worst case. We compared OSS against four state-of-the-art seed selection schemes and observed that OSS provides a 3-fold reduction of average seed frequency over the best previous seed selection optimizations.
Zonal configuration of energy market is often a consequence of political borders. However there are a few methods developed to help with zonal delimitation in respect to some measures. This paper presents the approach aiming at reduction of the loop flow effect - an element of unscheduled flows which introduces a loss of market efficiency. In order to undertake zonal partitioning, a detailed decomposition of power flow is performed. Next, we identify the zone which is a source of the problem and enhance delimitation by dividing it into two zones. The procedure is illustrated by a study of simple case.
With the advent of high-throughput wet lab technologies the amount of protein interaction data available publicly has increased substantially, in turn spurring a plethora of computational methods for in silico knowledge discovery from this data. In this paper, we focus on parameterized methods for modeling and solving complex computational problems encountered in such knowledge discovery from protein data. Specifically, we concentrate on three relevant problems today in proteomics, namely detection of lethal proteins, functional modules and alignments from protein interaction networks. We propose novel graph theoretic models for these problems and devise practical parameterized algorithms. At a broader level, we demonstrate how these methods can be viable alternatives for the several heurestic, randomized, approximation and sub-optimal methods by arriving at parameterized yet optimal solutions for these problems. We substantiate these theoretical results by experimenting on real protein interaction data of S. cerevisiae (budding yeast) and verifying the results using gene ontology.
Latent periodic elements in genomes play important roles in genomic functions. Many complex periodic elements in genomes are difficult to be detected by commonly used digital signal processing (DSP). We present a novel method to compute the periodic power spectrum of a DNA sequence based on the nucleotide distributions on periodic positions of the sequence. The method directly calculates full periodic spectrum of a DNA sequence rather than frequency spectrum by Fourier transform. The magnitude of the periodic power spectrum reflects the strength of the periodicity signals, thus, the algorithm can capture all the latent periodicities in DNA sequences. We apply this method on detection of latent periodicities in different genome elements, including exons and microsatellite DNA sequences. The results show that the method minimizes the impact of spectral leakage, captures a much broader latent periodicities in genomes, and outperforms the conventional Fourier transform.
Spectral deferred corrections (SDC) is an iterative approach for constructing higher- order accurate numerical approximations of ordinary differential equations. SDC starts with an initial approximation of the solution defined at a set of Gaussian or spectral collocation nodes over a time interval and uses an iterative application of lower-order time discretizations applied to a correction equation to improve the solution at these nodes. Each deferred correction sweep increases the formal order of accuracy of the method up to the limit inherent in the accuracy defined by the collocation points. In this paper, we demonstrate that SDC is well suited to recovering from soft (transient) hardware faults in the data. A strategy where extra correction iterations are used to recover from soft errors and provide algorithmic resilience is proposed. Specifically, in this approach the iteration is continued until the residual (a measure of the error in the approximation) is small relative to the residual on the first correction iteration and changes slowly between successive iterations. We demonstrate the effectiveness of this strategy for both canonical test problems and a comprehen- sive situation involving a mature scientific application code that solves the reacting Navier-Stokes equations for combustion research.
HIV-1 can disseminate between susceptible cells by two mechanisms: cell-free infection following fluid-phase diffusion of virions and by highly-efficient direct cell-to-cell transmission at immune cell contacts. The contribution of this hybrid spreading mechanism, which is also a characteristic of some important computer worm outbreaks, to HIV-1 progression in vivo remains unknown. Here we present a new mathematical model that explicitly incorporates the ability of HIV-1 to use hybrid spreading mechanisms and evaluate the consequences for HIV-1 pathogenenesis. The model captures the major phases of the HIV-1 infection course of a cohort of treatment naive patients and also accurately predicts the results of the Short Pulse Anti-Retroviral Therapy at Seroconversion (SPARTAC) trial. Using this model we find that hybrid spreading is critical to seed and establish infection, and that cell-to-cell spread and increased CD4+ T cell activation are important for HIV-1 progression. Notably, the model predicts that cell-to-cell spread becomes increasingly effective as infection progresses and thus may present a considerable treatment barrier. Deriving predictions of various treatments influence on HIV-1 progression highlights the importance of earlier intervention and suggests that treatments effectively targeting cell-to-cell HIV-1 spread can delay progression to AIDS. This study suggests that hybrid spreading is a fundamental feature of HIV infection, and provides the mathematical framework incorporating this feature with which to evaluate future therapeutic strategies.
We present what we believe to be the first formal verification of a biologically realistic (nonlinear ODE) model of a neural circuit in a multicellular organism: Tap Withdrawal (TW) in emph{C. Elegans}, the common roundworm. TW is a reflexive behavior exhibited by emph{C. Elegans} in response to vibrating the surface on which it is moving; the neural circuit underlying this response is the subject of this investigation. Specifically, we perform reachability analysis on the TW circuit model of Wicks et al. (1996), which enables us to estimate key circuit parameters. Underlying our approach is the use of Fan and Mitras recently developed technique for automatically computing local discrepancy (convergence and divergence rates) of general nonlinear systems. We show that the results we obtain are in agreement with the experimental results of Wicks et al. (1995). As opposed to the fixed parameters found in most biological models, which can only produce the predominant behavior, our techniques characterize ranges of parameters that produce (and do not produce) all three observed behaviors: reversal of movement, acceleration, and lack of response.
212 - S. Nayak , S. Chakraverty 2015
This paper deals with uncertain parabolic fluid flow problem where the uncertainty occurs due to the initial conditions and parameters involved in the system. Uncertain values are considered as fuzzy and these are handled through a recently developed method. Here the concepts of fuzzy numbers are combined with Finite Difference Method (FDM) and then Fuzzy Finite Difference Method (FFDM) has been proposed. The proposed FFDM has been used to solve the fluid flow problem bounded by two parallel plates. Finally sensitivity of the fuzzy parameters has also been analysed.
Side effects of prescribed medications are a common occurrence. Electronic healthcare databases present the opportunity to identify new side effects efficiently but currently the methods are limited due to confounding (i.e. when an association between two variables is identified due to them both being associated to a third variable). In this paper we propose a proof of concept method that learns common associations and uses this knowledge to automatically refine side effect signals (i.e. exposure-outcome associations) by removing instances of the exposure-outcome associations that are caused by confounding. This leaves the signal instances that are most likely to correspond to true side effect occurrences. We then calculate a novel measure termed the confounding-adjusted risk value, a more accurate absolute risk value of a patient experiencing the outcome within 60 days of the exposure. Tentative results suggest that the method works. For the four signals (i.e. exposure-outcome associations) investigated we are able to correctly filter the majority of exposure-outcome instances that were unlikely to correspond to true side effects. The method is likely to improve when tuning the association rule mining parameters for specific health outcomes. This paper shows that it may be possible to filter signals at a patient level based on association rules learned from considering patients medical histories. However, additional work is required to develop a way to automate the tuning of the methods parameters.
468 - Ben Teng , Can Yang , Jiming Liu 2015
Motivation: Genome-wide association studies (GWASs), which assay more than a million single nucleotide polymorphisms (SNPs) in thousands of individuals, have been widely used to identify genetic risk variants for complex diseases. However, most of the variants that have been identified contribute relatively small increments of risk and only explain a small portion of the genetic variation in complex diseases. This is the so-called missing heritability problem. Evidence has indicated that many complex diseases are genetically related, meaning these diseases share common genetic risk variants. Therefore, exploring the genetic correlations across multiple related studies could be a promising strategy for removing spurious associations and identifying underlying genetic risk variants, and thereby uncovering the mystery of missing heritability in complex diseases. Results: We present a general and robust method to identify genetic patterns from multiple large-scale genomic datasets. We treat the summary statistics as a matrix and demonstrate that genetic patterns will form a low-rank matrix plus a sparse component. Hence, we formulate the problem as a matrix recovering problem, where we aim to discover risk variants shared by multiple diseases/traits and those for each individual disease/trait. We propose a convex formulation for matrix recovery and an efficient algorithm to solve the problem. We demonstrate the advantages of our method using both synthesized datasets and real datasets. The experimental results show that our method can successfully reconstruct both the shared and the individual genetic patterns from summary statistics and achieve better performance compared with alternative methods under a wide range of scenarios.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا