Discretization of Time Series Data


Added by Elena Dimitrova

Publication date 2005

fields Biology

Language English

Authors Elena S. Dimitrova - John J. McGee - Reinhard C. Laubenbacher

Other Quantitative Biology


Data discretization, also known as binning, is a frequently used technique in computer science, statistics, and their applications to biological data analysis. We present a new method for the discretization of real-valued data into a finite number of discrete values. Novel aspects of the method are the incorporation of an information-theoretic criterion and a criterion to determine the optimal number of values. While the method can be used for data clustering, the motivation for its development is the need for a discretization algorithm for several multivariate time series of heterogeneous data, such as transcript, protein, and metabolite concentration measurements. As several modeling methods for biochemical networks employ discrete variable states, the method needs to preserve correlations between variables as well as the dynamic features of the time series. A C++ implementation of the algorithm is available from the authors at http://polymath.vbi.vt.edu/discretization .
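As a point of reference for what "binning" means here, the following is a minimal sketch of plain equal-width discretization. It is not the authors' method: the abstract describes an information-theoretic criterion and an automatic choice of the number of values, neither of which is reproduced below. The function name `equal_width_bins`, the sample series, and the fixed bin count are assumptions made for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each real value to a bin index in 0..n_bins-1 (equal-width binning).

    Illustrative baseline only; the number of bins is supplied by the caller,
    whereas the method in the abstract determines it from the data.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries a single state.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum lands on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical concentration measurements over seven time points.
series = [0.1, 0.4, 0.35, 0.8, 1.2, 1.1, 0.2]
states = equal_width_bins(series, 3)
print(states)
```

Equal-width binning ignores how the values are distributed and how variables co-vary, which is the kind of limitation that motivates a criterion-driven method for multivariate time series.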


Evaluating data augmentation for financial time series classification

Elizabeth Fons, Paula Dawson, Xiao-jun Zeng 2020

Data augmentation methods in combination with deep neural networks have been used extensively in computer vision on classification tasks, achieving great success; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where data tends to be small, noisy and non-stationary. In this paper we evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset ($\approx 30$K samples), augmentation methods achieve up to $400\%$ improvement in risk-adjusted return performance; for a larger stock dataset ($\approx 300$K samples), results show up to $40\%$ improvement.

Statistical Finance Machine Learning

Anomaly on Superspace of Time Series Data

Salvatore Capozziello (Dipartimento di Fisica) 2017

We apply G-theory and the anomaly of ghost and anti-ghost fields in the theory of supersymmetry to study a superspace over time series data for the detection of hidden general supply and demand equilibrium in the financial market. We provide a proof of the existence of the general equilibrium point over the 14 extra dimensions of the new G-theory, compared to Edward Witten's 11-dimensional M-theory model. We found that the process of coupling between nonequilibrium and equilibrium spinor fields of expectation ghost fields in the superspace of time series data induces an infinitely long exact sequence of cohomology from a short exact sequence of the moduli state space model. If we assume that the financial market is separated into two topological spaces of supply and demand, as in the D-brane and anti-D-brane model, then we can use a cohomology group to compute the stability of the market as a stable point of the general equilibrium of the interaction between D-branes of the market. We obtain the result that the general equilibrium exists if and only if the 14th Batalin-Vilkovisky cohomology group with negative dimensions, underlying the 14 major hidden factors influencing the market, is zero.

General Physics

Analysis of Compression Techniques for DNA Sequence Data

Shakeela Bibi, Javed Iqbal, Adnan Iftekhar 2020

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules which are present in all cells of human beings. Due to its self-replicating property, DNA is a key constituent of the genetic material that exists in all living creatures. DNA contains the genetic material required for the functioning and development of all living organisms. Storing the DNA data of a single person requires 10 CD-ROMs. Moreover, this size is increasing constantly, and more and more sequences are being added to the public databases. This rapid growth in sequence data raises challenges for precise information extraction, since many data analysis and visualization tools do not support processing such a large amount of data. To reduce the size of DNA and protein sequences, many scientists have introduced various types of sequence compression algorithms such as compress or gzip, Context Tree Weighting (CTW), Lempel-Ziv-Welch (LZW), arithmetic coding, run-length encoding and the substitution method. These techniques have contributed substantially to minimizing the volume of biological datasets. On the other hand, traditional compression techniques are not well suited to the compression of these types of sequential data. In this paper, we explore diverse types of techniques for the compression of large amounts of DNA sequence data. The analysis of techniques reveals that efficient techniques not only reduce the size of the sequence but also avoid any information loss. The review of existing studies also shows that compression of a DNA sequence is significant for understanding the critical characteristics of DNA data, in addition to improving storage efficiency and data transmission. In addition, the compression of protein sequences remains a challenge for the research community. The major parameters for the evaluation of these compression algorithms include compression ratio and running time complexity.
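To make two of the terms above concrete, here is a minimal sketch of run-length encoding applied to a short nucleotide string, together with one possible compression-ratio calculation. The function names, the toy sequence, and the cost model (one unit per symbol in the input, two units per run in the output) are all assumptions for illustration and are not taken from the paper.

```python
def rle_encode(seq):
    """Collapse consecutive repeats into (symbol, count) pairs."""
    runs = []
    for ch in seq:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return [(ch, n) for ch, n in runs]


def rle_decode(runs):
    """Rebuild the original string from (symbol, count) pairs."""
    return "".join(ch * n for ch, n in runs)


dna = "AAAACCCGGT"  # toy sequence, not real data
encoded = rle_encode(dna)

# Lossless check: decoding must return the input unchanged.
assert rle_decode(encoded) == dna

# Assumed cost model: each input symbol costs 1 unit, each run costs 2 units.
compression_ratio = len(dna) / (2 * len(encoded))
print(encoded, compression_ratio)
```

On sequences with few long runs the ratio under this cost model drops below 1, meaning the encoding is larger than the input, which is consistent with the abstract's remark that traditional techniques are not well suited to this kind of data.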

Other Quantitative Biology

SeDMiD for Confusion Detection: Uncovering Mind State from Time Series Brain Wave Data

Jingkang Yang, Haohan Wang, Jun Zhu 2016

Understanding how the brain functions has been an intriguing topic for years. With recent progress in collecting massive data and developing advanced technology, people have become interested in addressing the challenge of decoding brain wave data into meaningful mind states. Many machine learning models and algorithms have been revisited and developed for this purpose, especially those that handle time series data, because of the nature of brain waves. However, many of these time series models, like the HMM with hidden state in discrete space or the State Space Model with hidden state in continuous space, only work with one source of data and cannot handle different sources of information simultaneously. In this paper, we propose an extension of the State Space Model that works with different sources of information, together with its learning and inference algorithms. We apply this model to decode the mind state of students during lectures based on their brain waves and achieve significantly better results compared to traditional methods.

Neurons and Cognition Artificial Intelligence Machine Learning

Modelling Neuronal Behaviour with Time Series Regression: Recurrent Neural Networks on C. Elegans Data

Gonçalo Mestre 2021

Given the inner complexity of the human nervous system, insight into the dynamics of brain activity can be gained from understanding smaller and simpler organisms, such as the nematode C. elegans. The behavioural and structural biology of these organisms is well known, making them prime candidates for benchmarking modelling and simulation techniques. In these complex neuronal collections, classical white-box modelling techniques based on intrinsic structural or behavioural information are either unable to capture the profound nonlinearities of the neuronal response to different stimuli or generate extremely complex models, which are computationally intractable. In this paper we show how the nervous system of C. elegans can be modelled and simulated with data-driven models using different neural network architectures. Specifically, we target the use of state-of-the-art recurrent neural network architectures such as LSTMs and GRUs, and compare these architectures in terms of their properties and their accuracy as well as the complexity of the resulting models. We show that GRU models with a hidden layer size of 4 units are able to reproduce with high accuracy the system's response to very different stimuli.

Neurons and Cognition Machine Learning Quantitative Methods