Delivering effective data analytics is crucial to interpreting the multitude of biological datasets currently generated by an ever-increasing number of high-throughput techniques. Logic programming has much to offer in this area. Here, we detail advances that highlight two strengths of logical formalisms in developing data analytics solutions in biological settings: access to large relational databases, and the building of analytical pipelines that collect graph information from multiple sources. We present significant advances to the bio_db package, which serves biological databases as Prolog facts that can be accessed either by in-memory loading or via database backends. These advances include modularising the underlying architecture and incorporating datasets from a second organism (mouse). In addition, we introduce a number of data analytics tools that operate on these datasets and are bundled in the analysis package bio_analytics. The emphasis in both packages is on ease of installation and use. We highlight the general architecture of our component-based approach. An experimental graphical user interface via SWISH for local installation is also available. Finally, we advocate that biological data analytics is a fertile area that can drive further innovation in applied logic programming.
In the past, the semantic issues raised by the non-monotonic nature of aggregates often prevented their use in the recursive statements of logic programs and deductive databases. However, the recently introduced notion of Pre-mappability (PreM) has s
We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a
Big data benchmarking is particularly important and provides applicable yardsticks for evaluating booming big data systems. However, wide coverage and great complexity of big data computing impose big challenges on big data benchmarking. How can we c
Tensor completion is a problem of filling the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have recei
Next Generation Sequencing (NGS) technology has resulted in massive amounts of proteomics and genomics data. This data is of no use if it is not properly analyzed. ETL (Extraction, Transformation, Loading) is an important step in designing data analy