The volume of data that will be produced by the next generation of astrophysical instruments represents a significant opportunity for making unplanned and unexpected discoveries. Conversely, finding unexpected objects or phenomena within such large volumes of data presents a challenge that may best be solved using computational and statistical approaches. We present the application of a coarse-grained complexity measure for identifying interesting observations in large astronomical data sets. This measure, termed apparent complexity, has been shown to model human intuition and perceptions of complexity. Apparent complexity is computationally efficient to derive and can be used to segment very large data sets and identify interesting observations based on their morphological complexity. We show, using data from the Australia Telescope Large Area Survey, that apparent complexity can be combined with clustering methods to provide an automated process for distinguishing between images of galaxies classified as having simple or complex morphologies. The approach generalizes well when applied to new data after being calibrated on a smaller data set, where it performs better than the classification methods we tested that operate on raw pixel data. This generalizability positions apparent complexity as a suitable machine-learning feature for identifying complex observations with unanticipated features.
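A minimal sketch of how such a coarse-grained complexity score could be computed, assuming the usual recipe of coarse-graining an image and taking the size of its compressed representation as the complexity estimate; the function and parameter names below (apparent_complexity, block_size) are illustrative and not taken from the paper's code:

```python
# Sketch: estimate apparent complexity as the compressed size of a
# coarse-grained (block-averaged, thresholded) version of an image.
import zlib
import numpy as np

def apparent_complexity(image: np.ndarray, block_size: int = 4) -> int:
    """Coarse-grain `image` by block averaging, binarise it, and return the
    length in bytes of its zlib-compressed representation."""
    h, w = image.shape
    h, w = h - h % block_size, w - w % block_size
    blocks = image[:h, :w].reshape(h // block_size, block_size,
                                   w // block_size, block_size)
    coarse = blocks.mean(axis=(1, 3))                    # block-averaged image
    binary = (coarse > coarse.mean()).astype(np.uint8)   # crude thresholding
    return len(zlib.compress(binary.tobytes()))
```

The resulting scalar could then be used as a feature for a clustering method such as k-means to separate images with simple morphologies from complex ones.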
To date, the only way to argue polynomial lower bounds for dynamic algorithms is via fine-grained complexity arguments. These arguments rely on strong assumptions about specific problems, such as the Strong Exponential Time Hypothesis (SETH) and the Online Matrix-Vector Multiplication Conjecture (OMv). While they have led to many exciting discoveries, dynamic algorithms still miss out on some of the benefits and lessons of the traditional ``coarse-grained'' approach that relates classes of problems such as P and NP to each other. In this paper we initiate the study of coarse-grained complexity theory for dynamic algorithms. Below are some of the questions that this theory can answer. What if dynamic Orthogonal Vectors (OV) is easy in the cell-probe model? A research program for proving polynomial unconditional lower bounds for dynamic OV in the cell-probe model is motivated by the fact that many conditional lower bounds can be shown via reductions from the dynamic OV problem. Since the cell-probe model is more powerful than the word RAM and has historically allowed smaller upper bounds, it might turn out that dynamic OV is easy in the cell-probe model, making this research direction infeasible. Our theory implies that if this is the case, there will be very interesting algorithmic consequences: if dynamic OV can be maintained in polylogarithmic worst-case update time in the cell-probe model, then so can several important dynamic problems such as $k$-edge connectivity, $(1+\epsilon)$-approximate mincut, $(1+\epsilon)$-approximate matching, planar nearest neighbors, Chan's subset union and 3-vs-4 diameter. The same conclusion can be made when we replace dynamic OV by, e.g., subgraph connectivity, single source reachability, Chan's subset union, and 3-vs-4 diameter. Lower bounds for $k$-edge connectivity via dynamic OV? (see the full abstract in the pdf file).
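For reference, a sketch of the standard (static) Orthogonal Vectors problem that underlies dynamic OV; this is the textbook formulation, not necessarily the paper's exact dynamic variant:

```latex
% Orthogonal Vectors (OV), standard formulation.
Given sets $A, B \subseteq \{0,1\}^{d}$ with $|A| = |B| = n$, decide whether
\[
  \exists\, a \in A,\ b \in B \quad \text{such that} \quad
  \langle a, b \rangle \;=\; \sum_{i=1}^{d} a_i\, b_i \;=\; 0 .
\]
% The dynamic variant maintains this answer while vectors are inserted into
% and deleted from $A$ and $B$.
```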
We developed several pieces of software to enable the tracking of provenance information for CTA, the Cherenkov Telescope Array, a large-scale and complex astronomical observatory. Such major facilities produce data that will be publicly released to a large community of scientists. There are thus strong requirements to ensure data quality, reliability and trustworthiness. In particular, traceability and reproducibility of the data products have to be built into the development of such large projects. These requirements can be met by structuring and storing the provenance information for each data product. We followed the Provenance Data Model currently under discussion at the IVOA and implemented solutions to collect provenance information during the CTA data processing and the execution of jobs on a computing cluster.
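A minimal sketch of how a provenance record for one processing step might be structured, following the Entity / Activity / Used / WasGeneratedBy pattern of the IVOA Provenance Data Model (itself aligned with W3C PROV); the class names, identifiers and file paths below are illustrative, not the CTA software's actual API:

```python
# Sketch: recording which inputs a processing job used and which data
# products it generated, in a PROV-like structure.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Entity:            # a data product, e.g. a calibrated event file
    id: str
    location: str

@dataclass
class Activity:          # one execution of a processing task on the cluster
    id: str
    name: str
    start: datetime
    end: datetime
    used: list = field(default_factory=list)        # ids of input Entities
    generated: list = field(default_factory=list)   # ids of output Entities

raw = Entity("cta:dl0/run001", "/data/dl0/run001.fits")          # hypothetical
calibrated = Entity("cta:dl1/run001", "/data/dl1/run001.fits")   # hypothetical
calib_job = Activity(
    id="cta:job/42",
    name="calibration",
    start=datetime.now(timezone.utc),
    end=datetime.now(timezone.utc),
    used=[raw.id],
    generated=[calibrated.id],
)
```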
City traffic is a dynamic system of enormous complexity. Modeling and predicting city traffic flow remains a challenging task; the main difficulties are how to specify supply and demand and how to parameterize the model. In this paper we attempt to solve these problems with the help of a large amount of floating-car data. We propose a coarse-grained cellular automata model that simulates vehicles moving on uniform grid cells that are much larger than those of microscopic cellular automata models. The car-car interaction of the microscopic model is replaced in our model by the coupling between vehicles and coarse-grained state variables. To parameterize the model, flux-occupancy relations are fitted from the historical data at every grid cell; these serve as the coarse-grained fundamental diagrams coupling occupancy and speed. To evaluate the model, we feed it the historical travel demands and trajectories obtained from the floating-car data and use the model to predict road speed one hour into the future. Numerical results show that our model can capture the traffic flow pattern of the entire city and make reasonable predictions. The current work can be considered a prototype for a model-based forecasting system for city traffic.
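A minimal sketch of one update step of such a coarse-grained traffic automaton, assuming the speed in each grid cell is read off a fitted flux-occupancy relation; the linear fundamental diagram and all parameter values below are illustrative placeholders for the relations fitted from the floating-car data:

```python
# Sketch: advance vehicle counts on a 1-D ring of coarse cells, with the
# per-cell speed determined by the local occupancy.
import numpy as np

def speed_from_occupancy(occupancy, v_free=60.0, occ_jam=0.8):
    """Toy fundamental diagram: speed (km/h) decays linearly with occupancy."""
    return v_free * np.clip(1.0 - occupancy / occ_jam, 0.0, 1.0)

def step(counts, capacity, dt=60.0, cell_len=500.0):
    """One time step of dt seconds on cells of length cell_len metres."""
    occupancy = counts / capacity
    v = speed_from_occupancy(occupancy)                     # km/h per cell
    frac_moving = np.clip(v / 3.6 * dt / cell_len, 0.0, 1.0)
    outflow = counts * frac_moving                          # vehicles leaving each cell
    inflow = np.roll(outflow, 1)                            # arriving from the upstream cell
    return counts - outflow + inflow

counts = np.array([20.0, 35.0, 50.0, 10.0])                 # vehicles per cell
counts = step(counts, capacity=np.full(4, 80.0))
```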
Brute-force simulations of dynamics on very large networks are quite expensive. While phenomenological treatments may capture some macroscopic properties, they often ignore important microscopic details. Fortunately, one may only be interested in the properties of a local part of the network rather than the whole. Here, we propose a hybrid multiscale coarse-grained (HMCG) method which combines a fine Monte Carlo (MC) simulation on the nodes of interest with coarser Langevin dynamics on the rest of the network. We demonstrate the validity of our method by analyzing the equilibrium Ising model and the nonequilibrium susceptible-infected-susceptible model. We find that HMCG not only works very well in reproducing the phase transitions and critical phenomena of the microscopic models, but also accelerates the evaluation of the dynamics, yielding significant computational savings compared to direct microscopic MC simulation of the whole network. The proposed method is general and can be applied to a wide variety of networked systems by adopting appropriate microscopic simulation methods and coarse-graining approaches.
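A minimal sketch of one hybrid update for an Ising model, assuming the fine region is updated with Metropolis MC while the rest of the network is summarised by a single coarse magnetisation m evolved with Langevin dynamics; the boundary coupling, the drift term and the noise amplitude are illustrative choices, not the paper's exact equations:

```python
# Sketch: hybrid fine-MC / coarse-Langevin update for an Ising system.
import numpy as np

rng = np.random.default_rng(0)

def hybrid_step(spins, adj, boundary, m, T, k_out, dt=0.01):
    """spins: +/-1 array for the fine region; adj: its adjacency matrix;
    boundary: bool mask of fine nodes linked to the coarse region;
    m: coarse magnetisation of the rest; k_out: external links per boundary node."""
    # Fine region: Metropolis sweep; boundary nodes feel an extra field k_out*m.
    for i in rng.permutation(len(spins)):
        h = adj[i] @ spins + (k_out * m if boundary[i] else 0.0)
        dE = 2.0 * spins[i] * h
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i] *= -1
    # Coarse region: Langevin step for m with feedback from the boundary spins
    # (noise amplitude chosen arbitrarily for illustration).
    drift = -m + np.tanh((m + spins[boundary].mean()) / T)
    m += drift * dt + 0.1 * np.sqrt(2.0 * T * dt) * rng.normal()
    return spins, float(np.clip(m, -1.0, 1.0))
```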
During the last decade, coarse-grained nucleotide models have emerged that allow us to simulate DNA and RNA on unprecedented time and length scales. Among them is oxDNA, a coarse-grained, sequence-specific model that captures the hybridisation transition of DNA and many structural properties of single- and double-stranded DNA. oxDNA was previously only available as standalone software, but has now been implemented into the popular LAMMPS molecular dynamics code. This article describes the new implementation and analyses its parallel performance. Practical applications are presented that focus on single-stranded DNA, an area of research which has so far been under-investigated. The LAMMPS implementation of oxDNA significantly lowers the entry barrier for using the oxDNA model, and facilitates future code development and interfacing with existing LAMMPS functionality as well as with other coarse-grained and atomistic DNA models.
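A minimal sketch of driving such a LAMMPS run from Python, assuming the standard `lammps` Python wrapper and an existing input script; the file name "in.oxdna" is a placeholder, not a file shipped with the implementation described here:

```python
# Sketch: run a LAMMPS input script (which would define the oxDNA pair and
# bond styles, topology and integrator) through the LAMMPS Python interface.
from lammps import lammps

lmp = lammps()          # start a LAMMPS instance (MPI-aware if launched with mpirun)
lmp.file("in.oxdna")    # execute the oxDNA input script (placeholder name)
lmp.close()
```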