Particle picking is currently a critical step in the cryo-EM single-particle reconstruction pipeline. Despite extensive work on this problem, it remains challenging for many data sets, especially for low-SNR micrographs. We present the KLT (Karhunen-Loève Transform) picker, which is fully automatic and requires as input only the approximate particle size. In particular, it does not require any manual picking. Our method is designed especially to handle low-SNR micrographs. It is based on learning a set of optimal templates through multivariate statistical analysis via the Karhunen-Loève Transform. We evaluate the KLT picker on publicly available data sets and present high-quality results obtained with minimal manual effort.
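The discrete Karhunen-Loève transform amounts to an eigendecomposition of the empirical patch covariance, so the "optimal templates" can be illustrated as the leading eigenvectors of patches cut from micrographs. This is only a minimal sketch of that idea, not the authors' implementation; the patch shapes and template count are arbitrary choices here.

```python
import numpy as np

def klt_templates(patches, n_templates=5):
    """Learn templates as the leading Karhunen-Loeve (PCA) basis of patches.

    patches: array of shape (n_patches, h, w) extracted from micrographs.
    Returns templates of shape (n_templates, h, w).
    """
    n, h, w = patches.shape
    X = patches.reshape(n, h * w).astype(float)
    X -= X.mean(axis=0)                   # centre the patch ensemble
    cov = X.T @ X / n                     # empirical patch covariance
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_templates]  # leading eigenvectors
    return top.T.reshape(n_templates, h, w)

# toy demonstration on random "patches"
rng = np.random.default_rng(0)
patches = rng.normal(size=(200, 8, 8))
templates = klt_templates(patches)
print(templates.shape)  # (5, 8, 8)
```

In a real picker these templates would then be cross-correlated with the micrograph to score candidate particle locations.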
Muons, which originate from the decay of secondary charged pions and kaons, are the most abundant charged particles arriving at sea level. These secondary particles are created when high-energy cosmic rays hit the atmosphere and interact with air nuclei, initiating cascades of secondary particles that lead to the formation of extensive air showers (EAS). Muons carry essential information about extra-terrestrial events and are characterized by a large flux and a varying angular distribution. To address open questions about the origin of cosmic rays, one needs to study the various components of cosmic rays as a function of energy and arrival direction. Because of the close relation between muon and neutrino production, the muon is the most important particle to track. We propose a novel tracking algorithm based on the geometric deep learning approach, using a graph structure to incorporate domain knowledge, to track cosmic-ray muons in our 3-D scintillator detector. The detector is modeled using the GEANT4 simulation package, and the EAS are simulated using CORSIKA (COsmic Ray SImulations for KAscade), with a focus on muons originating from EAS. We shed some light on the performance, robustness to noise and double hits, limitations, and applicability of the proposed algorithm in tracking applications, with the possibility of generalizing to other detectors for astrophysical and collider experiments.
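Geometric deep learning pipelines for tracking typically start by turning detector hits into a graph on which a network then classifies edges. The following is a hypothetical sketch of that first step only (a k-nearest-neighbour hit graph in 3-D); the coordinates, neighbour count, and edge-list format are illustrative assumptions, not details from the paper.

```python
import numpy as np

def build_hit_graph(hits, k=3):
    """Connect each detector hit to its k nearest neighbours in 3-D.

    hits: (n, 3) array of hit positions (illustrative coordinates).
    Returns an edge list of (i, j) index pairs with i != j.
    """
    n = len(hits)
    # pairwise Euclidean distances between all hits
    d = np.linalg.norm(hits[:, None, :] - hits[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours per hit
    return [(i, int(j)) for i in range(n) for j in nbrs[i]]

rng = np.random.default_rng(1)
hits = rng.uniform(size=(10, 3))
edges = build_hit_graph(hits, k=3)
print(len(edges))  # 30 directed edges: 10 hits x 3 neighbours
```

A graph neural network would then operate on these edges, using hit features (position, deposited energy, time) to separate true track segments from noise and double hits.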
In this paper, we consider a surrogate modeling approach using a data-driven nonparametric likelihood function constructed on a manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function using a spectral expansion formulation known as the kernel embedding of the conditional distribution. To respect the geometry of the data, we employ this spectral expansion using a set of data-driven basis functions obtained from the diffusion maps algorithm. The theoretical error estimate suggests that the error bound of the approximate data-driven likelihood function is independent of the variance of the basis functions, which allows us to determine the amount of training data needed for accurate likelihood function estimation. Supporting numerical results demonstrating the robustness of the data-driven likelihood functions for parameter estimation are given on instructive examples involving stochastic and deterministic differential equations. When the dimension of the data manifold is strictly less than the dimension of the ambient space, we find that the proposed approach (which does not require knowledge of the data manifold) is superior to likelihood functions constructed using standard parametric basis functions defined on the ambient coordinates. In an example where the data manifold is not smooth and is unknown, the proposed method is more robust than an existing polynomial chaos surrogate model, the non-intrusive spectral projection, which assumes a parametric likelihood.
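The data-driven basis functions come from the diffusion maps algorithm: build a Gaussian affinity kernel on the samples, apply the density normalisation, and take eigenvectors of the resulting Markov matrix. This is a minimal generic sketch of that construction (with an assumed bandwidth and the α = 1 normalisation), not the paper's full estimator.

```python
import numpy as np

def diffusion_maps(X, eps=0.5, n_basis=4):
    """Data-driven basis functions via the diffusion maps algorithm.

    X: (n, d) samples assumed to lie near a manifold.
    Returns the leading eigenvalues and eigenvectors of the Markov matrix.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)                # Gaussian affinity kernel
    q = K.sum(axis=1)
    K1 = K / np.outer(q, q)              # alpha = 1 density normalisation
    row = K1.sum(axis=1)
    P = K1 / row[:, None]                # row-stochastic Markov matrix
    # symmetrise by conjugation so that eigh applies
    S = np.diag(np.sqrt(row)) @ P @ np.diag(1.0 / np.sqrt(row))
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:n_basis]
    phi = vecs[:, idx] / np.sqrt(row)[:, None]  # eigenvectors of P
    return vals[idx], phi

theta = np.linspace(0, 2 * np.pi, 40, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]  # data on a circle embedded in R^2
vals, phi = diffusion_maps(X)
print(phi.shape)  # (40, 4)
```

The columns of `phi` play the role of the basis functions in the spectral expansion; because they are built from the kernel on the samples themselves, they adapt to the manifold geometry without requiring its parametrisation.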
The in situ measurement of the particle size distribution (PSD) of a suspension of particles presents huge challenges. Various effects from the process can introduce noise into the data from which the PSD is estimated, which in turn can lead to artificial peaks in the estimated PSD. Limitations in the models used for PSD estimation can also produce such artificial peaks. This poses a significant challenge to in situ monitoring of particulate processes, as there is no independent estimate of the PSD against which the artificial peaks can be discriminated. Here, we present an algorithm capable of discriminating between artificial and true peaks in PSD estimates based on the fusion of multiple data streams; in this case, chord length distribution and laser diffraction data are used. The data fusion is carried out by means of multi-objective optimisation using the weighted-sum approach. The algorithm is applied to two different particle suspensions, and the estimated PSDs are compared with offline estimates of the PSD from the Malvern Mastersizer and Morphologi G3. The results show that the algorithm can eliminate an artificial peak in a PSD estimate when that peak is sufficiently displaced from the true peak. However, when the artificial peak is too close to the true peak, it is only suppressed, not completely eliminated.
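The weighted-sum approach scalarises the multi-objective fit: the misfit against each data stream gets a weight, and the weighted total is minimised as a single objective. The sketch below illustrates only that scalarisation on a toy Gaussian-peak model; the forward model, weights, and grid search are illustrative stand-ins, not the paper's PSD models or solver.

```python
import numpy as np

def fused_objective(params, model, data_cld, data_ld, w=(0.5, 0.5)):
    """Weighted-sum scalarisation of a two-objective PSD fit.

    Combines misfits against two data streams (e.g. chord length
    distribution and laser diffraction) into one scalar objective.
    """
    j_cld = np.sum((model(params, "cld") - data_cld) ** 2)
    j_ld = np.sum((model(params, "ld") - data_ld) ** 2)
    return w[0] * j_cld + w[1] * j_ld

# toy forward model: one Gaussian peak; both streams observe the same PSD
x = np.linspace(0.0, 10.0, 101)
def model(params, stream):
    mu, sigma = params
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

true_params = (4.0, 1.0)
d_cld = model(true_params, "cld")
d_ld = model(true_params, "ld")

# crude grid search over the peak position as a stand-in for a real solver
mus = np.linspace(2.0, 6.0, 81)
best = min(mus, key=lambda m: fused_objective((m, 1.0), model, d_cld, d_ld))
print(round(float(best), 6))  # 4.0
```

Because both streams must agree on a peak for it to survive the fused objective, a peak supported by only one stream is penalised, which is the mechanism that suppresses artificial peaks.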
Modern scientific computational methods are undergoing a transformative change: big data and statistical learning methods now have the potential to outperform the classical first-principles modeling paradigm. This book bridges this transition, connecting the theory of probability, stochastic processes, functional analysis, numerical analysis, and differential geometry. It describes two classes of computational methods that leverage data for modeling dynamical systems. The first is concerned with data-fitting algorithms to estimate parameters in parametric models that are postulated on the basis of physical or dynamical laws. The second is operator estimation, which uses the data to nonparametrically approximate the operator generated by the transition function of the underlying dynamical system. This self-contained book is suitable for graduate study in applied mathematics, statistics, and engineering. Carefully chosen elementary examples, supplementary MATLAB codes, and appendices covering the relevant prerequisite materials are provided, making the book suitable for self-study.
A data-driven convergence criterion for the D'Agostini (Richardson-Lucy) iterative unfolding is presented. It relies on the unregularized spectrum (the limit of an infinite number of iterations) and allows a safe estimation of the bias and undercoverage induced by truncating the algorithm. In addition, situations where the response matrix is not perfectly known are discussed, and it is shown that in most cases the unregularized spectrum is not an unbiased estimator of the true distribution. Whenever a bias is introduced, either by truncation or by poor knowledge of the response, a way to recover appropriate coverage properties is proposed.
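The D'Agostini update is the Richardson-Lucy multiplicative iteration: fold the current truth estimate through the response, compare with the measured spectrum, and reweight. Below is a minimal textbook sketch of that iteration on a toy two-bin problem (the response matrix and counts are invented for illustration); the paper's contribution, the convergence criterion, is not implemented here.

```python
import numpy as np

def dagostini_unfold(measured, response, n_iter, prior=None):
    """D'Agostini (Richardson-Lucy) iterative unfolding.

    response[i, j] = P(measured bin i | true bin j); columns sum to 1.
    Each iteration reweights the current truth estimate by how well its
    folded image matches the measured spectrum.
    """
    if prior is None:
        # flat prior carrying the total measured counts
        truth = np.full(response.shape[1], measured.sum() / response.shape[1])
    else:
        truth = prior.astype(float)
    for _ in range(n_iter):
        folded = response @ truth             # expected measured spectrum
        ratio = np.where(folded > 0, measured / folded, 0.0)
        truth = truth * (response.T @ ratio)  # multiplicative update
    return truth

# toy 2-bin example with mild bin-to-bin migration
R = np.array([[0.8, 0.3],
              [0.2, 0.7]])
true = np.array([100.0, 50.0])
meas = R @ true                               # noise-free folded spectrum
est = dagostini_unfold(meas, R, n_iter=200)
print(np.round(est))  # [100.  50.]
```

Truncating at a small `n_iter` regularizes the solution but biases it toward the prior, which is exactly the truncation bias the proposed criterion is designed to quantify.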