Current Flash X-ray single-particle diffraction Imaging (FXI) experiments, which operate on modern X-ray Free Electron Lasers (XFELs), can record millions of interpretable diffraction patterns from individual biomolecules per day. Due to the stochastic nature of XFELs, those patterns will, to a varying degree, include scattering from contaminated samples. The heterogeneity of the sample biomolecules is also unavoidable and complicates data processing. Reducing the data volumes and selecting high-quality single-molecule patterns are therefore critical steps in the experimental set-up. In this paper, we present two supervised template-based learning methods for classifying FXI patterns. Our Eigen-Image and Log-Likelihood classifiers can find the best-matched template for a single-molecule pattern within a few milliseconds. They are also straightforward to parallelize so as to fully match the XFEL repetition rate, thereby enabling on-site processing.
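The abstract above does not spell out how the log-likelihood classifier scores a pattern against a template. As a minimal sketch only, assuming each template stores the expected photon count per pixel and that measured counts are independent Poisson variables, template matching could look like the following. The function names `poisson_log_likelihood` and `best_template` and the toy four-pixel data are hypothetical and are not taken from the paper.

```python
import math


def poisson_log_likelihood(pattern, template, eps=1e-9):
    """Log-likelihood of integer photon counts under per-pixel Poisson means.

    Assumption: pixels are independent and `template` holds the expected counts.
    """
    total = 0.0
    for k, lam in zip(pattern, template):
        lam = max(lam, eps)  # guard against log(0) for empty template pixels
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def best_template(pattern, templates):
    """Return the index of the highest-scoring template and all scores."""
    scores = [poisson_log_likelihood(pattern, t) for t in templates]
    best = max(range(len(templates)), key=scores.__getitem__)
    return best, scores


# Toy data (hypothetical): two flattened 4-pixel templates and one measured pattern.
templates = [
    [5.0, 1.0, 0.2, 0.1],  # strongly peaked template
    [1.0, 1.0, 1.0, 1.0],  # flat template
]
pattern = [6, 1, 0, 0]
index, scores = best_template(pattern, templates)
print(index, scores)
```

Because each pattern is scored independently of all others, a loop over patterns of this kind can be distributed across workers without coordination, which is consistent with the abstract's remark about parallelization.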
Single particle diffraction imaging experiments at free-electron lasers (FELs) have great potential for structure determination of reproducible biological specimens that cannot be crystallized. One of the challenges in processing the data from such an experiment is to determine the correct orientation of each diffraction pattern from samples randomly injected into the FEL beam. We propose an algorithm (see also O. Yefanov et al., Photon Science - HASYLAB Annual Report 2010) that can solve this problem and can be applied to samples from tens of nanometers to microns in size, measured with sub-nanometer resolution in the presence of noise. This is achieved by the simultaneous analysis of a large number of diffraction patterns corresponding to different orientations of the particles. The algorithm's efficiency is demonstrated for two biological samples: an artificial protein structure without any symmetry and a virus with icosahedral symmetry. Both structures are a few tens of nanometers in size and consist of more than 100 000 non-hydrogen atoms. More than 10 000 diffraction patterns with Poisson noise were simulated and analyzed for each structure. Our simulations indicate the possibility of achieving a resolution of about 3.3 Å at a wavelength of 3 Å and an incoming flux of 10^{12} photons per pulse focused to 100 × 100 nm^2.
Modern Flash X-ray diffraction Imaging (FXI) acquires diffraction signals from single biomolecules at a high repetition rate from X-ray Free Electron Lasers (XFELs), easily obtaining millions of 2D diffraction patterns from a single experiment. Due to the stochastic nature of FXI experiments and the massive volumes of data, retrieving 3D electron densities from raw 2D diffraction patterns is a challenging and time-consuming task. We propose a semi-automatic data analysis pipeline for FXI experiments, which includes four steps: hit finding and preliminary filtering, pattern classification, 3D Fourier reconstruction, and post-analysis. We also include a recently developed bootstrap methodology in the post-analysis step for uncertainty analysis and quality control. To achieve the best possible resolution, we further suggest using background subtraction, signal windowing, and convex optimization techniques when retrieving the Fourier phases in the post-analysis step. As an application example, we quantified the 3D electron structure of the PR772 virus using the proposed data-analysis pipeline. The retrieved structure was above the detector-edge resolution and clearly showed the pseudo-icosahedral capsid of the PR772.
The training of deep learning models generally requires a large amount of annotated data for effective convergence and generalisation. However, obtaining high-quality annotations is a laborious and expensive process because the labelling task requires expert radiologists. The study of semi-supervised learning in medical image analysis is therefore of crucial importance, given that it is much less expensive to obtain unlabelled images than to acquire images labelled by expert radiologists. Essentially, semi-supervised methods leverage large sets of unlabelled data to enable better training convergence and generalisation than is possible with only the small set of labelled images. In this paper, we propose the Self-supervised Mean Teacher for Semi-supervised (S$^2$MTS$^2$) learning that combines self-supervised mean-teacher pre-training with semi-supervised fine-tuning. The main innovation of S$^2$MTS$^2$ is the self-supervised mean-teacher pre-training based on joint contrastive learning, which uses an infinite number of pairs of positive query and key features to improve the mean-teacher representation. The model is then fine-tuned using the exponential moving average teacher framework trained with semi-supervised learning. We validate S$^2$MTS$^2$ on the thorax disease multi-label classification problem from the Chest X-ray14 dataset, where we show that it outperforms the previous SOTA semi-supervised learning methods by a large margin.
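The exponential moving average (EMA) teacher mentioned in the abstract above can be illustrated with a short sketch. This is only the generic mean-teacher weight update, in which the teacher's parameters track a smoothed copy of the student's parameters; it is not the S$^2$MTS$^2$ training procedure itself. The function name `ema_update`, the flat list-of-floats parameter representation, and the decay value are assumptions made for illustration.

```python
def ema_update(teacher, student, decay=0.99):
    """One mean-teacher step: teacher <- decay * teacher + (1 - decay) * student.

    Assumption: parameters are given as flat lists of floats of equal length.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy example (hypothetical values): the student is held fixed, so the teacher
# converges geometrically towards the student's parameters.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(100):
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)
```

In an actual training loop the student changes at every optimisation step, so the teacher behaves as a temporal average of recent student weights rather than converging to a fixed point.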
Single particle imaging (SPI) is a promising method for native structure determination that has progressed rapidly with the development of X-ray Free-Electron Lasers. Large amounts of data are collected during SPI experiments, driving the need for automated data analysis. The necessary data analysis pipeline has a number of steps, including binary object classification (single versus multiple hits). Classification and object detection are areas where deep neural networks currently outperform other approaches. In this work, we use the fast object detector networks YOLOv2 and YOLOv3. By exploiting transfer learning, a moderate amount of data is sufficient for training the neural network. We demonstrate here that a convolutional neural network (CNN) can be successfully used to classify data from SPI experiments. We compare the classification results for the two networks, which differ in depth and architecture, by applying them to the same SPI data in different data representations. The best results are obtained with YOLOv2 applied to color images on a linear scale, which yields an accuracy of about 97%, with precision and recall of about 52% and 61%, respectively, relative to manual data classification.
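The gap between the roughly 97% accuracy and the much lower precision and recall quoted above is what one would expect if the positive class is rare. The sketch below computes the three metrics from a confusion matrix. The counts are hypothetical, chosen only so that the metrics land near the quoted values; they are not the paper's data, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


# Hypothetical counts for 10 000 patterns, of which only 315 are positives.
accuracy, precision, recall = classification_metrics(tp=192, fp=177, fn=123, tn=9508)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With so few positives, the large number of correctly rejected patterns dominates the accuracy, while precision and recall depend only on how the rare positive class is handled.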
An outstanding question in X-ray single particle imaging experiments has been the feasibility of imaging sub-10-nm biomolecules under realistic experimental conditions, where very few photons are expected to be measured in a single snapshot and instrument background may be significant relative to particle scattering. While analyses of simulated data have shown that the determination of an average image should be feasible using Bayesian methods such as the EMC algorithm, this has yet to be demonstrated using experimental data containing realistic non-isotropic instrument background, sample variability and other experimental factors. In this work, we show that the orientation and phase retrieval steps work at photon counts diluted to the signal levels one expects from smaller molecules or with weaker pulses, using data from experimental measurements of 60-nm PR772 viruses. Even when the signal is reduced to a fraction as small as 1/256, the virus electron density determined using ab initio phasing is of almost the same quality as that obtained from the high-signal data. However, we are still limited by the total number of patterns collected, a limitation that may soon be mitigated by the advent of high repetition-rate sources like the European XFEL and LCLS-II.
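The abstract above does not state how the photon counts were diluted. One common way to emulate a weaker signal from measured counts, shown here purely as an assumed illustration, is binomial thinning: each recorded photon is kept independently with probability equal to the dilution fraction. If the original counts are Poisson distributed, the thinned counts are again Poisson with the mean scaled by that fraction. The function name `dilute_pattern` and the toy counts are hypothetical.

```python
import random


def dilute_pattern(counts, fraction, rng):
    """Binomial thinning: keep each photon independently with probability `fraction`.

    Assumption: `counts` is a flat list of non-negative integer photon counts.
    """
    return [sum(1 for _ in range(k) if rng.random() < fraction) for k in counts]


rng = random.Random(0)              # fixed seed so the sketch is reproducible
pattern = [2560, 512, 0, 256]       # toy per-pixel photon counts (hypothetical)
diluted = dilute_pattern(pattern, 1 / 256, rng)
print(diluted)
```

On average the four pixels above would retain 10, 2, 0 and 1 photons respectively at a fraction of 1/256, although any single draw fluctuates around those means.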