No Arabic abstract
In this paper we present an interference detection toolbox consisting of a high dynamic range Digital Fast-Fourier-Transform spectrometer (DFFT, based on FPGA-technology) and data analysis software for automated radio frequency interference (RFI) detection. The DFFT spectrometer allows high speed data storage of spectra on time scales of less than a second. The high dynamic range of the device assures constant calibration even during extremely powerful RFI events. The software uses an algorithm which performs a two-dimensional baseline fit in the time-frequency domain, searching automatically for RFI signals superposed on the spectral data. We demonstrate, that the software operates successfully on computer-generated RFI data as well as on real DFFT data recorded at the Effelsberg 100-m telescope. At 21-cm wavelength RFI signals can be identified down to the 4-sigma level. A statistical analysis of all RFI events detected in our observational data revealed that: (1) mean signal strength is comparable to the astronomical line emission of the Milky Way, (2) interferences are polarised, (3) electronic devices in the neighbourhood of the telescope contribute significantly to the RFI radiation. We also show that the radiometer equation is no longer fulfilled in presence of RFI signals.
Security operation centers (SOCs) typically use a variety of tools to collect large volumes of host logs for detection and forensic of intrusions. Our experience, supported by recent user studies on SOC operators, indicates that operators spend ample time (e.g., hundreds of man-hours) on investigations into logs seeking adversarial actions. Similarly, reconfiguration of tools to adapt detectors for future similar attacks is commonplace upon gaining novel insights (e.g., through internal investigation or shared indicators). This paper presents an automated malware pattern-extraction and early detection tool, testing three machine learning approaches: TF-IDF (term frequency-inverse document frequency), Fishers LDA (linear discriminant analysis) and ET (extra trees/extremely randomized trees) that can (1) analyze freshly discovered malware samples in sandboxes and generate dynamic analysis reports (host logs); (2) automatically extract the sequence of events induced by malware given a large volume of ambient (un-attacked) host logs, and the relatively few logs from hosts that are infected with potentially polymorphic malware; (3) rank the most discriminating features (unique patterns) of malware and from the learned behavior detect malicious activity; and (4) allows operators to visualize the discriminating features and their correlations to facilitate malware forensic efforts. To validate the accuracy and efficiency of our tool, we design three experiments and test seven ransomware attacks (i.e., WannaCry, DBGer, Cerber, Defray, GandCrab, Locky, and nRansom). The experimental results show that TF-IDF is the best of the three methods to identify discriminating features, and ET is the most time-efficient and robust approach.
We present a novel methodology for automated feature subset selection from a pool of physiological signals using Quantum Annealing (QA). As a case study, we will investigate the effectiveness of QA-based feature selection techniques in selecting the optimal feature subset for stress detection. Features are extracted from four signal sources: foot EDA, hand EDA, ECG, and respiration. The proposed method embeds the feature variables extracted from the physiological signals in a binary quadratic model. The bias of the feature variable is calculated using the Pearson correlation coefficient between the feature variable and the target variable. The weight of the edge connecting the two feature variables is calculated using the Pearson correlation coefficient between two feature variables in the binary quadratic model. Subsequently, D-Waves clique sampler is used to sample cliques from the binary quadratic model. The underlying solution is then re-sampled to obtain multiple good solutions and the clique with the lowest energy is returned as the optimal solution. The proposed method is compared with commonly used feature selection techniques for stress detection. Results indicate that QA-based feature subset selection performed equally as that of classical techniques. However, under data uncertainty conditions such as limited training data, the performance of quantum annealing for selecting optimum features remained unaffected, whereas a significant decrease in performance is observed with classical feature selection techniques. Preliminary results show the promise of quantum annealing in optimizing the training phase of a machine learning classifier, especially under data uncertainty conditions.
In this work we explore the possibility of applying machine learning methods designed for one-dimensional problems to the task of galaxy image classification. The algorithms used for image classification typically rely on multiple costly steps, such as the Point Spread Function (PSF) deconvolution and the training and application of complex Convolutional Neural Networks (CNN) of thousands or even millions of parameters. In our approach, we extract features from the galaxy images by analysing the elliptical isophotes in their light distribution and collect the information in a sequence. The sequences obtained with this method present definite features allowing a direct distinction between galaxy types, as opposed to smooth Sersic profiles. Then, we train and classify the sequences with machine learning algorithms, designed through the platform Modulos AutoML, and study how they optimize the classification task. As a demonstration of this method, we use the second public release of the Dark Energy Survey (DES DR2). We show that by applying it to this sample we are able to successfully distinguish between early-type and late-type galaxies, for images with signal-to-noise ratio greater then 300. This yields an accuracy of $86%$ for the early-type galaxies and $93%$ for the late-type galaxies, which is on par with most contemporary automated image classification approaches. Our novel method allows for galaxy images to be accurately classified and is faster than other approaches. Data dimensionality reduction also implies a significant lowering in computational cost. In the perspective of future data sets obtained with e.g. Euclid and the Vera Rubin Observatory (VRO), this work represents a path towards using a well-tested and widely used platform from industry in efficiently tackling galaxy classification problems at the peta-byte scale.
Epilepsy is a neurological disorder classified as the second most serious neurological disease known to humanity, after stroke. Localization of the epileptogenic zone is an important step for epileptic patient treatment, which starts with epileptic spike detection. The common practice for spike detection of brain signals is via visual scanning of the recordings, which is a subjective and a very time-consuming task. Motivated by that, this paper focuses on using machine learning for automatic detection of epileptic spikes in magnetoencephalography (MEG) signals. First, we used the Position Weight Matrix (PWM) method combined with a uniform quantizer to generate useful features. Second, the extracted features are classified using a Support Vector Machine (SVM) for the purpose of epileptic spikes detection. The proposed technique shows great potential in improving the spike detection accuracy and reducing the feature vector size. Specifically, the proposed technique achieved average accuracy up to 98% in using 5-folds cross-validation applied to a balanced dataset of 3104 samples. These samples are extracted from 16 subjects where eight are healthy and eight are epileptic subjects using a sliding frame of size of 100 samples-points with a step-size of 2 sample-points
We improve our filament automated detection method which was proposed in our previous works. It is then applied to process the full disk H$alpha$ data mainly obtained by Big Bear Solar Observatory (BBSO) from 1988 to 2013, spanning nearly 3 solar cycles. The butterfly diagrams of the filaments, showing the information of the filament area, spine length, tilt angle, and the barb number, are obtained. The variations of these features with the calendar year and the latitude band are analyzed. The drift velocities of the filaments in different latitude bands are calculated and studied. We also investigate the north-south (N-S) asymmetries of the filament numbers in total and in each subclass classified according to the filament area, spine length, and tilt angle. The latitudinal distribution of the filament number is found to be bimodal. About 80% of all the filaments have tilt angles within [0{deg}, 60{deg}]. For the filaments within latitudes lower (higher) than 50{deg} the northeast (northwest) direction is dominant in the northern hemisphere and the southeast (southwest) direction is dominant in the southern hemisphere. The latitudinal migrations of the filaments experience three stages with declining drift velocities in each of solar cycles 22 and 23, and it seems that the drift velocity is faster in shorter solar cycles. Most filaments in latitudes lower (higher) than 50{deg} migrate toward the equator (polar region). The N-S asymmetry indices indicate that the southern hemisphere is the dominant hemisphere in solar cycle 22 and the northern hemisphere is the dominant one in solar cycle 23.