Tracking and Characterizing Botnets Using Automatically Generated Domains

91 0 0.0 ( 0 )

Download Cite

Added by Lorenzo Cavallaro

Publication date 2013

fields Informatics Engineering

and research's language is English

Authors Stefano Schiavoni Politecnico di Milano

Cryptography and Security

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Modern botnets rely on domain-generation algorithms (DGAs) to build resilient command-and-control infrastructures. Recent works focus on recognizing automatically generated domains (AGDs) from DNS traffic, which potentially allows to identify previously unknown AGDs to hinder or disrupt botnets communication capabilities. The state-of-the-art approaches require to deploy low-level DNS sensors to access data whose collection poses practical and privacy issues, making their adoption problematic. We propose a mechanism that overcomes the above limitations by analyzing DNS traffic data through a combination of linguistic and IP-based features of suspicious domains. In this way, we are able to identify AGD names, characterize their DGAs and isolate logical groups of domains that represent the respective botnets. Moreover, our system enriches these groups with new, previously unknown AGD names, and produce novel knowledge about the evolving behavior of each tracked botnet. We used our system in real-world settings, to help researchers that requested intelligence on suspicious domains and were able to label them as belonging to the correct botnet automatically. Additionally, we ran an evaluation on 1,153,516 domains, including AGDs from both modern (e.g., Bamital) and traditional (e.g., Conficker, Torpig) botnets. Our approach correctly isolated families of AGDs that belonged to distinct DGAs, and set automatically generated from non-automatically generated domains apart in 94.8 percent of the cases.

rate research

Identifying Automatically Generated Headlines using Transformers

167 - Antonis Maronikolakis , Hinrich Schutze , Mark Stevenson 2020

False information spread via the internet and social media influences public opinion and user activity, while generative models enable fake content to be generated faster and more cheaply than had previously been possible. In the not so distant future, identifying fake content generated by deep learning models will play a key role in protecting users from misinformation. To this end, a dataset containing human and computer-generated headlines was created and a user study indicated that humans were only able to identify the fake headlines in 47.8% of the cases. However, the most accurate automatic approach, transformers, achieved an overall accuracy of 85.7%, indicating that content generated from language models can be filtered out accurately.

Computation and Language Computers and Society Machine Learning

Evaluating Automatically Generated Phoneme Captions for Images

106 - Justin van der Hout , Zoltan DHaese , Mark Hasegawa-Johnson 2020

Image2Speech is the relatively new task of generating a spoken description of an image. This paper presents an investigation into the evaluation of this task. For this, first an Image2Speech system was implemented which generates image captions consisting of phoneme sequences. This system outperformed the original Image2Speech system on the Flickr8k corpus. Subsequently, these phoneme captions were converted into sentences of words. The captions were rated by human evaluators for their goodness of describing the image. Finally, several objective metric scores of the results were correlated with these human ratings. Although BLEU4 does not perfectly correlate with human ratings, it obtained the highest correlation among the investigated metrics, and is the best currently existing metric for the Image2Speech task. Current metrics are limited by the fact that they assume their input to be words. A more appropriate metric for the Image2Speech task should assume its input to be parts of words, i.e. phonemes, instead.

Computation and Language Computer Vision and Pattern Recognition

Automatically tracking neurons in a moving and deforming brain

201 - Jeffrey P. Nguyen , Ashley N. Linder , George S. Plummer 2016

Advances in optical neuroimaging techniques now allow neural activity to be recorded with cellular resolution in awake and behaving animals. Brain motion in these recordings pose a unique challenge. The location of individual neurons must be tracked in 3D over time to accurately extract single neuron activity traces. Recordings from small invertebrates like C. elegans are especially challenging because they undergo very large brain motion and deformation during animal movement. Here we present an automated computer vision pipeline to reliably track populations of neurons with single neuron resolution in the brain of a freely moving C. elegans undergoing large motion and deformation. 3D volumetric fluorescent images of the animals brain are straightened, aligned and registered, and the locations of neurons in the images are found via segmentation. Each neuron is then assigned an identity using a new time-independent machine-learning approach we call Neuron Registration Vector Encoding. In this approach, non-rigid point-set registration is used to match each segmented neuron in each volume with a set of reference volumes taken from throughout the recording. The way each neuron matches with the references defines a feature vector which is clustered to assign an identity to each neuron in each volume. Finally, thin-plate spline interpolation is used to correct errors in segmentation and check consistency of assigned identities. The Neuron Registration Vector Encoding approach proposed here is uniquely well suited for tracking neurons in brains undergoing large deformations. When applied to whole-brain calcium imaging recordings in freely moving C. elegans, this analysis pipeline located 150 neurons for the duration of an 8 minute recording and consistently found more neurons more quickly than manual or semi-automated approaches.

Neurons and Cognition Computer Vision and Pattern Recognition Biological Physics

Attack as Defense: Characterizing Adversarial Examples using Robustness

80 - Zhe Zhao , Guangke Chen , Jingyi Wang 2021

As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning based software are found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurement does not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by effectively evaluating an examples robustness. A2D uses the cost of attacking an input for robustness evaluation and identifies those less robust examples as adversarial since less robust examples are easier to attack. Extensive experiment results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defence against potential adaptive attacks and show that A2D is effective in defending carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.

Cryptography and Security Artificial Intelligence Software Engineering

Automatically Generate Steganographic Text Based on Markov Model and Huffman Coding

138 - Zhongliang Yang , Shuyu Jin , Yongfeng Huang 2018

Steganography, as one of the three basic information security systems, has long played an important role in safeguarding the privacy and confidentiality of data in cyberspace. The text is the most widely used information carrier in peoples daily life, using text as a carrier for information hiding has broad research prospects. However, due to the high coding degree and less information redundancy in the text, it has been an extremely challenging problem to hide information in it for a long time. In this paper, we propose a steganography method which can automatically generate steganographic text based on the Markov chain model and Huffman coding. It can automatically generate fluent text carrier in terms of secret information which need to be embedded. The proposed model can learn from a large number of samples written by people and obtain a good estimate of the statistical language model. We evaluated the proposed model from several perspectives. Experimental results show that the performance of the proposed model is superior to all the previous related methods in terms of information imperceptibility and information hidden capacity.

Cryptography and Security