
Identifying Hidden Buyers in Darknet Markets via Dirichlet Hawkes Process

Posted by: Xintao Wu
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Darknet markets are notorious black markets in cyberspace that involve selling or brokering drugs, weapons, stolen credit cards, and other illicit goods. To combat illicit transactions in cyberspace, it is important to analyze the behaviors of participants in darknet markets. Many current studies focus on the behavior of vendors; however, there is little work on analyzing buyers. The key challenge is that buyers are anonymized in darknet markets. For most darknet markets, we only observe the first and last characters of a buyer's ID, such as "a**b". To tackle this challenge, we propose a hidden buyer identification model, called UNMIX, which can group the transactions from one hidden buyer into one cluster given a transaction sequence from an anonymized ID. UNMIX models the temporal dynamics as well as the product, comment, and vendor information associated with each transaction. As a result, transactions with similar patterns in time and content are grouped together as the subsequence from one hidden buyer. Experiments on data collected from three real-world darknet markets demonstrate the effectiveness of our approach as measured by various clustering metrics. Case studies on real transaction sequences show that our approach groups transactions with similar patterns into the same clusters.
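
For intuition, the following is a minimal Python sketch of Dirichlet-Hawkes-style sequential cluster assignment, the general mechanism UNMIX builds on. It is an illustrative simplification rather than the paper's implementation: the exponential kernel, the `content_loglik` callback, and all parameter values are assumptions standing in for the paper's temporal and content models.

```python
import numpy as np

def exp_kernel(dt, beta=1.0):
    """Exponentially decaying triggering kernel (an assumed form)."""
    return beta * np.exp(-beta * dt)

def assign_clusters(times, content_loglik, alpha=1.0, seed=0):
    """Sequentially assign each transaction to a hidden-buyer cluster.

    times          : sorted event timestamps
    content_loglik : fn(cluster_member_indices, i) -> log-likelihood of
                     event i's content under that cluster (stands in for
                     the product/comment/vendor model)
    alpha          : concentration weight for opening a new cluster
    """
    rng = np.random.default_rng(seed)
    clusters, labels = [], []
    for i, t in enumerate(times):
        log_scores = []
        for members in clusters:
            # Hawkes prior: summed kernel over the cluster's past events
            intensity = sum(exp_kernel(t - times[j]) for j in members)
            log_scores.append(np.log(intensity + 1e-12) + content_loglik(members, i))
        # New cluster: weight alpha, content scored under the base measure
        log_scores.append(np.log(alpha) + content_loglik([], i))
        log_scores = np.asarray(log_scores)
        probs = np.exp(log_scores - log_scores.max())
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(clusters):
            clusters.append([i])
        else:
            clusters[k].append(i)
        labels.append(int(k))
    return labels

# Example with a trivial content model (all contents equally likely):
# two temporal bursts should tend to land in two separate clusters.
print(assign_clusters([0.0, 0.1, 0.2, 50.0, 50.1], lambda members, i: 0.0))
```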


Read also

The textual content of a document and its publication date are intertwined. For example, the publication of a news article on a topic is influenced by previous publications on similar issues, according to underlying temporal dynamics. However, it can be challenging to retrieve meaningful information when the text conveys little information or when the temporal dynamics are hard to unveil. Furthermore, the textual content of a document is not always linked to its temporal dynamics. We develop a flexible method to create clusters of textual documents according to both their content and publication time, the Powered Dirichlet-Hawkes process (PDHP). We show that PDHP yields significantly better results than state-of-the-art models when temporal information or textual content is weakly informative. PDHP also alleviates the hypothesis that textual content and temporal dynamics are always perfectly correlated; when they are not, it retrieves textual clusters, temporal clusters, or a mixture of both with high accuracy. We demonstrate that PDHP generalizes previous work, such as the Dirichlet-Hawkes process (DHP) and the Uniform process (UP). Finally, we illustrate the changes induced by PDHP over DHP and UP in a real-world application using Reddit data.
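
The key modification PDHP makes is raising the temporal term in the cluster-allocation prior to a power. Below is a minimal sketch of such a powered prior, under assumed names and an assumed exponential kernel; it is not the paper's implementation, but it shows how a single exponent interpolates between DHP-like behaviour (r = 1) and a UP-like uniform prior (r = 0).

```python
import numpy as np

def pdhp_prior(t, cluster_times, alpha=1.0, r=1.0, beta=1.0):
    """Allocation probabilities for an event at time t under a powered prior.

    cluster_times : list of arrays of past event times, one per cluster
    r             : power on the temporal intensity (r=1 ~ DHP, r=0 ~ UP)
    """
    weights = []
    for ts in cluster_times:
        # Hawkes intensity of this cluster at time t (exponential kernel)
        lam = np.sum(beta * np.exp(-beta * (t - np.asarray(ts))))
        weights.append(lam ** r)
    weights.append(alpha ** r)  # weight for opening a new cluster
    w = np.asarray(weights)
    return w / w.sum()

# One recently active cluster and one stale cluster:
hist = [[4.8, 4.9], [0.1]]
print(pdhp_prior(5.0, hist, r=1.0))  # temporal dynamics dominate (DHP-like)
print(pdhp_prior(5.0, hist, r=0.0))  # dynamics ignored, uniform (UP-like)
```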
Textual network embedding aims to learn low-dimensional representations of text-annotated nodes in a graph. Prior work in this area has typically focused on fixed graph structures; however, real-world networks are often dynamic. We address this challenge with a novel end-to-end node-embedding model, called Dynamic Embedding for Textual Networks with a Gaussian Process (DetGP). After training, DetGP can be applied efficiently to dynamic graphs without re-training or backpropagation. The learned representation of each node is a combination of textual and structural embeddings. Because the structure is allowed to be dynamic, our method uses the Gaussian process to take advantage of its non-parametric properties. To use both local and global graph structures, diffusion is used to model multiple hops between neighbors. The relative importance of global versus local structure for the embeddings is learned automatically. With the non-parametric nature of the Gaussian process, updating the embeddings for a changed graph structure requires only a forward pass through the learned model. Considering link prediction and node classification, experiments demonstrate the empirical effectiveness of our method compared to baseline approaches. We further show that DetGP can be straightforwardly and efficiently applied to dynamic textual networks.
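
The multi-hop diffusion step can be sketched compactly. The snippet below is an illustrative version under assumptions (row-normalized adjacency, hand-picked hop weights); in DetGP itself the relative hop importance is learned rather than supplied by hand.

```python
import numpy as np

def diffuse(adj, x, hop_weights):
    """Mix 0..K-hop neighbourhoods: sum_k w_k * (A_norm^k @ x)."""
    a = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)  # row-normalize
    out = hop_weights[0] * x
    h = x
    for w in hop_weights[1:]:
        h = a @ h                 # propagate one more hop
        out = out + w * h
    return out

# Example: a 4-node path graph with 2-dimensional textual embeddings.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(4, 2))
emb = diffuse(adj, x, hop_weights=[0.5, 0.3, 0.2])  # local vs. global trade-off
print(emb.shape)  # (4, 2)
```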
We consider the following general hidden hubs model: an $n \times n$ random matrix $A$ with a subset $S$ of $k$ special rows (hubs): entries in rows outside $S$ are generated from the probability distribution $p_0 \sim N(0,\sigma_0^2)$; for each row in $S$, some $k$ of its entries are generated from $p_1 \sim N(0,\sigma_1^2)$, $\sigma_1 > \sigma_0$, and the rest of the entries from $p_0$. The problem is to identify the high-degree hubs efficiently. This model includes and significantly generalizes the planted Gaussian Submatrix Model, where the special entries are all in a $k \times k$ submatrix. There are two well-known barriers: if $k \geq c\sqrt{n \ln n}$, just the row sums are sufficient to find $S$ in the general model. For the submatrix problem, this can be improved by a $\sqrt{\ln n}$ factor to $k \ge c\sqrt{n}$ by spectral methods or combinatorial methods. In the variant with $p_0 = \pm 1$ (with probability $1/2$ each) and $p_1 \equiv 1$, neither barrier has been broken. We give a polynomial-time algorithm to identify all the hidden hubs with high probability for $k \ge n^{0.5-\delta}$ for some $\delta > 0$, when $\sigma_1^2 > 2\sigma_0^2$. The algorithm extends to the setting where planted entries might have different variances, each at least as large as $\sigma_1^2$. We also show a nearly matching lower bound: for $\sigma_1^2 \le 2\sigma_0^2$, there is no polynomial-time Statistical Query algorithm for distinguishing between a matrix whose entries are all from $N(0,\sigma_0^2)$ and a matrix with $k = n^{0.5-\delta}$ hidden hubs for any $\delta > 0$. The lower bound as well as the algorithm are related to whether the chi-squared distance of the two distributions diverges. At the critical value $\sigma_1^2 = 2\sigma_0^2$, we show that the general hidden hubs problem can be solved for $k \geq c\sqrt{n}(\ln n)^{1/4}$, improving on the naive row-sum-based method.
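
A small simulation makes the first barrier concrete. The sketch below generates the general hidden-hubs matrix and recovers hubs with a per-row statistic; the constants, and the use of squared entries as the row statistic (for mean-zero rows differing only in variance, squaring exposes the signal), are illustrative choices, not the paper's exact test.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 400                      # k >= c * sqrt(n log n) regime
sigma0, sigma1 = 1.0, 2.0             # note sigma1^2 > 2 * sigma0^2

# Background matrix, then plant k high-variance entries in each of k hub rows.
A = rng.normal(0.0, sigma0, size=(n, n))
hubs = rng.choice(n, size=k, replace=False)
for i in hubs:
    cols = rng.choice(n, size=k, replace=False)
    A[i, cols] = rng.normal(0.0, sigma1, size=k)

# Hub rows concentrate near (n-k)*sigma0^2 + k*sigma1^2; others near n*sigma0^2.
scores = (A ** 2).sum(axis=1)
guessed = set(np.argsort(scores)[-k:].tolist())
print("recovered", len(guessed & set(hubs.tolist())), "of", k, "hubs")
```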
J. Chen, A.G. Hawkes, E. Scalas (2020)
We modify ETAS models by replacing the Pareto-like kernel proposed by Ogata with a Mittag-Leffler type kernel. Provided that the kernel decays as a power law with exponent $\beta + 1 \in (1,2]$, this replacement has the advantage that the Laplace transform of the Mittag-Leffler function is known explicitly, leading to simpler calculation of relevant quantities.
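
For reference, the explicit Laplace transform the abstract relies on is a standard Mittag-Leffler identity, stated below in a generic normalization (the paper's exact kernel parametrization may differ).

```latex
% Mittag-Leffler function, the kernel density built from it, and its
% explicit Laplace transform (generic normalization, time scale omitted).
\[
E_{\alpha,\gamma}(z) = \sum_{n=0}^{\infty} \frac{z^{n}}{\Gamma(\alpha n + \gamma)},
\qquad
f_\beta(t) = t^{\beta-1} E_{\beta,\beta}(-t^{\beta}),
\qquad
\int_0^\infty e^{-st} f_\beta(t)\, dt = \frac{1}{1 + s^{\beta}}.
\]
% For 0 < beta < 1 the density decays like t^{-(beta+1)}, matching the
% power-law exponent beta + 1 quoted in the abstract; beta = 1 gives the
% exponential kernel as a boundary case.
```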
In the last decade, Hawkes processes have received a lot of attention as good models for functional connectivity in neural spiking networks. In this paper we consider a variant of this process, the Age Dependent Hawkes process, which incorporates individual post-jump behaviour into the framework of the usual Hawkes model. This allows modeling recovery properties such as refractory periods, where the effects of the network are momentarily suppressed or altered. We show how classical stability results for Hawkes processes can be improved by introducing age into the system. In particular, we neither need to bound the intensities a priori nor impose any conditions on the Lipschitz constants. When the interactions between neurons are of mean-field type, we study large-network limits and establish the propagation of chaos property of the system.
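
A thinning simulation illustrates the age-dependent mechanism. The sketch below is a guessed minimal instance, not the paper's construction: it uses an exponential kernel and a hard refractory rule, whereas the paper allows general post-jump behaviour.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_adhp(T=50.0, mu=0.5, beta=1.0, gamma=1.0,
                  refractory=0.3, lam_max=20.0):
    """Ogata-style thinning: the usual Hawkes intensity, forced to zero
    while the age since the last jump is below `refractory`."""
    events, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)      # candidate point
        if t >= T:
            return events
        age = t - events[-1] if events else np.inf
        excitation = sum(gamma * np.exp(-beta * (t - s)) for s in events)
        lam = 0.0 if age < refractory else mu + excitation
        if rng.uniform() < lam / lam_max:        # accept with prob lam/lam_max
            events.append(t)

ev = simulate_adhp()
# Every inter-event gap respects the refractory period.
print(len(ev), "events; smallest inter-event gap:", np.diff(ev).min())
```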
