Event mentions in text correspond to real-world events of varying granularity. The task of subevent detection aims to resolve this granularity issue by recognizing the membership of multi-granular events in event complexes. Since knowing the span of descriptive contexts of event complexes helps infer the membership of events, we propose the task of event-based text segmentation (EventSeg) as an auxiliary task to improve learning for subevent detection. To bridge the two tasks, we propose an approach to learning and enforcing constraints that capture dependencies between subevent detection and EventSeg prediction, and that guide the model toward globally consistent inference. Specifically, we adopt Rectifier Networks for constraint learning and then convert the learned constraints into a regularization term in the loss function of the neural model. Experimental results show that the proposed method outperforms baseline methods by 2.3% and 2.5% on the HiEve and IC benchmark datasets for subevent detection, respectively, while achieving decent performance on EventSeg prediction.
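The constraint-to-regularization step described in this abstract can be sketched as below. This is a minimal illustration, not the paper's actual formulation: the linear form of each constraint, the function names, and the penalty weight `rho` are all assumptions made for the example.

```python
# Sketch: converting a learned linear constraint w.x + b >= 0 (e.g. one
# extracted from a trained Rectifier Network) into a hinge-style penalty
# that is added to the task loss as a regularization term.

def constraint_penalty(weights, bias, features):
    """Return 0 when the constraint w.x + b >= 0 holds, otherwise the
    size of the violation (a ReLU of the negated margin)."""
    margin = sum(w * x for w, x in zip(weights, features)) + bias
    return max(0.0, -margin)

def regularized_loss(base_loss, constraints, features, rho=1.0):
    """Add rho-weighted penalties for every learned (weights, bias)
    constraint to the base task loss."""
    return base_loss + rho * sum(
        constraint_penalty(w, b, features) for w, b in constraints
    )
```

At training time the penalty is differentiable almost everywhere, so it can be minimized jointly with the task loss, nudging the model toward predictions that satisfy the learned cross-task constraints.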
The prosperity of the cryptocurrency ecosystem drives the need for digital asset trading platforms. Beyond centralized exchanges (CEXs), decentralized exchanges (DEXs) allow users to trade cryptocurrency without transferring custody of their digital assets to middlemen, thus eliminating the security and privacy issues of CEXs. Uniswap, the most prominent cryptocurrency DEX, continues to attract scammers, with fraudulent cryptocurrencies flooding the ecosystem. In this paper, we take the first step toward detecting and characterizing scam tokens on Uniswap. We first collect all the transactions related to Uniswap exchanges and investigate the landscape of cryptocurrency trading on Uniswap from different perspectives. Then, we propose an accurate approach for flagging scam tokens on Uniswap based on a guilt-by-association heuristic and a machine-learning-powered technique. We have identified over 10K scam tokens listed on Uniswap, which suggests that roughly 50% of the tokens listed on Uniswap are scam tokens. All the scam tokens and liquidity pools are created specifically for rug pull scams, and some scam tokens have embedded tricks and backdoors in their smart contracts. We further observe that thousands of collusion addresses help carry out the scams in league with the scam token/pool creators. The scammers have gained a profit of at least $16 million from 40,165 potential victims. Our observations suggest the urgency of identifying and stopping scams in the decentralized finance ecosystem.
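A guilt-by-association heuristic of the kind mentioned here can be sketched as a bounded traversal of the token/creator/pool interaction graph from known scam seeds. The graph encoding, hop limit, and all names below are illustrative assumptions, not the paper's implementation:

```python
from collections import deque

def guilt_by_association(edges, seed_scams, max_hops=2):
    """Flag nodes within `max_hops` of known scams in an undirected
    interaction graph (edges between tokens, creators, and pools)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    flagged = set(seed_scams)
    frontier = deque((s, 0) for s in seed_scams)  # BFS from every seed
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not propagate suspicion beyond the hop budget
        for nbr in graph.get(node, ()):
            if nbr not in flagged:
                flagged.add(nbr)
                frontier.append((nbr, hops + 1))
    return flagged
```

In practice such a heuristic would only produce candidates; the abstract pairs it with a machine-learning classifier to keep precision high.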
Zhaoyu Wang, Engui Fan (2021)
We consider the Cauchy problem for the defocusing nonlinear Schr\"odinger (NLS) equation with finite density initial data \begin{align} &iq_t+q_{xx}-2(|q|^2-1)q=0, \nonumber\\ &q(x,0)=q_0(x), \quad \lim_{x \to \pm \infty}q_0(x)=\pm 1. \nonumber \end{align} Recently, for the space-time region $|x/(2t)|<1$ without stationary phase points on the jump contour, Cuccagna and Jenkins presented the asymptotic stability of the $N$-soliton solutions of the NLS equation by using the $\bar{\partial}$ generalization of the nonlinear steepest descent method. Their asymptotic result takes the form \begin{align} q(x,t)= T(\infty)^{-2} q^{sol,N}(x,t) + \mathcal{O}(t^{-1}). \end{align} However, for the space-time region $|x/(2t)|>1$, two stationary phase points appear on the jump contour, and the corresponding long-time asymptotics was still unknown. In this paper, for the region $|x/(2t)|>1$, $x/t=\mathcal{O}(1)$, we find a different asymptotic expansion $$ q(x,t)= e^{-i\alpha(\infty)} \left( q_{sol}(x,t;\sigma_d^{(out)}) + t^{-1/2} h(x,t) \right)+\mathcal{O}\left(t^{-3/4}\right),$$ whose leading term is the $N$-soliton solution; the second, $t^{-1/2}$-order term describes soliton-soliton and soliton-radiation interactions; and the $\mathcal{O}(t^{-3/4})$ term is a residual error from a $\overline\partial$-equation. Additionally, the asymptotic stability of the $N$-soliton solutions of the defocusing NLS equation is obtained.
Real active distribution networks with associated smart meter (SM) data are critical for power researchers. However, it is practically difficult for researchers to obtain such comprehensive datasets from utilities due to privacy concerns. To bridge this gap, an implicit generative model with Wasserstein GAN objectives, namely the unbalanced graph generative adversarial network (UG-GAN), is designed to generate synthetic three-phase unbalanced active distribution system connectivity. The basic idea is to learn the distribution of random walks both over a real-world system and across each phase of line segments, capturing the underlying local properties of an individual real-world distribution network and generating specific synthetic networks accordingly. Then, to create a comprehensive synthetic test case, a network correction and extension process is proposed to obtain time-series nodal demands and standard distribution grid components with realistic parameters, including distributed energy resources (DERs) and capacitor banks. A Midwest distribution system with one year of SM data has been utilized to validate the performance of our method. Case studies with several power applications demonstrate that synthetic active networks generated by the proposed framework can mimic almost all features of real-world networks while avoiding the disclosure of confidential information.
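The random-walk idea underlying this abstract can be sketched as follows: walks sampled from a real network's adjacency structure are the raw sequences a walk-based GAN such as UG-GAN would train on. The function name, fixed walk length, and uniform neighbor choice are assumptions for illustration only:

```python
import random

def sample_random_walks(adjacency, n_walks, walk_len, seed=0):
    """Sample up-to-`walk_len`-node random walks over a network graph,
    choosing a uniform random neighbor at each step."""
    rng = random.Random(seed)  # seeded for reproducibility
    nodes = sorted(adjacency)
    walks = []
    for _ in range(n_walks):
        node = rng.choice(nodes)       # uniform random start node
        walk = [node]
        for _ in range(walk_len - 1):
            nbrs = adjacency.get(node)
            if not nbrs:               # dead end: stop this walk early
                break
            node = rng.choice(sorted(nbrs))
            walk.append(node)
        walks.append(walk)
    return walks
```

A generator trained to reproduce the distribution of such walks can then be used to emit walks for a synthetic network, which are reassembled into synthetic connectivity.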
In a recent study, we developed a method to model the impact of photometric redshift uncertainty on the two-point correlation function (2PCF). In this method, we can obtain both the intrinsic clustering strength and the photometric redshift errors simultaneously by fitting the projected 2PCF with two integration depths along the line of sight. Here we apply this method to the DESI Legacy Imaging Surveys Data Release 8 (LS DR8), the largest galaxy sample currently available. We separate galaxies into 20 samples in 8 redshift bins from $z=0.1$ to $z=1.0$, and a few $\rm z$-band absolute magnitude bins with $M_{\rm z} \le -20$. These galaxies are further separated into red and blue sub-samples according to their $M^{0.5}_{\rm r}-M^{0.5}_{\rm z}$ colors. We measure the projected 2PCFs for all these galaxy (sub-)samples and fit them using our photometric redshift 2PCF model. We find that the photometric redshift errors are smaller in the red sub-samples than in the overall population. On the other hand, there might be some systematic photometric redshift errors in the blue sub-samples, so that some of these sub-samples show significantly enhanced 2PCFs at large scales. Therefore, focusing only on the red and all (sub-)samples, we find that the biases of galaxies in these (sub-)samples show clear color, redshift, and luminosity dependencies, in that redder, brighter galaxies at higher redshift are more biased than their bluer, lower-redshift counterparts. Apart from the best-fit parameters, $\sigma_{z}$ and $b$, from this state-of-the-art photometric redshift survey, we obtain high-precision intrinsic clustering measurements for these 40 red and all galaxy (sub-)samples. These measurements on large and small scales hold important information regarding cosmology and galaxy formation, which will be used in subsequent probes in this series.
Fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk via social media platforms. Deep-learning-based models show good performance when trained on large amounts of labeled data on events of interest, whereas performance tends to degrade on other events due to domain shift. Significant challenges are therefore posed for existing detection approaches to detect fake news on emergent events, where large-scale labeled datasets are difficult to obtain. Moreover, incorporating knowledge from newly emergent events requires building a new model from scratch or continuing to fine-tune the existing one, which can be challenging, expensive, and unrealistic for real-world settings. To address these challenges, we propose an end-to-end fake news detection framework named MetaFEND, which is able to learn quickly to detect fake news on emergent events from a few verified posts. Specifically, the proposed model integrates meta-learning and neural process methods to enjoy the benefits of both approaches. In particular, a label embedding module and a hard attention mechanism are proposed to enhance effectiveness by handling categorical information and trimming irrelevant posts. Extensive experiments are conducted on multimedia datasets collected from Twitter and Weibo. The experimental results show that our proposed MetaFEND model can detect fake news on never-seen events effectively and outperform state-of-the-art methods.
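The "hard attention" idea mentioned here, trimming irrelevant posts rather than softly down-weighting them, can be sketched as a top-k selection over similarity scores. The dot-product scoring, the parameter `k`, and the function name are illustrative assumptions, not MetaFEND's actual mechanism:

```python
def hard_attention(query, post_vecs, k=2):
    """Return the indices of the k posts whose vectors score highest
    against the event query (dot product); all other posts are dropped,
    which is the 'hard' analogue of soft attention weighting."""
    scored = sorted(
        range(len(post_vecs)),
        key=lambda i: -sum(q * p for q, p in zip(query, post_vecs[i])),
    )
    return scored[:k]
```

Unlike soft attention, this selection is non-differentiable, which is why hard attention mechanisms in practice need estimators such as straight-through gradients or sampling-based training.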
DNS has long been criticized for its inherent design flaws, which leave the system vulnerable to various kinds of attacks. Besides, DNS domain names are not fully controlled by their users and can easily be taken down by authorities and registrars. Since blockchains have unique properties such as immutability and decentralization, building a decentralized name service on a blockchain appears promising. The Ethereum Name Service (ENS), a novel name service built atop Ethereum, has received great attention from the community. Yet, no existing work has systematically studied this emerging system, especially the security issues and misbehaviors in ENS. To fill this void, we present the first large-scale study of ENS by collecting and analyzing millions of ENS-related event logs. We characterize the ENS system from a number of perspectives. Our findings suggest that ENS has gradually gained popularity during its four years of evolution, mainly due to its distributed and open nature: ENS domain names can be set to any kind of record, even censored and malicious content. We have identified several security issues and misbehaviors, including traditional DNS security issues and new issues introduced by ENS smart contracts. Attackers are abusing the system with thousands of squatting ENS names, a number of scam blockchain addresses, malicious websites, etc. Our exploration suggests that our community should invest more effort into the detection and mitigation of issues in blockchain-based name services toward building an open and trustworthy name service.
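One common way to surface squatting names like those mentioned in this abstract is to flag registrations within a small edit distance of popular names. This is a generic sketch, not the paper's detection method; the threshold and function names are assumptions:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def flag_squatting(names, popular, max_dist=1):
    """Map each registered name to the popular name it imitates, if it
    sits within `max_dist` edits of one without being identical."""
    flagged = {}
    for n in names:
        for p in popular:
            if n != p and edit_distance(n, p) <= max_dist:
                flagged[n] = p
                break
    return flagged
```

Real squatting detection would also consider homoglyphs (visually confusable Unicode characters), which plain edit distance misses.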
Being able to automatically detect performance issues in apps can significantly improve app quality and has a positive influence on user satisfaction. Application Performance Management (APM) libraries are used to locate app performance bottlenecks, monitor app behavior at runtime, and identify potential security risks. Although app developers have been exploiting APM tools to capture these potential performance issues, most of them do not fully understand the internals of these tools and their effect on apps. To fill this gap, in this paper, we conduct the first systematic study of APMs for apps by scrutinizing 25 widely used APMs for Android apps and developing a framework named APMHunter for exploring the usage of APMs in Android apps. Using APMHunter, we conduct a large-scale empirical study on 500,000 Android apps to explore the usage patterns of APMs and discover potential misuses. We obtain two major findings: 1) some APMs still employ deprecated permissions and approaches, which makes them fail to perform as expected; 2) inappropriate use of APMs can cause privacy leaks. Thus, our study suggests that both APM vendors and developers should design and use APMs scrupulously.
Deep learning models are increasingly used in mobile applications as critical components. Unlike program bytecode, whose vulnerabilities and threats have been widely discussed, whether and how the deep learning models deployed in applications can be compromised is not well understood, since neural networks are usually viewed as black boxes. In this paper, we introduce a highly practical backdoor attack achieved with a set of reverse-engineering techniques over compiled deep learning models. The core of the attack is a neural conditional branch, constructed with a trigger detector and several operators and injected into the victim model as a malicious payload. The attack is effective because the conditional logic can be flexibly customized by the attacker, and scalable because it does not require any prior knowledge of the original model. We evaluated the attack's effectiveness using 5 state-of-the-art deep learning models and real-world samples collected from 30 users. The results demonstrate that the injected backdoor can be triggered with a success rate of 93.5%, while introducing less than 2 ms of latency overhead and no more than a 1.4% accuracy decrease. We further conducted an empirical study on real-world mobile deep learning apps collected from Google Play. We found 54 apps that were vulnerable to our attack, including popular and security-critical ones. These results call for deep learning application developers and auditors to enhance the protection of deployed models.
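The "neural conditional branch" structure described in this abstract can be sketched at a high level as follows. In the real attack the branch is expressed with graph operators spliced into a compiled model; here it is shown as plain control flow for clarity, and every name is an illustrative assumption:

```python
def backdoored_forward(x, trigger_detector, original_model, attacker_output):
    """Conditional-branch payload: if the trigger detector fires on the
    input, return the attacker-chosen output; otherwise fall through to
    the untouched original model, so clean accuracy is barely affected."""
    if trigger_detector(x):
        return attacker_output
    return original_model(x)
```

The key property the sketch illustrates is that the original model is left intact on non-trigger inputs, which is why the reported accuracy drop and latency overhead are small.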
Identifying events and mapping them to pre-defined event types has long been an important natural language processing problem. Most previous work has relied heavily on labor-intensive, domain-specific annotations while ignoring the semantic meaning contained in the labels of the event types. As a result, the learned models cannot effectively generalize to new domains, where new event types may be introduced. In this paper, we propose an unsupervised event extraction pipeline, which first identifies events with available tools (e.g., SRL) and then automatically maps them to pre-defined event types with our proposed unsupervised classification model. Rather than relying on annotated data, our model matches the semantics of identified events with those of event type labels. Specifically, we leverage pre-trained language models to contextually represent pre-defined types for both event triggers and arguments. After mapping identified events to the target types via representation similarity, we use the event ontology (e.g., the argument type Victim can only appear as an argument of the event type Attack) as a global constraint to regularize the prediction. The proposed approach is shown to be very effective when tested on the ACE-2005 dataset, which has 33 trigger and 22 argument types. Without using any annotation, we successfully map 83% of the triggers and 54% of the arguments to the correct types, almost doubling the performance of previous zero-shot approaches.
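The "mapping via representation similarity" step can be sketched as nearest-type lookup under cosine similarity between a trigger's embedding and each type label's embedding. The two-dimensional toy vectors and function names below are assumptions for illustration; the paper uses contextual embeddings from pre-trained language models:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def map_to_type(trigger_vec, type_vecs):
    """Assign an identified event trigger to the pre-defined type whose
    label embedding is most similar (zero-shot, no annotated data)."""
    return max(type_vecs, key=lambda t: cosine(trigger_vec, type_vecs[t]))
```

The ontology constraint described in the abstract would then act on top of these raw assignments, e.g. rejecting a Victim argument attached to a non-Attack trigger.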