Research papers, master's theses and doctoral dissertations published by Mao Ye

FOX: Hardware-Assisted File Auditing for Direct Access NVM-Hosted Filesystems

145 - Mao Ye 2021

With emerging non-volatile memories (NVM) entering the mainstream market, several operating systems have started to incorporate new changes and optimizations. One major OS feature is direct access (DAX) for files, which enables efficient access to files hosted on byte-addressable NVM. With DAX-enabled filesystems, files can be accessed directly, like memory, with ordinary load/store operations. Despite its efficiency, the frequently used direct-access system call is troublesome for system auditing. File system auditing is mandatory and widely used because audit logs can help detect anomalies and suspicious file accesses, or serve as evidence in digital forensics. However, the frequent and long-lived use of the direct-access call blinds the operating system and file system to the operations that processes perform on shared files after the initial page faults. This can result in imprecise causality analysis and lead to false conclusions in attack detection. To resolve the tension between fine-grained file system auditing and the performance of NVM-hosted file systems, we propose FOX, a novel hardware-assisted auditing scheme. FOX enables file system auditing through lightweight hardware-software changes that can monitor every read or write event to mapped files on NVM. Additionally, we propose optimized schemes that allow auditing to be restricted to selected files or memory ranges. By prototyping FOX on the Gem5 full-system simulator, we observe a relatively small reduction in throughput and an acceptable number of extra writes compared to our baseline. Compared to instrumentation-based software schemes, our scheme has low overhead and is secure.
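The abstract describes auditing every read or write to mapped NVM files, optionally restricted to selected files or memory ranges. The Python sketch below is a toy software model of that range-filtering idea only, assuming non-overlapping [start, end) address ranges; the class and method names are hypothetical, and FOX itself performs the monitoring with hardware support rather than in software.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class AuditFilter:
    """Toy model (hypothetical names): log loads/stores that hit selected address ranges."""
    starts: list = field(default_factory=list)  # sorted range start addresses
    ends: list = field(default_factory=list)    # matching exclusive end addresses
    log: list = field(default_factory=list)     # (pid, op, addr) audit records

    def add_range(self, start, end):
        # assumes the new range does not overlap any existing range
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.ends.insert(i, end)

    def access(self, pid, op, addr):
        # binary search for the last range that starts at or before addr
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0 and addr < self.ends[i]:
            self.log.append((pid, op, addr))
            return True
        return False  # outside every audited range: no record is produced


audit = AuditFilter()
audit.add_range(0x1000, 0x2000)   # e.g. the mapping of one audited file
audit.access(42, "store", 0x1800)
audit.access(42, "load", 0x2000)  # first byte past the range, not logged
print(audit.log)                  # [(42, 'store', 6144)]
```

Restricting auditing to selected ranges is what keeps the number of log records, and the extra writes they cost, small.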

Hardware Architecture

VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments

263 - Lizhen Nie , Mao Ye , Qiang Liu 2021

Motivated by the growing abundance of observational data with continuous treatments, we investigate the problem of estimating the average dose-response curve (ADRF). Available parametric methods are limited in their model space, and previous attempts to leverage neural networks for greater model expressiveness relied on partitioning the continuous treatment into blocks and using a separate head for each block; in practice, however, this produces discontinuous ADRFs. The question of how to adapt the structure and training of neural networks to estimate ADRFs therefore remains open. This paper makes two main contributions. First, we propose a novel varying coefficient neural network (VCNet) that improves model expressiveness while preserving the continuity of the estimated ADRF. Second, to improve finite-sample performance, we generalize targeted regularization to obtain a doubly robust estimator of the whole ADRF curve.
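The point of a varying coefficient network is that its weights are functions of the treatment t, so the output changes continuously with t instead of jumping between per-block heads. The NumPy sketch below shows one such layer, assuming the weights are linear combinations of a truncated power spline basis in t; the basis, degree, knot positions, and class name are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def truncated_power_basis(t, degree=2, knots=(0.33, 0.66)):
    # assumed basis: 1, t, ..., t^degree, (t - k)_+^degree; every column is continuous in t
    t = np.asarray(t, dtype=float)
    cols = [t ** p for p in range(degree + 1)]
    cols += [np.maximum(t - k, 0.0) ** degree for k in knots]
    return np.stack(cols, axis=-1)  # shape (n, degree + 1 + len(knots))


class VaryingCoefficientLayer:
    """Hypothetical linear layer whose weights and bias are spline functions of t."""

    def __init__(self, d_in, d_out, degree=2, knots=(0.33, 0.66), seed=0):
        rng = np.random.default_rng(seed)
        self.degree, self.knots = degree, knots
        n_basis = degree + 1 + len(knots)
        # one weight matrix and one bias vector per basis function
        self.W = rng.normal(size=(n_basis, d_in, d_out)) / np.sqrt(d_in)
        self.b = np.zeros((n_basis, d_out))

    def __call__(self, x, t):
        # x: (n, d_in) covariate features, t: (n,) treatments in [0, 1]
        B = truncated_power_basis(t, self.degree, self.knots)  # (n, n_basis)
        W_t = np.einsum("nk,kio->nio", B, self.W)              # per-sample weights W(t)
        b_t = B @ self.b                                       # per-sample bias b(t)
        return np.einsum("ni,nio->no", x, W_t) + b_t


rng = np.random.default_rng(1)
layer = VaryingCoefficientLayer(d_in=4, d_out=3)
x = rng.normal(size=(5, 4))
t = rng.uniform(size=5)
gap = np.max(np.abs(layer(x, t) - layer(x, t + 1e-6)))
print(gap < 1e-3)  # True: a tiny change in t gives a tiny change in the output
```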

Machine Learning

QoS-aware Link Scheduling Strategy for Data Transmission in SDVN

129 - Yong Zhang , Mao Ye , Lin Guan 2021

The vehicular ad hoc network (VANET) based on dedicated short-range communication (DSRC) is a distributed communication system in which all nodes share the wireless channel using the carrier sense multiple access with collision avoidance (CSMA/CA) protocol. However, the backoff mechanism that CSMA/CA uses during channel contention can cause unpredictable transmission delay and make it difficult to guarantee the quality of service (QoS) that applications require. Moreover, data packets may still collide, especially in broadcast or non-acknowledgement (NACK) transmissions. The original contributions of this paper are as follows: (1) We model the packet collision probability of broadcast or NACK transmissions in VANET using combinatorics and investigate the potential influence of the miss-my-packets (MMP) problem. (2) Based on the software-defined vehicular network (SDVN) framework and QoS requirements, we propose a novel link-level scheduling strategy that determines the start-sending time of each connection so as to maximize the packet delivery ratio (PDR); maximizing PDR is converted into minimizing the overlap among transmission durations. (3) We propose a transmission scheduling greedy search (TSGS) algorithm to reduce computational complexity. Extensive simulations were run on Veins, a unified platform combining SUMO and OMNeT++. The results show that, compared with random transmission, the proposed algorithm improves PDR by at least 15%, improves collision-avoidance performance by almost 40%, and reduces the MMP ratio by about 3%, while meeting the QoS requirements.
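The first contribution models collision probability combinatorially. As a hedged illustration using a textbook slotted model (an assumption, not the paper's own model), suppose n nodes contend at once with no ACK or retransmission, and each picks one of W backoff slots uniformly at random; a transmission collides when another node picks the same slot. Exact fractions keep the counting visible.

```python
from fractions import Fraction


def tagged_collision_prob(n_nodes, n_slots):
    """P(a given node shares its slot with at least one of the other n_nodes - 1 nodes)."""
    return 1 - Fraction(n_slots - 1, n_slots) ** (n_nodes - 1)


def any_collision_prob(n_nodes, n_slots):
    """P(at least two of the n_nodes pick the same slot), via collision-free assignments."""
    if n_nodes > n_slots:
        return Fraction(1)  # pigeonhole: a collision is certain
    free = Fraction(1)
    for i in range(n_nodes):
        free *= Fraction(n_slots - i, n_slots)  # node i avoids the i slots already taken
    return 1 - free


print(tagged_collision_prob(2, 16), any_collision_prob(3, 16))  # 1/16 23/128
```

Because nothing is acknowledged in this setting, the sender never learns that a collision happened, which is why deciding start times in advance is attractive compared with relying on random backoff.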

Networking and Internet Architecture

Overlap-Minimization Scheduling Strategy for Data Transmission in VANET

421 - Yong Zhang , Mao Ye , Lin Guan 2021

The vehicular ad hoc network (VANET) based on dedicated short-range communication (DSRC) is a distributed communication system in which all nodes share the wireless channel using the carrier sense multiple access with collision avoidance (CSMA/CA) protocol. However, the contention and backoff mechanisms of CSMA/CA often introduce additional delays and data packet collisions, which can make it hard to meet QoS requirements for delay and packet delivery ratio (PDR). Moreover, because security information is distributed in broadcast mode, the sender cannot know whether the receivers have received it successfully. The same problem exists in no-acknowledgement (non-ACK) transmissions in VANET. The probability of packet collisions should therefore be considered in broadcast and non-ACK modes. This paper presents a connection-level scheduling algorithm, overlaid on CSMA/CA, that schedules the start-sending time of each transmission. By converting the objective of reducing collision probability into minimizing the overlap of the connections' transmission durations, the probability of activating backoff can be greatly decreased, which in turn decreases both the delay and the probability of packet collisions. Numerical simulations were conducted on our unified platform combining SUMO, Veins and OMNeT++. The results show that the proposed algorithm can effectively improve PDR and reduce packet collisions in VANET.
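The conversion from collision reduction to overlap minimization can be illustrated with a small greedy scheduler. This Python sketch assumes integer time slots, a fixed scheduling frame, and known transmission durations, and places the longest transmissions first at the start time that adds the least overlap. It is a generic greedy heuristic in the spirit of the approach; the abstracts do not give the details of the authors' algorithm, and the function names are hypothetical.

```python
def overlap(a_start, a_len, b_start, b_len):
    """Length of the intersection of [a_start, a_start + a_len) and [b_start, b_start + b_len)."""
    return max(0, min(a_start + a_len, b_start + b_len) - max(a_start, b_start))


def greedy_schedule(durations, frame):
    """Pick an integer start in [0, frame - d] for each transmission, greedily minimizing overlap."""
    starts = {}
    # longest transmissions first: they are the hardest to fit without overlap
    for i in sorted(range(len(durations)), key=lambda k: -durations[k]):
        d = durations[i]
        if d > frame:
            raise ValueError(f"transmission {i} does not fit in the frame")
        # try every candidate start; min() breaks ties toward the earliest start
        starts[i] = min(
            range(frame - d + 1),
            key=lambda s: sum(overlap(s, d, starts[j], durations[j]) for j in starts),
        )
    return [starts[i] for i in range(len(durations))]


def total_overlap(starts, durations):
    """Sum of pairwise overlaps; zero means no two transmissions are on air together."""
    n = len(starts)
    return sum(
        overlap(starts[i], durations[i], starts[j], durations[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


durations = [3, 2, 1]
starts = greedy_schedule(durations, frame=6)
print(starts, total_overlap(starts, durations))  # [0, 3, 5] 0
```

Trying every start time costs O(frame × n) overlap evaluations per connection; a QoS deadline could be enforced by shrinking each connection's candidate range.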

Networking and Internet Architecture

Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough

223 - Mao Ye , Lemeng Wu , Qiang Liu 2020

Despite the great success of deep learning, recent work shows that large deep neural networks are often highly redundant and can be significantly reduced in size. However, the theoretical question of how much a neural network can be pruned for a specified tolerance of accuracy drop is still open. This paper provides one answer to this question by proposing a greedy optimization based pruning method. The proposed method guarantees that the discrepancy between the pruned network and the original network decays at an exponentially fast rate with respect to the size of the pruned network, under weak assumptions that hold in most practical settings. Empirically, our method improves on prior art in pruning various network architectures, including ResNet and MobileNetV2/V3, on ImageNet.
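The abstract does not spell out the greedy procedure. As a hedged illustration, the NumPy sketch below assumes a layer whose output is the average of its neurons' contributions (a mean-field style layer) and greedily adds, one at a time and with repetition allowed, the neuron that brings the averaged output of the selected subset closest to the full layer on a batch of inputs. The function name and setup are assumptions; the check at the end only exhibits the elementary O(1/k) bound that this kind of greedy averaging always satisfies, not the exponential rate the paper proves under its assumptions.

```python
import numpy as np


def greedy_select(H, k):
    """Greedily choose k neurons (with repetition) to approximate a mean-field layer.

    H: (n_samples, n_neurons) array; column j holds neuron j's contribution on a batch.
    Returns the chosen indices and the mean squared error after each step.
    """
    target = H.mean(axis=1)  # output of the full layer
    approx = np.zeros_like(target)
    chosen, errors = [], []
    for step in range(1, k + 1):
        # averaged output of the subnetwork if neuron j were added next (one column per j)
        cand = (approx[:, None] * (step - 1) + H) / step
        err = np.mean((cand - target[:, None]) ** 2, axis=0)
        j = int(np.argmin(err))
        chosen.append(j)
        errors.append(float(err[j]))
        approx = cand[:, j]
    return chosen, errors


rng = np.random.default_rng(0)
H = np.tanh(rng.normal(size=(200, 50)))  # stand-in neuron outputs, 50 neurons
chosen, errors = greedy_select(H, k=10)
# D2: largest squared distance of a single neuron from the full-layer output
D2 = np.max(np.mean((H - H.mean(axis=1, keepdims=True)) ** 2, axis=0))
print(all(e <= D2 / (i + 1) + 1e-12 for i, e in enumerate(errors)))  # True
```

The bound holds because the target is the mean of all columns, so some candidate always points back toward it; greedy selection can only do better than that candidate.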

Machine Learning; Optimization and Control

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

54 - Mao Ye , Dhruv Choudhary , Jiecao Yu 2020

Large-scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructure cost and carbon footprint in modern data centers. Pruning is an effective technique that reduces both the memory and compute demands of model inference. However, pruning online recommendation systems is challenging because of the continuous shift in the data distribution (a.k.a. non-stationary data). Although incremental training of the full model can adapt to non-stationary data, applying it directly to the pruned model leads to accuracy loss, because the sparsity pattern after pruning needs adjustment to learn new patterns. To the best of our knowledge, this is the first work to provide an in-depth analysis and discussion of pruning online recommendation systems with non-stationary data distributions. Overall, this work makes the following contributions: 1) we present an adaptive dense-to-sparse paradigm equipped with a novel pruning algorithm for pruning a large-scale recommendation system with non-stationary data distribution; 2) we design the pruning algorithm to learn the sparsity across layers automatically, avoiding repeated hand-tuning, which is critical for pruning the heterogeneous architectures of recommendation systems trained with non-stationary data.
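Contribution 2) concerns learning each layer's sparsity automatically instead of hand-tuning it. The abstract does not describe the paper's algorithm, so the NumPy sketch below swaps in a common and simpler technique, global magnitude pruning, in which one threshold over all layers lets per-layer sparsity emerge from the weights themselves. It illustrates that idea only and is not the adaptive dense-to-sparse algorithm; the function name and example layers are hypothetical.

```python
import numpy as np


def global_magnitude_masks(layers, sparsity):
    """0/1 masks that prune the `sparsity` fraction of smallest-magnitude weights across all layers."""
    mags = np.concatenate([np.abs(w).ravel() for w in layers])
    k = int(round(sparsity * mags.size))  # total number of weights to prune
    if k == 0:
        return [np.ones_like(w) for w in layers]
    threshold = np.partition(mags, k - 1)[k - 1]  # k-th smallest magnitude overall
    return [(np.abs(w) > threshold).astype(w.dtype) for w in layers]


rng = np.random.default_rng(0)
# two hypothetical layers with very different weight scales
layers = [0.1 * rng.normal(size=(256, 64)), rng.normal(size=(64, 16))]
masks = global_magnitude_masks(layers, sparsity=0.5)
per_layer = [1 - float(m.mean()) for m in masks]
print([round(s, 2) for s in per_layer])  # the small-scale layer ends up far sparser
```

In an online setting, such masks would have to be recomputed as the weights keep training on shifting data, which is the adjustment of the sparsity pattern that the abstract says a fixed pruned model lacks.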

Machine Learning

TDMP: Reliable Target-Driven and Mobility-Prediction-Based Routing Protocol in Complex VANET

186 - Mao Ye , Lin Guan , Mohammed Quddus 2020

Vehicle-to-everything (V2X) communication in the vehicular ad hoc network (VANET), an infrastructure-free mechanism, has emerged as a crucial component of the advanced Intelligent Transport System (ITS) for special information transmission and inter-vehicular communication. One of the main research challenges in VANET is the design and implementation of routing protocols that enable V2X communication with reliable end-to-end connectivity and efficient packet transmission. The constantly changing movement of road vehicles poses a significant threat to the accuracy and reliability of packet delivery in VANET. Position-based routing protocols have therefore become the predominant approach in VANET, because they cope effectively with rapid changes in vehicle movement. However, existing routing protocols have limitations such as (i) inaccuracy in highly dynamic network topologies, (ii) defective link-state estimation, and (iii) poor movement prediction in heterogeneous road layouts. In this paper, we develop a target-driven and mobility prediction (TDMP) based routing protocol for the high-speed mobility and dynamic topology of vehicles, fluctuating traffic flow, and diverse road layouts in VANET. The primary idea of TDMP is that the driver's destination is included in the mobility prediction to assist the routing protocol. Whereas existing geographic routing protocols mainly forward packets greedily to the next hop based on its current position and a partial road layout, TDMP improves packet transmission by dynamically estimating inter-vehicle link status and predicting vehicle positions, taking into account fluctuating mobility and the global road layout.
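TDMP's own estimators are not given in the abstract. A common way to estimate inter-vehicle link status, used here as an assumed stand-in, is the link expiration time: if two vehicles keep their current velocities, how long until their distance exceeds the radio range R? The Python sketch solves |p + v t| = R for the relative position p and relative velocity v, then picks, among neighbors whose link is predicted to last long enough, the one closest to the destination. The function names, the threshold, and the constant-velocity assumption are all hypothetical simplifications; TDMP additionally uses the driver's destination and the road layout.

```python
import math


def link_lifetime(p_i, v_i, p_j, v_j, radio_range):
    """Seconds until vehicles i and j, at constant velocity, move out of radio range."""
    px, py = p_j[0] - p_i[0], p_j[1] - p_i[1]  # relative position
    vx, vy = v_j[0] - v_i[0], v_j[1] - v_i[1]  # relative velocity
    r2 = radio_range ** 2
    if px * px + py * py > r2:
        return 0.0  # not connected right now
    vv = vx * vx + vy * vy
    if vv == 0:
        return math.inf  # identical velocities: the distance never changes
    pv = px * vx + py * vy
    # positive root of vv*t^2 + 2*pv*t + (|p|^2 - R^2) = 0; the discriminant is >= 0 here
    disc = pv * pv - vv * (px * px + py * py - r2)
    return (-pv + math.sqrt(disc)) / vv


def pick_next_hop(me, neighbors, dest, radio_range, min_lifetime):
    """Among neighbors whose link should last >= min_lifetime, return the id closest to dest."""
    best_id, best_dist = None, math.inf
    for node_id, pos, vel in neighbors:
        if link_lifetime(me[0], me[1], pos, vel, radio_range) < min_lifetime:
            continue  # link likely to break before the packet is delivered
        dist = math.dist(pos, dest)
        if dist < best_dist:
            best_id, best_dist = node_id, dist
    return best_id


me = ((0.0, 0.0), (20.0, 0.0))  # position (m), velocity (m/s)
neighbors = [
    ("a", (200.0, 0.0), (-20.0, 0.0)),  # closer to dest but oncoming: link lasts ~11 s
    ("b", (150.0, 10.0), (22.0, 0.0)),  # same direction: link lasts ~50 s
]
print(pick_next_hop(me, neighbors, dest=(1000.0, 0.0), radio_range=250.0, min_lifetime=15.0))  # b
```

A purely greedy geographic rule would choose "a" here; weighing the predicted link lifetime is what steers the choice to the more stable neighbor.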

Networking and Internet Architecture

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

79 - Denny Zhou , Mao Ye , Chen Chen 2020

To deploy a deep learning model in production, it needs to be both accurate and compact to meet latency and memory constraints. This usually results in a network that is deep (to ensure performance) and yet thin (to improve computational efficiency). In this paper, we propose an efficient method to train a deep thin network with a theoretical guarantee. Our method is motivated by model compression and consists of three stages. First, we sufficiently widen the deep thin network and train it until convergence. Then, we use this well-trained deep wide network to warm up (or initialize) the original deep thin network. This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network layer by layer. Finally, we further fine-tune this already well-initialized deep thin network. The theoretical guarantee is established using neural mean field analysis and demonstrates the advantage of our layerwise imitation approach over backpropagation. We also conduct large-scale empirical experiments to validate the proposed method. Trained with our method, ResNet50 can outperform ResNet101, and BERT Base can be comparable to BERT Large, when ResNet101 and BERT Large are trained under the standard training procedures in the literature.
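The second stage, layerwise imitation, can be sketched for a stack of residual MLP blocks. In this NumPy toy (an assumption-laden illustration, not the paper's training procedure), each thin block keeps a random subset of the corresponding wide block's hidden units and fits its output matrix by least squares, so that, fed with the thin network's own previous output, it reproduces the wide network's intermediate output at that layer. The paper trains with gradient-based optimization; least squares is used here only because it makes each per-layer fit closed-form. All names and sizes are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def block(x, W1, W2):
    # residual MLP block x + relu(x W1) W2; its hidden width is W1.shape[1]
    return x + relu(x @ W1) @ W2


def layerwise_imitation(wide_blocks, thin_width, X, rng):
    """Fit thin blocks one layer at a time to mimic the wide network's intermediate outputs."""
    thin_blocks, errors, baselines = [], [], []
    h_wide, h_thin = X, X
    for W1, W2 in wide_blocks:
        target = block(h_wide, W1, W2)  # wide network's output at this layer
        keep = rng.choice(W1.shape[1], size=thin_width, replace=False)
        V1 = W1[:, keep]                # thin block reuses a subset of hidden units
        feats = relu(h_thin @ V1)
        # least-squares fit of what the thin block must add to its own input
        V2, *_ = np.linalg.lstsq(feats, target - h_thin, rcond=None)
        baselines.append(float(np.mean((target - h_thin) ** 2)))  # error if V2 were zero
        h_thin = block(h_thin, V1, V2)
        errors.append(float(np.mean((h_thin - target) ** 2)))
        thin_blocks.append((V1, V2))
        h_wide = target
    return thin_blocks, errors, baselines


rng = np.random.default_rng(0)
dim, wide_width, depth = 16, 256, 3
wide = [(rng.normal(size=(dim, wide_width)) / np.sqrt(dim),
         rng.normal(size=(wide_width, dim)) / np.sqrt(wide_width)) for _ in range(depth)]
X = rng.normal(size=(512, dim))
thin, errors, baselines = layerwise_imitation(wide, thin_width=32, X=X, rng=rng)
```

The third stage would then fine-tune the initialized thin blocks end to end with ordinary backpropagation.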

Machine Learning

SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions

72 - Mao Ye , Chengyue Gong , Qiang Liu 2020

State-of-the-art NLP models can often be fooled by transformations that humans barely notice, such as synonym substitution. For security reasons, it is critically important to develop models with certified robustness, which provably guarantee that the prediction cannot be altered by any possible synonym substitution. In this work, we propose a certified robustness method based on a new randomized smoothing technique, which constructs a stochastic ensemble by applying random word substitutions to the input sentences and leverages the statistical properties of the ensemble to provably certify robustness. Our method is simple and structure-free in that it only requires black-box queries of the model outputs, and hence can be applied to any pre-trained model (such as BERT) and any type of model (word-level or subword-level). Our method significantly outperforms recent state-of-the-art methods for certified robustness on both the IMDB and Amazon text classification tasks. To the best of our knowledge, this is the first work to achieve certified robustness on large systems such as BERT with practically meaningful certified accuracy.
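The smoothing step can be sketched in a few lines of Python: replace each word by a random member of its synonym set, query the base model as a black box, and take a majority vote. This is a minimal sketch assuming a toy synonym table and toy classifiers (all hypothetical); it omits the statistical bound that turns the vote margin into a robustness certificate.

```python
import random
from collections import Counter


def perturb(words, synonyms, rng):
    # each word becomes a uniformly random member of its synonym set (the word itself included)
    return [rng.choice(synonyms.get(w, [w])) for w in words]


def smoothed_predict(classifier, words, synonyms, n_samples=1000, seed=0):
    """Majority vote of a black-box classifier over random synonym substitutions."""
    rng = random.Random(seed)
    votes = Counter(classifier(perturb(words, synonyms, rng)) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label, count / n_samples  # predicted label and its empirical vote share


# hypothetical synonym sets and base classifiers
synonyms = {w: ["good", "great", "fine"] for w in ("good", "great", "fine")}


def lenient(ws):
    return "pos" if {"good", "great", "fine"} & set(ws) else "neg"


def brittle(ws):
    return "pos" if "good" in ws else "neg"


print(smoothed_predict(lenient, "a good movie".split(), synonyms))  # ('pos', 1.0)
print(smoothed_predict(brittle, "a good movie".split(), synonyms))
```

Only black-box calls to `classifier` are needed, which is what makes the approach structure-free; the brittle model's vote share of roughly 2/3 for "neg" shows how the ensemble, rather than any single substituted sentence, determines the smoothed prediction.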

Machine Learning; Computation and Language; Cryptography and Security

Unsupervised Feature Selection via Multi-step Markov Transition Probability

69 - Yan Min , Mao Ye , Liang Tian 2020

Feature selection is a widely used dimension reduction technique that selects feature subsets, valued for its interpretability. Many methods have been proposed and have achieved good results, mainly by considering the relationships between adjacent data points. However, possible associations between data pairs that may not be adjacent are always neglected. Unlike previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probability to describe the relation between any pair of data points. Two approaches, from the positive and negative viewpoints respectively, are employed to preserve the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached within a certain number of steps describes the relation between two points, and the features that best preserve the compact data structure are selected. From the negative viewpoint, the minimum transition probability that can be reached within a certain number of steps describes the relation between two points, and, conversely, the features that least maintain the loose data structure are selected. The two approaches can also be combined, yielding three algorithms in total. Our main contributions are a novel feature selection approach that uses multi-step transition probability to characterize the data structure, and three algorithms, derived from the positive and negative viewpoints, for preserving the data structure. We compare our approach with state-of-the-art methods on eight real-world data sets, and the experimental results show that MMFS is effective for unsupervised feature selection.
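One reading of the multi-step relations is sketched below in NumPy: build a row-stochastic transition matrix P from Gaussian affinities, then take the elementwise maximum (positive viewpoint) and minimum (negative viewpoint) of P, P², ..., Pˢ. The Gaussian kernel, bandwidth, and the per-feature score (how closely one feature alone reproduces the full-data relations) are hypothetical simplifications, not one of the paper's three algorithms.

```python
import numpy as np


def transition_matrix(X, sigma=1.0):
    """Row-stochastic P from Gaussian affinities (assumed kernel), with self-loops removed."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)


def multistep_relations(P, steps):
    """Elementwise max (positive view) and min (negative view) over P^1 .. P^steps."""
    Ps = P.copy()
    best, worst = P.copy(), P.copy()
    for _ in range(steps - 1):
        Ps = Ps @ P
        best = np.maximum(best, Ps)
        worst = np.minimum(worst, Ps)
    return best, worst


def feature_scores(X, steps=3, sigma=1.0):
    """Hypothetical score: how closely each single feature reproduces the full-data relations."""
    full_best, full_worst = multistep_relations(transition_matrix(X, sigma), steps)
    scores = []
    for f in range(X.shape[1]):
        b, w = multistep_relations(transition_matrix(X[:, [f]], sigma), steps)
        scores.append(-np.linalg.norm(b - full_best) - np.linalg.norm(w - full_worst))
    return np.array(scores)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
scores = feature_scores(X, steps=3)
ranking = np.argsort(-scores)  # higher score: better preserves the multi-step relations
```

Using only `best` or only `worst` in the score corresponds to a single viewpoint, and summing both is one way to combine them.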

Machine Learning
