Research papers and master's and doctoral theses published by Hang Zhou

Deep 3D Mesh Watermarking with Self-Adaptive Robustness

511 - Feng Wang , Hang Zhou , Han Fang 2021

Robust 3D mesh watermarking is a traditional research topic in computer graphics, which provides an efficient solution to the copyright protection for 3D meshes. Traditionally, researchers need to manually design watermarking algorithms to achieve sufficient robustness for the actual application scenarios. In this paper, we propose the first deep learning-based 3D mesh watermarking framework, which can solve this problem once and for all. In detail, we propose an end-to-end network, consisting of a watermark embedding sub-network, a watermark extracting sub-network and attack layers. We adopt the topology-agnostic graph convolutional network (GCN) as the basic convolution operation for 3D meshes, so our network is not limited to registered meshes (which share a fixed topology). For the specific application scenario, we can integrate the corresponding attack layers to guarantee adaptive robustness against possible attacks. To ensure the visual quality of watermarked 3D meshes, we design a curvature-based loss function to constrain the local geometry smoothness of watermarked meshes. Experimental results show that the proposed method can achieve more universal robustness and faster watermark embedding than baseline methods while guaranteeing comparable visual quality.

Computer Graphics

Probabilistic Analysis of Euclidean Capacitated Vehicle Routing

138 - Claire Mathieu , Hang Zhou 2021

We give a probabilistic analysis of the unit-demand Euclidean capacitated vehicle routing problem in the random setting, where the input distribution consists of $n$ unit-demand customers modeled as independent, identically distributed uniform random points in the two-dimensional plane. The objective is to visit every customer using a set of routes of minimum total length, such that each route visits at most $k$ customers, where $k$ is the capacity of a vehicle. All of the following results are in the random setting and hold asymptotically almost surely. The best known polynomial-time approximation for this problem is the iterated tour partitioning (ITP) algorithm, introduced in 1985 by Haimovich and Rinnooy Kan. They showed that the ITP algorithm is near-optimal when $k$ is either $o(\sqrt{n})$ or $\omega(\sqrt{n})$, and they asked whether the ITP algorithm was also effective in the intermediate range. In this work, we show that when $k=\sqrt{n}$, the ITP algorithm is at best a $(1+c_0)$-approximation for some positive constant $c_0$. On the other hand, the approximation ratio of the ITP algorithm was known to be at most $0.995+\alpha$ due to Bompadre, Dror, and Orlin, where $\alpha$ is the approximation ratio of an algorithm for the traveling salesman problem. In this work, we improve the upper bound on the approximation ratio of the ITP algorithm to $0.915+\alpha$. Our analysis is based on a new lower bound on the optimal cost for the metric capacitated vehicle routing problem, which may be of independent interest.

Data Structures and Algorithms
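The iterated tour partitioning idea named in the abstract above can be illustrated with a minimal Python sketch. It assumes a traveling salesman tour over the customers is already available (for example from some approximation algorithm), cuts that tour into consecutive segments of at most $k$ customers, connects each segment to the depot, and keeps the cheapest of the $k$ possible cut offsets. The function names `route_cost` and `iterated_tour_partitioning`, the depot argument, and the choice to try every offset deterministically are assumptions made for illustration, not details taken from the paper.

```python
import math


def route_cost(depot, customers, route):
    """Length of depot -> customers in route order -> depot."""
    if not route:
        return 0.0
    cost = math.dist(depot, customers[route[0]])
    for a, b in zip(route, route[1:]):
        cost += math.dist(customers[a], customers[b])
    cost += math.dist(customers[route[-1]], depot)
    return cost


def iterated_tour_partitioning(depot, customers, tour, k):
    """Partition a given tour into routes of at most k customers.

    depot     -- (x, y) coordinates of the depot (assumed input)
    customers -- list of (x, y) customer coordinates
    tour      -- customer indices in the order of a precomputed TSP tour
    k         -- vehicle capacity (maximum customers per route)
    """
    if k < 1:
        raise ValueError("capacity k must be at least 1")
    if not tour:
        return [], 0.0

    best_routes, best_cost = None, math.inf
    n = len(tour)
    for offset in range(min(k, n)):
        routes = []
        # A short first segment shifts where the remaining cuts fall.
        if offset > 0:
            routes.append(tour[:offset])
        for start in range(offset, n, k):
            routes.append(tour[start:start + k])
        cost = sum(route_cost(depot, customers, r) for r in routes)
        if cost < best_cost:
            best_routes, best_cost = routes, cost
    return best_routes, best_cost
```

Trying all offsets costs only a factor of $k$ over a single partition, and the quality of the result depends on the quality of the input tour, which is where the $\alpha$ term in the bounds above comes from.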

Causal Attention for Unbiased Visual Recognition

297 - Tan Wang , Chang Zhou , Qianru Sun 2021

The attention module does not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature that is invariant to different backgrounds. This is because the confounders trick the attention into capturing spurious correlations that benefit the prediction when the training and testing data are IID (independent and identically distributed) but harm the prediction when the data are OOD (out-of-distribution). The sole fundamental solution to learn causal attention is by causal intervention, which requires additional annotations of the confounders, e.g., a dog model is learned within grass+dog and road+dog respectively, so the grass and road contexts will no longer confound the dog recognition. However, such annotation is not only prohibitively expensive, but also inherently problematic, as the confounders are elusive in nature. In this paper, we propose a causal attention module (CaaM) that self-annotates the confounders in an unsupervised fashion. In particular, multiple CaaMs can be stacked and integrated in conventional attention CNNs and self-attention Vision Transformers. In OOD settings, deep models with CaaM outperform those without it significantly; even in IID settings, the attention localization is also improved by CaaM, showing great potential in applications that require robust visual saliency. Code is available at https://github.com/Wangt-CN/CaaM.

Computer Vision and Pattern Recognition

MS-KD: Multi-Organ Segmentation with Multiple Binary-Labeled Datasets

414 - Shixiang Feng , Yuhang Zhou , Xiaoman Zhang 2021

Annotating multiple organs in 3D medical images is time-consuming and costly. Meanwhile, there exist many single-organ datasets with one specific organ annotated. This paper investigates how to learn a multi-organ segmentation model leveraging a set of binary-labeled datasets. A novel Multi-teacher Single-student Knowledge Distillation (MS-KD) framework is proposed, where the teacher models are pre-trained single-organ segmentation networks, and the student model is a multi-organ segmentation network. Considering that each teacher focuses on different organs, a region-based supervision method, consisting of logits-wise supervision and feature-wise supervision, is proposed. Each teacher supervises the student in two regions, the organ region where the teacher is considered as an expert and the background region where all teachers agree. Extensive experiments on three public single-organ datasets and a multi-organ dataset have demonstrated the effectiveness of the proposed MS-KD framework.

Computer Vision and Pattern Recognition

On the Robustness of Domain Adaption to Adversarial Attacks

135 - Liyuan Zhang , Yuhang Zhou , Lei Zhang 2021

State-of-the-art deep neural networks (DNNs) have been shown to perform excellently on unsupervised domain adaption (UDA). However, recent work shows that DNNs perform poorly when attacked by adversarial samples, where these attacks are implemented by simply adding small disturbances to the original images. Although plenty of work has focused on this, as far as we know, there is no systematic research on the robustness of unsupervised domain adaption models. Hence, we discuss the robustness of unsupervised domain adaption against adversarial attacks for the first time. We benchmark various settings of adversarial attack and defense in domain adaption, and propose a cross-domain attack method based on pseudo labels. Most importantly, we analyze the impact of different datasets, models, attack methods and defense methods. Our work demonstrates the limited robustness of unsupervised domain adaption models, and we hope it will encourage the community to pay more attention to improving the robustness of these models against attacks.

Computer Vision and Pattern Recognition

Learning to Rehearse in Long Sequence Memorization

64 - Zhu Zhang , Chang Zhou , Jianxin Ma 2021

Existing reasoning tasks often make the important assumption that the input contents can always be accessed while reasoning, requiring unlimited storage resources and suffering from severe time delay on long sequences. To achieve efficient reasoning on long sequences with limited storage resources, memory augmented neural networks introduce a human-like write-read memory to compress and memorize the long input sequence in one pass, trying to answer subsequent queries based only on the memory. But they have two serious drawbacks: 1) they continually update the memory from current information and inevitably forget the early contents; 2) they do not distinguish what information is important and treat all contents equally. In this paper, we propose the Rehearsal Memory (RM) to enhance long-sequence memorization by self-supervised rehearsal with a history sampler. To alleviate the gradual forgetting of early information, we design self-supervised rehearsal training with recollection and familiarity tasks. Further, we design a history sampler to select informative fragments for rehearsal training, making the memory focus on the crucial information. We evaluate the performance of our rehearsal memory on the synthetic bAbI task and several downstream tasks, including text/video question answering and recommendation on long sequences.

Machine Learning

Controllable Gradient Item Retrieval

130 - Haonan Wang , Chang Zhou , Carl Yang 2021

In this paper, we identify and study an important problem of gradient item retrieval. We define the problem as retrieving a sequence of items with a gradual change on a certain attribute, given a reference item and a modification text. For example, after a customer sees a white dress, she/he wants to buy a similar one that is more floral. The extent of "more floral" is subjective, so presenting a single floral dress is unlikely to satisfy the customer's needs. A better way is to present a sequence of products with increasingly floral attributes based on the white dress, and allow the customer to select the most satisfactory one from the sequence. Existing item retrieval methods mainly focus on whether the target items appear at the top of the retrieved sequence, but ignore the demand for retrieving a sequence of products with gradual change on a certain attribute. To deal with this problem, we propose a weakly-supervised method that can learn a disentangled item representation from user-item interaction data and ground the semantic meaning of attributes to dimensions of the item representation. Our method takes a reference item and a modification as a query. During inference, we start from the reference item and walk along the direction of the modification in the item representation space to retrieve a sequence of items in a gradient manner. We demonstrate that our proposed method can achieve disentanglement through weak supervision. Besides, we empirically show that an item sequence retrieved by our method changes gradually on an indicated attribute and, in the item retrieval task, our method outperforms existing approaches on three different datasets.

Information Retrieval
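The inference step described in the abstract above, starting at the reference item and walking along the modification direction in representation space, can be pictured with a toy Python sketch. Everything here is assumed for illustration: the function name `gradient_retrieve`, the fixed step size, the Euclidean nearest-neighbour lookup, and the hand-made two-dimensional embeddings in `TOY_ITEMS`. The paper learns its representation and attribute directions from user-item interaction data, which this sketch does not attempt.

```python
import math

# Hypothetical 2-D embeddings: the first axis is assumed to encode "floral".
TOY_ITEMS = {
    "white_dress": (0.0, 0.0),
    "light_floral_dress": (1.0, 0.0),
    "floral_dress": (2.0, 0.0),
    "very_floral_dress": (3.0, 0.0),
    "red_dress": (0.0, 5.0),
}


def gradient_retrieve(items, ref_id, direction, step, num_steps):
    """Return items found by stepping from the reference along `direction`.

    items     -- dict mapping item id to an embedding tuple
    ref_id    -- id of the reference item (the starting point)
    direction -- vector assumed to represent the requested modification
    step      -- distance moved along the direction per retrieved item
    num_steps -- maximum number of items to retrieve
    """
    ref = items[ref_id]
    seen = {ref_id}
    retrieved = []
    for t in range(1, num_steps + 1):
        # Move t steps away from the reference along the modification.
        query = [r + t * step * d for r, d in zip(ref, direction)]
        candidates = [i for i in items if i not in seen]
        if not candidates:
            break
        # Pick the unseen item whose embedding is closest to the query.
        nearest = min(candidates, key=lambda i: math.dist(items[i], query))
        retrieved.append(nearest)
        seen.add(nearest)
    return retrieved
```

Because each query point lies further along the same direction, the retrieved items change gradually on the attribute tied to that direction, while items that differ on other dimensions (such as `red_dress` here) stay far from every query.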

Intrinsic Wasserstein Correlation Analysis

159 - Hang Zhou , Zhenhua Lin , Fang Yao 2021

We develop a framework of canonical correlation analysis for distribution-valued functional data within the geometry of Wasserstein spaces. Specifically, we formulate an intrinsic concept of correlation between random distributions, propose estimation methods based on functional principal component analysis (FPCA) and Tikhonov regularization, respectively, for the correlation and its corresponding weight functions, and establish the minimax convergence rates of the estimators. The key idea is to extend the framework of tensor Hilbert spaces to distribution-valued functional data to overcome the challenging issue raised by nonlinearity of Wasserstein spaces. The finite-sample performance of the proposed estimators is illustrated via simulation studies, and the practical merit is demonstrated via a study on the association of distributions of brain activities between two brain regions.

Methodology

M6-UFC: Unifying Multi-Modal Controls for Conditional Image Synthesis

73 - Zhu Zhang , Jianxin Ma , Chang Zhou 2021

Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations. In this paper, instead of investigating these control signals separately, we propose a new two-stage architecture, UFC-BERT, to unify any number of multi-modal controls. In UFC-BERT, both the diverse control signals and the synthesized image are uniformly represented as a sequence of discrete tokens to be processed by a Transformer. Different from existing two-stage autoregressive approaches such as DALL-E and VQGAN, UFC-BERT adopts non-autoregressive generation (NAR) at the second stage to enhance the holistic consistency of the synthesized image, to support preserving specified image blocks, and to improve the synthesis speed. Further, we design a progressive algorithm that iteratively improves the non-autoregressively generated image, with the help of two estimators developed for evaluating the compliance with the controls and evaluating the fidelity of the synthesized image, respectively. Extensive experiments on a newly collected large-scale clothing dataset M2C-Fashion and a facial dataset Multi-Modal CelebA-HQ verify that UFC-BERT can synthesize high-fidelity images that comply with flexible multi-modal controls.

Computer Vision and Pattern Recognition

Femtosecond dynamics of a polariton bosonic cascade at room temperature

468 - Fei Chen , Hang Zhou , Hui Li 2021

Whispering gallery modes in a microwire are characterized by a nearly equidistant energy spectrum. In the strong exciton-photon coupling regime, this system represents a bosonic cascade: a ladder of discrete energy levels that sustains stimulated transitions between neighboring steps. In this work, using a femtosecond angle-resolved spectroscopic imaging technique, the ultrafast dynamics of polaritons in a bosonic cascade based on a one-dimensional ZnO whispering gallery microcavity is explicitly visualized. A clear ladder-form build-up process from higher to lower energy branches of the polariton condensates is observed, which is well reproduced by modeling using rate equations. Moreover, the polariton parametric scattering dynamics are distinguished on a timescale of hundreds of femtoseconds. Our understanding of the femtosecond condensation and scattering dynamics paves the way towards ultrafast coherent control of polaritons at room temperature, which makes it promising for high-speed all-optical integrated applications.

Optics; Mesoscale and Nanoscale Physics
