Ziyi Liu, Lei Li, Mengxuan Zhang (2021)
The Multi-Constraint Shortest Path (MCSP) problem aims to find the shortest path between two nodes in a network subject to a given constraint set. It is typically processed as a skyline path problem. However, the number of intermediate skyline paths grows as the network size and the number of constraints increase, which causes a dramatic growth in computational cost and makes existing index-based methods hardly capable of obtaining complete exact results. In this paper, we propose a novel high-dimensional skyline path concatenation method to avoid the expensive skyline path search, which then supports the efficient construction of a hop labeling index for MCSP queries. Specifically, a set of insightful observations and techniques is proposed to improve the efficiency of concatenating two skyline path sets, an n-Cube technique is designed to prune the concatenation space among multiple hops, and a constraint pruning method is used to avoid unnecessary computation. Furthermore, to scale up to larger networks, we propose a novel forest hop labeling scheme that enables parallel label construction from different network partitions. Our approach is the first method that achieves both accuracy and efficiency for MCSP query answering. Extensive experiments on real-life road networks demonstrate the superiority of our method over state-of-the-art solutions.
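For readers unfamiliar with skyline paths, the following minimal Python sketch (not the paper's algorithm) illustrates the Pareto-dominance test that defines a skyline: a path survives only if no other path is at least as good in every cost dimension and strictly better in at least one.

```python
# Minimal sketch: skyline filtering of multi-cost paths by Pareto dominance.
# Each path carries a cost vector, e.g. (distance, toll, time).
from typing import List, Tuple

Cost = Tuple[float, ...]

def dominates(a: Cost, b: Cost) -> bool:
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(paths: List[Cost]) -> List[Cost]:
    """Keep only non-dominated cost vectors."""
    return [p for p in paths if not any(dominates(q, p) for q in paths if q != p)]

print(skyline([(4, 2, 7), (3, 5, 6), (5, 3, 8), (3, 2, 9)]))
# (5, 3, 8) is dominated by (4, 2, 7), so it is filtered out.
```

The combinatorics the paper fights is visible here: the skyline set, and hence the cost of concatenating two such sets hop by hop, can grow quickly with the number of cost dimensions.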
Encouraging progress has been made towards Visual Question Answering (VQA) in recent years, but it is still challenging to enable VQA models to adaptively generalize to out-of-distribution (OOD) samples. Intuitively, recompositions of existing visual concepts (i.e., attributes and objects) can generate compositions unseen in the training set, which promotes the generalization of VQA models to OOD samples. In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly. X-GGM leverages graph generative modeling to iteratively generate a relation matrix and node representations for a predefined graph whose nodes are attribute-object pairs. Furthermore, to alleviate the unstable training issue in graph generative modeling, we propose a gradient distribution consistency loss to constrain the data distribution under adversarial perturbations and the generated distribution to be consistent. The baseline VQA model (LXMERT) trained with the X-GGM scheme achieves state-of-the-art OOD performance on two standard VQA OOD benchmarks, i.e., VQA-CP v2 and GQA-OOD. Extensive ablation studies demonstrate the effectiveness of X-GGM components.
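As a rough illustration of the graph-generative step described above, the NumPy sketch below (a toy under stated assumptions, not the X-GGM architecture; the bilinear scorer W and update weights U stand in for learned modules) generates a relation matrix from attribute-object node embeddings and uses it to update the node representations.

```python
# Hedged sketch: one generative step over attribute-object nodes.
# A relation matrix is scored from node embeddings, then nodes are
# updated by message passing over that generated matrix.
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 32                        # attribute-object nodes, embedding dim
X = rng.normal(size=(N, D))         # node representations (placeholder)
W = rng.normal(size=(D, D)) * 0.1   # bilinear relation scorer (learned in practice)
U = rng.normal(size=(D, D)) * 0.1   # node-update weights (learned in practice)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

A = softmax(X @ W @ X.T)            # generated relation (adjacency) matrix
X_new = np.tanh(A @ X @ U)          # updated node representations
```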
Deep convolutional neural network (DCNN)-aided high dynamic range (HDR) imaging has recently received a lot of attention, and the quality of DCNN-generated HDR images has surpassed that of traditional counterparts. However, DCNNs tend to be computationally intensive and power-hungry. To address this challenge, we propose LightFuse, a lightweight CNN-based algorithm for extreme dual-exposure image fusion that can be implemented on various embedded computing platforms with limited power and hardware resources. Two sub-networks are utilized: a GlobalNet (G) and a DetailNet (D). The goal of G is to learn the global illumination information on the spatial dimension, whereas D aims to enhance local details on the channel dimension. Both G and D are based solely on depthwise convolution (D Conv) and pointwise convolution (P Conv) to reduce the required parameters and computations. Experimental results show that the proposed technique can generate HDR images with plausible details in extremely exposed regions. Our PSNR score exceeds the other state-of-the-art approaches by 1.2 to 1.6 times, with a 1.4 to 20 times reduction in FLOPs and parameters compared with others.
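The parameter savings come from replacing standard convolutions with depthwise plus pointwise ones. The PyTorch sketch below shows the generic depthwise-separable pattern; the channel sizes and activation are illustrative assumptions, not LightFuse's actual configuration.

```python
# Minimal sketch of a depthwise-separable block: a depthwise conv
# (one spatial filter per channel) followed by a pointwise 1x1 conv
# (cross-channel mixing), which together approximate a full conv at
# a fraction of the parameters and FLOPs.
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # groups=in_ch makes the 3x3 conv depthwise
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        # 1x1 pointwise conv mixes channels without extra spatial cost
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.pointwise(self.depthwise(x)))

block = DepthwiseSeparable(6, 16)        # e.g. two stacked RGB exposures -> 16 maps
y = block(torch.randn(1, 6, 64, 64))     # dummy dual-exposure input
print(y.shape)                           # torch.Size([1, 16, 64, 64])
```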
Ziyi Liu, Le Wang, Wei Tang (2021)
Weakly-supervised Temporal Action Localization (WS-TAL) methods learn to localize the temporal starts and ends of action instances in a video under only video-level supervision. Existing WS-TAL methods rely on deep features learned for action recognition. However, due to the mismatch between classification and localization, these features cannot distinguish the frequently co-occurring contextual background (i.e., the context) from the actual action instances. We term this challenge action-context confusion, and it adversely affects action localization accuracy. To address this challenge, we introduce a framework that learns two feature subspaces, one for actions and one for their context. By explicitly accounting for action visual elements, action instances can be localized more precisely without distraction from the context. To facilitate learning these two feature subspaces with only video-level categorical labels, we leverage predictions from both spatial and temporal streams for snippet grouping. In addition, an unsupervised learning task is introduced to make the proposed module focus on mining temporal information. The proposed approach outperforms state-of-the-art WS-TAL methods on three benchmarks, i.e., the THUMOS14, ActivityNet v1.2, and v1.3 datasets.
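As a hedged sketch of the two-subspace idea (illustrative only; the heads and the overlap penalty are assumptions, not the paper's module or objective), one can project each snippet feature through two learned heads and discourage the resulting features from overlapping:

```python
# Toy sketch: two learned projections of snippet features, one per subspace.
import torch
import torch.nn as nn

D, K = 1024, 128                  # snippet feature dim, subspace dim (assumed)
action_proj = nn.Linear(D, K)     # learned action-subspace projection
context_proj = nn.Linear(D, K)    # learned context-subspace projection

snippets = torch.randn(200, D)    # T snippet features from a video (placeholder)
f_action = action_proj(snippets)      # action-subspace features
f_context = context_proj(snippets)    # context-subspace features

# One common way to keep the subspaces apart: penalize per-snippet feature
# overlap (the paper may use a different separation objective).
ortho = (f_action * f_context).sum(dim=1).pow(2).mean()
```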
Ziyi Liu, Le Wang, Qilin Zhang (2021)
The goal of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the video-level classification task. This strategy frequently confuses context with the actual action in the localization result. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been largely ignored in the literature. In this paper, we introduce an Action-Context Separation Network (ACSNet) that explicitly takes context into account for accurate action localization. It consists of two branches (i.e., the Foreground-Background branch and the Action-Context branch). The Foreground-Background branch first distinguishes foreground from background within the entire video, while the Action-Context branch further separates the foreground into action and context. We associate video snippets with two latent components (i.e., a positive component and a negative component), whose different combinations can effectively characterize foreground, action, and context. Furthermore, we introduce extended labels with auxiliary context categories to facilitate the learning of action-context separation. Experiments on the THUMOS14 and ActivityNet v1.2/v1.3 datasets demonstrate that ACSNet outperforms existing state-of-the-art WS-TAL methods by a large margin.
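A toy, explicitly hypothetical reading of the latent-component combinations (the components below are random placeholders; in ACSNet they are learned, and the actual combination rules may differ):

```python
# Hypothetical sketch: per-snippet affinities with a positive and a negative
# latent component, combined into foreground / action / context scores.
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 128                      # snippets, feature dim
feats = rng.normal(size=(T, D))      # snippet features (placeholder)
pos = rng.normal(size=D)             # positive latent component (learned in practice)
neg = rng.normal(size=D)             # negative latent component (learned in practice)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a_pos = sigmoid(feats @ pos)         # affinity with the positive component
a_neg = sigmoid(feats @ neg)         # affinity with the negative component

foreground = np.maximum(a_pos, a_neg)   # either component -> foreground
action     = a_pos * (1 - a_neg)        # positive without negative -> action
context    = a_neg * (1 - a_pos)        # negative without positive -> context
```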
Jie Yang, Mengchen Lin, Ziyi Liu (2021)
Wide dynamic range (WDR) image tone mapping is in high demand in many applications such as film production, security monitoring, and photography. It is especially crucial for mobile devices: most images taken today come from mobile phones, so the technology is in strong demand in the consumer market and essential for a good user experience. However, high-quality, high-performance WDR image tone mapping implementations are rarely found on the mobile end. In this paper, we introduce a high-performance, mobile-end WDR image tone mapping implementation. It leverages the tone mapping results of multiple receptive fields and calculates a suitable value for each pixel. The use of the integral image and the integral histogram significantly reduces the required computation, and GPU parallel computation further increases the processing speed. The experimental results indicate that our implementation can process a high-resolution WDR image within a second on mobile devices while producing appealing image quality.
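The integral image is what makes multiple receptive fields cheap: after one cumulative-sum pass, the sum over any axis-aligned window costs four lookups, so box statistics of any radius are O(1) per pixel. A minimal NumPy sketch of this standard trick:

```python
# Minimal sketch of the integral-image (summed-area table) trick.
import numpy as np

img = np.random.rand(480, 640)

# Zero-padded integral image: I[y, x] = sum of img[:y, :x]
I = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
I[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def box_sum(y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via four lookups."""
    return I[y1, x1] - I[y0, x1] - I[y1, x0] + I[y0, x0]

assert np.isclose(box_sum(10, 20, 100, 200), img[10:100, 20:200].sum())
```

An integral histogram extends the same idea to per-bin counts, enabling constant-time local histograms for the tone curve at each receptive field.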
In this paper, we present a novel tone mapping algorithm for displaying wide dynamic range (WDR) images on low dynamic range (LDR) devices. The proposed algorithm is mainly motivated by the logarithmic response and local adaptation features of the human visual system (HVS). The HVS perceives luminance differently under different adaptation levels, so our algorithm uses functions built upon different scales to tone map pixels to different values: functions of large scales maintain image brightness consistency, while functions of small scales preserve local detail and contrast. An efficient method using local variance is proposed to fuse the values of different scales and to remove artifacts. The algorithm utilizes integral images and integral histograms to reduce computational complexity and processing time. Experimental results show that the proposed algorithm can generate bright, high-contrast, and appealing images that surpass many state-of-the-art tone mapping algorithms. This project is available at https://github.com/jieyang1987/ToneMapping-Based-on-Multi-scale-Histogram-Synthesis.
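To make the multi-scale logarithmic-response idea concrete, here is a simplified sketch: a generic log curve evaluated with a global and a windowed adaptation level, then blended. The curve, window size, and fixed blend are assumptions; the paper's operator and its variance-based fusion differ.

```python
# Hedged sketch: log response at two adaptation scales, then a simple blend.
import numpy as np

def log_tonemap(L, L_adapt, L_max):
    # Logarithmic response: output grows with log luminance relative to
    # the adaptation level, normalized to [0, 1].
    return np.log1p(L / L_adapt) / np.log1p(L_max / L_adapt)

L = np.random.rand(480, 640) * 1e5      # synthetic WDR luminance
L_max = L.max()

# Large scale: one adaptation level for the whole image (brightness consistency)
global_tone = log_tonemap(L, L.mean(), L_max)

# Small scale: per-pixel adaptation from a local mean (detail and contrast);
# a real implementation would use the integral-image box filter here.
k = 15
pad = np.pad(L, k // 2, mode="edge")
local_adapt = np.mean(
    np.lib.stride_tricks.sliding_window_view(pad, (k, k)), axis=(2, 3)
)
local_tone = log_tonemap(L, local_adapt, L_max)

# The paper fuses scales by local variance; a fixed blend suffices for the sketch.
out = 0.5 * global_tone + 0.5 * local_tone
```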
Jie Yang, Ziyi Liu, Mengchen Lin (2021)
Wide dynamic range (WDR) images contain more scene detail and contrast than common images, but their pixel values must be tone mapped before they can be displayed properly, and details can diminish during the tone mapping process. In this work, we address the problem by combining a novel reformulated Laplacian pyramid with deep learning. The reformulated Laplacian pyramid always decomposes a WDR image into two frequency bands, where the low-frequency band is global feature-oriented and the high-frequency band is local feature-oriented. The reformulation preserves the local features at their original resolution and condenses the global features into a low-resolution image. The generated frequency bands are reconstructed and fine-tuned to output the final tone mapped image, which can be displayed on screen with minimal detail and contrast loss. The experimental results demonstrate that the proposed method outperforms state-of-the-art WDR image tone mapping methods. The code is made publicly available at https://github.com/linmc86/Deep-Reformulated-Laplacian-Tone-Mapping.
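A minimal sketch of a two-band decomposition in the spirit described above (the box blur and the 4x downsampling factor are assumptions, not the paper's reformulation; reconstruction here is approximate, before any learned fine-tuning):

```python
# Toy two-band split: full-resolution detail band plus a condensed
# low-resolution global band, then an approximate reconstruction.
import numpy as np

def box_blur(img, k=5):
    pad = np.pad(img, k // 2, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (k, k))
    return win.mean(axis=(2, 3))

wdr = np.random.rand(512, 512) * 1e4    # synthetic WDR luminance
low_full = box_blur(wdr)                # smoothed global structure
high = wdr - low_full                   # full-resolution local detail band
low = low_full[::4, ::4]                # condensed low-resolution global band

# Reconstruction: upsample the global band and add the detail band back.
recon = np.repeat(np.repeat(low, 4, axis=0), 4, axis=1) + high
```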
Current face detection approaches focus on facial information by varying specific parameters, including pose, occlusion, lighting, background, race, and gender. These studies only utilize information obtained from low dynamic range images; face detection in wide dynamic range (WDR) scenes has received little attention. To our knowledge, there is no publicly available WDR database for face detection research. To facilitate and support future face detection research in the WDR field, we propose the first WDR database for face detection, called WDR FACE, which contains a total of 398 16-bit megapixel grayscale wide dynamic range images collected from 29 subjects. These WDR images (WDRIs) were taken in eight specific WDR scenes. The dynamic range of 90% of the images surpasses 60,000:1, and that of 70% exceeds 65,000:1. Furthermore, we show the effect of different face detection procedures on the WDRIs in our database, using 25 different tone mapping operators and five different face detectors. We provide preliminary experimental results of face detection on this unique WDR database.
Ziyi Liu (2021)
The dynamic range of everyday scenes can exceed 120 dB, yet smartphone cameras and conventional digital cameras can only capture a dynamic range of about 90 dB, which sometimes leads to loss of detail in the recorded image. Professional hardware and image fusion algorithms have been devised to capture wide dynamic range (WDR), but existing devices cannot display WDR images directly. Tone mapping (TM), which converts a WDR image into a low dynamic range (LDR) image, thus becomes an essential step for exhibiting WDR images on ordinary screens. More and more researchers are focusing on this topic, devoting their efforts to designing an excellent tone mapping operator (TMO) that shows detailed images matching the perception of human eyes. It is therefore important to know the history, development, and trends of TM before proposing a practicable TMO. In this paper, we present a comprehensive study of the most well-known TMOs, dividing them into traditional and machine learning-based categories.
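For reference, the decibel figures above follow the standard contrast-ratio conversion dB = 20 * log10(Lmax / Lmin); a quick check of the numbers:

```python
# Worked check of the dynamic-range figures quoted above.
import math

def ratio_to_db(ratio: float) -> float:
    return 20 * math.log10(ratio)

print(ratio_to_db(1e6))      # 120.0 dB -> a luminance ratio of 1,000,000:1 (real scenes)
print(ratio_to_db(10**4.5))  # 90.0 dB -> about 31,623:1 (typical camera capture)
```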