Video privacy leakage is becoming an increasingly severe public problem, especially in cloud-based video surveillance systems. This creates a need for secure cloud-based video applications, in which the video is encrypted for privacy protection. Although several methods have been proposed for moving object detection and tracking in encrypted video, none performs robustly in complex and dynamic scenes. In this paper, we propose an efficient and robust privacy-preserving motion detection and multiple object tracking scheme for encrypted surveillance video bitstreams. By analyzing the properties of the video codec and of format-compliant encryption schemes, we propose a new compressed-domain feature that captures motion information in complex surveillance scenarios. Based on this feature, we design an adaptive clustering algorithm that segments moving objects at a granularity of 4x4 pixels. We then propose a multiple object tracking scheme that combines Kalman filter estimation with adaptive measurement refinement. The proposed scheme requires neither video decryption nor full decompression and has a very low computational load. Experimental results demonstrate that our scheme achieves the best detection and tracking performance among existing works in the encrypted and compressed domain, and that it handles complex surveillance scenarios with challenges such as camera movement/jitter, dynamic backgrounds, and shadows.
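To make the tracking step concrete, the following is a minimal sketch of the constant-velocity Kalman filter commonly used in such multiple object trackers. The state layout, noise covariances, and the KalmanTracker class itself are illustrative assumptions, not the paper's exact design; the measurement would come from the compressed-domain segmentation.

```python
import numpy as np

class KalmanTracker:
    """Constant-velocity Kalman filter for one object centroid (sketch)."""

    def __init__(self, x, y, dt=1.0):
        # State: [x, y, vx, vy]; values below are illustrative defaults.
        self.state = np.array([x, y, 0.0, 0.0])
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # observe position only
        self.P = np.eye(4) * 10.0   # state covariance
        self.Q = np.eye(4) * 0.01   # process noise
        self.R = np.eye(2) * 1.0    # measurement noise

    def predict(self):
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, measurement):
        # measurement: observed centroid from the segmentation stage
        z = np.asarray(measurement, dtype=float)
        y = z - self.H @ self.state                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.state = self.state + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In a tracker of this kind, predict() runs once per frame per object, and update() runs only when the adaptive measurement refinement accepts a detection for that object.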
Privacy considerations and bias in datasets are quickly becoming high-priority issues that the computer vision community needs to face. So far, little attention has been paid to practical solutions that do not involve collecting new datasets. In this work, we show that for object detection on COCO, both anonymizing the dataset by blurring faces and swapping faces in a balanced manner along the gender and skin-tone dimensions retain object detection performance while preserving privacy and partially mitigating bias.
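As a rough illustration of the anonymization-by-blurring idea, the sketch below detects faces with OpenCV's stock Haar cascade and Gaussian-blurs each detected box. The detector choice and blur parameters are assumptions for illustration; the actual pipeline in the paper may rely on a stronger face detector.

```python
import cv2

def blur_faces(image_path, out_path):
    """Blur detected faces in one image (illustrative anonymization step)."""
    img = cv2.imread(image_path)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
        roi = img[y:y + h, x:x + w]
        # A heavy Gaussian blur removes identity while keeping coarse context.
        img[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    cv2.imwrite(out_path, img)
```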
In this paper, we propose a learned video codec with a residual prediction network (RP-Net) and a feature-aided loop filter (LF-Net). The RP-Net exploits the residuals of multiple previous frames to further reduce the redundancy in the current frame's residual. The LF-Net uses features from the residual decoding network and the motion compensation network to improve reconstruction quality. To reduce complexity, a lightweight ResNet structure serves as the backbone of both RP-Net and LF-Net. Experimental results show that we save about 10% BD-rate compared with previous learned video compression frameworks, while achieving faster coding speed thanks to the ResNet backbone. This project is available at https://github.com/chaoliu18/RPLVC.
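Below is a minimal PyTorch sketch of the kind of lightweight ResNet backbone the abstract refers to. The channel widths, block count, and the ResBlock/LightResNet names are illustrative assumptions rather than the paper's exact architecture (see the repository above for the real implementation).

```python
import torch.nn as nn

class ResBlock(nn.Module):
    # One light residual block: two 3x3 convolutions plus a skip connection.
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return x + self.conv2(self.relu(self.conv1(x)))

class LightResNet(nn.Module):
    # Shared backbone sketch for RP-Net / LF-Net style modules: a head conv,
    # a few residual blocks, and a tail conv that predicts the output signal.
    def __init__(self, in_ch=3, out_ch=3, channels=64, num_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(in_ch, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, out_ch, 3, padding=1)

    def forward(self, x):
        return self.tail(self.body(self.head(x)))
```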
Video compression is a basic requirement for consumer and professional video applications alike. Video coding standards such as H.264/AVC and H.265/HEVC are widely deployed to enable efficient use of bandwidth and storage for many video applications. To reduce coding artifacts and improve compression efficiency, neural network based loop filtering of the reconstructed video has been developed in the literature. However, loop filtering is a challenging task due to the variation in video content and sampling densities. In this paper, we propose an on-line scaling based multi-density attention network for loop filtering in video compression. The core of our approach lies in several aspects: (a) parallel multi-resolution convolution streams for extracting multi-density features, (b) a single attention branch that learns sample correlations and generates mask maps, (c) a channel-mutual attention procedure that fuses the data from multiple branches, and (d) an on-line scaling technique that further adapts the network output to the actual signal. The proposed multi-density attention network learns rich features from multiple sampling densities and performs robustly on video content of different resolutions, while the on-line scaling step enhances the signal adaptability of the off-line pre-trained model. Experimental results show that a 10.18% bit-rate reduction at the same video quality can be achieved over the latest Versatile Video Coding (VVC) standard. The proposed algorithm outperforms state-of-the-art methods objectively, and the subjective quality improvement is evident in detail preservation and artifact alleviation.
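The on-line scaling idea can be illustrated with a closed-form least-squares scalar: at encode time, choose the factor that best fits the network's correction to the actual signal, then transmit it so the decoder can apply the same scaling. The function below is a hedged sketch of that principle under these assumptions, not the paper's exact procedure.

```python
import numpy as np

def online_scale(reconstruction, nn_output, original):
    """Pick a scalar s minimizing || original - (reconstruction + s*c) ||^2,
    where c = nn_output - reconstruction is the network's correction.
    The encoder would signal s to the decoder; details are illustrative."""
    correction = (nn_output - reconstruction).ravel()
    target = (original - reconstruction).ravel()
    denom = float(correction @ correction)
    s = float(target @ correction) / denom if denom > 0 else 0.0
    return reconstruction + s * (nn_output - reconstruction), s
```

The least-squares solution follows from setting the derivative of the squared error with respect to s to zero, giving s = (target . correction) / (correction . correction).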
Traffic inspection is a fundamental building block of many security solutions today. For example, to prevent the leakage or exfiltration of confidential insider information, and to block malicious traffic from entering the network, most enterprises operate intrusion detection and prevention systems that inspect traffic. However, state-of-the-art inspection systems do not adequately reflect the interests of the different autonomous roles involved. For example, employees in an enterprise, or a company outsourcing its network management to a specialized third party, may require that their traffic remain confidential, even from the system administrator. Moreover, the rules used by the intrusion detection system, or more generally the configuration of an online or offline anomaly detection engine, may be provided by a third party, e.g., a security research firm, and can hence constitute a critical business asset that should be kept confidential. Today, it is often believed that accounting for these additional requirements is impossible, as they contradict efficiency and effectiveness. In this paper, we explore a novel approach, called Privacy Preserving Inspection (PRI), which solves this problem by preserving the privacy of inspected traffic and the confidentiality of inspection rules and configurations, and which, for example, also supports the flexible installation of additional Data Leak Prevention (DLP) rules specific to the company.
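To give a flavor of how inspection can proceed without exposing plaintext traffic or plaintext rules, the toy below matches salted hashes of traffic tokens against hashed rule keywords, in the spirit of tokenization-based designs such as BlindBox. This is a deliberately simplified, assumption-laden illustration, not PRI's actual protocol.

```python
import hashlib

def h(token, salt):
    # Salted hash of one token; the salt is shared per session.
    return hashlib.sha256(salt + token.encode()).hexdigest()

def hash_rules(keywords, salt):
    # The rule provider ships only hashed keywords to the middlebox,
    # so the middlebox never learns the plaintext rules.
    return {h(k, salt) for k in keywords}

def inspect(hashed_tokens, hashed_rules):
    # The middlebox sees only hashes; it reports which token positions
    # match a rule without learning non-matching content.
    return [i for i, ht in enumerate(hashed_tokens) if ht in hashed_rules]

salt = b"per-session-salt"
rules = hash_rules({"confidential", "secret-project"}, salt)
# Endpoint side: tokenize and hash the outgoing message before sending.
msg_tokens = ["hello", "confidential", "world"]
hashed = [h(t, salt) for t in msg_tokens]
print(inspect(hashed, rules))  # -> [1]
```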
Traditional video compression technologies have been developed over decades in pursuit of higher coding efficiency, and efficient temporal information representation plays a key role in video coding. In this paper, we propose to exploit temporal correlation using both first-order optical flow and second-order flow prediction. We suggest a one-stage learning approach that encapsulates flow as quantized features from consecutive frames, which are then entropy coded with adaptive contexts conditioned on joint spatial-temporal priors to exploit second-order correlations. The joint priors are embedded in autoregressive spatial neighbors, co-located hyper elements, and temporal neighbors aggregated recurrently with a ConvLSTM. We evaluate our approach in the low-delay scenario against High-Efficiency Video Coding (H.265/HEVC), H.264/AVC, and another learned video compression method, following the common test settings. Our work offers state-of-the-art performance, with consistent gains across all popular test sequences.
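A minimal sketch of the two motion ingredients named above: grid_sample-based backward warping for first-order motion compensation, and a linear extrapolation standing in for second-order flow prediction. The closed-form predictor below only conveys the idea; the paper learns this prediction end-to-end.

```python
import torch
import torch.nn.functional as F

def warp(feature, flow):
    """Backward-warp a (B, C, H, W) tensor by an optical-flow field
    (B, 2, H, W); standard grid_sample-based motion compensation."""
    b, _, h, w = feature.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feature.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feature, grid_n, align_corners=True)

def predict_flow(flow_prev, flow_prev2):
    # Second-order idea as linear extrapolation: assume motion changes
    # smoothly across frames, so predict the current flow as
    # flow_prev + (flow_prev - flow_prev2). Purely illustrative.
    return 2.0 * flow_prev - flow_prev2
```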