In single-pixel imaging (SPI), the target object is illuminated sequentially with varying patterns, and a single-pixel detector without spatial resolution records the resulting intensity sequence. A high-quality object image can be computationally reconstructed only after a large number of illuminations, which leads to long imaging times and high cost. Conventionally, object classification is performed only once a reconstructed image of good fidelity is available. In this paper, we propose to classify the target object quickly, using a small number of illuminations, in Fourier SPI. A naive Bayes classifier classifies the target objects directly from the single-pixel intensity sequence, without any image reconstruction; each sequence element is treated as one object feature in the classifier. Simulation results demonstrate that the proposed scheme can classify images of numerical digits with high accuracy (e.g., 80% accuracy using only 13 illuminations, at a sampling ratio of 0.3%).
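The reconstruction-free idea above can be illustrated with a minimal sketch. It is not the paper's implementation: the objects are hypothetical 8x8 bar images standing in for digits, the five fringe patterns and all function names (`fringe_pattern`, `measure`, `toy_object`, `fit_gaussian_nb`, `predict`, `run_demo`) are assumptions made for illustration, and a Gaussian likelihood is assumed for the naive Bayes classifier. Each detector reading is used directly as one feature.

```python
import math
import random

def fringe_pattern(fx, fy, phase, size):
    """Sinusoidal illumination pattern with values in [0, 1]."""
    return [[0.5 + 0.5 * math.cos(2 * math.pi * (fx * x + fy * y) / size + phase)
             for x in range(size)] for y in range(size)]

def measure(image, patterns):
    """One detector reading per pattern: the total light returned by the object."""
    return [sum(p * v for prow, irow in zip(pattern, image) for p, v in zip(prow, irow))
            for pattern in patterns]

def toy_object(kind, rng, size=8):
    """Hypothetical stand-in for a digit: a bright bar on a weak noisy background."""
    img = [[0.05 * rng.random() for _ in range(size)] for _ in range(size)]
    pos = rng.randrange(2, size - 2)
    for i in range(size):
        if kind == "vertical":
            img[i][pos] += 1.0   # column `pos` is bright
        else:
            img[pos][i] += 1.0   # row `pos` is bright
    return img

def fit_gaussian_nb(features, labels):
    """Per class: log prior plus an independent Gaussian for every sequence element."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(c) / n for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / n + 1e-6 for c, m in zip(cols, means)]
        model[label] = (math.log(n / len(labels)), means, variances)
    return model

def predict(model, feature):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(feature, means, variances))
    return max(model, key=log_posterior)

def run_demo(seed=0, n_train=40, n_test=20, size=8):
    """Train and test on simulated intensity sequences; no image is ever reconstructed."""
    rng = random.Random(seed)
    specs = [(0, 0, 0.0), (1, 0, 0.0), (1, 0, math.pi / 2), (0, 1, 0.0), (0, 1, math.pi / 2)]
    patterns = [fringe_pattern(fx, fy, ph, size) for fx, fy, ph in specs]

    def dataset(n):
        labels = [kind for kind in ("vertical", "horizontal") for _ in range(n)]
        feats = [measure(toy_object(kind, rng, size), patterns) for kind in labels]
        return feats, labels

    train_x, train_y = dataset(n_train)
    test_x, test_y = dataset(n_test)
    model = fit_gaussian_nb(train_x, train_y)
    hits = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)
```

Calling `run_demo()` returns the test accuracy on the toy data. The two toy classes are far easier to separate than real digits, so the accuracy here says nothing about the figures reported in the abstract.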
Video object segmentation, which aims to segment the foreground objects given the annotation of the first frame, has attracted increasing attention. Many state-of-the-art approaches achieve strong performance by relying on online model updating or mask-propagation techniques. However, most online models incur high computational cost because the model is fine-tuned during inference. Most mask-propagation-based models are faster but perform relatively poorly because they fail to adapt to variation in object appearance. In this paper, we aim to design a new model that strikes a good balance between speed and performance. We propose a model, called NPMCA-net, which directly localizes foreground objects based on mask propagation and a non-local technique that matches pixels in the reference and target frames. Because we bring in information from both the first and the previous frames, our network is robust to large variation in object appearance and adapts better to occlusions. Extensive experiments show that our approach achieves new state-of-the-art performance at high speed (86.5% IoU on DAVIS-2016 and 72.2% IoU on DAVIS-2017, at 0.11 s per frame) under the same level of comparison. Source code is available at https://github.com/siyueyu/NPMCA-net.
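The pixel-matching idea can be sketched generically as non-local label propagation: every target pixel compares its feature vector with all reference pixels and takes a similarity-weighted average of their mask labels. This is only an illustration of that general technique, not the NPMCA-net architecture; the function name `propagate_mask`, the `temperature` parameter, and the two-dimensional toy features are all assumptions. Reference pixels from the first and the previous frame are simply concatenated.

```python
import math

def propagate_mask(ref_feats, ref_mask, tgt_feats, temperature=0.1):
    """Soft foreground probability per target pixel via softmax-weighted matching."""
    soft_mask = []
    for t in tgt_feats:
        # Dot-product similarity between this target pixel and every reference pixel.
        sims = [sum(a * b for a, b in zip(t, r)) / temperature for r in ref_feats]
        top = max(sims)                       # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in sims]
        total = sum(weights)
        soft_mask.append(sum(w * m for w, m in zip(weights, ref_mask)) / total)
    return soft_mask

# Toy reference pixels (feature vector, label) from the first and the previous frame.
first_feats, first_mask = [[1.0, 0.0], [0.0, 1.0]], [1, 0]
prev_feats, prev_mask = [[0.8, 0.2], [0.1, 0.9]], [1, 0]
ref_feats = first_feats + prev_feats
ref_mask = first_mask + prev_mask

target_feats = [[0.9, 0.1], [0.2, 0.8]]
soft = propagate_mask(ref_feats, ref_mask, target_feats)
hard = [1 if p > 0.5 else 0 for p in soft]    # threshold to a binary mask
```

Because the previous frame contributes its own features, a target pixel whose appearance has drifted away from the first frame can still find a close match, which is the intuition behind using both frames as references.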
As an alternative to conventional multi-pixel cameras, single-pixel cameras enable images to be recorded using a single detector that measures the correlations between the scene and a set of patterns. However, fully sampling a scene in this way requires at least as many correlation measurements as there are pixels in the reconstructed image. Therefore, single-pixel imaging systems typically exhibit low frame rates. To mitigate this, a range of compressive sensing techniques have been developed which rely on a priori knowledge of the scene to reconstruct images from an under-sampled set of measurements. In this work we take a different approach and adopt a strategy inspired by the foveated vision systems found in the animal kingdom: a framework that exploits the spatio-temporal redundancy present in many dynamic scenes. In our single-pixel imaging system a high-resolution foveal region follows motion within the scene, but unlike a simple zoom, every frame delivers new spatial information from across the entire field of view. Using this approach we demonstrate a four-fold reduction in the time taken to record the detail of rapidly evolving features, whilst simultaneously accumulating detail of more slowly evolving regions over several consecutive frames. This tiered super-sampling technique enables the reconstruction of video streams in which both the resolution and the effective exposure time vary spatially and adapt dynamically in response to the evolution of the scene. The methods described here can complement existing compressive sensing approaches and may be applied to enhance a variety of computational imagers that rely on sequential correlation measurements.
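The requirement of one correlation measurement per reconstructed pixel can be seen in a small sketch. It assumes an orthogonal Hadamard pattern set with entries of +1 and -1 (a common choice, though the abstract does not name the basis) and a flattened 4x4 toy scene; the function names are illustrative. It does not implement the foveated sampling itself.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def measure(scene, patterns):
    """One correlation (inner product) between the scene and each pattern."""
    return [sum(p * s for p, s in zip(pattern, scene)) for pattern in patterns]

def reconstruct(measurements, patterns):
    """Weight each pattern by its measurement and sum; exact only for a full set."""
    n = len(patterns[0])
    return [sum(m * pattern[i] for m, pattern in zip(measurements, patterns)) / n
            for i in range(n)]

# Flattened 4x4 toy scene (16 pixels), so full sampling needs 16 patterns.
scene = [0, 0, 0, 0,
         0, 9, 3, 0,
         0, 5, 7, 0,
         0, 0, 0, 0]
patterns = hadamard(16)

full = reconstruct(measure(scene, patterns), patterns)          # all 16 measurements
half = reconstruct(measure(scene, patterns[:8]), patterns[:8])  # only 8 measurements
```

With all 16 patterns `full` reproduces the scene exactly. With only the first 8, `half` is a projection of the scene in which pairs of pixels are averaged together, which is the information loss that compressive or adaptive sampling strategies try to manage.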
Voxel-based 3D object classification has been studied frequently in recent years. Previous methods often directly convert classic 2D convolution into a 3D form and apply it to an object with a binary voxel representation. In this paper, we investigate why the binary voxel representation is not well suited to 3D convolution and how to improve accuracy and speed simultaneously. We show that giving each voxel a signed distance value improves accuracy by about 30% over the binary voxel representation when a two-layer fully connected network is used. We then propose a fast hybrid cascade network, combining fully connected and convolutional stages, for voxel-based 3D object classification. This three-stage cascade network divides 3D models into three categories: easy, moderate, and hard. Consequently, the mean inference time (0.3 ms) is about 5x and 2x faster than state-of-the-art point-cloud-based and voxel-based methods, respectively, while achieving the highest accuracy among voxel-based methods (92%). Experiments on ModelNet and MNIST verify the performance of the proposed hybrid cascade network.
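To make the difference between the two representations concrete, here is a brute-force sketch that converts a binary occupancy grid into signed distance values. The conventions are assumptions, since the abstract does not specify them: distances are measured between voxel centers, occupied voxels get negative values, and empty voxels get positive values. The function name `signed_distance_grid` is illustrative.

```python
import math
from itertools import product

def signed_distance_grid(occ):
    """occ[z][y][x] is 0 or 1; returns a grid of signed distances (negative inside)."""
    size_z, size_y, size_x = len(occ), len(occ[0]), len(occ[0][0])
    cells = list(product(range(size_z), range(size_y), range(size_x)))
    inside = [c for c in cells if occ[c[0]][c[1]][c[2]]]
    outside = [c for c in cells if not occ[c[0]][c[1]][c[2]]]
    if not inside or not outside:
        raise ValueError("grid must contain both occupied and empty voxels")
    sdf = [[[0.0] * size_x for _ in range(size_y)] for _ in range(size_z)]
    for z, y, x in cells:
        occupied = occ[z][y][x]
        # Distance to the nearest voxel of the opposite occupancy state.
        others = outside if occupied else inside
        d = min(math.dist((z, y, x), o) for o in others)
        sdf[z][y][x] = -d if occupied else d
    return sdf

# 5x5x5 grid containing a solid 3x3x3 cube in the middle.
n = 5
cube = [[[1 if all(1 <= c <= 3 for c in (z, y, x)) else 0
          for x in range(n)] for y in range(n)] for z in range(n)]
sdf = signed_distance_grid(cube)
```

In the binary grid the cube's center and its surface voxels all hold the same value 1, whereas in `sdf` the center is deeper (more negative) than the surface, and empty voxels encode how far they are from the shape. The brute-force search is quadratic in the number of voxels and is meant only for small grids.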
Two novel visual cryptography (VC) schemes are proposed by combining VC with single-pixel imaging (SPI) for the first time. We point out that the overlapping of visual key images in VC is similar to the superposition of pixel intensities by a single-pixel detector in SPI. In the first scheme, QR-code VC is designed using opaque sheets instead of transparent sheets. The secret image can be recovered when identical illumination patterns are projected onto multiple visual key images and a single detector records the total light intensities. In the second scheme, the secret image is shared by multiple illumination pattern sequences, and it can be recovered when the visual key patterns are projected onto identical items. The proposed schemes extend the application of VC to more diverse scenarios.
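The analogy between stacking key images and summing intensities at a single detector can be illustrated with a toy two-share example. The share construction below is a simple one chosen for this sketch, not the paper's QR-code scheme; the raster-scan patterns (one pixel lit per illumination) and the function names are likewise assumptions. The same pattern illuminates both opaque sheets, and one detector records the total returned light.

```python
import random

def make_shares(secret, rng):
    """Toy sharing: share1 is random; share2 copies it for 0-bits and flips it for 1-bits."""
    share1 = [rng.randint(0, 1) for _ in secret]
    share2 = [s1 if bit == 0 else 1 - s1 for s1, bit in zip(share1, secret)]
    return share1, share2

def bucket_reading(pattern, sheets):
    """Total intensity when one pattern lights every sheet and one detector sums the result."""
    return sum(p * r for sheet in sheets for p, r in zip(pattern, sheet))

def recover(sheets, n_pixels):
    """Scan one pixel at a time; a summed reading of exactly 1 marks a secret 1-bit."""
    sums = []
    for i in range(n_pixels):
        pattern = [1 if j == i else 0 for j in range(n_pixels)]
        sums.append(bucket_reading(pattern, sheets))
    return [1 if s == 1 else 0 for s in sums]

secret = [1, 0, 0, 1, 0, 1, 1, 0, 1]          # flattened 3x3 binary image
share1, share2 = make_shares(secret, random.Random(0))
recovered = recover([share1, share2], len(secret))
```

With both sheets present, the summed reading is 1 wherever the secret bit is 1 and either 0 or 2 elsewhere, so `recovered` equals `secret`. With a single sheet, `recover` merely returns that random share, which carries no information about the secret in this toy construction.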
Single-pixel imaging is a novel imaging scheme that has gained popularity due to its huge computational gain and its potential as a low-cost alternative for imaging beyond the visible spectrum. Traditional reconstruction methods struggle to produce a clear recovery when the number of illumination patterns from a spatial light modulator is limited. As a remedy, several deep-learning-based solutions have been proposed, but they lack good generalization ability because of their architectural setup and loss functions. In this paper, we propose a generative adversarial network-based reconstruction framework for single-pixel imaging, referred to as SPI-GAN. Our method can reconstruct images with 17.92 dB PSNR and 0.487 SSIM even when the sampling ratio drops to 5%. This facilitates much faster reconstruction, making our method suitable for single-pixel video. Furthermore, our ResNet-like generator architecture leads to useful representation learning that allows us to reconstruct completely unseen objects. The experimental results demonstrate that SPI-GAN achieves a significant performance gain, e.g., a nearly 3 dB PSNR gain, over the current state-of-the-art method.
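For readers interpreting the reported figures, the standard PSNR definition, 10 log10(peak^2 / MSE), can be written out as below. The function names and the assumption that pixel values are scaled to a peak of 1.0 are illustrative; the abstract does not state the peak value used.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10 * math.log10(peak ** 2 / mse)

def mse_from_psnr(psnr_db, peak=1.0):
    """Invert the PSNR formula to get the mean squared error."""
    return peak ** 2 / 10 ** (psnr_db / 10)

# A uniform error of 0.1 on a unit-peak image gives an MSE of 0.01, i.e. 20 dB.
reference = [0.0, 0.0, 0.0, 0.0]
estimate = [0.1, 0.1, 0.1, 0.1]
value_db = psnr(reference, estimate)

# Ratio of mean squared errors corresponding to a 3 dB difference in PSNR.
mse_ratio = mse_from_psnr(0.0) / mse_from_psnr(3.0)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error (`mse_ratio` is close to 2).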