
Quality Assessment of In-the-Wild Videos

Published by: Dingquan Li
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and the presence of shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method that integrates both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin; specifically, it achieves 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, an ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
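The released repository is the authoritative reference. Purely as a hedged illustration of the architecture the abstract describes, the PyTorch sketch below combines frame-level content-aware features from a pre-trained classifier, a GRU over time, and a hysteresis-style temporal pooling that weights the worst recent quality. The backbone choice (ResNet-50), layer sizes, pooling window, and blending weights are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a VSFA-style NR-VQA model (illustrative; not the official VSFA code).
import torch
import torch.nn as nn
import torchvision.models as models

class VSFALikeModel(nn.Module):
    def __init__(self, feat_dim=2048, hidden=32, pool_window=12):
        super().__init__()
        # Content-aware features: a frozen ImageNet classifier (assumption: ResNet-50;
        # DEFAULT weights are downloaded on first use).
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        for p in self.features.parameters():
            p.requires_grad = False
        # Temporal-memory modeling: a GRU over per-frame features.
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # per-frame quality score
        self.pool_window = pool_window

    def forward(self, frames):                    # frames: (T, 3, H, W), one video
        f = self.features(frames).flatten(1)      # (T, feat_dim)
        h, _ = self.gru(f.unsqueeze(0))           # (1, T, hidden)
        q = self.head(h).squeeze(-1).squeeze(0)   # (T,) frame-level scores
        # Hysteresis-like pooling: blend each frame's score with the worst
        # score in a trailing window, then average over time (assumed weights).
        pooled = []
        for t in range(q.shape[0]):
            past_worst = q[max(0, t - self.pool_window):t + 1].min()
            pooled.append(0.5 * q[t] + 0.5 * past_worst)
        return torch.stack(pooled).mean()         # scalar video quality

model = VSFALikeModel().eval()
with torch.no_grad():
    score = model(torch.randn(16, 3, 224, 224))  # 16 dummy frames
```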




Read also

Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically the content-dependency and temporal-memory effects of the human visual system. To address the cross-dataset evaluation challenge, we explore a mixed-datasets training strategy for training a single VQA model on multiple datasets. The proposed unified framework explicitly comprises three stages: a relative quality assessor, a nonlinear mapping, and a dataset-specific perceptual scale alignment, which jointly predict relative quality, perceptual quality, and subjective quality. Experiments are conducted on four publicly available datasets for VQA in the wild, i.e., LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The experimental results verify the effectiveness of the mixed-datasets training strategy and the superior performance of the unified model in comparison with state-of-the-art models. For reproducible research, we make the PyTorch implementation of our method available at https://github.com/lidq92/MDTVSFA.
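As a hedged illustration of the middle stage, a monotonic nonlinear mapping from relative quality to perceptual quality is often implemented as a learnable logistic function; the sketch below is a generic version of that idea, not the exact function used in MDTVSFA, and the parameter initializations are assumptions.

```python
# Generic monotonic nonlinear mapping sketch (assumed form, not the paper's exact one).
import torch
import torch.nn as nn

class LogisticMapping(nn.Module):
    """4-parameter logistic: relative quality -> perceptual quality."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # output scale
        self.beta = nn.Parameter(torch.tensor(0.0))   # output shift
        self.gamma = nn.Parameter(torch.tensor(1.0))  # slope
        self.delta = nn.Parameter(torch.tensor(0.0))  # input shift

    def forward(self, relative_q):
        return self.alpha * torch.sigmoid(self.gamma * (relative_q - self.delta)) + self.beta

# Dataset-specific perceptual scale alignment could then be a per-dataset linear
# rescaling onto each dataset's subjective (MOS) range, e.g. [1, 5] here.
mapping = LogisticMapping()
subjective = 4.0 * mapping(torch.randn(8)) + 1.0
```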
Nowadays, most existing blind image quality assessment (BIQA) models 1) are developed for synthetically-distorted images and often generalize poorly to authentic ones; and 2) heavily rely on human ratings, which are prohibitively labor-expensive to collect. Here, we propose an opinion-free BIQA method that learns from synthetically-distorted images and multiple agents to assess the perceptual quality of authentically-distorted images captured in the wild without relying on human labels. Specifically, we first assemble a large number of image pairs from synthetically-distorted images and use a set of full-reference image quality assessment (FR-IQA) models to assign each pair a pseudo-binary label indicating which image has higher quality, serving as the supervisory signal. We then train a convolutional neural network (CNN)-based BIQA model to rank perceptual quality, optimized for consistency with the binary labels. Since there exists a domain shift between the synthetically- and authentically-distorted images, an unsupervised domain adaptation (UDA) module is introduced to alleviate this issue. Extensive experiments demonstrate the effectiveness of our proposed opinion-free BIQA model, yielding state-of-the-art performance in terms of correlation with human opinion scores, as well as in the gMAD competition. Codes will be made publicly available upon acceptance.
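A minimal sketch of the pairwise supervision idea follows, assuming a standard margin ranking loss (the paper's actual loss may differ); the margin value and the random stand-ins for scores and FR-IQA votes are illustrative only.

```python
# Pairwise ranking with pseudo-binary labels (hedged sketch, not the paper's loss).
import torch
import torch.nn as nn

# Pseudo-binary pair labels from FR-IQA models: +1 if image A is judged higher
# quality than image B, else -1 (stand-in values below).
ranking_loss = nn.MarginRankingLoss(margin=0.1)  # margin is an assumed hyperparameter

score_a = torch.randn(32, requires_grad=True)    # BIQA model scores for images A
score_b = torch.randn(32, requires_grad=True)    # BIQA model scores for images B
pseudo_labels = torch.sign(torch.randn(32))      # stand-in for FR-IQA majority votes

loss = ranking_loss(score_a, score_b, pseudo_labels)
loss.backward()  # drives score_a above score_b (by the margin) wherever label = +1
```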
Perceptual quality assessment of videos acquired in the wild is of vital importance for quality assurance of video services. The inaccessibility of reference videos with pristine quality and the complexity of authentic distortions pose great challenges for this kind of blind video quality assessment (BVQA) task. Although model-based transfer learning is an effective and efficient paradigm for the BVQA task, it remains a challenge to determine what knowledge to transfer and how to bridge the domain shifts for better video representation. In this work, we propose to transfer knowledge from image quality assessment (IQA) databases with authentic distortions and from large-scale action recognition with rich motion patterns. We rely on both groups of data to learn the feature extractor, and we train the proposed model on the target VQA databases using a mixed list-wise ranking loss function. Extensive experiments on six databases demonstrate that our method performs very competitively under both individual-database and mixed-database training settings. We also verify the rationality of each component of the proposed method and explore a simple manner for further improvement.
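The abstract mentions a mixed list-wise ranking loss. One common list-wise formulation, used here purely as an assumed illustration rather than the paper's exact loss, is ListNet-style cross-entropy between the softmax distributions induced by predicted scores and ground-truth MOS within a batch:

```python
# ListNet-style list-wise ranking loss (assumed stand-in, not the paper's exact loss).
import torch
import torch.nn.functional as F

def listwise_ranking_loss(pred, mos):
    """Cross-entropy between the score distributions induced by predictions and MOS."""
    p_true = F.softmax(mos, dim=0)          # target ranking distribution
    log_p_pred = F.log_softmax(pred, dim=0) # predicted ranking distribution (log)
    return -(p_true * log_p_pred).sum()

pred = torch.randn(16, requires_grad=True)  # predicted quality for 16 videos
mos = torch.rand(16) * 4 + 1                # ground-truth MOS, e.g. on a [1, 5] scale
listwise_ranking_loss(pred, mos).backward()
```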
Ren Yang, Mai Xu, Tie Liu (2017)
The latest High Efficiency Video Coding (HEVC) standard has been increasingly applied to generate video streams over the Internet. However, HEVC-compressed videos may incur severe quality degradation, particularly at low bit-rates. Thus, it is necessary to enhance the visual quality of HEVC videos at the decoder side. To this end, this paper proposes a Quality Enhancement Convolutional Neural Network (QE-CNN) method that does not require any modification of the encoder to achieve quality enhancement for HEVC. In particular, our QE-CNN method learns QE-CNN-I and QE-CNN-P models to reduce the distortion of HEVC I and P frames, respectively. The proposed method differs from existing CNN-based quality enhancement approaches, which only handle intra-coding distortion and are thus not suitable for P frames. Our experimental results validate that our QE-CNN method is effective in enhancing the quality of both I and P frames of HEVC videos. To apply our QE-CNN method in time-constrained scenarios, we further propose a Time-constrained Quality Enhancement Optimization (TQEO) scheme. Our TQEO scheme controls the computational time of QE-CNN to meet a target while maximizing the quality enhancement. The experimental results demonstrate the effectiveness of our TQEO scheme in terms of time-control accuracy and quality enhancement under different time constraints. Finally, we design a prototype to implement our TQEO scheme in a real-time scenario.
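As a rough sketch of decoder-side quality enhancement, the snippet below uses a small residual CNN that predicts a correction added back to a decoded frame. Layer counts, widths, and kernel sizes are assumptions for illustration; this is not the QE-CNN-I/P architecture.

```python
# Tiny residual enhancement CNN (hedged sketch, not the QE-CNN architecture).
import torch
import torch.nn as nn

class FrameEnhancer(nn.Module):
    """Predicts a residual that is added back to the decoded HEVC frame."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, decoded_frame):
        return decoded_frame + self.body(decoded_frame)  # residual correction

enhancer = FrameEnhancer()
restored = enhancer(torch.rand(1, 3, 240, 416))  # one decoded low-bitrate frame
```

A time-constrained scheme like TQEO could then, at a high level, pick how much of such a network to run per frame so that total decode-plus-enhancement time stays under a budget.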
Existing blind image quality assessment (BIQA) methods are mostly designed in a disposable way and cannot evolve with unseen distortions adaptively, which greatly limits the deployment and application of BIQA models in real-world scenarios. To address this problem, we propose a novel Lifelong blind Image Quality Assessment (LIQA) approach, targeting lifelong learning for BIQA. Without access to previous training data, our proposed LIQA can not only learn new distortions but also mitigate the catastrophic forgetting of seen distortions. Specifically, we adopt a Split-and-Merge distillation strategy to train a single-head network that makes task-agnostic predictions. In the split stage, we first employ a distortion-specific generator to obtain the pseudo features of each seen distortion. Then, we use an auxiliary multi-head regression network to generate the predicted quality of each seen distortion. In the merge stage, we replay the pseudo features paired with pseudo labels to distill the knowledge of the multiple heads into the final regressed single head. Experimental results demonstrate that the proposed LIQA method can handle continuous shifts of distortion types and even datasets. More importantly, our LIQA model can achieve stable performance even when the task sequence is long.
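A hedged sketch of the replay-and-distill step: pseudo features for previously seen distortions are replayed, and the multi-head teacher's outputs serve as pseudo labels for a single-head student. Feature dimensions, head counts, and the MSE distillation objective are all assumptions, not LIQA's exact design.

```python
# Replay pseudo features and distill multiple heads into one (hedged sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim = 128                                   # assumed pseudo-feature size
num_seen = 3                                     # assumed number of seen distortions
teacher_heads = nn.ModuleList(
    [nn.Linear(feat_dim, 1) for _ in range(num_seen)])  # one head per seen distortion
student_head = nn.Linear(feat_dim, 1)            # final task-agnostic single head

# Replay: pseudo features from per-distortion generators (noise as a stand-in here).
pseudo_feats = [torch.randn(16, feat_dim) for _ in range(num_seen)]

# Merge stage: each teacher head's predictions act as pseudo labels for the student.
distill_loss = sum(
    F.mse_loss(student_head(f), teacher_heads[i](f).detach())
    for i, f in enumerate(pseudo_feats)
)
distill_loss.backward()  # only the single student head receives gradients
```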
