As immersive multimedia techniques such as Free-viewpoint TV (FTV) develop at an astonishing rate, user demand for high-quality immersive content is increasing dramatically. Unlike traditional uniform artifacts, the distortions in immersive content can be non-uniform and structure-related, which makes them challenging for commonly used quality metrics. Recent studies have demonstrated that visual feature representations can be extracted from multiple levels of the visual hierarchy. Inspired by the hierarchical representation mechanism of the human visual system (HVS), in this paper we explore the use of structural representations to quantitatively measure the impact of such structure-related distortions on perceived quality in the FTV scenario. More specifically, we propose a bio-inspired full-reference image quality metric based on 1) a low-level contour descriptor; 2) a mid-level contour category descriptor; and 3) a task-oriented non-natural structure descriptor. Experimental results show that the proposed model significantly outperforms state-of-the-art metrics.