ترغب بنشر مسار تعليمي؟ اضغط هنا

Scene parsing from images is a fundamental yet challenging problem in visual content understanding. In this dense prediction task, the parsing model assigns every pixel to a categorical label, which requires the contextual information of adjacent ima ge patches. So the challenge for this learning task is to simultaneously describe the geometric and semantic properties of objects or a scene. In this paper, we explore the effective use of multi-layer feature outputs of the deep parsing networks for spatial-semantic consistency by designing a novel feature aggregation module to generate the appropriate global representation prior, to improve the discriminative power of features. The proposed module can auto-select the intermediate visual features to correlate the spatial and semantic information. At the same time, the multiple skip connections form a strong supervision, making the deep parsing network easy to train. Extensive experiments on four public scene parsing datasets prove that the deep parsing network equipped with the proposed feature aggregation module can achieve very promising results.
Recent research on deep neural networks (DNNs) has primarily focused on improving the model accuracy. Given a proper deep learning framework, it is generally possible to increase the depth or layer width to achieve a higher level of accuracy. However , the huge number of model parameters imposes more computational and memory usage overhead and leads to the parameter redundancy. In this paper, we address the parameter redundancy problem in DNNs by replacing conventional full projections with bilinear projections. For a fully-connected layer with $D$ input nodes and $D$ output nodes, applying bilinear projection can reduce the model space complexity from $mathcal{O}(D^2)$ to $mathcal{O}(2D)$, achieving a deep model with a sub-linear layer size. However, structured projection has a lower freedom of degree compared to the full projection, causing the under-fitting problem. So we simply scale up the mapping size by increasing the number of output channels, which can keep and even boosts the model accuracy. This makes it very parameter-efficient and handy to deploy such deep models on mobile systems with memory limitations. Experiments on four benchmark datasets show that applying the proposed bilinear projection to deep neural networks can achieve even higher accuracies than conventional full DNNs, while significantly reduces the model size.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا