ﻻ يوجد ملخص باللغة العربية
In this paper we show that by carefully making good choices for various detailed but important factors in a visual recognition framework using deep learning features, one can achieve a simple, efficient, yet highly accurate image classification system. We first list 5 important factors, based on both existing researches and ideas proposed in this paper. These important detailed factors include: 1) $ell_2$ matrix normalization is more effective than unnormalized or $ell_2$ vector normalization, 2) the proposed natural deep spatial pyramid is very effective, and 3) a very small $K$ in Fisher Vectors surprisingly achieves higher accuracy than normally used large $K$ values. Along with other choices (convolutional activations and multiple scales), the proposed DSP framework is not only intuitive and efficient, but also achieves excellent classification accuracy on many benchmark datasets. For example, DSPs accuracy on SUN397 is 59.78%, significantly higher than previous state-of-the-art (53.86%).
By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and averaged. On the one hand, this is desirable as it treats all classes, rare to frequent, equally. On the oth
We investigate opinion dynamics in multi-agent networks when a bias toward one of two possible opinions exists; for example, reflecting a status quo vs a superior alternative. Starting with all agents sharing an initial opinion representing the statu
Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high
Significant effort has been recently devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed probl
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, which o