Do you want to publish a course? Click here

Towards A Fairer Landmark Recognition Dataset

90   0   0.0 ( 0 )
 Added by Zu Kim
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors demographic information. We present a stratification approach and analysis which leads to a much fairer coverage of the world, compared to existing datasets. The resulting datasets are used to evaluate computer vision models as part of the the Google Landmark Recognition and RetrievalChallenges 2021.



rate research

Read More

The current state of the research in landmark recognition highlights the good accuracy which can be achieved by embedding techniques, such as Fisher vector and VLAD. All these techniques do not exploit spatial information, i.e. consider all the features and the corresponding descriptors without embedding their location in the image. This paper presents a new variant of the well-known VLAD (Vector of Locally Aggregated Descriptors) embedding technique which accounts, at a certain degree, for the location of features. The driving motivation comes from the observation that, usually, the most interesting part of an image (e.g., the landmark to be recognized) is almost at the center of the image, while the features at the borders are irrelevant features which do no depend on the landmark. The proposed variant, called locVLAD (location-aware VLAD), computes the mean of the two global descriptors: the VLAD executed on the entire original image, and the one computed on a cropped image which removes a certain percentage of the image borders. This simple variant shows an accuracy greater than the existing state-of-the-art approach. Experiments are conducted on two public datasets (ZuBuD and Holidays) which are used both for training and testing. Morever a more balanced version of ZuBuD is proposed.
We study a class of mathematical and statistical algorithms with the aim of establishing a computer-based framework for fast and reliable automatic abnormality detection on landmark represented image templates. Under this framework, we apply a landmark-based algorithm for finding a group average as an estimator that is said to best represent the common features of the group in study. This algorithm extracts information of momentum at each landmark through the process of template matching. If ever converges, the proposed algorithm produces a local coordinate system for each member of the observing group, in terms of the residual momentum. We use a Bayesian approach on the collected residual momentum representations for making inference. For illustration, we apply this framework to a small database of brain images for detecting structure abnormality. The brain structure changes identified by our framework are highly consistent with studies in the literature.
The growth of high-performance mobile devices has resulted in more research into on-device image recognition. The research problems are the latency and accuracy of automatic recognition, which remains obstacles to its real-world usage. Although the recently developed deep neural networks can achieve accuracy comparable to that of a human user, some of them still lack the necessary latency. This paper describes the development of the architecture of a new convolutional neural network model, NU-LiteNet. For this, SqueezeNet was developed to reduce the model size to a degree suitable for smartphones. The model size of NU-LiteNet is therefore 2.6 times smaller than that of SqueezeNet. The recognition accuracy of NU-LiteNet also compared favorably with other recently developed deep neural networks, when experiments were conducted on two standard landmark databases.
In this paper, we aim to improve the dataset foundation for pedestrian attribute recognition in real surveillance scenarios. Recognition of human attributes, such as gender, and clothes types, has great prospects in real applications. However, the development of suitable benchmark datasets for attribute recognition remains lagged behind. Existing human attribute datasets are collected from various sources or an integration of pedestrian re-identification datasets. Such heterogeneous collection poses a big challenge on developing high quality fine-grained attribute recognition algorithms. Furthermore, human attribute recognition are generally severely affected by environmental or contextual factors, such as viewpoints, occlusions and body parts, while existing attribute datasets barely care about them. To tackle these problems, we build a Richly Annotated Pedestrian (RAP) dataset from real multi-camera surveillance scenarios with long term collection, where data samples are annotated with not only fine-grained human attributes but also environmental and contextual factors. RAP has in total 41,585 pedestrian samples, each of which is annotated with 72 attributes as well as viewpoints, occlusions, body parts information. To our knowledge, the RAP dataset is the largest pedestrian attribute dataset, which is expected to greatly promote the study of large-scale attribute recognition systems. Furthermore, we empirically analyze the effects of different environmental and contextual factors on pedestrian attribute recognition. Experimental results demonstrate that viewpoints, occlusions and body parts information could assist attribute recognition a lot in real applications.
With the rapid development of electronic commerce, the way of shopping has experienced a revolutionary evolution. To fully meet customers massive and diverse online shopping needs with quick response, the retailing AI system needs to automatically recognize products from images and videos at the stock-keeping unit (SKU) level with high accuracy. However, product recognition is still a challenging task, since many of SKU-level products are fine-grained and visually similar by a rough glimpse. Although there are already some products benchmarks available, these datasets are either too small (limited number of products) or noisy-labeled (lack of human labeling). In this paper, we construct a human-labeled product image dataset named Products-10K, which contains 10,000 fine-grained SKU-level products frequently bought by online customers in JD.com. Based on our new database, we also introduced several useful tips and tricks for fine-grained product recognition. The products-10K dataset is available via https://products-10k.github.io/.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا