What is the right way to represent document images?


Published by Gabriela Csurka

Publication date: 2016

Research field: Informatics Engineering

Paper language: English

Authors: Gabriela Csurka, Diane Larlus, Albert Gordo

Computer Vision and Pattern Recognition


In this article we study the problem of document image representation based on visual features. We propose a comprehensive experimental study that compares three types of visual document image representations: (1) traditional so-called shallow features, such as the RunLength and the Fisher-Vector descriptors, (2) deep features based on Convolutional Neural Networks, and (3) features extracted from hybrid architectures that take inspiration from the two previous ones. We evaluate these features in several tasks (i.e. classification, clustering, and retrieval) and in different setups (e.g. domain transfer) using several public and in-house datasets. Our results show that deep features generally outperform other types of features when there is no domain shift and the new task is closely related to the one used to train the model. However, when a large domain or task shift is present, the Fisher-Vector shallow features generalize better and often obtain the best results.


JPAC Collaboration: M. Mikhasenko, A. Pilloni, J. Nys (2017)

Hadron decay chains constitute one of the main sources of information on the QCD spectrum. We discuss the differences between several partial wave analysis formalisms used in the literature to build the amplitudes. We match the helicity amplitudes to the covariant tensor basis. Hereby, we pay attention to the analytical properties of the amplitudes and separate singularities of kinematical and dynamical nature. We study the analytical properties of the spin-orbit (LS) formalism, and some of the covariant tensor approaches. In particular, we explicitly build the amplitudes for the B -> psi pi K and B -> Dbar pi pi decays, and show that the energy dependence of the covariant approach is model dependent. We also show that the usual recursive construction of covariant tensors explicitly violates crossing symmetry, which would lead to different resonance parameters extracted from scattering and decay processes.

High Energy Physics - Phenomenology; High Energy Physics - Experiment; Nuclear Theory

What is the best way to measure baryonic acoustic oscillations?

Ariel G. Sanchez (MPE) (2008)

Oscillations in the baryon-photon fluid prior to recombination imprint different signatures on the power spectrum and correlation function of matter fluctuations. The measurement of these features using galaxy surveys has been proposed as a means to determine the equation of state of the dark energy. The accuracy required to achieve competitive constraints demands an extremely good understanding of systematic effects which change the baryonic acoustic oscillation (BAO) imprint. We use 50 very large volume N-body simulations to investigate the BAO signature in the two-point correlation function. The location of the BAO bump does not correspond to the sound horizon scale at the level of accuracy required by future measurements, even before any dynamical or statistical effects are considered. Careful modelling of the correlation function is therefore required to extract the cosmological information encoded on large scales. We find that the correlation function is less affected by scale dependent effects than the power spectrum. We show that a model for the correlation function proposed by Crocce & Scoccimarro (2008), based on renormalised perturbation theory, gives an essentially unbiased measurement of the dark energy equation of state. This means that information from the large scale shape of the correlation function, in addition to the form of the BAO peak, can be used to provide robust constraints on cosmological parameters. The correlation function therefore provides a better constraint on the distance scale (~50% smaller errors with no systematic bias) than the more conservative approach required when using the power spectrum (i.e. which requires amplitude and long wavelength shape information to be discarded).

What is the right formalism to search for resonances? II. The pentaquark chain

JPAC Collaboration: A. Pilloni, J. Nys, M. Mikhasenko (2018)

We discuss the differences between several partial-wave analysis formalisms used in the construction of three-body decay amplitudes involving fermions. Specifically, we consider the decay Lambda_b -> psi p K-, where the hidden charm pentaquark signal has been reported. We analyze the analytical properties of the amplitudes and separate kinematical and dynamical singularities. The result is an amplitude with the minimal energy dependence compatible with the S-matrix principles.

High Energy Physics - Phenomenology; High Energy Physics - Experiment; Nuclear Theory

What Is Around The Camera?

Stamatios Georgoulis, Konstantinos Rematas, Tobias Ritschel (2016)

How much does a single image reveal about the environment it was taken in? In this paper, we investigate how much of that information can be retrieved from a foreground object, combined with the background (i.e. the visible part of the environment). Assuming it is not perfectly diffuse, the foreground object acts as a complexly shaped and far-from-perfect mirror. An additional challenge is that its appearance confounds the light coming from the environment with the unknown materials it is made of. We propose a learning-based approach to predict the environment from multiple reflectance maps that are computed from approximate surface normals. The proposed method allows us to jointly model the statistics of environments and material properties. We train our system from synthesized training data, but demonstrate its applicability to real-world data. Interestingly, our analysis shows that the information obtained from objects made out of multiple materials often is complementary and leads to better performance.

Computer Vision and Pattern Recognition

DiverseNet: When One Right Answer is not Enough

Michael Firman, Neill D. F. Campbell, Lourdes Agapito (2020)

Many structured prediction tasks in machine vision have a collection of acceptable answers, instead of one definitive ground truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a single test-time prediction for each query, failing to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy. We introduce a simple method for training a neural network, which enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning faces mode collapse, where one or more ensemble members fail to receive any training signal. Our best performing solution can be deployed for various tasks, and just involves small modifications to the existing single-mode architecture, loss function, and training regime. We demonstrate that our method results in quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.

Computer Vision and Pattern Recognition
