What is the right way to represent document images?

Published by Gabriela Csurka
Publication date: 2016
Research field: Informatics Engineering
Language: English

In this article we study the problem of document image representation based on visual features. We propose a comprehensive experimental study that compares three types of visual document image representations: (1) traditional so-called shallow features, such as the RunLength and the Fisher-Vector descriptors, (2) deep features based on Convolutional Neural Networks, and (3) features extracted from hybrid architectures that take inspiration from the two previous ones. We evaluate these features in several tasks (i.e. classification, clustering, and retrieval) and in different setups (e.g. domain transfer) using several public and in-house datasets. Our results show that deep features generally outperform other types of features when there is no domain shift and the new task is closely related to the one used to train the model. However, when a large domain or task shift is present, the Fisher-Vector shallow features generalize better and often obtain the best results.
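The abstract names the RunLength descriptor without describing it. As a rough illustration only, the sketch below computes a run-length histogram on a binarized page. It splits each row and each column into runs of identical pixels and histograms the run lengths in log-spaced bins, separately for each direction and pixel color. The bin edges, the restriction to horizontal and vertical runs, the convention 0 = background and 1 = ink, and the function names `runs_1d` and `runlength_histogram` are all assumptions made for this example. The descriptor evaluated in the paper may differ, for example in its directions, binning, or spatial pooling.

```python
import numpy as np

# Hypothetical log-spaced bin boundaries: lengths 1 | 2-3 | 4-7 | 8-15 | 16-31 | 32-63 | 64+
BIN_EDGES = np.array([2, 4, 8, 16, 32, 64])
N_BINS = len(BIN_EDGES) + 1


def runs_1d(line):
    """Split a 1-D binary array into runs of equal pixels; return (values, lengths)."""
    line = np.asarray(line, dtype=int)
    # Positions where the pixel value changes mark the start of a new run.
    change = np.flatnonzero(np.diff(line)) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [line.size]))
    return line[starts], ends - starts


def runlength_histogram(binary_img):
    """Concatenate log-binned run-length histograms for
    (horizontal, vertical) x (background = 0, ink = 1), then L1-normalize."""
    img = np.asarray(binary_img, dtype=int)
    feats = []
    # Rows of img give horizontal runs; rows of img.T (columns of img) give vertical runs.
    for lines in (img, img.T):
        pairs = [runs_1d(line) for line in lines]
        values = np.concatenate([v for v, _ in pairs])
        lengths = np.concatenate([n for _, n in pairs])
        for color in (0, 1):
            # np.digitize maps each run length to its bin index 0..N_BINS-1.
            bins = np.digitize(lengths[values == color], BIN_EDGES)
            feats.append(np.bincount(bins, minlength=N_BINS))
    vec = np.concatenate(feats).astype(float)
    total = vec.sum()
    return vec / total if total > 0 else vec


img = np.array([[0, 0, 1, 1, 1],
                [1, 1, 1, 1, 1]])
vec = runlength_histogram(img)  # shape (28,): 2 directions x 2 colors x 7 bins
```

The output is L1-normalized, so pages of different sizes yield comparable fixed-length vectors. Such vectors could, for instance, feed a linear classifier or a nearest-neighbour retrieval step.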

Read also

Hadron decay chains constitute one of the main sources of information on the QCD spectrum. We discuss the differences between several partial-wave analysis formalisms used in the literature to build the amplitudes. We match the helicity amplitudes to the covariant tensor basis. In doing so, we pay attention to the analytical properties of the amplitudes and separate singularities of kinematical and dynamical nature. We study the analytical properties of the spin-orbit (LS) formalism and of some of the covariant tensor approaches. In particular, we explicitly build the amplitudes for the B -> psi pi K and B -> Dbar pi pi decays, and show that the energy dependence of the covariant approach is model-dependent. We also show that the usual recursive construction of covariant tensors explicitly violates crossing symmetry, which would lead to different resonance parameters being extracted from scattering and decay processes.
Oscillations in the baryon-photon fluid prior to recombination imprint different signatures on the power spectrum and correlation function of matter fluctuations. The measurement of these features using galaxy surveys has been proposed as a means to determine the equation of state of the dark energy. The accuracy required to achieve competitive constraints demands an extremely good understanding of systematic effects that change the baryonic acoustic oscillation (BAO) imprint. We use 50 very large volume N-body simulations to investigate the BAO signature in the two-point correlation function. The location of the BAO bump does not correspond to the sound horizon scale at the level of accuracy required by future measurements, even before any dynamical or statistical effects are considered. Careful modelling of the correlation function is therefore required to extract the cosmological information encoded on large scales. We find that the correlation function is less affected by scale-dependent effects than the power spectrum. We show that a model for the correlation function proposed by Crocce & Scoccimarro (2008), based on renormalised perturbation theory, gives an essentially unbiased measurement of the dark energy equation of state. This means that information from the large-scale shape of the correlation function, in addition to the form of the BAO peak, can be used to provide robust constraints on cosmological parameters. The correlation function therefore provides a better constraint on the distance scale (~50% smaller errors with no systematic bias) than the more conservative approach required when using the power spectrum, which requires amplitude and long-wavelength shape information to be discarded.
We discuss the differences between several partial-wave analysis formalisms used in the construction of three-body decay amplitudes involving fermions. Specifically, we consider the decay Lambda_b -> psi p K-, where the hidden-charm pentaquark signal has been reported. We analyze the analytical properties of the amplitudes and separate kinematical and dynamical singularities. The result is an amplitude with the minimal energy dependence compatible with the S-matrix principles.
How much does a single image reveal about the environment it was taken in? In this paper, we investigate how much of that information can be retrieved from a foreground object, combined with the background (i.e. the visible part of the environment). Assuming it is not perfectly diffuse, the foreground object acts as a complexly shaped and far-from-perfect mirror. An additional challenge is that its appearance confounds the light coming from the environment with the unknown materials it is made of. We propose a learning-based approach to predict the environment from multiple reflectance maps that are computed from approximate surface normals. The proposed method allows us to jointly model the statistics of environments and material properties. We train our system on synthesized training data, but demonstrate its applicability to real-world data. Interestingly, our analysis shows that the information obtained from objects made of multiple materials is often complementary and leads to better performance.
Many structured prediction tasks in machine vision have a collection of acceptable answers instead of one definitive ground-truth answer. Segmentation of images, for example, is subject to human labeling bias. Similarly, there are multiple possible pixel values that could plausibly complete occluded image regions. State-of-the-art supervised learning methods are typically optimized to make a single test-time prediction for each query, failing to find other modes in the output space. Existing methods that allow for sampling often sacrifice speed or accuracy. We introduce a simple method for training a neural network that enables diverse structured predictions to be made for each test-time query. For a single input, we learn to predict a range of possible answers. We compare favorably to methods that seek diversity through an ensemble of networks. Such stochastic multiple choice learning faces mode collapse, where one or more ensemble members fail to receive any training signal. Our best-performing solution can be deployed for various tasks and involves only small modifications to the existing single-mode architecture, loss function, and training regime. We demonstrate that our method yields quantitative improvements across three challenging tasks: 2D image completion, 3D volume estimation, and flow prediction.