An implicit but pervasive hypothesis of modern computer vision research is that convolutional neural network (CNN) architectures that perform better on ImageNet will also perform better on other vision datasets. We challenge this hypothesis through an extensive empirical study in which we train 500 sampled CNN architectures on ImageNet as well as on 8 other image classification datasets from a wide array of application domains. The relationship between architecture and performance varies wildly across datasets; for some of them, the performance correlation with ImageNet is even negative. Clearly, it is not enough to optimize architectures solely for ImageNet when aiming for progress that is relevant to all applications. We therefore identify two dataset-specific performance indicators: the cumulative width across layers and the total depth of the network. Lastly, we show that the range of dataset variability covered by ImageNet can be significantly extended by adding ImageNet subsets restricted to a few classes.
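A minimal sketch, with invented placeholder numbers, of the kind of analysis this abstract describes: rank-correlating the accuracies of the same sampled architectures on ImageNet and on another dataset, and computing the two indicators named above (cumulative width across layers, total depth). The accuracy values and per-layer widths are assumptions for illustration only.

```python
# Hedged sketch: all accuracy values and layer widths below are placeholders.
from scipy.stats import spearmanr

# accuracies of the same sampled architectures on two datasets (invented numbers)
imagenet_acc = [0.71, 0.68, 0.74, 0.65, 0.70]
target_acc   = [0.88, 0.91, 0.85, 0.92, 0.89]

rho, p = spearmanr(imagenet_acc, target_acc)
print(f"Spearman rank correlation: {rho:.2f} (p-value: {p:.3f})")  # can even be negative

def indicators(channel_widths):
    """Two dataset-specific indicators: cumulative width across layers and total depth."""
    return sum(channel_widths), len(channel_widths)

cum_width, depth = indicators([64, 128, 128, 256, 512])  # hypothetical per-layer widths
print(f"cumulative width: {cum_width}, depth: {depth}")
```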
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3%-15% on CIFAR-10 and 11%-14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly harder images than those found in the original test sets.
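An illustrative sketch, using invented accuracy numbers rather than the paper's results, of the relationship the abstract points to: fitting a line between original and new test-set accuracies, where a slope above 1 corresponds to gains on the original test set translating into larger gains on the new one.

```python
# Toy illustration with placeholder accuracies for a few models.
import numpy as np

orig_acc = np.array([0.89, 0.92, 0.94, 0.96, 0.97])  # accuracies on the original test set
new_acc  = np.array([0.78, 0.83, 0.86, 0.90, 0.92])  # accuracies on the new test set

slope, intercept = np.polyfit(orig_acc, new_acc, 1)
print(f"fitted slope: {slope:.2f}")                       # > 1 in this toy example
print(f"mean accuracy drop: {np.mean(orig_acc - new_acc):.3f}")
```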
In the deep learning era, pretrained models play an important role in medical image analysis, where ImageNet pretraining has been widely adopted as the standard practice. However, there is an obvious domain gap between natural images and medical images. To bridge this gap, we propose a new pretraining method that learns from 700k radiographs without any manual annotations. We call our method Comparing to Learn (C2L) because it learns robust features by comparing different image representations. To verify the effectiveness of C2L, we conduct comprehensive ablation studies and evaluate it on different tasks and datasets. The experimental results on radiographs show that C2L significantly outperforms ImageNet pretraining and previous state-of-the-art approaches. Code and models are available.
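The abstract does not spell out the C2L objective; as a hedged illustration of learning by comparing image representations, the sketch below uses a generic InfoNCE-style contrastive loss over two views of the same images, which is an assumption rather than the paper's exact formulation.

```python
# Generic contrastive-loss sketch (assumed objective, not the paper's definition of C2L).
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same radiographs."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) pairwise similarities
    targets = torch.arange(z1.size(0))      # matching views sit on the diagonal
    return F.cross_entropy(logits, targets)

# toy usage with random tensors standing in for encoder outputs
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```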
Because diffusion MRI data are complex-valued and have low signal-to-noise ratios, magnitude-based diffusion MRI is confounded by the noise floor, which falsely elevates signal magnitude and biases commonly used diffusion indices such as fractional anisotropy (FA). To avoid the noise floor, most existing phase correction methods focus on improving the filters that estimate the noise-free background phase. In this work, after examining the phase correction procedure, we argue that even a perfect filter is insufficient for phase correction, because the procedure cannot distinguish the signs of the noise, resulting in artifacts (i.e., arbitrary signal loss). With this insight, we generalize the definition of the noise floor to a complex polar coordinate system and propose a calibration procedure that can conveniently distinguish the signs of the noise. The calibration procedure is conceptually simple, easy to implement without relying on any external technique, and distinctly effective.
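A toy numerical sketch (not the paper's method) of the noise-floor effect described above: the magnitude of a noisy complex measurement is biased upward, whereas the real part after removing an assumed known background phase is unbiased on average but retains the sign of the noise, which is where the sign-distinction problem arises. The signal level, noise level, and phase below are placeholders.

```python
# Noise-floor illustration with placeholder values; assumes the background phase is known exactly.
import numpy as np

rng = np.random.default_rng(0)
true_signal, sigma, phi, n = 1.0, 0.8, 0.6, 100_000   # signal, noise level, background phase, samples

noise = rng.normal(0.0, sigma, n) + 1j * rng.normal(0.0, sigma, n)
measured = true_signal * np.exp(1j * phi) + noise      # complex measurement with background phase

magnitude = np.abs(measured)                           # biased upward by the noise floor
phase_corrected = (measured * np.exp(-1j * phi)).real  # unbiased on average; noise keeps its sign

print(f"true signal:               {true_signal:.3f}")
print(f"mean magnitude:            {magnitude.mean():.3f}")        # noticeably above the true signal
print(f"mean phase-corrected real: {phase_corrected.mean():.3f}")  # close to the true signal
```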
There are two canonical approaches to treating the Standard Model as an Effective Field Theory (EFT): Standard Model EFT (SMEFT), expressed in the electroweak symmetric phase utilizing the Higgs doublet, and Higgs EFT (HEFT), expressed in the broken phase utilizing the physical Higgs boson and an independent set of Goldstone bosons. HEFT encompasses SMEFT, so understanding whether SMEFT is sufficient motivates identifying UV theories that require HEFT as their low energy limit. This distinction is complicated by field redefinitions that obscure the naive differences between the two EFTs. By reformulating the question in a geometric language, we derive concrete criteria that can be used to distinguish SMEFT from HEFT independent of the chosen field basis. We highlight two cases where perturbative new physics must be matched onto HEFT: (i) the new particles derive all of their mass from electroweak symmetry breaking, and (ii) there are additional sources of electroweak symmetry breaking. Additionally, HEFT has a broader practical application: it can provide a more convergent parametrization when new physics lies near the weak scale. The ubiquity of models requiring HEFT suggests that SMEFT is not enough.
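For orientation, a standard textbook-level sketch of the two parametrizations being contrasted, not the paper's geometric criteria: SMEFT organizes higher-dimensional operators in terms of the electroweak doublet H, while HEFT treats the physical Higgs h as a singlet alongside a Goldstone matrix U(π).

```latex
% SMEFT: higher-dimensional operators built from the doublet H
\mathcal{L}_{\text{SMEFT}} = \mathcal{L}_{\text{SM}}
  + \sum_{d>4} \sum_i \frac{c_i^{(d)}}{\Lambda^{d-4}} \, \mathcal{O}_i^{(d)}(H, \dots)

% HEFT: the singlet h and the Goldstone matrix U(\pi) = \exp(i \pi^a \sigma^a / v)
\mathcal{L}_{\text{HEFT}} \supset \frac{v^2}{4} F(h) \,
  \mathrm{Tr}\!\left[(D_\mu U)^\dagger D^\mu U\right]
  + \frac{1}{2} (\partial_\mu h)^2 - V(h),
\qquad F(h) = 1 + 2a\,\frac{h}{v} + b\,\frac{h^2}{v^2} + \cdots
```

In the usual telling, SMEFT is recovered when F(h), V(h), and the fermion couplings descend from an analytic dependence on H†H; the field redefinitions mentioned above are what make this distinction hard to see unless it is phrased geometrically.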
Yes, and no. We ask whether recent progress on the ImageNet classification benchmark continues to represent meaningful generalization, or whether the community has started to overfit to the idiosyncrasies of its labeling procedure. We therefore develop a significantly more robust procedure for collecting human annotations of the ImageNet validation set. Using these new labels, we reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels. Furthermore, we find the original ImageNet labels to no longer be the best predictors of this independently collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Nevertheless, we find our annotation procedure to have largely remedied the errors in the original labels, reinforcing ImageNet as a powerful benchmark for future research in visual recognition.
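A hypothetical sketch of how evaluation against a more robust, multi-label annotation of the validation set might be scored: a prediction counts as correct if it matches any label judged plausible for the image. The class ids, label sets, and scoring rule below are assumptions for illustration, not the paper's protocol.

```python
# Hypothetical multi-label scoring sketch with placeholder data.
def multilabel_accuracy(predictions, plausible_labels):
    """predictions: predicted class ids; plausible_labels: one set of acceptable ids per image."""
    correct = sum(pred in labels for pred, labels in zip(predictions, plausible_labels))
    return correct / len(predictions)

preds = [281, 207, 3]                       # placeholder predicted class ids
labels = [{281, 282}, {250}, {3, 4, 5}]     # placeholder per-image sets of plausible labels
print(multilabel_accuracy(preds, labels))   # 2/3 in this toy example
```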