An important scenario for image quality assessment (IQA) is to evaluate image restoration (IR) algorithms. The state-of-the-art approaches adopt a full-reference paradigm that compares restored images with their corresponding pristine-quality images. However, pristine-quality images are usually unavailable in blind image restoration tasks and real-world scenarios. In this paper, we propose a practical solution named degraded-reference IQA (DR-IQA), which exploits the inputs of IR models, i.e., degraded images, as references. Specifically, we extract reference information from degraded images by distilling knowledge from pristine-quality images. The distillation is achieved by learning a reference space in which various degraded images are encouraged to share the same feature statistics with pristine-quality images, and the reference space is optimized to capture deep image priors that are useful for quality assessment. Note that pristine-quality images are used only during training. Our work provides a powerful and differentiable metric for blind IR methods, especially GAN-based ones. Extensive experiments show that our results can even approach the performance of full-reference settings.
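A minimal PyTorch sketch of the feature-statistics matching idea above: degraded and pristine images are mapped into a shared reference space, and their per-channel feature statistics are pulled together. The encoder architecture and all names (RefSpaceEncoder, stats_matching_loss) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RefSpaceEncoder(nn.Module):
    """Maps an image into the learned reference space (hypothetical architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def feature_stats(f):
    # Per-channel mean and std over spatial dimensions.
    return f.mean(dim=(2, 3)), f.std(dim=(2, 3))

def stats_matching_loss(f_degraded, f_pristine):
    # Encourage degraded-image features to share first- and second-order
    # statistics with pristine-image features in the reference space.
    mu_d, sig_d = feature_stats(f_degraded)
    mu_p, sig_p = feature_stats(f_pristine)
    return (mu_d - mu_p).pow(2).mean() + (sig_d - sig_p).pow(2).mean()

encoder = RefSpaceEncoder()
degraded = torch.rand(4, 3, 64, 64)
pristine = torch.rand(4, 3, 64, 64)   # available only at training time
loss = stats_matching_loss(encoder(degraded), encoder(pristine))
loss.backward()
```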
Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behavior might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We also propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
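The sketch below illustrates, in simplified form, how attention with learned per-camera and per-location embeddings can aggregate ground-plane projections differently at each position, unlike a translation-invariant convolution. It is an illustrative stand-in for the shadow transformer, not the MVDeTr code; the class and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class PositionAwareAggregator(nn.Module):
    def __init__(self, num_views, channels, h, w):
        super().__init__()
        # Learned embeddings let attention differ per camera and per location,
        # unlike a convolution, which applies the same kernel everywhere.
        self.view_embed = nn.Parameter(torch.zeros(num_views, channels))
        self.pos_embed = nn.Parameter(torch.zeros(h * w, channels))
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def forward(self, feats):
        # feats: (B, V, C, H, W) feature maps projected onto the ground plane.
        b, v, c, h, w = feats.shape
        tokens = feats.flatten(3).permute(0, 3, 1, 2)       # (B, HW, V, C)
        tokens = tokens + self.view_embed                   # camera-dependent
        tokens = tokens + self.pos_embed[None, :, None, :]  # location-dependent
        tokens = tokens.reshape(b * h * w, v, c)
        out, _ = self.attn(tokens, tokens, tokens)          # attend across views
        out = out.mean(dim=1).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return out                                          # (B, C, H, W)

agg = PositionAwareAggregator(num_views=4, channels=64, h=16, w=16)
fused = agg(torch.rand(2, 4, 64, 16, 16))
```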
Sampling-based motion planning algorithms such as RRT* are well-known for their ability to quickly find an initial solution and then converge to the optimal solution asymptotically. However, the convergence rate can be slow for high-dimensional planning problems, particularly for dynamical systems where the sampling space is not just the configuration space but the full state space. In this paper, we introduce the idea of using a partial-final-state-free (PFF) optimal controller in kinodynamic RRT* [1] to reduce the dimensionality of the sampling space. Instead of sampling the full state space, the proposed accelerated kinodynamic RRT*, called Kino-RRT*, only samples part of the state space, while the rest of the states are selected by the PFF optimal controller. We also propose a delayed and intermittent update of the optimal arrival time of all the edges in the RRT* tree to decrease the computational complexity of the algorithm. We tested the proposed algorithm using 4-D and 10-D state-space linear systems and showed that Kino-RRT* converges much faster than the kinodynamic RRT* algorithm.
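The following is a structural sketch of the reduced-dimension sampling loop on a toy 4-D double integrator, where only positions are sampled and the remaining (velocity) states are filled in by a controller. The pff_optimal_steer function is a crude placeholder for the PFF optimal controller of [1], not the paper's derivation.

```python
import numpy as np

def pff_optimal_steer(x_from, pos_to):
    """Steer from full state x_from toward a sampled position pos_to.

    Returns the full arrival state, with the free (velocity) components
    chosen by the controller; here a simple heuristic fills them in.
    """
    vel = pos_to - x_from[:2]            # placeholder choice of free states
    x_to = np.concatenate([pos_to, vel])
    cost = np.linalg.norm(x_to - x_from)
    return x_to, cost

rng = np.random.default_rng(0)
tree = [np.zeros(4)]                     # root state [px, py, vx, vy]
for _ in range(100):
    pos_sample = rng.uniform(-10, 10, size=2)  # sample only the position subspace
    nearest = min(tree, key=lambda x: np.linalg.norm(x[:2] - pos_sample))
    x_new, cost = pff_optimal_steer(nearest, pos_sample)
    tree.append(x_new)                   # (rewiring and arrival-time updates omitted)
```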
Understanding classifier decisions under novel environments is central to the community, and a common practice is to evaluate them on labeled test sets. However, in real-world testing, image annotations are difficult and expensive to obtain, especially when the test environment is changing. A natural question then arises: given a trained classifier, can we evaluate its accuracy on varying unlabeled test sets? In this work, we train semantic classification and rotation prediction in a multi-task way. On a series of datasets, we report an interesting finding: the semantic classification accuracy exhibits a strong linear relationship with the accuracy of the rotation prediction task (Pearson's correlation r > 0.88). This finding allows us to utilize linear regression to estimate classifier performance from the accuracy of rotation prediction, which can be obtained on the test set through freely generated rotation labels.
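A minimal sketch of the implied estimation procedure: fit a linear regressor on datasets where both accuracies can be measured, then estimate semantic accuracy on an unlabeled set from its (measurable) rotation accuracy. All numbers below are illustrative, not results from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Accuracies measured on a series of labeled datasets (illustrative values).
rotation_acc = np.array([0.62, 0.70, 0.75, 0.81, 0.88]).reshape(-1, 1)
semantic_acc = np.array([0.48, 0.59, 0.66, 0.74, 0.85])

reg = LinearRegression().fit(rotation_acc, semantic_acc)

# On a new unlabeled test set, rotation labels can be generated for free, so
# rotation accuracy is measurable; semantic accuracy is then estimated.
rot_acc_unlabeled = np.array([[0.78]])
estimated = reg.predict(rot_acc_unlabeled)[0]
print(f"estimated classification accuracy: {estimated:.3f}")
```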
A new belief space planning algorithm, called covariance steering Belief RoadMap (CS-BRM), is introduced, which is a multi-query algorithm for motion planning of dynamical systems under simultaneous motion and observation uncertainties. CS-BRM extends the probabilistic roadmap (PRM) approach to belief spaces and is based on the recently developed theory of covariance steering (CS) that enables guaranteed satisfaction of terminal belief constraints in finite time. The nodes in the CS-BRM are sampled in belief space and represent distributions of the system states. A covariance steering controller steers the system from one BRM node to another, thus acting as an edge controller of the corresponding belief graph that ensures belief constraint satisfaction. After the edge controller is computed, a specific edge cost is assigned to that edge. The CS-BRM algorithm allows the sampling of non-stationary belief nodes, and thus is able to explore the velocity space and find efficient motion plans. The performance of CS-BRM is evaluated and compared to a previous belief space planning method, demonstrating the benefits of the proposed approach.
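A structural sketch of the roadmap construction: belief nodes carry a mean and covariance, and each candidate edge is assigned a cost once its edge controller is computed. The cs_edge function below is a placeholder; the actual covariance steering controller synthesis follows the CS theory cited above and is not reproduced here.

```python
import numpy as np

class BeliefNode:
    def __init__(self, mean, cov):
        self.mean = mean    # may include velocities (non-stationary nodes)
        self.cov = cov

def cs_edge(node_a, node_b):
    """Placeholder for the covariance steering edge controller.

    In CS-BRM this step synthesizes a finite-time controller that guarantees
    the terminal belief matches node_b; here we only return a toy cost.
    """
    return np.linalg.norm(node_b.mean - node_a.mean)

rng = np.random.default_rng(1)
nodes = [BeliefNode(rng.uniform(-5, 5, size=4), 0.1 * np.eye(4))
         for _ in range(20)]

edges = {}
for i, a in enumerate(nodes):
    for j, b in enumerate(nodes):
        if i != j and np.linalg.norm(a.mean[:2] - b.mean[:2]) < 3.0:
            edges[(i, j)] = cs_edge(a, b)  # edge cost assigned after controller
# Standard graph search (e.g., Dijkstra) over `edges` yields multi-query plans.
```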
The striking resemblance of high multiplicity proton-proton (pp) collisions at the LHC to heavy ion collisions challenges our conventional wisdom on the formation of the Quark-Gluon Plasma (QGP). A consistent explanation of the collectivity phenomena in pp collisions will help us understand the mechanism that leads to the QGP-like signals in small systems. In this study, we introduce a transport model approach that connects the initial conditions provided by PYTHIA8 with subsequent AMPT rescatterings to study the collective behavior in high energy pp collisions. The multiplicity dependence of light hadron production from this model is in reasonable agreement with the pp $\sqrt{s}=13$ TeV experimental data. The comparisons show that both the partonic and hadronic final state interactions are important for generating the radial flow feature of the pp transverse momentum spectra. The study also shows that the long range two-particle azimuthal correlation in high multiplicity pp events is sensitive to the proton sub-nucleon spatial fluctuations.
Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs). Given a video, we aim to localize video segments containing an AVE and identify its category. In order to learn discriminative features for a classifier, it is pivotal to identify the helpful (or positive) audio-visual segment pairs while filtering out the irrelevant ones, regardless of whether they are synchronized or not. To this end, we propose a new positive sample propagation (PSP) module to discover and exploit closely related audio-visual pairs by evaluating the relationship within every possible pair. This is done by constructing an all-pair similarity map between each audio and visual segment, and aggregating features only from the pairs with high similarity scores. To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed. We also propose a new weighting branch to better exploit the temporal correlations in the weakly supervised setting. We perform extensive experiments on the public AVE dataset and achieve new state-of-the-art accuracy in both fully and weakly supervised settings, thus verifying the effectiveness of our method.
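A minimal sketch of the all-pair similarity construction and positive-pair aggregation described above. The fixed threshold tau and the residual-style feature update are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def psp_aggregate(audio, visual, tau=0.5):
    # audio: (T, D) segment features; visual: (T, D) segment features.
    a = F.normalize(audio, dim=1)
    v = F.normalize(visual, dim=1)
    sim = a @ v.t()                        # (T, T) all-pair similarity map
    mask = (sim > tau).float()             # keep only likely-positive pairs
    weights = sim * mask
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-6)
    audio_enhanced = audio + weights @ visual       # propagate from positives
    visual_enhanced = visual + weights.t() @ audio
    return audio_enhanced, visual_enhanced

a, v = torch.rand(10, 128), torch.rand(10, 128)
a2, v2 = psp_aggregate(a, v)
```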
We extensively study the system size dependence of nuclear collisions with a multi-phase transport model. Previously, certain key parameters for the initial condition needed significantly different values for $pp$ and central $AA$ collisions for the model to reasonably describe the yields and transverse momentum spectra of the bulk matter in those collision systems. Here we scale two key parameters, the Lund string fragmentation parameter $b_L$ and the minijet transverse momentum cutoff $p_0$, with local nuclear thickness functions from the two colliding nuclei. This allows the model to use the parameter values for $pp$ collisions with the local nuclear scaling to describe the system size and centrality dependences of nuclear collisions self-consistently. In addition to providing good descriptions of $pp$ collisions from 23.6 GeV to 13 TeV and reasonable descriptions of the centrality dependence of charged particle yields for Au+Au collisions from $7.7A$ GeV to $200A$ GeV and Pb+Pb collisions at LHC energies, the improved model can now well describe the centrality dependence of the mean transverse momentum of charged particles for $p_{\rm T} \lesssim 2$ GeV. It works similarly well for smaller systems including $p$Pb, Cu+Cu and Xe+Xe collisions.
Quantum memory is the core device for the construction of large-scale quantum networks. For scalable and convenient practical applications, integrated optical memories, especially on-chip optical memories, are crucial because they can be easily integrated with other on-chip devices. Here, we report coherent optical memory based on a type-IV waveguide fabricated on the surface of a rare-earth ion-doped crystal (i.e., $\mathrm{Eu^{3+}}$:$\mathrm{Y_2SiO_5}$). The properties of the optical transition ($\mathrm{{}^7F_0 \rightarrow {}^5D_0}$) of the $\mathrm{Eu^{3+}}$ ions inside the surface waveguide are well preserved compared to those of the bulk crystal. Spin-wave atomic frequency comb storage is demonstrated inside the type-IV waveguide. The reliability of this device is confirmed by the high interference visibility of $97\pm 1\%$ between the retrieval pulse and the reference pulse. The developed on-chip optical memory paves the way towards integrated quantum nodes.
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem, where existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering. A potential drawback of using pseudo labels is that errors may accumulate and it is challenging to estimate the number of pseudo IDs. We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels. The goal is to construct a self-supervised pretext task that matches the person re-ID objective. Inspired by the \emph{data association} concept in multi-object tracking, we propose the \textbf{Cyc}le \textbf{As}sociation (\textbf{CycAs}) task: after performing data association between a pair of video frames forward and then backward, a pedestrian instance is supposed to be associated with itself. To fulfill this goal, the model must learn a meaningful representation that can well describe correspondences between instances in frame pairs. We adapt the discrete association process to a differentiable form, such that end-to-end training becomes feasible. Experiments are conducted in two aspects: We first compare our method with existing unsupervised re-ID methods on seven benchmarks and demonstrate the superiority of CycAs. Then, to further validate the practical value of CycAs in real-world applications, we perform training on self-collected videos and report promising performance on standard test sets.
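A minimal sketch of a differentiable cycle association objective: soft forward and backward assignments between two frames are composed, and the result is pushed toward the identity. The row-softmax relaxation shown here is one plausible realization under assumed names; the paper's exact adaptation may differ.

```python
import torch
import torch.nn.functional as F

def cycle_association_loss(emb1, emb2, temperature=0.1):
    # emb1: (N, D) instance embeddings in frame 1; emb2: (N, D) in frame 2.
    e1 = F.normalize(emb1, dim=1)
    e2 = F.normalize(emb2, dim=1)
    affinity = e1 @ e2.t() / temperature
    forward = affinity.softmax(dim=1)        # soft assignment frame1 -> frame2
    backward = affinity.t().softmax(dim=1)   # soft assignment frame2 -> frame1
    cycle = forward @ backward               # (N, N): should be near identity
    target = torch.eye(emb1.size(0))
    return F.mse_loss(cycle, target)

emb1 = torch.rand(8, 256, requires_grad=True)
emb2 = torch.rand(8, 256, requires_grad=True)
loss = cycle_association_loss(emb1, emb2)
loss.backward()
```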