
159 - Jiajian Zhao, Yifan Zhao, Jia Li (2021)
The crucial problem in vehicle re-identification is to recognize the same vehicle identity when the object is viewed from cross-view cameras, which places a higher demand on learning viewpoint-invariant representations. In this paper, we propose to solve this problem from two aspects: constructing robust feature representations and proposing camera-sensitive evaluations. We first propose a novel Heterogeneous Relational Complement Network (HRCN) that incorporates region-specific features and cross-level features as complements to the original high-level output. Considering the distributional differences and semantic misalignment, we propose graph-based relation modules to embed these heterogeneous features into one unified high-dimensional space. On the other hand, considering the deficiencies of cross-camera evaluation in existing measures (i.e., CMC and AP), we then propose a Cross-camera Generalization Measure (CGM) to improve the evaluations by introducing position sensitivity and cross-camera generalization penalties. We further construct a new benchmark of existing models with our proposed CGM, and experimental results reveal that our proposed HRCN model achieves a new state of the art on VeRi-776, VehicleID, and VERI-Wild.
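To make the idea of a graph-based relation module over heterogeneous features more concrete, here is a minimal PyTorch sketch; the module name, projection dimensions, and the simple dot-product affinity are illustrative assumptions rather than the exact HRCN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphRelationModule(nn.Module):
    """Hypothetical sketch: project heterogeneous feature nodes (e.g., global,
    region-specific, and cross-level features) into a shared space, then
    propagate information over a learned affinity graph."""

    def __init__(self, in_dims, hidden_dim=256):
        super().__init__()
        # One projection per heterogeneous source to align dimensions and semantics.
        self.projections = nn.ModuleList([nn.Linear(d, hidden_dim) for d in in_dims])
        self.relation = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, feats):
        # feats: list of tensors, each of shape (batch, in_dims[i])
        nodes = torch.stack([proj(f) for proj, f in zip(self.projections, feats)], dim=1)
        # Dot-product affinity between nodes, normalized into attention weights.
        affinity = torch.softmax(nodes @ nodes.transpose(1, 2) / nodes.size(-1) ** 0.5, dim=-1)
        # One round of graph propagation plus a residual connection.
        nodes = nodes + F.relu(self.relation(affinity @ nodes))
        # Flatten the relation-enhanced nodes into a single embedding.
        return nodes.flatten(1)

# Example: a global feature (2048-d), a region feature (1024-d), a mid-level feature (512-d).
module = GraphRelationModule([2048, 1024, 512])
out = module([torch.randn(4, 2048), torch.randn(4, 1024), torch.randn(4, 512)])
print(out.shape)  # torch.Size([4, 768])
```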
We propose a new computationally efficient first-order algorithm for Model-Agnostic Meta-Learning (MAML). The key enabling technique is to interpret MAML as a bilevel optimization (BLO) problem and leverage sign-based SGD (signSGD) as the lower-level optimizer of BLO. We show that MAML, through the lens of signSGD-oriented BLO, naturally yields an alternating optimization scheme that requires only first-order gradients of a learned meta-model. We term the resulting MAML algorithm Sign-MAML. Compared to the conventional first-order MAML (FO-MAML) algorithm, Sign-MAML is theoretically grounded, as it does not impose any assumption on the absence of second-order derivatives during meta-training. In practice, we show that Sign-MAML outperforms FO-MAML in various few-shot image classification tasks, and compared to MAML, it achieves a much more graceful tradeoff between classification accuracy and computational efficiency.
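Read through the signSGD-oriented BLO lens, the inner loop only needs the sign of first-order gradients. Below is a minimal PyTorch sketch under that reading; the function names, the single-task loop, and the FO-MAML-style outer step are illustrative assumptions, not the paper's exact algorithm.

```python
import copy
import torch
import torch.nn as nn

def sign_maml_adapt(meta_model: nn.Module, loss_fn, support_x, support_y,
                    inner_lr=0.01, inner_steps=1):
    # Hypothetical sketch: adapt a task-specific copy of the meta-model with
    # sign-based SGD (signSGD), so only first-order gradients are ever needed.
    task_model = copy.deepcopy(meta_model)
    for _ in range(inner_steps):
        loss = loss_fn(task_model(support_x), support_y)
        grads = torch.autograd.grad(loss, task_model.parameters())
        with torch.no_grad():
            for p, g in zip(task_model.parameters(), grads):
                p -= inner_lr * g.sign()  # step along the gradient sign only
    return task_model

def first_order_meta_step(meta_model, task_model, loss_fn, query_x, query_y, meta_opt):
    # First-order meta-update: gradients of the query loss w.r.t. the adapted
    # weights are copied back onto the meta-model (FO-MAML-style outer step).
    query_loss = loss_fn(task_model(query_x), query_y)
    grads = torch.autograd.grad(query_loss, task_model.parameters())
    meta_opt.zero_grad()
    for meta_p, g in zip(meta_model.parameters(), grads):
        meta_p.grad = g.clone()
    meta_opt.step()
    return query_loss.item()
```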
The cosmological evolution can modify the dark matter (DM) properties in the early Universe to be vastly different from the properties today. Therefore, the relation between the relic abundance and the DM constraints today needs to be revisited. We propose novel transient annihilations of DM, which help to alleviate the pressure from DM null-detection results. As a concrete example, we consider the vector-portal DM and focus on the mass evolution of the dark photon. As the Universe cools down, the gauge boson mass can increase monotonically and cross several important thresholds, opening new transient annihilation channels in the early Universe. Those channels are either forbidden or weakened in the late Universe, which helps to evade the indirect searches. In particular, the transient resonant channel can survive direct detection (DD) without tuning the DM mass to be half of the dark photon mass, and can soon be tested by future DD or collider experiments. A feature of the scenario is the existence of a light dark scalar.
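For orientation, the resonance referred to here is the standard s-channel condition; the schematic Breit-Wigner form below, with a temperature-dependent dark-photon mass, is a generic illustration rather than the paper's specific model.

```latex
\sigma v \;\propto\; \frac{1}{\left(s - m_{A'}^2\right)^2 + m_{A'}^2\,\Gamma_{A'}^2},
\qquad s \simeq 4 m_\chi^2 ,
```

so the channel is resonantly enhanced only while $m_{A'}(T) \simeq 2 m_\chi$, i.e., transiently as the dark-photon mass evolves across that threshold.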
95 - Dong Zhao, Jia Li, Hongyu Li (2021)
Recently, the Vision Transformer (ViT) has shown impressive performance on high-level and low-level vision tasks. In this paper, we propose a new ViT architecture, named Hybrid Local-Global Vision Transformer (HyLoG-ViT), for single image dehazing. The HyLoG-ViT block consists of two paths, the local ViT path and the global ViT path, which are used to capture local and global dependencies. The hybrid features are fused via convolution layers. As a result, the HyLoG-ViT reduces the computational complexity and introduces locality into the networks. The HyLoG-ViT blocks are then incorporated within our dehazing networks, which jointly learn the intrinsic image decomposition and image dehazing. Specifically, the network consists of one shared encoder and three decoders for reflectance prediction, shading prediction, and haze-free image generation. The tasks of reflectance and shading prediction can produce meaningful intermediate features that serve as complementary features for haze-free image generation. To effectively aggregate the complementary features, we propose a complementary features selection module (CFSM) to select the useful ones for image dehazing. Extensive experiments on homogeneous, non-homogeneous, and nighttime dehazing tasks reveal that our proposed Transformer-based dehazing network can achieve comparable or even better performance than CNN-based dehazing models.
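A minimal PyTorch sketch of the hybrid local-global idea is given below: one path applies self-attention inside non-overlapping windows, the other over all tokens, and the two are fused with a 1x1 convolution. Window size, head count, and the fusion layer are assumptions for illustration, not the exact HyLoG-ViT block.

```python
import torch
import torch.nn as nn

class HybridLocalGlobalBlock(nn.Module):
    """Hypothetical sketch of a hybrid local-global ViT block."""

    def __init__(self, dim=64, heads=4, window=8):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x):  # x: (B, C, H, W), H and W divisible by the window size
        B, C, H, W = x.shape
        w = self.window
        # Local path: attention within each w x w window.
        windows = (x.reshape(B, C, H // w, w, W // w, w)
                     .permute(0, 2, 4, 3, 5, 1)
                     .reshape(B * (H // w) * (W // w), w * w, C))
        local, _ = self.local_attn(windows, windows, windows)
        local = (local.reshape(B, H // w, W // w, w, w, C)
                       .permute(0, 5, 1, 3, 2, 4)
                       .reshape(B, C, H, W))
        # Global path: attention over all H*W tokens.
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        glob, _ = self.global_attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(B, C, H, W)
        # Fuse the two paths with a 1x1 convolution.
        return self.fuse(torch.cat([local, glob], dim=1))

block = HybridLocalGlobalBlock()
out = block(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```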
215 - Jia Liu, Zhiping Chen, Huifu Xu (2021)
In this paper, we consider a multistage expected utility maximization problem where the decision maker's utility function at each stage depends on historical data and the information on the true utility function is incomplete. To mitigate the risk arising from ambiguity of the true utility, we propose a maximin robust model where the optimal policy is based on the worst sequence of utility functions from an ambiguity set constructed with partially available information about the decision maker's preferences. We then show that the multistage maximin problem is time consistent when the utility functions are state-dependent and demonstrate with a counterexample that time consistency may not be retained when the utility functions are state-independent. With the time consistency, we show that the maximin problem can be solved by a recursive formula whereby a one-stage maximin problem is solved at each stage, beginning from the last stage. Moreover, we propose two approaches to construct the ambiguity set: a pairwise comparison approach and a $\zeta$-ball approach, where a ball of utility functions centered at a nominal utility function under the $\zeta$-metric is considered. To overcome the difficulty arising from solving the infinite-dimensional optimization problem in computing the worst-case expected utility value, we propose a piecewise linear approximation of the utility functions and derive an error bound for the approximation under moderate conditions. Finally, we develop a scenario tree-based computational scheme for solving the multistage preference robust optimization model and report some preliminary numerical results.
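As a small illustration of the piecewise linear approximation step, the sketch below interpolates a utility function between a handful of breakpoints; the breakpoint placement and the example utility are assumptions for illustration, not the paper's construction.

```python
import numpy as np

def piecewise_linear_utility(breakpoints, values):
    """Hypothetical sketch: approximate a (nondecreasing) utility function by
    linear interpolation between breakpoints."""
    breakpoints = np.asarray(breakpoints, dtype=float)
    values = np.asarray(values, dtype=float)

    def u(w):
        return np.interp(w, breakpoints, values)
    return u

# Example: approximate u(w) = log(1 + w) on [0, 10] with 6 breakpoints.
xs = np.linspace(0.0, 10.0, 6)
u_hat = piecewise_linear_utility(xs, np.log1p(xs))
print(u_hat(np.array([0.5, 3.7, 9.9])))  # interpolated utility values
print(np.log1p([0.5, 3.7, 9.9]))         # true values, for comparison
```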
The use of autonomous vehicles for chemical source localisation is a key enabling tool for disaster response teams to safely and efficiently deal with chemical emergencies. Whilst much work has been performed on source localisation using autonomous systems, most previous works have assumed an open environment or employed simplistic obstacle avoidance, separate from the estimation procedure. In this paper, we explore the coupling of the path planning task for both source term estimation and obstacle avoidance in a holistic framework. The proposed system intelligently produces potential gas sampling locations based on the current estimate of the wind field and the local map. A tree search is then performed to generate paths toward the estimated source location that traverse around any obstacles and still allow for exploration of potentially superior sampling locations. The proposed informed tree planning algorithm is then tested against the Entrotaxis technique in a series of high-fidelity simulations. The proposed system is found to reduce source position error far more efficiently than Entrotaxis in a feature-rich environment, whilst also exhibiting vastly more consistent and robust results.
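To give a feel for an informed search that trades off progress toward the estimated source against obstacle avoidance and exploration, here is a minimal grid-based sketch; the scoring function, the occupancy and information-gain inputs, and the grid abstraction are all hypothetical simplifications, not the authors' planner.

```python
import heapq

def informed_path_search(start, source_estimate, occupancy, info_gain, max_expansions=2000):
    """Hypothetical sketch: best-first search ordered by
    (distance travelled + distance to estimated source - exploration bonus),
    skipping occupied cells."""
    def heuristic(cell):
        return abs(cell[0] - source_estimate[0]) + abs(cell[1] - source_estimate[1])

    frontier = [(heuristic(start), 0.0, start, [start])]
    visited = set()
    while frontier and len(visited) < max_expansions:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == source_estimate:
            return path
        if cell in visited:
            continue
        visited.add(cell)
        x, y = cell
        for nxt in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if nxt in occupancy or nxt in visited:
                continue
            new_cost = cost + 1.0
            score = new_cost + heuristic(nxt) - info_gain.get(nxt, 0.0)
            heapq.heappush(frontier, (score, new_cost, nxt, path + [nxt]))
    return None

# Example: a wall along x = 3 and one promising sampling cell at (2, 3).
path = informed_path_search((0, 0), (5, 5),
                            occupancy={(3, y) for y in range(5)},
                            info_gain={(2, 3): 0.5})
print(path)
```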
Person Re-Identification (Re-Id) in occlusion scenarios is a challenging problem because a pedestrian can be partially occluded, so the use of local information for feature extraction and matching is still necessary. Therefore, we propose a Pose-guided inter- and intra-part relational transformer (Pirt) for occluded person Re-Id, which builds part-aware long-term correlations by introducing transformers. In our framework, we first develop a pose-guided feature extraction module with regional grouping and mask construction for robust feature representations. Since the positions of a pedestrian in images under surveillance scenarios are relatively fixed, we propose an intra-part and inter-part relational transformer. The intra-part module creates local relations with mask-guided features, while the inter-part module builds correlations between part nodes with transformers, developing cross relationships among parts. With the collaborative learning of inter- and intra-part relationships, experiments reveal that our proposed Pirt model achieves a new state of the art on the public occluded dataset, and further extensions to standard non-occluded person Re-Id datasets also demonstrate comparable performance.
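As an illustration of the inter-part idea, the sketch below treats pose-guided part features as a short sequence of part nodes and lets a transformer encoder build correlations between them; the layer counts, dimensions, and part count are illustrative assumptions, not the Pirt architecture.

```python
import torch
import torch.nn as nn

class InterPartRelationalTransformer(nn.Module):
    """Hypothetical sketch: self-attention across part nodes produces
    relation-aware part features."""

    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, part_feats):  # part_feats: (B, num_parts, dim)
        return self.encoder(part_feats)

# Example: 6 pose-guided parts (e.g., head, torso, arms, legs) per image.
model = InterPartRelationalTransformer()
out = model(torch.randn(8, 6, 256))
print(out.shape)  # torch.Size([8, 6, 256])
```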
171 - Yifan Zhao, Jiawei Zhao, Jia Li (2021)
Conventional RGB-D salient object detection methods aim to leverage depth as complementary information to find the salient regions in both modalities. However, the salient object detection results heavily rely on the quality of the captured depth data, which is sometimes unavailable. In this work, we make the first attempt to solve the RGB-D salient object detection problem with a novel depth-awareness framework. This framework relies only on RGB data in the testing phase, utilizing captured depth data as supervision for representation learning. To construct our framework and achieve accurate salient detection results, we propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task: 1) a depth awareness module to excavate depth information and to mine ambiguous regions via adaptive depth-error weights, 2) a spatial-aware cross-modal interaction and a channel-aware cross-level interaction, exploiting the low-level boundary cues and amplifying high-level salient channels, and 3) a gated multi-scale predictor module to perceive the object saliency at different contextual scales. Besides its high performance, our proposed UTA network is depth-free for inference and runs in real time at 43 FPS. Experimental evidence demonstrates that our proposed network not only surpasses the state-of-the-art methods on five public RGB-D SOD benchmarks by a large margin, but also verifies its extensibility on five public RGB SOD benchmarks.
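One way to read the "adaptive depth-error weights" is as a per-pixel reweighting of the saliency supervision based on how well depth is explained; the sketch below shows that reading, with the exponential form and the weighting scheme being assumptions for illustration rather than the UTA module itself.

```python
import torch

def adaptive_depth_error_weights(pred_depth, gt_depth, alpha=1.0):
    """Hypothetical sketch: regions where predicted depth disagrees with the
    captured depth are treated as ambiguous and down-weighted."""
    error = (pred_depth - gt_depth).abs()
    # Weight near 1 where depth is well explained, near 0 in ambiguous regions.
    return torch.exp(-alpha * error)

# Example: weight a per-pixel saliency loss map with the depth-error weights.
pred_depth, gt_depth = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
sal_loss_map = torch.rand(2, 1, 64, 64)  # e.g., per-pixel BCE values
weights = adaptive_depth_error_weights(pred_depth, gt_depth)
weighted_loss = (weights * sal_loss_map).mean()
print(weighted_loss.item())
```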
3D human shape and pose estimation is an essential task for human motion analysis, which is widely used in many 3D applications. However, existing methods cannot simultaneously capture the relations at multiple levels, including the spatial-temporal level and the human joint level. Therefore, they fail to make accurate predictions in hard scenarios with cluttered backgrounds, occlusion, or extreme poses. To this end, we propose the Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD), to model multi-level attention in a unified framework. STE consists of a series of cascaded blocks based on Multi-Head Self-Attention, and each block uses two parallel branches to learn spatial and temporal attention respectively. Meanwhile, KTD aims at modeling the joint-level attention. It regards pose estimation as a top-down hierarchical process similar to the SMPL kinematic tree. With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE on the three widely used benchmarks 3DPW, MPI-INF-3DHP, and Human3.6M, respectively. Our code is available at https://github.com/ziniuwan/maed.
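To illustrate the top-down hierarchical idea behind a kinematic decoder, the sketch below regresses joints in kinematic-tree order, with each joint's regressor also seeing its parent's prediction; the toy 4-joint tree and 6D rotation size are assumptions, not the full SMPL tree used in MAED.

```python
import torch
import torch.nn as nn

class KinematicTreeDecoder(nn.Module):
    """Hypothetical sketch: hierarchical joint regression along a kinematic tree,
    conditioning each child joint on its parent's predicted rotation."""

    def __init__(self, feat_dim=512, rot_dim=6, parents=(-1, 0, 1, 1)):
        super().__init__()
        self.parents = parents
        self.heads = nn.ModuleList([
            nn.Linear(feat_dim + (0 if p < 0 else rot_dim), rot_dim) for p in parents
        ])

    def forward(self, feat):  # feat: (B, feat_dim)
        preds = []
        for p, head in zip(self.parents, self.heads):
            inp = feat if p < 0 else torch.cat([feat, preds[p]], dim=-1)
            preds.append(head(inp))            # each joint sees its parent's output
        return torch.stack(preds, dim=1)       # (B, num_joints, rot_dim)

decoder = KinematicTreeDecoder()
rots = decoder(torch.randn(4, 512))
print(rots.shape)  # torch.Size([4, 4, 6])
```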
We present a learning-based approach for generating binaural audio from mono audio using multi-task learning. Our formulation leverages additional information from two related tasks: the binaural audio generation task and the flipped audio classification task. Our learning model extracts spatialization features from the visual and audio input, predicts the left and right audio channels, and judges whether the left and right channels are flipped. First, we extract visual features from the video frames using a ResNet. Next, we perform binaural audio generation and flipped audio classification using separate subnetworks based on the visual features. Our learning method optimizes the overall loss based on the weighted sum of the losses of the two tasks. We train and evaluate our model on the FAIR-Play dataset and the YouTube-ASMR dataset. We perform quantitative and qualitative evaluations to demonstrate the benefits of our approach over prior techniques.
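A minimal sketch of such a weighted multi-task objective is shown below, assuming an L2 reconstruction loss for binaural generation and a binary cross-entropy loss for the flipped-channel classifier; the loss choices and weights are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()

def multitask_loss(pred_lr, gt_lr, flip_logits, flip_labels, w_gen=1.0, w_flip=0.1):
    gen_loss = mse(pred_lr, gt_lr)                # binaural generation task
    flip_loss = bce(flip_logits, flip_labels)     # flipped-audio classification task
    return w_gen * gen_loss + w_flip * flip_loss  # weighted sum of the two task losses

# Example with random stereo waveforms and flip labels.
loss = multitask_loss(torch.randn(2, 2, 16000), torch.randn(2, 2, 16000),
                      torch.randn(2, 1), torch.randint(0, 2, (2, 1)).float())
print(loss.item())
```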
