In recent years, single image dehazing models (SIDMs) based on the atmospheric scattering model (ASM) have achieved remarkable results. However, ASM-based SIDMs perform worse on real-world hazy images because of the limited modelling ability of the ASM, in which the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed to be constant over an image. Hazy images taken in the real world do not always satisfy this assumption. This mismatch between real-world images and the ASM sets an upper bound on the dehazing performance of trained ASM-based SIDMs. With this in mind, this study proposes a new fully non-homogeneous atmospheric scattering model (FNH-ASM) that models hazy images under complex conditions, where the ALF and ASC are pixel dependent. However, FNH-ASM is difficult to apply in practice: in an FNH-ASM-based SIDM, parameter estimation bias at different positions leads to different degrees of distortion in the dehazing result. To reduce the influence of parameter estimation bias on the dehazing result, two new cost-sensitive loss functions, beta-Loss and D-Loss, are developed to limit the parameter bias at sensitive positions that have a greater impact on the dehazing result. Finally, based on FNH-ASM, an end-to-end CNN-based dehazing network, FNHD-Net, is developed, which applies beta-Loss and D-Loss. Experimental results demonstrate the effectiveness and superiority of the proposed FNHD-Net for dehazing on both synthetic and real-world images, and the performance improvement is more pronounced in dense and heterogeneous haze scenes.
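As a rough illustration of the constant-versus-pixel-dependent distinction described above, the sketch below uses the widely used ASM form I = J·t + A·(1 − t) with t = exp(−β·d). The abstract does not give the FNH-ASM equation, so letting A and β be per-pixel maps in this same formula is an assumption made here for illustration, and the function names (`add_haze`, `remove_haze`) are hypothetical rather than taken from the paper.

```python
import numpy as np

def _per_pixel(param):
    """Let a scalar or an (H, W) map broadcast against an (H, W, 3) image."""
    p = np.asarray(param, dtype=float)
    return p[..., None] if p.ndim == 2 else p

def transmission(depth, beta):
    """t = exp(-beta * d); beta may be a scalar or an (H, W) map."""
    return np.exp(-np.asarray(beta, dtype=float) * depth)

def add_haze(clear, depth, airlight, beta):
    """Synthesize a hazy image: I = J * t + A * (1 - t)."""
    t = transmission(depth, beta)[..., None]
    return clear * t + _per_pixel(airlight) * (1.0 - t)

def remove_haze(hazy, depth, airlight, beta, t_min=1e-3):
    """Invert the model: J = (I - A * (1 - t)) / t, with t clamped for stability."""
    t = np.maximum(transmission(depth, beta), t_min)[..., None]
    return (hazy - _per_pixel(airlight) * (1.0 - t)) / t
```

Passing scalars for `airlight` and `beta` gives the homogeneous case; passing (H, W) maps gives the assumed pixel-dependent case. Because the inversion divides by t, a small error in β or A matters more where t is small, which is consistent with the abstract's point that bias at some positions distorts the result more than at others.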
Most existing human pose estimation (HPE) methods exploit multi-scale information by fusing feature maps of four different spatial sizes, i.e., $1/4$, $1/8$, $1/16$, and $1/32$ of the input image. This strategy has two drawbacks: 1) feature maps of different spatial sizes may not be well aligned spatially, which potentially hurts the accuracy of keypoint localization; 2) these scales are fixed and inflexible, which may restrict the generalization ability over various human sizes. To address these issues, we propose an adaptive dilated convolution (ADC). It can generate and fuse multi-scale features of the same spatial size by setting different dilation rates for different channels. More importantly, these dilation rates are generated by a regression module. This enables ADC to adaptively adjust the fused scales, and thus ADC may generalize better to various human sizes. ADC can be trained end-to-end and easily plugged into existing methods. Extensive experiments show that ADC brings consistent improvements to various HPE methods. The source code will be released for further research.
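The idea of giving each channel its own dilation rate while keeping one spatial size can be sketched with a small depthwise convolution. This is only an illustration under simplifying assumptions: the rates here are fixed integers supplied by the caller, whereas the abstract describes rates produced by a regression module, and the function name `per_channel_dilated_conv` is hypothetical.

```python
import numpy as np

def per_channel_dilated_conv(x, kernels, rates):
    """Depthwise 3x3 convolution where channel c uses dilation rates[c].

    x:       (C, H, W) feature map
    kernels: (C, 3, 3) one filter per channel
    rates:   length-C sequence of non-negative integer dilation rates
    Zero padding of width rates[c] keeps every output channel at (H, W).
    """
    C, H, W = x.shape
    out = np.zeros((C, H, W), dtype=float)
    for c in range(C):
        r = int(rates[c])
        padded = np.pad(x[c], r)  # (H + 2r, W + 2r), zeros around the border
        for i in range(3):
            for j in range(3):
                # Tap (i, j) reads the input at offset ((i - 1) * r, (j - 1) * r).
                out[c] += kernels[c, i, j] * padded[i * r:i * r + H, j * r:j * r + W]
    return out
```

Channels with a larger rate see a wider receptive field, yet all outputs share the input's spatial grid, so no resizing or re-alignment is needed before fusing them.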
132 - Dong An, Yuankai Qi, Yan Huang 2021
Vision and Language Navigation (VLN) requires an agent to navigate to a target location by following natural language instructions. Most existing works represent a navigation candidate by the feature of the single view in which the candidate lies. However, an instruction may mention landmarks outside that single view as references, which can cause textual-visual matching to fail in existing methods. In this work, we propose a multi-module Neighbor-View Enhanced Model (NvEM) to adaptively incorporate visual contexts from neighbor views for better textual-visual matching. Specifically, our NvEM utilizes a subject module and a reference module to collect contexts from neighbor views. The subject module fuses neighbor views at a global level, and the reference module fuses neighbor objects at a local level. Subjects and references are adaptively determined via attention mechanisms. Our model also includes an action module to utilize the strong orientation guidance (e.g., turn left) in instructions. Each module predicts the navigation action separately, and their weighted sum is used for predicting the final action. Extensive experimental results demonstrate the effectiveness of the proposed method on the R2R and R4R benchmarks against several state-of-the-art navigators, and NvEM even beats some pre-training ones. Our code is available at https://github.com/MarSaKi/NvEM.
99 - Bo-Yan Huang 2021
After the observation of the Higgs boson by the ATLAS and CMS experiments at the LHC, accurate measurements of its properties, which allow us to study the electroweak symmetry breaking mechanism, have become a high priority for particle physics. The most promising way to extract the Higgs self-coupling at hadron colliders is to examine double Higgs production, especially in the $b\bar{b}\gamma\gamma$ channel. In this work, we present a full loop calculation of both SM and New Physics effects in Higgs pair production at next-to-leading order (NLO), including the loop-induced processes $gg \to HH$, $gg \to HHg$, and $qg \to qHH$. We also include the corrections from diagrams with only one QCD coupling in $qg \to qHH$, which were neglected in previous studies. With the latest observed limit on the HH production cross-section, we study the constraints on the effective Higgs couplings for the LHC at a center-of-mass energy of 14 TeV and for a provisional 100 TeV proton collider within the Future-Circular-Collider (FCC) project. To obtain better results than with the total cross-section alone, we focus on the $b\bar{b}\gamma\gamma$ channel and divide the differential cross-section into low and high bins based on the total invariant mass and $p_{T}$ spectra. The new physics effects are further constrained by including this extra kinematic information. However, some degeneracy persists, as shown in previous studies, especially in determining the Higgs trilinear coupling. Our analysis shows that the degeneracy is reduced by including the full NLO corrections.
In this work, we address the task of referring image segmentation (RIS), which aims at predicting a segmentation mask for the object described by a natural language expression. Most existing methods focus on establishing unidirectional or directional relationships between visual and linguistic features to associate the two modalities, while the multi-scale context is ignored or insufficiently modeled. Multi-scale context is crucial for localizing and segmenting objects that have large scale variations during the multi-modal fusion process. To solve this problem, we propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel and further introduces a cascaded branch to fuse visual and linguistic features. The cascaded branch can progressively integrate multi-scale contextual information and facilitate the alignment of the two modalities during the multi-modal fusion process. Experimental results on four benchmark datasets demonstrate that our method outperforms most state-of-the-art methods. Code is available at https://github.com/jianhua2022/CMF-Refseg.
Qualitative relationships illustrate how changing one property (e.g., moving velocity) affects another (e.g., kinetic energy) and constitute a considerable portion of textual knowledge. Current approaches use either semantic parsers to transform natural language inputs into logical expressions or a black-box model to solve them in one step. The former has a limited application range, while the latter lacks interpretability. In this work, we categorize qualitative reasoning tasks into two types: prediction and comparison. In particular, we adopt neural network modules trained in an end-to-end manner to simulate the two reasoning processes. Experiments on two qualitative reasoning question answering datasets, QuaRTz and QuaRel, show our methods' effectiveness and generalization capability, and the intermediate outputs provided by the modules make the reasoning process interpretable.
118 - Yu Bai, Yang Gao, Heyan Huang 2021
Parallel cross-lingual summarization data is scarce, requiring models to make better use of the limited cross-lingual resources available. Existing methods often adopt sequence-to-sequence networks with multi-task frameworks. Such approaches apply multiple decoders, each of which is used for a specific task. However, these independent decoders share no parameters and hence fail to capture the relationships between the discrete phrases of summaries in different languages, breaking the connections needed to transfer knowledge from high-resource languages to low-resource languages. To bridge these connections, we propose a novel Multi-Task framework for Cross-Lingual Abstractive Summarization (MCLAS) in a low-resource setting. Employing one unified decoder to generate the sequential concatenation of monolingual and cross-lingual summaries, MCLAS makes the monolingual summarization task a prerequisite of the cross-lingual summarization (CLS) task. In this way, the shared decoder learns interactions involving alignments and summary patterns across languages, which encourages knowledge transfer. Experiments on two CLS datasets demonstrate that our model significantly outperforms three baseline models in both low-resource and full-dataset scenarios. Moreover, in-depth analysis of the generated summaries and attention heads verifies that interactions are learned well using MCLAS, which benefits the CLS task under limited parallel resources.
We present a conceptual design study of external calibrators for the 21 cm experiment aimed at detecting the globally averaged radiation of the epoch of reionization (EoR). Using an external calibrator instead of the internal calibrator commonly used in current EoR experiments makes it possible to remove instrumental effects such as the beam pattern, receiver gain, and system instability, provided the conventional three-position switch measurements are carried out within a short time interval. Furthermore, in the new design the antenna system is placed in an underground anechoic chamber with an opening/closing ceiling to reduce environmental effects such as RFI and ground radiation/reflection as much as possible. It appears that three of the four external calibrators proposed in this paper, including two indoor artificial transmitters and one outdoor celestial radiation source (the Galactic polarization), fail to meet our purpose. The diurnal motion of the Galactic diffuse emission turns out to be the most feasible source for an external calibrator, for which we discuss the observational strategy and the algorithm for extracting the EoR signal.
Dialogue state tracking (DST) plays a key role in task-oriented dialogue systems by monitoring the user's goal. In general, there are two strategies for tracking a dialogue state: predicting it from scratch and updating it from the previous state. The scratch-based strategy obtains each slot value by querying the entire dialogue history, and the previous-based strategy relies on the current dialogue turn to update the previous dialogue state. However, it is hard for the scratch-based strategy to correctly track short-dependency dialogue states because of noise; meanwhile, the previous-based strategy is not very useful for long-dependency dialogue state tracking. Context information of different granularity thus plays different roles in tracking different kinds of dialogue states. In this paper, we therefore study and discuss how context information of different granularity affects dialogue state tracking. First, we explore how strongly different granularities affect dialogue state tracking. Then, we discuss how to combine multiple granularities for dialogue state tracking. Finally, we apply the findings about context granularity to a few-shot learning scenario. We have also publicly released all code.
Most existing studies on the double/debiased machine learning method concentrate on estimating the causal parameter recovered from the first-order orthogonal score function. In this paper, we construct the $k^{\mathrm{th}}$-order orthogonal score function for estimating the average treatment effect (ATE) and present an algorithm that enables us to obtain the debiased estimator recovered from the score function. Such a higher-order orthogonal estimator is more robust to misspecification of the propensity score than the first-order one. In addition, it is applicable with many machine learning methodologies such as Lasso, Random Forests, Neural Nets, etc. We also conduct comprehensive experiments to test the power of the estimator we construct from the score function, using both simulated and real datasets.
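For context, the first-order baseline that this abstract contrasts with is commonly the doubly robust (AIPW) score for the ATE. The sketch below implements only that familiar first-order score, not the $k^{\mathrm{th}}$-order construction, whose form the abstract does not give. The function name `aipw_ate` and its argument names are hypothetical, and the nuisance predictions (`g1`, `g0`, `m`) are assumed to come from whatever learner the reader has fitted.

```python
import numpy as np

def aipw_ate(y, d, g1, g0, m):
    """ATE estimate from the first-order orthogonal (AIPW) score.

    y:  observed outcomes
    d:  binary treatment indicator (0 or 1)
    g1: predicted outcome under treatment, E[Y | D=1, X]
    g0: predicted outcome under control,   E[Y | D=0, X]
    m:  predicted propensity score, P(D=1 | X), strictly between 0 and 1
    """
    y, d, g1, g0, m = (np.asarray(a, dtype=float) for a in (y, d, g1, g0, m))
    psi = (g1 - g0
           + d * (y - g1) / m
           - (1.0 - d) * (y - g0) / (1.0 - m))
    return psi.mean()
```

When the outcome predictions are exact, the residual terms vanish and the estimate no longer depends on `m`, which is the sense in which this score is already insensitive to propensity errors at first order.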