We study joint learning of a Convolutional Neural Network (CNN) and a Transformer for vision-language pre-training (VLPT), which aims to learn cross-modal alignments from millions of image-text pairs. State-of-the-art approaches extract salient image regions and align regions with words step by step. As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from the paired natural language. In this paper, we propose SOHO to "See Out of tHe bOx": it takes a whole image as input and learns vision-language representation in an end-to-end manner. SOHO does not require bounding box annotations, which enables inference 10 times faster than region-based approaches. In particular, SOHO learns to extract comprehensive yet compact image features through a visual dictionary (VD) that facilitates cross-modal understanding. The VD is designed to represent consistent visual abstractions of similar semantics. It is updated on-the-fly and utilized in our proposed pre-training task, Masked Visual Modeling (MVM). We conduct experiments on four well-established vision-language tasks by following standard VLPT settings. In particular, SOHO achieves absolute gains of 2.0% R@1 score on the MSCOCO text retrieval 5k test split, 1.5% accuracy on the NLVR$^2$ test-P split, and 6.7% accuracy on the SNLI-VE test split.
We study the structure of stationary channel flows predicted by the regularized 13-moment equations. Compared with the previous work [P. Taheri et al., Phys. Fluids, 21 (2009), 017102], we focus on gases whose molecules satisfy the general inverse power law. The analytical solutions are obtained for the semi-linear equations, and the structures of Couette, Fourier, and Poiseuille flows are solved by coupling the general solutions with newly derived boundary conditions. The results show excellent agreement with the reference solution in the slip-flow regime. Our results also show that the R13 equations derived from inverse power law models can have better accuracy than the R13 equations of Maxwell molecules with altered viscosity.
We propose Pixel-BERT to align image pixels with text by deep multi-modal transformers that jointly learn visual and language embeddings in a unified end-to-end framework. We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs, instead of using region-based image features as most recent vision-and-language approaches do. Pixel-BERT, which aligns semantics at the pixel and text level, removes the limitation of task-specific visual representations for vision-and-language tasks. It also avoids the cost of bounding box annotations and overcomes the imbalance between the semantic labels of visual tasks and language semantics. To provide a better representation for downstream tasks, we pre-train a universal end-to-end model with image and sentence pairs from the Visual Genome and MS-COCO datasets. We propose a random pixel sampling mechanism to enhance the robustness of the visual representation, and we apply Masked Language Model and Image-Text Matching as pre-training tasks. Extensive experiments with our pre-trained model show that our approach achieves state-of-the-art results on downstream tasks, including Visual Question Answering (VQA), image-text retrieval, and Natural Language for Visual Reasoning for Real (NLVR). In particular, we boost the performance of a single model on the VQA task by 2.17 points compared with the SOTA under fair comparison.
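The random pixel sampling idea mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the visual feature map has already been flattened into a list of per-pixel feature vectors; the function name `sample_pixels`, the parameter `k`, and the choice to keep sampled pixels in their original order are all illustrative assumptions, not details taken from the abstract.

```python
import random
from typing import List, Sequence

Feature = Sequence[float]

def sample_pixels(features: List[Feature], k: int, rng: random.Random) -> List[Feature]:
    """Keep a random subset of k feature pixels (hypothetical helper).

    `features` is assumed to be a flattened feature map, one vector per pixel.
    If k is at least the number of pixels, everything is kept.
    """
    if k >= len(features):
        return list(features)
    # Sample k distinct indices, then sort so the kept pixels stay in
    # their original spatial order (an assumption of this sketch).
    kept = sorted(rng.sample(range(len(features)), k))
    return [features[i] for i in kept]

# Example: a 4x4 feature map with 1-dimensional features, keep 4 pixels.
feature_map = [[float(i)] for i in range(16)]
subset = sample_pixels(feature_map, 4, random.Random(0))
```

Sampling only during pre-training would feed the transformer a different partial view of each image at every step, which is one plausible reading of how such a mechanism improves robustness.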
We propose to boost VQA by leveraging more powerful feature extractors, which improve the representation ability of both visual and text features, and by ensembling models. For the visual features, several detection techniques are used to improve the detector. For the text features, we adopt BERT as the language model and find that it can significantly improve VQA performance. Our solution won second place in the VQA Challenge 2019.
Zhicheng Hu, Zhenning Cai (2019)
We introduce a numerical solver for the spatially inhomogeneous Boltzmann equation using the Burnett spectral method. The modelling and discretization of the collision operator are based on the previous work [Z. Cai, Y. Fan, and Y. Wang, Burnett spectral method for the spatially homogeneous Boltzmann equation, arXiv:1810.07804], which is the hybridization of the BGK operator for higher moments and the quadratic collision operator for lower moments. To ensure the preservation of the equilibrium state, we introduce an additional term to the discrete collision operator, which equals zero when the number of degrees of freedom tends to infinity. Compared with the previous work [Z. Hu, Z. Cai, and Y. Wang, Numerical simulation of microflows using Hermite spectral methods, arXiv:1807.06236], the computational cost is reduced by one order. Numerical experiments such as shock structure calculation and Fourier flows are carried out to show the efficiency and accuracy of our numerical method.
Zhicheng Hu, Guanghui Hu (2018)
In [Z. Hu, R. Li, and Z. Qiao. Acceleration for microflow simulations of high-order moment models by using lower-order model correction. J. Comput. Phys., 327:225-244, 2016], it has been successfully demonstrated that using lower-order moment model correction is a promising idea to accelerate the steady-state computation of high-order moment models of the Boltzmann equation. To further develop the existing solver, the following aspects are studied in this paper. First, the finite volume method with linear reconstruction is employed for high-resolution spatial discretization, so that the number of spatial degrees of freedom can be reduced remarkably without loss of accuracy. Second, by introducing an appropriate parameter $\tau$ in the correction step, it is found that the performance of the solver can be improved significantly, i.e., more levels can be involved in the solver, which further accelerates the convergence of the method. Third, Heun's method is employed as the smoother in each level to enhance the robustness of the solver. Numerical experiments in microflows are carried out to demonstrate the efficiency and to investigate the behavior of the new solver. In addition, several order reduction strategies for the choice of the order sequence of the solver are tested, and the strategy $m_{l-1} = \lceil m_{l} / 2 \rceil$ is found to be the most efficient.
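For readers unfamiliar with Heun's method, the following Python sketch shows one step of the scheme (a two-stage predictor-corrector) applied as a pseudo-time relaxation toward a steady state. The toy right-hand side `residual` and the step size are assumptions chosen for illustration; the smoother in the paper acts on the discretized moment system, which is not reproduced here.

```python
from typing import Callable, List

Vector = List[float]

def heun_step(f: Callable[[Vector], Vector], u: Vector, dt: float) -> Vector:
    """Advance du/dt = f(u) by one step of Heun's method."""
    k1 = f(u)
    # Predictor: a forward Euler step.
    u_pred = [ui + dt * ki for ui, ki in zip(u, k1)]
    k2 = f(u_pred)
    # Corrector: average the slopes at the start and at the predicted state.
    return [ui + 0.5 * dt * (a + b) for ui, a, b in zip(u, k1, k2)]

def residual(u: Vector) -> Vector:
    """Toy right-hand side (an assumption): relaxes every component toward 1."""
    return [1.0 - ui for ui in u]

# Pseudo-time stepping toward the steady state u = 1.
u = [0.0, 3.0]
for _ in range(100):
    u = heun_step(residual, u, 0.5)
```

In a multi-level solver, a few such steps would be applied on each level before and after the coarse-level correction, rather than iterating to convergence as in this toy loop.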
We propose a Hermite spectral method for the spatially inhomogeneous Boltzmann equation. For the inverse-power-law model, we generalize an approximate quadratic collision operator defined in the normalized and dimensionless setting to an operator for arbitrary distribution functions. An efficient algorithm with a fast transform is introduced to discretize this new collision operator. The method is tested for one-dimensional benchmark microflow problems.
We study the acceleration of steady-state computation for microflow, which is modeled by the high-order moment models derived recently from the steady-state Boltzmann equation with a BGK-type collision term. By using the lower-order model correction, a novel nonlinear multi-level moment solver is developed. Numerical examples verify that the resulting solver improves the convergence significantly and is thus able to accelerate the steady-state computation greatly. The behavior of the solver is also numerically investigated. It is shown that the convergence rate increases as the total number of levels increases, indicating that the solver becomes more efficient. Three order reduction strategies of the solver are considered. Numerical results show that the most efficient order reduction strategy is $m_{l-1} = \lceil m_{l} / 2 \rceil$.
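The order reduction rule $m_{l-1} = \lceil m_{l} / 2 \rceil$ can be unrolled into the full sequence of moment orders, from the finest level down to the coarsest. The sketch below does this in Python; the function name `order_sequence` and the stopping threshold `m_min` are assumptions for illustration, since the abstract does not say at which order the coarsest level stops.

```python
from typing import List

def order_sequence(m_top: int, m_min: int = 2) -> List[int]:
    """List moment orders from the finest level down, using m_{l-1} = ceil(m_l / 2).

    `m_min` is a hypothetical lower bound on the order of the coarsest level.
    """
    if m_top < m_min:
        raise ValueError("m_top must be at least m_min")
    orders = [m_top]
    while True:
        # Integer ceiling of orders[-1] / 2 without floating point.
        nxt = (orders[-1] + 1) // 2
        if nxt < m_min or nxt == orders[-1]:
            break
        orders.append(nxt)
    return orders

print(order_sequence(10))  # [10, 5, 3, 2]
```

Because the order roughly halves at each level, the number of levels grows only logarithmically with the top order.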
Zhicheng Hu, Ruo Li (2014)
We develop a nonlinear multigrid method to solve the steady state of microflow, which is modeled by the high-order moment system derived recently for the steady-state Boltzmann equation with the ES-BGK collision term. The solver adopts, as its smoother, a symmetric Gauss-Seidel iterative scheme with a nested local Newton iteration at the grid-cell level. Numerical examples show that the solver is insensitive to the parameters in the implementation and is thus quite robust. It is demonstrated that the expected efficiency improvement is achieved by the proposed method in comparison with the direct time-stepping scheme.
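As a rough illustration of what "symmetric Gauss-Seidel" means, the Python sketch below performs a forward sweep followed by a backward sweep on a small linear system $Ax = b$. This is a deliberately simplified linear analogue: in the paper each cell update is a local Newton iteration on a nonlinear system, which this sketch replaces with an exact scalar solve. The matrix, right-hand side, and function name are illustrative assumptions.

```python
from typing import List

def sgs_sweep(A: List[List[float]], b: List[float], x: List[float]) -> List[float]:
    """One symmetric Gauss-Seidel sweep for A x = b: forward pass, then backward pass."""
    n = len(b)
    x = list(x)  # work on a copy so the caller's vector is untouched
    for order in (range(n), reversed(range(n))):
        for i in order:
            # Use the most recently updated values of all other unknowns.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Small diagonally dominant test system whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = [0.0, 0.0, 0.0]
for _ in range(50):
    x = sgs_sweep(A, b, x)
```

Within a multigrid method only a few such sweeps are applied per level, since their role is to damp high-frequency error rather than to solve the system.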
This paper studies the numerical solution of traveling singular source problems. In such problems, a major challenge is that the sources move with different speeds, which are described by ordinary differential equations. A predictor-corrector algorithm is presented to simulate the positions of the singular sources. Then a moving mesh method in conjunction with domain decomposition is derived for the underlying PDE. According to the positions of the sources, the whole domain is split into several subdomains, in each of which the moving mesh equations are solved separately. On the resulting mesh, the computation of the jump $[\dot{u}]$ is avoided and the discretization of the underlying PDE is reduced to only two cases. In addition, the new method has the desired second-order spatial convergence. Numerical examples are presented to illustrate the convergence rates and the efficiency of the method. The blow-up phenomenon is also investigated for various motions of the sources.