
93 - Runsheng Xu, Hao Xiang, Xin Xia 2021
Employing Vehicle-to-Vehicle communication to enhance perception performance in self-driving technology has attracted considerable attention recently; however, the absence of a suitable open dataset for benchmarking algorithms has made it difficult to develop and assess cooperative perception technologies. To this end, we present the first large-scale open simulated dataset for Vehicle-to-Vehicle perception. It contains over 70 interesting scenes, 111,464 frames, and 232,913 annotated 3D vehicle bounding boxes, collected from 8 towns in CARLA and a digital town of Culver City, Los Angeles. We then construct a comprehensive benchmark with a total of 16 implemented models to evaluate several information fusion strategies (i.e., early, late, and intermediate fusion) with state-of-the-art LiDAR detection algorithms. Moreover, we propose a new Attentive Intermediate Fusion pipeline to aggregate information from multiple connected vehicles. Our experiments show that the proposed pipeline can be easily integrated with existing 3D LiDAR detectors and achieves outstanding performance even with large compression rates. To encourage more researchers to investigate Vehicle-to-Vehicle perception, we will release the dataset, benchmark methods, and all related code at https://mobility-lab.seas.ucla.edu/opv2v/.
Contrastive learning models have achieved great success in unsupervised visual representation learning; they maximize the similarities between feature representations of different views of the same image while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document, and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary, and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
383 - Yizun Lin, Yuesheng Xu 2021
We estimate convergence rates for fixed-point iterations of a class of nonlinear operators which are partially motivated by solving convex optimization problems. We introduce the notion of the generalized averaged nonexpansive (GAN) operator with a positive exponent, and provide a convergence rate analysis of the fixed-point iteration of the GAN operator. The proposed generalized averaged nonexpansiveness is weaker than averaged nonexpansiveness but stronger than nonexpansiveness. We show that the fixed-point iteration of a GAN operator with a positive exponent converges to its fixed point, and we estimate the local convergence rate (the convergence rate in terms of the distance between consecutive iterates) according to the range of the exponent. We prove that the fixed-point iteration of a GAN operator with a positive exponent strictly smaller than 1 can achieve an exponential global convergence rate (the convergence rate in terms of the distance between an iterate and the solution). Furthermore, we establish the global convergence rate of the fixed-point iteration of a GAN operator, depending on both the exponent of generalized averaged nonexpansiveness and the exponent of the Hölder regularity, if the GAN operator is also Hölder regular. We then apply the established theory to three types of convex optimization problems that appear often in data science to design fixed-point iterative algorithms for solving these optimization problems and to analyze their convergence properties.
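The two notions of rate mentioned above (distance between consecutive iterates, and distance to the solution) can be illustrated with a minimal sketch. It does not implement the GAN operator, whose definition is not given in the abstract; as an assumption, it uses a classical averaged operator, a gradient step on the quadratic f(x) = 0.5 (x - 3)^2. The names `fixed_point_iterate` and `T`, the step size 0.5, and the target 3.0 are all illustrative choices.

```python
def fixed_point_iterate(T, x0, tol=1e-10, max_iter=1000):
    """Run x_{k+1} = T(x_k) and record |x_{k+1} - x_k| at every step."""
    x = x0
    gaps = []  # distances between consecutive iterates
    for _ in range(max_iter):
        x_next = T(x)
        gaps.append(abs(x_next - x))
        x = x_next
        if gaps[-1] < tol:
            break
    return x, gaps


def T(x):
    # Hypothetical operator: gradient step with step size 0.5 on
    # f(x) = 0.5 * (x - 3)^2, so T(x) = 0.5 * x + 1.5 with fixed point 3.
    return x - 0.5 * (x - 3.0)


x_star, gaps = fixed_point_iterate(T, x0=0.0)
errors_ok = abs(x_star - 3.0) < 1e-8  # distance to the solution is small
```

In this toy case both the consecutive-iterate gap and the distance to the fixed point shrink by a constant factor per step, which is the kind of exponential rate the abstract refers to.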
We explore convergence of deep neural networks with the popular ReLU activation function, as the depth of the networks tends to infinity. To this end, we introduce the notion of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function by multiplications with activation matrices on activation domains, we obtain an explicit expression of the ReLU network. We then identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices. Sufficient and necessary conditions for convergence of these infinite products of matrices are studied. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions in terms of the weight matrices and bias vectors at hidden layers for pointwise convergence of deep ReLU networks. These results provide mathematical insights into the design strategy of the well-known deep residual networks in image classification.
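A minimal sketch of the replacement described above, under the assumption that an activation matrix is the diagonal 0/1 matrix marking which pre-activation coordinates are positive (the abstract does not spell out the definition). The weights `W`, bias `b`, input `x`, and the helper names are hypothetical and chosen only for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v):
    """Componentwise ReLU."""
    return [max(0.0, x) for x in v]


def activation_matrix(z):
    """Diagonal 0/1 matrix with a 1 wherever the pre-activation z is positive."""
    n = len(z)
    return [[1.0 if (i == j and z[i] > 0) else 0.0 for j in range(n)]
            for i in range(n)]


# One hidden layer with illustrative weights, bias, and input.
W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, -1.0]
x = [1.0, 1.0]

z = [zi + bi for zi, bi in zip(matvec(W, x), b)]  # pre-activation W x + b
D = activation_matrix(z)

via_relu = relu(z)         # apply the nonlinearity directly
via_matrix = matvec(D, z)  # apply the (input-dependent) linear map instead
```

On the region of inputs sharing the same sign pattern of `z`, the matrix `D` is constant, so the layer acts there as the affine map `D (W x + b)`; composing layers then yields products of such matrices.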
Fusing intra-operative 2D transrectal ultrasound (TRUS) images with a pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such multimodal 2D/3D registration is a very challenging task. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking. The proposed FVR-Net utilizes a dual-branch feature extraction module to extract information from the TRUS frame and volume to estimate transformation parameters. We also introduce a differentiable 2D slice sampling module which allows gradients to backpropagate from an unsupervised image similarity loss for content correspondence learning. Our model shows superior efficiency for real-time interventional guidance with highly competitive registration accuracy.
Video question answering is a challenging task, which requires agents to be able to understand rich video contents and perform spatial-temporal reasoning. However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) even for the same video, different questions may require different numbers of video clips or objects to infer the answer with relational reasoning; (2) during reasoning, appearance and motion features have a complicated interdependence and are correlated and complementary to each other. Based on these observations, we propose a Dual-Visual Graph Reasoning Unit (DualVGR) which reasons over videos in an end-to-end fashion. The first contribution of our DualVGR is the design of an explainable Query Punishment Module, which can filter out irrelevant visual features through multiple cycles of reasoning. The second contribution is the proposed Video-based Multi-view Graph Attention Network, which captures the relations between appearance and motion features. Our DualVGR network achieves state-of-the-art performance on the benchmark MSVD-QA and SVQA datasets, and demonstrates competitive results on the benchmark MSRVTT-QA dataset. Our code is available at https://github.com/MMIR/DualVGR-VideoQA.
341 - Yi Fan, Changsu Cao, Xusheng Xu 2021
Quantum computation represents a revolutionary means for solving problems in quantum chemistry. However, due to the limited coherence time and relatively low gate fidelity of current noisy intermediate-scale quantum (NISQ) devices, the realization of quantum algorithms for large chemical systems remains a major challenge. In this work, we demonstrate how the circuit depth of the unitary coupled cluster ansatz in the variational quantum eigensolver algorithm can be significantly reduced by an energy-sorting strategy. Specifically, subsets of excitation operators are pre-screened from the unitary coupled-cluster singles and doubles (UCCSD) operator pool according to their contribution to the total energy. The procedure is then iteratively repeated until the final energy converges to within chemical accuracy. For demonstration, this method has been successfully applied to molecular systems and a periodic hydrogen chain. In particular, a reduction of up to 14 times in the number of operators is observed while retaining the accuracy of the original UCCSD operator pools. This method can be widely extended to variational ansatzes other than UCC.
Recently, transductive graph-based methods have achieved great success in the few-shot classification task. However, most existing methods neglect the class-level knowledge that humans can easily learn from just a handful of samples. In this paper, we propose an Explicit Class Knowledge Propagation Network (ECKPN), which is composed of the comparison, squeeze, and calibration modules, to address this problem. Specifically, we first employ the comparison module to explore the pairwise sample relations and learn rich sample representations in the instance-level graph. Then, we squeeze the instance-level graph to generate the class-level graph, which can help obtain the class-level visual knowledge and facilitate modeling the relations of different classes. Next, the calibration module is adopted to characterize the relations of the classes explicitly and obtain more discriminative class-level knowledge representations. Finally, we combine the class-level knowledge with the instance-level sample representations to guide the inference of the query samples. We conduct extensive experiments on four few-shot classification benchmarks, and the experimental results show that the proposed ECKPN significantly outperforms the state-of-the-art methods.
347 - She-Sheng Xue 2021
We study the homologous collapse of a stellar nuclear core, the virial theorem for hadron collisional relaxations, and photon production from hadron collisions. We thus show the gravo-thermal dynamical process that transforms gravitational energy into photon energy. The process is energetically and entropically favourable. The total baryon number conservation, the Euler equation for energy-momentum conservation, and Poisson's equation for the gravitational potential are adopted to describe homologous core collapses. The virial theorem determines the hadron collision energy gain from the gravitational potential. The hadronic photon production rate determines the photon energy density. The time scales of macroscopic and microscopic processes are studied to verify the approximations. As a result, we show the formation of opaque photon-pair spheres, whose total energy, size, temperature, and number density account for the main energetic features of Gamma-Ray Burst progenitors. We obtain the intrinsic correlations of these quantities. They depend only on the averaged thermal index of the stellar core. We discuss the possibility of confronting them with observational data.
Hyperbolic phonon polaritons (HPhPs) sustained in van der Waals (vdW) materials exhibit extraordinary capabilities of confining long-wave electromagnetic fields to the deep subwavelength scale. In stark contrast to uniaxial vdW hyperbolic materials such as hexagonal boron nitride (h-BN), the recently emerging biaxial hyperbolic materials such as α-MoO3 and α-V2O5 bring new degrees of freedom in controlling light at the flatland, due to their distinctive in-plane hyperbolic dispersion. However, the controlling and focusing of such in-plane HPhPs remain elusive to date. Here, we propose a versatile technique for launching, controlling, and focusing in-plane HPhPs in α-MoO3 with geometrically designed plasmonic antennas. By utilizing a high-resolution near-field optical imaging technique, we directly excited and mapped the HPhP wavefronts in real space. We find that the subwavelength manipulating and focusing behavior is strongly dependent on the curvature of the antenna extremity. This strategy operates effectively in a broadband spectral region. These findings not only provide fundamental insights into the manipulation of light by biaxial hyperbolic crystals at the nanoscale, but also open up new opportunities for planar nanophotonic applications.