
Recent studies indicate that hierarchical Vision Transformers with a macro architecture of interleaved non-overlapped window-based self-attention and shifted-window operations can achieve state-of-the-art performance in various visual recognition tasks, challenging the ubiquitous convolutional neural networks (CNNs) that use densely slid kernels. Most follow-up works attempt to replace the shifted-window operation with other kinds of cross-window communication paradigms, while treating self-attention as the de facto standard for window-based information aggregation. In this manuscript, we question whether self-attention is the only choice for hierarchical Vision Transformers to attain strong performance, and we study the effects of different kinds of cross-window communication. To this end, we replace self-attention layers with embarrassingly simple linear mapping layers, and the resulting proof-of-concept architecture, termed LinMapper, achieves very strong performance in ImageNet-1k image recognition. Moreover, we find that LinMapper better leverages the pre-trained representations from image recognition and demonstrates excellent transfer learning properties on downstream dense prediction tasks such as object detection and instance segmentation. We also experiment with other alternatives to self-attention for content aggregation inside each non-overlapped window under different cross-window communication approaches, all of which give similarly competitive results. Our study reveals that the macro architecture of the Swin model family, rather than specific aggregation layers or specific means of cross-window communication, may be more responsible for its strong performance and is the real challenger to the dense sliding-window paradigm of ubiquitous CNNs. Code and models will be made publicly available to facilitate future research.
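As a rough illustration of replacing window self-attention with a linear mapping, the Python sketch below mixes the tokens inside each non-overlapped window with one shared N x N matrix, where N is the number of tokens per window. The abstract does not give the exact form of LinMapper's layer, so this token-mixing formulation, the function name `window_linear_mixing`, and the weight layout are assumptions for illustration only. Shifted windows and channel projections are omitted.

```python
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[y][x] is a C-dimensional token feature


def window_linear_mixing(grid: Grid, window: int, weight: List[List[float]]) -> Grid:
    """Mix tokens inside each non-overlapped window with one shared linear map.

    Assumed formulation (hypothetical): weight is an N x N matrix with
    N = window * window, and output token i of a window equals
    sum_j weight[i][j] * token_j, applied to every channel independently.
    """
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("feature map size must be divisible by the window size")
    n = window * window
    if len(weight) != n or any(len(row) != n for row in weight):
        raise ValueError("weight must be N x N with N = window * window")
    channels = len(grid[0][0])
    out: Grid = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for wy in range(0, height, window):
        for wx in range(0, width, window):
            # Row-major flattening: token k sits at (wy + k // window, wx + k % window).
            coords = [(wy + k // window, wx + k % window) for k in range(n)]
            tokens = [grid[y][x] for y, x in coords]
            for i, (y, x) in enumerate(coords):
                # The same weight matrix is reused for every window.
                out[y][x] = [
                    sum(weight[i][j] * tokens[j][c] for j in range(n))
                    for c in range(channels)
                ]
    return out
```

With a uniform weight of 1/N, every output token becomes its window's mean, and with the identity matrix the map leaves the features unchanged. Unlike self-attention, the mixing weights here do not depend on the input content.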
Recently, query-based deep networks have attracted much attention owing to their end-to-end pipelines and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation. However, how to establish a query-based video instance segmentation (VIS) framework with an elegant architecture and strong performance remains an open question. In this paper, we present QueryTrack (i.e., tracking instances as queries), a unified query-based VIS framework that fully leverages the intrinsic one-to-one correspondence between instances and queries in QueryInst. The proposed method obtains 52.7 / 52.3 AP on the YouTube-VIS-2019 / 2021 datasets, winning 2nd place in the YouTube-VIS Challenge at CVPR 2021 with a single online end-to-end model, single-scale testing, and a modest amount of training data. We also provide QueryTrack-ResNet-50 baseline results on the YouTube-VIS-2021 val set as references for the VIS community.
Time-delay signature (TDS) suppression in semiconductor lasers with external optical feedback is necessary to ensure the security of chaos-based secure communications. Here we numerically and experimentally demonstrate a technique that effectively suppresses the TDS of chaotic lasers using quantum noise. The TDS and dynamical complexity are quantified using the autocorrelation function and the normalized permutation entropy at the feedback delay time, respectively. Quantum noise from the quadrature fluctuations of the vacuum state is prepared through balanced homodyne measurement. The effects of the strength and bandwidth of the quantum noise on chaotic TDS suppression and complexity enhancement are investigated numerically and experimentally. Compared with the original dynamics, the TDS of this quantum-noise-improved chaos is suppressed by up to 94%, and the bandwidth suppression ratio of quantum noise to the chaotic laser is 1:25. The experiment agrees well with the theory. The improved chaotic laser is potentially beneficial to chaos-based random number generation and secure communication.
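The two metrics named above can be sketched in plain Python. The autocorrelation is evaluated at the feedback delay expressed in samples, and the permutation entropy follows the common Bandt-Pompe ordinal-pattern definition normalized by log(order!). In TDS studies the embedding lag is typically set to the feedback delay. The embedding dimension, sampling, and function names are assumptions, not the settings used in this work.

```python
import math
from collections import Counter
from typing import Sequence


def autocorrelation_at(x: Sequence[float], lag: int) -> float:
    """Normalized autocorrelation of x at an integer lag, e.g. the feedback delay in samples."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def normalized_permutation_entropy(x: Sequence[float], order: int = 4, lag: int = 1) -> float:
    """Bandt-Pompe permutation entropy normalized to [0, 1] by log(order!).

    Assumption: to evaluate it "at the feedback delay", pass the delay (in samples) as lag.
    """
    span = (order - 1) * lag
    if order < 2 or len(x) <= span:
        raise ValueError("series too short for the chosen order and lag")
    patterns: Counter = Counter()
    for i in range(len(x) - span):
        window = [x[i + k * lag] for k in range(order)]
        # Ordinal pattern: the index permutation that sorts the window.
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    total = sum(patterns.values())
    entropy = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return entropy / math.log(math.factorial(order))
```

A strong TDS shows up as an autocorrelation close to 1 at the delay. Suppressing the signature pushes that value toward 0, while higher complexity drives the normalized entropy toward 1.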
Can Transformer perform 2D object-level recognition from a pure sequence-to-sequence perspective with minimal knowledge about the 2D spatial structure? To answer this question, we present You Only Look at One Sequence (YOLOS), a series of object detection models based on the naive Vision Transformer with the fewest possible modifications and inductive biases. We find that YOLOS pre-trained only on the mid-sized ImageNet-1k dataset can already achieve competitive object detection performance on COCO; e.g., YOLOS-Base directly adopted from BERT-Base achieves 42.0 box AP. We also discuss the impacts and limitations of current pre-training schemes and model scaling strategies for Transformers in vision through object detection. Code and model weights are available at https://github.com/hustvl/YOLOS.
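To make the sequence-to-sequence view concrete, the sketch below flattens an image into non-overlapping patch tokens and appends a set of detection ([DET]) tokens, which YOLOS uses in place of ViT's single [CLS] token. To keep the example self-contained, it skips the linear patch embedding and position embeddings, so the [DET] placeholders have the same width as a flattened patch. The patch size, token count, and helper names are illustrative assumptions.

```python
from typing import List

Image = List[List[List[float]]]  # image[y][x] holds the channel values of one pixel


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles, flattened row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens: List[List[float]] = []
    for py in range(0, height, patch):
        for px in range(0, width, patch):
            tokens.append([value
                           for y in range(py, py + patch)
                           for x in range(px, px + patch)
                           for value in image[y][x]])
    return tokens


def build_detection_sequence(image: Image, patch: int,
                             det_tokens: List[List[float]]) -> List[List[float]]:
    """Return [patch tokens..., detection tokens...] as one sequence for a Transformer encoder.

    Hypothetical simplification: no linear patch embedding or position embeddings,
    so each detection token must have the same length as a flattened patch.
    """
    patches = patchify(image, patch)
    width = len(patches[0])
    if any(len(token) != width for token in det_tokens):
        raise ValueError("detection tokens must match the flattened patch length")
    return patches + [list(token) for token in det_tokens]
```

With 2 x 2 patches, a 4 x 4 single-channel image yields four patch tokens, so three [DET] tokens give a sequence of length seven. In YOLOS, the class and box heads read the encoder outputs at the [DET] positions.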
Frequency regulation by distributed energy resources (DERs) is a promising technology for future grid operation. Unlike conventional generators, DERs might require open communication networks to exchange signals with control centers, possibly through DER aggregators; therefore, the impacts of communication variations on system stability need to be investigated. This paper develops a cyber-physical dynamic simulation model based on the Hierarchical Engine for Large-Scale Co-Simulation (HELICS) to evaluate the impact of communication variations, such as delays, on DER frequency regulation. The feasible delay range can be obtained under different parameter settings. The results show that the risk of instability generally increases with the communication delay.
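The qualitative trend reported above can be reproduced with a toy single-bus model in which DER regulation acts on a delayed frequency measurement: M df/dt = -D f(t) - K f(t - tau). This is not the HELICS co-simulation model from the paper. The equation, parameter values, and function name are assumptions chosen for illustration, and the model is integrated with forward Euler.

```python
from typing import List


def simulate_delayed_regulation(delay: float, inertia: float = 1.0, damping: float = 0.1,
                                gain: float = 1.0, dt: float = 0.01, horizon: float = 200.0,
                                initial_deviation: float = 1.0) -> List[float]:
    """Forward-Euler simulation of the toy model M df/dt = -D f(t) - K f(t - delay).

    f is the frequency deviation. The regulation term uses a measurement that arrives
    `delay` seconds late, and the history before t = 0 is held at initial_deviation.
    """
    steps = int(round(horizon / dt))
    lag = int(round(delay / dt))  # communication delay expressed in time steps
    f = [initial_deviation]
    for n in range(steps):
        delayed = f[n - lag] if n >= lag else initial_deviation
        dfdt = (-damping * f[n] - gain * delayed) / inertia
        f.append(f[n] + dt * dfdt)
    return f
```

With the default parameters (M = 1, D = 0.1, K = 1), this loop loses stability once the delay exceeds roughly 1.7 s. A 0.5 s delay therefore settles, while a 3 s delay produces a growing oscillation.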
Recently, query-based object detection frameworks have achieved performance comparable to previous state-of-the-art object detectors. However, how to fully leverage such frameworks to perform instance segmentation remains an open problem. In this paper, we present QueryInst (Instances as Queries), a query-based instance segmentation method driven by parallel supervision on dynamic mask heads. The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as the one-to-one correspondence between mask RoI features and object queries in the same stage. This approach eliminates the explicit multi-stage mask head connection and the proposal distribution inconsistency issues inherent in non-query-based multi-stage instance segmentation methods. We conduct extensive experiments on three challenging benchmarks, i.e., COCO, CityScapes, and YouTube-VIS, to evaluate the effectiveness of QueryInst on instance segmentation and video instance segmentation (VIS) tasks. Specifically, using a ResNet-101-FPN backbone, QueryInst obtains 48.1 box AP and 42.8 mask AP on COCO test-dev, which is 2 points higher than HTC in terms of both box AP and mask AP, while running 2.4 times faster. For video instance segmentation, QueryInst achieves the best performance among all online VIS approaches and strikes a decent speed-accuracy trade-off. Code is available at https://github.com/hustvl/QueryInst.
Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames. Unlike previous schemes, crossover learning does not require any additional network parameters for feature enhancement. Integrated with the instance segmentation loss, crossover learning enables efficient cross-frame instance-to-pixel relation learning and brings cost-free improvement during inference. In addition, a global balanced instance embedding branch is proposed for more accurate and more stable online instance association. We conduct extensive experiments on three challenging VIS benchmarks, i.e., YouTube-VIS-2019, OVIS, and YouTube-VIS-2021, to evaluate our method. To our knowledge, CrossVIS achieves state-of-the-art performance among all online VIS methods and shows a decent trade-off between latency and accuracy. Code will be made available to facilitate future research.
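Here is a simplified sketch of the crossover idea. The feature of an instance taken from frame t acts as a 1x1 dynamic filter on the pixel embeddings of another frame, and the resulting mask is supervised with that instance's mask in the other frame. The actual CrossVIS mask head is more elaborate. The dot-product filter, binary cross-entropy loss, and function names here are assumptions meant only to show why no extra parameters are needed.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # fmap[y][x] is a C-dimensional pixel embedding


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cross_frame_mask(instance_feature: List[float], fmap_other: FeatureMap) -> List[List[float]]:
    """Apply an instance feature from frame t as a 1x1 filter on another frame's pixel embeddings."""
    return [[_sigmoid(sum(a * b for a, b in zip(instance_feature, pixel))) for pixel in row]
            for row in fmap_other]


def crossover_loss(instance_feature: List[float], fmap_other: FeatureMap,
                   mask_other: List[List[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy against the same instance's ground-truth mask in the other frame."""
    probs = cross_frame_mask(instance_feature, fmap_other)
    total, count = 0.0, 0
    for prob_row, mask_row in zip(probs, mask_other):
        for p, m in zip(prob_row, mask_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to keep the logarithms finite
            total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
            count += 1
    return total / count
```

The same instance feature is reused against another frame's features, so this extra supervision adds only a training loss term and no new parameters.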
Multi-axis additive manufacturing enables highly flexible material deposition along dynamically varied directions. The Cartesian motion platforms of these machines include three parallel axes and two rotational axes. Singularity on the rotational axes is a critical issue to be tackled in motion planning to ensure high-quality manufacturing results. The highly nonlinear mapping in the singular region can convert a smooth toolpath with uniformly sampled waypoints defined in the model coordinate system into highly discontinuous motion in the machine coordinate system, which leads to over-extrusion or under-extrusion of material in filament-based additive manufacturing. The problem is challenging because both the maximal and the minimal speeds at the tip of the printer head must be controlled during motion. Moreover, collisions may occur when sampling-based collision avoidance is employed. In this paper, we present a motion planning method to support the manufacturing realization of designed toolpaths for multi-axis additive manufacturing. The problems of singularity and collision are considered in an integrated manner to improve the motion and therefore the quality of fabrication.
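The inverse kinematics of the two rotary axes show why the singular region causes trouble. The sketch assumes a hypothetical A/C convention, n = (sin A cos C, sin A sin C, cos A), so A = acos(n_z) and C = atan2(n_y, n_x). Real machines differ. Near the vertical (A close to 0), C becomes ill-defined, and a tiny change in tool orientation between waypoints can demand a large C rotation.

```python
import math
from typing import List, Tuple

Orientation = Tuple[float, float, float]


def orientation_to_ac(n: Orientation) -> Tuple[float, float]:
    """Inverse kinematics under the assumed convention n = (sin A cos C, sin A sin C, cos A).

    Returns (A, C) in degrees. At A = 0 the C angle is undefined: the singularity.
    """
    nx, ny, nz = n
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    a = math.degrees(math.acos(max(-1.0, min(1.0, nz / length))))
    c = math.degrees(math.atan2(ny, nx))
    return a, c


def max_c_jump(orientations: List[Orientation]) -> float:
    """Largest C change (degrees, wrapped into [0, 180]) between consecutive waypoints."""
    cs = [orientation_to_ac(n)[1] for n in orientations]
    jumps = [abs((c2 - c1 + 180.0) % 360.0 - 180.0) for c1, c2 in zip(cs, cs[1:])]
    return max(jumps)
```

Sweeping a nearly vertical tool through a fraction of a degree of tilt yields C jumps of tens of degrees between neighbouring waypoints. The same small steps at roughly 30 degrees of tilt move C by well under a degree.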
Wenbo Wang, Xin Fang, Hantao Cui (2021)
The rapid deployment of distributed energy resources (DERs) in distribution networks has made it challenging to balance the system and stabilize frequency. DERs have the ability to provide frequency regulation; however, existing dynamic frequency simulation tools, which were developed mainly for the transmission system, lack the capability to simulate distribution network dynamics with high penetrations of DERs. Although electromagnetic transient (EMT) simulation tools can simulate distribution network dynamics, their computational efficiency limits their use for large-scale transmission-and-distribution (T&D) simulations. This paper presents an efficient T&D dynamic frequency co-simulation framework for DER frequency response based on the HELICS platform and existing off-the-shelf simulators. The challenge of synchronizing frequency between the transmission network and the DERs hosted in the distribution network is addressed by modeling DERs in detail in the frequency dynamic models while preserving the DER phasor models in the distribution networks. Thereby, local voltage constraints can be respected when dispatching DER power for frequency response. The DER frequency responses (primary and secondary) are simulated in case studies to validate the proposed framework. Lastly, a fault-induced delayed voltage recovery (FIDVR) event on a large system is presented to demonstrate the efficiency and effectiveness of the overall framework.
This letter investigates parallelism approaches for equation and Jacobian evaluations in large-scale power flow calculation. Two levels of parallelism are proposed and analyzed: inter-model parallelism, which evaluates models in parallel, and intra-model parallelism, which evaluates calculations within each model in parallel. Parallelism techniques such as multi-threading and single instruction, multiple data (SIMD) vectorization are discussed, implemented, and benchmarked as six calculation workflows. Case studies on the 70,000-bus synthetic grid show that equation evaluations can be accelerated tenfold, and the overall Newton power flow improves on the state of the art by 20%.
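The two levels of parallelism can be sketched with a structure-of-arrays layout, where each model (device type) stores one list per parameter. Intra-model work is a single pass over all elements of a model, which is the loop a SIMD kernel would vectorize. Inter-model work runs different models concurrently before their contributions are summed. The model classes and their equations are hypothetical simplifications. In CPython the thread pool only illustrates the structure and gives no real speedup, because the GIL serializes pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence


class Model(Protocol):
    def evaluate(self, v: List[float]) -> Dict[int, float]: ...


@dataclass
class ConstantPowerLoad:
    """Hypothetical model: all constant-power loads, one list per parameter (structure of arrays)."""
    bus: List[int]
    p: List[float]

    def evaluate(self, v: List[float]) -> Dict[int, float]:
        # Intra-model level: one pass over every element of this model,
        # the kind of loop a SIMD kernel would vectorize.
        out: Dict[int, float] = {}
        for b, p in zip(self.bus, self.p):
            out[b] = out.get(b, 0.0) + p
        return out


@dataclass
class ShuntConductance:
    """Hypothetical model: shunt conductances drawing g * V^2 at their buses."""
    bus: List[int]
    g: List[float]

    def evaluate(self, v: List[float]) -> Dict[int, float]:
        out: Dict[int, float] = {}
        for b, g in zip(self.bus, self.g):
            out[b] = out.get(b, 0.0) + g * v[b] ** 2
        return out


def assemble_injections(models: Sequence[Model], v: List[float], parallel: bool = True) -> List[float]:
    """Sum every model's per-bus contribution into one equation vector."""
    if parallel:
        # Inter-model level: different models are evaluated concurrently.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(lambda m: m.evaluate(v), models))
    else:
        parts = [m.evaluate(v) for m in models]
    total = [0.0] * len(v)
    # Serial scatter-add, so concurrently evaluated models never write to the same entry.
    for part in parts:
        for b, value in part.items():
            total[b] += value
    return total
```

Keeping the scatter-add serial means the parallel and sequential paths produce identical results, because `pool.map` returns the parts in model order.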