We discuss a new class of non-renormalization theorems in N=4 and N=2 super-Yang-Mills theories, obtained by using a superspace which makes a lower-dimensional subgroup of the full supersymmetry manifest. Certain Wilson loops (and Wilson lines) belong to the chiral ring of the lower-dimensional supersymmetry algebra, and their expectation values can be computed exactly.
We investigate the steady state of a system of photons in a pumped dye-filled microcavity. By varying pump and thermalization the system can be tuned between Bose-Einstein condensation, multimode condensation, and lasing. We present a rich non-equilibrium phase diagram which exhibits transitions between these phases, including decondensation of individual modes under conditions that would typically favor condensation.
Highlight detection has the potential to significantly ease video browsing, but existing methods often require expensive supervision, in which human viewers must manually identify highlights in training videos. We propose a scalable unsupervised solution that exploits video duration as an implicit supervision signal. Our key insight is that video segments from shorter user-generated videos are more likely to be highlights than those from longer videos, since users tend to be more selective about the content when capturing shorter videos. Leveraging this insight, we introduce a novel ranking framework that prefers segments from shorter videos, while properly accounting for the inherent noise in the (unlabeled) training data. We use it to train a highlight detector with 10M hashtagged Instagram videos. In experiments on two challenging public video highlight detection benchmarks, our method substantially improves the state of the art for unsupervised highlight detection.
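The duration-based ranking idea in the abstract above can be illustrated with a pairwise hinge loss that penalizes any segment from a longer video scoring within a margin of a segment from a shorter video. This is only a minimal sketch: the function name `hinge_ranking_loss`, the margin value, and the plain averaging over pairs are assumptions made for illustration, and the sketch omits the noise handling that the abstract mentions.

```python
def hinge_ranking_loss(short_scores, long_scores, margin=1.0):
    """Average pairwise hinge loss over (short, long) segment pairs.

    short_scores: predicted highlight scores for segments from shorter videos.
    long_scores:  predicted highlight scores for segments from longer videos.
    The loss is zero for a pair when the short-video segment outscores the
    long-video segment by at least `margin` (an assumed hyperparameter).
    """
    total = 0.0
    count = 0
    for s in short_scores:
        for l in long_scores:
            # Penalize pairs where the short-video segment is not ranked
            # sufficiently above the long-video segment.
            total += max(0.0, margin - s + l)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Hypothetical scores, not taken from the paper.
    print(hinge_ranking_loss([2.0, 1.5], [0.0, 0.2]))
```

Because the duration label is only a proxy for highlight quality, many pairs will be mislabeled; a practical system would need some mechanism to down-weight such pairs, which this sketch does not attempt.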
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To address this, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that convolutions, fully-connected (FC) layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences. Specifically, we propose a hierarchical Transformer in which pure multi-layer perceptrons (MLPs) encode rich local patterns in the early stages, while self-attention modules capture longer-range dependencies in deeper layers. Moreover, we propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at: https://github.com/MonashAI/LIT
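To make the quadratic-cost argument in the abstract above concrete, the following rough sketch counts multiply-accumulate operations for one single-head self-attention layer and one two-layer MLP applied to a sequence of patch tokens. The function names, the expansion ratio of 4, and the decision to ignore softmax, biases, and normalization are all assumptions made for illustration; they are not taken from the LIT code base.

```python
def attention_macs(num_tokens, dim):
    """Approximate multiply-accumulates for single-head self-attention."""
    # Query, key, value, and output projections: four (dim x dim) matmuls.
    projections = 4 * num_tokens * dim * dim
    # Attention scores (Q K^T) and the weighted sum over values (A V):
    # both grow with the square of the number of tokens.
    pairwise = 2 * num_tokens * num_tokens * dim
    return projections + pairwise


def mlp_macs(num_tokens, dim, expansion=4):
    """Approximate multiply-accumulates for a two-layer token-wise MLP."""
    # dim -> expansion*dim -> dim, applied to each token independently,
    # so the cost is linear in the number of tokens.
    return 2 * num_tokens * dim * (expansion * dim)


if __name__ == "__main__":
    # Hypothetical early-stage setting: a 56x56 grid of tokens, width 64.
    tokens, width = 56 * 56, 64
    print(attention_macs(tokens, width), mlp_macs(tokens, width))
```

Under these assumptions, doubling the number of tokens doubles the MLP cost but quadruples the pairwise term of attention, which is why attention-free early stages are attractive at high resolution.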
For general off-shell N=2 supergravity-matter systems in three spacetime dimensions, a formalism is developed to reduce the corresponding actions from superspace to components. The component actions are explicitly computed in the cases of Type I and Type II minimal supergravity formulations. We describe the models for topologically massive supergravity which correspond to all the known off-shell formulations for three-dimensional N=2 supergravity. We also present a universal setting to construct supersymmetric backgrounds associated with these off-shell supergravities.
Viewer responses to videos can help creators and streaming platforms analyze video performance and improve the future user experience. In this report, we present our method for the 2021 Evoked Expression from Videos Challenge. In particular, our model uses both audio and image modalities as inputs to predict the emotion changes of viewers. To model long-range emotion changes, we use a GRU-based model to predict a sparse signal at 1 Hz. We observe that the emotion changes are smooth. Therefore, the final dense prediction is obtained by linearly interpolating the sparse signal, which makes it robust to prediction fluctuations. Albeit simple, the proposed method achieves a Pearson correlation score of 0.04430 on the final private test set.
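The interpolation step described in the abstract above can be sketched as follows: a signal predicted at a low rate is linearly interpolated onto a denser time grid. The function name `upsample_linear` and the target rate used in the example are assumptions for illustration; the abstract states only that the sparse signal is predicted at 1 Hz.

```python
def upsample_linear(sparse, src_hz, dst_hz):
    """Linearly interpolate a signal sampled at src_hz onto a dst_hz grid.

    sparse: list of floats, one prediction per 1/src_hz seconds.
    Returns a denser list covering the same time span.
    """
    if not sparse:
        return []
    last = len(sparse) - 1
    n_out = int(last * dst_hz / src_hz) + 1
    dense = []
    for i in range(n_out):
        # Position of the i-th dense sample, in units of sparse samples.
        t = i * src_hz / dst_hz
        lo = min(int(t), last)
        hi = min(lo + 1, last)
        frac = t - lo
        # Weighted average of the two neighboring sparse predictions.
        dense.append((1 - frac) * sparse[lo] + frac * sparse[hi])
    return dense


if __name__ == "__main__":
    # Hypothetical 1 Hz predictions, upsampled to an assumed 2 Hz grid.
    print(upsample_linear([0.0, 1.0, 0.0], 1, 2))
```

Interpolating between sparse predictions acts as a smoothness prior: the dense output cannot fluctuate faster than the sparse rate, which matches the observation that the emotion changes are smooth.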