Temporal grounding aims to predict the time interval of a video clip corresponding to a natural language query. In this work, we present EVOQUER, a temporal grounding framework incorporating an existing text-to-video grounding model and a video-assisted query generation network. Given a query and an untrimmed video, the temporal grounding model predicts the target interval, and the predicted video clip is fed into a video translation task that generates a simplified version of the input query. EVOQUER forms a closed learning loop by combining the loss functions from both temporal grounding and query generation, with the latter serving as feedback. Our experiments on two widely used datasets, Charades-STA and ActivityNet, show that EVOQUER achieves promising improvements of 1.05 and 1.31 at [email protected]. We also discuss how the query generation task can facilitate error analysis by explaining temporal grounding model behavior.
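The closed-loop objective can be pictured as a weighted sum of the two losses. Below is a minimal PyTorch sketch, assuming hypothetical `grounder` and `query_decoder` modules and a weighting factor `alpha`; the paper's actual architecture and loss definitions may differ.

```python
import torch.nn as nn
import torch.nn.functional as F

class ClosedLoopGrounder(nn.Module):
    """Hypothetical wrapper: grounding loss plus query-generation feedback."""

    def __init__(self, grounder: nn.Module, query_decoder: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.grounder = grounder            # predicts (start, end) from video + query
        self.query_decoder = query_decoder  # decodes a simplified query from clip features
        self.alpha = alpha                  # weight of the feedback term

    def forward(self, video_feats, query_tokens, gt_interval, simple_query_ids):
        # Temporal grounding loss: distance of the predicted interval from ground truth.
        pred_interval, clip_feats = self.grounder(video_feats, query_tokens)
        ground_loss = F.smooth_l1_loss(pred_interval, gt_interval)

        # Query generation loss: token-level cross-entropy over the simplified query.
        logits = self.query_decoder(clip_feats)        # (T, vocab_size)
        gen_loss = F.cross_entropy(logits, simple_query_ids)

        # Closed loop: query generation serves as feedback to the grounder.
        return ground_loss + self.alpha * gen_loss
```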
112 - Lu Liu, Lili Wei, Wuqi Zhang 2021
Smart contracts are programs running on blockchains to execute transactions. When input constraints or security properties are violated at runtime, the transaction being executed by a smart contract needs to be reverted to avoid undesirable consequences. On Ethereum, the most popular blockchain that supports smart contracts, developers can choose among three transaction-reverting statements (i.e., require, if...revert, and if...throw) to handle anomalous transactions. While these transaction-reverting statements are vital for preventing smart contracts from exhibiting abnormal behaviors or suffering malicious attacks, there is limited understanding of how they are used in practice. In this work, we perform the first empirical study to characterize transaction-reverting statements in Ethereum smart contracts. We measured the prevalence of these statements in 3,866 verified smart contracts from popular dapps and built a taxonomy of their purposes by manually analyzing 557 transaction-reverting statements. We also compared template contracts and their corresponding custom contracts to understand how developers customize the use of transaction-reverting statements. Finally, we analyzed the security impact of transaction-reverting statements by removing them from smart contracts and comparing the mutated contracts against the original ones. Our study led to important findings that can shed light on further research in the broad area of smart contract quality assurance and provide practical guidance to smart contract developers on the appropriate use of transaction-reverting statements.
556 - Ke Yang, Guangyu Wang, Lu Liu 2021
Two-dimensional (2D) ferromagnets have recently drawn extensive attention, and here we study the electronic structure and magnetic properties of bulk and monolayer CrSBr, using first-principles calculations and Monte Carlo simulations. Our results show that bulk CrSBr is a magnetic semiconductor with an easy magnetization axis along b, a hard axis along c, and an intermediate axis along a. The experimental triaxial magnetic anisotropy (MA) is thus well reproduced here, and we identify it as the joint effect of spin-orbit coupling (SOC) and the magnetic dipole-dipole interaction. We find that bulk CrSBr has a strong ferromagnetic (FM) intralayer coupling but a marginal interlayer one. We also study the CrSBr monolayer in detail and find that the intralayer FM exchange persists and that the shape anisotropy makes a more pronounced contribution to the MA. Using the parameters of the FM exchange and the triaxial MA, our Monte Carlo simulations show that the CrSBr monolayer has a Curie temperature Tc = 175 K. Moreover, we find that a uniaxial tensile (compressive) strain along the a (b) axis would further increase Tc.
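To make the Monte Carlo step concrete, here is a minimal Metropolis sketch for locating the temperature where the magnetization of a 2D ferromagnet collapses. It uses a square-lattice Ising model with a single exchange constant J as a stand-in; the study above simulates Heisenberg spins with triaxial anisotropy and material-specific exchange parameters, so every value below (J, lattice size, sweep count) is illustrative only.

```python
import numpy as np

def mc_magnetization(T, L=16, J=1.0, sweeps=500, seed=0):
    """|magnetization| of an L x L Ising lattice after Metropolis sweeps at T."""
    rng = np.random.default_rng(seed)
    spins = np.ones((L, L))
    for _ in range(sweeps):
        for _ in range(L * L):
            i, j = rng.integers(L, size=2)
            # Energy cost of flipping spin (i, j), with periodic boundaries.
            nn_sum = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * J * spins[i, j] * nn_sum
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                spins[i, j] *= -1
    return abs(spins.mean())

# Magnetization collapses near the transition (Tc ~ 2.27 J for the 2D Ising model).
for T in (1.5, 2.0, 2.3, 3.0):
    print(f"T = {T:.2f}  |m| = {mc_magnetization(T):.3f}")
```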
Multi-head attention plays a crucial role in the recent success of Transformer models and leads to consistent performance improvements over conventional attention in various applications. The popular belief is that this effectiveness stems from the ability to jointly attend to multiple positions. In this paper, we first demonstrate that jointly attending to multiple positions is not a unique feature of multi-head attention, as multi-layer single-head attention also attends to multiple positions and is more effective. We then suggest that the main advantage of multi-head attention is training stability, since it needs fewer layers than single-head attention to attend to the same number of positions. For example, a 24-layer 16-head Transformer (BERT-large) and a 384-layer single-head Transformer have the same total number of attention heads and roughly the same model size, while the multi-head one is significantly shallower. Meanwhile, we show that, with recent advances in deep learning, we can successfully stabilize the training of the 384-layer Transformer. As training difficulty is no longer a bottleneck, the substantially deeper single-head Transformer achieves consistent performance improvements without tuning hyper-parameters.
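The head-count bookkeeping behind the example is easy to verify. The sketch below checks it, assuming the standard BERT-large hidden size of 1024 with per-head dimension hidden/heads; keeping the single-head variant at the same per-head width is an assumption made to keep the model sizes comparable.

```python
# Total attention heads: layers x heads per layer.
def total_heads(layers: int, heads: int) -> int:
    return layers * heads

bert_large = dict(layers=24, heads=16)   # multi-head baseline
single_head = dict(layers=384, heads=1)  # deep single-head variant

assert total_heads(**bert_large) == total_heads(**single_head) == 384

# With BERT-large's hidden size of 1024, each of the 16 heads has dimension 64,
# so the single-head model attends with the same head width but 16x the depth.
hidden = 1024
print("per-head dim:", hidden // bert_large["heads"])                   # 64
print("depth ratio :", single_head["layers"] // bert_large["layers"])   # 16
```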
83 - Jialu Liu, Tianqi Liu, Cong Yu 2021
Effectively modeling text-rich fresh content such as news articles at the document level is a challenging problem. To ensure that a content-based model generalizes well to a broad range of applications, it is critical to have a training dataset that is large beyond the scale of human labels while achieving the desired quality. In this work, we address these two challenges by proposing a novel approach to mine semantically relevant fresh documents, and their topic labels, with little human supervision. Meanwhile, we design a multitask model called NewsEmbed that alternately trains a contrastive learning objective and a multi-label classification objective to derive a universal document encoder. We show that the proposed approach can provide billions of high-quality organic training examples and can be naturally extended to a multilingual setting where texts in different languages are encoded in the same semantic space. We experimentally demonstrate NewsEmbed's competitive performance across multiple natural language understanding tasks, both supervised and unsupervised.
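A minimal sketch of the alternating multitask recipe, assuming a shared `encoder`, a `topic_head` for multi-label classification, and an in-batch InfoNCE contrastive loss; the module names, temperature, and batch format are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, tau=0.05):
    """In-batch InfoNCE: row i of z_a should match row i of z_b."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

def train_step(encoder, topic_head, optimizer, batch):
    """One alternating step: contrastive or multi-label, sharing the encoder."""
    optimizer.zero_grad()
    if batch["task"] == "contrastive":
        loss = contrastive_loss(encoder(batch["doc_a"]), encoder(batch["doc_b"]))
    else:  # multi-label topic classification
        logits = topic_head(encoder(batch["doc"]))
        loss = F.binary_cross_entropy_with_logits(logits, batch["topic_labels"])
    loss.backward()
    optimizer.step()
    return loss.item()
```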
Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performance on many NLP tasks. While effective, these pre-training methods typically demand massive computation resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token is replaced by a generator. However, this new task, as a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing the bottom layers of the generator and the discriminator. Extensive experiments on the GLUE and SQuAD datasets demonstrate both the effectiveness and the efficiency of our proposed method.
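A sketch of the combined discriminator objective, assuming hypothetical `rtd_head` and `select_head` modules and a precomputed candidate set per position; the tensor shapes, candidate-set construction, and equal loss weighting are illustrative assumptions.

```python
import torch.nn.functional as F

def discriminator_loss(hidden, rtd_head, select_head,
                       is_replaced, cand_embeds, cand_target):
    # hidden:      (B, T, H)    discriminator outputs over the corrupted input
    # is_replaced: (B, T)       1 where the generator replaced the token
    # cand_embeds: (B, T, K, H) embeddings of K candidate tokens per position
    # cand_target: (B, T)       index of the original token among the K candidates

    # Task 1: replaced token detection (binary, per token).
    rtd_logits = rtd_head(hidden).squeeze(-1)   # (B, T)
    rtd_loss = F.binary_cross_entropy_with_logits(rtd_logits, is_replaced.float())

    # Task 2: select the original token from the candidate set.
    q = select_head(hidden).unsqueeze(2)        # (B, T, 1, H)
    select_logits = (q * cand_embeds).sum(-1)   # (B, T, K)
    select_loss = F.cross_entropy(select_logits.flatten(0, 1), cand_target.flatten())

    return rtd_loss + select_loss
```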
111 - Yue Tan, Guodong Long, Lu Liu 2021
The heterogeneity across devices usually hinders the optimization convergence and generalization performance of federated learning (FL) when the aggregation of devices' knowledge occurs in the gradient space. For example, devices may differ in terms of data distribution, network latency, input/output space, and/or model architecture, which can easily lead to the misalignment of their local gradients. To improve the tolerance to heterogeneity, we propose a novel federated prototype learning (FedProto) framework in which the devices and server communicate class prototypes instead of gradients. FedProto aggregates the local prototypes collected from different devices and then sends the global prototypes back to all devices to regularize the training of local models. The training on each device aims to minimize the classification error on the local data while keeping the resulting local prototypes sufficiently close to the corresponding global ones. We also propose a benchmark setting tailored for heterogeneous FL, in which FedProto outperforms several recent FL approaches on multiple datasets.
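The prototype exchange is simple to sketch. Below, clients compute per-class feature means, the server averages them per class, and each client adds a pull toward the global prototypes to its classification loss; the unweighted server average and the regularization weight `lam` are assumptions (the actual aggregation may weight clients by sample count).

```python
import torch
from collections import defaultdict

def local_prototypes(features, labels):
    """Per-class mean embedding on one client. features: (N, D), labels: (N,)."""
    return {int(c): features[labels == c].mean(dim=0) for c in labels.unique()}

def aggregate_prototypes(client_protos):
    """Server side: average each class prototype over clients that saw the class."""
    sums, counts = defaultdict(lambda: 0), defaultdict(int)
    for protos in client_protos:
        for c, p in protos.items():
            sums[c] = sums[c] + p
            counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}

def local_loss(ce_loss, features, labels, global_protos, lam=1.0):
    """Classification loss plus a pull toward the corresponding global prototypes."""
    reg = torch.stack([
        torch.norm(features[i] - global_protos[int(labels[i])]) ** 2
        for i in range(len(labels))
    ]).mean()
    return ce_loss + lam * reg
```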
97 - Hang Du, Hailin Shi, Yinglu Liu 2021
Near-infrared to visible (NIR-VIS) face recognition is the most common case of heterogeneous face recognition, which aims to match a pair of face images captured in two different modalities. Existing deep learning based methods have made remarkable progress in NIR-VIS face recognition, but the task encounters newly emerged difficulties during the COVID-19 pandemic, since people are supposed to wear facial masks to cut off the spread of the virus. We define this task as NIR-VIS masked face recognition and find the masked face in the NIR probe image particularly problematic. First, the lack of masked face data is a challenging issue for network training. Second, most of the facial parts (cheeks, mouth, nose, etc.) are fully occluded by the mask, which causes a substantial loss of information. Third, the domain gap still exists in the remaining facial parts. In this scenario, existing methods suffer significant performance degradation from the above issues. In this paper, we aim to address the challenge of NIR-VIS masked face recognition from the perspectives of training data and training method. Specifically, we propose a novel heterogeneous training method to maximize the mutual information shared by the face representations of the two domains with the help of semi-siamese networks. In addition, a 3D face reconstruction based approach is employed to synthesize masked faces from existing NIR images. Resorting to these practices, our solution provides a domain-invariant face representation that is also robust to mask occlusion. Extensive experiments on three NIR-VIS face datasets demonstrate the effectiveness and cross-dataset generalization capacity of our method.
55 - Longqi Wang, Jing Jin, Lu Liu 2021
In pulsar astronomy, detecting effective pulsar signals among numerous pulsar candidates is an important research topic. Starting from space-based X-ray pulsar signals, we propose the two-dimensional autocorrelation profile map (2D-APM) feature modelling method, which applies epoch folding to the autocorrelation function of X-ray signals and expands the time-domain information along a periodic axis. A uniform criterion for setting the time resolution of the periodic axis handles pulsar signals without any prior information. Compared with the traditional profile, the model has stronger noise robustness, richer information, and consistent characteristics. The new feature is simulated with double Gaussian components, and the characteristic distribution of the model is shown to be closely related to the distance between the double peaks of the profile. Next, a deep convolutional neural network (DCNN), named Inception-ResNet, is built. According to the order of the peak separation and the number of arriving photons, 30 data sets based on the Poisson process are simulated to construct the training set, and the observation data of PSRs B0531+21, B0540-69, and B1509-58 from the Rossi X-ray Timing Explorer (RXTE) are selected to generate the test set. The training and test sets contain 30,000 and 5,400 samples, respectively. After achieving convergence stability, more than 99% of the pulsar signals are recognized and more than 99% of the interference is successfully rejected, which verifies the high degree of agreement between the network and the feature model and the high potential of the proposed method for pulsar searches.
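A minimal sketch of the 2D-APM construction as described: compute the autocorrelation of a binned photon time series, then epoch-fold it over a grid of trial periods to obtain a (trial period x phase) map. The FFT-based autocorrelation, bin width, period grid, and phase-bin count are illustrative assumptions; the noise-only Poisson data at the end merely demonstrates the shapes involved.

```python
import numpy as np

def autocorrelation(counts):
    """Normalized autocorrelation of a binned count series (FFT-based)."""
    c = counts - counts.mean()
    f = np.fft.rfft(c, 2 * len(c))
    acf = np.fft.irfft(f * np.conj(f))[:len(c)]
    return acf / acf[0]

def fold(series, dt, period, n_phase=64):
    """Epoch-fold a lag series at a trial period into n_phase phase bins."""
    t = np.arange(len(series)) * dt
    bins = (((t % period) / period) * n_phase).astype(int) % n_phase
    sums = np.bincount(bins, weights=series, minlength=n_phase)
    hits = np.bincount(bins, minlength=n_phase)
    return sums / np.maximum(hits, 1)

def apm_2d(counts, dt, trial_periods, n_phase=64):
    """Stack folded autocorrelation profiles into a (period x phase) map."""
    acf = autocorrelation(counts)
    return np.stack([fold(acf, dt, p, n_phase) for p in trial_periods])

# Shape check on noise-only Poisson counts, trial periods around 33 ms (Crab-like).
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=50_000).astype(float)
m = apm_2d(counts, dt=1e-3, trial_periods=np.linspace(0.030, 0.036, 16))
print(m.shape)  # (16, 64)
```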
Hot streaks dominate the main impact of creative careers. Despite their ubiquity across a wide range of creative domains, it remains unclear whether there is any regularity underlying the beginning of hot streaks. Here, we develop computational methods using deep learning and network science and apply them to novel, large-scale datasets tracing the career outputs of artists, film directors, and scientists, allowing us to build high-dimensional representations of the artworks, films, and scientific publications they produce. By examining individuals' career trajectories within the underlying creative space, we find that across all three domains, individuals tend to explore diverse styles or topics before their hot streak but become notably more focused in what they work on after the hot streak begins. Crucially, we find that hot streaks are associated with neither exploration nor exploitation behavior in isolation, but with a particular sequence of exploration followed by exploitation, where the transition from exploration to exploitation closely traces the onset of a hot streak. Overall, these results unveil one of the first identifiable regularities underlying the onset of hot streaks, which appears universal across diverse creative domains. They suggest that a sequential view of creative strategies, balancing experimentation and implementation, may be particularly powerful for producing long-lasting contributions, with broad implications for identifying and nurturing creative talent.