Collecting annotated data for semantic segmentation is time-consuming and hard to scale up. In this paper, we propose, for the first time, a unified framework, termed Multi-Dataset Pretraining (MDP), to take full advantage of the fragmented annotations of different datasets. The highlight is that annotations from different domains can be efficiently reused and consistently boost performance for each specific domain. This is achieved by first pretraining the network with the proposed pixel-to-prototype contrastive loss over multiple datasets, regardless of their taxonomies, and then fine-tuning the pretrained model on each specific dataset as usual. To better model the relationships among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing and propose a pixel-to-class sparse coding strategy that explicitly models pixel-class similarity over the manifold embedding space. In this way, we increase intra-class compactness and inter-class separability, while also capturing inter-class similarity across datasets for better transferability. Experiments conducted on several benchmarks demonstrate its superior performance. Notably, MDP consistently outperforms ImageNet-pretrained models by a considerable margin while using less than 10% of the samples for pretraining.
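The abstract above describes the pretraining objective only at a high level; as a rough, non-authoritative sketch of what a pixel-to-prototype contrastive loss of this kind could look like, the PyTorch snippet below contrasts sampled pixel embeddings against per-class prototype embeddings. The function name, the temperature value, and the way prototypes are obtained are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(pixel_emb, pixel_labels, prototypes, temperature=0.1):
    """Hypothetical InfoNCE-style pixel-to-prototype contrastive loss.

    pixel_emb:    (N, D) pixel embeddings sampled from the feature maps
    pixel_labels: (N,)   class index of each sampled pixel
    prototypes:   (C, D) one prototype per class, e.g. a running mean of pixel embeddings
    """
    pixel_emb = F.normalize(pixel_emb, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    # Cosine similarity between every pixel and every class prototype.
    logits = pixel_emb @ prototypes.t() / temperature  # (N, C)
    # Pull each pixel toward its own class prototype and push it away from the others,
    # which increases intra-class compactness and inter-class separability.
    return F.cross_entropy(logits, pixel_labels)
```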
Fingerspelling, in which words are signed letter by letter, is an important component of American Sign Language. Most previous work on automatic fingerspelling recognition has assumed that the boundaries of fingerspelling regions in signing videos are known beforehand. In this paper, we consider the task of fingerspelling detection in raw, untrimmed sign language videos. This is an important step towards building real-world fingerspelling recognition systems. We propose a benchmark and a suite of evaluation metrics, some of which reflect the effect of detection on the downstream fingerspelling recognition task. In addition, we propose a new model that learns to detect fingerspelling via multi-task training, incorporating pose estimation and fingerspelling recognition (transcription) along with detection, and compare this model to several alternatives. The model outperforms all alternative approaches across all metrics, establishing a state of the art on the benchmark.
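As a minimal sketch of the multi-task layout described in this abstract (a shared encoder with detection, pose-estimation, and recognition heads), one could wire the branches as below. The encoder, head shapes, and class counts are hypothetical placeholders, not the benchmark model.

```python
import torch.nn as nn

class FingerspellingMultiTaskModel(nn.Module):
    """Illustrative multi-task head layout: a shared video encoder feeding a
    per-frame detection head plus auxiliary pose and letter-recognition heads.
    All names and dimensions here are assumptions for illustration."""
    def __init__(self, encoder, feat_dim, vocab_size, num_keypoints):
        super().__init__()
        self.encoder = encoder                                   # shared per-frame features, (T, feat_dim)
        self.det_head = nn.Linear(feat_dim, 2)                   # fingerspelling vs. background
        self.pose_head = nn.Linear(feat_dim, num_keypoints * 2)  # auxiliary (x, y) keypoints
        self.rec_head = nn.Linear(feat_dim, vocab_size)          # letter posteriors for transcription

    def forward(self, frames):
        feats = self.encoder(frames)
        return self.det_head(feats), self.pose_head(feats), self.rec_head(feats)
```

At training time the three heads would be combined with a weighted sum of their losses; the weights are a tuning choice and are not specified here.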
142 - Bowen Shi, Xin Dai, Yuan-Ming Lu (2020)
We study the entanglement behavior of a random unitary circuit punctuated by projective measurements at the measurement-driven phase transition in one spatial dimension. We numerically study the logarithmic entanglement negativity of two disjoint intervals and find that it scales as a power of the cross-ratio. We investigate two systems: (1) Clifford circuits with projective measurements, and (2) a Haar-random local unitary circuit with projective measurements. Remarkably, we identify a power-law behavior of the entanglement negativity at the critical point. Previous results on entanglement entropy and mutual information point to an emergent conformal invariance of the measurement-driven transition. Our result suggests that the critical behavior of the measurement-driven transition is distinct from the ground-state behavior of any unitary conformal field theory.
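For reference, the cross-ratio of two disjoint intervals is conventionally defined as below; the power-law form is only a schematic of the reported scaling, with the exponent left unspecified.

```latex
% Standard cross-ratio of two disjoint intervals A=[x_1,x_2] and B=[x_3,x_4];
% the negativity scaling is shown schematically with an unspecified exponent.
\eta = \frac{x_{12}\, x_{34}}{x_{13}\, x_{24}}, \qquad x_{ij} = |x_i - x_j|,
\qquad \mathcal{E}(A:B) \sim \eta^{\alpha}
```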
84 - Bowen Shi, Isaac H. Kim (2020)
We study the ground-state entanglement of gapped domain walls between topologically ordered systems in two spatial dimensions. We derive a universal correction to the ground-state entanglement entropy, which is equal to the logarithm of the total quantum dimension of a set of superselection sectors localized on the domain wall. This expression is derived from the recently proposed entanglement bootstrap method.
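Schematically (using conventional notation rather than an expression quoted from the paper), the kind of correction described is a constant subtraction from the area-law term, with the total quantum dimension defined from the quantum dimensions d_a of the wall superselection sectors:

```latex
% Schematic form of the entanglement entropy of a region A crossing the wall:
% an area-law term minus a universal constant set by the total quantum dimension
% of the relevant set of domain-wall superselection sectors (conventional definition).
S(A) = \alpha\,|\partial A| - \ln \mathcal{D}_{\mathrm{wall}} + \cdots,
\qquad \mathcal{D}_{\mathrm{wall}} = \sqrt{\textstyle\sum_a d_a^{2}}
```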
149 - Bowen Shi, Isaac H. Kim (2020)
We develop a theory of gapped domain walls between topologically ordered systems in two spatial dimensions. We find a new type of superselection sector -- referred to as the parton sector -- that subdivides the known superselection sectors localized on gapped domain walls. Moreover, we introduce and study the properties of composite superselection sectors that are made out of the parton sectors. We explain a systematic method to define these sectors, their fusion spaces, and their fusion rules, by deriving nontrivial identities relating their quantum dimensions and fusion multiplicities. We propose a set of axioms regarding the ground-state entanglement entropy of systems that can host gapped domain walls, generalizing the bulk axioms proposed in [B. Shi, K. Kato, and I. H. Kim, Ann. Phys. 418, 168164 (2020)]. Similar to our analysis in the bulk, we derive our main results by examining the self-consistency relations of an object called the information convex set. As an application, we define an analog of topological entanglement entropy for gapped domain walls and derive its exact expression.
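The identities relating quantum dimensions and fusion multiplicities mentioned above typically take the conventional form below (shown for orientation; the paper's precise identities for parton and composite sectors are not reproduced here):

```latex
% Conventional relation between quantum dimensions d_a and fusion multiplicities N_{ab}^c,
% with a, b, c ranging over the relevant (parton or composite) wall sectors.
d_a\, d_b = \sum_{c} N_{ab}^{\,c}\, d_c
```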
This paper proposes a network architecture mainly designed for audio tagging, which can also be used for weakly supervised acoustic event detection (AED). The proposed network consists of a modified DenseNet as the feature extractor and a global average pooling (GAP) layer to predict frame-level labels at inference time. This architecture is inspired by the work of Zhou et al., a well-known framework that uses GAP to localize visual objects given image-level labels. While most previous work on weakly supervised AED used recurrent layers with attention-based mechanisms to localize acoustic events, the proposed network directly localizes events using the feature map extracted by DenseNet, without any recurrent layers. In the audio tagging task of DCASE 2017, our method significantly outperforms the state-of-the-art method in F1 score by an absolute 5.3% on the dev set and 6.0% on the eval set. For the weakly supervised AED task in DCASE 2018, our model outperforms the state-of-the-art method in event-based F1 by an absolute 8.1% on the dev set and 0.5% on the eval set, using data augmentation and tri-training to leverage unlabeled data.
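The CAM-style mechanism described above (clip-level training through global average pooling, with the pre-pooling feature map reused for frame-level predictions) can be sketched as follows. The backbone interface, tensor shapes, and module names are assumptions, not the authors' exact network.

```python
import torch.nn as nn

class GAPTagger(nn.Module):
    """Illustrative audio-tagging head: a convolutional feature extractor, a 1x1
    classification layer applied along time, and global average pooling for
    clip-level tags. At inference, the pre-pooling map gives frame-level scores."""
    def __init__(self, backbone, feat_channels, num_classes):
        super().__init__()
        self.backbone = backbone                              # e.g. a DenseNet-style trunk
        self.classifier = nn.Conv1d(feat_channels, num_classes, kernel_size=1)

    def forward(self, spectrogram):
        feats = self.backbone(spectrogram)                    # assumed shape (B, C, T)
        frame_logits = self.classifier(feats)                 # (B, num_classes, T) per-frame scores
        clip_logits = frame_logits.mean(dim=-1)               # GAP over time -> clip-level tags
        return clip_logits, frame_logits
```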
Segmental models are sequence prediction models in which scores of hypotheses are based on entire variable-length segments of frames. We consider segmental models for whole-word (acoustic-to-word) speech recognition, with the feature vectors defined using vector embeddings of segments. Such models are computationally challenging as the number of paths is proportional to the vocabulary size, which can be orders of magnitude larger than when using subword units like phones. We describe an efficient approach for end-to-end whole-word segmental models, with forward-backward and Viterbi decoding performed on a GPU and a simple segment scoring function that reduces space complexity. In addition, we investigate the use of pre-training via jointly trained acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) of written word labels. We find that word error rate can be reduced by a large margin by pre-training the acoustic segment representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs. Our final models improve over prior A2W models.
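A minimal sketch of the kind of segment scoring such a whole-word segmental model relies on, assuming a hypothetical acoustic segment embedder and a matrix of written-word embeddings (AWE/AGWE-style), is given below; the real model scores and decodes these hypotheses with GPU forward-backward and Viterbi rather than an explicit Python loop.

```python
import torch

def score_segments(frames, word_emb, embed_segment, max_dur):
    """Score every (start, duration) segment against every word as a dot product
    between an acoustic segment embedding and a word embedding.
    `embed_segment` and `word_emb` are assumed components, not the paper's code."""
    T = frames.size(0)
    scores = {}
    for start in range(T):
        for dur in range(1, min(max_dur, T - start) + 1):
            seg_vec = embed_segment(frames[start:start + dur])  # (D,) acoustic segment embedding
            scores[(start, dur)] = seg_vec @ word_emb.t()       # (V,) one score per vocabulary word
    return scores
```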
Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution. While extensive research has focused on functional architectures for representing words and sentences, there is less work on representing arbitrary spans of text within sentences. In this paper, we conduct a comprehensive empirical evaluation of six span representation methods using eight pretrained language representation models across six tasks, including two tasks that we introduce. We find that, although some simple span representations are fairly reliable across tasks, in general the optimal span representation varies by task, and can also vary within different facets of individual tasks. We also find that the choice of span representation has a bigger impact with a fixed pretrained encoder than with a fine-tuned encoder.
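For concreteness, two simple span representations of the kind commonly compared in this line of work, endpoint concatenation and mean pooling over the span, look like the sketch below; which six methods the paper actually evaluates is not restated here, and these helper names are illustrative.

```python
import torch

def endpoint_span(token_emb, start, end):
    """Concatenate the contextual embeddings of the span's first and last tokens (inclusive)."""
    return torch.cat([token_emb[start], token_emb[end]], dim=-1)

def mean_pool_span(token_emb, start, end):
    """Average the contextual embeddings over the span (inclusive boundaries)."""
    return token_emb[start:end + 1].mean(dim=0)
```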
61 - Bowen Shi, Ke Xu, Jichang Zhao (2020)
The boom in social media, in which information is produced and consumed simultaneously, implies the crucial role of online user influence in determining content popularity. In particular, understanding behavior variations between the influential elites and the mass grassroots is an important issue in communication. However, how their behavior varies across user categories and content domains, and how these differences influence content popularity, are rarely addressed. From a novel view of seven content domains, a detailed picture of behavior variations among five user groups, from the views of both elites and the mass, is drawn in Weibo, one of the most popular Twitter-like services in China. Interestingly, elites post more diverse content with video links, while the mass possess retweeters of higher loyalty. Based on these variations, user-oriented actions for enhancing content popularity are discussed and empirically tested. The most surprising finding is that content diversity does not always bring more retweets, and that the mass and elites should promote content popularity by increasing their retweeter counts and retweeter loyalty, respectively. Our results demonstrate, for the first time, the possibility of highly individualized popularity-promotion strategies in social media, instead of a universal principle.
As a mixed result of intensive dependence on third-party libraries, a flexible mechanism for declaring dependencies, and an increased number of modules in a project, multip