We study the vision transformer architecture at the mobile level in this paper and find a dramatic performance drop. We analyze the reason behind this phenomenon and propose a novel irregular patch embedding module and an adaptive patch fusion module to improve performance. We conjecture that vision transformer blocks (which consist of multi-head attention and a feed-forward network) are better suited to handling high-level information than low-level features. The irregular patch embedding module extracts patches that contain rich high-level information with different receptive fields. The transformer blocks can then obtain the most useful information from these irregular patches. The processed patches finally pass through the adaptive patch fusion module to produce the features for the classifier. With our proposed improvements, the traditional uniform vision transformer structure achieves state-of-the-art results at the mobile level. We improve the DeiT baseline by more than 9% under mobile-level settings and surpass other transformer architectures such as Swin and CoaT by a large margin.
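As a concrete illustration of extracting patches with different receptive fields before the transformer blocks, here is a minimal PyTorch sketch; the branch kernel sizes, module name, and dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class IrregularPatchEmbedding(nn.Module):
    """Illustrative patch embedding with branches of different receptive fields.

    Each branch uses a different kernel size, so the resulting tokens mix
    information over several scales before entering the transformer blocks.
    """
    def __init__(self, in_chans=3, embed_dim=192, stride=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_chans, embed_dim, kernel_size=k, stride=stride, padding=k // 2)
            for k in (16, 24, 32)  # hypothetical receptive fields
        ])

    def forward(self, x):
        tokens = []
        for branch in self.branches:
            feat = branch(x)                                # (B, C, H', W')
            tokens.append(feat.flatten(2).transpose(1, 2))  # (B, N, C)
        return torch.cat(tokens, dim=1)                     # tokens from all branches

x = torch.randn(1, 3, 224, 224)
print(IrregularPatchEmbedding()(x).shape)   # torch.Size([1, 675, 192])
```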
Aiming to fully exploit the rich information of topological structure and node features in attributed graphs, we introduce a self-supervised learning mechanism into graph representation learning and propose a novel Self-supervised Consensus Representation Learning (SCRL) framework. In contrast to most existing works that explore only one graph, our proposed SCRL method treats the graph from two perspectives: the topology graph and the feature graph. We argue that their embeddings should share some common information, which can serve as a supervisory signal. Specifically, we construct the feature graph from node features via the k-nearest neighbor algorithm. Graph convolutional network (GCN) encoders then extract features from the two graphs respectively. A self-supervised loss is designed to maximize the agreement between the embeddings of the same node in the topology graph and the feature graph. Extensive experiments on real citation networks and social networks demonstrate the superiority of our proposed SCRL over state-of-the-art methods on the semi-supervised node classification task. Meanwhile, compared with its main competitors, SCRL is rather efficient.
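A minimal sketch of the pipeline described above, assuming a small dense toy graph: the feature graph is built with k-nearest neighbors, two GCN encoders embed the topology graph and the feature graph, and a self-supervised loss rewards agreement between the two embeddings of each node. The sizes and the cosine-agreement loss are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
from sklearn.neighbors import kneighbors_graph

def normalized_adj(A):
    """Symmetrically normalize an adjacency matrix with added self-loops."""
    A = A + torch.eye(A.size(0))
    d_inv_sqrt = A.sum(1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = torch.nn.Linear(in_dim, hid_dim)
        self.w2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, A_hat, X):
        h = F.relu(A_hat @ self.w1(X))
        return A_hat @ self.w2(h)

# Toy data: node features and a random symmetric topology graph (assumed given).
X = torch.randn(100, 32)
A_topo = (torch.rand(100, 100) < 0.05).float()
A_topo = ((A_topo + A_topo.t()) > 0).float()

# Feature graph built from the k-nearest neighbors of the node features.
A_feat = torch.tensor(kneighbors_graph(X.numpy(), n_neighbors=5).toarray(),
                      dtype=torch.float32)

enc_topo, enc_feat = GCNEncoder(32, 64, 16), GCNEncoder(32, 64, 16)
z_topo = enc_topo(normalized_adj(A_topo), X)
z_feat = enc_feat(normalized_adj(A_feat), X)

# Self-supervised agreement: the two embeddings of each node should align.
loss = 1 - F.cosine_similarity(z_topo, z_feat, dim=1).mean()
print(loss.item())
```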
In this paper, we propose Parametric Contrastive Learning (PaCo) to tackle long-tailed recognition. Based on theoretical analysis, we observe that the supervised contrastive loss tends to be biased toward high-frequency classes and thus increases the difficulty of imbalanced learning. We introduce a set of parametric, class-wise learnable centers to rebalance from an optimization perspective. Further, we analyze our PaCo loss under a balanced setting. Our analysis demonstrates that PaCo can adaptively enhance the intensity of pushing samples of the same class closer as more samples are pulled together with their corresponding centers, benefiting hard example learning. Experiments on long-tailed CIFAR, ImageNet, Places, and iNaturalist 2018 establish a new state of the art for long-tailed recognition. On full ImageNet, models trained with the PaCo loss surpass supervised contrastive learning across various ResNet backbones; e.g., our ResNet-200 achieves 81.8% top-1 accuracy. Our code is available at https://github.com/dvlab-research/Parametric-Contrastive-Learning.
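To make the parametric class-wise centers concrete, here is a simplified PyTorch sketch of a contrastive loss in which learnable centers act as additional keys and as positives for their own class; the actual PaCo loss includes further weighting terms, so treat this purely as an illustration.

```python
import torch
import torch.nn.functional as F

class PaCoStyleLoss(torch.nn.Module):
    """Simplified contrastive loss with learnable class-wise centers.

    All batch samples and all centers serve as keys; positives for a query are
    the same-class samples plus the query's own class center.
    """
    def __init__(self, num_classes, feat_dim, temperature=0.07):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
        self.t = temperature

    def forward(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        centers = F.normalize(self.centers, dim=1)
        keys = torch.cat([feats, centers], dim=0)                       # (B + K, D)
        key_labels = torch.cat([labels, torch.arange(centers.size(0))])
        logits = feats @ keys.t() / self.t                              # (B, B + K)
        self_mask = torch.eye(feats.size(0), keys.size(0), dtype=torch.bool)
        logits = logits.masked_fill(self_mask, -1e9)                    # drop self-pairs
        pos_mask = (labels.unsqueeze(1) == key_labels.unsqueeze(0)) & ~self_mask
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        pos_log_prob = (log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
        return -pos_log_prob.mean()

feats = torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
print(PaCoStyleLoss(num_classes=10, feat_dim=128)(feats, labels))
```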
Weishu Liu (2021)
Using publications from the Web of Science Core Collection (WoSCC), Fosso Wamba and his colleagues published an interesting and comprehensive paper in Technological Forecasting and Social Change to explore the structure and dynamics of artificial intelligence (AI) scholarship. The data presented in Fosso Wamba's study implied that the year 1991 was a watershed for AI research. This research note tries to explain the 1991 phenomenon from the perspective of database limitations by empirically probing the limitations of searching the abstract, author keywords, and keywords plus fields of WoSCC. The low availability rates of abstract, author keywords, and keywords plus information in WoSCC found in this study can explain the 1991 watershed in AI scholarship to a large extent. Some other caveats for using WoSCC in the retrieval of older literature and in historical bibliometric analysis are also mentioned in the discussion section. This research note complements the study by Fosso Wamba and his colleagues and helps avoid improper interpretation when using WoSCC for older literature retrieval and historical bibliometric analysis.
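A small sketch of the kind of field-availability probe described here, assuming a hypothetical export of WoSCC records with illustrative column names: the availability rate per publication year is simply the share of records whose abstract or keyword field is non-empty.

```python
import pandas as pd

# Hypothetical export of WoSCC records; column names and values are illustrative.
records = pd.DataFrame({
    "year": [1988, 1989, 1990, 1991, 1992, 1993],
    "abstract": ["", "", "", "text", "text", "text"],
    "author_keywords": ["", "", "kw", "kw", "kw", "kw"],
})

# Availability rate per year: share of records whose field is non-empty.
availability = records.drop(columns="year").ne("").groupby(records["year"]).mean()
print(availability)
```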
With the development of deep encoder-decoder architectures and large-scale annotated medical datasets, great progress has been made in automatic medical image segmentation. Due to the stacking of convolution layers and consecutive sampling operations, existing standard models inevitably encounter an information recession problem in their feature representations, which prevents them from fully modeling global contextual feature dependencies. To overcome these challenges, this paper proposes a novel Transformer-based medical image semantic segmentation framework called TransAttUnet, in which multi-level guided attention and multi-scale skip connections are jointly designed to effectively enhance the functionality and flexibility of the traditional U-shaped architecture. Inspired by the Transformer, a novel self-aware attention (SAA) module with both Transformer Self Attention (TSA) and Global Spatial Attention (GSA) is incorporated into TransAttUnet to effectively learn non-local interactions between encoder features. In addition, we establish multi-scale skip connections between decoder blocks to aggregate upsampled features of different semantic scales. In this way, the representation of multi-scale contextual information is strengthened to generate discriminative features. Benefiting from these complementary components, the proposed TransAttUnet can effectively alleviate the loss of fine details caused by the information recession problem, improving the diagnostic sensitivity and segmentation quality of medical image analysis. Extensive experiments on multiple medical image segmentation datasets with different imaging modalities demonstrate that our method consistently outperforms state-of-the-art baselines.
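The following PyTorch sketch shows one plausible form of a self-aware attention block that combines transformer self-attention over flattened feature tokens with a global spatial attention map; the fusion and module details are assumptions and may differ from TransAttUnet's exact design.

```python
import torch
import torch.nn as nn

class SelfAwareAttentionSketch(nn.Module):
    """Combines transformer-style self-attention (TSA) over flattened feature
    tokens with a global spatial attention map (GSA), fused residually."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.tsa = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.gsa = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, HW, C)
        tsa_out, _ = self.tsa(tokens, tokens, tokens)
        tsa_out = tsa_out.transpose(1, 2).reshape(b, c, h, w)
        gsa_out = x * self.gsa(x)                # reweight features spatially
        return x + tsa_out + gsa_out             # residual fusion of both branches

x = torch.randn(2, 64, 16, 16)
print(SelfAwareAttentionSketch(64)(x).shape)     # torch.Size([2, 64, 16, 16])
```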
The Monge map is the optimal transport map between two probability distributions and provides a principled approach to transforming one distribution into another. Despite rapid developments in numerical methods for optimal transport problems, computing Monge maps remains challenging, especially for high-dimensional problems. In this paper, we present a scalable algorithm for computing the Monge map between two probability distributions. Our algorithm is based on a weak form of the optimal transport problem, so it only requires samples from the marginals instead of their analytic expressions, and it can accommodate optimal transport between two distributions with different dimensions. Unlike other existing sample-based methods for estimating Monge maps, which are usually limited to quadratic costs, our algorithm handles general cost functions. The performance of our algorithm is demonstrated through a series of experiments with both synthetic and realistic data.
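A minimal sketch of a sample-based minimax formulation of this kind, in which a map network and a potential network are trained adversarially from samples of the two marginals under a general cost; the architectures, toy distributions, and optimization schedule are illustrative and are not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def cost(x, y):
    """Transport cost per sample; quadratic here, but any differentiable c(x, y) works."""
    return ((x - y) ** 2).sum(dim=1)

# Map T pushes source samples toward the target; f is a potential on the target domain.
T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
f = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt_T = torch.optim.Adam(T.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)

def sample_source(n):            # toy source: standard Gaussian
    return torch.randn(n, 2)

def sample_target(n):            # toy target: Gaussian shifted along the first axis
    return torch.randn(n, 2) + torch.tensor([4.0, 0.0])

for step in range(2000):
    # inner minimization over the map T
    x = sample_source(256)
    loss_T = (cost(x, T(x)) - f(T(x)).squeeze(1)).mean()
    opt_T.zero_grad(); loss_T.backward(); opt_T.step()
    # outer maximization over the potential f
    x, y = sample_source(256), sample_target(256)
    loss_f = -((cost(x, T(x)) - f(T(x)).squeeze(1)).mean() + f(y).squeeze(1).mean())
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()

print(T(sample_source(5)))       # mapped samples should concentrate near (4, 0)
```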
Knowledge distillation transfers knowledge from a teacher network to a student network, with the goal of greatly improving the performance of the student. Previous methods mostly focus on proposing feature transformations and loss functions between features at the same level to improve effectiveness. We instead study the connection paths across levels between the teacher and student networks and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our new review mechanism is effective and structurally simple. Our final nested and compact framework requires negligible computation overhead and outperforms other methods on a variety of tasks. We apply our method to classification, object detection, and instance segmentation tasks. All of them witness significant student network performance improvements. Code is available at https://github.com/Jia-Research-Lab/ReviewKD
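To illustrate what a cross-stage connection path can look like, here is a simplified sketch of a review-style distillation loss in which each teacher stage supervises an aggregate of student features from the same and deeper stages; the paper's actual fusion design is more elaborate, so this is only an approximation of the idea.

```python
import torch
import torch.nn.functional as F

def cross_stage_review_loss(student_feats, teacher_feats):
    """Each teacher stage supervises an average of the student features from the
    same and deeper stages, resized to that teacher stage's resolution.
    Channel widths are assumed to match (an adapter would handle mismatches)."""
    loss = 0.0
    for i, t in enumerate(teacher_feats):
        agg = sum(F.interpolate(s, size=t.shape[-2:], mode="bilinear",
                                align_corners=False)
                  for s in student_feats[i:]) / len(student_feats[i:])
        loss = loss + F.mse_loss(agg, t)
    return loss

# Toy features: three stages with decreasing spatial size and equal channel widths.
student = [torch.randn(2, 64, s, s) for s in (32, 16, 8)]
teacher = [torch.randn(2, 64, s, s) for s in (32, 16, 8)]
print(cross_stage_review_loss(student, teacher))
```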
Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes in each frame of a video. Prior methods usually obtain segmentation for a frame or clip first and then merge the incomplete results by tracking or matching, which may cause error accumulation in the merging step. In contrast, we propose a new paradigm, Propose-Reduce, which generates complete sequences for input videos in a single step. We further build a sequence propagation head on an existing image-level instance segmentation network for long-term propagation. To ensure robustness and high recall of our proposed framework, multiple sequences are proposed, and redundant sequences of the same instance are then reduced. We achieve state-of-the-art performance on two representative benchmark datasets: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.
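A toy sketch of the reduce step, under the assumption that redundancy is measured by sequence-level mask IoU: higher-scoring sequences are kept and overlapping duplicates of the same instance are dropped, analogous to non-maximum suppression over whole sequences. The criterion and threshold are illustrative.

```python
import torch

def sequence_iou(seq_a, seq_b):
    """Mean per-frame mask IoU between two proposed instance sequences,
    each a boolean tensor of shape (T, H, W)."""
    inter = (seq_a & seq_b).flatten(1).sum(1).float()
    union = (seq_a | seq_b).flatten(1).sum(1).float().clamp(min=1)
    return (inter / union).mean()

def reduce_sequences(sequences, scores, iou_thresh=0.5):
    """Keep the highest-scoring sequences and drop redundant ones that overlap
    an already-kept sequence above the threshold."""
    order = torch.argsort(torch.tensor(scores), descending=True)
    kept = []
    for idx in order.tolist():
        if all(sequence_iou(sequences[idx], sequences[k]) < iou_thresh for k in kept):
            kept.append(idx)
    return kept

seqs = [torch.randint(0, 2, (5, 32, 32)) > 0 for _ in range(4)]
print(reduce_sequences(seqs, scores=[0.9, 0.8, 0.7, 0.6]))
```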
Miners in a blockchain system suffer from ever-increasing storage costs, which in general have not been properly compensated by users' transaction fees. This reduces the incentive for miners to participate and may jeopardize blockchain security. We propose to mitigate this insufficient-fee issue through a Fee and Waiting Tax (FWT) mechanism, which explicitly considers the two types of negative externalities in the system. Specifically, we model the interactions between the protocol designer, users, and miners as a three-stage Stackelberg game. By characterizing the equilibrium of the game, we find that because miners neglect the negative externality in transaction selection, they are willing to accept insufficient-fee transactions. This leads to the insufficient storage fee issue in the existing protocol. Moreover, our proposed optimal FWT mechanism can motivate users to pay transaction fees sufficient to cover the storage costs and achieves the unconstrained social optimum. Numerical results show that the optimal FWT mechanism guarantees sufficient transaction fees and achieves an average social welfare improvement of 33.73% or more over the existing protocol. Furthermore, the optimal FWT mechanism achieves the maximum fairness index and performs well even with miners of heterogeneous storage costs.
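A toy numeric illustration, not the paper's model, of why a myopic miner accepts transactions whose fees do not cover the system-wide storage cost: the fee exceeds one miner's own storage cost, but every miner must store a copy of the transaction.

```python
# Toy numbers (purely illustrative): a fee that covers one miner's storage cost can
# still be socially insufficient, because every miner replicates the transaction.
fee = 0.8                      # transaction fee offered by the user
per_miner_storage_cost = 0.1   # one miner's discounted lifetime storage cost
num_miners = 20                # all of them store the blockchain

myopic_profit = fee - per_miner_storage_cost                 # externality ignored
social_surplus = fee - num_miners * per_miner_storage_cost   # all replicas counted

print("miner accepts the transaction:", myopic_profit > 0)   # True
print("fee is socially sufficient:   ", social_surplus > 0)  # False
```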
Brighter Type Ia supernovae (SNe Ia) prefer less massive hosts with higher star formation. This bias is over-corrected for SNe Ia standardized using the standard Tripp relation, resulting in a step-like dependence of standardized distance on host properties. Using the PISCO supernova host sample and SDSS, GALEX, and 2MASS photometry, we compare host galaxy stellar mass and star formation rate (SFR) estimates from different observation and fitting techniques and their impact on the mass step and sSFR step biases. The mass step size was $-0.04\pm0.02$ mag for FAST++ and STARLIGHT mass estimates, increasing by 0.02 mag for ZPEG. UV information had no effect on the measured mass step size or location. Because of our small sample sizes, all mass step sizes were within $2\sigma$ of a zero step. Regardless, the mass step sizes were all consistent with each other within $1\sigma$. Specific SFR (sSFR) step sizes are $0.05\pm0.03$ mag (H$\alpha$) and $0.06\pm0.03$ mag (UV) for a reduced sample of 51 hosts with SDSS and GALEX coverage, with a 50% increase in step size uncertainties. The step location was determined by the mass sample used to normalize the sSFR. The step size reduces by 0.04 mag with an unconstrained location when using all 73 available hosts with H$\alpha$ measurements. Despite the reduced sample sizes, we find no evidence that the choice of observation or fitting technique drives the mass step measurement, but we cannot conclude the same for the sSFR step. Further work will focus on the effects of differing star formation epochs and dust attenuation corrections on the sSFR bias.
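For readers unfamiliar with the quantity being measured, a short sketch of how a mass step can be computed from host masses and standardized Hubble residuals, using synthetic data and an illustrative split at 10 dex; none of the numbers below come from the paper.

```python
import numpy as np

# Synthetic host masses (log10 M*/M_sun) and standardized Hubble residuals (mag);
# the split at 10 dex and the injected -0.04 mag offset are illustrative only.
rng = np.random.default_rng(0)
log_mass = rng.normal(10.0, 0.6, size=73)
residuals = rng.normal(0.0, 0.12, size=73) - 0.04 * (log_mass > 10.0)

def mass_step(log_mass, residuals, split=10.0):
    """Mass step: difference of mean residuals above vs. below the split mass,
    with a simple standard error on the difference of means."""
    hi, lo = residuals[log_mass > split], residuals[log_mass <= split]
    step = hi.mean() - lo.mean()
    err = np.sqrt(hi.var(ddof=1) / hi.size + lo.var(ddof=1) / lo.size)
    return step, err

print(mass_step(log_mass, residuals))
```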