Branched Multi-Task Networks: Deciding What Layers To Share

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Simon Vandenhende

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Simon Vandenhende - Stamatios Georgoulis - Bert De Brabandere

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In the context of multi-task learning, neural networks with branched architectures have often been employed to jointly tackle the tasks at hand. Such ramified networks typically start with a number of shared layers, after which different tasks branch out into their own sequence of layers. Understandably, as the number of possible network configurations is combinatorially large, deciding what layers to share and where to branch out becomes cumbersome. Prior works have either relied on ad hoc methods to determine the level of layer sharing, which is suboptimal, or utilized neural architecture search techniques to establish the network design, which is considerably expensive. In this paper, we go beyond these limitations and propose an approach to automatically construct branched multi-task networks, by leveraging the employed tasks affinities. Given a specific budget, i.e. number of learnable parameters, the proposed approach generates architectures, in which shallow layers are task-agnostic, whereas deeper ones gradually grow more task-specific. Extensive experimental analysis across numerous, diverse multi-tasking datasets shows that, for a given budget, our method consistently yields networks with the highest performance, while for a certain performance threshold it requires the least amount of learnable parameters.

قيم البحث

117 - Ximeng Sun , Rameswar Panda , Rogerio Feris 2019

Multi-task learning is an open and challenging problem in computer vision. The typical way of conducting multi-task learning with deep neural networks is either through handcrafted schemes that share all initial layers and branch out at an adhoc poin t, or through separate task-specific networks with an additional feature sharing/fusion mechanism. Unlike existing methods, we propose an adaptive sharing approach, called AdaShare, that decides what to share across which tasks to achieve the best recognition accuracy, while taking resource efficiency into account. Specifically, our main idea is to learn the sharing pattern through a task-specific policy that selectively chooses which layers to execute for a given task in the multi-task network. We efficiently optimize the task-specific policy jointly with the network weights, using standard back-propagation. Experiments on several challenging and diverse benchmark datasets with a variable number of tasks well demonstrate the efficacy of our approach over state-of-the-art methods. Project page: https://cs-people.bu.edu/sunxm/AdaShare/project.html.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

What Makes Training Multi-Modal Classification Networks Hard?

381 - Weiyao Wang , Du Tran , Matt Feiszli 2019

Consider end-to-end training of a multi-modal vs. a single-modal network on a task with multiple input modalities: the multi-modal network receives more information, so it should match or outperform its single-modal counterpart. In our experiments, h owever, we observe the opposite: the best single-modal network always outperforms the multi-modal network. This observation is consistent across different combinations of modalities and on different tasks and benchmarks. This paper identifies two main causes for this performance drop: first, multi-modal networks are often prone to overfitting due to increased capacity. Second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal. We address these two problems with a technique we call Gradient Blending, which computes an optimal blend of modalities based on their overfitting behavior. We demonstrate that Gradient Blending outperforms widely-used baselines for avoiding overfitting and achieves state-of-the-art accuracy on various tasks including human action recognition, ego-centric action recognition, and acoustic event detection.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Mimicking Ensemble Learning with Deep Branched Networks

56 - Byungju Kim , Youngsoo Kim , Yeakang Lee 2017

This paper proposes a branched residual network for image classification. It is known that high-level features of deep neural network are more representative than lower-level features. By sharing the low-level features, the network can allocate more memory to high-level features. The upper layers of our proposed network are branched, so that it mimics the ensemble learning. By mimicking ensemble learning with single network, we have achieved better performance on ImageNet classification task.

الرؤية الحاسوبية وتمييز الأنماط

Multi-Task Spatiotemporal Neural Networks for Structured Surface Reconstruction

57 - Mingze Xu , Chenyou Fan , John D Paden 2018

Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less c lear how well these techniques may apply on structured prediction problems where fine-grained output with high precision is required, such as in scientific imaging domains. Here we consider the problem of segmenting echogram radar data collected from the polar ice sheets, which is challenging because segmentation boundaries are often very weak and there is a high degree of noise. We propose a multi-task spatiotemporal neural network that combines 3D ConvNets and Recurrent Neural Networks (RNNs) to estimate ice surface boundaries from sequences of tomographic radar images. We show that our model outperforms the state-of-the-art on this problem by (1) avoiding the need for hand-tuned parameters, (2) extracting multiple surfaces (ice-air and ice-bed) simultaneously, (3) requiring less non-visual metadata, and (4) being about 6 times faster.

الرؤية الحاسوبية وتمييز الأنماط

Deciding what to display: maximizing the information value of social media

113 - Sandra Servia-Rodriguez , Bernardo A. Huberman , Sitaram Asur 2014

In information-rich environments, the competition for users attention leads to a flood of content from which people often find hard to sort out the most relevant and useful pieces. Using Twitter as a case study, we applied an attention economy soluti on to generate the most informative tweets for its users. By considering the novelty and popularity of tweets as objective measures of their relevance and utility, we used the Huberman-Wu algorithm to automatically select the ones that will receive the most attention in the next time interval. Their predicted popularity was confirmed by using Twitter data collected for a period of 2 months.

أجهزة الكمبيوتر والمجتمع تفاعل الإنسان والحاسوب الشبكات الاجتماعية والمعلومات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

معهد تكنولوجيا المعلومات ITI

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Branched Multi-Task Networks: Deciding What Layers To Share

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً