
Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

Posted by Jianghao Shen
Publication date: 2020
Paper language: English





While increasingly deep networks are still generally desired for achieving state-of-the-art performance, for many specific inputs a simpler network may already suffice. Existing works exploit this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing a layer for a given input, can be enhanced by introducing finer-grained, softer decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is to hypothesize layer-wise quantization (to different bitwidths) as intermediate soft choices to be made between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and activations of each layer, where full execution and skipping can be viewed as the two extremes (i.e., full bitwidth and zero bitwidth). In this way, DFS can fractionally exploit a layer's expressive power during input-adaptive inference, enabling finer-grained accuracy-computational cost trade-offs. It presents a unified view that links input-adaptive layer skipping with input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Visualizations further indicate a smooth and consistent transition in DFS behaviors, especially in the learned choices between layer skipping and different quantizations as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at https://github.com/Torment123/DFS.
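To make the mechanism concrete, the following is a minimal PyTorch-style sketch of the fractional-skipping idea, with a simple linear gate standing in for the paper's gating network. The bitwidth list, the uniform quantizer, and all module names are illustrative assumptions, not the released DFS code (see the repository above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(x, bits):
    """Uniform quantization to `bits` bits; identity at full precision."""
    if bits >= 32:
        return x
    scale = (2 ** bits - 1) / (x.abs().max() + 1e-8)
    return torch.round(x * scale) / scale

class FractionalSkipLayer(nn.Module):
    BITWIDTHS = [0, 4, 8, 32]  # skip, two intermediate precisions, full execution

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Tiny gate: pooled input features -> a choice over bitwidths.
        self.gate = nn.Linear(channels, len(self.BITWIDTHS))

    def forward(self, x):
        # Hard per-input decision at inference time (batch size 1 assumed;
        # training would need a differentiable relaxation, e.g. Gumbel-softmax).
        choice = self.gate(x.mean(dim=(2, 3))).argmax(dim=1).item()
        bits = self.BITWIDTHS[choice]
        if bits == 0:            # zero bitwidth: skip the layer entirely
            return x
        w = quantize(self.conv.weight, bits)      # quantized weights
        out = F.conv2d(quantize(x, bits), w, self.conv.bias, padding=1)
        return F.relu(out) + x   # residual path keeps shapes consistent
```

At zero bitwidth the layer reduces to the identity via the residual path, which is exactly the sense in which skipping sits at one extreme of the quantization spectrum.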




Read also

We present a more general analysis of $H$-calibration for adversarially robust classification. By adopting a finer definition of calibration, we can cover settings beyond the restricted hypothesis sets studied in previous work. In particular, our results hold for most common hypothesis sets used in machine learning. We both fix some previous calibration results (Bao et al., 2020) and generalize others (Awasthi et al., 2021). Moreover, our calibration results, combined with the previous study of consistency by Awasthi et al. (2021), also lead to more general $H$-consistency results covering common hypothesis sets.
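For orientation, the following LaTeX sketch records the informal shape of a calibration statement restricted to a hypothesis set $H$: near-minimal surrogate conditional risk over $H$ should force near-minimal target conditional risk over $H$. The exact quantifiers, risk definitions, and uniformity conditions are precisely what this line of work (Bao et al., 2020; Awasthi et al., 2021) makes rigorous, so treat this as a schematic, not the paper's definition.

```latex
% Schematic H-calibration: \ell is the surrogate loss, \ell_0 the target
% (e.g., adversarial 0-1) loss, and \mathcal{C}_\ell(h, x) a conditional risk at x.
\forall \varepsilon > 0 \;\exists \delta > 0 \;\forall h \in H,\, x:\quad
  \mathcal{C}_{\ell}(h, x) \le \inf_{h' \in H} \mathcal{C}_{\ell}(h', x) + \delta
  \;\Longrightarrow\;
  \mathcal{C}_{\ell_0}(h, x) \le \inf_{h' \in H} \mathcal{C}_{\ell_0}(h', x) + \varepsilon .
```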
Multi-graph multi-label learning (MGML) is a supervised learning framework that aims to learn a multi-label classifier from a set of labeled bags, each containing a number of graphs. Prior MGML techniques transfer graphs into instances and focus on learning unseen labels only at the bag level. In this paper, we propose a coarse- and fine-grained Multi-graph Multi-label (cfMGML) learning framework which directly builds the learning model over the graphs and enables label prediction at both the coarse (i.e., bag) level and the fine-grained (i.e., per-graph) level. In particular, given a set of labeled multi-graph bags, we design scoring functions at both the graph and bag levels to model the relevance between the label and the data using specific graph kernels. Meanwhile, we propose a thresholding rank-loss objective function to rank the labels for the graphs and bags and minimize the hamming loss simultaneously in one step, which addresses the error-accumulation issue in traditional rank-loss algorithms. To tackle the non-convex optimization problem, we further develop an effective sub-gradient descent algorithm to handle the high-dimensional computation required in cfMGML. Experiments over various real-world datasets demonstrate that cfMGML outperforms state-of-the-art algorithms.
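As a rough illustration of how ranking and thresholding can be combined in one objective, here is a small NumPy sketch of a thresholded rank loss. The margin form, the threshold score `t`, and the function name are assumptions for exposition, not the paper's exact objective.

```python
import numpy as np

def thresholding_rank_loss(scores, relevant, t=0.0, margin=1.0):
    """scores: per-label scores for one bag/graph; relevant: boolean mask
    of ground-truth labels. Relevant labels should score above irrelevant
    ones AND above the threshold t, so ranking and thresholding are
    handled in a single objective (amenable to sub-gradient descent)."""
    pos, neg = scores[relevant], scores[~relevant]
    loss = 0.0
    # Pairwise ranking term: every relevant label above every irrelevant one.
    for p in pos:
        for n in neg:
            loss += max(0.0, margin - (p - n))
    # Thresholding terms: relevant above t, irrelevant below t.
    loss += sum(max(0.0, margin - (p - t)) for p in pos)
    loss += sum(max(0.0, margin - (t - n)) for n in neg)
    return loss
```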
Chang Liu, Yanan Xu, Yanmin Zhu (2019)
In recent years, dock-less shared bikes have spread widely across many cities in China and facilitate people's lives. At the same time, however, they raise many management problems due to the mismatch between demand and the actual distribution of bikes. Before deploying dock-less shared bikes in a city, companies need a plan for dispatching bikes from places with excessive bikes to locations with high demand in order to provide better service. In this paper, we study the problem of inferring fine-grained bike demand anywhere in a new city before bikes are deployed. This problem is challenging because the new city lacks training data and bike demand varies across both space and time. To solve the problem, we provide various methods to extract discriminative features for each place from multi-source geographic data, such as POIs, road networks, and nighttime light. We utilize correlation Principal Component Analysis (coPCA) on the extracted features of both the old city and the new city to realize distribution adaptation. Then, we adopt a discrete wavelet transform (DWT) based model to mine daily patterns for each place from fine-grained bike demand. We propose an attention-based local CNN model, ALCNN, to infer the daily patterns from the coPCA latent features, using multiple CNNs to model the influence of neighboring places. In addition, ALCNN merges the latent features from the multiple CNNs and can select a suitable size for the influenced region. Extensive experiments on real-life datasets show that the proposed approach outperforms competitive methods.
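The attention-based merge over multiple neighborhood-scale CNNs can be pictured with the following PyTorch sketch; the branch structure, the kernel sizes standing in for "influenced region" sizes, and all names are illustrative guesses rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionLocalCNN(nn.Module):
    """One CNN branch per candidate region size; attention weights decide
    how much each scale contributes, effectively selecting a region size."""

    def __init__(self, in_ch, feat_dim, region_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, 16, k, padding=k // 2),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(16, feat_dim),
            )
            for k in region_sizes
        ])
        self.attn = nn.Linear(feat_dim, 1)  # scores each branch's features

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)       # (B, R, D)
        weights = torch.softmax(self.attn(feats).squeeze(-1), dim=1)    # (B, R)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)               # merged (B, D)
```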
Dynamic inference is a feasible way to reduce the computational cost of convolutional neural networks (CNNs) by dynamically adjusting the computation for each input sample. One way to achieve dynamic inference is to use a multi-stage neural network, which contains a sub-network with a prediction layer at each stage. Inference for an input sample can exit at an early stage if that stage's prediction is confident enough. However, designing a multi-stage CNN architecture is a non-trivial task. In this paper, we introduce a general framework, ENAS4D, which can efficiently search for the optimal multi-stage CNN architecture for dynamic inference in a well-designed search space. First, we propose a method to construct the search space with multi-stage convolutions. The search space includes different numbers of layers, different kernel sizes, and different numbers of channels for each stage, as well as the resolution of input samples. Then, we train a once-for-all network that supports sampling diverse multi-stage CNN architectures. A specialized multi-stage network can be obtained from the once-for-all network without additional training. Finally, we devise a method to efficiently search for the optimal multi-stage network that trades accuracy off against computational cost, taking advantage of the once-for-all network. Experiments on the ImageNet classification task demonstrate that the multi-stage CNNs searched by ENAS4D consistently outperform state-of-the-art methods for dynamic inference. In particular, the network achieves 74.4% ImageNet top-1 accuracy under 185M average MACs.
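The early-exit inference loop that a multi-stage network performs can be summarized in a few lines of PyTorch-style pseudocode; the stage modules, the softmax-confidence exit rule, and the threshold are placeholders, since the paper's contribution is searching the stage architectures rather than this loop.

```python
import torch
import torch.nn as nn

class MultiStageNet(nn.Module):
    def __init__(self, stages, heads):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # per-stage feature extractors
        self.heads = nn.ModuleList(heads)    # one prediction layer per stage

    @torch.no_grad()
    def forward(self, x, threshold=0.9):
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            # Global-average-pool features, then predict class probabilities.
            probs = torch.softmax(head(x.mean(dim=(2, 3))), dim=1)
            conf, pred = probs.max(dim=1)
            if conf.item() >= threshold:  # confident enough: exit early (batch of 1)
                return pred
        return pred  # fall through to the last stage's prediction
```

Lowering the threshold saves computation (more samples exit early) at the cost of accuracy, which is the trade-off the search optimizes.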
John T. Baldwin (2021)
Let $M$ be strongly minimal and constructed by a Hrushovski construction. If the Hrushovski algebraization function $\mu$ is in a certain class ${\mathcal T}$ ($\mu$-triples), we show that for independent $I$ with $|I| > 1$, ${\rm dcl}^*(I) = \emptyset$ (where $*$ means not in ${\rm dcl}$ of a proper subset). This implies that the only definable truly $n$-ary functions $f$ (where $f$ depends on each argument) occur when $n = 1$. We prove, indicating the dependence on $\mu$, for Hrushovski's original construction, with analogous results for the strongly minimal $k$-Steiner systems of Baldwin and Paolini (2021), that the symmetric definable closure satisfies ${\rm sdcl}^*(I) = \emptyset$, and thus the theory does not admit elimination of imaginaries. In particular, such strongly minimal Steiner systems with line length at least 4 do not interpret a quasigroup, even though they admit a coordinatization if $k = p^n$. The proofs depend on our introduction, for appropriate $G \subseteq {\rm aut}(M)$, of the notion of a $G$-normal substructure ${\mathcal A}$ of $M$ and of a $G$-decomposition of ${\mathcal A}$. These results lead to a finer classification of strongly minimal structures with flat geometry, according to what sorts of definable functions they admit.


