
Probabilistic Value Selection for Space Efficient Model

Published by: Gunarto Sindoro Njoo
Publication date: 2020
Research field: Informatics Engineering
Language: English





An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike existing methods such as feature selection, which removes features, and instance selection, which eliminates instances, value selection eliminates values (with respect to each feature) in the dataset, with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on information-theoretic metrics are proposed: PVS and P+VS. Extensive experiments are conducted on benchmark datasets of various sizes, and the results are compared with existing preprocessing methods such as feature selection, feature transformation, and instance selection. The experimental results show that value selection can achieve a balance between accuracy and model-size reduction.
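The abstract does not spell out the scoring function, so the following is a minimal sketch of per-value selection using information gain as an assumed information-theoretic metric: values whose contribution to predicting the label falls below a threshold are masked out, shrinking the set of values a downstream model must store.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label array (0.0 for an empty array)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def value_selection(X, y, threshold=0.01):
    """Keep, per feature, only the values whose information gain about
    the label exceeds `threshold`; other values are replaced by None so
    a model need not store statistics for them.  The information-gain
    criterion is an assumption -- the paper only states that its metric
    is information-theoretic."""
    X, y = np.asarray(X, dtype=object), np.asarray(y)
    base_h = entropy(y)
    kept = []
    for j in range(X.shape[1]):
        keep_j = set()
        for v in set(X[:, j]):
            mask = X[:, j] == v
            # entropy after splitting on "feature j == v"
            h_split = (mask.mean() * entropy(y[mask])
                       + (~mask).mean() * entropy(y[~mask]))
            if base_h - h_split >= threshold:
                keep_j.add(v)
        kept.append(keep_j)
    X_reduced = X.copy()
    for j, keep_j in enumerate(kept):
        rows = [v not in keep_j for v in X[:, j]]
        X_reduced[rows, j] = None
    return X_reduced, kept
```

Raising the threshold trades accuracy for a smaller model, which mirrors the accuracy-versus-size balance the abstract describes.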




Read also

Group-based sparsity models have proven instrumental in linear regression problems for recovering signals from far fewer measurements than standard compressive sensing. The main promise of these models is the recovery of interpretable signals through the identification of their constituent groups. In this paper, we establish a combinatorial framework for group-model selection problems and highlight the underlying tractability issues. In particular, we show that the group-model selection problem is equivalent to the well-known NP-hard weighted maximum coverage problem (WMC). Leveraging a graph-based understanding of group models, we describe group structures that enable correct model selection in polynomial time via dynamic programming. Furthermore, group structures that lead to totally unimodular constraints have tractable discrete as well as convex relaxations. We also present a generalization of the group model that allows for within-group sparsity, which can be used to model hierarchical sparsity. Finally, we study the Pareto frontier of group-sparse approximations for two tractable models, including the tree sparsity model, and illustrate the selection and computation trade-offs between our framework and the existing convex relaxations.
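Since the paper reduces group-model selection to weighted maximum coverage, the classic greedy (1 - 1/e)-approximation for WMC is a natural reference point. Below is a minimal sketch of that greedy; the names `groups`, `weights`, and `budget` are illustrative, not the paper's notation.

```python
def greedy_weighted_max_coverage(groups, weights, budget):
    """Pick at most `budget` groups maximising the total weight of the
    covered coefficient indices -- the standard (1 - 1/e)-approximation
    for the NP-hard WMC problem that group-model selection reduces to.

    groups  : list of sets of coefficient indices
    weights : dict index -> nonnegative weight (e.g. squared energy)
    """
    covered, chosen = set(), []
    for _ in range(budget):
        gain, best = 0.0, None
        for g_idx, g in enumerate(groups):
            new = sum(weights[i] for i in g - covered)
            if new > gain:
                gain, best = new, g_idx
        if best is None:      # no group adds weight; stop early
            break
        chosen.append(best)
        covered |= groups[best]
    return chosen, covered
```

For the special structures the paper identifies (e.g. totally unimodular constraints), exact polynomial-time selection replaces this approximation.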
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|\mathcal{S}|\times|\mathcal{A}|$, which can be prohibitively large when $\mathcal{S}$ or $\mathcal{A}$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1-\gamma)^{3}\varepsilon^{2}}$ (resp. $\frac{K}{(1-\gamma)^{4}\varepsilon^{2}}$), up to some logarithmic factor. Here $K$ is the feature dimension and $\gamma\in(0,1)$ is the discount factor of the MDP. Both sample complexity bounds are provably tight, and our result for the model-based approach matches the minimax lower bound. Our results show that for an arbitrarily large-scale MDP, both the model-based approach and Q-learning are sample-efficient when $K$ is relatively small, and hence the title of this paper.
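To make the two bounds concrete, here is a small numerical illustration of the stated orders; constants and logarithmic factors are dropped, so these are orders of magnitude rather than exact sample counts.

```python
def model_based_samples(K, gamma, eps):
    """Order of K / ((1-gamma)^3 * eps^2) for the model-based
    approach (log factors and constants omitted)."""
    return K / ((1 - gamma) ** 3 * eps ** 2)

def q_learning_samples(K, gamma, eps):
    """Order of K / ((1-gamma)^4 * eps^2) for Q-learning."""
    return K / ((1 - gamma) ** 4 * eps ** 2)

# e.g. K = 100 features, discount 0.99, target accuracy 0.1:
print(f"{model_based_samples(100, 0.99, 0.1):.1e}")  # ~1.0e+10
print(f"{q_learning_samples(100, 0.99, 0.1):.1e}")   # ~1.0e+12
```

Note that neither bound depends on $|\mathcal{S}|$ or $|\mathcal{A}|$, which is exactly why the approaches remain sample-efficient for arbitrarily large MDPs when $K$ is small.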
Today's intelligent applications can achieve high accuracy using machine learning (ML) techniques such as deep neural networks (DNNs). Traditionally, in a remote DNN inference problem, an edge device transmits raw data to a remote node that performs the inference task. However, this may incur high transmission energy costs and puts data privacy at risk. In this paper, we propose a technique to reduce the total energy bill at the edge device by utilizing model compression and a time-varying model split between the edge and remote nodes. The time-varying representation accounts for time-varying channels and can significantly reduce the total energy at the edge device while maintaining high accuracy (low loss). We implement our approach in an image classification task using the MNIST dataset, and the system environment is simulated as a trajectory navigation scenario to emulate different channel conditions. Numerical simulations show that our proposed solution results in minimal energy consumption and $CO_2$ emission compared to the considered baselines, while exhibiting robust performance across different channel conditions and bandwidth regime choices.
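The paper's energy model involves compression and time-varying channels; the following is a much-simplified sketch of the split decision alone, where a DNN is cut at the layer minimizing edge compute energy plus the cost of transmitting that layer's activations. All energy and size figures are hypothetical.

```python
def best_split(layer_energy_j, layer_output_bits, tx_j_per_bit):
    """Pick the DNN layer after which to offload: edge energy is the
    compute energy of the layers run locally plus the energy to
    transmit that layer's activations.  A simplified, hypothetical
    stand-in for the paper's time-varying optimisation."""
    best_i, best_e, local = None, float("inf"), 0.0
    for i, (c, bits) in enumerate(zip(layer_energy_j, layer_output_bits)):
        local += c                        # run layer i at the edge
        e = local + bits * tx_j_per_bit   # then transmit its output
        if e < best_e:
            best_i, best_e = i, e
    return best_i, best_e

# hypothetical per-layer energies (J) and activation sizes (bits);
# a cheaper channel pulls the split point toward the input:
energies = [0.2, 0.5, 0.9, 1.4]
out_bits = [8e5, 2e5, 5e4, 1e4]
print(best_split(energies, out_bits, tx_j_per_bit=1e-7))  # split after layer 0
print(best_split(energies, out_bits, tx_j_per_bit=5e-6))  # split after layer 1
```

Re-running this decision as the per-bit transmission cost changes is the essence of a time-varying split under time-varying channels.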
Cellular vehicle-to-everything (V2X) communication is expected to herald the age of autonomous vehicles in the coming years. With the integration of blockchain in such networks, information of all granularity levels, from complete blocks to individual transactions, would be accessible to vehicles at any time. Specifically, the blockchain technology is expected to improve the security, immutability, and decentralization of cellular V2X communication through smart contracts and distributed ledgers. Although blockchain-based cellular V2X networks hold promise, many challenges need to be addressed to enable the future interoperability and accessibility of such large-scale platforms. One such challenge is the offloading of mining tasks in cellular V2X networks. While transportation authorities may try to balance the network mining load, the vehicles may select the nearest mining clusters to offload a task. This may cause congestion and disproportionate use of vehicular network resources. To address this issue, we propose a game-theoretic approach for balancing the load at mining clusters while maintaining fairness among offloading vehicles. Keeping in mind the low-latency requirements of vehicles, we consider finite-blocklength channel transmission, which is more practical than the use of infinite-blocklength codes. The simulation results obtained with our proposed offloading framework show improved performance over the conventional nearest-mining-cluster selection technique.
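For the finite-blocklength ingredient, here is a minimal sketch of the standard normal approximation (Polyanskiy-Poor-Verdú) to the maximal coding rate over an AWGN channel; the paper's exact channel model and parameters may differ, so the numbers below are illustrative only.

```python
from math import log2, sqrt, e
from statistics import NormalDist

def finite_blocklength_rate(snr, n, eps):
    """Normal approximation of the maximal coding rate (bits per channel
    use) over an AWGN channel at blocklength n and block error
    probability eps -- the kind of finite-blocklength model used in
    place of Shannon capacity for low-latency V2X traffic."""
    C = log2(1 + snr)                                       # capacity
    V = (snr * (snr + 2) / (snr + 1) ** 2) * log2(e) ** 2   # dispersion
    q_inv = NormalDist().inv_cdf(1 - eps)                   # Q^{-1}(eps)
    return C - sqrt(V / n) * q_inv + log2(n) / (2 * n)

# short packets pay a visible rate penalty relative to capacity:
for n in (100, 1000, 100000):
    print(n, round(finite_blocklength_rate(snr=4.0, n=n, eps=1e-5), 3))
```

The backoff term $\sqrt{V/n}\,Q^{-1}(\varepsilon)$ is what makes short-packet (low-latency) transmission strictly costlier than the infinite-blocklength idealization.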
Pairwise alignment of DNA sequencing data is a ubiquitous task in bioinformatics and typically represents a heavy computational burden. State-of-the-art approaches to speed up this task use hashing to identify short segments (k-mers) that are shared by pairs of reads, which can then be used to estimate alignment scores. However, when the number of reads is large, accurately estimating alignment scores for all pairs is still very costly. Moreover, in practice, one is only interested in identifying pairs of reads with large alignment scores. In this work, we propose a new approach to pairwise alignment estimation based on two key new ingredients. The first ingredient is to cast the problem of pairwise alignment estimation under a general framework of rank-one crowdsourcing models, where the workers' responses correspond to k-mer hash collisions. These models can be accurately solved via a spectral decomposition of the response matrix. The second ingredient is to utilise a multi-armed bandit algorithm to adaptively refine this spectral estimator only for read pairs that are likely to have large alignment scores. The resulting algorithm iteratively performs a spectral decomposition of the response matrix for adaptively chosen subsets of the read pairs.
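A minimal sketch of the rank-one spectral step, assuming the responses are collected in a 0/1 reads-by-buckets collision matrix; the pairwise-score readout from the leading singular vector below is a simplification, not the paper's exact estimator.

```python
import numpy as np

def rank_one_spectral_estimate(R):
    """Best rank-one approximation of a (reads x hash-buckets) 0/1
    collision matrix R via its top singular pair, in the spirit of
    spectral methods for rank-one crowdsourcing models.  The pairwise
    proxy from the leading left singular vector is a simplification."""
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    u, s, v = U[:, 0], S[0], Vt[0]
    if u.sum() < 0:                 # fix the sign ambiguity of the SVD
        u, v = -u, -v
    pairwise = s * np.outer(u, u)   # crude read-pair similarity proxy
    return u, s * np.outer(u, v), pairwise

# reads sharing many k-mer buckets get a high proxy score:
R = np.array([[1, 1, 0, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(np.round(rank_one_spectral_estimate(R)[2], 2))
```

In the paper's algorithm, a bandit layer would then spend further hash queries only on the read pairs this spectral proxy flags as promising.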
