ترغب بنشر مسار تعليمي؟ اضغط هنا

We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample, however, increasing the frequency of sampling incurs an additive pe nalty/cost. Thus, there is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency. The goal is to design a learning algorithm that minimizes regret, that is defined as the difference of the payoff of the oracle policy and that of the learning algorithm. CTMAB is fundamentally different than the usual multi-arm bandit problem (MAB), e.g., even the single-arm case is non-trivial in CTMAB, since the optimal sampling frequency depends on the mean of the arm, which needs to be estimated. We first establish lower bounds on the regret achievable with any algorithm and then propose algorithms that achieve the lower bound up to logarithmic factors. For the single-arm case, we show that the lower bound on the regret is $Omega((log T)^2/mu)$, where $mu$ is the mean of the arm, and $T$ is the time horizon. For the multiple arms case, we show that the lower bound on the regret is $Omega((log T)^2 mu/Delta^2)$, where $mu$ now represents the mean of the best arm, and $Delta$ is the difference of the mean of the best and the second-best arm. We then propose an algorithm that achieves the bound up to constant terms.
Critical role of Internet of Things (IoT) in various domains like smart city, healthcare, supply chain and transportation has made them the target of malicious attacks. Past works in this area focused on centralized Intrusion Detection System (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such IDS may not always be feasible, mainly due to spread of data across multiple sources and gathering at central node can be costly. Also, the earlier works primarily focused on improving True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers using weighted convex surrogate loss functions. Natural robustness of KNN classifier towards noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.
This paper studies a new variant of the stochastic multi-armed bandits problem, where the learner has access to auxiliary information about the arms. The auxiliary information is correlated with the arm rewards, which we treat as control variates. In many applications, the arm rewards are a function of some exogenous values, whose mean value is known a priori from historical data and hence can be used as control variates. We use the control variates to obtain mean estimates with smaller variance and tighter confidence bounds. We then develop an algorithm named UCB-CV that uses improved estimates. We characterize the regret bounds in terms of the correlation between the rewards and control variates. The experiments on synthetic data validate the performance guarantees of our proposed algorithm.
We consider the problem of sequentially allocating resources in a censored semi-bandits setup, where the learner allocates resources at each step to the arms and observes loss. The loss depends on two hidden parameters, one specific to the arm but in dependent of the resource allocation, and the other depends on the allocated resource. More specifically, the loss equals zero for an arm if the resource allocated to it exceeds a constant (but unknown) arm dependent threshold. The goal is to learn a resource allocation that minimizes the expected loss. The problem is challenging because the loss distribution and threshold value of each arm are unknown. We study this setting by establishing its `equivalence to Multiple-Play Multi-Armed Bandits (MP-MAB) and Combinatorial Semi-Bandits. Exploiting these equivalences, we derive optimal algorithms for our problem setting using known algorithms for MP-MAB and Combinatorial Semi-Bandits. The experiments on synthetically generated data validate the performance guarantees of the proposed algorithms.
Network middle-boxes often classify the traffic flows on the Internet to perform traffic management or discriminate one traffic against the other. As the widespread adoption of HTTPS protocol has made it difficult to classify the traffic looking into the content field, one of the fields the middle-boxes look for is Server Name Indicator (SNI), which goes in plain text. SNI field contains information about the host and can, in turn, reveal the type of traffic. This paper presents a method to mask the server host identity by encrypting the SNI. We develop a simple method that completes the SSL/TLS connection establishment over two handshakes - the first handshake establishes a secure channel without sharing SNI information, and the second handshake shares the encrypted SNI. Our method makes it mandatory for fronting servers to always accept the handshake request without the SNI and respond with a valid SSL certificate. As there is no modification in already proven SSL/TLS encryption mechanism and processing of handshake messages, the new method enjoys all security benefits of existing secure channel establishment and needs no modification in existing routers/middle-boxes. Using customized client-server over the live Internet, we demonstrate the feasibility of our method. Moreover, the impact analysis shows that the method adheres to almost all SSL/TLS related Internet standards requirements.
The debate on Net-neutrality and events pointing towards its possible violations have led to the development of tools to detect deliberate traffic discrimination on the Internet. Given the complex nature of the Internet, neutrality violations are not easy to detect, and tools developed so far suffer from various limitations. In this paper, we study many challenges in detecting the violations and discuss possible approaches to mitigate them. As a case study, we focus on the tool Wehe cite{wehe} and discuss its limitations and propose the aspects that need to be strengthened. Wehe is the most recent tool to detect neutrality violations. Despite Wehes vast utility and possible influences over policy decisions, its mechanisms are not yet fully validated by researchers other than original tool developers. We seek to fill this gap by conducting a thorough and in-depth validation of Wehe. Our validation uses the Wehe App, a client-server setup mimicking Wehes behavior and its theoretical arguments. We validated the Wehe app for its methodology, traffic discrimination detection, and operational environments. We found that the critical weaknesses of the Wehe App are due to its design choices of using port number 80, overlooking the effect of background traffic, and the direct performance comparison.
We study wireless power transmission by an energy source to multiple energy harvesting nodes with the aim to maximize the energy efficiency. The source transmits energy to the nodes using one of the available power levels in each time slot and the no des transmit information back to the energy source using the harvested energy. The source does not have any channel state information and it only knows whether a received codeword from a given node was successfully decoded or not. With this limited information, the source has to learn the optimal power level that maximizes the energy efficiency of the network. We model the problem as a stochastic Multi-Armed Bandits problem and develop an Upper Confidence Bound based algorithm, which learns the optimal transmit power of the energy source that maximizes the energy efficiency. Numerical results validate the performance guarantees of the proposed algorithm and show significant gains compared to the benchmark schemes.
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated with fixe d costs and are ordered, forming a cascade. In each round, a context is presented, and the learner selects the arms sequentially till some depth. The total cost incurred by stopping at an arm is the sum of fixed costs of arms selected and the stochastic loss associated with the arm. The learners goal is to learn a decision rule that maps contexts to arms with the goal of minimizing the total expected loss. The problem is challenging as we are faced with an unsupervised setting as the total loss cannot be estimated. Clearly, learning is feasible only if the optimal arm can be inferred (explicitly or implicitly) from the problem structure. We observe that learning is still possible when the problem instance satisfies the so-called Contextual Weak Dominance (CWD) property. Under CWD, we propose an algorithm for the contextual USS problem and demonstrate that it has sub-linear regret. Experiments on synthetic and real datasets validate our algorithm.
Thompson Sampling has generated significant interest due to its better empirical performance than upper confidence bound based algorithms. In this paper, we study Thompson Sampling based algorithm for Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandits problem, where the loss of an arm can not be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from arms up to the selected arm. The learners goal is to find the arm that minimizes the expected total loss. The total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss for the selected arm. Clearly, learning is feasible only if the optimal arm can be inferred from the problem structure. As shown in the prior work, learning is possible when the problem instance satisfies the so-called `Weak Dominance (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near optimal regret and has better numerical performance than existing algorithms.
In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا