
Motivated by the strategic behavior common in crowdsourced labeling, we study the problem of sequentially eliciting information without verification (EIWV) from a heterogeneous and unknown crowd of workers. We propose a reinforcement learning-based approach that is effective across a wide range of settings, including potential irrationality and collusion among workers. With the aid of a costly oracle and an inference method, our approach dynamically decides when to call the oracle and remains robust even in the presence of frequent collusion. Extensive experiments show the advantage of our approach. Our results also present the first comprehensive experiments on EIWV with large-scale real datasets and the first thorough study of the effects of environmental variables.
A $k$-submodular function is a function that, given $k$ disjoint subsets, outputs a value that is submodular in every orthant. In this paper, we provide a new framework for $k$-submodular maximization problems, by relaxing the optimization to the continuous space with the multilinear extension of $k$-submodular functions and a variant of pipage rounding that recovers the discrete solution. The multilinear extension introduces new techniques to analyze and optimize $k$-submodular functions. When the function is monotone, we propose almost $\frac{1}{2}$-approximation algorithms for unconstrained maximization and maximization under total size and knapsack constraints. For unconstrained monotone and non-monotone maximization, we propose an algorithm that is almost as good as any combinatorial algorithm based on Iwata, Tanigawa, and Yoshida's meta-framework ($\frac{k}{2k-1}$-approximation for the monotone case and $\frac{k^2+1}{2k^2+1}$-approximation for the non-monotone case).
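For readers unfamiliar with the term, the informal description above corresponds to the following standard inequality. This is the usual textbook definition, stated here as an assumption since the abstract does not spell it out; the symbols $V$, $\mathbf{x}$, $\mathbf{y}$, $\sqcap$ and $\sqcup$ are introduced only for this illustration.

```latex
% Standard definition of k-submodularity (assumed; not stated in the abstract).
% Domain: (k+1)^V = \{ (X_1,\dots,X_k) : X_i \subseteq V,\ X_i \cap X_j = \emptyset \ (i \neq j) \}
\[
  f(\mathbf{x}) + f(\mathbf{y}) \;\ge\;
  f(\mathbf{x} \sqcap \mathbf{y}) + f(\mathbf{x} \sqcup \mathbf{y})
  \quad \text{for all } \mathbf{x}, \mathbf{y} \in (k+1)^V,
\]
\[
  \mathbf{x} \sqcap \mathbf{y} = (X_1 \cap Y_1, \dots, X_k \cap Y_k), \qquad
  \mathbf{x} \sqcup \mathbf{y} =
  \Bigl( (X_i \cup Y_i) \setminus \bigcup_{j \neq i} (X_j \cup Y_j) \Bigr)_{i=1}^{k}.
\]
```

For $k=1$ this reduces to ordinary submodularity, and for $k=2$ to bisubmodularity.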
Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contribution of each agent from the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for coalitions of agents. Specifically, the Shapley Value and its desired properties are leveraged in deep MARL to credit any combination of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of the Shapley Value, which grows factorially with the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art performance, with especially large margins on the more difficult tasks.
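The idea of approximating Shapley values by Monte Carlo sampling can be illustrated with generic permutation sampling. The sketch below is not the paper's deep MARL procedure: the function `monte_carlo_shapley`, the toy game `toy_value`, and the player names are all hypothetical and chosen only to show how sampled orderings replace the factorial sum.

```python
import random


def monte_carlo_shapley(players, value_fn, num_samples=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings (permutation sampling)."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)                 # one random ordering of the players
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev        # marginal contribution of p
            prev = cur
    return {p: total / num_samples for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical cooperative game: the team is rewarded only when
    both 'a' and 'b' are present; 'c' never contributes."""
    return 1.0 if {"a", "b"} <= coalition else 0.0


if __name__ == "__main__":
    print(monte_carlo_shapley(["a", "b", "c"], toy_value))
```

Each sampled ordering costs one pass over the players, so the total cost is linear in the number of samples rather than factorial in the number of agents. In the toy game the exact Shapley values are 1/2 for `a` and `b` and 0 for `c`, which the estimate approaches as `num_samples` grows.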
We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose each bet until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating non-smooth points. The pattern, however, was left without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Simple as it might seem, the value function is pathological: fractal, self-similar, with derivative taking only the values zero or infinity, and not expressible in terms of elementary functions. It is in fact one of the generalized Cantor functions, and it holds a complexity that has been uncharted thus far. Our analyses could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.
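The discrete problem described above can be explored numerically with standard value iteration. This is a minimal sketch rather than the exact formula derived in the paper; the function name `gambler_value_iteration`, the win probability 0.4, and the goal of 100 are assumptions made for illustration.

```python
def gambler_value_iteration(p_heads=0.4, goal=100, tol=1e-10, max_sweeps=10_000):
    """In-place value iteration for the discrete Gambler's problem.

    value[s] approximates the probability of reaching `goal` from
    capital s under optimal betting, with no discounting.
    """
    value = [0.0] * (goal + 1)
    value[goal] = 1.0                      # reaching the goal counts as a win
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, goal):
            # A bet cannot exceed the current capital or overshoot the goal.
            best = max(
                p_heads * value[s + bet] + (1.0 - p_heads) * value[s - bet]
                for bet in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - value[s]))
            value[s] = best
        if delta < tol:                    # stop once a sweep changes nothing
            break
    return value


if __name__ == "__main__":
    v = gambler_value_iteration()
    print(v[25], v[50], v[75])
```

Plotting the returned list against the capital shows the repeating non-smooth points that the abstract refers to.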
112 - Jiajin Li, Baoxiang Wang 2018
Policy optimization on high-dimensional continuous control tasks is difficult because of the large variance of policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide & deep architecture. Empirical studies show that our approach improves performance on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.
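The variance problem and the effect of a control variate can be seen in a one-dimensional toy case. The sketch below is not the ASDG estimator or POSA: it uses a plain score-function (REINFORCE) gradient with a constant baseline, which is the simplest control variate. The Gaussian policy, the quadratic reward, and the name `score_function_grads` are assumptions made for illustration.

```python
import random
import statistics


def score_function_grads(mu=0.0, baseline=0.0, num_samples=5000, seed=0):
    """Per-sample score-function gradients of E[reward(a)] with respect
    to mu, for actions a ~ N(mu, 1), with a constant baseline subtracted."""
    rng = random.Random(seed)
    grads = []
    for _ in range(num_samples):
        a = rng.gauss(mu, 1.0)
        reward = -(a - 3.0) ** 2           # hypothetical reward, peaked at a = 3
        score = a - mu                     # d/dmu of log N(a; mu, 1)
        grads.append((reward - baseline) * score)
    return grads


if __name__ == "__main__":
    plain = score_function_grads(baseline=0.0)
    # Baseline set to the expected reward at mu = 0: -((0 - 3)^2 + 1) = -10.
    with_cv = score_function_grads(baseline=-10.0)
    print(statistics.mean(plain), statistics.variance(plain))
    print(statistics.mean(with_cv), statistics.variance(with_cv))
```

Both estimators target the same gradient, $-2(\mu - 3)$, because the baseline term has zero expectation. Only the spread of the per-sample gradients changes, which is the property that more elaborate control variates exploit.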
130 - Baoxiang Wang, Yuzhao Wang 2009
In this paper we study the Cauchy problem for the elliptic and non-elliptic derivative nonlinear Schrödinger equations in higher spatial dimensions ($n\geq 2$) and obtain some global well-posedness results with small initial data in critical Besov spaces $B^s_{2,1}$. As by-products, scattering results with small initial data are also obtained.
In this paper, we consider the trace theorem for modulation spaces, $\alpha$-modulation spaces and Besov spaces. For the modulation spaces, we obtain sharp results.
We study the well-posedness of the Cauchy problem for the fourth order nonlinear Schrödinger equations $$i\partial_t u = -\varepsilon\Delta u + \Delta^2 u + P\big((\partial_x^\alpha u)_{|\alpha|\leqslant 2}, (\partial_x^\alpha \bar{u})_{|\alpha|\leqslant 2}\big), \quad t\in\mathbb{R},\ x\in\mathbb{R}^n,$$ where $\varepsilon\in\{-1,0,1\}$, $n\geqslant 2$ denotes the spatial dimension and $P(\cdot)$ is a polynomial excluding constant and linear terms.
174 - Zihua Guo, Baoxiang Wang 2008
Considering the Cauchy problem for the modified finite-depth-fluid equation $\partial_t u - G_\delta(\partial_x^2 u) \mp u^2 u_x = 0,\ u(0)=u_0$, where $G_\delta f = -i\mathcal{F}^{-1}\big[\coth(2\pi\delta\xi) - \frac{1}{2\pi\delta\xi}\big]\mathcal{F}f$, $\delta \geqslant 1$, and $u$ is a real-valued function, we show that it is uniformly globally well-posed if $u_0 \in H^s\ (s\geq 1/2)$ with $\|u_0\|_{L^2}$ sufficiently small for all $\delta \geqslant 1$. Our result is sharp in the sense that the solution map fails to be $C^3$ in $H^s\ (s<1/2)$. Moreover, we prove that for any $T>0$, its solution converges in $C([0,T]; H^s)$ to that of the modified Benjamin-Ono equation if $\delta$ tends to $+\infty$.
152 - Zihua Guo, Baoxiang Wang 2008
Considering the Cauchy problem for the Korteweg-de Vries-Burgers equation \begin{eqnarray*} u_t+u_{xxx}+\epsilon |\partial_x|^{2\alpha}u+(u^2)_x=0, \quad u(0)=\phi, \end{eqnarray*} where $0<\epsilon,\alpha\leq 1$ and $u$ is a real-valued function, we show that it is globally well-posed in $H^s\ (s>s_\alpha)$, and uniformly globally well-posed in $H^s\ (s>-3/4)$ for all $\epsilon \in (0,1)$. Moreover, we prove that for any $T>0$, its solution converges in $C([0,T]; H^s)$ to that of the KdV equation if $\epsilon$ tends to 0.