Program synthesis has emerged as a successful approach to the image parsing task. Most prior works rely on a two-step scheme: supervised pretraining of a Seq2Seq model on synthetic programs, followed by reinforcement learning (RL) to fine-tune on real reference images. Fully unsupervised approaches promise to train the model directly on the target images without requiring curated pretraining datasets. However, they struggle with the inherent sparsity of meaningful programs in the search space. In this paper, we present the first unsupervised algorithm capable of parsing constructive solid geometry (CSG) images into context-free grammar (CFG) without pretraining via a non-differentiable renderer. To tackle the non-Markovian sparse reward problem, we combine three key ingredients: (i) a grammar-encoded tree LSTM ensuring program validity, (ii) entropy regularization, and (iii) sampling without replacement from the CFG syntax tree. Empirically, our algorithm recovers meaningful programs in large search spaces (up to $3.8 \times 10^{28}$). Further, even though our approach is fully unsupervised, it generalizes better than supervised methods on the synthetic 2D CSG dataset. On the 2D computer-aided design (CAD) dataset, our approach significantly outperforms the supervised pretrained model and is competitive with the refined model.
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
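As a rough illustration of the sampling step only (not the estimator itself), the sketch below draws k distinct outcomes from a categorical distribution. It assumes an "exponential race" implementation: each outcome gets an Exp(1) draw divided by its probability, and the k smallest keys win. This is equivalent to drawing sequentially without replacement in proportion to the probabilities. The function name `sample_without_replacement` and this choice of mechanism are assumptions made for illustration; the abstract does not say how the samples are drawn.

```python
import random


def sample_without_replacement(probs, k, rng=None):
    """Draw k distinct indices, as if sampling one at a time in proportion
    to `probs` and removing each drawn index (illustrative sketch)."""
    rng = rng or random.Random()
    # Outcomes with zero probability can never be drawn.
    support = [(i, p) for i, p in enumerate(probs) if p > 0]
    if not 0 < k <= len(support):
        raise ValueError("k must be between 1 and the number of outcomes "
                         "with positive probability")
    # Exponential race: outcome i "arrives" at time Exp(1) / p_i.
    # The first k arrivals form an ordered sample without replacement.
    keyed = [(rng.expovariate(1.0) / p, i) for i, p in support]
    keyed.sort()
    return [i for _, i in keyed[:k]]


if __name__ == "__main__":
    # Hypothetical distribution; repeated runs never contain duplicates.
    print(sample_without_replacement([0.7, 0.2, 0.1], k=2))
```

Because no index can appear twice, every evaluation of the function being averaged is spent on a distinct outcome, which is the property the abstract credits for the variance reduction.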
Thompson Sampling has generated significant interest because it performs better empirically than upper confidence bound based algorithms. In this paper, we study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandit problem in which the loss of an arm cannot be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from arms up to the selected arm. The learner's goal is to find the arm that minimizes the expected total loss. The total loss is the sum of the cost incurred for selecting the arm and the stochastic loss associated with the selected arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss for the selected arm. Clearly, learning is feasible only if the optimal arm can be inferred from the problem structure. As shown in prior work, learning is possible when the problem instance satisfies the so-called 'Weak Dominance' (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near-optimal regret and has better numerical performance than existing algorithms.
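To make the objective concrete, here is a minimal sketch of the expected total loss as the abstract defines it: the selection cost of an arm plus its mean stochastic loss. It assumes an oracle that knows the mean losses, which the learner in the USS setting does not. The function name `oracle_best_arm` and the numbers in the usage example are hypothetical; this shows the target the learner is trying to reach, not the Thompson Sampling algorithm.

```python
def oracle_best_arm(costs, mean_losses):
    """Return (best_arm_index, total_losses) given known mean losses.

    Assumption: the mean losses are known here for illustration only;
    in the USS problem they cannot be estimated from the feedback.
    """
    if not costs or len(costs) != len(mean_losses):
        raise ValueError("costs and mean_losses must be non-empty and "
                         "of equal length")
    # Expected total loss of arm i = cost of selecting i + mean loss of i.
    totals = [c + m for c, m in zip(costs, mean_losses)]
    best = min(range(len(totals)), key=totals.__getitem__)
    return best, totals


if __name__ == "__main__":
    # Hypothetical cascade: later arms cost more but err less often.
    best, totals = oracle_best_arm([0.0, 0.1, 0.3], [0.4, 0.2, 0.15])
    print(best, totals)
```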
Star sampling (SS) is a random sampling procedure on a graph wherein each sample consists of a randomly selected vertex (the star center) and its one-hop neighbors (the star endpoints). We consider the use of star sampling to find any member of an arbitrary target set of vertices in a graph, where the figure of merit (cost) is either the expected number of samples (unit cost) or the expected number of star centers plus star endpoints (linear cost) until a vertex in the target set is encountered, either as a star center or as a star endpoint. We analyze this performance measure on three related star sampling paradigms: SS with replacement (SSR), SS without center replacement (SSC), and SS without star replacement (SSS). We derive exact and approximate expressions for the expected unit and linear costs of SSR, SSC, and SSS on Erdős-Rényi (ER) graphs. Our results show that there is (i) little difference in unit cost but (ii) a significant difference in linear cost across the three paradigms. Although our results are derived for ER graphs, experiments on real-world graphs suggest that our performance expressions are reasonably accurate for non-ER graphs.
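The two cost measures can be illustrated with a small simulation of the with-replacement variant (SSR). This is a sketch under stated assumptions: the graph is an adjacency dictionary without self-loops, star centers are drawn uniformly at random, and the function name `ssr_until_hit` is hypothetical. It performs a single run and does not reproduce the paper's analytical expressions; SSC and SSS are not shown.

```python
import random


def ssr_until_hit(adj, targets, rng=None):
    """Run star sampling with replacement until a star touches `targets`.

    Returns (unit_cost, linear_cost) for this single run:
      unit_cost   = number of stars sampled,
      linear_cost = total number of star centers plus star endpoints seen.
    """
    rng = rng or random.Random()
    vertices = list(adj)
    targets = set(targets)
    # At least one target must be a vertex, otherwise the loop never ends.
    if not targets.intersection(vertices):
        raise ValueError("targets must contain at least one graph vertex")
    unit_cost = 0
    linear_cost = 0
    while True:
        center = rng.choice(vertices)      # centers are drawn with replacement
        endpoints = set(adj[center])
        unit_cost += 1
        linear_cost += 1 + len(endpoints)  # one center plus its endpoints
        if center in targets or endpoints & targets:
            return unit_cost, linear_cost


if __name__ == "__main__":
    # Hypothetical 5-vertex path graph 0-1-2-3-4 with target set {4}.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(ssr_until_hit(path, {4}))
```

Averaging the two returned values over many runs gives empirical estimates of the expected unit and linear costs for a given graph and target set.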
Neural inductive program synthesis is the task of generating instructions that produce desired outputs from given inputs. In this paper, we focus on generating a chunk of assembly code that can be executed to match a state change inside the CPU and RAM. We develop a neural program synthesis algorithm, AutoAssemblet, trained via self-learning reinforcement learning, that explores the large code space efficiently. Policy networks and value networks are learned to reduce the breadth and depth of the Monte Carlo Tree Search, resulting in better synthesis performance. We also propose an effective multi-entropy policy sampling technique to alleviate online update correlations. We apply AutoAssemblet to basic programming tasks and show significantly higher success rates compared to several competing baselines.
Star sampling (SS) is a random sampling procedure on a graph wherein each sample consists of a randomly selected vertex (the star center) and its (one-hop) neighbors (the star points). We consider the use of SS to find any member of a target set of vertices in a graph, where the figure of merit (cost) is either the expected number of samples (unit cost) or the expected number of star centers plus star points (linear cost) until a vertex in the target set is encountered, either as a star center or as a star point. We analyze these two performance measures on three related star sampling paradigms: SS with replacement (SSR), SS without center replacement (SSC), and SS without star replacement (SSS). Exact and approximate expressions are derived for the expected unit and linear costs of SSR, SSC, and SSS on Erdős-Rényi (ER) random graphs. The approximations are seen to be accurate. SSC/SSS are notably better than SSR under unit cost for low-density ER graphs, while SSS is notably better than SSR/SSC under linear cost for low- to moderate-density ER graphs. Simulations on twelve real-world graphs show the cost approximations to be of variable quality: the SSR and SSC approximations are uniformly accurate, while the SSS approximation, derived for an ER graph, is of variable accuracy.