Generalizable Meta-Heuristic based on Temporal Estimation of Rewards for Large Scale Blackbox Optimization


Abstract in English

The generalization abilities of heuristic optimizers may deteriorate with the increment of the search space dimensionality. To achieve generalized performance across Large Scale Blackbox Optimization (LSBO) tasks, it ispossible to ensemble several heuristics and devise a meta-heuristic to control their initiation. This paper first proposes a methodology of transforming LSBO problems into online decision processes to maximize efficiency of resource utilization. Then, using the perspective of multi-armed bandits with non-stationary reward distributions, we propose a meta-heuristic based on Temporal Estimation of Rewards (TER) to address such decision process. TER uses a window for temporal credit assignment and Boltzmann exploration to balance the exploration-exploitation tradeoff. The prior-free TER generalizes across LSBO tasks with flexibility for different types of limited computational resources (e.g. time, money, etc.) and is easy to be adapted to new tasks for its simplicity and easy interface for heuristic articulation. Tests on the benchmarks validate the problem formulation and suggest significant effectiveness: when TER is articulated with three heuristics, competitive performance is reported across different sets of benchmark problems with search dimensions up to 10000.

Download