No Arabic abstract
Generalization, i.e., the ability of solving problem instances that are not available during the system design and development phase, is a critical goal for intelligent systems. A typical way to achieve good generalization is to learn a model from vast data. In the context of heuristic search, such a paradigm could be implemented as configuring the parameters of a parallel algorithm portfolio (PAP) based on a set of training problem instances, which is often referred to as PAP construction. However, compared to traditional machine learning, PAP construction often suffers from the lack of training instances, and the obtained PAPs may fail to generalize well. This paper proposes a novel competitive co-evolution scheme, named Co-Evolution of Parameterized Search (CEPS), as a remedy to this challenge. By co-evolving a configuration population and an instance population, CEPS is capable of obtaining generalizable PAPs with few training instances. The advantage of CEPS in improving generalization is analytically shown in this paper. Two concrete algorithms, namely CEPS-TSP and CEPS-VRPSPDTW, are presented for the Traveling Salesman Problem (TSP) and the Vehicle Routing Problem with Simultaneous Pickup-Delivery and Time Windows (VRPSPDTW), respectively. Experimental results show that CEPS has led to better generalization, and even managed to find new best-known solutions for some instances.
Generative adversarial networks (GANs) are pairs of artificial neural networks that are trained one against each other. The outputs from a generator are mixed with the real-world inputs to the discriminator and both networks are trained until an equilibrium is reached, where the discriminator cannot distinguish generated inputs from real ones. Since their introduction, GANs have allowed for the generation of impressive imitations of real-life films, images and texts, whose fakeness is barely noticeable to humans. Despite their impressive performance, training GANs remains to this day more of an art than a reliable procedure, in a large part due to training process stability. Generators are susceptible to mode dropping and convergence to random patterns, which have to be mitigated by computationally expensive multiple restarts. Curiously, GANs bear an uncanny similarity to a co-evolution of a pathogen and its hosts immune system in biology. In a biological context, the majority of potential pathogens indeed never make it and are kept at bay by the hots immune system. Yet some are efficient enough to present a risk of a serious condition and recurrent infections. Here, we explore that similarity to propose a more robust algorithm for GANs training. We empirically show the increased stability and a better ability to generate high-quality images while using less computational power.
Estimation of Distribution Algorithms have been proposed as a new paradigm for evolutionary optimization. This paper focuses on the parallelization of Estimation of Distribution Algorithms. More specifically, the paper discusses how to predict performance of parallel Mixed Bayesian Optimization Algorithm (MBOA) that is based on parallel construction of Bayesian networks with decision trees. We determine the time complexity of parallel Mixed Bayesian Optimization Algorithm and compare this complexity with experimental results obtained by solving the spin glass optimization problem. The empirical results fit well the theoretical time complexity, so the scalability and efficiency of parallel Mixed Bayesian Optimization Algorithm for unknown instances of spin glass benchmarks can be predicted. Furthermore, we derive the guidelines that can be used to design effective parallel Estimation of Distribution Algorithms with the speedup proportional to the number of variables in the problem.
A boundary evolution Algorithm (BEA) is proposed by simultaneously taking into account the bottom and the high-level crossover and mutation, ie., the boundary of the hierarchical genetic algorithm. Operators and optimal individuals based on optional annealing are designed. Based on the numerou
Simultaneously utilizing several complementary solvers is a simple yet effective strategy for solving computationally hard problems. However, manually building such solver portfolios typically requires considerable domain knowledge and plenty of human effort. As an alternative, automatic construction of parallel portfolios (ACPP) aims at automatically building effective parallel portfolios based on a given problem instance set and a given rich design space. One promising way to solve the ACPP problem is to explicitly group the instances into different subsets and promote a component solver to handle each of them.This paper investigates solving ACPP from this perspective, and especially studies how to obtain a good instance grouping.The experimental results showed that the parallel portfolios constructed by the proposed method could achieve consistently superior performances to the ones constructed by the state-of-the-art ACPP methods,and could even rival sophisticated hand-designed parallel solvers.
As supercomputers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done for effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design of efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ) to accelerate parallel I/O. Our contribution is fourfold: (1) We propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline (from a variety of representative scientific datasets). (2) We derive a theoretical analysis to support a precise control of compression ratio under an error-bounded compression mode, enabling accurate offline Huffman codewords generation. This also helps us create a fixed-ratio compression mode for consistent throughput. (3) We develop an efficient compression pipeline by adopting cuSZs dual-quantization algorithm to our hardware use case. (4) We evaluate CEAZ on five real-world datasets with both a single FPGA board and 128 nodes from Bridges-2 supercomputer. Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2X of throughput and 9.6X of compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 25.8X and 24.8X, respectively.