Cutting-plane methods have enabled remarkable successes in integer programming over the last few decades. State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm used to find optimal solutions. In this paper we prove the first guarantees for learning high-performing cut-selection policies tailored to the instance distribution at hand using samples. We first bound the sample complexity of learning cutting planes from the canonical family of Chvátal-Gomory cuts. Our bounds handle any number of waves of any number of cuts and are fine-tuned to the magnitudes of the constraint coefficients. Next, we prove sample complexity bounds for more sophisticated cut-selection policies that use a combination of scoring rules to choose from a family of cuts. Finally, beyond the realm of cutting planes for integer programming, we develop a general abstraction of tree search that captures key components such as node selection and variable selection. For this abstraction, we bound the sample complexity of learning a good policy for building the search tree.
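
To make the cut family concrete: for constraints Ax <= b over nonnegative integer variables x, any vector of nonnegative multipliers u yields the valid Chvátal-Gomory cut floor(uA)x <= floor(ub). Below is a minimal sketch of that derivation; the toy system and multipliers are hypothetical, not taken from the paper.

    import numpy as np

    def chvatal_gomory_cut(A, b, u):
        # Derive the Chvatal-Gomory cut floor(u^T A) x <= floor(u^T b)
        # from A x <= b, x >= 0 integral, for multipliers u >= 0.
        assert np.all(u >= 0), "multipliers must be nonnegative"
        return np.floor(u @ A), np.floor(u @ b)

    # Hypothetical system: 2x1 + 3x2 <= 7 and x1 + x2 <= 3.
    A = np.array([[2.0, 3.0], [1.0, 1.0]])
    b = np.array([7.0, 3.0])
    coeffs, rhs = chvatal_gomory_cut(A, b, u=np.array([0.5, 0.0]))
    print(coeffs, "x <=", rhs)   # [1. 1.] x <= 3.0
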
Portfolio-based algorithm selection has seen tremendous practical success over the past two decades. This algorithm configuration procedure works by first selecting a portfolio of diverse algorithm parameter settings, and then, on a given problem instance, using an algorithm selector to choose a parameter setting from the portfolio with strong predicted performance. Oftentimes, both the portfolio and the algorithm selector are chosen using a training set of typical problem instances from the application domain at hand. In this paper, we provide the first provable guarantees for portfolio-based algorithm selection. We analyze how large the training set should be to ensure that the resulting algorithm selector's average performance over the training set is close to its future (expected) performance. This involves analyzing three key reasons why these two quantities may diverge: 1) the learning-theoretic complexity of the algorithm selector, 2) the size of the portfolio, and 3) the learning-theoretic complexity of the algorithm's performance as a function of its parameters. We introduce an end-to-end learning-theoretic analysis of portfolio construction and algorithm selection together. We prove that if the portfolio is large, overfitting is inevitable, even with an extremely simple algorithm selector. With experiments, we illustrate a tradeoff exposed by our theoretical analysis: as we increase the portfolio size, we can hope to include a well-suited parameter setting for every possible problem instance, but it becomes impossible to avoid overfitting.
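
A minimal sketch of the two-stage procedure the abstract describes, under assumptions of my own: performance is a runtime matrix over training instances, the portfolio is built greedily, and the selector is a nearest-neighbor rule over instance features. All names and data here are hypothetical.

    import numpy as np

    def greedy_portfolio(perf, k):
        # Greedily pick k configurations minimizing the total runtime achieved
        # by the per-instance best configuration in the portfolio.
        # perf[i, j] = runtime of configuration j on training instance i.
        chosen, best = [], np.full(perf.shape[0], np.inf)
        for _ in range(k):
            gains = [np.minimum(best, perf[:, j]).sum() if j not in chosen else np.inf
                     for j in range(perf.shape[1])]
            j = int(np.argmin(gains))
            chosen.append(j)
            best = np.minimum(best, perf[:, j])
        return chosen

    def select(train_feats, train_perf, portfolio, feat):
        # Nearest-neighbor selector: use the portfolio configuration that was
        # best on the most similar training instance.
        i = int(np.argmin(np.linalg.norm(train_feats - feat, axis=1)))
        return portfolio[int(np.argmin(train_perf[i, portfolio]))]

    rng = np.random.default_rng(0)
    perf = rng.uniform(1, 10, size=(6, 5))   # 6 instances, 5 configurations
    feats = rng.normal(size=(6, 2))          # 2 features per instance
    P = greedy_portfolio(perf, k=2)
    print("portfolio:", P, "-> pick", select(feats, perf, P, feats[0]))
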
Automating algorithm configuration is growing increasingly necessary as algorithms come with more and more tunable parameters. It is common to tune parameters using machine learning, optimizing performance metrics such as runtime and solution quality. The training set consists of problem instances from the specific domain at hand. We investigate a fundamental question about these techniques: how large should the training set be to ensure that a parameter's average empirical performance over the training set is close to its expected, future performance? We answer this question for algorithm configuration problems that exhibit a widely-applicable structure: the algorithm's performance as a function of its parameters can be approximated by a simple function. We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds. On the flip side, if the approximation holds only under the L-p norm for p smaller than infinity, it is not possible to provide meaningful sample complexity bounds in the worst case. We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science. Via experiments, we obtain sample complexity bounds that are up to 700 times smaller than the previously best-known bounds.
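
The gap the abstract asks about can be watched directly by simulation. A toy sketch (my own model, not the paper's integer-programming experiments): performance is a bounded step function of the parameter, and we measure how the gap between training-set average and expected performance shrinks as the training set grows.

    import numpy as np

    rng = np.random.default_rng(1)

    def performance(param, instances):
        # Hypothetical performance in [0, 1]: a step function of the parameter
        # whose breakpoint depends on the instance.
        return (instances <= param).astype(float)

    param, expected = 0.5, 0.5   # instances ~ Uniform[0, 1], so E = 0.5
    for N in [10, 100, 1000, 10000]:
        gaps = [abs(float(performance(param, rng.uniform(size=N)).mean()) - expected)
                for _ in range(200)]   # 200 independent training sets of size N
        print(N, round(float(np.mean(gaps)), 4))   # decays roughly like 1/sqrt(N)
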
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm's average performance over the training set and its expected performance. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause large changes in behavior. Prior research has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees apply whenever an algorithm's performance is a piecewise-constant, piecewise-linear, or, more generally, piecewise-structured function of its parameters. Our theory also implies novel bounds for voting mechanisms and dynamic programming algorithms from computational biology.
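
The volatile, piecewise-constant structure is easy to exhibit with a standard example from this literature (a sketch, not code from the paper): a greedy knapsack rule that inserts items in decreasing order of value / size^rho. Crossing certain values of rho flips the insertion order, so the value achieved jumps.

    def greedy_knapsack_value(values, sizes, capacity, rho):
        # Insert items greedily by the parameterized score value / size**rho.
        order = sorted(range(len(values)),
                       key=lambda i: values[i] / sizes[i] ** rho, reverse=True)
        total_value = total_size = 0.0
        for i in order:
            if total_size + sizes[i] <= capacity:
                total_size += sizes[i]
                total_value += values[i]
        return total_value

    values, sizes, capacity = [10, 9, 8], [6, 5, 4], 10
    for rho in [0.0, 0.5, 1.0, 2.0]:
        # The achieved value is a piecewise-constant function of rho:
        # it is 18.0 at rho <= 0.5 here and drops to 17.0 by rho = 1.0.
        print(rho, greedy_knapsack_value(values, sizes, capacity, rho))
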
Algorithms typically come with tunable parameters that have a considerable impact on the computational resources they consume. Too often, practitioners must hand-tune the parameters, a tedious and error-prone task. A recent line of research provides algorithms that return nearly-optimal parameters from within a finite set. These algorithms can be used when the parameter space is infinite by providing as input a random sample of parameters. This data-independent discretization, however, might miss pockets of nearly-optimal parameters: prior research has presented scenarios where the only viable parameters lie within an arbitrarily small region. We provide an algorithm that learns a finite set of promising parameters from within an infinite set. Our algorithm can help compile a configuration portfolio, or it can be used to select the input to a configuration algorithm for finite parameter spaces. Our approach applies to any configuration problem that satisfies a simple yet ubiquitous structure: the algorithm's performance is a piecewise-constant function of its parameters. Prior research has exhibited this structure in domains from integer programming to clustering.
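
Under that piecewise-constant structure, a data-driven discretization falls out naturally: on each sampled instance, find the parameter values where performance jumps, then keep one representative parameter per cell of the common refinement. A minimal sketch with hypothetical breakpoints (computing the breakpoints is problem-specific and omitted here):

    def promising_parameters(breakpoint_sets, lo=0.0, hi=1.0):
        # breakpoint_sets[i] = parameter values where sampled instance i's
        # piecewise-constant performance function jumps.  Any two parameters
        # in the same cell of the common refinement perform identically on
        # every sample, so one representative (the midpoint) per cell suffices.
        cuts = sorted({b for s in breakpoint_sets for b in s if lo < b < hi})
        edges = [lo] + cuts + [hi]
        return [(a + b) / 2 for a, b in zip(edges, edges[1:])]

    samples = [{0.2, 0.7}, {0.25}, {0.7, 0.9}]   # hypothetical breakpoints
    print(promising_parameters(samples))          # [0.1, 0.225, 0.475, 0.8, 0.95]
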
It is common to encounter situations where one must solve a sequence of similar computational problems. Running a standard algorithm with worst-case runtime guarantees on each instance will fail to take advantage of valuable structure shared across the problem instances. For example, when a commuter drives from work to home, there are typically only a handful of routes that will ever be the shortest path. A naive algorithm that does not exploit this common structure may spend most of its time checking roads that will never be on the shortest path. More generally, we can often ignore large swaths of the search space that will likely never contain an optimal solution. We present an algorithm that learns to maximally prune the search space on repeated computations, thereby reducing runtime while provably outputting the correct solution each period with high probability. Our algorithm employs a simple explore-exploit technique resembling those used in online algorithms, though our setting is quite different. We prove that, with respect to our model of pruning search spaces, our approach is optimal up to constant factors. Finally, we illustrate the applicability of our model and algorithm to three classic problems: shortest-path routing, string search, and linear programming. We present experiments confirming that our simple algorithm is effective at significantly reducing the runtime of solving repeated computations.
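
For the routing example, here is a toy sketch of the explore-exploit idea (my own simplification, not the paper's algorithm or its guarantee): usually search only the roads that have appeared on past shortest routes; with probability eps, search the full graph so the pruned set stays fresh.

    import heapq, random

    def dijkstra(adj, src, dst):
        # adj: {u: [(v, w), ...]}; returns (cost, path) or (inf, None).
        pq, dist, prev = [(0.0, src)], {src: 0.0}, {}
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                path = [u]
                while path[-1] != src:
                    path.append(prev[path[-1]])
                return d, path[::-1]
            for v, w in adj.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(pq, (d + w, v))
        return float("inf"), None

    edges = [("home", "a"), ("a", "work"), ("home", "b"), ("b", "work"),
             ("home", "c"), ("c", "work")]
    kept, eps, rng = set(), 0.2, random.Random(0)
    for period in range(20):
        # Each period re-draws travel times; the 'c' detour is never competitive.
        w = {e: rng.uniform(8, 9) if "c" in e else rng.uniform(1, 3) for e in edges}
        full = {}
        for (u, v) in edges:
            full.setdefault(u, []).append((v, w[(u, v)]))
        if kept and rng.random() > eps:     # exploit: search only roads seen before
            adj = {u: [(v, wt) for v, wt in nb if (u, v) in kept]
                   for u, nb in full.items()}
        else:                               # explore: search the full graph
            adj = full
        cost, path = dijkstra(adj, "home", "work")
        if path is None:                    # over-pruned; fall back to a full search
            cost, path = dijkstra(full, "home", "work")
        kept.update(zip(path, path[1:]))
    print("roads ever on a shortest route:", sorted(kept))
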
In practice, most mechanisms for selling, buying, matching, voting, and so on are not incentive compatible. We present techniques for estimating how far a mechanism is from incentive compatible. Given samples from the agents' type distribution, we show how to estimate the extent to which an agent can improve his utility by misreporting his type. We do so by first measuring the maximum utility an agent can gain by misreporting his type on average over the samples, assuming his true and reported types are from a finite subset---which our technique constructs---of the type space. The challenge is that by measuring utility gains over a finite subset of the type space, we might miss type pairs $\theta$ and $\hat{\theta}$ where an agent with type $\theta$ can greatly improve his utility by reporting type $\hat{\theta}$. Indeed, our primary technical contribution is proving that the maximum utility gain over this finite subset nearly matches the maximum utility gain overall, despite the volatility of the utility functions we study. We apply our tools to the single-item and combinatorial first-price auctions, generalized second-price auction, discriminatory auction, uniform-price auction, and second-price auction with spiteful bidders.
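
For the single-item first-price auction, the estimator has a particularly simple form. A toy sketch (treating the report as the bid and using a uniform grid rather than the paper's constructed subset): estimate, over sampled opponent bids, the largest average utility gain any true type theta can obtain by reporting some theta-hat.

    import numpy as np

    rng = np.random.default_rng(2)
    opponent_bids = rng.uniform(0, 1, size=1000)   # samples from the type distribution

    def avg_utility(theta, theta_hat):
        # First-price auction, report = bid: win against the sampled opponent
        # when theta_hat exceeds the opponent's bid, then pay theta_hat.
        return float(np.mean(opponent_bids < theta_hat) * (theta - theta_hat))

    grid = np.linspace(0, 1, 101)                  # finite subset of the type space
    gain = max(avg_utility(t, th) - avg_utility(t, t)
               for t in grid for th in grid)
    print("estimated max average gain from misreporting:", round(gain, 3))  # ~0.25
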
Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial and nonconvex problems. For example, they are the foremost method for solving (mixed) integer programs and constraint satisfaction problems. Tree search algorithms recursively partition the search space to find an optimal solution. In order to keep the tree size small, it is crucial to carefully decide, when expanding a tree node, which question (typically variable) to branch on at that node in order to partition the remaining space. Numerous partitioning techniques (e.g., variable selection) have been proposed, but there is no theory describing which technique is optimal. We show how to use machine learning to determine an optimal weighting of any set of partitioning procedures for the instance distribution at hand using samples from the distribution. We provide the first sample complexity guarantees for tree search algorithm configuration. These guarantees bound the number of samples sufficient to ensure that the empirical performance of an algorithm over the samples nearly matches its expected performance on the unknown instance distribution. This thorough theoretical investigation naturally gives rise to our learning algorithm. Via experiments, we show that learning an optimal weighting of partitioning procedures can dramatically reduce tree size, and we prove that this reduction can even be exponential. Through theory and experiments, we show that learning to branch is both practical and hugely beneficial.
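
A toy sketch of what "learning a weighting of partitioning procedures" means operationally (my own miniature, not the paper's setup): branch-and-bound for 0/1 knapsack where variables are branched in the order given by a convex combination of two scoring rules, and the mixing weight mu is grid-searched to minimize average tree size over sampled training instances.

    import numpy as np

    def tree_size(values, sizes, capacity, mu):
        # Count branch-and-bound nodes when variables are branched in the
        # order given by  mu * (value density) + (1 - mu) * (value),
        # with both scoring rules normalized to [0, 1].
        v, s = np.asarray(values, float), np.asarray(sizes, float)
        score = mu * (v / s) / (v / s).max() + (1 - mu) * v / v.max()
        order = np.argsort(-score)
        suffix = np.append(np.cumsum(v[order][::-1])[::-1], 0.0)  # value left at depth i
        best, nodes = 0.0, 0

        def rec(i, cap, val):
            nonlocal best, nodes
            nodes += 1
            best = max(best, val)
            if i == len(order) or val + suffix[i] <= best:
                return                               # leaf, or bound cannot beat incumbent
            j = order[i]
            if s[j] <= cap:
                rec(i + 1, cap - s[j], val + v[j])   # branch: take item j
            rec(i + 1, cap, val)                     # branch: skip item j

        rec(0, float(capacity), 0.0)
        return nodes

    rng = np.random.default_rng(3)
    train = [(rng.uniform(1, 10, 12), rng.uniform(1, 10, 12), 25) for _ in range(5)]
    for mu in np.linspace(0, 1, 6):                  # grid search the mixing weight
        avg = np.mean([tree_size(v, s, c, mu) for v, s, c in train])
        print(round(float(mu), 1), round(float(avg), 1))
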
Data-driven algorithm design, that is, choosing the best algorithm for a specific application, is a crucial problem in modern data science. Practitioners often optimize over a parameterized algorithm family, tuning parameters based on problems from their domain. These procedures have historically come with no guarantees, though a recent line of work studies algorithm selection from a theoretical perspective. We advance the foundations of this field in several directions: we analyze online algorithm selection, where problems arrive one-by-one and the goal is to minimize regret, and private algorithm selection, where the goal is to find good parameters over a set of problems without revealing sensitive information contained therein. We study important algorithm families, including SDP-rounding schemes for problems formulated as integer quadratic programs, and greedy techniques for canonical subset selection problems. In these cases, the algorithm's performance is a volatile and piecewise Lipschitz function of its parameters, since tweaking the parameters can completely change the algorithm's behavior. We give a sufficient and general condition, dispersion, defining a family of piecewise Lipschitz functions that can be optimized online and privately, which includes the functions measuring the performance of the algorithms we study. Intuitively, a set of piecewise Lipschitz functions is dispersed if no small region contains many of the functions' discontinuities. We present general techniques for online and private optimization of the sum of dispersed piecewise Lipschitz functions. We improve over the best-known regret bounds for a variety of problems, prove regret bounds for problems not previously studied, and give matching lower bounds. We also give matching upper and lower bounds on the utility loss due to privacy. Moreover, we uncover dispersion in auction design and pricing problems.
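
A minimal sketch of the online setting (a discretized exponentially weighted forecaster of my own construction, not the paper's algorithm): each round's reward is piecewise 1-Lipschitz in the parameter with a single discontinuity, and dispersion holds because the discontinuity's location is drawn uniformly, so no small region is hit often.

    import numpy as np

    rng = np.random.default_rng(4)
    grid = np.linspace(0.0, 1.0, 1001)      # discretized parameter space [0, 1]
    weights = np.ones_like(grid)
    cum = np.zeros_like(grid)               # cumulative reward of each fixed parameter
    eta, T, earned = 0.5, 500, 0.0
    for _ in range(T):
        b = rng.uniform()                   # this round's discontinuity (dispersed)
        reward = np.where(grid < b, grid, 1.0 - grid)  # piecewise 1-Lipschitz, one jump
        p = weights / weights.sum()
        earned += float(p @ reward)         # expected reward under the forecaster
        cum += reward
        weights *= np.exp(eta * reward)     # exponential-weights update
        weights /= weights.max()            # rescale to avoid overflow
    print("best fixed parameter's reward:", round(float(cum.max()), 1))
    print("forecaster's reward:          ", round(earned, 1))
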
We study the design of multi-item mechanisms that maximize expected profit with respect to a distribution over buyers' values. In practice, a full description of the distribution is typically unavailable. Therefore, we study the setting where the designer only has samples from the distribution and the goal is to find a high-profit mechanism within a class of mechanisms. If the class is complex, a mechanism may have high average profit over the samples but low expected profit. This raises the question: how many samples are sufficient to ensure that a mechanism's average profit is close to its expected profit? To answer this question, we uncover structure shared by many pricing, auction, and lottery mechanisms: for any set of buyers' values, profit is piecewise linear in the mechanism's parameters. Using this structure, we prove new bounds for mechanism classes not yet studied in the sample-based mechanism design literature and match or improve over the best known guarantees for many classes. Finally, we provide tools for optimizing an important tradeoff: more complex mechanisms typically have higher average profit over the samples than simpler mechanisms, but more samples are required to ensure that average profit nearly matches expected profit.
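
The piecewise-linear structure is visible already in the simplest member of such a class, a single-item posted price (a toy sketch, not one of the paper's mechanism classes): profit on a buyer with value v is p if v >= p and 0 otherwise, so average profit over samples is piecewise linear in p with breakpoints at the sampled values and can be maximized by checking them.

    import numpy as np

    rng = np.random.default_rng(5)
    values = rng.uniform(0, 1, size=200)        # sampled buyers' values

    def avg_profit(p):
        # Posted price p: the buyer purchases (and pays p) whenever value >= p.
        return float(np.mean(np.where(values >= p, p, 0.0)))

    best_p = max(values, key=avg_profit)        # an optimum lies at a sample value
    print("empirically optimal price:", round(float(best_p), 3),
          "average profit:", round(avg_profit(best_p), 3))
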