ترغب بنشر مسار تعليمي؟ اضغط هنا

An Inverse QSAR Method Based on Linear Regression and Integer Programming

131   0   0.0 ( 0 )
 نشر من قبل Naveed Ahmed Azam Dr.
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Recently a novel framework has been proposed for designing the molecular structure of chemical compounds using both artificial neural networks (ANNs) and mixed integer linear programming (MILP). In the framework, we first define a feature vector $f(C)$ of a chemical graph $C$ and construct an ANN that maps $x=f(C)$ to a predicted value $eta(x)$ of a chemical property $pi$ to $C$. After this, we formulate an MILP that simulates the computation process of $f(C)$ from $C$ and that of $eta(x)$ from $x$. Given a target value $y^*$ of the chemical property $pi$, we infer a chemical graph $C^dagger$ such that $eta(f(C^dagger))=y^*$ by solving the MILP. In this paper, we use linear regression to construct a prediction function $eta$ instead of ANNs. For this, we derive an MILP formulation that simulates the computation process of a prediction function by linear regression. The results of computational experiments suggest our method can infer chemical graphs with around up to 50 non-hydrogen atoms.



قيم البحث

اقرأ أيضاً

A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new metho d for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block of constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers to which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.
Given restrictions on the availability of data, active learning is the process of training a model with limited labeled data by selecting a core subset of an unlabeled data pool to label. Although selecting the most useful points for training is an o ptimization problem, the scale of deep learning data sets forces most selection strategies to employ efficient heuristics. Instead, we propose a new integer optimization problem for selecting a core set that minimizes the discrete Wasserstein distance from the unlabeled pool. We demonstrate that this problem can be tractably solved with a Generalized Benders Decomposition algorithm. Our strategy requires high-quality latent features which we obtain by unsupervised learning on the unlabeled pool. Numerical results on several data sets show that our optimization approach is competitive with baselines and particularly outperforms them in the low budget regime where less than one percent of the data set is labeled.
In this paper, we develop a simple and fast online algorithm for solving a class of binary integer linear programs (LPs) arisen in general resource allocation problem. The algorithm requires only one single pass through the input data and is free of doing any matrix inversion. It can be viewed as both an approximate algorithm for solving binary integer LPs and a fast algorithm for solving online LP problems. The algorithm is inspired by an equivalent form of the dual problem of the relaxed LP and it essentially performs (one-pass) projected stochastic subgradient descent in the dual space. We analyze the algorithm in two different models, stochastic input and random permutation, with minimal technical assumptions on the input data. The algorithm achieves $Oleft(m sqrt{n}right)$ expected regret under the stochastic input model and $Oleft((m+log n)sqrt{n}right)$ expected regret under the random permutation model, and it achieves $O(m sqrt{n})$ expected constraint violation under both models, where $n$ is the number of decision variables and $m$ is the number of constraints. The algorithm enjoys the same performance guarantee when generalized to a multi-dimensional LP setting which covers a wider range of applications. In addition, we employ the notion of permutational Rademacher complexity and derive regret bounds for two earlier online LP algorithms for comparison. Both algorithms improve the regret bound with a factor of $sqrt{m}$ by paying more computational cost. Furthermore, we demonstrate how to convert the possibly infeasible solution to a feasible one through a randomized procedure. Numerical experiments illustrate the general applicability and effectiveness of the algorithms.
135 - Wenxia Guo , Jin Wang , Majun He 2018
In computational complexity theory, a decision problem is NP-complete when it is both in NP and NP-hard. Although a solution to a NP-complete can be verified quickly, there is no known algorithm to solve it in polynomial time. There exists a method t o reduce a SAT (Satifiability) problem to Subset Sum Problem (SSP) in the literature, however, it can only be applied to small or medium size problems. Our study is to find an efficient method to transform a SAT problem to a mixed integer linear programming problem in larger size. Observing the feature of variable-clauses constraints in SAT, we apply linear inequality model (LIM) to the problem and propose a method called LIMSAT. The new method can work efficiently for very large size problem with thousands of variables and clauses in SAT tested using up-to-date benchmarks.
Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarse-grained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CA D mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architectures flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hit a time-limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with 90th percentile time of 410 seconds. The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا