We present Luce, the first lifelong predictive model for automated property valuation. Luce addresses two critical issues in property valuation: the lack of recent sold prices and the sparsity of house data. It is designed to operate on a limited volume of recent house transaction data. As a departure from prior work, Luce organizes house data in a heterogeneous information network (HIN) whose nodes represent house entities and the attributes that are important for house price valuation. We employ a Graph Convolutional Network (GCN) to extract spatial information, such as geographical location, from the HIN, and then use a Long Short-Term Memory (LSTM) network to model the temporal dependencies in house transaction data over time. Unlike prior work, Luce makes effective use of the limited house transaction data from the past few months to update the valuation information for all house entities in the HIN. By providing a complete and up-to-date house valuation dataset, Luce greatly simplifies the downstream valuation task for the target properties. We demonstrate the benefits of Luce by applying it to large, real-life datasets obtained from the Toronto real estate market. Extensive experimental results show that Luce not only significantly outperforms prior property valuation methods but also often reaches, and sometimes exceeds, the valuation accuracy of independent experts when using the actual realization price as the ground truth.
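The abstract names the building blocks without wiring them together; as a rough illustration of how a GCN spatial encoder can feed an LSTM temporal model, here is a minimal PyTorch sketch. The graph construction, feature set, and layer sizes are hypothetical stand-ins, not Luce's actual architecture.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return torch.relu(a_hat @ self.lin(h))

class SpatialTemporalValuer(nn.Module):
    """GCN per time step for spatial structure, then an LSTM across steps."""
    def __init__(self, feat_dim, hid_dim):
        super().__init__()
        self.gcn = GCNLayer(feat_dim, hid_dim)
        self.lstm = nn.LSTM(hid_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)  # predicted price per node

    def forward(self, a_hat, x_seq):
        # x_seq: (T, N, F) node features over T months; a_hat: (N, N)
        spatial = torch.stack([self.gcn(a_hat, x) for x in x_seq])  # (T, N, H)
        out, _ = self.lstm(spatial.transpose(0, 1))                 # (N, T, H)
        return self.head(out[:, -1])                                # (N, 1)

# Toy usage with a random row-normalized adjacency and feature sequence.
N, T, F = 8, 6, 5
a = torch.rand(N, N)
a_hat = a / a.sum(dim=1, keepdim=True)
model = SpatialTemporalValuer(F, 16)
prices = model(a_hat, torch.randn(T, N, F))
```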
Tidal range structures have been considered for large-scale electricity generation because of their potential to produce reasonably predictable energy without emitting greenhouse gases. Since the main forcing components driving the tides have deterministic dynamics, the energy available to a given tidal power plant has been estimated, through analytical and numerical optimisation routines, as a largely predictable quantity. This constraint forces state-of-the-art flexible operation methods to rely on tidal predictions (concurrent with measured data and extending up to a multiple of half-tidal cycles into the future) to infer the best operational strategies for tidal lagoons, with the additional cost of having to run the optimisation routines anew for every tide. In this paper, we propose a novel optimised operation of tidal lagoons based on proximal policy optimisation (PPO) through Unity ML-Agents. We compare this technique with six operation optimisation approaches (baselines) drawn from the literature, using the Swansea Bay Tidal Lagoon as a case study. We show that our approach successfully maximises energy generation through an optimised operational policy for turbines and sluices, yielding results competitive with state-of-the-art optimisation methods regardless of the test data used, while requiring training only once and performing real-time flexible control with measured ocean data alone.
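The paper trains its agent with PPO inside Unity ML-Agents; without reproducing that toolchain, the PyTorch sketch below shows the clipped surrogate objective at the heart of PPO. The tensor names are illustrative only.

```python
import torch

def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective from Schulman et al. (2017).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and behaviour policies; advantages: estimated advantages.
    """
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # We maximise the surrogate, so we minimise its negation.
    return -torch.min(unclipped, clipped).mean()

# Toy usage: a batch of 4 transitions.
loss = ppo_clip_loss(torch.randn(4), torch.randn(4), torch.randn(4))
```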
The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning (rather than feature engineering), deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for, and achieve state-of-the-art performance on, protein-ligand binding affinity prediction. We further validate these deep neural networks by setting new standards of performance on several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor $EF_\chi^{(R)}$, to measure the early enrichment of computational models on chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, a concern that crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.
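The abstract does not spell out $EF_\chi^{(R)}$; a common reading of a regression analogue of the classification enrichment factor is the mean true label among the top $\chi$ fraction ranked by prediction, normalized by the global mean. The NumPy sketch below implements that assumed definition, which may differ in detail from the paper's.

```python
import numpy as np

def regression_enrichment_factor(y_true, y_pred, chi=0.05):
    """Assumed EF_chi^(R): how much the mean true activity of the top-chi
    fraction (ranked by predicted score) exceeds the overall mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_top = max(1, int(round(chi * len(y_true))))
    top = np.argsort(y_pred)[::-1][:n_top]   # indices of highest predictions
    return y_true[top].mean() / y_true.mean()

# Toy usage: a perfect ranking gives the largest enrichment.
y = np.array([9.0, 1.0, 1.0, 1.0])
print(regression_enrichment_factor(y, y, chi=0.25))  # 9 / 3 = 3.0
```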
Molecular property prediction plays a fundamental role in drug discovery, identifying candidate molecules with target properties. However, molecular property prediction is essentially a few-shot problem, which makes it hard to train conventional models that require abundant labels. In this paper, we propose property-aware adaptive relation networks (PAR) for the few-shot molecular property prediction problem. In contrast to existing works, we leverage the fact that both the relevant substructures and the relationships among molecules differ across molecular properties. PAR is compatible with existing graph-based molecular encoders and is further equipped with the ability to obtain property-aware molecular embeddings and to model the molecular relation graph adaptively. The resulting relation graph also facilitates effective label propagation within each task. Extensive experiments on benchmark molecular property prediction datasets show that our method consistently outperforms state-of-the-art methods and is able to obtain property-aware molecular embeddings and model the molecular relation graph properly.
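To make the "relation graph facilitates label propagation" step concrete, here is a minimal PyTorch sketch of label propagation over a similarity graph built from molecular embeddings. The graph construction (temperature-scaled cosine similarity) is a generic stand-in, not PAR's learned adaptive module.

```python
import torch
import torch.nn.functional as F

def propagate_labels(emb, y_support, alpha=0.5, n_iter=10):
    """emb: (N, D) embeddings, support molecules first; y_support: (S, C)
    one-hot labels. Returns soft labels for all N molecules."""
    n, s = emb.size(0), y_support.size(0)
    normed = F.normalize(emb, dim=1)
    adj = torch.softmax(normed @ normed.T / 0.1, dim=1)  # relation graph
    y = torch.zeros(n, y_support.size(1))
    y[:s] = y_support                # queries start with no label information
    f = y.clone()
    for _ in range(n_iter):          # F <- alpha * A @ F + (1 - alpha) * Y
        f = alpha * adj @ f + (1 - alpha) * y
    return f

# Toy usage: 2 labeled support molecules, 3 unlabeled queries.
soft_labels = propagate_labels(torch.randn(5, 8), torch.eye(2))
```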
Housing markets are inherently spatial, yet many existing models fail to capture this spatial dimension. Here we introduce a new graph-based approach for incorporating a spatial component into a large-scale urban housing agent-based model (ABM). The model explicitly captures several social and economic factors that influence the agents' decision-making behaviour (such as fear of missing out, their trend-following aptitude, and the strength of their submarket outreach), and interprets these factors in spatial terms. The proposed model is calibrated and validated with housing market data for the Greater Sydney region. The ABM simulation results not only include predictions for the overall market, but also produce area-specific forecasts at the level of local government areas within Sydney, arising from individual buy and sell decisions. In addition, the simulation results elucidate agent preferences in submarkets, highlighting differences in agent behaviour, for example, between first-time home buyers and investors, and between local and overseas investors.
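As a toy illustration of how factors such as fear of missing out and trend following can enter an agent's bidding decision in spatial (per-area) terms, consider the sketch below; the functional form, parameters, and area figures are invented for exposition and are not the calibrated Sydney model.

```python
import random

class Buyer:
    """Hypothetical buyer agent: bids above the local-area price when the
    recent trend is up, scaled by trend-following aptitude and FOMO."""
    def __init__(self, budget, trend_weight, fomo):
        self.budget = budget
        self.trend_weight = trend_weight
        self.fomo = fomo

    def bid(self, area_price, area_trend):
        # FOMO only amplifies offers when prices are rising.
        markup = self.trend_weight * area_trend + self.fomo * max(area_trend, 0)
        return min(area_price * (1.0 + markup), self.budget)

# One simulated round over two local government areas with random buyers.
areas = {"Inner West": (1.2e6, 0.03), "Blacktown": (0.9e6, -0.01)}
buyers = [Buyer(random.uniform(0.8e6, 1.5e6), 0.5, 0.2) for _ in range(3)]
for name, (price, trend) in areas.items():
    offers = [b.bid(price, trend) for b in buyers]
    print(name, max(offers))
```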
Deep learning approaches for real, large, and complex scientific data sets can be very challenging to design. In this work, we present a complete search for a finely tuned and efficiently scaled deep learning classifier that identifies usable energy in seismic data acquired using Distributed Acoustic Sensing (DAS). While using only a subset of labeled images during training, we were able to identify suitable models that generalize accurately to unknown signal patterns. We show that by using 16 times more GPUs, we can increase the training speed by more than two orders of magnitude on a 50,000-image data set.
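The multi-GPU scaling result rests on data-parallel training; a generic PyTorch DistributedDataParallel skeleton looks like the sketch below. The model and dataset are placeholders, not the authors' DAS classifier or setup.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # Launched with `torchrun --nproc_per_node=<gpus> train.py`; torchrun
    # sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder stand-ins for the DAS image classifier and data set.
    model = DDP(torch.nn.Linear(128, 2).cuda(), device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(data)      # shards the data across ranks
    loader = DataLoader(data, batch_size=32, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)            # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda()), y.cuda())
            loss.backward()                 # gradients all-reduced by DDP
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```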