Do you want to publish a course? Click here

Picking Winners: A Data Driven Approach to Evaluating the Quality of Startup Companies

72   0   0.0 ( 0 )
 Added by David Hunter
 Publication date 2017
and research's language is English




Ask ChatGPT about the research

We consider the problem of evaluating the quality of startup companies. This can be quite challenging due to the rarity of successful startup companies and the complexity of factors which impact such success. In this work we collect data on tens of thousands of startup companies, their performance, the backgrounds of their founders, and their investors. We develop a novel model for the success of a startup company based on the first passage time of a Brownian motion. The drift and diffusion of the Brownian motion associated with a startup company are a function of features based its sector, founders, and initial investors. All features are calculated using our massive dataset. Using a Bayesian approach, we are able to obtain quantitative insights about the features of successful startup companies from our model. To test the performance of our model, we use it to build a portfolio of companies where the goal is to maximize the probability of having at least one company achieve an exit (IPO or acquisition), which we refer to as winning. This $textit{picking winners}$ framework is very general and can be used to model many problems with low probability, high reward outcomes, such as pharmaceutical companies choosing drugs to develop or studios selecting movies to produce. We frame the construction of a picking winners portfolio as a combinatorial optimization problem and show that a greedy solution has strong performance guarantees. We apply the picking winners framework to the problem of choosing a portfolio of startup companies. Using our model for the exit probabilities, we are able to construct out of sample portfolios which achieve exit rates as high as 60%, which is nearly double that of top venture capital firms.



rate research

Read More

Air pollution is a major risk factor for global health, with both ambient and household air pollution contributing substantial components of the overall global disease burden. One of the key drivers of adverse health effects is fine particulate matter ambient pollution (PM$_{2.5}$) to which an estimated 3 million deaths can be attributed annually. The primary source of information for estimating exposures has been measurements from ground monitoring networks but, although coverage is increasing, there remain regions in which monitoring is limited. Ground monitoring data therefore needs to be supplemented with information from other sources, such as satellite retrievals of aerosol optical depth and chemical transport models. A hierarchical modelling approach for integrating data from multiple sources is proposed allowing spatially-varying relationships between ground measurements and other factors that estimate air quality. Set within a Bayesian framework, the resulting Data Integration Model for Air Quality (DIMAQ) is used to estimate exposures, together with associated measures of uncertainty, on a high resolution grid covering the entire world. Bayesian analysis on this scale can be computationally challenging and here approximate Bayesian inference is performed using Integrated Nested Laplace Approximations. Model selection and assessment is performed by cross-validation with the final model offering substantial increases in predictive accuracy, particularly in regions where there is sparse ground monitoring, when compared to current approaches: root mean square error (RMSE) reduced from 17.1 to 10.7, and population weighted RMSE from 23.1 to 12.1 $mu$gm$^{-3}$. Based on summaries of the posterior distributions for each grid cell, it is estimated that 92% of the worlds population reside in areas exceeding the World Health Organizations Air Quality Guidelines.
The Argo data is a modern oceanography dataset that provides unprecedented global coverage of temperature and salinity measurements in the upper 2,000 meters of depth of the ocean. We study the Argo data from the perspective of functional data analysis (FDA). We develop spatio-temporal functional kriging methodology for mean and covariance estimation to predict temperature and salinity at a fixed location as a smooth function of depth. By combining tools from FDA and spatial statistics, including smoothing splines, local regression, and multivariate spatial modeling and prediction, our approach provides advantages over current methodology that consider pointwise estimation at fixed depths. Our approach naturally leverages the irregularly-sampled data in space, time, and depth to fit a space-time functional model for temperature and salinity. The developed framework provides new tools to address fundamental scientific problems involving the entire upper water column of the oceans such as the estimation of ocean heat content, stratification, and thermohaline oscillation. For example, we show that our functional approach yields more accurate ocean heat content estimates than ones based on discrete integral approximations in pressure. Further, using the derivative function estimates, we obtain a new product of a global map of the mixed layer depth, a key component in the study of heat absorption and nutrient circulation in the oceans. The derivative estimates also reveal evidence for density
Recent data-driven approaches have shown great potential in early prediction of battery cycle life by utilizing features from the discharge voltage curve. However, these studies caution that data-driven approaches must be combined with specific design of experiments in order to limit the range of aging conditions, since the expected life of Li-ion batteries is a complex function of various aging factors. In this work, we investigate the performance of the data-driven approach for battery lifetime prognostics with Li-ion batteries cycled under a variety of aging conditions, in order to determine when the data-driven approach can successfully be applied. Results show a correlation between the variance of the discharge capacity difference and the end-of-life for cells aged under a wide range of charge/discharge C-rates and operating temperatures. This holds despite the different conditions being used not only to cycle the batteries but also to obtain the features: the features are calculated directly from cycling data without separate slow characterization cycles at a controlled temperature. However, the correlation weakens considerably when the voltage data window for feature extraction is reduced, or when features from the charge voltage curve instead of discharge are used. As deep constant-current discharges rarely happen in practice, this imposes new challenges for applying this method in a real-world system.
The analysis of the intraday dynamics of correlations among high-frequency returns is challenging due to the presence of asynchronous trading and market microstructure noise. Both effects may lead to significant data reduction and may severely underestimate correlations if traditional methods for low-frequency data are employed. We propose to model intraday log-prices through a multivariate local-level model with score-driven covariance matrices and to treat asynchronicity as a missing value problem. The main advantages of this approach are: (i) all available data are used when filtering correlations, (ii) market microstructure noise is taken into account, (iii) estimation is performed through standard maximum likelihood methods. Our empirical analysis, performed on 1-second NYSE data, shows that opening hours are dominated by idiosyncratic risk and that a market factor progressively emerges in the second part of the day. The method can be used as a nowcasting tool for high-frequency data, allowing to study the real-time response of covariances to macro-news announcements and to build intraday portfolios with very short optimization horizons.
We consider the problem of selecting a portfolio of entries of fixed cardinality for contests with top-heavy payoff structures, i.e. most of the winnings go to the top-ranked entries. This framework is general and can be used to model a variety of problems, such as movie studios selecting movies to produce, venture capital firms picking start-up companies to invest in, or individuals selecting lineups for daily fantasy sports contests, which is the example we focus on here. We model the portfolio selection task as a combinatorial optimization problem with a submodular objective function, which is given by the probability of at least one entry winning. We then show that this probability can be approximated using only pairwise marginal probabilities of the entries winning when there is a certain structure on their joint distribution. We consider a model where the entries are jointly Gaussian random variables and present a closed form approximation to the objective function. Building on this, we then consider a scenario where the entries are given by sums of constrained resources and present an integer programming formulation to construct the entries. Our formulation uses principles based on our theoretical analysis to construct entries: we maximize the expected score of an entry subject to a lower bound on its variance and an upper bound on its correlation with previously constructed entries. To demonstrate the effectiveness of our integer programming approach, we apply it to daily fantasy sports contests that have top-heavy payoff structures. We find that our approach performs well in practice. Using our integer programming approach, we are able to rank in the top-ten multiple times in hockey and baseball contests with thousands of competing entries. Our approach can easily be extended to other problems with constrained resources and a top-heavy payoff structure.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا