The Glosten-Milgrom model describes a single-asset market in which informed traders interact with a market maker, in the presence of noise traders. We derive an analogy between this financial model and a Szilard information engine by (i) showing that the optimal work extraction protocol in the latter coincides with the pricing strategy of the market maker in the former, and (ii) defining a market analogue of the physical temperature from the analysis of the distribution of market orders. We then show that the expected gain of informed traders is bounded above by the product of this market temperature and the amount of information that informed traders hold, in exact analogy with the corresponding formula for the maximal expected amount of work that can be extracted from a cycle of the information engine. This suggests that recent ideas from information thermodynamics may shed light on financial markets, and lead to generalised inequalities in the spirit of the extended second law of thermodynamics.
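In compact form, the bound reads as follows (our transcription, with our notation: $T_{\rm m}$ is the market temperature and $I$ the amount of information held by informed traders):

$$ \langle G \rangle \;\le\; T_{\rm m}\, I, \qquad \text{in analogy with} \qquad \langle W \rangle \;\le\; k_{\rm B} T \, I_{\rm meas}, $$

where $\langle G\rangle$ is the expected gain of informed traders and $\langle W\rangle$ the expected work extracted per cycle of the Szilard engine after a measurement carrying information $I_{\rm meas}$.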
Finding the best model that describes a high-dimensional dataset is a daunting task. For binary data, we show that this becomes feasible if the search is restricted to simple models. These models -- which we call Minimally Complex Models (MCMs) -- are simple because they are composed of independent components of minimal complexity, in terms of description length. Simple models are easy to infer and to sample from. In addition, model selection within the class of MCMs is invariant with respect to changes in the representation of the data. MCMs portray the structure of dependencies among variables in a simple way, and they provide robust predictions on dependencies and symmetries, as illustrated in several examples. MCMs may contain interactions between variables of any order; so, for example, our approach reveals whether a dataset is appropriately described by a pairwise interaction model.
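As a rough illustration of what complexity "in terms of description length" means, one can recall the standard minimum description length (MDL) decomposition for a model $M$ with $k$ parameters fitted to $N$ binary samples (our notation; the paper's precise complexity measure may differ in its details):

$$ \mathcal{L}(M) \;=\; -\log P(\hat{s}^{(N)} \mid \hat{\theta}, M) \;+\; \frac{k}{2}\log\frac{N}{2\pi} \;+\; \log \int d\theta \,\sqrt{\det J(\theta)}, $$

where $\hat{\theta}$ is the maximum-likelihood estimate and $J(\theta)$ the Fisher information. Roughly speaking, MCMs are assembled from independent components for which the last two (complexity) terms are minimal.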
We discuss work extraction from classical information engines (e.g., Szilard) with $N$ particles, $q$ partitions, and arbitrary non-equilibrium initial states. In particular, we focus on their optimal behaviour, which includes the measurement of a set of quantities $\Phi$ with a feedback protocol that extracts the maximal average amount of work. We show that the optimal non-equilibrium state to which the engine should be driven before the measurement is given by the normalised maximum-likelihood probability distribution of a statistical model that admits $\Phi$ as sufficient statistics. Furthermore, we show that the minimax universal code redundancy $\mathcal{R}^*$ associated with this model provides an upper bound to the work that the demon can extract on average from the cycle, in units of $k_{\rm B}T$. We also find that, in the limit of large $N$, the maximum average extracted work cannot exceed $H[\Phi]/2$, i.e. one half of the Shannon entropy of the measurement. Our results establish a connection between optimal work extraction in stochastic thermodynamics and optimal universal data compression, providing design principles for optimal information engines. In particular, they suggest that (i) optimal coding is thermodynamically efficient, and (ii) it is essential to drive the system into a critical state in order to achieve optimal performance.
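In symbols, the two bounds stated above read (our transcription of the abstract's claims):

$$ \frac{\langle W \rangle_{\max}}{k_{\rm B}T} \;\le\; \mathcal{R}^*, \qquad\qquad \limsup_{N\to\infty} \frac{\langle W \rangle_{\max}}{k_{\rm B}T} \;\le\; \frac{H[\Phi]}{2}, $$

where $\langle W \rangle_{\max}$ is the maximal average work per cycle under the optimal measurement-and-feedback protocol.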
The impact of mitigation or control measures on an epidemic can be estimated by fitting the parameters of a compartmental model to empirical data, and then running the model forward with modified parameters that account for a specific measure. This approach has several drawbacks, stemming from biases in or lack of availability of data, and from the instability of parameter estimates. Here we take the opposite approach -- which we call reverse epidemiology. Given the data, we reconstruct backward in time an ensemble of networks of contacts, and we assess the impact of measures on that specific realization of the contagion process. This approach is robust because it only depends on parameters that describe the evolution of the disease within one individual (e.g. latency time), and not on parameters that describe the spread of the epidemic in a population. Using this method, we assess the impact of preventive quarantine on the ongoing outbreak of Covid-19 in Italy. This gives an estimate of how many infections could have been avoided had preventive quarantine been enforced at a given time.
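A minimal sketch of the backward-reconstruction idea, under strong simplifying assumptions (the data format, the uniform infector kernel, and the gamma latency parameters are all hypothetical; the paper's procedure is richer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "reverse epidemiology" sketch: given observed symptom-onset times,
# sample infection times backward using only within-individual disease
# parameters (here, a gamma-distributed latency), then attach each case
# to a plausible infector among earlier infections.
onset_times = np.sort(rng.uniform(0, 60, size=200))   # observed onsets (days)

def sample_transmission_tree(onset_times, latency_shape=2.0, latency_scale=2.5):
    latency = rng.gamma(latency_shape, latency_scale, size=onset_times.size)
    infection_times = onset_times - latency            # the backward step
    order = np.argsort(infection_times)
    t_inf = infection_times[order]
    infector = np.full(t_inf.size, -1)                 # -1 = seed / imported
    for i in range(1, t_inf.size):
        candidates = np.nonzero(t_inf < t_inf[i])[0]   # anyone infected earlier
        if candidates.size:
            infector[i] = rng.choice(candidates)       # uniform toy kernel
    return t_inf, infector

# Ensemble of reconstructed contagion trees; a measure (e.g. preventive
# quarantine from day t*) can then be replayed on each reconstruction.
trees = [sample_transmission_tree(onset_times) for _ in range(100)]
```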
A `peer-review system', in the context of judging research contributions, is one of the prime steps undertaken to ensure the quality of the submissions received; a significant portion of the publishing budget is spent by publication houses towards the successful completion of peer review. Nevertheless, the scientific community is largely reaching a consensus that the peer-review system, although indispensable, is nonetheless flawed. A very pertinent question, therefore, is: could this system be improved? In this paper, we attempt to present an answer to this question by considering a massive dataset of around $29k$ papers with roughly $70k$ distinct review reports, together consisting of $12m$ lines of review text, from the Journal of High Energy Physics (JHEP) between 1997 and 2015. Specifically, we introduce a novel reviewer-reviewer interaction network (an edge exists between two reviewers if they were assigned by the same editor) and show that, surprisingly, simple structural properties of this network such as degree, clustering coefficient, and centrality (closeness, betweenness, etc.) serve as strong predictors of the long-term citations (i.e., the overall scientific impact) of a submitted paper. These features alone, when plugged into a regression model, achieve a high $R^2$ of 0.79 and a low RMSE of 0.496 in predicting long-term citations. In addition, we design a set of supporting features built from the basic characteristics of the submitted papers, the authors and the referees (e.g., the popularity of the submitting author, the acceptance-rate history of a referee, the linguistic properties of the text of the review reports, etc.), which further improves the results to an $R^2$ of 0.81 and an RMSE of 0.46.
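A hedged sketch of the network construction and the regression on its structural features (not the authors' code; the assignment and citation data formats shown here are assumptions for illustration):

```python
import numpy as np
import networkx as nx
from itertools import combinations
from sklearn.linear_model import LinearRegression

# assignments: editor -> reviewers they assigned (hypothetical toy data).
assignments = {"ed1": ["r1", "r2", "r3"], "ed2": ["r2", "r4"]}

G = nx.Graph()
for reviewers in assignments.values():
    G.add_edges_from(combinations(reviewers, 2))   # same-editor co-assignment

betweenness = nx.betweenness_centrality(G)
features = {
    r: [G.degree(r),
        nx.clustering(G, r),
        nx.closeness_centrality(G, r),
        betweenness[r]]
    for r in G.nodes
}

# For each paper, aggregate its reviewers' features and regress on citations.
papers = {"p1": ["r1", "r2"], "p2": ["r2", "r4"]}   # paper -> its reviewers
citations = {"p1": 35.0, "p2": 4.0}                 # long-term citations

X = np.array([np.mean([features[r] for r in revs], axis=0)
              for revs in papers.values()])
y = np.array([citations[p] for p in papers])
model = LinearRegression().fit(X, y)                # toy-scale fit
```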
Competition to bind microRNAs induces an effective positive crosstalk between their targets, which are therefore known as `competing endogenous RNAs' or ceRNAs. While such an effect is known to play a significant role in specific conditions, estimating its strength from data and, experimentally, in physiological conditions appears to be far from simple. Here we show that the susceptibility of ceRNAs to different types of perturbations affecting their competitors (and hence their tendency to crosstalk) can be encoded in quantities as intuitive and as simple to measure as correlation functions. We confirm this scenario by extensive numerical simulations and validate it by re-analysing PTEN's crosstalk pattern from the TCGA breast cancer dataset. These results clarify the links between different quantities used to estimate the intensity of ceRNA crosstalk and provide new keys to analyse transcriptional datasets and effectively probe ceRNA networks in silico.
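Schematically, the claim is a fluctuation-response-type relation (the notation is ours, and normalisations in the paper may differ):

$$ \chi_{ij} \;=\; \frac{\partial \langle m_i \rangle}{\partial b_j} \;\propto\; \langle m_i m_j \rangle - \langle m_i \rangle \langle m_j \rangle, $$

where $m_i$ is the level of ceRNA $i$ and $b_j$ a transcriptional perturbation acting on competitor $j$: the susceptibility on the left is hard to probe directly, while the connected correlation on the right can be read off expression data.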
In a recent paper, using data from Forbes Global 2000, we observed that the upper tail of the firm-size distribution (by assets) falls off much faster than a Pareto distribution. The missing mass was suggested as an indicator of the size of the Shadow Banking (SB) sector. This short note provides the latest figures for the missing assets in 2013, 2014 and 2015. In 2013 and 2014 the dynamics of the missing assets continued to be strongly correlated with the Financial Stability Board's estimates of the size of the SB sector. In 2015 we find a sharp decrease in the size of missing assets, suggesting that the SB sector is deflating.
The peer-review system has long been relied upon for bringing quality research to the notice of the scientific community and for preventing flawed research from entering the literature. The need for the peer-review system has often been debated, as in numerous cases it has failed in its task, and in most of these cases the editors and the reviewers were thought to be responsible for not being able to correctly judge the quality of the work. This raises a question: can the peer-review system be improved? Since editors and reviewers are the most important pillars of a reviewing system, in this work we attempt to address a related question: given the editing/reviewing history of the editors or reviewers, can we identify the under-performing ones? Citations received by the edited/reviewed papers are used as a proxy for quantifying performance. We term such reviewers and editors anomalous, and we believe that identifying and removing them shall improve the performance of the peer-review system. Using a massive dataset from the Journal of High Energy Physics (JHEP), consisting of 29k papers submitted between 1997 and 2015, with 95 editors and 4035 reviewers and their review history, we identify several factors which point to anomalous behavior of referees and editors. In fact, the anomalous editors and reviewers account for 26.8% and 14.5% of the total editors and reviewers, respectively, and for most of these anomalous reviewers the performance degrades alarmingly over time.
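As a hedged illustration of one way such a citation-based anomaly criterion could look (the paper's actual factors are richer; the data format and the threshold here are hypothetical):

```python
import numpy as np

# history: reviewer -> list of (year, citations_of_reviewed_paper, accepted)
history = {
    "r1": [(1998, 120, True), (2001, 80, True), (2005, 95, False)],
    "r2": [(1999, 2, True), (2003, 1, True), (2007, 0, True)],
}

all_cites = np.array([c for recs in history.values() for _, c, _ in recs])
median_cites = np.median(all_cites)

def is_anomalous(records, threshold=0.75):
    # Flag a reviewer if most of the papers they accepted end up poorly
    # cited relative to the journal-wide median.
    accepted = [c for _, c, ok in records if ok]
    if not accepted:
        return False
    low = sum(c < median_cites for c in accepted)
    return low / len(accepted) >= threshold

flags = {r: is_anomalous(recs) for r, recs in history.items()}
print(flags)   # {'r1': False, 'r2': True}
```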
System-level properties of metabolic networks may be the direct product of natural selection or arise as a by-product of selection on other properties. Here we study the effect of direct selective pressure for growth or viability in particular environments on two properties of metabolic networks: latent versatility to function in additional environments and carbon usage efficiency. Using a Markov Chain Monte Carlo (MCMC) sampling based on Flux Balance Analysis (FBA), we sample random viable metabolic networks from a known biochemical universe, varying the number of directly constrained environments. We find that the latent versatility of sampled metabolic networks increases with the number of directly constrained environments and with the size of the networks. We then show that the average carbon wastage of sampled metabolic networks across the constrained environments decreases with the number of directly constrained environments and with the size of the networks. Our work expands the growing body of evidence about nonadaptive origins of key functional properties of biological networks.
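A minimal sketch of the MCMC sampling loop, with the FBA viability test replaced by a stub (everything here is an assumption for illustration; the real pipeline evaluates biomass flux in each constrained environment):

```python
import random

UNIVERSE = [f"rxn_{i}" for i in range(2000)]   # known biochemical universe

def is_viable(network, environments):
    # Placeholder: in the real pipeline this would run FBA and require
    # nonzero biomass flux in every directly constrained environment.
    return len(network) > 0

def mcmc_sample(environments, size=800, steps=10_000):
    network = set(random.sample(UNIVERSE, size))
    for _ in range(steps):
        out = random.choice(tuple(network))
        inn = random.choice(UNIVERSE)
        if inn in network:
            continue
        proposal = (network - {out}) | {inn}     # swap keeps network size fixed
        if is_viable(proposal, environments):    # accept only viable moves
            network = proposal
    return network

sample = mcmc_sample(environments=["glucose", "acetate"])
```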
Using public data (Forbes Global 2000) we show that the asset sizes of the largest global firms follow a Pareto distribution in an intermediate range that is ``interrupted'' by a sharp cut-off in its upper tail, where it is totally dominated by financial firms. This flattening of the distribution contrasts with a large body of empirical literature which finds a Pareto distribution for firm sizes both across countries and over time. Pareto distributions are generally traced back to a mechanism of proportional random growth, based on a regime of constant returns to scale. This makes our finding of an ``interrupted'' Pareto distribution all the more puzzling, because we provide evidence that financial firms in our sample should operate in such a regime. We claim that the missing mass from the upper tail of the asset-size distribution is a consequence of shadow banking activity and that it provides an (upper) estimate of the size of the shadow banking system. This estimate -- which we propose as a shadow banking index -- compares well with estimates of the Financial Stability Board until 2009, but it shows a sharper rise in shadow banking activity after 2010. Finally, we propose a proportional random growth model that reproduces the observed distribution, thereby providing a quantitative estimate of the intensity of shadow banking activity.
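To see why proportional random growth produces a Pareto tail, here is a hedged toy simulation of a Gibrat-type process with a reflecting lower barrier (parameters are arbitrary; the paper's calibrated model differs in its details):

```python
import numpy as np

rng = np.random.default_rng(1)

# Proportional random growth: each firm's assets get i.i.d. multiplicative
# shocks; a reflecting lower barrier s_min yields a Pareto upper tail.
n_firms, n_steps, s_min = 5000, 4000, 1.0
assets = np.ones(n_firms)

for _ in range(n_steps):
    growth = rng.lognormal(mean=-0.0005, sigma=0.03, size=n_firms)
    assets = np.maximum(assets * growth, s_min)     # shock, then reflect

# Empirical tail: P(S > s) ~ s^(-alpha) over an intermediate range, so the
# log-rank vs log-size plot is a straight line with slope -alpha.
s_sorted = np.sort(assets)[::-1]
ranks = np.arange(1, n_firms + 1)
alpha_fit = -np.polyfit(np.log(s_sorted[:500]), np.log(ranks[:500]), 1)[0]
print(f"estimated tail exponent alpha ~ {alpha_fit:.2f}")
```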