I examine the topic of training scientific generalists. To focus the discussion, I propose the creation of a new graduate program, analogous in structure to existing MD/PhD programs, aimed at training a critical mass of scientific researchers with substantial intellectual breadth. In addition to completing the normal requirements for a PhD, students would undergo an intense, several-year training period designed to expose them to the core vocabulary of multiple subjects at the graduate level. After providing some historical and philosophical context for this proposal, I outline how existing research universities could implement such a program with little institutional overhead. Finally, I discuss alternative possibilities for training generalists by taking advantage of contemporary developments in online learning and open science.
The nature of the scientific method is controversial, with some claiming that a single scientific method does not even exist. However, the scientific method does exist: it is the building of logical, self-consistent models to describe nature. The models are constrained by past observations and judged by their ability to correctly predict new observations and interesting phenomena. The observations exist independently of the models but acquire meaning from their context within a model. Observations must be made carefully and be reproducible to minimize errors. Model assumptions that do not lead to testable predictions are rejected as unnecessary.
Hashing produces compact representations of documents so that tasks such as classification or retrieval can be performed on these short codes. When hashing is supervised, the codes are trained using labels on the training data. This paper first shows that the evaluation protocols used in the literature for supervised hashing are unsatisfactory: a trivial solution that encodes the output of a classifier significantly outperforms existing supervised or semi-supervised methods while using much shorter codes. We then propose two alternative protocols for supervised hashing: one based on retrieval on a disjoint set of classes, and another based on transfer learning to new classes. We provide two baseline methods for image-related tasks to assess the performance of (semi-)supervised hashing: one without coding and one with unsupervised codes. These baselines give, respectively, an upper and a lower bound on the performance of a supervised hashing scheme.
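To make the "trivial solution" concrete, here is a minimal sketch, assuming a classifier has already predicted a class index for every query and database item. The helper names (`class_code`, `hamming`, `retrieve`) are hypothetical illustrations, not the paper's code. Each predicted label is written as a binary code of ceil(log2(C)) bits for C classes, so the code length grows only logarithmically with the number of classes, and items are ranked by Hamming distance to the query code.

```python
import math
from typing import List, Sequence


def class_code(label: int, num_classes: int) -> List[int]:
    """Encode a predicted class index as ceil(log2(num_classes)) bits
    (hypothetical helper; least significant bit first)."""
    num_bits = max(1, math.ceil(math.log2(num_classes)))
    return [(label >> b) & 1 for b in range(num_bits)]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: Sequence[int], database_codes: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query
    (ties keep their original order because sorted() is stable)."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Hypothetical classifier outputs for a 10-class problem: 4-bit codes suffice.
db_labels = [3, 7, 3, 1]
db_codes = [class_code(y, 10) for y in db_labels]
ranking = retrieve(class_code(3, 10), db_codes)
print(ranking)  # items predicted as class 3 come first
```

Only exact code matches carry meaning here: the Hamming distance between codes of two different classes is an artifact of the binary numbering, which is part of why such a baseline exposes weaknesses in retrieval protocols that reuse the training classes.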
We calculate Bayes factors to quantify how the viability of the constrained minimal supersymmetric standard model (CMSSM) has changed in light of a series of observations. This is done in the Bayesian spirit, where probability reflects a degree of belief in a proposition and Bayes' theorem tells us how to update it after acquiring new information. Our experimental baseline is the approximate knowledge that was available before LEP, and our comparison model is the Standard Model with a simple dark matter candidate. To quantify how much experiments have altered our relative belief in the CMSSM since the baseline data, we compute the Bayes factors that arise from learning, in sequence, the LEP Higgs constraints, the XENON100 dark matter constraints, the 2011 LHC supersymmetry search results, and the early 2012 LHC Higgs search results. We find that LEP and the LHC strongly undermine our trust in the CMSSM (with $M_0$ and $M_{1/2}$ below 2 TeV), reducing its posterior odds by approximately two orders of magnitude. This reduction is largely due to substantial Occam factors induced by the LEP and LHC Higgs searches.
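The two ideas this abstract relies on, sequential updating and the Occam factor, can be illustrated with a toy model that has nothing to do with the actual CMSSM likelihoods. The sketch below assumes Gaussian data summarized by a sample mean `xbar` with standard error `se`, and compares a model with the mean fixed at zero against a model whose mean is uniform on `[-width, width]`; all function names and numbers are hypothetical. A wider prior spreads the flexible model's predictions over more outcomes, so its evidence drops when the data land where the fixed model predicts. Learning independent data sets in sequence multiplies the odds by each Bayes factor.

```python
import math
from typing import Iterable


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def evidence_fixed(xbar: float, se: float, mu0: float = 0.0) -> float:
    """Marginal likelihood of a model with no free parameter (mean = mu0)."""
    return normal_pdf(xbar, mu0, se)


def evidence_uniform(xbar: float, se: float, width: float) -> float:
    """Marginal likelihood of a model with mean uniform on [-width, width]:
    the likelihood averaged over the prior, computed in closed form."""
    mass = normal_cdf((width - xbar) / se) - normal_cdf((-width - xbar) / se)
    return mass / (2.0 * width)


def bayes_factor(xbar: float, se: float, width: float) -> float:
    """B_10: evidence of the flexible model relative to the fixed model."""
    return evidence_uniform(xbar, se, width) / evidence_fixed(xbar, se)


def posterior_odds(prior_odds: float, factors: Iterable[float]) -> float:
    """Sequential learning: multiply the prior odds by each Bayes factor."""
    odds = prior_odds
    for b in factors:
        odds *= b
    return odds


# Data at the fixed model's prediction: the wider prior pays a larger Occam penalty.
b_narrow = bayes_factor(0.0, 1.0, 10.0)
b_wide = bayes_factor(0.0, 1.0, 100.0)
print(b_narrow, b_wide)
print(posterior_odds(1.0, [0.1, 0.1]))  # two hypothetical factors of 0.1
```

In this toy, a factor of 10 in prior width costs a factor of 10 in the Bayes factor, and two factors of 0.1 combine into two orders of magnitude; the real analysis involves many parameters and full experimental likelihoods.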
Code changes constitute one of the most important features of software evolution. Studying them can provide insights into the nature of software development and also lead to practical solutions, such as recommending and automating popular changes for developers. In our work, we developed a tool called PythonChangeMiner that discovers code change patterns in the histories of Python projects. We validated the tool and then employed it to discover patterns in a dataset of 120 projects from four different domains of software engineering. We manually categorized the patterns that occur in more than one project by their structure and content, and compared the domains and patterns in that regard. We conducted a survey of the authors of the discovered changes: 82.9% of them said that they could give the change a name, and 57.9% expressed a desire to have the changes automated, indicating that the tool can discover valuable patterns. Finally, we interviewed 9 members of the development team of a popular integrated development environment (IDE) to estimate the feasibility of automating the discovered changes. The interviews revealed that context independence and high precision made a pattern a better candidate for automation. The patterns received mainly positive reviews, and several were ranked as very likely candidates for automation.
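The idea of a change pattern that "occurs in more than one project" can be sketched with Python's standard `ast` module. This is not how PythonChangeMiner works internally; it is a deliberately simplified stand-in, assuming each change is given as a `(project, before_source, after_source)` triple. Identifiers and constants are replaced with placeholders so that structurally identical edits compare equal, and only normalized before/after pairs seen in at least two projects are kept. The names `normalize` and `cross_project_patterns` are hypothetical.

```python
import ast
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


class _Anonymize(ast.NodeTransformer):
    """Replace every name and constant with a placeholder (modifies in place)."""

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = "VAR"
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        node.value = "CONST"
        return node


def normalize(snippet: str) -> str:
    """Parse a snippet and return a name-independent dump of its AST."""
    return ast.dump(_Anonymize().visit(ast.parse(snippet)))


def cross_project_patterns(
    changes: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], FrozenSet[str]]:
    """Group changes by normalized (before, after) and keep multi-project ones."""
    projects: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for project, before, after in changes:
        projects[(normalize(before), normalize(after))].add(project)
    return {k: frozenset(v) for k, v in projects.items() if len(v) > 1}


changes = [
    ("proj_a", "for i in range(len(xs)):\n    print(xs[i])", "for x in xs:\n    print(x)"),
    ("proj_b", "for j in range(len(items)):\n    print(items[j])", "for item in items:\n    print(item)"),
    ("proj_b", "total = total + n", "total += n"),
]
patterns = cross_project_patterns(changes)
print(len(patterns))  # the index-loop rewrite is shared; the += rewrite is not
```

Anonymizing every name, including built-ins such as `len`, is crude and can merge unrelated edits; a real miner would match richer structures, which is one reason precision mattered to the interviewed IDE developers.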
The growing need for affordable and accessible higher education is a major global challenge for the 21st century. Consequently, there is a need to develop a deeper understanding of the functionality and taxonomy of universities and colleges and, in particular, of how their various characteristics change with size. Scaling has been a powerful tool for revealing systematic regularities in systems across a range of topics, from physics and biology to cities, and for understanding the underlying principles of their organization and growth. Here, we apply this framework to institutions of higher learning in the United States and show that, like organisms, ecosystems and cities, they scale in a surprisingly systematic fashion, following simple power-law behavior. We analyze the entire spectrum of 5,802 institutions, ranging from large research universities to small professional schools, organized in seven commonly used sectors that reveal distinct regimes of institutional scaling behavior. Metrics include expenditures, revenues, graduation rates and estimated economic added value, expressed as functions of total enrollment, our fundamental measure of size. Our results quantify how each regime of institution leverages specific economies of scale to address distinct priorities. Taken together, the scaling of features within a sector and the shifts in scaling across sectors imply that there are generic mechanisms and constraints shared by all sectors, which lead to tradeoffs between their different societal functions and roles. We particularly highlight the strong complementarity between public and private research universities, and community and state colleges, four sectors that display superlinear returns to scale.
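A power law $Y = Y_0 N^{\beta}$ becomes a straight line in log-log coordinates, $\log Y = \log Y_0 + \beta \log N$, so the exponent can be estimated by ordinary least squares on the logarithms. The sketch below assumes enrollment $N$ as the size measure and uses synthetic numbers invented for illustration, not the study's data; the function names and the tolerance used to label returns to scale are hypothetical. An exponent $\beta > 1$ corresponds to superlinear returns to scale.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Fit Y = Y0 * N**beta by least squares on log Y = log Y0 + beta * log N.

    Returns (Y0, beta). All sizes and values must be positive.
    """
    if len(sizes) != len(values) or len(sizes) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(y) for y in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    y0 = math.exp(my - beta * mx)
    return y0, beta


def returns_to_scale(beta: float, tol: float = 0.05) -> str:
    """Label an exponent; the tolerance is an arbitrary illustrative choice."""
    if beta > 1 + tol:
        return "superlinear"
    if beta < 1 - tol:
        return "sublinear"
    return "linear"


# Synthetic, noise-free data generated from Y = 3 * N**1.15.
enrollment = [500, 2000, 8000, 32000]
metric = [3.0 * n ** 1.15 for n in enrollment]
y0, beta = fit_power_law(enrollment, metric)
print(round(beta, 3), returns_to_scale(beta))
```

With real institutional data the points scatter around the line, so in practice one would judge whether $\beta$ differs from 1 using confidence intervals rather than a fixed tolerance.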