A perplexing problem in understanding physical reality is why the universe seems comprehensible, and correspondingly why there should exist physical systems capable of comprehending it. In this essay I explore the possibility that rather than being an odd coincidence arising from our strange position as passive (and even more strangely, conscious) observers in the cosmos, these two problems might be related and explainable in terms of fundamental physics. The perspective presented suggests a unified framework in which, taken together, comprehenders and comprehensibility are part of the causal structure of physical reality, treated as a causal graph (network) connecting physically possible states. I argue that in some local regions, the most probable states are those that include physical systems containing information encodings, such as mathematics, language and art, because these are the most highly connected to other possible states in this causal graph. Such physical systems include life and, of particular interest for the discussion of the place of math in physical reality, comprehenders capable of making mathematical sense of the world. Within this framework, the descent of math is an undirected outcome of the evolution of the universe, which will tend toward states that are increasingly connected to other possible states of the universe, a process greatly facilitated if some physical systems know the rules of the game. I therefore conclude that our ability to use mathematics to describe, and more importantly manipulate, the natural world may not be an anomaly or a trick, but instead could provide clues to the underlying causal structure of physical reality.
The key difference between math as math and math in science is that in science we blend our physical knowledge with our knowledge of math. This blending changes the way we put meaning into math and even the way we interpret mathematical equations. Learning to think about physics with math, rather than just calculating, involves a number of general scientific thinking skills that are often taken for granted (and rarely taught) in physics classes. In this paper, I give an overview of my analysis of these additional skills. In subsequent papers, I propose specific tools for helping students develop these skills.
Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving, we will likely need new algorithmic advancements from the broader research community.
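As a rough illustration of how a benchmark built from step-by-step solutions is commonly scored, the Python sketch below extracts a final answer from each reference solution and computes exact-match accuracy for a candidate predictor. The file name math_problems.json, the problem/solution field names, and the \boxed{...} answer convention are assumptions made for this sketch, not a description of the dataset's actual distribution format or API.

```python
import json
import re

def extract_boxed(solution: str) -> str:
    """Return the content of the last \\boxed{...} in a solution string, if any.

    Note: this simple regex does not handle nested braces; good enough for a sketch.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else solution.strip()

def exact_match_accuracy(problems, predict):
    """Score a predictor that maps a problem statement to a final answer string."""
    correct = 0
    for item in problems:
        reference = extract_boxed(item["solution"])
        if predict(item["problem"]).strip() == reference:
            correct += 1
    return correct / len(problems)

if __name__ == "__main__":
    # Hypothetical file layout: a JSON list of {"problem": ..., "solution": ...} records.
    with open("math_problems.json") as f:
        problems = json.load(f)
    # Placeholder predictor; in practice this would call a trained model.
    accuracy = exact_match_accuracy(problems, predict=lambda q: "42")
    print(f"exact-match accuracy: {accuracy:.3f}")
```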
Variational algorithms have particular relevance for near-term quantum computers but require non-trivial parameter optimisations. Here we propose Analytic Descent: given that the energy landscape must have a certain simple form in the local region around any reference point, it can be efficiently approximated in its entirety by a classical model -- we support these observations with rigorous, complexity-theoretic arguments. One can classically analyse this approximate function in order to directly 'jump' to the (estimated) minimum, before determining a more refined function if necessary. We derive an optimal measurement strategy and generally prove that the asymptotic resource cost of a 'jump' corresponds to only a single gradient vector evaluation.
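The claim that the landscape has "a certain simple form" near a reference point can be made concrete in the simplest setting: when a single rotation-like gate parameter is varied with all others held fixed, the measured energy is a sinusoid in that parameter, so three evaluations determine it exactly and the minimum of that slice can be reached in closed form. The Python sketch below shows only this one-parameter version with a toy energy function; it is an illustration of the underlying idea, not the full multi-parameter Analytic Descent model or its measurement strategy.

```python
import numpy as np

def fit_sinusoid(theta0, energy):
    """Fit E(t) = a + b*cos(t - theta0) + c*sin(t - theta0) from three energy
    evaluations at theta0 and theta0 +/- pi/2 (parameter-shift-style points)."""
    e0 = energy(theta0)
    e_plus = energy(theta0 + np.pi / 2)
    e_minus = energy(theta0 - np.pi / 2)
    a = (e_plus + e_minus) / 2
    b = e0 - a
    c = (e_plus - e_minus) / 2
    return a, b, c

def analytic_jump(theta0, energy):
    """Jump directly to the minimiser of the fitted one-parameter model."""
    a, b, c = fit_sinusoid(theta0, energy)
    # The minimum of b*cos(x) + c*sin(x) lies at x = atan2(-c, -b).
    x_min = np.arctan2(-c, -b)
    return theta0 + x_min

if __name__ == "__main__":
    # Toy stand-in for a measured expectation value of a single rotation gate.
    true_energy = lambda t: 0.3 - 0.8 * np.cos(t - 1.1)
    theta_new = analytic_jump(0.0, true_energy)
    print(theta_new, true_energy(theta_new))  # lands near t = 1.1, the true minimum
```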
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. No need for functional values, no line search, no information about the function except for the gradients. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on the smoothness in a neighborhood of a solution. Given that the problem is convex, our method converges even if the global smoothness constant is infinity. As an illustration, it can minimize an arbitrary twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including logistic regression and matrix factorization.
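The two rules can be read off directly as a stepsize schedule built from gradients alone: cap the growth of the stepsize relative to the previous one, and cap it by a local curvature estimate formed from two successive gradients. The Python sketch below is one illustrative implementation of that reading; the particular growth factor, curvature bound, and initialisation are choices made for the sketch rather than quoted from the paper, and the small ridge-regularised logistic objective is likewise an assumption.

```python
import numpy as np

def adaptive_gradient_descent(grad, x0, steps=1000, lam0=1e-6):
    """Gradient descent with a stepsize chosen from gradients alone:
    rule 1 limits how fast the stepsize may grow between iterations,
    rule 2 keeps it below an inverse local-curvature estimate built from
    two successive gradients (no function values, no line search)."""
    x_prev, g_prev, lam_prev, theta = x0, grad(x0), lam0, np.inf
    x = x_prev - lam_prev * g_prev  # first step with a tiny initial stepsize
    for _ in range(steps):
        g = grad(x)
        diff_x = np.linalg.norm(x - x_prev)
        diff_g = np.linalg.norm(g - g_prev)
        # Rule 2: stay below the inverse of a local Lipschitz (curvature) estimate.
        curvature_cap = diff_x / (2.0 * diff_g) if diff_g > 0 else np.inf
        # Rule 1: do not grow the stepsize too quickly relative to the last one.
        growth_cap = np.sqrt(1.0 + theta) * lam_prev
        lam = min(growth_cap, curvature_cap)
        theta = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = x - lam * g
    return x

if __name__ == "__main__":
    # Toy ridge-regularised logistic regression: smooth and strongly convex.
    A = np.array([[1.0, 2.0], [-1.0, 0.5], [0.3, -1.5]])
    y = np.array([1.0, -1.0, 1.0])
    grad = lambda w: -(A * y[:, None]).T @ (1.0 / (1.0 + np.exp(y * (A @ w)))) + 0.1 * w
    w = adaptive_gradient_descent(grad, np.zeros(2))
    print(w, np.linalg.norm(grad(w)))  # gradient norm should be near zero
```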
I argue that European schools of thought on memory and memorization were critical in enabling the growth of the scientific method. After giving a historical overview of the development of the memory arts from ancient Greece through 17th century Europe, I describe how the Baconian viewpoint on the scientific method was fundamentally part of a culture and a broader dialogue that conceived of memorization as a foundational methodology for structuring knowledge and for developing symbolic means for representing scientific concepts. The principal figures of this intense and rapidly evolving intellectual milieu included some of the leading thinkers traditionally associated with the scientific revolution; among others, Francis Bacon, René Descartes, and Gottfried Leibniz. I close by examining the acceleration of mathematical thought in light of the art of memory and its role in 17th century philosophy, and in particular, Leibniz's project to develop a universal calculus.