A flag is a sequence of nested subspaces. Flags are ubiquitous in numerical analysis, arising in finite elements, multigrid, spectral, and pseudospectral methods for numerical PDE; they arise in the form of Krylov subspaces in matrix computations, and as multiresolution analysis in wavelets constructions. They are common in statistics too --- principal component, canonical correlation, and correspondence analyses may all be viewed as methods for extracting flags from a data set. The main goal of this article is to develop the tools needed for optimizing over a set of flags, which is a smooth manifold called the flag manifold, and it contains the Grassmannian as the simplest special case. We will derive closed-form analytic expressions for various differential geometric objects required for Riemannian optimization algorithms on the flag manifold; introducing various systems of extrinsic coordinates that allow us to parameterize points, metrics, tangent spaces, geodesics, distance, parallel transport, gradients, Hessians in terms of matrices and matrix operations; and thereby permitting us to formulate steepest descent, conjugate gradient, and Newton algorithms on the flag manifold using only standard numerical linear algebra.