Suppose we have many copies of an unknown $n$-qubit state $rho$. We measure some copies of $rho$ using a known two-outcome measurement $E_{1}$, then other copies using a measurement $E_{2}$, and so on. At each stage $t$, we generate a current hypothesis $sigma_{t}$ about the state $rho$, using the outcomes of the previous measurements. We show that it is possible to do this in a way that guarantees that $|operatorname{Tr}(E_{i} sigma_{t}) - operatorname{Tr}(E_{i}rho) |$, the error in our prediction for the next measurement, is at least $varepsilon$ at most $operatorname{O}!left(n / varepsilon^2 right) $ times. Even in the non-realizable setting---where there could be arbitrary noise in the measurement outcomes---we show how to output hypothesis states that do significantly worse than the best possible states at most $operatorname{O}!left(sqrt {Tn}right) $ times on the first $T$ measurements. These results generalize a 2007 theorem by Aaronson on the PAC-learnability of quantum states, to the online and regret-minimization settings. We give three different ways to prove our results---using convex optimization, quantum postselection, and sequential fat-shattering dimension---which have different advantages in terms of parameters and portability.