Reinforcement learning has witnessed recent applications to a variety of tasks in quantum programming. The underlying assumption is that those tasks could be modeled as Markov Decision Processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two of the simplest tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case, we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state, or compile a gate, up to some target accuracy. As an example, we find sequences of H and T gates with length as small as 11 producing ~99% fidelity for states of the form (HT)^{n} |0> with values as large as n=10^{10}. This work provides strong evidence that reinforcement learning can be used for optimal state preparation and gate compilation for larger qubit spaces.