The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time which is considered inefficient, or require a strictly super-linear space of $n^{1+Omega(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm a variant of which was conjectured to work by Czumaj et al. [STOC18]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among some other results, leads to an $O(log log Delta)$ round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$ approximate minimum vertex cover in essentially the same rounds and space. This is the best possible approximation factor under standard assumptions, culminating a long line of research. It also leads to an improved $O(loglog Delta)$ round algorithm for $1 + varepsilon$ approximate matching. All these results can also be implemented in the congested clique model within the same number of rounds.