We show that entangled measurements provide an exponential advantage in sample complexity for Pauli channel estimation, which is both a fundamental problem and a practically important subroutine for benchmarking near-term quantum devices. The specific task we consider is to learn the eigenvalues of an $n$-qubit Pauli channel to precision $varepsilon$ in $l_infty$ distance. We give an estimation protocol with an $n$-qubit ancilla that succeeds with high probability using only $O(n/varepsilon^{2})$ copies of the Pauli channel, while prove that any ancilla-free protocol (possibly with adaptive control and channel concatenation) would need at least $Omega(2^{n/3})$ rounds of measurement. We further study the advantages provided by a small number of ancillas. For the case that a $k$-qubit ancilla ($kle n$) is available, we obtain a sample complexity lower bound of $Omega(2^{(n-k)/3})$ for any non-concatenating protcol, and a stronger lower bound of $Omega(n2^{n-k})$ for any non-adaptive, non-concatenating protocol. The latter is shown to be tight by explicitly constructing a $k$-qubit-ancilla-assisted estimation protocol. We also show how to apply the ancilla-assisted estimation protocol to a practical quantum benchmarking task in a noise-resilient and sample-efficient manner, given reasonable noise assumptions. Our results provide a practically-interesting example for quantum advantages in property learning and also bring new insight for quantum benchmarking.