This paper considers the steady-state performance of load balancing algorithms in a many-server system with distributed queues. The system has $N$ servers, and each server maintains a local queue with buffer size $b-1,$ i.e., a server can hold at most one job in service and $b-1$ jobs in its queue. Jobs in the same queue are served in first-in-first-out (FIFO) order. The system operates in a heavy-traffic regime in which the workload per server is $\lambda = 1 - N^{-\alpha}$ for $0.5\leq \alpha<1.$ We identify a set of algorithms under which the steady-state queues satisfy the following universal scaling, where {\em universal} means that it holds for any $\alpha\in[0.5,1)$: (i) the number of busy servers is $\lambda N-o(1);$ (ii) the number of servers with two jobs (one in service and one in the queue) is $O(N^{\alpha}\log N);$ and (iii) the number of servers with more than two jobs is $O\left(\frac{1}{N^{r(1-\alpha)-1}}\right),$ where $r$ can be any positive integer independent of $N.$ The set of load balancing algorithms satisfying the sufficient condition includes join-the-shortest-queue (JSQ), idle-one-first (I1F), and power-of-$d$-choices (Po$d$) with $d\geq N^{\alpha}\log^2 N.$ We further argue that the waiting time under such an algorithm is near-optimal order-wise.
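% The following sketch is illustrative only and is not part of the paper's analysis.
% It sets up the heavy-traffic parameterization $\lambda = 1 - N^{-\alpha}$ and the
% Po($d$) sampling size $d \geq N^{\alpha}\log^2 N$ in a minimal event-driven simulation;
% the function name, default parameters, and reported statistics are assumptions made
% for illustration, not quantities defined in the abstract.
\begin{verbatim}
# Minimal simulation sketch (assumed setup, not the paper's method): Po(d)
# dispatching in an N-server system with per-server buffer b-1, arrival rate
# lambda*N with lambda = 1 - N^(-alpha), unit service rate per busy server,
# and d = ceil(N^alpha * log(N)^2), capped at N (in which case Po(d) = JSQ).
import math
import random

def simulate_pod(N=200, alpha=0.6, b=4, events=200_000, seed=0):
    rng = random.Random(seed)
    lam = 1.0 - N ** (-alpha)                       # per-server load
    d = min(N, math.ceil(N ** alpha * math.log(N) ** 2))
    q = [0] * N                                     # jobs at each server (<= b)
    t = busy_time = queued_jobs_time = 0.0
    arrival_rate = lam * N                          # Poisson arrivals
    for _ in range(events):
        busy = sum(1 for x in q if x > 0)
        total_rate = arrival_rate + busy            # competing exponential clocks
        dt = rng.expovariate(total_rate)
        t += dt
        busy_time += busy * dt
        queued_jobs_time += sum(max(x - 1, 0) for x in q) * dt
        if rng.random() < arrival_rate / total_rate:
            # arrival: sample d servers, join the shortest, drop if buffer full
            i = min(rng.sample(range(N), d), key=lambda j: q[j])
            if q[i] < b:
                q[i] += 1
        else:
            # departure from a uniformly chosen busy server
            i = rng.choice([j for j in range(N) if q[j] > 0])
            q[i] -= 1
    return {"lambda": lam, "d": d,
            "avg_busy_servers": busy_time / t,       # compare with lambda*N
            "avg_jobs_in_queue": queued_jobs_time / t}

if __name__ == "__main__":
    print(simulate_pod())
\end{verbatim}
% For moderate N the condition d >= N^alpha log^2 N forces d = N, so the sketch
% effectively runs JSQ; the time-averaged busy-server count can be compared
% against lambda*N and the queued-job count against the stated scalings.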