Randomized Load-balanced Routing for Fat-tree Networks


Abstract in English

Fat-tree networks have been widely adopted to High Performance Computing (HPC) clusters and to Data Center Networks (DCN). These parallel systems usually have a large number of servers and hosts, which generate large volumes of highly-volatile traffic. Thus, distributed load-balancing routing design becomes critical to achieve high bandwidth utilization, and low-latency packet delivery. Existing distributed designs rely on remote congestion feedbacks to address congestion, which add overheads to collect and react to network-wide congestion information. In contrast, we propose a simple but effective load-balancing scheme, called Dynamic Randomized load-Balancing (DRB), to achieve network-wide low levels of path collisions through local-link adjustment which is free of communications and cooperations between switches. First, we use D-mod-k path selection scheme to allocate default paths to all source-destination (S-D) pairs in a fat-tree network, guaranteeing low levels of path collision over downlinks for any set of active S-D pairs. Then, we propose Threshold-based Two-Choice (TTC) randomized technique to balance uplink traffic through local uplink adjustment at each switch. We theoretically show that the proposed TTC for the uplink-load balancing in a fat-tree network have a similar performance as the two-choice technique in the area of randomized load balancing. Simulation results show that DRB with TTC technique achieves a significant improvement over many randomized routing schemes for fat-tree networks.

Download