Dragonfly and Dragonfly+ Topology

This section introduces the Dragonfly topology and its extension, Dragonfly+.

The Dragonfly network topology is a high-performance interconnection network commonly used in large-scale computing systems like supercomputers and data centers. It is designed to provide low latency, high bandwidth, and high fault tolerance for communication between computing nodes.

In a Dragonfly network, the system is organized into groups of interconnected routers and compute nodes. Here are some key features of the Dragonfly network topology:

  1. Groups: The network is divided into multiple groups, each containing a set of routers and compute nodes.
  2. Routers: Routers within a group are connected to all other routers in the same group and to a subset of routers in other groups. This allows for efficient communication between nodes within and across groups.
  3. Compute Nodes: Compute nodes are connected to routers within the same group. They communicate with each other through the routers.
  4. Low Latency: The design of the Dragonfly network minimizes the number of hops required for communication between nodes, leading to low latency and fast data transfer.
  5. High Bandwidth: The network provides high-bandwidth links between nodes, allowing for high-speed data communication and efficient parallel processing.
  6. Fault Tolerance: Dragonfly networks typically incorporate redundancy and fault-tolerant mechanisms to ensure that communication can continue despite failures.
  7. Scalability: The Dragonfly network is designed to be scalable, allowing for the addition of more nodes and routers as the system grows without significantly impacting performance.
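To make the group and router structure above concrete, here is a minimal Python sketch of a toy Dragonfly. It assumes a full mesh inside each group and exactly one global link per pair of groups, assigned to routers round-robin. The function names (`build_dragonfly`, `hops`) and the link-assignment rule are illustrative assumptions, not part of any standard or product.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(num_groups, routers_per_group):
    """Adjacency map for a toy Dragonfly; routers are (group, index) tuples."""
    adj = {(g, r): set()
           for g in range(num_groups)
           for r in range(routers_per_group)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    # Local links: every router connects to every other router in its group.
    for g in range(num_groups):
        for a, b in combinations(range(routers_per_group), 2):
            link((g, a), (g, b))

    # Global links: one per pair of groups, spread over the routers
    # round-robin (an illustrative assumption, not a prescribed layout).
    for g1, g2 in combinations(range(num_groups), 2):
        link((g1, g2 % routers_per_group), (g2, g1 % routers_per_group))
    return adj


def hops(adj, src, dst):
    """Breadth-first search: router-to-router hops on a shortest path."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("routers are not connected")


adj = build_dragonfly(num_groups=5, routers_per_group=4)
diameter = max(hops(adj, a, b) for a in adj for b in adj)
print("worst-case router hops:", diameter)
```

Under these assumptions no pair of routers is more than three hops apart (local, global, local), which is where the low-latency property comes from.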

So what about Dragonfly+?

Dragonfly+ is an enhanced version of the Dragonfly network topology. The “+” in Dragonfly+ signifies enhancements or modifications to the traditional Dragonfly topology. Dragonfly+ topologies aim to address some of the limitations or performance bottlenecks present in the original Dragonfly architecture.

Dragonfly+ is also known as Megafly.

In addition, the Dragonfly+ group structure is the same as the one used by many Fat Tree topologies, so network products can be reused between the two topologies.

As the Dragonfly+ topology example shows, Dragonfly+ extends Dragonfly by connecting the routers inside each group as a full bipartite graph. We call the first-layer routers, which connect directly to hosts, leaf routers, and the second-layer routers spine routers.
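The full bipartite wiring inside one Dragonfly+ group can be sketched in a few lines of Python. The function name `build_dragonfly_plus_group` and the router labels are hypothetical, chosen only for illustration.

```python
def build_dragonfly_plus_group(num_leaves, num_spines):
    """Return leaves, spines, and the intra-group links of one group."""
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{j}" for j in range(num_spines)]
    # Full bipartite: every leaf links to every spine; there are no
    # leaf-to-leaf or spine-to-spine links inside the group.
    links = [(leaf, spine) for leaf in leaves for spine in spines]
    return leaves, spines, links


leaves, spines, links = build_dragonfly_plus_group(num_leaves=4, num_spines=4)
print(len(links), "intra-group links")
```

With this wiring, any two leaves in the same group are two hops apart through any spine, just as in a two-level Fat Tree.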

Dragonfly and Dragonfly+ topologies are defined by the existence of at least one direct global link between every pair of groups. Minimal intergroup routes traverse a single global link.
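This defining property is easy to check programmatically. The sketch below assumes routers are written as `(group, name)` tuples and that global links in Dragonfly+ join spine routers. The helper names `missing_group_pairs` and `minimal_route`, and the three-group example, are made up for illustration.

```python
from itertools import combinations


def missing_group_pairs(num_groups, global_links):
    """Return the group pairs that have no direct global link."""
    connected = {frozenset((a[0], b[0])) for a, b in global_links}
    return [pair for pair in combinations(range(num_groups), 2)
            if frozenset(pair) not in connected]


def minimal_route(src_leaf, dst_leaf, global_links):
    """Minimal route between leaves in different groups:
    leaf, spine, one global link, spine, leaf."""
    src_group, dst_group = src_leaf[0], dst_leaf[0]
    for a, b in global_links:
        if (a[0], b[0]) == (dst_group, src_group):
            a, b = b, a  # orient the link from source group to destination
        if (a[0], b[0]) == (src_group, dst_group):
            return [src_leaf, a, b, dst_leaf]
    raise ValueError("no direct global link between the two groups")


# Hypothetical three-group example with one global link per group pair.
global_links = [
    ((0, "spine0"), (1, "spine0")),
    ((0, "spine1"), (2, "spine0")),
    ((1, "spine1"), (2, "spine1")),
]

print(missing_group_pairs(3, global_links))
print(minimal_route((2, "leaf0"), (0, "leaf1"), global_links))
```

An empty list from `missing_group_pairs` means the wiring satisfies the definition. The returned route contains exactly one hop that crosses between groups.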

Conclusion:

Dragonfly+ is based on the conventional Dragonfly and extends it using properties of Fat Tree topologies. The result is a hybrid topology that combines the benefits of both Dragonfly and Fat Tree. Dragonfly+ is also more scalable than Dragonfly at the same cost.
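A rough back-of-the-envelope calculation illustrates the scalability claim. The sketch below uses router radix as a stand-in for cost, which is an assumption. For Dragonfly it assumes the commonly cited balanced configuration (routers per group = 2 x hosts per router = 2 x global links per router). For Dragonfly+ it assumes every router splits its ports evenly between down-links and up-links. Real deployments may choose different ratios, so treat the numbers as indicative only.

```python
def dragonfly_max_hosts(radix):
    """Maximum hosts in a balanced Dragonfly (assumes a = 2p = 2h)."""
    h = (radix + 1) // 4          # global links per router
    p = h                         # hosts per router
    a = 2 * h                     # routers per group
    groups = a * h + 1            # one global link to every other group
    return a * p * groups


def dragonfly_plus_max_hosts(radix):
    """Maximum hosts in a Dragonfly+ with an even down/up port split."""
    half = radix // 2
    hosts_per_group = half * half          # half leaves x half hosts each
    global_links_per_group = half * half   # half spines x half links each
    groups = global_links_per_group + 1
    return hosts_per_group * groups


for radix in (16, 36, 64):
    print(radix, dragonfly_max_hosts(radix), dragonfly_plus_max_hosts(radix))
```

Under these assumptions, 36-port routers give roughly four times as many hosts with Dragonfly+ as with Dragonfly.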

Dragonfly+ is also useful when planning the growth of an existing spine-leaf fabric. If we need to add pods without redesigning the whole spine-leaf topology, Dragonfly+ can deliver the same or even better throughput than an equivalent Dragonfly or Fat Tree under various traffic patterns, at a lower cost than adding a super-spine layer.
