AI network infrastructure needs a RoCE test of the network fabric to ensure that ongoing traffic stays at low latency and achieves high bandwidth. In this section, we talk about how to test the RoCE network within a Kubernetes cluster. […]
Tag: NVIDIA
Dragonfly and Dragonfly+ Topology
In this section, we talk about the Dragonfly and Dragonfly+ topologies. The Dragonfly network topology is a high-performance interconnection network commonly used in large-scale computing systems such as supercomputers and data centers. It is designed to provide low latency, high bandwidth, and high fault tolerance for communication between computing nodes. In a Dragonfly network, the system is […]
Check Layer 2 Topology Changes in Real Time in Cumulus (STP-Related Issues)
In a Layer 2 network topology, when the topology changes or a new device with Layer 2 features is connected, spanning tree recalculates which device becomes the root switch and which ports become designated ports, alternate ports, etc. In Cumulus, we can check which port is triggered by that […]
Troubleshooting: Check MAC Address Learning on Cumulus
On Cumulus OS switches, we can check MAC addresses when troubleshooting Layer 2 functions. First, we can use the following command to make sure the MAC address is learned from the expected source interface: net show mac bridge macs. If we want to check the forwarding database (FDB) MAC entries on Cumulus OS switches, we can use this […]
How to Check Uptime & Upgrade Firmware on an Unmanaged InfiniBand Switch
You can check the status of an unmanaged switch with MFT commands from the hosts in the fabric. Assume there is an unmanaged switch with LID 1; you can then check the uptime information with the following command: And you can also check the current firmware information with the following command: Almost all devices consist of hardware, […]