AI network infrastructure needs a RoCE test of the network fabric to ensure that traffic consistently gets low latency and high bandwidth. In this section, we talk about how to test a RoCE network in an on-premises Kubernetes cluster. […]
Category: NVIDIA
Dragonfly and Dragonfly+ Topology
In this section, we talk about Dragonfly and Dragonfly+ topology. The Dragonfly network topology is a high-performance interconnection network commonly used in large-scale computing systems like supercomputers and data centers. It is designed to provide low latency, high bandwidth, and high fault tolerance for communication between computing nodes. In a Dragonfly network, the system is […]
Mellanox ConnectX-4 SR-IOV on Ubuntu 22.04
SRIOV-CNI supports Mellanox ConnectX®-4 Lx and ConnectX®-5 adapter cards. To enable SR-IOV functionality, the following steps are required: 1. Verify that the system has a Mellanox network adapter (HCA/NIC) installed. 2. Enable SR-IOV in the NIC’s firmware. Installing Mellanox Management Tools (MFT) or mstflint is a prerequisite; MFT can be downloaded from http://www.mellanox.com/page/management_tools. Download the mft package […]
Error installing Netris agent on Cumulus
Installing the Netris agent on a Mellanox switch running a newer Cumulus release can run into trouble partway through. First, check the date on the switch: is it the current date and time? If it isn’t, you can change the timezone with the command timedatectl set-timezone. In this case my timezone is GMT+7, which is Asia/Jakarta. Next, you […]
Checking a Managed InfiniBand Switch on the Subnet Manager
You can check the InfiniBand Subnet Manager (SM) through its web interface by logging in at the SM's IP address. First, check the SM status summary. It displays the SM uptime, SM state, failure counters, SM priority (HA), autostart status, and SM version. Then check the Base SM tab, which displays the detected SM node […]
Check topology Layer 2 change real time in Cumulus (STP issue related)
In a Layer 2 network topology, when the topology changes or a new Layer 2 device is connected, spanning tree recalculates which device becomes the root switch and which ports become designated ports, alternate ports, and so on. In Cumulus, we can check which port is triggered by that […]
Best Practice: Restart Service switchd on Cumulus MLAG-Pair Switch
Sometimes we want to restart the switchd service on one switch of a Cumulus MLAG pair, or test cutting off the peerlink between the two switches (for UAT or a similar activity). But if we don’t know which switch is the primary in that MLAG pair, we could get into trouble, such as the host being disconnected from […]
Automation Cumulus with Playbook-Ansible
Preparation: Grant passwordless “root/sudo” access to the user on all devices, both the servers and the switches connected through Ansible. This case uses the user ubuntu, which can be changed as needed. Update and upgrade the system; install python3-dev and a virtual environment; install git and copy the required ‘nvue’ git repository files from the Cumulus GitLab […]
Troubleshooting : Check MAC Address Learning on Cumulus
On Cumulus OS switches, we can check MAC addresses when troubleshooting Layer 2 functions. First, we can use this command to make sure the MAC address is learned from the expected source interface: net show mac bridge macs If we want to check the MAC forwarding database on Cumulus OS switches, we can use this […]
How to Check Uptime & Upgrade Firmware on an Unmanaged InfiniBand Switch
You can check the status of an unmanaged switch with MFT commands run from the hosts in the fabric. Assuming there is an unmanaged switch with LID 1, you can check its uptime with the following command: You can also check the current firmware information with the following command: Almost all devices consist of hardware, […]