
OmniPath switches

Software and documentation

Serial console connection

Connect to the serial console using the (unusual) supplied USB type A (male) to type A (male) cable. The switch contains a serial port device of type FT231X USB UART from FTDI Chip:

  • Download and unpack the appropriate FT231X_drivers.
  • Update device drivers from the downloaded files.
  • A new USB serial port device appears; update its drivers from the same files.
  • Connect to the serial port using a terminal (for example, PuTTY) with port speed 115200 baud.

SSH or telnet connection

Use SSH or telnet to connect to the switch IP address:

  • The system prompts for a user name. The CLI has the following default user names:

    Operator access: operator
    Administrator access: admin

    Type the appropriate user name and press ENTER.

  • The system prompts for a password. The CLI has the following default passwords:

    Operator access: operpass
    Administrator access: adminpass

Note that the SSH login appears very slow (up to 1 minute), because SSH tries public-key logins before it falls back to the password. Open question: how can SSH be told to skip the public-key attempts?
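
One possible way to skip the public-key attempts is to disable them on the client side with standard OpenSSH options. The sketch below is untested against the switch and assumes that the delay really is caused by the public-key attempts. The wrapper name opa_ssh and the SSH override variable are our own illustrative names:

```shell
# Sketch: log in with password authentication only.
# PubkeyAuthentication and PreferredAuthentications are standard OpenSSH
# client options; opa_ssh and SSH are hypothetical names used for illustration.
opa_ssh() {
    # SSH may be set to another client binary; it defaults to "ssh".
    "${SSH:-ssh}" -o PubkeyAuthentication=no \
        -o PreferredAuthentications=keyboard-interactive,password "$@"
}
```

Usage would be, for example, opa_ssh admin@192.168.100.9. The same two options could instead be placed in a Host block in $HOME/.ssh/config.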

Managed switches

In a working Omni-Path fabric, obtain the list of all managed switches with:

opagenchassis

and append the list of switch names to the file /etc/opa/chassis.
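
If the command is run more than once, a plain append would add the same names again. A minimal sketch of an append that skips names already present, assuming opagenchassis prints one switch name per line (the function name append_unique_lines is our own):

```shell
# Sketch: append lines from stdin to a file, skipping lines already in it.
# append_unique_lines is a hypothetical helper, not part of the OPA tools.
append_unique_lines() {
    target=$1
    touch "$target"
    while IFS= read -r line; do
        [ -n "$line" ] || continue                      # ignore empty lines
        # -x: match the whole line, -F: fixed string, -q: quiet
        grep -qxF -- "$line" "$target" || printf '%s\n' "$line" >> "$target"
    done
}
```

Usage would be: opagenchassis | append_unique_lines /etc/opa/chassis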

To query the configuration of the switch chassis use:

opachassisadmin -S getconfig

where the -S option prompts for the switch password.

Managed switch configuration

Only switches with the Management Module installed can be accessed via the RJ45 network port. Unmanaged switches have a deactivated RJ45 port.

To access the CLI interface:

  • Connect a cross-over network cable to the switch RJ45 port and a PC network port.

  • The switch factory default network configuration is:

    • IP 192.168.100.9 netmask 255.255.255.0
    • Administrator login: admin password: adminpass
  • Configure the PC network port with another address in the same subnet 192.168.100.0/24.

  • From the PC try:

    ping 192.168.100.9
  • If ping is successful, try to log in by SSH to the IP address, using the above login.

    Notice: The telnet connection doesn't work, even though the Intel manual advises to use it!

  • Once logged into the switch, type:

    list
    list all

    to display brief help for all available commands.

  • Learn the network MAC address information by:

    ifShow
  • To change the switch to use standard DHCP network configuration, first configure your DHCP server with MAC and IP address for the switch. Then configure the switch using:

    dhcpEnable

    and reconnect the switch to the normal network. The original network session will be disconnected.

  • Logging out:

    exit
  • Saving the settings: As far as we can see in the manual, system settings become permanent as soon as they are entered, so you don't have to remember to save them to permanent memory.

Additional switch configurations

  • You must set the system hostname (choose a name XYZ):

    setSystemName "XYZ"

    as well as the default Node Name (SMA Description)

    setNodeDesc "XYZ"

    There is no documentation of the meaning of, or the difference between, the System Name and Node Description parameters.

    After a few minutes the Subnet Manager seems to recognize the new system name.

  • To display the system hostname and description:

    setSystemName
    showNodeDesc
  • Change the logged in user's password:

    passwd
  • Set the conventional CLI Backspace and Delete keys by swapping the factory default:

    swapBsDel
  • To change the CLI prompt:

    prompt
    
    OPTIONS
       <prompt> :new prompt.
    
    NOTES
       The prompt may not exceed 11 characters and is not saved across reboots.
       If the prompt contains a space, asterisk, comma, parenthesis or semicolon
       it must be enclosed with double quotes. For example: "*a prompt*". Also,
       if a prompt is not accepted, try to enclose it with double quotes.
       In order for some FastFabric Tools to function correctly, the prompt must
       end in "-> "(note trailing space).
  • Hardware check:

    hwCheck -verbose
  • Check and update the timezone and time:

    timeZoneConf 1  # Hours East/West of UTC
    timeDSTConf [<sw> <sd> <sm> <ew> <ed> <em>]  # For example: timeDSTConf 5 1 3 5 1 10
    time
    time -T <hhmmss>[<mmddyyyy>]
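
The argument for time -T can be generated on a management host instead of being typed by hand. A minimal sketch, assuming that the host clock is correct and is in the time zone the switch expects (the function name opa_time_arg is our own):

```shell
# Sketch: print the current host time in the form <hhmmss><mmddyyyy>,
# i.e. the argument format shown above for "time -T".
# opa_time_arg is a hypothetical helper name.
opa_time_arg() {
    # %H%M%S = hhmmss, %m%d%Y = mmddyyyy
    date '+%H%M%S%m%d%Y'
}
```

The output of echo "time -T $(opa_time_arg)" can then be pasted into the switch CLI.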

Subnet Manager

  • To start a Subnet Manager (only fabrics of up to 100 nodes are supported, with a hard limit of 200 nodes):

    smControl start

SNMP configuration

By default the managed switches do not respond to SNMP requests; this has to be enabled explicitly in the CLI interface:

-> uiconfig   # Lists current settings
-> uiconfig -snmp 1 -snmpv1 1 -snmpv2 1

To set the system name, contact and location information:

-> snmpSystem edit -n opa-c3 -c support@fysik.dtu.dk -l "DK;Lyngby;Niflheim;B309;059;rack 22;38"

This information may also be entered on the switch web interface under the System tab, but this currently has a bug causing the settings to be ignored.

From a management host one may then send inquiries by SNMP:

snmpwalk -Os -c public -v 2c <IP-address> system

SNMP monitoring of OPA switches

Managed OPA switches have limited support for monitoring by SNMP (as of firmware release 10.8, end of 2018). Unmanaged switches do not have TCP/IP and SNMP capabilities.

In the switch CLI you must first enable SNMP read-only access as shown above.

Install the snmpwalk and unzip commands:

yum install net-snmp-utils unzip

Documentation of the Net-SNMP tools includes:

  • Using_and_loading_MIBS tutorial.

  • The default MIBs directory search paths are displayed by:

    net-snmp-config --default-mibdirs
  • See man snmp.conf regarding configuration of applications built using the Net-SNMP libraries.

  • See man snmpcmd about arguments to the snmpwalk command.

  • See man net-snmp-config about installed Net-SNMP libraries and binaries.

Omni-Path MIB files

Download the MIB files from the OmniPath_software page, for example:

IntelOPA-STL1_MIBs-10.8.0.0.186.zip

The following steps make it possible to use Intel's MIB files for defining SNMP OIDs:

  1. Create a directory for the MIB files and unpack the .mi2 files:

    mkdir -p $HOME/.snmp/mibs
    unzip -d $HOME/.snmp/mibs <download-location>/IntelOPA-STL1_MIBs-10.8.0.0.186.zip
  2. Create a file $HOME/.snmp/snmp.conf that includes entries corresponding to the "DEFINITIONS ::= BEGIN" lines in all the .mi2 files:

    mibs +ICS-CHASSIS-MIB
    mibs +ICS-CHASSIS-NOTIFICATION-MIB
    mibs +ICS-MASTER-MIB
    mibs +OPA-STAT-MIB

    The following one-liner will print the lines needed for $HOME/.snmp/snmp.conf:

    grep DEFINITIONS $HOME/.snmp/mibs/*.mi2 | awk -F: '{print $2}' | awk '{printf("mibs +%s\n", $1)}'
  3. Run the snmpwalk command for the given managed switch name (or IP address) specifying ics as the OID value:

    snmpwalk -Oa -v2c -cpublic <switchname> ics
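
Step 2 above can also be sketched as a small function that prints the snmp.conf lines without duplicates. Like the one-liner, it assumes that the module name is the first word on each "DEFINITIONS ::= BEGIN" line (the function name gen_mibs_conf is our own):

```shell
# Sketch: print one "mibs +<MODULE>" line per MIB module found in the
# .mi2 files of the given directory. gen_mibs_conf is a hypothetical name.
gen_mibs_conf() {
    mibdir=$1
    # $1 is the module name when it shares a line with DEFINITIONS ::= BEGIN
    awk '/DEFINITIONS[ \t]*::=[ \t]*BEGIN/ { printf("mibs +%s\n", $1) }' \
        "$mibdir"/*.mi2 | sort -u
}
```

Usage would be: gen_mibs_conf $HOME/.snmp/mibs > $HOME/.snmp/snmp.conf (note that this overwrites an existing snmp.conf).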

Serial number of managed switches

It is possible to read the switch serial number using the above SNMP setup with this command:

$ snmpwalk -Oa -v2c -cpublic <IP-address> icsChassisSystemUnitFruSerialNumber.1
ICS-CHASSIS-MIB::icsChassisSystemUnitFruSerialNumber.1 = STRING: USFUxxxx
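
For use in scripts, the bare serial number can be cut out of an output line like the one above. A minimal sketch, assuming the "= STRING: value" output format shown (the function name snmp_string_value is our own):

```shell
# Sketch: read snmpwalk output on stdin and print only the STRING values,
# with optional surrounding double quotes removed.
# snmp_string_value is a hypothetical helper name.
snmp_string_value() {
    sed -n '/= STRING: /{s/^.*= STRING: //;s/^"\(.*\)"$/\1/;p;}'
}
```

Usage would be: snmpwalk -Oa -v2c -cpublic <IP-address> icsChassisSystemUnitFruSerialNumber.1 | snmp_string_value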

Unmanaged switch configurations

Only switches with the Management Module installed can be accessed via the RJ45 network port. Unmanaged switches have a deactivated RJ45 port.

See the document Intel® Omni-Path Fabric Suite FastFabric User Guide.

To list all switches in the OPA fabric:

opasaquery -t sw

To configure the switch names, see section 3.5 FastFabric OPA Switch Setup/Admin Menu. Run the command:

opafastfabric
 2) Externally Managed Switch Setup/Admin
   2) Test for Switch Presence
   0) Edit Config and Select/Edit Switch File

A better way is to use the OPA Fabric Manager GUI, see https://wiki.fysik.dtu.dk/niflheim/OmniPath#installation-of-fabric-manager-gui. The fmgui tool allows you to discover the network topology and see which nodes are connected to the unmanaged switches. Then you can assign switch names and edit them in the file /etc/opa/switches.

Notice: The switch names only take effect when the switch is rebooted!

To analyze the present fabric and produce a list of Externally Managed switches in the required format for the /etc/opa/switches file:

opagenswitches

Insert this into the file /etc/opa/switches and review the 2nd field (hostname). The 3rd field is the link hop count, which needs to be correct when you do firmware updates, in order to avoid disrupted firmware upgrades.

Our current switches file contains:

0x0011750102754f00,opa-L2,3
0x001175010275c4ed,opa-L1,3
0x001175010275c70f,opa-L5,3
0x00117501027788dc,opa-L3,3
0x00117501027795db,opa-L6,3
0x0011750102779b92,opa-L4,3
0x001175010277a5fb,opa-L7,1
0x0011750102702b27,opa-L8,3
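
Since a wrong entry can disrupt firmware upgrades, it may be worth checking the file format after editing it by hand. A minimal sketch, assuming the three comma-separated fields shown above (a 0x-prefixed GUID of 16 hex digits, a name, a numeric hop count) and assuming that blank lines and # comment lines are allowed. The function name check_switches_file is our own:

```shell
# Sketch: sanity-check the GUID,name,hop-count format of a switches file.
# Returns non-zero if any line looks malformed.
# check_switches_file is a hypothetical helper, not part of the OPA tools.
check_switches_file() {
    awk -F, '
        /^[ \t]*#/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        NF != 3 || length($1) != 18 || $1 !~ /^0x[0-9a-fA-F]+$/ ||
        $2 == "" || $3 !~ /^[0-9]+$/ {
            printf("line %d looks malformed: %s\n", NR, $0) > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Usage would be: check_switches_file /etc/opa/switches && echo OK. The check only looks at the format; it cannot tell whether the hop counts are correct for the fabric.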

Serial number of unmanaged switches

It is possible to read the switch serial number using the opaswitchadmin command with the hwvpd operation (complete hardware VPD report):

$ opaswitchadmin hwvpd | grep serial

Firmware updates

Firmware update on managed switch

Firmware update requires an FTP server. Assume that the firmware files are stored in the directory pub/OmniPath.

On a Managed switch configure download from the FTP server's IP-address:

fwversion   # Display fw versions
fwSetUpdateParams -c management -h <IP-address> -d pub/OmniPath -f STL1.q7.10.8.0.0.186.spkg
fwShowUpdateParams
fwUpdate

Useful commands:

chassisQuery   # Check presence of module
bootQuery module -all
bootSelect module   # choose image to boot from

When everything looks fine, reboot the switch:

reboot

Updating the Omni-Path Q7 BIOS and Board Controller Firmware

Intel has Omni-Path Q7 BIOS and Board Controller Firmware updates. For upgrade instructions read Appendix A, Q7 BIOS Update, in the Intel® Omni-Path Fabric Switches Hardware Installation Guide (see Switch_documentation).

Performing the updates requires a host with an SFTP client (part of the openssh-clients RPM package). The procedure is documented in Appendix A:

Make sure that the subnet manager (SM) is not running on the switch:

Edge-> smControl
Subnet manager is not started.

Display versions by:

bcfwversion
biosversion

You must read the detailed instructions in Appendix A at this time before performing the upgrade.

Use SFTP on a remote host to upload firmware files to the OPA switch:

$ sftp admin@opa-switch
...
Connected to opa-switch.
sftp>

Upload the BIOS file from the current directory:

sftp> put QA3ER962.q7.spkg /firmware/bios.pkg update

On the OPA switch CLI monitor the update progress until completed (this takes several minutes):

Edge-> showLastBiosRetCode -all
Module System BIOS Last Exit Code: 0: Success

Upload the board controller firmware (BCFW) file from the current directory:

sftp> put CGBCP429.q7.spkg /firmware/bcfw.pkg update

On the OPA switch CLI monitor the update progress until completed:

Edge-> showLastBcFwRetCode -all
Module Board Controller Firmware Last Exit Code: 0: Success

If everything is OK, reboot the switch via CLI:

Edge-> reboot

After rebooting verify versions by:

bcfwversion
biosversion

Firmware update on unmanaged switch

The CLI tool opaswitchadmin is used for unmanaged switch operations, see man opaswitchadmin. This command is part of the RPM package:

rpm -q opa-fastfabric

To select individual or subsets of switches, see the Intel® Omni-Path Fabric Suite FastFabric User Guide section 3.3.2.3 Selection of Switches:

To perform operations against a set of externally-managed switches, you can specify the switch on which to operate using one of the following methods:

  • On the command line, using the -N option.
  • Using the environment variable SWITCHES to specify a space-separated list of switches. Useful when multiple commands are performed against the same small set of switches.
  • Using the -L option or the SWITCHES_FILE environment variable to specify a file containing the set of switches. Useful for groups of switches that are used often. The file is located here: /etc/opa/switches by default. The file must list all switches in the cluster.

Within the tools, the options are considered in the following order:

  1. -N option
  2. SWITCHES environment variable
  3. -L option
  4. SWITCHES_FILE environment variable
  5. /etc/opa/switches file
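
The order above can be illustrated with a small function. This is only a sketch of the documented precedence, not the tools' actual implementation. The function name and the list:/file: output prefixes are our own:

```shell
# Sketch: decide which switch selection wins, following the documented order.
# $1 = value given to -N (or empty), $2 = value given to -L (or empty).
# resolve_switch_source is a hypothetical name used for illustration.
resolve_switch_source() {
    n_opt=$1
    l_opt=$2
    if [ -n "$n_opt" ]; then
        printf 'list:%s\n' "$n_opt"                 # 1. -N option
    elif [ -n "${SWITCHES:-}" ]; then
        printf 'list:%s\n' "$SWITCHES"              # 2. SWITCHES variable
    elif [ -n "$l_opt" ]; then
        printf 'file:%s\n' "$l_opt"                 # 3. -L option
    elif [ -n "${SWITCHES_FILE:-}" ]; then
        printf 'file:%s\n' "$SWITCHES_FILE"         # 4. SWITCHES_FILE variable
    else
        printf 'file:%s\n' /etc/opa/switches        # 5. default file
    fi
}
```

Note the consequence: a SWITCHES variable left over in the environment takes precedence over an explicit -L option.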

Copy the firmware file to the OPA management host, for example:

Intel_PRREdge_V1_firmware.10.8.0.0.186.emfw

To identify leaf/edge switch GUIDs, pull out the tab at the back of the switch and read the GUID.

To list all unmanaged switches in the fabric:

opagenswitches

The output will be copied to /etc/opa/switches by opaswitchadmin.

To ping the unmanaged switches:

/usr/sbin/opaswitchadmin ping

Report switch firmware and hardware info:

/usr/sbin/opaswitchadmin info

Upgrade switch firmware:

/usr/sbin/opaswitchadmin -a run -P Intel_PRREdge_V1_firmware.10.8.0.0.186.emfw upgrade

Reboot switches:

/usr/sbin/opaswitchadmin reboot

WARNING: The Reboot Switch action will reboot all switches. To select a subset of switches, use the -N flag or other options shown above.

To operate on subsets of switches that are defined in the file specified by -L add flags like:

-L /etc/opa/switches
-L /etc/opa/switch-subset
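
A subset file can be derived from the full switches file by matching on the switch name in the 2nd field. A minimal sketch, assuming the GUID,name,hop-count format of /etc/opa/switches (the function name make_switch_subset is our own):

```shell
# Sketch: print the lines of a switches file whose name field (2nd field)
# matches a regular expression.
# $1 = regular expression, $2 = source file (default /etc/opa/switches).
# make_switch_subset is a hypothetical helper, not part of the OPA tools.
make_switch_subset() {
    pattern=$1
    source_file=${2:-/etc/opa/switches}
    awk -F, -v pat="$pattern" '$2 ~ pat' "$source_file"
}
```

Usage would be: make_switch_subset '^opa-L[12]$' > /etc/opa/switch-subset and then, for example, /usr/sbin/opaswitchadmin -L /etc/opa/switch-subset ping to verify the selection before any disruptive operation.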

To install new firmware on unmanaged switches using the TUI tool:

# opafastfabric
  2) Externally Managed Switch Setup/Admin
  4) Update Switch Firmware
  Enter Files/Directories to use (or none): Intel_PRREdge_V1_firmware.10.8.0.0.186.emfw
  6) Reboot Switch  # When all is OK
  7) Report Switch Firmware & Hardware Info  # Verification

WARNING: The Reboot Switch action will reboot all switches.

Niflheim: OmniPath_switches (last edited 2019-09-02 11:43:54 by OleHolmNielsen)