Dell C6420 server

This page contains information about Dell PowerEdge C6420 servers deployed in our Niflheim cluster.

Dell OpenManage

Download the OpenManage software ISO image from the C6420_downloads page in the Systems Management download category.

Booting and BIOS configuration

Press F2 during start-up to enter the BIOS and firmware setup menus.

See How to inspect and change BIOS configurations on a PowerEdge C-Series server with SetupBIOS tool.

Minimal configuration of a new server or motherboard

At our site the following minimal settings are required for a new server or a new motherboard. The remaining settings will be configured by SYSCFG (see the Dell_R640 page).

The Dell iDRAC9 (BMC) setup is accessed via the System Setup menu item iDRAC Settings:

  • In the Network page set:
    • NIC Selection: Dedicated
    • Enable IPMI over LAN: Enabled
  • In the System Summary page read the NIC iDRAC MAC Address for configuring the DHCP server.

Go to the System Setup menu item Device Settings and select the Integrated NIC items:

  • In the NIC Main Configuration Page select NIC Configuration. We use NIC port 3 (1 Gbit) as the system's NIC.
  • Read the NIC Ethernet MAC Address from this page for configuring the DHCP server.
  • Select the Legacy Boot Protocol item PXE.

Boot Sequence menu:

  • Click the Boot Sequence item to move PXE boot up above the hard disk boot.

Reconfiguring the network in BIOS

The C6420 factory default network settings have the iDRAC9 port shared with a 1 Gbit LOM Ethernet port.

Our network configuration is designed to be:

  1. A dedicated iDRAC9 port.
  2. The server Ethernet port is port 2 of the 10 Gbit mezzanine card.
  3. The server Ethernet port must do PXE boot before hard disk boot.
  4. BIOS boot (not UEFI boot).

The following procedure has been tested with BIOS 1.6.11 and iDRAC9 firmware 3.21:

  1. During system boot press F2 to go to System Setup.
  2. Enter iDRAC Settings:

    • Network:
      • NIC Selection: Dedicated
    • Optional: Thermal:
      • Thermal Profile: Maximum Performance
  3. Go back and enter Device Settings:

    • Read the Ethernet MAC address of the desired port.
    • Go to NIC in Mezzanine 3 Port 2:
      • NIC Configuration:
        • Legacy Boot Protocol: PXE
        • Wake on LAN: Enabled
  4. Go back to Finish:

    • Reboot the system (mandatory).
  5. During system boot press F2 to go to System Setup.
  6. Enter System Setup:

    • Boot Settings:
      • BIOS Boot Settings:
        • Change Boot Sequence: NIC, then Hard drive C:
  7. Go back to Finish:

    • Reboot the system (mandatory).

C6400 Chassis System Management

Front control panel LED colors

The C6400 chassis front control panel LED colors are undocumented. The page Interpreting LED Colors and Blinking Patterns for the PowerEdge_VRTX product may be useful; see its Server table:

Green, glowing steadily    Turned on
Green, blinking            Firmware is being uploaded
Green, dark                Turned off
Blue, glowing steadily     Normal
Blue, blinking             User-enabled module identifier
Amber, glowing steadily    Not used
Amber, blinking            Fault
Blue, dark                 No fault

DellEMC System Update

The C6400 chassis can be managed using the Dell EMC iDRAC_Tools for Linux.

DellEMC System Update (DSU) is a script-optimized update deployment tool for applying Dell Update Packages (DUP) to Dell EMC PowerEdge servers. See the DSU manuals.

The DSU may also be configured as a Yum repository; see the DSU page. The commands are:

curl -O https://linux.dell.com/repo/hardware/dsu/bootstrap.cgi
bash bootstrap.cgi

Alternatively, download the Systems-Management_Application_* file and execute it.

This will create the Yum repository file:

/etc/yum.repos.d/dell-system-update.repo

Install RPM packages including iDRAC tools:

yum install dell-system-update syscfg srvadmin-idracadm7

racadm command

Make a soft link for the racadm command:

ln -s /opt/dell/srvadmin/bin/idracadm7 /usr/local/bin/racadm

Chassis information

Get the chassis information:

racadm get System.ChassisInfo

Get the chassis power information:

racadm get System.SC-BMC

Inquire the system power:

racadm get System.Power
racadm get System.ServerPwr

Inquire thermal and fan settings:

racadm get System.ThermalSettings

Inquire the Thermal Profile:

racadm get System.ThermalSettings.ThermalProfile

Values for the ThermalProfile parameter are:

  • 0: Default Thermal Profile Settings (this is the default value)
  • 1: Maximum Performance
  • 2: Minimum Power
  • 3: Sound Cap

Use the racadm set command to change the value.
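
As a small sketch of how the numeric values might be handled in a script, the helper below translates a ThermalProfile number into the names listed above. The function name thermal_profile_name is hypothetical, and the commented set line assumes the usual "racadm set <attribute> <value>" form; verify it against your iDRAC firmware before relying on it.

```shell
# Hypothetical helper: translate a ThermalProfile number into its name,
# using the value list given on this page.
thermal_profile_name() {
    case "$1" in
        0) echo "Default Thermal Profile Settings" ;;
        1) echo "Maximum Performance" ;;
        2) echo "Minimum Power" ;;
        3) echo "Sound Cap" ;;
        *) echo "Unknown"; return 1 ;;
    esac
}

# Example of changing the profile (assumed syntax, not verified on every
# firmware release):
#   racadm set System.ThermalSettings.ThermalProfile 1
thermal_profile_name 1
```

The last line prints the name for value 1, the profile used in the BIOS procedure earlier on this page.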

C6400 firmware upgrade

In the C6400_drivers page find Chassis System Management firmware. Read the firmware Release Notes file.

Get the chassis information (including firmware):

racadm get System.ChassisInfo

You may also verify the CM chassis firmware version by:

$ ipmitool raw 0x30 0x12
01 e9 1b 02 28 01 00 00 00 01 02 00 01 2d 37 ff
ff 08 c2 00 00 00 08 01 08 10 64 23 fa 01

Interpretation:

  • Byte 4 - Major version (02 hex) - 02
  • Byte 5 - Minor version (28 hex) - 40
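
The interpretation above can be sketched as a small shell function that converts bytes 4 and 5 from hex to decimal. The name cm_version is hypothetical, and the two-digit minor format is an assumption based only on the single example shown here.

```shell
# Hypothetical helper: decode the CM firmware version from the raw hex
# bytes read on stdin. Bytes are counted from 1, as in the list above.
cm_version() {
    # Split the hex dump into positional parameters ($1 = byte 1, ...)
    set -- $(cat)
    # Bytes 4 and 5 are hex; print them as decimal major.minor
    printf '%d.%02d\n' "0x$4" "0x$5"
}

# Usage on a node (requires ipmitool):
#   ipmitool raw 0x30 0x12 | cm_version
echo "01 e9 1b 02 28 01 00 00 00 01 02 00 01 2d 37 ff" | cm_version
```

With the sample bytes from this page the function prints 2.40.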

Install firmware updates by:

racadm update -f cm.sc

Note: It may take 5-10 minutes after upgrading before the new firmware is active on the chassis.

Since one C6400 chassis holds up to four C6420 nodes, it is useful to loop over every 4th node using clush. For example, the names of every 4th node in c[001-192] may be generated by seq:

seq -f 'c%03g' -s, 1 4 192

This can be used with clush as in this example:

clush -bw `seq -f 'c%03g' -s, 1 4 192` 'racadm get System.ChassisInfo | grep FirmwareVersion'
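
For other node ranges the seq call can be wrapped in a function. This is only a sketch: the name every_fourth_node is hypothetical, and it assumes GNU seq and the c%03g naming scheme used above.

```shell
# Hypothetical helper: comma-separated list of every 4th node name
# between FIRST ($1) and LAST ($2), using the c%03g naming shown above.
every_fourth_node() {
    seq -f 'c%03g' -s, "$1" 4 "$2"
}

every_fourth_node 1 16

# Possible use with clush (same remote command as in the example above):
#   clush -bw "$(every_fourth_node 1 192)" 'racadm get System.ChassisInfo | grep FirmwareVersion'
```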

C6400 Chassis Power Supplies

The C6400 chassis contains two power supplies of either 1600, 2000 or 2400 W. The C6400 factory default PSU configuration is:

Dual, HotPlug Redundant Power Supply

The PSU handle LED codes are:

  • Off: Input fail / System off
  • Solid Green: Input OK / System on
  • Blinking Amber: PSU failsafe failure
  • Blinking Green: Firmware updating
  • Blinking Amber: Firmware update failed
  • Blinking Green and Off: PSU mismatch

Internal Dell information

The 1600 W PSU is special since it can be configured in 1+1 (default) redundancy mode, as well as a non-redundant 2+0 mode.

To set non-redundant 2+0 mode:

ipmitool -I wmi 0x30 0xC7 0x30 0x2 0x0

Set back to 1+1 redundant mode:

ipmitool -I wmi 0x30 0xC7 0x30 0x1 0x1

To activate a new mode, the C6400 Chassis Manager must be reset by:

ipmitool -I wmi 0x6 0x34 0x45 0x70 0x18 0xc8 0x20 0x0 0x2 0xd8

Disk firmware upgrade without RAID controller

Our C6420 servers have a SATA SSD disk connected to a simple S140 HBA controller. The Linux firmware update xxx.BIN file apparently can only update firmware on disks connected to a PERC RAID controller.

The alternative is to upgrade the disk firmware through the iDRAC9 controller using the racadm update command shown above, together with the upgrade file in Windows 32-bit .EXE format.

For example:

racadm update -f /home/que/Dell/C6420/Serial-ATA_Firmware_8VGP8_WN32_DL63_A00.EXE --reboot

NOTE: This upgrade does not work in a non-interactive crontab job. You have to run the racadm command from an interactive shell; running it through ssh or clush also works.

iDRAC Easy Restore

See the iDRAC9 User's Guide:

After you replace the motherboard on your server, Easy Restore allows you to automatically restore the following data:

  • System Service Tag
  • Asset Tag
  • Licenses data
  • UEFI Diagnostics application
  • System configuration settings—BIOS, iDRAC, and NIC

Easy Restore uses the Easy Restore flash memory to back up the data. When you replace the motherboard and power on the system, the BIOS queries the iDRAC and prompts you to restore the backed-up data. The first BIOS screen prompts you to restore the Service Tag, licenses, and UEFI diagnostic application. The second BIOS screen prompts you to restore system configuration settings. If you choose not to restore data on the first BIOS screen and if you do not set the Service Tag by another method, the first BIOS screen is displayed again. The second BIOS screen is displayed only once.

Memory DIMM errors

In the BMC and iDRAC event log there may be DIMM errors such as:

  • Correctable memory error rate exceeded for DIMM_XX
  • Warning - MEM0701 - "Correctable memory error rate exceeded for DIMM_XX."
  • Critical - MEM0702 - "Correctable memory error rate exceeded for DIMM_XX."

NOTE: Do not swap DIMM modules as the first action!

According to the internal Dell KB article QNA44643 (What is DDR4 Self-healing on Dell PowerEdge Servers with Intel Xeon Scalable Processors), BIOS 2.1.x and newer performs a memory "self-healing" operation during reboot. See also the paper Memory Errors and Dell EMC PowerEdge YX4X Server Memory RAS Features.

Memory retraining enhancements

Memory retraining, which happens during boot, optimizes the signal timing/margining for each DIMM/slot for best access. Timing characteristics of a DIMM may change for several different reasons:

  • Changes in Server memory configuration
  • BIOS changes
  • Different operating temperatures of the Server or DIMM
  • General age of the DIMM

Post Package Repair (PPR)

Post Package Repair (PPR) is the second "self-healing" memory enhancement. It repairs a failing memory location on a DIMM by disabling the location/address at the hardware layer and enabling a spare memory row to be used instead. The exact number of spare memory rows available depends on the DRAM device and the DIMM size.

Note: Message ID MEM8000 (Correctable memory error logging disabled for a memory device at location DIMM_XX) by itself will not result in a PPR being scheduled for the next reboot. Starting with the BIOS version after 2.3.10 (February 2020), this message ID is expected to result in a PPR being scheduled for the next reboot.

After the reboot, verify that the PPR operation was successfully performed. An example of a successful PPR operation will be similar to:

  • Message ID MEM9060 - "The Post-Package Repair operation is successfully completed on the Dual In-line memory Module (DIMM) device that was failing earlier."

A DIMM replacement for these correctable memory errors is not necessary unless the PPR operation failed after the reboot. An example of a failing PPR message is:

  • Critical - Message ID UEFI10278 - "Unable to complete the Post Package Repair (PPR) operation because of an issue in the DIMM memory slot X."
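
The decision logic described in this section can be sketched as a filter over saved event log text. The function name classify_mem_events is hypothetical, and the sketch assumes that each message ID appears somewhere on a line of the log text (for example, log output saved from the iDRAC); the suggested actions simply restate the guidance above.

```shell
# Hypothetical helper: read event log lines on stdin and print a suggested
# action for every line that mentions one of the message IDs discussed above.
# Lines without a known message ID are ignored.
classify_mem_events() {
    while IFS= read -r line; do
        case "$line" in
            *MEM0701*|*MEM0702*) echo "reboot first, do not swap DIMMs: $line" ;;
            *MEM9060*)           echo "PPR succeeded: $line" ;;
            *UEFI10278*)         echo "PPR failed, consider DIMM replacement: $line" ;;
        esac
    done
}

# Example with two made-up log lines (illustration only, not real log output):
printf '%s\n' 'Warning MEM0701 DIMM_A1' 'Info MEM9060 DIMM_A1' | classify_mem_events
```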

IT-wiki: Dell_C6420 (last edited 2020-06-02 09:35:04 by OleHolmNielsen)