smithi{001..205}

Summary

We have 205 smithi systems. All are used as testnodes except for a few that serve as static Jenkins slaves.

Purchasing details

smithi{001..128} were purchased and owned by Red Hat. (WARRANTY EXPIRED)
smithi{129..205} were purchased by Sage and “loaned” to Red Hat. (WARRANTY VALID THRU 10/20/2019)

Racking Tickets
smithi{001..060} RT 374621
smithi{061..068} RT 390716
smithi{069..128} RT 401820
smithi{129..205} RT 427903

Hardware Specs

Component | Count   | Manufacturer        | Model                            | Capacity | Notes
Chassis   | 1U      | Supermicro          | SYS-5018R-WR                     | N/A      |
Mainboard | N/A     | Supermicro          | X10SRW-F                         | N/A      |
CPU       | 1       | Intel               | Xeon(R) CPU E5-1620 v3 @ 3.50GHz | N/A      | ARK
RAM       | 2 DIMMs | Hynix Semiconductor | HMA42GR7MFR4N-TF                 | 16GB     | 32GB Total
HDD       | 1       | Seagate             | ST1000NM0033                     | 1TB      |
NVMe      | 1       | Intel               | P3700                            | 400GB    |
NIC       | 2 ports | Intel               | I350                             | 1Gb      | Not used
NIC       | 2 ports | Intel               | 82599ES                          | 10Gb     | 1 port cabled per system on front VLAN
BMC       | 1       | Supermicro          | N/A                              | N/A      | Reachable at $host.ipmi.sepia.ceph.com
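
Because each BMC is reachable at $host.ipmi.sepia.ceph.com, the BMC hostnames for a range of systems can be generated rather than typed. This is a minimal sketch, assuming $host is the short hostname (for example smithi001); bmc_hosts is a hypothetical helper name, not an existing lab tool.

```shell
# Print the BMC hostname for every smithi in a numeric range.
# Assumption: $host in the specs table is the short name, e.g. smithi001.
bmc_hosts() {
    first=$1
    last=$2
    for n in $(seq "$first" "$last"); do
        # %03d zero-pads the number to match the smithiNNN naming scheme
        printf 'smithi%03d.ipmi.sepia.ceph.com\n' "$n"
    done
}

# Example: BMC hostnames for smithi001 through smithi003
bmc_hosts 1 3
```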

NVMe card notes

nvme-cli

nvme-cli is untested in this lab, but the project claims packages are available for Fedora 23+ and Ubuntu 16.04 and later.

See https://github.com/linux-nvme/nvme-cli

Checking NVMe Card SMART Data

cd /tmp
git clone https://github.com/linux-nvme/nvme-cli.git
cd nvme-cli/
make
sudo ./nvme smart-log-add /dev/nvme0
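
Besides the vendor-specific smart-log-add output, the standard smart-log subcommand can be screened for the fields most likely to point at a failing card. The sketch below is untested on a smithi and assumes the output consists of "name : value" lines that include critical_warning and media_errors; smart_problems is a hypothetical helper name.

```shell
# Print SMART fields from "nvme smart-log" output (on stdin) that are nonzero.
# Assumption: fields appear as "name : value" lines; not verified on a smithi.
smart_problems() {
    awk -F'[ \t]*:[ \t]*' '
        $1 == "critical_warning" || $1 == "media_errors" {
            v = $2
            sub(/[ \t]+$/, "", v)          # drop trailing whitespace
            if (v !~ /^(0|0x0+)$/)         # anything other than zero is suspicious
                print $1 "=" v
        }
    '
}

# Usage (from the nvme-cli build directory, as above):
#   sudo ./nvme smart-log /dev/nvme0 | smart_problems
```

No output means neither field reported a nonzero value.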

Flashing Firmware

Intel provides an RPM (Intel Datacenter Tool) to configure the NVMe cards, update firmware, etc. The firmware images are bundled in the RPM, so it's important to download the latest zip from Intel whenever possible.

Download the zip file, copy it to the testnode, unzip it, and yum localinstall the applicable isdct RPM.

Show NVMe cards

isdct show -intelssd

Update Firmware Example

[root@smithi001 ~]# isdct show -intelssd

- Intel SSD DC P3700 Series CVFT533000EN400BGN -

Bootloader : 8B1B012E
DevicePath : /dev/nvme0n1
DeviceStatus : Healthy
Firmware : 8DV10131
FirmwareUpdateAvailable : Firmware=8DV10171 Bootloader=8B1B0131
Index : 0
ModelNumber : INTEL SSDPEDMD400G4
ProductFamily : Intel SSD DC P3700 Series
SerialNumber : CVFT533000EN400BGN

## NOTE: Use the Index number from above to specify the drive you want to update

[root@smithi001 ~]# isdct load -intelssd 0
WARNING! You have selected to update the drives firmware! 
Proceed with the update? (Y|N): Y
Updating firmware...

- Intel SSD DC P3700 Series CVFT533000EN400BGN -

Status : Firmware Updated Successfully. Please reboot the system.
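
To find which drives need the update without reading the output by eye, the Index of each drive can be pulled from the show output. This is a minimal sketch that assumes the output always looks like the example above, with a "- ... -" header per drive and a FirmwareUpdateAvailable value containing "Firmware=" when an update exists; drives_needing_update is a hypothetical helper name.

```shell
# Read "isdct show -intelssd" output on stdin and print the Index of every
# drive whose FirmwareUpdateAvailable field names a newer firmware.
# Assumption: the output format matches the example on this page.
drives_needing_update() {
    awk -F' *: *' '
        function emit() {
            if (pending && idx != "") print idx
            pending = 0
            idx = ""
        }
        /^- .* -[ \t]*$/ { emit() }                      # header starts a new drive
        $1 == "FirmwareUpdateAvailable" && $2 ~ /Firmware=/ { pending = 1 }
        $1 == "Index" { idx = $2 }
        END { emit() }
    '
}

# Usage (as root on the testnode):
#   isdct show -intelssd | drives_needing_update
```

Each printed number can then be passed to isdct load -intelssd as in the example above.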

NVMe Failure Tracking

The NVMe cards have started failing at a faster rate. I'm keeping track of when and how often they fail to see if there's a pattern we can interrupt.

System    | Date Failed | Ticket      | Notes
smithi043 | 1/3/2017    | RT 433092   |
smithi048 | 4/19/2017   | RT 444421   |
smithi050 | 4/19/2017   | RT 444421   |
smithi038 | 11/29/2017  | PNT0111325  |
smithi025 | 12/11/2017  | PNT0120775  |
smithi039 | 12/11/2017  | PNT0120775  |
smithi055 | 12/8/2017   | PNT0120775  |
smithi057 | 1/12/2018   | PNT0138316  |
smithi054 | 1/16/2018   | PNT0141432  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi021 | 1/19/2018   | PNT0143431  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi180 | 1/22/2018   | PNT0160022  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi147 | 2/20/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi034 | 3/11/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi020 | 4/9/2018    | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi108 | 4/10/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave.
smithi058 | 4/17/2018   | PNT0288764  |
smithi030 | 4/26/2018   | PNT0288764  |
smithi043 | 5/18/2018   | PNT0288764  |
smithi113 | 5/20/2018   | PNT0288764  |
smithi028 | 6/3/2018    | PNT0288764  |
smithi002 | 6/3/2018    | PNT0288764  |
smithi136 | 6/12/2018   | PNT0288764  |
smithi011 | 8/26/2018   | PNT0416691  | RMA denied due to expired warranty.
smithi048 | 9/2/2018    | PNT0416691  | RMA denied due to expired warranty.
smithi094 | 9/4/2018    | PNT0416691  | RMA denied due to expired warranty. INSTALL Optane 900P
smithi004 | 9/24/2018   | PNT0416691  | RMA denied due to expired warranty.
smithi056 | 10/7/2018   | PNT0416691  | RMA denied due to expired warranty.
smithi010 | 10/11/2018  | PNT0416691  | RMA denied due to expired warranty.
smithi015 | 11/11/2018  | PNT0416691  | RMA denied due to expired warranty.
smithi195 | 12/13/2018  | PNT0514346  | Requested Labs team file RMA
smithi045 | 12/14/2018  | PNT0512415  | INSTALL Optane 900P
smithi016 | 1/1/2019    | PNT0512415  | INSTALL Optane 900P
smithi047 | 2/4/2019    | PNT0512415  | INSTALL Optane 900P
smithi050 | 2/28/2019   | PNT0512415  | INSTALL Optane 900P
smithi029 | 3/4/2019    | PNT0512415  | INSTALL Optane 900P
smithi007 | 3/8/2019    | PNT0512415  | INSTALL Optane 900P
smithi043 | 3/12/2019   | PNT0512415  | INSTALL Optane 900P
smithi091 | 6/18/2020   | RITM0775392 | INSTALL Optane 900P
smithi014 | 8/8/2020    | RITM0775392 | INSTALL Optane 900P
smithi163 | 9/1/2020    | RITM0775392 | INSTALL Optane 900P
smithi118 | 9/25/2020   | RITM0775392 | INSTALL Optane 900P
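
One pattern worth checking is whether the same system loses more than one card. A minimal sketch, assuming the table rows above have been saved to a text file with the system name as the first field; repeat_failures and the file name are hypothetical.

```shell
# Read failure-table rows on stdin and print "count system" for every system
# that appears more than once, most frequent first.
repeat_failures() {
    awk '$1 ~ /^smithi[0-9]+$/ { n[$1]++ }
         END { for (h in n) if (n[h] > 1) print n[h], h }' |
        sort -k1,1nr -k2,2
}

# Usage, with the rows saved to a hypothetical file:
#   repeat_failures < nvme-failures.txt
```

Run against the table above, this should list smithi043 with three failures and smithi048 and smithi050 with two each.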

Reimage Failure Tracking

Teuthology will automatically mark a smithi down if it fails to reimage 10 times in a row. This table tracks whether particular machines keep hitting this problem even after BIOS/firmware updates and power resets.

System    | Date Failed | Notes
smithi180 | 4/1/2022    | Power cycled PDU port, Updated BMC firmware and BIOS
smithi168 | 4/1/2022    |
smithi199 | 4/1/2022    |
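
The "10 times in a row" rule described above amounts to a consecutive-failure counter that resets on any success. The sketch below only illustrates that logic and is not Teuthology's actual code; the "fail" keyword, the one-result-per-line input, and the should_mark_down name are all assumptions.

```shell
# Exit 0 if the results on stdin (one per line, oldest first) end with at
# least $1 consecutive "fail" lines; exit 1 otherwise.
# Illustration only: this is not how Teuthology implements the check.
should_mark_down() {
    awk -v t="$1" '
        $1 == "fail" { streak++; next }   # extend the current run of failures
        { streak = 0 }                    # any other result resets the run
        END { exit !(streak >= t) }
    '
}

# Example: three failures in a row against a threshold of 3
if printf 'ok\nfail\nfail\nfail\n' | should_mark_down 3; then
    echo "mark down"
fi
```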

Donated hardware notes

Some partners have donated hardware for us to test. This table shows which hardware ended up in which smithi.

Node Hardware
smithi009 MACH.2 SAS drives donated by Seagate
smithi091 MACH.2 SAS drives donated by Seagate
smithi205 Seagate drives. Still under NDA. Contact sjust.

Other notes

Mounting remote ISOs

I (dgalloway) copy ISOs to cobbler.front.sepia.ceph.com:/samba/anonymous.

  1. Disable the firewall on cobbler.front.sepia.ceph.com
    1. service iptables stop
  2. In the smithi IPMI web UI, under Virtual Media → CD-ROM Image, set the following parameters:
    1. Share Host: 172.21.0.11
    2. Path: \Anonymous\foo.iso
  3. Click Save
  4. Click Mount
  5. Reboot and, when prompted, press <DEL> to enter the BIOS
  6. In the last BIOS tab, “Save & Exit”, select ATEN Virtual CDROM under Boot Override
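
The Virtual Media values in step 2 follow directly from the ISO file name, so they can be printed instead of typed from memory. A minimal sketch using the share host and path shown above; iso_share_settings is a hypothetical helper name, and the scp line is an assumption about how the ISO is copied.

```shell
# Print the Virtual Media settings for an ISO that has been copied to
# cobbler.front.sepia.ceph.com:/samba/anonymous.
iso_share_settings() {
    iso=$(basename "$1")
    printf 'Share Host: %s\n' 172.21.0.11
    # printf turns each \\ in the format into one literal backslash
    printf 'Path: \\Anonymous\\%s\n' "$iso"
}

# Assumed copy step (not run here):
#   scp foo.iso cobbler.front.sepia.ceph.com:/samba/anonymous/
iso_share_settings /tmp/foo.iso
```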

Updating BIOS

As of this writing, the latest BIOS version is X10SRW7.410 (4/10/2017)

  1. Mount smithibiosx10srw7410.iso using the instructions above
  2. Boot to the Virtual CD
  3. Once you get a DOS prompt, run flash.bat X10SRW7.410
  4. The first run only puts the system in “Flashing Mode”; reboot and run the command again to perform the actual BIOS update
  5. Power the system off, then back on, for the BIOS update to complete
  6. Supermicro suggests restoring the BIOS to default settings and then reconfiguring it, although I've found that some settings (like boot order) get reset automatically anyway
hardware/smithi.txt · Last modified: 2022/04/01 18:51 by djgalloway