====== smithi{001..205} ======

===== Summary =====
We have 205 smithi systems used solely as testnodes, except for a few used as static Jenkins slaves.

===== Purchasing details =====
smithi{001..128} were purchased by and are owned by Red Hat. (WARRANTY EXPIRED)\\
smithi{129..205} were purchased by Sage and "loaned" to Red Hat. (WARRANTY VALID THRU 10/20/2019)

**Racking Tickets**\\
smithi{001..060} RT 374621\\
smithi{061..068} RT 390716\\
smithi{069..128} RT 401820\\
smithi{129..205} RT 427903\\

===== Hardware Specs =====
|            ^ Count    ^ Manufacturer        ^ Model                            ^ Capacity ^ Notes ^
^ Chassis    | 1U       | Supermicro          | SYS-5018R-WR                     | N/A      | |
^ Mainboard  | N/A      | Supermicro          | X10SRW-F                         | N/A      | |
^ CPU        | 1        | Intel               | Xeon(R) CPU E5-1620 v3 @ 3.50GHz | N/A      | [[http://ark.intel.com/products/82763/Intel-Xeon-Processor-E5-1620-v3-10M-Cache-3_50-GHz|ARK]] |
^ RAM        | 2 DIMMs  | Hynix Semiconductor | HMA42GR7MFR4N-TF                 | 16GB     | 32GB Total |
^ HDD        | 1        | Seagate             | ST1000NM0033                     | 1TB      | |
^ NVMe       | 1        | Intel               | P3700                            | 400GB    | |
^ NIC        | 2 ports  | Intel               | I350                             | 1Gb      | Not used |
^ NIC        | 2 ports  | Intel               | 82599ES                          | 10Gb     | 1 port cabled per system on front VLAN |
^ BMC        | 1        | Supermicro          | N/A                              | N/A      | Reachable at $host.ipmi.sepia.ceph.com |

===== NVMe card notes =====
==== nvme-cli ====
nvme-cli is untested here, but packages are reportedly available in Fedora 23+ and Ubuntu 16.04 and up. See https://github.com/linux-nvme/nvme-cli

=== Checking NVMe Card SMART Data ===
<code>
cd /tmp
git clone https://github.com/linux-nvme/nvme-cli.git
cd nvme-cli/
make
sudo ./nvme smart-log-add /dev/nvme0
</code>

==== Flashing Firmware ====
Intel provides an RPM ([[https://downloadcenter.intel.com/download/23931/Intel-SSD-Data-Center-Tool|Intel Datacenter Tool]]) to configure the NVMe cards, update firmware, etc. The firmware images are bundled into the RPM, so it's important to download the latest zip from Intel whenever possible.

Download the zip file, copy it to the testnode, unzip it, and ''yum localinstall'' the applicable isdct RPM.

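Once the tool is installed, the per-drive fields printed by ''isdct show -intelssd'' (see the example output further down this page) can be condensed to one line per drive before picking an Index to flash. This is only a minimal sketch: it assumes the output keeps the ''Key : Value'' layout shown on this page, with a ''- ... -'' banner line starting each drive, and ''summarize_isdct'' is a hypothetical helper name, not part of isdct.

```shell
# summarize_isdct: read `isdct show -intelssd` output on stdin and print
# one tab-separated line per drive: Index, Firmware, FirmwareUpdateAvailable.
# Assumption: one "Key : Value" pair per line and a "- <drive name> -" banner
# before each drive, as in the example output on this page.
summarize_isdct() {
  awk -F ' : ' '
    # Print the drive collected so far (if any) and reset the fields.
    function emit() {
      if (seen) printf "%s\t%s\t%s\n", idx, fw, avail
      seen = 0; idx = ""; fw = ""; avail = ""
    }
    /^- .* -$/                      { emit(); seen = 1; next }
    $1 == "Index"                   { idx = $2 }
    $1 == "Firmware"                { fw = $2 }
    $1 == "FirmwareUpdateAvailable" { avail = $2 }
    END                             { emit() }
  '
}
```

Typical use would be ''isdct show -intelssd | summarize_isdct''. If a different isdct release pads or reorders the fields, the patterns above would need adjusting.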
**Show NVMe cards**
<code>
isdct show -intelssd
</code>

**Update Firmware Example**
<code>
[root@smithi001 ~]# isdct show -intelssd

- Intel SSD DC P3700 Series CVFT533000EN400BGN -

Bootloader : 8B1B012E
DevicePath : /dev/nvme0n1
DeviceStatus : Healthy
Firmware : 8DV10131
FirmwareUpdateAvailable : Firmware=8DV10171 Bootloader=8B1B0131
Index : 0
ModelNumber : INTEL SSDPEDMD400G4
ProductFamily : Intel SSD DC P3700 Series
SerialNumber : CVFT533000EN400BGN

## NOTE: Use the Index number from above to specify the drive you want to update
[root@smithi001 ~]# isdct load -intelssd 0
WARNING! You have selected to update the drives firmware!
Proceed with the update? (Y|N): Y
Updating firmware...

- Intel SSD DC P3700 Series CVFT533000EN400BGN -

Status : Firmware Updated Successfully. Please reboot the system.
</code>

==== NVMe Failure Tracking ====
The NVMe cards have started failing at a faster rate. I'm keeping track of when and how often to see if there's a pattern we can interrupt.

^ System    ^ Date Failed ^ Ticket      ^ Notes ^
| smithi043 | 1/3/2017    | RT 433092   | |
| smithi048 | 4/19/2017   | RT 444421   | |
| smithi050 | 4/19/2017   | RT 444421   | |
| smithi038 | 11/29/2017  | PNT0111325  | |
| smithi025 | 12/11/2017  | PNT0120775  | |
| smithi039 | 12/11/2017  | PNT0120775  | |
| smithi055 | 12/8/2017   | PNT0120775  | |
| smithi057 | 1/12/2018   | PNT0138316  | |
| smithi054 | 1/16/2018   | PNT0141432  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi021 | 1/19/2018   | PNT0143431  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi180 | 1/22/2018   | PNT0160022  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi147 | 2/20/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi034 | 3/11/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi020 | 4/9/2018    | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi108 | 4/10/2018   | PNT0214971  | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. |
| smithi058 | 4/17/2018   | PNT0288764  | |
| smithi030 | 4/26/2018   | PNT0288764  | |
| smithi043 | 5/18/2018   | PNT0288764  | |
| smithi113 | 5/20/2018   | PNT0288764  | |
| smithi028 | 6/3/2018    | PNT0288764  | |
| smithi002 | 6/3/2018    | PNT0288764  | |
| smithi136 | 6/12/2018   | PNT0288764  | |
| smithi011 | 8/26/2018   | PNT0416691  | RMA denied due to expired warranty. |
| smithi048 | 9/2/2018    | PNT0416691  | RMA denied due to expired warranty. |
| smithi094 | 9/4/2018    | PNT0416691  | RMA denied due to expired warranty. INSTALL Optane 900P |
| smithi004 | 9/24/2018   | PNT0416691  | RMA denied due to expired warranty. |
| smithi056 | 10/7/2018   | PNT0416691  | RMA denied due to expired warranty. |
| smithi010 | 10/11/2018  | PNT0416691  | RMA denied due to expired warranty. |
| smithi015 | 11/11/2018  | PNT0416691  | RMA denied due to expired warranty. |
| smithi195 | 12/13/2018  | PNT0514346  | Requested Labs team file RMA |
| smithi045 | 12/14/2018  | PNT0512415  | INSTALL Optane 900P |
| smithi016 | 1/1/2019    | PNT0512415  | INSTALL Optane 900P |
| smithi047 | 2/4/2019    | PNT0512415  | INSTALL Optane 900P |
| smithi050 | 2/28/2019   | PNT0512415  | INSTALL Optane 900P |
| smithi029 | 3/4/2019    | PNT0512415  | INSTALL Optane 900P |
| smithi007 | 3/8/2019    | PNT0512415  | INSTALL Optane 900P |
| smithi043 | 3/12/2019   | PNT0512415  | INSTALL Optane 900P |
| smithi091 | 6/18/2020   | RITM0775392 | INSTALL Optane 900P |
| smithi014 | 8/8/2020    | RITM0775392 | INSTALL Optane 900P |
| smithi163 | 9/1/2020    | RITM0775392 | INSTALL Optane 900P |
| smithi118 | 9/25/2020   | RITM0775392 | INSTALL Optane 900P |

==== Reimage Failure Tracking ====
Teuthology will automatically mark a smithi down if it fails to reimage 10 times in a row. This table is used to track if particular machines keep hitting this problem even after BIOS/firmware updates and power resets.

^ System    ^ Date Failed ^ Notes ^
| smithi180 | 4/1/2022    | Power cycled PDU port, updated BMC firmware and BIOS |
| smithi168 | 4/1/2022    | |
| smithi199 | 4/1/2022    | |

==== Donated hardware notes ====
Some partners have donated hardware for us to test. This table shows what hardware ended up in which smithi.

^ Node      ^ Hardware ^
| smithi009 | MACH.2 SAS drives donated by Seagate |
| smithi091 | MACH.2 SAS drives donated by Seagate |
| smithi205 | Seagate drives. Still under NDA. Contact sjust. |

===== Other notes =====
==== Mounting remote ISOs ====
I (dgalloway) copy ISOs to ''cobbler.front.sepia.ceph.com:/samba/anonymous''.

  - Disable the firewall on cobbler.front.sepia.ceph.com
    - ''service iptables stop''
  - In the smithi IPMI web UI, under Virtual Media -> CD-ROM Image, set the following parameters:
    - **Share Host**: 172.21.0.11
    - **Path**: ''\Anonymous\foo.iso''
  - Click **Save**
  - Click **Mount**
  - Reboot and enter the BIOS, when prompted, by hitting ****
  - In the last BIOS tab, "Save & Exit", select **ATEN Virtual CDROM** under **Boot Override**

===== Updating BIOS =====
As of this writing, the latest BIOS version is X10SRW7.410 (4/10/2017).

  - Mount ''smithibiosx10srw7410.iso'' using the instructions above
  - Boot to the Virtual CD
  - Once you get a DOS prompt, run ''flash.bat X10SRW7.410''
    - The first time, it'll put the system in "Flashing Mode" and you'll have to reboot and run this again for the actual BIOS update
  - Shut the system **OFF** then back on for the BIOS update to complete
  - Supermicro suggests restoring the BIOS to default settings and then setting it back up, although I've found some settings (like boot order) get reset automatically anyway
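
Because the ISO mount and BIOS steps above go through each node's IPMI web UI, it can help to generate the BMC hostnames for a batch of systems before starting. The following is a minimal sketch, assuming the ''$host.ipmi.sepia.ceph.com'' pattern from the Hardware Specs table and three-digit zero-padded node numbers; ''bmc_hosts'' is a hypothetical helper name, not an existing lab tool.

```shell
# bmc_hosts FIRST LAST: print the BMC hostname of every smithi in the range.
# Pass plain integers (e.g. 8, not 008); the zero padding is added here.
# Assumption: BMCs follow the $host.ipmi.sepia.ceph.com naming pattern.
bmc_hosts() {
  first=$1
  last=$2
  n=$first
  while [ "$n" -le "$last" ]; do
    printf 'smithi%03d.ipmi.sepia.ceph.com\n' "$n"
    n=$((n + 1))
  done
}
```

For example, ''bmc_hosts 1 3'' prints the BMC names for smithi001 through smithi003, one per line.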