This shows you the differences between two versions of the page.
| hardware:trial [2026/01/20 21:51] djgalloway [Table] | hardware:trial [2026/02/12 19:17] (current) djgalloway [Updating BIOS] | ||
|---|---|---|---|
| Line 19: | Line 19: | ||
| ^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com | | ^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com | | ||
| + | ===== Updating BIOS ===== | ||
| - | ===== NVMe card notes ===== | + | Most recently run from ''soko03:/home/dgalloway'': |
| - | ==== nvme-cli ==== | + | |
| - | This is untested but claims to have packages available in Fedora 23+ and Ubuntu 16.04 and up. | + | |
| - | See https://github.com/linux-nvme/nvme-cli | ||
| - | |||
| - | === Checking NVMe Card SMART Data === | ||
| <code> | <code> | ||
| - | cd /tmp | + | ./saa -l bmcs.txt -u inktank -p xxx -c UpdateBios --preserve_setting --file ~/BIOS_H13SRD-1C99_20251223_1.8_STDsp.bin --batch_count 10 |
| - | git clone https://github.com/linux-nvme/nvme-cli.git | + | |
| - | cd nvme-cli/ | + | |
| - | make | + | |
| - | sudo ./nvme smart-log-add /dev/nvme0 | + | |
| </code> | </code> | ||
| - | ==== Flashing Firmware ==== | + | ===== KVM access ===== |
| - | Intel provides an RPM ([[https://downloadcenter.intel.com/download/23931/Intel-SSD-Data-Center-Tool|Intel Datacenter Tool]]) to configure the NVMe cards, update firmware, etc. The firmwares are baked into the RPM so it's important to download the latest zip from Intel whenever possible. | + | |
| - | + | ||
| - | Download the zip file, copy to testnode, unzip and ''yum localinstall'' the applicable isdct RPM. | + | |
| - | + | ||
| - | **Show NVMe cards** | + | |
| - | <code>isdct show -intelssd</code> | + | |
| - | + | ||
| - | **Update Firmware Example** | + | |
| - | <code> | + | |
| - | [root@smithi001 ~]# isdct show -intelssd | + | |
| - | + | ||
| - | - Intel SSD DC P3700 Series CVFT533000EN400BGN - | + | |
| - | + | ||
| - | Bootloader : 8B1B012E | + | |
| - | DevicePath : /dev/nvme0n1 | + | |
| - | DeviceStatus : Healthy | + | |
| - | Firmware : 8DV10131 | + | |
| - | FirmwareUpdateAvailable : Firmware=8DV10171 Bootloader=8B1B0131 | + | |
| - | Index : 0 | + | |
| - | ModelNumber : INTEL SSDPEDMD400G4 | + | |
| - | ProductFamily : Intel SSD DC P3700 Series | + | |
| - | SerialNumber : CVFT533000EN400BGN | + | |
| - | + | ||
| - | ## NOTE: Use the Index number from above to specify the drive you want to update | + | |
| - | + | ||
| - | [root@smithi001 ~]# isdct load -intelssd 0 | + | |
| - | WARNING! You have selected to update the drives firmware! | + | |
| - | Proceed with the update? (Y|N): Y | + | |
| - | Updating firmware... | + | |
| - | + | ||
| - | - Intel SSD DC P3700 Series CVFT533000EN400BGN - | + | |
| - | + | ||
| - | Status : Firmware Updated Successfully. Please reboot the system. | + | |
| - | + | ||
| - | </code> | + | |
| - | + | ||
| - | ==== NVMe Failure Tracking ==== | + | |
| - | The NVMe cards have started failing at a faster rate. I'm keeping track of when and how often to see if there's a pattern we can interrupt. | + | |
| - | + | ||
| - | ^ System ^ Date Failed ^ Ticket ^ Notes ^ | + | |
| - | | smithi043 | 1/3/2017 | RT 433092 | | | + | |
| - | | smithi048 | 4/19/2017 | RT 444421 | | | + | |
| - | | smithi050 | 4/19/2017 | RT 444421 | | | + | |
| - | | smithi038 | 11/29/2017 | PNT0111325 | | | + | |
| - | | smithi025 | 12/11/2017 | PNT0120775 | | | + | |
| - | | smithi039 | 12/11/2017 | PNT0120775 | | | + | |
| - | | smithi055 | 12/8/2017 | PNT0120775 | | | + | |
| - | | smithi057 | 1/12/2018 | PNT0138316 | | | + | |
| - | | smithi054 | 1/16/2018 | PNT0141432 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi021 | 1/19/2018 | PNT0143431 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi180 | 1/22/2018 | PNT0160022 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi147 | 2/20/2018 | PNT0214971 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi034 | 3/11/2018 | PNT0214971 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi020 | 4/9/2018 | PNT0214971 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi108 | 4/10/2018 | PNT0214971 | Card is EOL. Supermicro issued credit. Repurposed as Jenkins slave. | | + | |
| - | | smithi058 | 4/17/2018 | PNT0288764 | | | + | |
| - | | smithi030 | 4/26/2018 | PNT0288764 | | | + | |
| - | | smithi043 | 5/18/2018 | PNT0288764 | | | + | |
| - | | smithi113 | 5/20/2018 | PNT0288764 | | | + | |
| - | | smithi028 | 6/3/2018 | PNT0288764 | | | + | |
| - | | smithi002 | 6/3/2018 | PNT0288764 | | | + | |
| - | | smithi136 | 6/12/2018 | PNT0288764 | | | + | |
| - | | smithi011 | 8/26/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi048 | 9/2/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi094 | 9/4/2018 | PNT0416691 | RMA denied due to expired warranty. INSTALL Optane 900P | | + | |
| - | | smithi004 | 9/24/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi056 | 10/7/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi010 | 10/11/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi015 | 11/11/2018 | PNT0416691 | RMA denied due to expired warranty. | | + | |
| - | | smithi195 | 12/13/2018 | PNT0514346 | Requested Labs team file RMA | | + | |
| - | | smithi045 | 12/14/2018 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi016 | 1/1/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi047 | 2/4/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi050 | 2/28/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi029 | 3/4/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi007 | 3/8/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi043 | 3/12/2019 | PNT0512415 | INSTALL Optane 900P | | + | |
| - | | smithi091 | 6/18/2020 | RITM0775392 | INSTALL Optane 900P | | + | |
| - | | smithi014 | 8/8/2020 | RITM0775392 | INSTALL Optane 900P | | + | |
| - | | smithi163 | 9/1/2020 | RITM0775392 | INSTALL Optane 900P | | + | |
| - | | smithi118 | 9/25/2020 | RITM0775392 | INSTALL Optane 900P | | + | |
| - | + | ||
| - | ==== Reimage Failure Tracking ==== | + | |
| - | Teuthology will automatically mark a smithi down if it fails to reimage 10 times in a row. This table is used to track if particular machines keep hitting this problem even after BIOS/firmware updates and power resets. | + | |
| - | + | ||
| - | ^ System ^ Date Failed ^ Notes ^ | + | |
| - | | smithi180 | 4/1/2022 | Power cycled PDU port, Updated BMC firmware and BIOS | | + | |
| - | | smithi168 | 4/1/2022 | | | + | |
| - | | smithi199 | 4/1/2022 | | | + | |
| - | + | ||
| - | ==== Donated hardware notes ==== | + | |
| - | Some partners have donated hardware for us to test. This table shows what and where ended up in smithis. | + | |
| - | + | ||
| - | ^ Node ^ Hardware ^ | + | |
| - | | smithi009 | MACH.2 SAS drives donated by Seagate | | + | |
| - | | smithi091 | MACH.2 SAS drives donated by Seagate | | + | |
| - | | smithi205 | Seagate drives. Still under NDA. Contact sjust. | | + | |
| - | + | ||
| - | ===== Other notes ===== | + | |
| - | ==== Mounting remote ISOs ==== | + | |
| - | I (dgalloway) copy ISOs to ''cobbler.front.sepia.ceph.com:/samba/anonymous''. | + | |
| - | - Disable the firewall on cobbler.front.sepia.ceph.com | + | |
| - | - ''service iptables stop'' | + | |
| - | - In the smithi IPMI web UI, under Virtual Media -> CD-ROM Image, set the following parameters: | + | |
| - | - **Share Host**: 172.21.0.11 | + | |
| - | - **Path**: ''\Anonymous\foo.iso'' | + | |
| - | - Click **Save** | + | |
| - | - Click **Mount** | + | |
| - | - Reboot and enter the BIOS, when prompted, by hitting **<DEL>** | + | |
| - | - In the last BIOS tab, "Save & Exit", select **ATEN Virtual CDROM** under **Boot Override** | + | |
| - | + | ||
| - | ===== Updating BIOS ===== | + | |
| - | As of this writing, the latest BIOS version is X10SRW7.410 (4/10/2017) | + | |
| - | - Mount ''smithibiosx10srw7410.iso'' using the instructions above | + | These systems are fairly dense blades, so the back panel has no dedicated VGA/HDMI/USB connectors. Instead, each blade has a custom wide connector for a 'dongle' breakout cable, supplied with the chassis, that breaks out to VGA, serial, and two USB 2.0 ports. A set of these cables is kept in the Poughkeepsie lab; the lab staff should be able to find them. As of Feb 2026 they were "In the crib in Rolling Bin 3 Shelf 3A3. Theres a white paper taped to the shelf that says CEPH 3A4-3A1". |
| - | - Boot to the Virtual CD | + | |
| - | - Once you get a DOS prompt, run ''flash.bat X10SRW7.410'' | + | |
| - | - The first time, it'll put the system in "Flashing Mode" and you'll have to reboot and run this again for the actual BIOS update | + | |
| - | - Shut the system **OFF** then back on for the BIOS update to complete | + | |
| - | - Supermicro suggests restoring BIOS to default settings then setting back up although I've found some settings get reset automatically anyway (like boot order) | + | |