This shows you the differences between two versions of the page.
| | hardware:reesi [2018/02/16 21:51] djgalloway [Table] | | hardware:reesi [2019/08/01 23:49] (current) djgalloway |
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== reesi{001..006} ====== | ====== reesi{001..006} ====== | ||
| ===== Summary ===== | ===== Summary ===== | ||
| - | The Sepia lab has 6 storage hosts purchased January 2018 to replace the miras used in the [[services:LONG_RUNNING_CLUSTER]]. | + | The Sepia lab has 6 storage hosts purchased December 2017 to replace the miras used in the [[services:LONGRUNNINGCLUSTER]]. | 
| ===== Purchasing details ===== | ===== Purchasing details ===== | ||
| Line 10: | Line 10: | ||
| ===== Hardware Specs ===== | ===== Hardware Specs ===== | ||
| - | | ^ Count ^ Manufacturer  ^ Model ^ Capacity  ^ Notes ^ | + | | ^ Count ^ Manufacturer  ^ Model ^ Capacity  ^ Notes ^ | 
| - | ^ Chassis  | 2U | Supermicro  | SSG-6028R-E1CR12H  | N/A |  | | + | ^ Chassis  | 2U | Supermicro  | SSG-6028R-E1CR12H  | N/A |  | | 
| - | ^ Mainboard  | N/A | Supermicro  | X10DRH-iT  | N/A |  | | + | ^ Mainboard  | N/A | Supermicro  | X10DRH-iT  | N/A |  | | 
| - | ^ CPU | 1 | Intel | Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz  | N/A | [[https://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz|ARK]]  | | + | ^ CPU | 1 | Intel | Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz  | N/A | [[https://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz|ARK]]  | | 
| - | ^ RAM | 4 DIMMs | Samsung  | M393A2G40EB1-CRC  | 16GB | 64GB total | | + | ^ RAM | 4 DIMMs | Samsung  | M393A2G40EB1-CRC  | 16GB | 64GB total | | 
| - | ^ SSD | 2 | Intel | SSDSC2BB150G7 (S3520)  | 150GB | Software RAID1 for OS | | + | ^ SSD | 2 | Intel | SSDSC2BB150G7 (S3520)  | 150GB | Software RAID1 for OS | | 
| - | ^ HDD | 12 | Seagate  | ST4000NM0025  | 4TB | SAS 7200RPM for OSDs | | + | ^ HDD | 11 | Seagate  | ST4000NM0025  | 4TB | SAS 7200RPM for OSDs | | 
| - | ^ NVMe | 1 | Micron  | MTFDHBG800MCG-1AN1ZABYY  | 800GB | Carved up as logical volumes.  400GB as an OSD and the other 400GB divided by 12 for HDD OSD journals  | | + | ^ HDD | 1 | HGST | HUH721212AL5200  | 12TB | SAS 7200RPM added 1AUG2019 at Brett's request.  | | 
| - | ^ NIC | 2 ports | Intel | X540-AT2  | 10Gb | RJ45 (not used) | | + | ^ NVMe | 1 | Micron  | MTFDHBG800MCG-1AN1ZABYY  | 800GB | Carved up as logical volumes on two partitions. 400GB as an OSD and the other 400GB divided by 12 for HDD OSD journals  | | 
| - | ^ NIC | 2 ports | Intel | 82599ES  | 10Gb | 1 port cabled per system on front VLAN | | + | ^ NIC | 2 ports | Intel | X540-AT2  | 10Gb | RJ45 (not used) | | 
| - | ^ BMC | 1 | Supermicro  | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com  | | + | ^ NIC | 2 ports | Intel | 82599ES  | 10Gb | 1 port cabled per system on front VLAN | | 
| + | ^ BMC | 1 | Supermicro  | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com  | | ||
| - | ===== NVMe card notes ===== | + | ===== OSD/Block Device Information ===== | 
| - | ==== nvme-cli ==== | + | The reesi have 11x 4TB HDD, 1x 12TB HDD, and 1x 800GB NVMe. | 
| - | This is untested but claims to have packages available in Fedora 23+ and Ubuntu 16.04 and up. | + | |
| - | See https://github.com/linux-nvme/nvme-cli | + | The 12TB drives were added so that we can say we're testing on drives larger than 8TB. | 
| + | |||
| + | The NVMe device is split into two equal partitions: | ||
| + | - Split into 12 LVM logical volumes, one serving as block.db for each HDD OSD | ||
| + | - Used as an SSD OSD | ||
| - | === Checking NVMe Card SMART Data === | ||
| <code> | <code> | ||
| - | cd /tmp | + | root@reesi001:~# lsblk | 
| - | git clone https://github.com/linux-nvme/nvme-cli.git | + | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT | 
| - | cd nvme-cli/ | + | sda 8:0  0 3.7T  0 disk | 
| - | make | + | └─ceph--28f7427e--5558--4ffd--ae1a--51ec3042759a-osd--block--63c0a64e--a0d2--4daf--87ec--c4b00b9f3ab9  252:12  0  3.7T  0 lvm | 
| - | sudo ./nvme smart-log-add /dev/nvme0 | + | sdb 8:16  0  3.7T  0 disk | 
| + | └─ceph--9ddb7c35--20a6--4099--9127--947141c5452e-osd--block--7233f174--6402--4094--a48b--9aaabf508cb2  252:13  0  3.7T  0 lvm | ||
| + | sdc 8:32  0  3.7T  0 disk | ||
| + | └─ceph--cbfd182d--02e9--4c5a--b06e--7497d6109c87-osd--block--58eecdfe--b984--46da--a37a--cd04867b4e3f  252:14  0  3.7T  0 lvm | ||
| + | sdd 8:48  0  3.7T  0 disk | ||
| + | └─ceph--9632797e--c4d3--45df--a2bc--03e466c16224-osd--block--20171f57--5931--4402--bbe8--a7f703aa47db  252:16  0  3.7T  0 lvm | ||
| + | sde 8:64  0  3.7T  0 disk | ||
| + | sdf 8:80  0  3.7T  0 disk | ||
| + | sdg 8:96  0  3.7T  0 disk | ||
| + | sdh 8:112  0 3.7T  0 disk | ||
| + | sdi 8:128  0 3.7T  0 disk | ||
| + | sdj 8:144  0 3.7T  0 disk | ||
| + | sdk 8:160  0 3.7T  0 disk | ||
| + | sdl 8:176  0 3.7T  0 disk | ||
| + | sdm 8:192  0 139.8G  0 disk | ||
| + | ├─sdm1  8:193 0 4.7G  0 part | ||
| + | │ └─md2  9:2  0 4.7G  0 raid1 /boot | ||
| + | ├─sdm2  8:194 0 116.4G  0 part | ||
| + | │ └─md1  9:1  0 116.4G  0 raid1 / | ||
| + | └─sdm3  8:195 0 14.9G 0 part [SWAP] | ||
| + | sdn 8:208  0 139.8G  0 disk | ||
| + | ├─sdn1  8:209 0 4.7G  0 part | ||
| + | │ └─md2  9:2  0 4.7G  0 raid1 /boot | ||
| + | └─sdn2  8:210 0 116.4G 0 part | ||
| + | └─md1  9:1  0 116.4G  0 raid1 / | ||
| + | nvme0n1  259:0  0 745.2G  0 disk | ||
| + | ├─nvme0n1p1  259:3  0 372.6G  0 part | ||
| + | │ ├─journals-sda 252:0 0 31G 0 lvm | ||
| + | │ ├─journals-sdb 252:1 0 31G 0 lvm | ||
| + | │ ├─journals-sdc 252:2 0 31G 0 lvm | ||
| + | │ ├─journals-sdd  252:3 0 31G 0 lvm | ||
| + | │ ├─journals-sde  252:4 0 31G 0 lvm | ||
| + | │ ├─journals-sdf  252:5 0 31G 0 lvm | ||
| + | │ ├─journals-sdg  252:6 0 31G 0 lvm | ||
| + | │ ├─journals-sdh  252:7 0 31G 0 lvm | ||
| + | │ ├─journals-sdi  252:8 0 31G 0 lvm | ||
| + | │ ├─journals-sdj  252:9 0 31G 0 lvm | ||
| + | │ ├─journals-sdk  252:10  0  31G 0 lvm | ||
| + | │ └─journals-sdl  252:11  0  31G 0 lvm | ||
| + | └─nvme0n1p2  259:4  0 365.2G 0 part | ||
| + | └─ceph--9f7b3261--4778--47f9--9291--55630a41c262-osd--block--90e64c51--3344--47ce--a390--7931be9f95f1 252:15  0 365.2G  0 lvm | ||
| </code> | </code> | ||
| - | ==== Flashing Firmware ==== | ||
| - | Intel provides an RPM ([[https://downloadcenter.intel.com/download/23931/Intel-SSD-Data-Center-Tool|Intel Datacenter Tool]]) to configure the NVMe cards, update firmware, etc. The firmwares are baked into the RPM so it's important to download the latest zip from Intel whenever possible. | ||
| - | Download the zip file, copy to testnode, unzip and ''yum localinstall'' the applicable isdct RPM. | + | ==== How to partition/re-partition the NVMe device ==== | 
| + | Here's my bash history that can be used to set up a reesi machine's NVMe card. | ||
| - | **Show NVMe cards** | ||
| - | <code>isdct show -intelssd</code> | ||
| - | |||
| - | **Update Firmware Example** | ||
| <code> | <code> | ||
| - | [root@smithi001 ~]# isdct show -intelssd | + | ansible -a "sudo parted -s /dev/nvme0n1 mktable gpt" reesi* | 
| + | ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 0 50" reesi* | ||
| + | ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 51 100" reesi* | ||
| + | ansible -a "sudo pvcreate /dev/nvme0n1p1" reesi* | ||
| + | ansible -a "sudo vgcreate journals /dev/nvme0n1p1" reesi* | ||
| + | for disk in sd{a..l}; do ansible -a "sudo lvcreate -L 31G -n $disk journals" reesi*; done | ||
| + | </code> | ||
| - | - Intel SSD DC P3700 Series CVFT533000EN400BGN - | + | ===== Replacing Drives ===== | 
| + | Like the [[hardware:mira]], the drive letters do **not** correspond to drive bays, so ''/dev/sda'' isn't necessarily in Drive Bay 1. Keep this in mind when zapping/setting up OSDs. Also, drive ''/dev/sda'' does not necessarily have its block.db volume on ''/dev/journals/sda''. | ||
| - | Bootloader : 8B1B012E | + | To identify a drive physically, run ''dd if=/dev/$DRIVE of=/dev/null'' (where ''$DRIVE'' is the drive you're replacing) while watching the front of the system; the activity LED of the drive being read should stay lit. | 
| - | DevicePath : /dev/nvme0n1 | + | |
| - | DeviceStatus : Healthy | + | |
| - | Firmware : 8DV10131 | + | |
| - | FirmwareUpdateAvailable : Firmware=8DV10171 Bootloader=8B1B0131 | + | |
| - | Index : 0 | + | |
| - | ModelNumber : INTEL SSDPEDMD400G4 | + | |
| - | ProductFamily : Intel SSD DC P3700 Series | + | |
| - | SerialNumber : CVFT533000EN400BGN | + | |
| - | ## NOTE: Use the Index number from above to specify the drive you want to update | + | ===== Set up a new OSD with journal on NVMe logical volume ===== | 
| + | <code> | ||
| + | ceph-volume lvm create --bluestore --data /dev/sda --block.db journals/sda | ||
| + | </code> | ||
| - | [root@smithi001 ~]# isdct load -intelssd 0 | + | **Example of successfully deployed OSD** | 
| - | WARNING! You have selected to update the drives firmware! | + | <code> | 
| - | Proceed with the update? (Y|N): Y | + | root@reesi001:~# ceph-volume lvm list | 
| - | Updating firmware... | + | |
| - | - Intel SSD DC P3700 Series CVFT533000EN400BGN - | ||
| - | Status : Firmware Updated Successfully. Please reboot the system. | + | ====== osd.94 ====== | 
| - | </code> | + | [block]  /dev/ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a/osd-block-63c0a64e-a0d2-4daf-87ec-c4b00b9f3ab9 | 
| - | ==== NVMe Failure Tracking ==== | + | type block | 
| - | The NVMe cards have started failing at a faster rate. I'm keeping track of when and how often to see if there's a pattern we can interrupt. | + | osd id 94 | 
| + | cluster fsid 28f7427e-5558-4ffd-ae1a-51ec3042759a | ||
| + | cluster name ceph | ||
| + | osd fsid 63c0a64e-a0d2-4daf-87ec-c4b00b9f3ab9 | ||
| + | db device  /dev/journals/sda | ||
| + | encrypted  0 | ||
| + | db uuid X2SlQ5-0zx2-CuHJ-GEbJ-5JS8-ly5O-emmFI9 | ||
| + | cephx lockbox secret  | ||
| + | block uuid Xvjsm3-95vU-KNmw-5DuK-i3cx-fmic-xfSK2w | ||
| + | block device  /dev/ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a/osd-block-63c0a64e-a0d2-4daf-87ec-c4b00b9f3ab9 | ||
| + | crush device class None | ||
| - | ^ System  ^ Date Failed  ^ Ticket  ^ Notes ^ | + | [ db] /dev/journals/sda | 
| - | | smithi043  | 1/3/2017  | RT 433092  | | | + | |
| - | | smithi048  | 4/19/2017  | RT 444421  |  | | + | |
| - | | smithi050  | 4/19/2017 | RT 444421  |  | | + | |
| - | | smithi038  | 11/29/2017  | PNT0111325  | | | + | |
| - | | smithi025  | 12/11/2017  | PNT0120775  | | | + | |
| - | | smithi039  | 12/11/2017  | PNT0120775  | | | + | |
| - | | smithi055  | 12/8/2017  | PNT0120775  | | | + | |
| - | | smithi057  | 1/12/2018  | PNT0138316  | | | + | |
| - | | smithi054  | 1/16/2018  | PNT0141432  | | | + | |
| - | | smithi021  | 1/19/2018  | PNT0143431  | | | + | |
| - | | smithi180  | 1/22/2018  | PNT0160022  | New batch of servers :( | | + | |
| - | ===== Other notes ===== | + | type db | 
| - | ==== Mounting remote ISOs ==== | + | osd id 94 | 
| - | I (dgalloway) copy ISOs to ''cobbler.front.sepia.ceph.com:/samba/anonymous''. | + | cluster fsid 28f7427e-5558-4ffd-ae1a-51ec3042759a | 
| - | - Disable the firewall on cobbler.front.sepia.ceph.com | + | cluster name ceph | 
| - | - ''service iptables stop'' | + | osd fsid 63c0a64e-a0d2-4daf-87ec-c4b00b9f3ab9 | 
| - | - In the smithi IPMI web UI, under Virtual Media -> CD-ROM Image, set the following parameters: | + | db device  /dev/journals/sda | 
| - | - **Share Host**: 172.21.0.11 | + | encrypted  0 | 
| - | - **Path**: ''\Anonymous\foo.iso'' | + | db uuid X2SlQ5-0zx2-CuHJ-GEbJ-5JS8-ly5O-emmFI9 | 
| - | - Click **Save** | + | cephx lockbox secret  | 
| - | - Click **Mount** | + | block uuid Xvjsm3-95vU-KNmw-5DuK-i3cx-fmic-xfSK2w | 
| - | - Reboot and enter the BIOS, when prompted, by hitting **<DEL>** | + | block device  /dev/ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a/osd-block-63c0a64e-a0d2-4daf-87ec-c4b00b9f3ab9 | 
| - | - In the last BIOS tab, "Save & Exit", select **ATEN Virtual CDROM** under **Boot Override** | + | crush device class None | 
| + | </code> | ||
| - | ===== Updating BIOS ===== | + | ===== Checking NVMe Card SMART Data ===== | 
| - | As of this writing, the latest BIOS version is X10SRW7.410 (4/10/2017) | + | <code> | 
| + | nvme smart-log /dev/nvme0n1 | ||
| + | </code> | ||
| - | - Mount ''smithibiosx10srw7410.iso'' using the instructions above | + | ===== Updating BIOS ===== | 
| - | - Boot to the Virtual CD | + | TBD | 
| - | - Once you get a DOS prompt, run ''flash.bat X10SRW7.410'' | + | |
| - | - The first time, it'll put the system in "Flashing Mode" and you'll have to reboot and run this again for the actual BIOS update | + | |
| - | - Shut the system **OFF** then back on for the BIOS update to complete | + | |
| - | - Supermicro suggests restoring BIOS to default settings then setting back up although I've found some settings get reset automatically anyway (like boot order) | + | |
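
The journal-volume step in the NVMe partitioning history can be sketched as a dry run that only prints the ''lvcreate'' commands for review. This is a minimal sketch, not the procedure actually used on the reesi hosts: ''plan_journal_lvs'' and ''max_lv_size_gib'' are hypothetical helper names, and the 372 GiB figure is an assumption taken from the 372.6G that ''lsblk'' reports for ''nvme0n1p1''.

```shell
#!/bin/sh
# Dry-run sketch: print the lvcreate commands for the NVMe journal volumes
# instead of executing them. Helper names are hypothetical (not from any tool).

# Print one lvcreate command per HDD.
# Arguments: volume group name, LV size in GiB, then the HDD device names.
plan_journal_lvs() {
    vg="$1"
    size_gib="$2"
    shift 2
    for disk in "$@"; do
        echo "lvcreate -L ${size_gib}G -n ${disk} ${vg}"
    done
}

# Largest whole-GiB LV size when a partition is shared equally.
# Arguments: usable partition size in GiB (integer), number of volumes.
max_lv_size_gib() {
    echo $(( $1 / $2 ))
}

# Assumption: about 372 GiB usable on nvme0n1p1, shared by 12 HDD OSDs.
size=$(max_lv_size_gib 372 12)
plan_journal_lvs journals "$size" sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
```

Dividing 372 GiB by 12 gives 31, which matches the ''-L 31G'' used in the history above. Nothing is executed, so the printed lines can be checked before being passed to ansible.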