====== Officinalis | o{01..10} ======

===== Summary =====
10 nodes donated by Intel for performance testing.

===== Hardware Specs =====
| ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^
^ Chassis | 1U | Quanta | D52B-1U | N/A | |
^ Mainboard | N/A | Quanta | S5B-MB (LBG-1G) | N/A | |
^ CPU | 2 | Intel | Intel(R) Xeon(R) Platinum 8276M CPU @ 2.20GHz | 112 | [[https://ark.intel.com/content/www/us/en/ark/products/192471/intel-xeon-platinum-8276m-processor-38-5m-cache-2-20-ghz.html|ARK]] |
^ RAM | 12 DIMMs | Micron | 36ASF4G72PZ-2G6H1 | 32GB | 384GB Total |
^ SSD | 1 | Intel | SSDSC2KB960G8 | 1TB | For OS |
^ NVMe | 2 | Intel | SSDPE21K750GA | 1TB | For OSD journals? |
^ NVMe | 8 | Intel | SSDPE2KX080T8 | 8TB | For OSDs |
^ NIC | 2 NICs x 2 ports | Intel | XXV710 | 25Gb | All 4 ports cabled and bonded |
^ BMC | 1 | Quanta | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com using usual IPMI credentials or admin:cmb9.admin |

===== PXE/Reimaging =====
These machines are configured in DHCP to receive ''/var/lib/tftpboot/grub/grub-x86_64.efi'' from the Cobbler host when PXE booting.

I had trouble PXE booting in BIOS mode on these machines, so we're using UEFI. Our usual ''cephlab_rhel.ks'' kickstart is not set up for UEFI, so Anaconda will stop and say the Storage Configuration needs editing.

==== Network Config ====
These nodes are connected to their own QFX5200 switch (s/n WH0218170419 [formerly WH3619030401]), which is uplinked and managed by Red Hat IT. For an example of how to report an outage, see https://redhat.service-now.com/surl.do?n=INC1201508.

There is an Ansible module that is supposed to let you create bonds, but it requires NetworkManager-glib, which isn't in CentOS 8. So I found and used https://github.com/linux-system-roles/network.
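
The addressing convention for these nodes is that oNN gets 172.21.3.NN/20 on the bond (inferred from the command and playbook on this page). A minimal dry-run sketch that only prints that mapping and touches no files; the function name ''officinalis_addr'' is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any existing tooling): print the bond
# address for Officinalis node number N, following the
# oNN -> 172.21.3.NN/20 convention inferred from this page.
officinalis_addr() {
  local num="$1"
  # 10# forces base 10 so that an argument like "08" is not read as octal.
  printf '172.21.3.%d/20\n' "$((10#$num))"
}

# Dry run: list each --limit pattern next to the address it would receive.
for num in $(seq 1 10); do
  printf 'o%02d* %s\n' "$num" "$(officinalis_addr "$num")"
done
```

This is handy for checking the mapping before letting ''sed'' rewrite ''officinalis.yml'' in place.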

Here's the command I ran from the ''examples'' dir:

  for num in {1..9}; do
    sed -i "s/172.21.3..*/172.21.3.$num\/20/g" officinalis.yml
    ansible-playbook -e ansible_python_interpreter=/usr/bin/python3 officinalis.yml --limit o0${num}*
  done
  sed -i "s/172.21.3..*/172.21.3.10\/20/g" officinalis.yml
  ansible-playbook -e ansible_python_interpreter=/usr/bin/python3 officinalis.yml --limit o10*

Here's the yml I used:

  ---
  - hosts: officinalis
    become: true
    vars:
      network_connections:
        - name: ens20f0
          persistent_state: absent
        - name: ens20f1
          persistent_state: absent
        - name: ens49f0
          persistent_state: absent
        - name: ens49f1
          persistent_state: absent
        # Create a bond profile
        - name: bond0
          state: up
          type: bond
          ip:
            address: 172.21.3.10/20
            gateway4: 172.21.15.254
            dns:
              - 172.21.0.1
              - 172.21.0.2
            dns_search:
              - front.sepia.ceph.com
          bond:
            mode: 802.3ad
          mtu: 1450
        # enslave an ethernet to the bond
        - name: ens20f0
          state: up
          type: ethernet
          master: bond0
        # enslave an ethernet to the bond
        - name: ens20f1
          state: up
          type: ethernet
          master: bond0
        # enslave an ethernet to the bond
        - name: ens49f0
          state: up
          type: ethernet
          master: bond0
        # enslave an ethernet to the bond
        - name: ens49f1
          state: up
          type: ethernet
          master: bond0
    roles:
      - linux-system-roles.network

==== List of Outages ====
The QFX5200 serving the Officinalis lab goes down on a regular basis. This table will keep track of dates and tickets.

^ Date ^ Ticket ^ Notes ^
| 12/10/2019 | https://redhat.service-now.com/surl.do?n=PNT0731289 | |
| 3/2/2020 | https://redhat.service-now.com/surl.do?n=INC1201508 | |
| 6/16/2020 | https://redhat.service-now.com/surl.do?n=RITM0706076 | |
| 3/16/2021 | https://redhat.service-now.com/surl.do?n=INC1672259 | Added redundant link after this one\\ https://redhat.service-now.com/surl.do?n=RITM0884572 |
| 5/11/2021 | https://redhat.service-now.com/surl.do?n=INC1758733 | |
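
When an outage is suspected, the state of each bonded link can be read on a node from ''/proc/net/bonding/bond0'', the Linux bonding driver's status file (''bond0'' being the bond name from the playbook on this page). A minimal sketch, not an established procedure for this lab, that prints each slave interface with its MII status; the function name ''bond_slave_status'' is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize per-slave link state from the bonding
# driver's status file. Defaults to bond0, the bond name used on this page.
bond_slave_status() {
  local file="${1:-/proc/net/bonding/bond0}"
  awk -F': ' '
    /^Slave Interface:/ { slave = $2 }
    # Report only MII Status lines that follow a "Slave Interface" line;
    # the first MII Status line in the file belongs to the bond itself.
    /^MII Status:/ && slave != "" { print slave, $2; slave = "" }
  ' "$file"
}

# Run only when the status file exists, i.e. on a node with bond0 configured.
if [ -r /proc/net/bonding/bond0 ]; then
  bond_slave_status
fi
```

If every slave reports ''down'' at once, the problem is more likely the switch than a single cable or port.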