hardware:mako [2021/05/26 20:58] djgalloway
Racking ticket: https://redhat.service-now.com/surl.do?n=PNT0915435

Samsung drive carriers purchasing ticket: https://redhat.service-now.com/surl.do?n=PNT1008475\\
Racking ticket: https://redhat.service-now.com/surl.do?n=PNT1008476
===== Hardware Specs =====
^ ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^
^ Chassis | 1U | Dell | PowerEdge R6515 | N/A | |
^ Mainboard | N/A | Dell | 0R4CNN | N/A | |
^ CPU | 1 | AMD | AMD EPYC 7742 | 128 cores | |
^ RAM | 8 DIMMs | Micron | 18ASF2G72PDZ-3G2E1 | 16GB | 128GB Total |
^ SSD | 2 | Micron | MTFDDAV480TDS | 480GB | Behind hardware RAID1 for OS |
^ NVMe | 1 | Dell | P4510 | 1TB | For OSD journals? |
^ NIC | 2 ports | Dell | | 1Gb | On-board. Unused. |
^ NIC | 2 ports | Broadcom | 57416 BaseT | 1/10Gb | Oops. Won't be using this. |
^ NIC | 2 ports | Mellanox | ConnectX-6 | 100Gb | 1 port as uplink |
^ BMC | 1 | Quanta | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com using usual IPMI credentials. |
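The BMC row above can be exercised from any shell host with lab network access. A minimal sketch, assuming ''ipmitool'' is installed and the "usual IPMI credentials" are exported as ''IPMI_USER''/''IPMI_PASS'' (hypothetical variable names; ''mako01'' is a placeholder hostname):

```shell
# Build the BMC hostname from the node's short name (mako01 is a placeholder):
host=mako01
bmc="${host}.ipmi.sepia.ceph.com"

# Query chassis power state over the LAN interface, only if ipmitool is present.
# IPMI_USER / IPMI_PASS are assumed to hold the usual lab IPMI credentials.
if command -v ipmitool >/dev/null 2>&1; then
    ipmitool -I lanplus -H "$bmc" -U "$IPMI_USER" -P "$IPMI_PASS" chassis power status
fi
```

The same invocation with ''chassis power cycle'' instead of ''status'' is the usual way to force a reboot when a node is wedged.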
===== PXE/Reimaging =====
These machines PXE boot using Legacy/BIOS mode and can be provisioned via Cobbler normally.
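A sketch of the usual manual reimage flow via the Cobbler CLI, assuming a system record already exists for the node (''mako01'' is a placeholder name; credentials as elsewhere on this page):

```shell
# Placeholder node name; a matching Cobbler system record is assumed to exist.
host=mako01

# Flag the system for netboot and push the change out, if cobbler is available:
if command -v cobbler >/dev/null 2>&1; then
    cobbler system edit --name="$host" --netboot-enabled=true
    cobbler sync
fi

# Then power-cycle the node via its BMC so it PXE boots, e.g.:
#   ipmitool -I lanplus -H "$host".ipmi.sepia.ceph.com -U "$IPMI_USER" -P "$IPMI_PASS" chassis power cycle
```

Cobbler normally clears the netboot flag after a successful install, so the node boots from disk on subsequent reboots.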
==== Network Config ====
These nodes are connected to the [[hardware:officinalis]] QFX5200 (s/n WH0218170419 [formerly WH3619030401]) uplinked and managed by Red Hat IT. For an example of how to report an outage, see https://redhat.service-now.com/surl.do?n=INC1201508.
The 100Gb connection is the only uplink for now. The top-of-rack switch in that rack probably has capacity if we need a 1Gb uplink and want to reserve the 100Gb NIC for backend traffic.