====== ivan{01..07} ======

===== Summary =====
The Ceph Foundation purchased 7 more servers to join the [[services:longrunningcluster]].

The three primary goals were:

  - Faster networking (25Gbps) between hosts
  - Large NVMe devices as OSDs
  - 12TB HDDs (largest up until now was 4TB)

===== Purchasing details =====
Racking ticket: https://redhat.service-now.com/surl.do?n=RITM0880714

===== Hardware Specs =====
|             ^ Count    ^ Manufacturer ^ Model ^ Capacity ^ Notes ^
^ Chassis     | 2U       | Supermicro   | SSG-6029P-E1CR12L | N/A | |
^ Mainboard   | N/A      | Supermicro   | X11DPH-T | N/A | |
^ CPU         | 2        | Intel        | Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz | N/A | [[https://ark.intel.com/content/www/us/en/ark/products/199349/intel-xeon-silver-4215r-processor-11m-cache-3-20-ghz.html|ARK]] |
^ RAM         | 4 DIMMs  | SK Hynix     | HMAA4GR7AJR8N-XN | 32GB | 128GB Total |
^ SSD         | 2        | Intel        | SSDSC2KG960G8 (S4510) | 1TB | Software RAID1 for OS |
^ HDD         | 9        | Seagate      | ST12000NM002G | 12TB | SAS 7200RPM for OSDs |
^ NVMe        | 2        | Intel        | SSDPE2KE016T8 | 1.6TB | For large NVMe OSDs |
^ NVMe        | 1        | Intel        | SSDPE21M375GA | 375GB | Carved up as logical volumes for OSD journals |
^ NIC         | 2 ports  | Intel        | X722 | 10Gb | 1 port cabled BUT DISABLED. See below. |
^ NIC         | 2 ports  | Mellanox     | ConnectX-4 | 25Gb | For ''back'' / storage traffic |
^ BMC         | 1        | Supermicro   | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com |

===== OSD/Block Device Information =====
I used the Orchestrator to deploy OSDs on the ivan hosts (I did this one by one to avoid a mass data rebalance all to one rack).
  root@reesi001:~# cat ivan_osd_spec.yml
  service_type: osd
  service_id: osd_using_paths
  placement:
    hosts:
      - ivan01
      - ivan02
      - ivan03
      - ivan04
      - ivan05
      - ivan06
      - ivan07
  spec:
    data_devices:
      paths:
        - /dev/sdc
        - /dev/sdd
        - /dev/sde
        - /dev/sdf
        - /dev/sdg
        - /dev/sdh
        - /dev/sdi
        - /dev/sdj
        - /dev/sdk
        - /dev/nvme1n1
        - /dev/nvme2n1
    db_devices:
      paths:
        - /dev/nvme0n1

===== Checking NVMe Card SMART Data =====
  nvme smart-log /dev/nvme0n1

===== Updating BIOS =====
TBD

===== Installation Quirks/Difficulties =====
==== Networking ====
Initially, I wanted to have the 1Gb interface cabled on VLAN100 and the 25Gb interfaces cabled to VLAN101 (back.sepia.ceph.com). Up until now I had never really used VLAN101.

I was able to get both NICs up with IPs assigned, and the servers could reach each other. The LRC could also reach these servers on their 25Gb/''back'' interfaces. I added the hosts to the cluster using the ''back'' IPs, and the cluster became very unhappy, complaining about slow OPs. It turned out the ivan servers couldn't get **out** from their ''back'' interfaces, so the OSDs fell back to the 1Gb link.

I reached out to Red Hat IT to have the 25Gb network ports switched over to VLAN100. After that, I struggled to get eno1 (the 1Gb interface) to **not** come up on boot since I didn't need it anymore. Finally I figured out that this does it:

  # cat /etc/systemd/network/10-eno1.network
  [Match]
  Name=eno1
  [Network]
  DHCP=no

==== CentOS 8 ====
I could not for the life of me get ivan05 to install using the Ubuntu preseed below. Its settings are identical to the rest of the machines. I remember someone (I think GregF?) suggesting in a CLT call that we should have a mixture of OSes in the LRC, so I decided to use CentOS 8 instead.

That led to its own difficulties. For example, I couldn't ping the ''back'' interface from a ''front'' interface on another host. This worked fine on Ubuntu.
I finally landed on this very helpful post: https://unix.stackexchange.com/a/589133

After running ''sysctl -w net.ipv4.conf.enp216s0f0.rp_filter=2'', I could ping 172.21.18.225 **from** a ''front'' interface on reesi001.

==== Ubuntu Preseed ====
Here is the kickstart template used in [[services:cobbler]] to provision most of the hosts. As mentioned above, it did not work on ivan05 (would boot to ''grub rescue'' prompt).

  ## This file is managed by ansible, don't make changes here - they will be overwritten.
  # Fetch the os_version from the distro using this profile.
  #set os_version = $getVar('os_version','')
  # Fetch Ubuntu version (e.g., 14.04)
  #set distro_ver = $getVar('distro','').split("-")[1]
  # Fetch Ubuntu major version (e.g., 14)
  #set distro_ver_major = $distro_ver.split(".")[0]
  ### Apt setup
  # You can choose to install non-free and contrib software.
  #d-i apt-setup/non-free boolean true
  #d-i apt-setup/contrib boolean true
  # Preseeding only locale sets language, country and locale.
  d-i debian-installer/locale string en_US
  # Keyboard selection.
  # Disable automatic (interactive) keymap detection.
  d-i console-setup/ask_detect boolean false
  # If you select ftp, the mirror/country string does not need to be set.
  #d-i mirror/protocol string ftp
  d-i mirror/country string manual
  d-i mirror/http/hostname string archive.ubuntu.com
  d-i mirror/http/directory string /ubuntu
  d-i mirror/suite string $os_version
  #Removes the prompt about missing modules:
  # Continue without installing a kernel?
  #d-i base-installer/kernel/skip-install boolean true
  # Continue the install without loading kernel modules?
  #d-i anna/no_kernel_modules boolean true
  # Stop Ubuntu from installing random kernel choice
  #d-i base-installer/kernel/image select none
  # Controls whether or not the hardware clock is set to UTC.
  d-i clock-setup/utc boolean true
  #
  # You may set this to any valid setting for $TZ; see the contents of
  # /usr/share/zoneinfo/ for valid values.
  d-i time/zone string Etc/UTC
  # Controls whether to use NTP to set the clock during the install
  d-i clock-setup/ntp boolean true
  # NTP server to use. The default is almost always fine here.
  d-i clock-setup/ntp-server string pool.ntp.org
  ### Partitioning
  d-i partman/unmount_active boolean true
  #----------------------------------------------------------------------#
  # Partitioning
  d-i partman/early_command string \
    umount /media ; \
    mdadm --stop /dev/md0 ; \
    mdadm --remove /dev/md0 ; \
    mdadm --stop /dev/md127 ; \
    mdadm --remove /dev/md127 ; \
    for partition in /dev/sda* /dev/sdb*; do mdadm --zero-superblock $partition ; dd if=/dev/zero of=$partition bs=1M count=10; done ; \
    echo 1 > /sys/block/sda/device/rescan ; \
    echo 1 > /sys/block/sdb/device/rescan ; \
    ls -C /dev/sd*; \
    sleep 5; \
    exit 0; \
  # this only makes partman automatically partition without confirmation:
  d-i partman-partitioning/confirm_write_new_label boolean true
  d-i partman-md/device_remove_md boolean true
  d-i partman-md/confirm_nooverwrite boolean true
  d-i partman-md/confirm boolean true
  d-i partman-lvm/device_remove_lvm boolean true
  d-i partman-lvm/confirm_nooverwrite boolean true
  d-i partman-lvm/confirm boolean true
  d-i partman/confirm_nooverwrite boolean true
  d-i partman/choose_partition select finish
  d-i partman/confirm boolean true
  d-i mdadm/boot_degraded boolean true
  d-i partman-auto/method string raid
  d-i partman-auto/disk string /dev/sda /dev/sdb
  d-i partman-auto/expert_recipe string multiraid :: \
    256 512 512 free $bootable{ } method{ efi } format{ } . \
    1024 10000 -1 raid format{ } method{ raid } .
  # specify how the previously defined partitions will be
  # used in the RAID setup.
  d-i partman-auto-raid/recipe string \
    1 2 0 xfs / /dev/sda5#/dev/sdb5 .
  d-i partman/choose_partition select Finish partitioning and write changes to disk
  d-i partman-efi/non_efi_system boolean true
  # Partitioning
  #----------------------------------------------------------------------#
  #User account.
  d-i passwd/root-login boolean false
  d-i passwd/make-user boolean true
  d-i passwd/user-fullname string cm
  d-i passwd/username string cm
  d-i passwd/user-password-crypted password $default_password_crypted
  d-i passwd/user-uid string 1100
  d-i user-setup/allow-password-weak boolean false
  d-i user-setup/encrypt-home boolean false
  # Individual additional packages to install
  #if $os_version == 'precise'
  d-i pkgsel/include string wget ntpdate bash sudo openssh-server
  #else if int($distro_ver_major) == 16
  d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
  #else if int($distro_ver_major) == 18
  d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown python ntp curl
  #else if int($distro_ver_major) >= 20
  d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown ntp curl gpg
  #else
  d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware linux-firmware-nonfree ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
  #end if
  # Whether to upgrade packages after debootstrap.
  # Allowed values: none, safe-upgrade, full-upgrade
  d-i pkgsel/upgrade select safe-upgrade
  # Policy for applying updates. May be "none" (no automatic updates),
  # "unattended-upgrades" (install security updates automatically), or
  # "landscape" (manage system with Landscape).
  d-i pkgsel/update-policy select none
  # During installations from serial console, the regular virtual consoles
  # (VT1-VT6) are normally disabled in /etc/inittab. Uncomment the next
  # line to prevent this.
  d-i finish-install/keep-consoles boolean true
  # Avoid that last message about the install being complete.
  d-i finish-install/reboot_in_progress note
  # This command is run just before the install finishes, but when there is
  # still a usable /target directory. You can chroot to /target and use it
  # directly, or use the apt-install and in-target commands to easily install
  # packages and run commands in the target system.
  # cephlab_preseed_late lives in /var/lib/cobbler/scripts
  # It is passed to the cobbler xmlrpc generate_scripts function where it's rendered.
  # This means that snippets or other templating features can be used.
  d-i preseed/late_command string \
    in-target wget http://$http_server/cblr/svc/op/script/system/$system_name/?script=cephlab_preseed_late -O /tmp/postinst.sh; \
    in-target /bin/chmod 755 /tmp/postinst.sh; \
    in-target /tmp/postinst.sh;
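
When debugging which ''pkgsel/include'' line a host ends up with, it can help to trace the template's version branching outside of Cobbler. The following is only a rough sketch of that logic in plain Python, not the code Cobbler runs. The distro name ''Ubuntu-20.04-server-x86_64'' is an assumed example of the ''<name>-<version>-...'' shape implied by the ''split("-")[1]'' in the template, and ''ubuntu_major'' and ''package_branch'' are hypothetical helper names.

```python
def ubuntu_major(distro: str) -> int:
    """Mirror the template: second '-'-separated field, then the part before the first '.'."""
    distro_ver = distro.split("-")[1]       # e.g. "20.04"
    return int(distro_ver.split(".")[0])    # e.g. 20


def package_branch(os_version: str, distro: str) -> str:
    """Return a label for the pkgsel/include branch the template would pick."""
    major = ubuntu_major(distro)            # the template computes this before the #if chain
    if os_version == "precise":
        return "precise"
    if major == 16:
        return "16"
    if major == 18:
        return "18"
    if major >= 20:
        return "20+"
    return "default"                        # the template's final #else branch


if __name__ == "__main__":
    # Assumed example distro name; check the real one in Cobbler before relying on this.
    print(package_branch("focal", "Ubuntu-20.04-server-x86_64"))
```

Note that a distro name without a ''-'' would raise an ''IndexError'' here, which matches the template's own assumption that the version is always the second field.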