====== ivan{01..07} ======
===== Summary =====
The Ceph Foundation purchased 7 more servers to join the [[services:longrunningcluster]]. The three primary goals were:
- Faster networking (25Gbps) between hosts
- Large NVMe devices as OSDs
- 12TB HDDs (largest up until now was 4TB)
===== Purchasing details =====
Racking ticket: https://redhat.service-now.com/surl.do?n=RITM0880714
===== Hardware Specs =====
| ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^
^ Chassis | 2U | Supermicro | SSG-6029P-E1CR12L | N/A | |
^ Mainboard | N/A | Supermicro | X11DPH-T | N/A | |
^ CPU | 2 | Intel | Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz | N/A | [[https://ark.intel.com/content/www/us/en/ark/products/199349/intel-xeon-silver-4215r-processor-11m-cache-3-20-ghz.html|ARK]] |
^ RAM | 4 DIMMs | SK Hynix | HMAA4GR7AJR8N-XN | 32GB | 128GB Total |
^ SSD | 2 | Intel | SSDSC2KG960G8 (S4610) | 960GB | Software RAID1 for OS |
^ HDD | 9 | Seagate | ST12000NM002G | 12TB | SAS 7200RPM for OSDs |
^ NVMe | 2 | Intel | SSDPE2KE016T8 | 1.6TB | For large NVMe OSDs |
^ NVMe | 1 | Intel | SSDPE21M375GA | 375GB | Carved up as logical volumes for OSD journals |
^ NIC | 2 ports | Intel | X722 | 10Gb | 1 port cabled BUT DISABLED. See below. |
^ NIC | 2 ports | Mellanox | ConnectX-4 | 25Gb | For ''back'' / storage traffic |
^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com |
===== OSD/Block Device Information =====
I used the Orchestrator to deploy OSDs on the ivan hosts. I did this one host at a time to avoid a mass data rebalance into a single rack.
root@reesi001:~# cat ivan_osd_spec.yml
service_type: osd
service_id: osd_using_paths
placement:
  hosts:
    - ivan01
    - ivan02
    - ivan03
    - ivan04
    - ivan05
    - ivan06
    - ivan07
spec:
  data_devices:
    paths:
      - /dev/sdc
      - /dev/sdd
      - /dev/sde
      - /dev/sdf
      - /dev/sdg
      - /dev/sdh
      - /dev/sdi
      - /dev/sdj
      - /dev/sdk
      - /dev/nvme1n1
      - /dev/nvme2n1
  db_devices:
    paths:
      - /dev/nvme0n1
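The one-host-at-a-time rollout can be sketched as a small helper that regenerates the spec for a growing host list. This is only an illustration: the function name ''make_osd_spec'' is made up here, and growing the ''hosts'' list between runs is an assumption about how the rollout was staged, not a record of the exact commands that were run. The device paths are copied from the spec above.

```shell
# Emit an OSD service spec for the hosts given as arguments.
# Hypothetical helper; device paths are copied from ivan_osd_spec.yml.
make_osd_spec() {
    echo "service_type: osd"
    echo "service_id: osd_using_paths"
    echo "placement:"
    echo "  hosts:"
    for host in "$@"; do
        echo "    - $host"
    done
    echo "spec:"
    echo "  data_devices:"
    echo "    paths:"
    # Nine HDDs (sdc..sdk) plus the two large NVMe devices
    for dev in sdc sdd sde sdf sdg sdh sdi sdj sdk nvme1n1 nvme2n1; do
        echo "      - /dev/$dev"
    done
    echo "  db_devices:"
    echo "    paths:"
    echo "      - /dev/nvme0n1"
}
```

A possible workflow is ''make_osd_spec ivan01 > ivan_osd_spec.yml'', then ''ceph orch apply -i ivan_osd_spec.yml %%--%%dry-run'' to preview what the Orchestrator would create, then the same command without ''%%--%%dry-run''. Once the cluster has settled, rerun with ''ivan01 ivan02'', and so on.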
===== Checking NVMe Card SMART Data =====
nvme smart-log /dev/nvme0n1
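Each ivan host has three NVMe devices (''nvme0n1'', ''nvme1n1'' and ''nvme2n1'' in the OSD spec), so it can be handy to loop over all of them. A minimal sketch, assuming ''nvme-cli'' is installed; the function name ''nvme_smart_all'' and the optional directory argument are made up for this example.

```shell
# Print SMART data for every NVMe namespace found in a device directory.
# Hypothetical helper; defaults to /dev when no directory is given.
nvme_smart_all() {
    dev_dir="${1:-/dev}"
    for dev in "$dev_dir"/nvme*n1; do
        # Skip the literal pattern when the glob matches nothing
        [ -e "$dev" ] || continue
        echo "== $dev =="
        nvme smart-log "$dev"
    done
}
```

Run ''nvme_smart_all'' as root on the host to dump all devices in one go.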
===== Updating BIOS =====
TBD
===== Installation Quirks/Difficulties =====
==== Networking ====
Initially, I wanted to have the 1Gb interface cabled on VLAN100 and the 25Gb interfaces cabled to VLAN101 (back.sepia.ceph.com). Up until now I have never really used VLAN101. I was able to get both NICs up, IPs assigned, and the servers could reach each other. The LRC could also reach these servers on their 25Gb/''back'' interfaces.
I added the hosts to the cluster using the ''back'' IPs. The cluster became very unhappy and complained about slow OPs. It turned out that the ivan servers couldn't get **out** from their ''back'' interfaces, so the OSDs fell back to the 1Gb link.
I reached out to Red Hat IT to have the 25Gb network ports switched over to VLAN100. After that, I struggled to get eno1 (the 1Gb interface) to **not** come up on boot since I didn't need it anymore.
Finally, I figured out that this systemd-networkd file does the trick:
# cat /etc/systemd/network/10-eno1.network
[Match]
Name=eno1
[Network]
DHCP=no
==== CentOS 8 ====
I could not for the life of me get ivan05 to install using the Ubuntu preseed below. Its settings are identical to the rest of the machines. I remember someone (I think GregF?) suggesting in a CLT call that we should have a mixture of OSes in the LRC, so I decided to use CentOS 8 instead.
That led to its own difficulties. For example, I couldn't ping the ''back'' interface from a ''front'' interface on another host. This worked fine on Ubuntu. I finally landed on this very helpful post: https://unix.stackexchange.com/a/589133
After running ''sysctl -w net.ipv4.conf.enp216s0f0.rp_filter=2'', I could ping 172.21.18.225 **from** a ''front'' interface on reesi001.
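A value of 2 puts reverse-path filtering into loose mode for that interface. Note that ''sysctl -w'' only changes the running kernel, so the setting is lost on reboot. A minimal sketch of making it persistent with a drop-in file follows; the function name ''persist_rp_filter'' and the file name ''90-rp-filter-<iface>.conf'' are arbitrary choices for this example, not something that is known to exist on ivan05.

```shell
# Write a sysctl drop-in that sets rp_filter=2 (loose mode) for one interface.
# Hypothetical helper; the second argument overrides the target directory.
persist_rp_filter() {
    iface="$1"
    conf_dir="${2:-/etc/sysctl.d}"
    conf_file="$conf_dir/90-rp-filter-$iface.conf"
    printf 'net.ipv4.conf.%s.rp_filter = 2\n' "$iface" > "$conf_file"
    # Print the path so the caller can inspect the result
    echo "$conf_file"
}
```

On the host this would be ''persist_rp_filter enp216s0f0'' as root, followed by ''sysctl %%--%%system'' to reload all drop-in files.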
==== Ubuntu Preseed ====
Here is the kickstart template used in [[services:cobbler]] to provision most of the hosts. As mentioned above, it did not work on ivan05 (would boot to ''grub rescue'' prompt).
## This file is managed by ansible, don't make changes here - they will be overwritten.
# Fetch the os_version from the distro using this profile.
#set os_version = $getVar('os_version','')
# Fetch Ubuntu version (e.g., 14.04)
#set distro_ver = $getVar('distro','').split("-")[1]
# Fetch Ubuntu major version (e.g., 14)
#set distro_ver_major = $distro_ver.split(".")[0]
### Apt setup
# You can choose to install non-free and contrib software.
#d-i apt-setup/non-free boolean true
#d-i apt-setup/contrib boolean true
# Preseeding only locale sets language, country and locale.
d-i debian-installer/locale string en_US
# Keyboard selection.
# Disable automatic (interactive) keymap detection.
d-i console-setup/ask_detect boolean false
# If you select ftp, the mirror/country string does not need to be set.
#d-i mirror/protocol string ftp
d-i mirror/country string manual
d-i mirror/http/hostname string archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
d-i mirror/suite string $os_version
#Removes the prompt about missing modules:
# Continue without installing a kernel?
#d-i base-installer/kernel/skip-install boolean true
# Continue the install without loading kernel modules?
#d-i anna/no_kernel_modules boolean true
# Stop Ubuntu from installing random kernel choice
#d-i base-installer/kernel/image select none
# Controls whether or not the hardware clock is set to UTC.
d-i clock-setup/utc boolean true
#
# # You may set this to any valid setting for $TZ; see the contents of
# # /usr/share/zoneinfo/ for valid values.
d-i time/zone string Etc/UTC
# Controls whether to use NTP to set the clock during the install
d-i clock-setup/ntp boolean true
# NTP server to use. The default is almost always fine here.
d-i clock-setup/ntp-server string pool.ntp.org
### Partitioning
d-i partman/unmount_active boolean true
#----------------------------------------------------------------------#
# Partitioning
d-i partman/early_command string \
umount /media ; \
mdadm --stop /dev/md0 ; \
mdadm --remove /dev/md0 ; \
mdadm --stop /dev/md127 ; \
mdadm --remove /dev/md127 ; \
for partition in /dev/sda* /dev/sdb*; do mdadm --zero-superblock $partition ; dd if=/dev/zero of=$partition bs=1M count=10; done ; \
echo 1 > /sys/block/sda/device/rescan ; \
echo 1 > /sys/block/sdb/device/rescan ; \
ls -C /dev/sd*; \
sleep 5; \
exit 0;
# this only makes partman automatically partition without confirmation:
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman-md/device_remove_md boolean true
d-i partman-md/confirm_nooverwrite boolean true
d-i partman-md/confirm boolean true
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/confirm_nooverwrite boolean true
d-i partman-lvm/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i mdadm/boot_degraded boolean true
d-i partman-auto/method string raid
d-i partman-auto/disk string /dev/sda /dev/sdb
d-i partman-auto/expert_recipe string multiraid :: \
256 512 512 free $bootable{ } method{ efi } format{ } . \
1024 10000 -1 raid format{ } method{ raid } .
# specify how the previously defined partitions will be
# used in the RAID setup.
d-i partman-auto-raid/recipe string \
1 2 0 xfs / /dev/sda5#/dev/sdb5 .
d-i partman/choose_partition select Finish partitioning and write changes to disk
d-i partman-efi/non_efi_system boolean true
# Partitioning
#----------------------------------------------------------------------#
#User account.
d-i passwd/root-login boolean false
d-i passwd/make-user boolean true
d-i passwd/user-fullname string cm
d-i passwd/username string cm
d-i passwd/user-password-crypted password $default_password_crypted
d-i passwd/user-uid string 1100
d-i user-setup/allow-password-weak boolean false
d-i user-setup/encrypt-home boolean false
# Individual additional packages to install
#if $os_version == 'precise'
d-i pkgsel/include string wget ntpdate bash sudo openssh-server
#else if int($distro_ver_major) == 16
d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
#else if int($distro_ver_major) == 18
d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown python ntp curl
#else if int($distro_ver_major) >= 20
d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown ntp curl gpg
#else
d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware linux-firmware-nonfree ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
#end if
# Whether to upgrade packages after debootstrap.
# Allowed values: none, safe-upgrade, full-upgrade
d-i pkgsel/upgrade select safe-upgrade
# Policy for applying updates. May be "none" (no automatic updates),
# "unattended-upgrades" (install security updates automatically), or
# "landscape" (manage system with Landscape).
d-i pkgsel/update-policy select none
# During installations from serial console, the regular virtual consoles
# (VT1-VT6) are normally disabled in /etc/inittab. Uncomment the next
# line to prevent this.
d-i finish-install/keep-consoles boolean true
# Avoid that last message about the install being complete.
d-i finish-install/reboot_in_progress note
# This command is run just before the install finishes, but when there is
# still a usable /target directory. You can chroot to /target and use it
# directly, or use the apt-install and in-target commands to easily install
# packages and run commands in the target system.
# cephlab_preseed_late lives in /var/lib/cobbler/scripts
# It is passed to the cobbler xmlrpc generate_scripts function where it's rendered.
# This means that snippets or other templating features can be used.
d-i preseed/late_command string \
in-target wget http://$http_server/cblr/svc/op/script/system/$system_name/?script=cephlab_preseed_late -O /tmp/postinst.sh; \
in-target /bin/chmod 755 /tmp/postinst.sh; \
in-target /tmp/postinst.sh;