This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
hardware:ivan [2022/05/10 13:18] djgalloway created |
hardware:ivan [2022/06/08 15:17] (current) djgalloway |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== ivan{01..07} ====== | ====== ivan{01..07} ====== | ||
===== Summary ===== | ===== Summary ===== | ||
- | The Ceph Foundation purchased 7 more servers to join the [[service:longrunningcluster]]. The three primary goals were: | + | The Ceph Foundation purchased 7 more servers to join the [[services:longrunningcluster]]. The three primary goals were: |
- | - Faster networking between hosts | + | - Faster networking (25Gbps) between hosts |
- Large NVMe devices as OSDs | - Large NVMe devices as OSDs | ||
- 12TB HDDs (largest up until now was 4TB) | - 12TB HDDs (largest up until now was 4TB) | ||
Line 11: | Line 11: | ||
===== Hardware Specs ===== | ===== Hardware Specs ===== | ||
- | | ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^ | + | | ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^ |
- | ^ Chassis | 2U | Supermicro | SSG-6028R-E1CR12H | N/A | | | + | ^ Chassis | 2U | Supermicro | SSG-6029P-E1CR12L | N/A | | |
- | ^ Mainboard | N/A | Supermicro | X10DRH-iT | N/A | | | + | ^ Mainboard | N/A | Supermicro | X11DPH-T | N/A | | |
- | ^ CPU | 1 | Intel | Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | N/A | [[https://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz|ARK]] | | + | ^ CPU | 2 | Intel | Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz | N/A | [[https://ark.intel.com/content/www/us/en/ark/products/199349/intel-xeon-silver-4215r-processor-11m-cache-3-20-ghz.html|ARK]] | |
- | ^ RAM | 4 DIMMs | Samsung | M393A2G40EB1-CRC | 16GB | 64GB total | | + | ^ RAM | 4 DIMMs | SK Hynix | HMAA4GR7AJR8N-XN | 32GB | 128GB Total | |
- | ^ SSD | 2 | Intel | SSDSC2BB150G7 (S3520) | 150GB | Software RAID1 for OS | | + | ^ SSD | 2 | Intel | SSDSC2KG960G8 (S4510) | 1TB | Software RAID1 for OS | |
- | ^ HDD | 11 | Seagate | ST4000NM0025 | 4TB | SAS 7200RPM for OSDs | | + | ^ HDD | 9 | Seagate | ST12000NM002G | 12TB | SAS 7200RPM for OSDs | |
- | ^ HDD | 1 | HGST | HUH721212AL5200 | 12TB | SAS 7200RPM added 1AUG2019 at Brett's request. | | + | ^ NVMe | 2 | Intel | SSDPE2KE016T8 | 1.6TB | For large NVMe OSDs | |
- | ^ NVMe | 1 | Micron | MTFDHBG800MCG-1AN1ZABYY | 800GB | Carved up as logical volumes on two partitions. 400GB as an OSD and the other 400GB divided by 12 for HDD OSD journals | | + | ^ NVMe | 1 | Intel | SSDPE21M375GA | 375GB | Carved up as logical volumes for OSD journals | |
- | ^ NIC | 2 ports | Intel | X540-AT2 | 10Gb | RJ45 (not used) | | + | ^ NIC | 2 ports | Intel | X722 | 10Gb | 1 port cabled BUT DISABLED. See below. | |
- | ^ NIC | 2 ports | Intel | 82599ES | 10Gb | 1 port cabled per system on front VLAN | | + | ^ NIC | 2 ports | Mellanox | ConnectX-4 | 25Gb | For ''back'' / storage traffic | |
- | ^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com | | + | ^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com | |
===== OSD/Block Device Information ===== | ===== OSD/Block Device Information ===== | ||
- | The ivan have 9x 12TB HDD, 2x 1.5TB NVMe, and 1x 350GB NVMe. | + | I used the Orchestrator to deploy OSDs on the ivan hosts (I did this one by one to avoid a mass data rebalance all to one rack). |
- | + | ||
- | The 12TB were added to so we can say we're testing on drives larger than 8TB. | + | |
- | + | ||
- | The smaller NVMe device is split into eleven equal logical volumes. One for each OSD's journal. | + | |
<code> | <code> | ||
- | root@ivan04:~# lsblk | + | root@reesi001:~# cat ivan_osd_spec.yml |
- | NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT | + | service_type: osd |
- | sda 8:0 0 894.3G 0 disk | + | service_id: osd_using_paths |
- | `-sda1 8:1 0 894.3G 0 part | + | placement: |
- | `-md0 9:0 0 894.1G 0 raid1 / | + | hosts: |
- | sdb 8:16 0 894.3G 0 disk | + | - ivan01 |
- | `-sdb1 8:17 0 894.3G 0 part | + | - ivan02 |
- | `-md0 9:0 0 894.1G 0 raid1 / | + | - ivan03 |
- | sdc 8:32 0 10.9T 0 disk | + | - ivan04 |
- | sdd 8:48 0 10.9T 0 disk | + | - ivan05 |
- | sde 8:64 0 10.9T 0 disk | + | - ivan06 |
- | sdf 8:80 0 10.9T 0 disk | + | - ivan07 |
- | sdg 8:96 0 10.9T 0 disk | + | spec: |
- | sdh 8:112 0 10.9T 0 disk | + | data_devices: |
- | sdi 8:128 0 10.9T 0 disk | + | paths: |
- | sdj 8:144 0 10.9T 0 disk | + | - /dev/sdc |
- | sdk 8:160 0 10.9T 0 disk | + | - /dev/sdd |
- | sr0 11:0 1 841M 0 rom | + | - /dev/sde |
- | nvme0n1 259:0 0 349.3G 0 disk | + | - /dev/sdf |
- | `-nvme0n1p1 259:3 0 349.3G 0 part | + | - /dev/sdg |
- | |-journals-sdc 253:0 0 31G 0 lvm | + | - /dev/sdh |
- | |-journals-sdd 253:1 0 31G 0 lvm | + | - /dev/sdi |
- | |-journals-sde 253:2 0 31G 0 lvm | + | - /dev/sdj |
- | |-journals-sdf 253:3 0 31G 0 lvm | + | - /dev/sdk |
- | |-journals-sdg 253:4 0 31G 0 lvm | + | - /dev/nvme1n1 |
- | |-journals-sdh 253:5 0 31G 0 lvm | + | - /dev/nvme2n1 |
- | |-journals-sdi 253:6 0 31G 0 lvm | + | db_devices: |
- | |-journals-sdj 253:7 0 31G 0 lvm | + | paths: |
- | |-journals-sdk 253:8 0 31G 0 lvm | + | - /dev/nvme0n1 |
- | |-journals-nvme1n1 253:9 0 31G 0 lvm | + | |
- | `-journals-nvme2n1 253:10 0 31G 0 lvm | + | |
- | nvme1n1 259:1 0 1.5T 0 disk | + | |
- | nvme2n1 259:2 0 1.5T 0 disk | + | |
- | </code> | + | |
- | + | ||
- | ==== How to partition/re-partition the NVMe device ==== | + | |
- | Here's my bash history that can be used to set up a reesi machine's NVMe card. | + | |
- | + | ||
- | <code> | + | |
- | ansible -a "sudo parted -s /dev/nvme0n1 mktable gpt" reesi* | + | |
- | ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 0 50" reesi* | + | |
- | ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 51 100" reesi* | + | |
- | ansible -a "sudo pvcreate /dev/nvme0n1p1" reesi* | + | |
- | ansible -a "sudo vgcreate journals /dev/nvme0n1p1" reesi* | + | |
- | for disk in sd{a..l}; do ansible -a "sudo lvcreate -L 31G -n $disk journals" reesi*; done | + | |
</code> | </code> | ||
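The one-host-at-a-time rollout mentioned above could be scripted roughly as follows. This is a minimal sketch, not the commands that were actually run: the helper name `make_osd_spec` and the output file name `spec.yml` are hypothetical, and it assumes the placement list is grown one host per step and previewed with `ceph orch apply -i <file> --dry-run` before the real apply.

```shell
#!/bin/sh
# make_osd_spec HOST...: print an OSD service spec limited to the given hosts.
# Device paths are copied from ivan_osd_spec.yml above; the function name
# is hypothetical.
make_osd_spec() {
    # Build the YAML list entries for hosts and data devices.
    hosts=$(for h in "$@"; do printf '    - %s\n' "$h"; done)
    paths=$(for dev in sdc sdd sde sdf sdg sdh sdi sdj sdk nvme1n1 nvme2n1; do
        printf '      - /dev/%s\n' "$dev"
    done)
    cat <<EOF
service_type: osd
service_id: osd_using_paths
placement:
  hosts:
${hosts}
spec:
  data_devices:
    paths:
${paths}
  db_devices:
    paths:
      - /dev/nvme0n1
EOF
}

# Example rollout (assumption: let the cluster settle between steps):
#   make_osd_spec ivan01 > spec.yml && ceph orch apply -i spec.yml --dry-run
#   make_osd_spec ivan01 ivan02 > spec.yml && ceph orch apply -i spec.yml --dry-run
```

Growing the host list, rather than applying a separate spec per host, keeps a single `osd_using_paths` service whose placement ends up matching the full spec shown above.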
Line 87: | Line 67: | ||
===== Updating BIOS ===== | ===== Updating BIOS ===== | ||
TBD | TBD | ||
+ | |||
+ | ===== Installation Quirks/Difficulties ===== | ||
+ | ==== Networking ==== | ||
+ | Initially, I wanted to have the 1Gb interface cabled on VLAN100 and the 25Gb interfaces cabled to VLAN101 (back.sepia.ceph.com). Up until now I have never really used VLAN101. I was able to get both NICs up, IPs assigned, and the servers could reach each other. The LRC could also reach these servers on their 25Gb/''back'' interfaces. | ||
+ | |||
+ | I added the hosts to the cluster using the ''back'' IPs. The cluster became very unhappy and complained about slow OPs. It turned out that the ivan servers couldn't get **out** from their ''back'' interfaces, so the OSDs fell back to the 1Gb link. | ||
+ | |||
+ | I reached out to Red Hat IT to have the 25Gb network ports switched over to VLAN100. After that, I struggled to get eno1 (the 1Gb interface) to **not** come up on boot since I didn't need it anymore. | ||
+ | |||
+ | Finally, I figured out that this file does the trick:<code> | ||
+ | # cat /etc/systemd/network/10-eno1.network | ||
+ | [Match] | ||
+ | Name=eno1 | ||
+ | |||
+ | [Network] | ||
+ | DHCP=no | ||
+ | </code> | ||
+ | |||
+ | ==== CentOS 8 ==== | ||
+ | I could not for the life of me get ivan05 to install using the Ubuntu preseed below. Its settings are identical to the rest of the machines. I remember someone (I think GregF?) suggesting in a CLT call that we should have a mixture of OSes in the LRC, so I decided to use CentOS 8 instead. | ||
+ | |||
+ | That led to its own difficulties. For example, I couldn't ping the ''back'' interface from a ''front'' interface on another host. This worked fine on Ubuntu. I finally landed on this very helpful post: https://unix.stackexchange.com/a/589133 | ||
+ | |||
+ | After running ''sysctl -w net.ipv4.conf.enp216s0f0.rp_filter=2'', I could ping 172.21.18.225 **from** a ''front'' interface on reesi001. | ||
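Note that ''sysctl -w'' only changes the running kernel, so the setting is lost on reboot. The page does not say whether or how it was made persistent; one common approach, sketched below under that assumption, is a drop-in file under /etc/sysctl.d. The helper name `rp_filter_conf` and the file name `90-rp-filter.conf` are hypothetical. Mode 2 is the kernel's "loose" reverse-path filter, which accepts a packet as long as its source is reachable through any interface.

```shell
#!/bin/sh
# rp_filter_conf IFACE: print a sysctl.d line that sets loose reverse-path
# filtering (mode 2) on one interface. The function name is hypothetical.
rp_filter_conf() {
    printf 'net.ipv4.conf.%s.rp_filter = 2\n' "$1"
}

# Example (as root); the file name is an assumption:
#   rp_filter_conf enp216s0f0 > /etc/sysctl.d/90-rp-filter.conf
#   sysctl --system    # reload all sysctl configuration files
```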
+ | |||
+ | ==== Ubuntu Preseed ==== | ||
+ | Here is the kickstart template used in [[services:cobbler]] to provision most of the hosts. As mentioned above, it did not work on ivan05 (it would boot to a ''grub rescue'' prompt). | ||
+ | |||
+ | <code> | ||
+ | ## This file is managed by ansible, don't make changes here - they will be overwritten. | ||
+ | |||
+ | # Fetch the os_version from the distro using this profile. | ||
+ | #set os_version = $getVar('os_version','') | ||
+ | |||
+ | # Fetch Ubuntu version (e.g., 14.04) | ||
+ | #set distro_ver = $getVar('distro','').split("-")[1] | ||
+ | |||
+ | # Fetch Ubuntu major version (e.g., 14) | ||
+ | #set distro_ver_major = $distro_ver.split(".")[0] | ||
+ | |||
+ | ### Apt setup | ||
+ | # You can choose to install non-free and contrib software. | ||
+ | #d-i apt-setup/non-free boolean true | ||
+ | #d-i apt-setup/contrib boolean true | ||
+ | |||
+ | # Preseeding only locale sets language, country and locale. | ||
+ | d-i debian-installer/locale string en_US | ||
+ | |||
+ | # Keyboard selection. | ||
+ | # Disable automatic (interactive) keymap detection. | ||
+ | d-i console-setup/ask_detect boolean false | ||
+ | |||
+ | # If you select ftp, the mirror/country string does not need to be set. | ||
+ | #d-i mirror/protocol string ftp | ||
+ | d-i mirror/country string manual | ||
+ | d-i mirror/http/hostname string archive.ubuntu.com | ||
+ | d-i mirror/http/directory string /ubuntu | ||
+ | d-i mirror/suite string $os_version | ||
+ | |||
+ | #Removes the prompt about missing modules: | ||
+ | # Continue without installing a kernel? | ||
+ | #d-i base-installer/kernel/skip-install boolean true | ||
+ | # Continue the install without loading kernel modules? | ||
+ | #d-i anna/no_kernel_modules boolean true | ||
+ | |||
+ | # Stop Ubuntu from installing random kernel choice | ||
+ | #d-i base-installer/kernel/image select none | ||
+ | |||
+ | # Controls whether or not the hardware clock is set to UTC. | ||
+ | d-i clock-setup/utc boolean true | ||
+ | # | ||
+ | # # You may set this to any valid setting for $TZ; see the contents of | ||
+ | # # /usr/share/zoneinfo/ for valid values. | ||
+ | d-i time/zone string Etc/UTC | ||
+ | |||
+ | # Controls whether to use NTP to set the clock during the install | ||
+ | d-i clock-setup/ntp boolean true | ||
+ | # NTP server to use. The default is almost always fine here. | ||
+ | d-i clock-setup/ntp-server string pool.ntp.org | ||
+ | |||
+ | ### Partitioning | ||
+ | d-i partman/unmount_active boolean true | ||
+ | |||
+ | |||
+ | #----------------------------------------------------------------------# | ||
+ | # Partitioning | ||
+ | d-i partman/early_command string \ | ||
+ | umount /media ; \ | ||
+ | mdadm --stop /dev/md0 ; \ | ||
+ | mdadm --remove /dev/md0 ; \ | ||
+ | mdadm --stop /dev/md127 ; \ | ||
+ | mdadm --remove /dev/md127 ; \ | ||
+ | for partition in /dev/sda* /dev/sdb*; do mdadm --zero-superblock $partition ; dd if=/dev/zero of=$partition bs=1M count=10; done ; \ | ||
+ | echo 1 > /sys/block/sda/device/rescan ; \ | ||
+ | echo 1 > /sys/block/sdb/device/rescan ; \ | ||
+ | ls -C /dev/sd*; \ | ||
+ | sleep 5; \ | ||
+ | exit 0; \ | ||
+ | |||
+ | |||
+ | # this only makes partman automatically partition without confirmation: | ||
+ | d-i partman-partitioning/confirm_write_new_label boolean true | ||
+ | d-i partman-md/device_remove_md boolean true | ||
+ | d-i partman-md/confirm_nooverwrite boolean true | ||
+ | d-i partman-md/confirm boolean true | ||
+ | d-i partman-lvm/device_remove_lvm boolean true | ||
+ | d-i partman-lvm/confirm_nooverwrite boolean true | ||
+ | d-i partman-lvm/confirm boolean true | ||
+ | d-i partman/confirm_nooverwrite boolean true | ||
+ | d-i partman/choose_partition select finish | ||
+ | d-i partman/confirm boolean true | ||
+ | d-i mdadm/boot_degraded boolean true | ||
+ | |||
+ | d-i partman-auto/method string raid | ||
+ | d-i partman-auto/disk string /dev/sda /dev/sdb | ||
+ | |||
+ | d-i partman-auto/expert_recipe string multiraid :: \ | ||
+ | 256 512 512 free $bootable{ } method{ efi } format{ } . \ | ||
+ | 1024 10000 -1 raid format{ } method{ raid } . | ||
+ | |||
+ | # specify how the previously defined partitions will be | ||
+ | # used in the RAID setup. | ||
+ | d-i partman-auto-raid/recipe string \ | ||
+ | 1 2 0 xfs / /dev/sda5#/dev/sdb5 . | ||
+ | |||
+ | d-i partman/choose_partition select Finish partitioning and write changes to disk | ||
+ | d-i partman-efi/non_efi_system boolean true | ||
+ | |||
+ | # Partitioning | ||
+ | #----------------------------------------------------------------------# | ||
+ | |||
+ | #User account. | ||
+ | d-i passwd/root-login boolean false | ||
+ | d-i passwd/make-user boolean true | ||
+ | d-i passwd/user-fullname string cm | ||
+ | d-i passwd/username string cm | ||
+ | d-i passwd/user-password-crypted password $default_password_crypted | ||
+ | d-i passwd/user-uid string 1100 | ||
+ | d-i user-setup/allow-password-weak boolean false | ||
+ | d-i user-setup/encrypt-home boolean false | ||
+ | |||
+ | # Individual additional packages to install | ||
+ | #if $os_version == 'precise' | ||
+ | d-i pkgsel/include string wget ntpdate bash sudo openssh-server | ||
+ | #else if int($distro_ver_major) == 16 | ||
+ | d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl | ||
+ | #else if int($distro_ver_major) == 18 | ||
+ | d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown python ntp curl | ||
+ | #else if int($distro_ver_major) >= 20 | ||
+ | d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown ntp curl gpg | ||
+ | #else | ||
+ | d-i pkgsel/include string u-boot-tools pastebinit initramfs-tools wget linux-firmware linux-firmware-nonfree ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl | ||
+ | #end if | ||
+ | |||
+ | # Whether to upgrade packages after debootstrap. | ||
+ | # Allowed values: none, safe-upgrade, full-upgrade | ||
+ | d-i pkgsel/upgrade select safe-upgrade | ||
+ | |||
+ | # Policy for applying updates. May be "none" (no automatic updates), | ||
+ | # "unattended-upgrades" (install security updates automatically), or | ||
+ | # "landscape" (manage system with Landscape). | ||
+ | d-i pkgsel/update-policy select none | ||
+ | |||
+ | # During installations from serial console, the regular virtual consoles | ||
+ | # (VT1-VT6) are normally disabled in /etc/inittab. Uncomment the next | ||
+ | # line to prevent this. | ||
+ | d-i finish-install/keep-consoles boolean true | ||
+ | |||
+ | # Avoid that last message about the install being complete. | ||
+ | d-i finish-install/reboot_in_progress note | ||
+ | |||
+ | # This command is run just before the install finishes, but when there is | ||
+ | # still a usable /target directory. You can chroot to /target and use it | ||
+ | # directly, or use the apt-install and in-target commands to easily install | ||
+ | # packages and run commands in the target system. | ||
+ | |||
+ | # cephlab_preseed_late lives in /var/lib/cobbler/scripts | ||
+ | # It is passed to the cobbler xmlrpc generate_scripts function where it's rendered. | ||
+ | # This means that snippets or other templating features can be used. | ||
+ | d-i preseed/late_command string \ | ||
+ | in-target wget http://$http_server/cblr/svc/op/script/system/$system_name/?script=cephlab_preseed_late -O /tmp/postinst.sh; \ | ||
+ | in-target /bin/chmod 755 /tmp/postinst.sh; \ | ||
+ | in-target /tmp/postinst.sh; | ||
+ | </code> |