hardware:ivan

Differences

This shows you the differences between two versions of the page.

hardware:ivan [2022/05/10 13:18]
djgalloway created
hardware:ivan [2022/06/08 15:17] (current)
djgalloway
Line 1: Line 1:
 ====== ivan{01..07} ======
 ===== Summary =====
-The Ceph Foundation purchased 7 more servers to join the [[service:longrunningcluster]].  The three primary goals were:
-  - Faster networking between hosts
+The Ceph Foundation purchased 7 more servers to join the [[services:longrunningcluster]].  The three primary goals were:
+  - Faster networking (25Gbps) between hosts
   - Large NVMe devices as OSDs
   - 12TB HDDs (largest up until now was 4TB)
Line 11: Line 11:
  
 ===== Hardware Specs =====
-|            ^ Count    ^ Manufacturer  ^ Model                                    ^ Capacity  ^ Notes ^
-^ Chassis    | 2U       | Supermicro    | SSG-6028R-E1CR12H                        | N/A       |  |
-^ Mainboard  | N/A      | Supermicro    | X10DRH-iT                                | N/A       |  |
-^ CPU        |          | Intel         | Intel(R) Xeon(R) CPU E5-2620 v4 2.10GHz  | N/A       | [[https://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz|ARK]] |
-^ RAM        | 4 DIMMs  | Samsung       | M393A2G40EB1-CRC                         | 16GB      | 64GB total |
-^ SSD        | 2        | Intel         | SSDSC2BB150G7 (S3520)                    | 150GB     | Software RAID1 for OS |
-^ HDD        | 11       | Seagate       | ST4000NM0025                             | 4TB       | SAS 7200RPM for OSDs |
-^ HDD        |          | HGST          | HUH721212AL5200                          | 12TB      | SAS 7200RPM added 1AUG2019 at Brett's request |
-^ NVMe       | 1        | Micron        | MTFDHBG800MCG-1AN1ZABYY                  | 800GB     | Carved up as logical volumes on two partitions.  400GB as an OSD and the other 400GB divided by 12 for HDD OSD journals |
-^ NIC        | 2 ports  | Intel         | X540-AT2                                 | 10Gb      | RJ45 (not used) |
-^ NIC        | 2 ports  | Intel         | 82599ES                                  | 10Gb      | 1 port cabled per system on front VLAN |
-^ BMC        | 1        | Supermicro    | N/A                                      | N/A       | Reachable at $host.ipmi.sepia.ceph.com |
+|            ^ Count    ^ Manufacturer  ^ Model                                        ^ Capacity  ^ Notes ^
+^ Chassis    | 2U       | Supermicro    | SSG-6029P-E1CR12L                            | N/A       |  |
+^ Mainboard  | N/A      | Supermicro    | X11DPH-T                                     | N/A       |  |
+^ CPU        |          | Intel         | Intel(R) Xeon(R) Silver 4215R CPU @ 3.20GHz  | N/A       | [[https://ark.intel.com/content/www/us/en/ark/products/199349/intel-xeon-silver-4215r-processor-11m-cache-3-20-ghz.html|ARK]] |
+^ RAM        | 4 DIMMs  | SK Hynix      | HMAA4GR7AJR8N-XN                             | 32GB      | 128GB Total |
+^ SSD        | 2        | Intel         | SSDSC2KG960G8 (S4510)                        | 1TB       | Software RAID1 for OS |
+^ HDD        | 9        | Seagate       | ST12000NM002G                                | 12TB      | SAS 7200RPM for OSDs |
+^ NVMe       |          | Intel         | SSDPE2KE016T8                                | 1.6TB     | For large NVMe OSDs |
+^ NVMe       | 1        | Intel         | SSDPE21M375GA                                | 375GB     | Carved up as logical volumes for OSD journals |
+^ NIC        | 2 ports  | Intel         | X722                                         | 10Gb      | 1 port cabled BUT DISABLED. See below. |
+^ NIC        | 2 ports  | Mellanox      | ConnectX-4                                   | 25Gb      | For ''back'' / storage traffic |
+^ BMC        | 1        | Supermicro    | N/A                                          | N/A       | Reachable at $host.ipmi.sepia.ceph.com |
  
  
 ===== OSD/Block Device Information =====
-The ivan have 9x 12TB HDD, 2x 1.5TB NVMe, and 1x 350GB NVMe.
-
-The 12TB were added to so we can say we're testing on drives larger than 8TB.
-
-The smaller NVMe device is split into eleven equal logical volumes. One for each OSD's journal.
+I used the Orchestrator to deploy OSDs on the ivan hosts (I did this one by one to avoid a mass data rebalance all to one rack).
  
 <code>
-root@ivan04:~# lsblk
-NAME                 MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
-sda                    8:0    0 894.3G  0 disk
-`-sda1                 8:1    0 894.3G  0 part
-  `-md0                9:0    0 894.1G  0 raid1 /
-sdb                    8:16   0 894.3G  0 disk
-`-sdb1                 8:17   0 894.3G  0 part
-  `-md0                9:0    0 894.1G  0 raid1 /
-sdc                    8:32   0  10.9T  0 disk
-sdd                    8:48   0  10.9T  0 disk
-sde                    8:64   0  10.9T  0 disk
-sdf                    8:80   0  10.9T  0 disk
-sdg                    8:96   0  10.9T  0 disk
-sdh                    8:112  0  10.9T  0 disk
-sdi                    8:128  0  10.9T  0 disk
-sdj                    8:144  0  10.9T  0 disk
-sdk                    8:160  0  10.9T  0 disk
-sr0                   11:0    1   841M  0 rom
-nvme0n1              259:0    0 349.3G  0 disk
-`-nvme0n1p1          259:3    0 349.3G  0 part
-  |-journals-sdc     253:0    0    31G  0 lvm
-  |-journals-sdd     253:1    0    31G  0 lvm
-  |-journals-sde     253:2    0    31G  0 lvm
-  |-journals-sdf     253:3    0    31G  0 lvm
-  |-journals-sdg     253:4    0    31G  0 lvm
-  |-journals-sdh     253:5    0    31G  0 lvm
-  |-journals-sdi     253:6    0    31G  0 lvm
-  |-journals-sdj     253:7    0    31G  0 lvm
-  |-journals-sdk     253:8    0    31G  0 lvm
-  |-journals-nvme1n1 253:9    0    31G  0 lvm
-  `-journals-nvme2n1 253:10   0    31G  0 lvm
-nvme1n1              259:1    0   1.5T  0 disk
-nvme2n1              259:2    0   1.5T  0 disk
-</code>
-
-==== How to partition/re-partition the NVMe device ====
-Here's my bash history that can be used to set up a reesi machine's NVMe card.
-
-<code>
-ansible -a "sudo parted -s /dev/nvme0n1 mktable gpt" reesi*
-ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 0 50" reesi*
-ansible -a "sudo parted /dev/nvme0n1 unit '%' mkpart foo 51 100" reesi*
-ansible -a "sudo pvcreate /dev/nvme0n1p1" reesi*
-ansible -a "sudo vgcreate journals /dev/nvme0n1p1" reesi*
-for disk in sd{a..l}; do ansible -a "sudo lvcreate -L 31G -n $disk journals" reesi*; done
+root@reesi001:~# cat ivan_osd_spec.yml
+service_type: osd
+service_id: osd_using_paths
+placement:
+  hosts:
+    - ivan01
+    - ivan02
+    - ivan03
+    - ivan04
+    - ivan05
+    - ivan06
+    - ivan07
+spec:
+  data_devices:
+    paths:
+    - /dev/sdc
+    - /dev/sdd
+    - /dev/sde
+    - /dev/sdf
+    - /dev/sdg
+    - /dev/sdh
+    - /dev/sdi
+    - /dev/sdj
+    - /dev/sdk
+    - /dev/nvme1n1
+    - /dev/nvme2n1
+  db_devices:
+    paths:
+    - /dev/nvme0n1
 </code>
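
The one-by-one rollout mentioned above can be sketched as a small shell helper that renders the same spec for a growing list of hosts. This is only a sketch under assumptions: the function name ''render_osd_spec'' is made up for illustration, and the page does not say how the spec was applied. ''ceph orch apply -i <file>'' (optionally with ''--dry-run'' first) is the usual cephadm route and is assumed in the commented-out lines.

```shell
#!/bin/sh
# Hypothetical helper (not from the original page): print the OSD spec
# shown above, limited to the hosts passed as arguments.
render_osd_spec() {
  printf 'service_type: osd\n'
  printf 'service_id: osd_using_paths\n'
  printf 'placement:\n  hosts:\n'
  for host in "$@"; do
    printf '    - %s\n' "$host"
  done
  printf 'spec:\n  data_devices:\n    paths:\n'
  # Same data devices as the spec on this page: 9 HDDs and 2 large NVMe.
  for dev in /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh \
             /dev/sdi /dev/sdj /dev/sdk /dev/nvme1n1 /dev/nvme2n1; do
    printf '    - %s\n' "$dev"
  done
  printf '  db_devices:\n    paths:\n    - /dev/nvme0n1\n'
}

# Example: a spec covering only the first two hosts.
render_osd_spec ivan01 ivan02

# Assumed workflow (commented out; needs a live cluster):
#   render_osd_spec ivan01 > ivan_osd_spec.yml
#   ceph orch apply -i ivan_osd_spec.yml --dry-run
#   ceph orch apply -i ivan_osd_spec.yml
```

Growing the host list one name at a time and re-applying the file keeps each rebalance limited to a single new host.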
  
Line 87: Line 67:
 ===== Updating BIOS =====
 TBD
 +
 +===== Installation Quirks/​Difficulties =====
 +==== Networking ====
 +Initially, I wanted to have the 1Gb interface cabled on VLAN100 and the 25Gb interfaces cabled to VLAN101 (back.sepia.ceph.com). Up until now I have never really used VLAN101. I was able to get both NICs up, IPs assigned, and the servers could reach each other. The LRC could also reach these servers on their 25Gb/''​back''​ interfaces.
 +
 +I added the hosts to the cluster using the ''back'' IPs. The cluster became very unhappy, complaining about slow ops. It turned out the ivan servers couldn't get **out** from their ''back'' interfaces, so the OSDs fell back to the 1Gb link.
 +
 +I reached out to Red Hat IT to have the 25Gb network ports switched over to VLAN100. After that, I struggled to get eno1 (the 1Gb interface) to **not** come up on boot since I didn't need it anymore.
 +
 +Finally, I figured out that this works:
 +<code>
 +# cat /​etc/​systemd/​network/​10-eno1.network ​
 +[Match]
 +Name=eno1
 +
 +[Network]
 +DHCP=no
 +</​code>​
 +
 +==== CentOS 8 ====
 +I could not for the life of me get ivan05 to install using the Ubuntu preseed below, even though its settings are identical to the rest of the machines. I remember someone (I think GregF?) suggesting in a CLT call that we should have a mixture of OSes in the LRC, so I decided to use CentOS 8 instead.
 +
 +That led to its own difficulties. For example, I couldn'​t ping the ''​back''​ interface from a ''​front''​ interface on another host. This worked fine on Ubuntu. I finally landed on this very helpful post: https://​unix.stackexchange.com/​a/​589133
 +
 +After running ''​sysctl -w net.ipv4.conf.enp216s0f0.rp_filter=2'',​ I could ping 172.21.18.225 **from** a ''​front''​ interface on reesi001.
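
For context, ''rp_filter'' controls reverse-path source validation: 0 disables it, 1 is strict mode, and 2 is loose mode. The kernel applies the higher of the ''all'' value and the per-interface value, which is why raising only the one interface to 2 is enough. The sketch below reports that effective value. The function name and its optional second argument are assumptions added for illustration; they are not from the original notes. Note also that ''sysctl -w'' does not survive a reboot.

```shell
#!/bin/sh
# Sketch: print the effective rp_filter mode for one interface.
# The kernel uses the maximum of conf/all and conf/<iface>.
# The optional second argument (a config directory) exists only so the
# function can be tried against a fake tree; it defaults to /proc.
effective_rp_filter() {
  iface="$1"
  confdir="${2:-/proc/sys/net/ipv4/conf}"
  all_val="$(cat "$confdir/all/rp_filter")"
  if_val="$(cat "$confdir/$iface/rp_filter")"
  if [ "$all_val" -gt "$if_val" ]; then
    echo "$all_val"
  else
    echo "$if_val"
  fi
}

# Example, on the host itself:
#   effective_rp_filter enp216s0f0
# To keep the setting across reboots, one option (an assumption, not
# something the notes above say was done) is a sysctl.d drop-in with:
#   net.ipv4.conf.enp216s0f0.rp_filter = 2
```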
 +
 +==== Ubuntu Preseed ====
 +Here is the kickstart template used in [[services:cobbler]] to provision most of the hosts. As mentioned above, it did not work on ivan05 (it would boot to a ''grub rescue'' prompt).
 +
 +<​code>​
 +## This file is managed by ansible, don't make changes here - they will be overwritten.
 +
 +# Fetch the os_version from the distro using this profile.
 +#set os_version = $getVar('​os_version',''​)
 +
 +# Fetch Ubuntu version (e.g., 14.04)
 +#set distro_ver = $getVar('​distro',''​).split("​-"​)[1]
 +
 +# Fetch Ubuntu major version (e.g., 14)
 +#set distro_ver_major = $distro_ver.split("​."​)[0]
 +
 +### Apt setup
 +# You can choose to install non-free and contrib software.
 +#d-i apt-setup/​non-free boolean true
 +#d-i apt-setup/​contrib boolean true
 +
 +# Preseeding only locale sets language, country and locale.
 +d-i debian-installer/​locale string en_US
 +
 +# Keyboard selection.
 +# Disable automatic (interactive) keymap detection.
 +d-i console-setup/​ask_detect boolean false
 +
 +# If you select ftp, the mirror/​country string does not need to be set.
 +#d-i mirror/​protocol string ftp
 +d-i mirror/​country string manual
 +d-i mirror/​http/​hostname string archive.ubuntu.com
 +d-i mirror/​http/​directory string /ubuntu
 +d-i mirror/​suite string $os_version
 +
 +#Removes the prompt about missing modules:
 +# Continue without installing a kernel?
 +#d-i base-installer/​kernel/​skip-install boolean true
 +# Continue the install without loading kernel modules?
 +#d-i anna/​no_kernel_modules boolean true
 +
 +# Stop Ubuntu from installing random kernel choice
 +#d-i base-installer/​kernel/​image select none
 +
 +# Controls whether or not the hardware clock is set to UTC.
 +d-i clock-setup/​utc boolean true
 +#
 +# # You may set this to any valid setting for $TZ; see the contents of
 +# # /​usr/​share/​zoneinfo/​ for valid values.
 +d-i time/zone string Etc/UTC
 +
 +# Controls whether to use NTP to set the clock during the install
 +d-i clock-setup/​ntp boolean true
 +# NTP server to use. The default is almost always fine here.
 +d-i clock-setup/​ntp-server string pool.ntp.org
 +
 +### Partitioning
 +d-i partman/​unmount_active boolean true
 +
 +
 +#​----------------------------------------------------------------------#​
 +# Partitioning
 +d-i partman/​early_command string \
 + umount /media ; \
 + mdadm --stop /dev/md0 ; \
 + mdadm --remove /dev/md0 ; \
 + mdadm --stop /dev/md127 ; \
 + mdadm --remove /dev/md127 ; \
 +    for partition in /dev/sda* /dev/sdb*; do mdadm --zero-superblock $partition ; dd if=/​dev/​zero of=$partition bs=1M count=10; done ; \
 +    echo 1 > /​sys/​block/​sda/​device/​rescan ; \
 +    echo 1 > /​sys/​block/​sdb/​device/​rescan ; \
 +    ls -C /dev/sd*; \
 +    sleep 5; \
 + exit 0; \
 +
 +
 +# this only makes partman automatically partition without confirmation:​
 +d-i partman-partitioning/confirm_write_new_label  boolean true
 +d-i partman-md/​device_remove_md ​    ​boolean true
 +d-i partman-md/​confirm_nooverwrite ​ boolean true
 +d-i partman-md/​confirm ​             boolean true
 +d-i partman-lvm/​device_remove_lvm ​  ​boolean true
 +d-i partman-lvm/​confirm_nooverwrite boolean true
 +d-i partman-lvm/​confirm ​            ​boolean true
 +d-i partman/​confirm_nooverwrite ​    ​boolean true
 +d-i partman/​choose_partition ​       select ​ finish
 +d-i partman/​confirm ​                ​boolean true
 +d-i mdadm/​boot_degraded ​            ​boolean true
 +
 +d-i partman-auto/​method string raid
 +d-i partman-auto/​disk string /dev/sda /dev/sdb
 +
 +d-i partman-auto/​expert_recipe ​     string multiraid :: \
 +    256   ​512 ​   512   ​free ​      ​$bootable{ } method{ efi } format{ } . \
 +    1024  10000  -1    raid       ​format{ } method{ raid } .
 +
 +# specify how the previously defined partitions will be
 +# used in the RAID setup.
 +d-i partman-auto-raid/​recipe string ​    \
 +    1 2 0 xfs / /​dev/​sda5#/​dev/​sdb5 .
 +
 +d-i partman/​choose_partition select Finish partitioning and write changes to disk
 +d-i partman-efi/​non_efi_system boolean true
 +
 +# Partitioning
 +#​----------------------------------------------------------------------#​
 +
 +#User account.
 +d-i passwd/​root-login boolean false 
 +d-i passwd/​make-user boolean true
 +d-i passwd/​user-fullname string cm
 +d-i passwd/​username string cm
 +d-i passwd/​user-password-crypted password $default_password_crypted
 +d-i passwd/​user-uid string 1100
 +d-i user-setup/​allow-password-weak boolean false
 +d-i user-setup/​encrypt-home boolean false
 +
 +# Individual additional packages to install
 +#if $os_version == '​precise'​
 +d-i pkgsel/​include string wget ntpdate bash sudo openssh-server
 +#else if int($distro_ver_major) == 16
 +d-i pkgsel/​include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
 +#else if int($distro_ver_major) == 18
 +d-i pkgsel/​include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown python ntp curl
 +#else if int($distro_ver_major) >= 20
 +d-i pkgsel/​include string u-boot-tools pastebinit initramfs-tools wget linux-firmware ntpdate bash devmem2 fbset sudo openssh-server gawk gdisk ethtool net-tools ifupdown ntp curl gpg
 +#else
 +d-i pkgsel/​include string u-boot-tools pastebinit initramfs-tools wget linux-firmware linux-firmware-nonfree ntpdate bash devmem2 fbset sudo openssh-server udev-discover gawk gdisk ethtool curl
 +#end if
 +
 +# Whether to upgrade packages after debootstrap.
 +# Allowed values: none, safe-upgrade,​ full-upgrade
 +d-i pkgsel/​upgrade select safe-upgrade
 +
 +# Policy for applying updates. May be "​none"​ (no automatic updates),
 +# "​unattended-upgrades"​ (install security updates automatically),​ or
 +# "​landscape"​ (manage system with Landscape).
 +d-i pkgsel/​update-policy select none
 +
 +# During installations from serial console, the regular virtual consoles
 +# (VT1-VT6) are normally disabled in /​etc/​inittab. Uncomment the next
 +# line to prevent this.
 +d-i finish-install/​keep-consoles boolean true
 +
 +# Avoid that last message about the install being complete.
 +d-i finish-install/​reboot_in_progress note
 +
 +# This command is run just before the install finishes, but when there is
 +# still a usable /target directory. You can chroot to /target and use it
 +# directly, or use the apt-install and in-target commands to easily install
 +# packages and run commands in the target system.
 +
 +# cephlab_preseed_late lives in /​var/​lib/​cobbler/​scripts
 +# It is passed to the cobbler xmlrpc generate_scripts function where it's rendered.
 +# This means that snippets or other templating features can be used.
 +d-i preseed/​late_command string \
 +in-target wget http://​$http_server/​cblr/​svc/​op/​script/​system/​$system_name/?​script=cephlab_preseed_late -O /​tmp/​postinst.sh;​ \
 +in-target /bin/chmod 755 /​tmp/​postinst.sh;​ \
 +in-target /​tmp/​postinst.sh;​
 +</​code>​
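
The version parsing at the top of the template decides which ''pkgsel/include'' line is used. A rough shell equivalent is sketched below. The function name ''pkgsel_branch'' and the distro name ''Ubuntu-20.04-server-x86_64'' are hypothetical; the real Cobbler distro names are not shown on this page, and the sketch only assumes they have the version as the second dash-separated field.

```shell
#!/bin/sh
# Sketch of the template's branch selection in plain shell.
# $1 = os_version (e.g. focal), $2 = Cobbler distro name (assumed format).
pkgsel_branch() {
  os_version="$1"
  distro="$2"
  # Equivalent of $getVar('distro','').split("-")[1], e.g. 20.04
  distro_ver="$(printf '%s' "$distro" | cut -d- -f2)"
  # Equivalent of $distro_ver.split(".")[0], e.g. 20
  major="$(printf '%s' "$distro_ver" | cut -d. -f1)"
  if [ "$os_version" = "precise" ]; then
    echo "precise"
  elif [ "$major" -eq 16 ]; then
    echo "16"
  elif [ "$major" -eq 18 ]; then
    echo "18"
  elif [ "$major" -ge 20 ]; then
    echo "20+"
  else
    echo "default"
  fi
}

pkgsel_branch focal Ubuntu-20.04-server-x86_64
```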
hardware/ivan.1652188723.txt.gz · Last modified: 2022/05/10 13:18 by djgalloway