===== Summary =====
We have a RHEV instance running on [[hardware:infrastructure#hv_0103|hv{01..04}]] as the main hypervisor nodes.  They're listed as **Hosts** in the RHEV Manager.

The RHEV Hosts are currently running version 4.3.5-1.
  
The [[http://mgr01.front.sepia.ceph.com|RHEV Manager]] is a [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/|Self-Hosted VM]] inside the cluster.  The username for logging in is ''admin'' and the password is our standard root password.
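If the Manager web UI is unreachable, the state of the self-hosted engine VM can be checked from any of the hypervisors.  A minimal sketch, assuming the standard ovirt hosted-engine tooling that ships with the RHEV Hosts:

<code>
# Run as root on any RHEV Host (hv01-hv04).
# Reports which Host is running the engine VM, its health score, and whether
# the cluster is in global maintenance.
hosted-engine --vm-status

# If the engine VM is down, it can be started manually from a Host:
#hosted-engine --vm-start
</code>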
  
===== Storage =====
**Note: this was the original configuration.  Storage is now provided by an [[services:longrunningcluster#lrc_iscsi_volume_for_the_rhev_cluster|iSCSI service]] on the long-running cluster.**
  
<del>Two new storage chassis are being used as the storage nodes.  [[hardware:infrastructure#ssdstore_01_02_frontsepiacephcom|ssdstore{01,02}.front.sepia.ceph.com]] are populated with 8x 1.5TB NVMe drives in a software RAID6 configuration.

A third host, [[hardware:senta|senta01]], is configured as the arbiter node for the Gluster volume.  2x 240GB SSD drives are in a software RAID1 and mounted at ''/gluster''.
</del>
----
  
==== Gluster ====
<del>All VMs (except the Hosted Engine, which is on the ''hosted-engine'' volume) are backed by a sharded Gluster volume, ''ssdstorage''.  A sharded volume was chosen to decrease the time needed for the volume to heal after a storage failure.  This should reduce VM downtime in the event of a storage node failure.
  
If there is a storage node failure, RHEV will use the remaining Gluster node and Gluster will automatically heal as part of the recovery process.  It's possible a VM will be paused if its VM disk image changed while one of the storage nodes was down.  Run ''gluster volume heal ssdstorage info'' to see heal status.
  
A single software RAID6 was decided upon as the most redundant and reliable storage configuration.  See the graph below comparing the old storage as well as tests of various RAID5 and RAID6 configurations.</del>
  
{{ :services:screenshot_at_2017-07-05_16-12-30.png |}}
The Hypervisors (hv{01..04}) and Storage nodes (ssdstore{01..02}) have entries in ''/etc/hosts'' in case of DNS failure.
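A quick way to verify those entries are in place on a given node (a sketch; the hostname list simply follows the naming above):

<code>
# Confirm each peer has an /etc/hosts entry so cluster traffic keeps
# resolving even if DNS is unavailable.
for h in hv01 hv02 hv03 hv04 ssdstore01 ssdstore02; do
  grep -q "$h" /etc/hosts && echo "$h: present" || echo "$h: MISSING"
done
</code>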
  
Note: it is important that the version of the glusterfs packages on the hypervisors does not exceed the version on the storage nodes (i.e. the client must be older than or equal to the server).
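A quick way to compare the two, assuming the stock glusterfs RPM package names (a sketch, not a required procedure):

<code>
# On a hypervisor (Gluster client):
rpm -q glusterfs glusterfs-fuse

# On a storage node (Gluster server):
rpm -q glusterfs-server

# The client version on the hypervisors should be <= the server version.
</code>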
----
  
This is https://bugzilla.redhat.com/show_bug.cgi?id=1361518.
  
As long as the unsynced entries are GFIDs only and they only appear under the arbiter (senta01) server, you can run the following script, which pulls the GFIDs straight out of the ''gluster volume heal'' output:
  
<code>
#!/bin/bash
set -ex

VOLNAME=ssdstorage
# Walk every GFID (UUID-formatted) reported by the heal info output.
for id in $(gluster volume heal $VOLNAME info | egrep '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}' -o); do
  file=$(find /gluster/arbiter/.glusterfs -name $id -not -path '/gluster/arbiter/.glusterfs/indices/*' -type f)
  # If both AFR changelog xattrs on the arbiter's copy are all zeroes, the
  # entry is stale rather than genuinely pending heal, so drop the xattrs.
  if [ $(getfattr -d -m . -e hex "$file" | grep trusted.afr.$VOLNAME* | grep "0x000000" | wc -l) == 2 ]; then
    echo "deleting xattr for gfid $id"
    for i in $(getfattr -d -m . -e hex "$file" | grep trusted.afr.$VOLNAME* | cut -f1 -d'='); do
      setfattr -x $i "$file"
    done
  else
I used to have a summary of steps here but it's safer to just follow the [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/|Red Hat docs]].
  
==== VM has paused due to no storage space error ====
We started seeing this issue on VMs like teuthology, and it looks like it's a known bug.  I updated ''/etc/vdsm/vdsm.conf.d/99-local.conf'' and restarted vdsmd (''systemctl restart vdsmd'') as described here:

https://access.redhat.com/solutions/130843
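If the fix is the usual thin-provisioning tuning, the drop-in ends up looking roughly like the sketch below.  The ''[irs]'' option names are standard vdsm settings, but the values shown are only illustrative; use whatever the linked solution actually recommends.

<code>
# Hypothetical example: extend thin-provisioned disks earlier and in larger
# chunks so a busy VM is less likely to outrun the extension and get paused.
cat > /etc/vdsm/vdsm.conf.d/99-local.conf <<'EOF'
[irs]
# extend a thin volume once it is this percent full
volume_utilization_percent = 25
# grow it by this many MB each time it is extended
volume_utilization_chunk_mb = 2048
EOF

systemctl restart vdsmd
</code>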
==== Growing a VM's virtual disk ====
  - Log into the [[https://mgr01.front.sepia.ceph.com|Web UI]]