====== RHEV ======
===== Summary =====
We have a RHEV instance running on [[hardware:infrastructure#hv_0103|hv{01..04}]] as the main hypervisor nodes.  They're listed as **Hosts** in the RHEV Manager.

The RHEV Hosts are currently running version 4.3.5-1.

The [[http://mgr01.front.sepia.ceph.com|RHEV Manager]] is a [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/|Self-Hosted VM]] inside the cluster.  The username for logging in is ''admin'' and the password is our standard root password.

The purpose of the cluster is to provide High Availability for our critical services such as [[services:OpenVPN]] and [[services:DNS]].

===== Backups =====
The HostedEngine VM (mgr01) has a crontab entry that backs up the RHV Manager every day.  gitbuilder.ceph.com (gitbuilder-archive) pulls that backup file during its daily backup routine as long as mgr01 is reachable via the VPN tunnel.
  
If needed, the HostedEngine VM can be restored using one of these backup files.  The backup includes a copy of the PostgreSQL database containing all metadata for the RHEV cluster.
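
For reference, a minimal sketch of what such a daily cron entry could look like, assuming it uses the stock ''engine-backup'' tool; the actual crontab and paths on mgr01 are the authority.

<code>
# /etc/cron.d/engine-backup -- illustrative only; check mgr01's real crontab for the exact command and paths
30 2 * * * root engine-backup --mode=backup --scope=all --file=/var/lib/engine-backups/engine-$(date +\%F).tar.bz2 --log=/var/log/engine-backup.log
</code>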

The Hypervisors (hv{01..04}) and Storage nodes (ssdstore{01..02}) have entries in ''/etc/hosts'' in case of DNS failure.
  
Note: it is important that the version of the glusterfs packages on the hypervisors does not exceed the version on the storage nodes (i.e. the client must be older than or equal to the server).
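
A quick way to check this before updating a hypervisor (hostnames are examples taken from the inventory above):

<code>
# Client packages on a hypervisor...
ssh hv01.front.sepia.ceph.com rpm -q glusterfs glusterfs-fuse
# ...must not be newer than the packages on the storage nodes
ssh ssdstore01.front.sepia.ceph.com rpm -q glusterfs glusterfs-server
</code>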
----
  
===== Creating New VMs =====
==== How-To ====
  - Log in
  - Go to the **Virtual Machines** tab
  - Click **New VM**
  - General Settings
    - **Cluster:** ''hv_cluster''
    - **Operating System:** ''Linux''
    - **Optimized for:** ''Server''
    - **Name:** Whatever you want
    - Descriptions are also nice
  - System
    - **Memory Size:** Up to you (it can take ''4GB'' as input and will convert)
    - **Total Virtual CPUs:** Also up to you
  - High Availability
    - **Highly Available:** Checked (if desired)
    - Set the **Priority**
  - Boot Options
    - Probably PXE then Hard Disk.  This will boot to our Cobbler menu.
    - You could also do CD-ROM then Hard Disk.  Just check **Attach CD** and select the ISO (these are on ''store01.front.sepia.ceph.com:/srv/isos/67ff9a5d-b5da-4a2f-b5ce-2286bc82e3e4/images/11111111-1111-1111-1111-111111111111'' if you want to add one)
  - **OK**
  - Now highlight your new VM
  - At the bottom, **Disks** tab
    - **New**
    - Set the **Size**
    - **Storage Domain:** ''ssdstorage''
    - **Allocation Policy:** ''Preallocated'' if IO performance is important
      - ''Preallocated'' will take longer to create the disk but IO performance in the VM will be faster
      - ''Thin Provision'' is almost immediate during VM creation but may slow down VM IO performance
    - **OK**
  - At the bottom, **Network Interfaces** tab
    - **New**
    - **Profile** should be ''front'' or ''wan'' (or both, one of them on a second NIC, if desired)
    - **OK**
  - Now power the VM up (green arrow) and open the console (little computer monitor icon)
  - You can either
    - Select an entry from the Cobbler PXE menu (if the new VM is **NOT** in the ansible inventory)
      - Make sure you press ''[Tab]'' and delete almost all of the kickstart parameters (the ''ks='' most importantly)
    - Add the host to the ansible inventory, and thus Cobbler, DNS, and DHCP, then set a kickstart in the Cobbler Web UI (see below)

=== Using a Kickstart with Cobbler ===
The Sepia Cobbler instance has some kickstart profiles that will automate RHV VM installation.  I think in order to use these, you'd have to get the MAC from the **Network Interfaces** tab in RHV, then put your new VM in the ''ceph-sepia-secrets'' [[https://github.com/ceph/ceph-sepia-secrets/blob/master/ansible/inventory/sepia|ansible inventory]].

Then run:
  - ''%%ansible-playbook cobbler.yml --tags systems%%''
  - ''%%ansible-playbook dhcp-server.yml%%''
  - ''%%ansible-playbook nameserver.yml --tags records%%''

(See https://wiki.sepia.ceph.com/doku.php?id=tasks:adding_new_machines for more info.)

In Cobbler, you can browse to the system and set the ''Profile'' and ''Kickstart'':
  * ''dgalloway-ubuntu-vm'' - Installs a basic Ubuntu system using the entire disk and the ''ext4'' filesystem.  I couldn't get ''xfs'' working.
  * ''dgalloway-rhel-vm'' - I don't remember if this one works but you can try.

=== A note about installing RHEL/CentOS ===
You need to specify the URL of the installation repo as a kernel parameter.  In the Cobbler PXE menu, when you hit ''[Tab]'', add ''%%ksdevice=link inst.repo=http://172.21.0.11/cobbler/ks_mirror/CentOS-X.X-x86_64%%'', replacing X.X with the appropriate version.

Otherwise you'll end up with an error like ''dracut initqueue timeout'' and the installer dies.
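
As a rough illustration only (the existing kernel/initrd arguments are abbreviated as ''...'' and the exact ''ks_mirror'' directory name depends on what is mirrored on the Cobbler server), the edited boot line ends up looking something like:

<code>
# Hypothetical edited PXE append line -- ks= removed, ksdevice= and inst.repo= added
append ... ksdevice=link inst.repo=http://172.21.0.11/cobbler/ks_mirror/CentOS-7.9-x86_64
</code>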

==== ovirt-guest-agent ====
After installing a new VM, be sure to install the VM guest agent.  This, at the very least, allows a VM's FQDN and IP address(es) to show up in the RHEV Web UI.
<code>
ansible-playbook ovirt-guest-agent.yml --limit="$NEW_VM"
</code>
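
If the VM isn't in the ansible inventory yet, the agent can also be installed by hand; a minimal manual sketch (assumes the ''ovirt-guest-agent'' package is available from the distro/oVirt/EPEL repos):

<code>
# RHEL/CentOS guests
yum install -y ovirt-guest-agent
systemctl enable --now ovirt-guest-agent
# Ubuntu guests
apt-get install -y ovirt-guest-agent
</code>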

==== Migrating from Openstack to oVirt ====
Taken from https://docs.fuga.cloud/how-to-migrate-a-volume-from-one-openstack-provider-to-another

I think these steps are only applicable if the instance has a standalone volume for the root disk (which I try to do for most Openstack instances).

<code>
# Find the UUID of the instance's root volume
openstack server list
# Create a snapshot of the volume
openstack volume snapshot create --volume $UUID_OF_ROOT_VOLUME --force telemetry-snapshot
# Create a separate *volume* from the snapshot
openstack volume create --snapshot telemetry-snapshot --size 50 telemetry-volume
# Create an image from the volume (this will take a long time)
openstack image create --volume telemetry-volume telemetry-image
# Download the image (this will take a long time)
openstack image save --file snapshot.raw telemetry-image
</code>
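
Before importing, it can be worth sanity-checking the downloaded file and, if desired, converting it to qcow2 (''qemu-img'' comes from the qemu-img/qemu-utils package; the filenames match the example above):

<code>
# Verify the download looks like a sane raw disk image
qemu-img info snapshot.raw
# Optionally convert to a sparse qcow2 before importing into oVirt
qemu-img convert -f raw -O qcow2 snapshot.raw snapshot.qcow2
</code>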

Now proceed with the steps below.
  
==== Migrating libvirt/KVM disk to oVirt ====
  
----

==== Fixes to VMs imported from Openstack ====
  - In RHV, boot into a system rescue ISO (set the boot order for the VM to CD then HDD)
  - Mount the root disk and modify ''/etc/fstab'' if needed
  - Edit ''/boot/grub/grub.cfg'', removing any ''console='' parameters from the default boot entry
    - (Make sure you make these changes persistent for subsequent reboots.  See https://askubuntu.com/a/921830/906620, for example)
  - Edit the network config
  - It's also probably beneficial to remove cloud-init: e.g., ''apt-get purge cloud-init''
    - Even though cloud-init is purged, its grub.d settings still get read.
    - It might work to just delete ''/etc/default/grub.d/50-cloudimg-settings.cfg'' but otherwise,
      - Modify it and get rid of any ''console='' parameters
      - Run ''update-grub'' (see the sketch below)
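
A sketch of that cleanup, run from a chroot into the imported VM's root filesystem (Ubuntu paths; the sed pattern is a best-effort guess, so review the file afterwards):

<code>
# Remove cloud-init entirely
apt-get purge -y cloud-init
# Strip any console= parameters from the leftover cloud image grub settings
sed -i -E 's/console=[^" ]+ ?//g' /etc/default/grub.d/50-cloudimg-settings.cfg
# Regenerate grub.cfg so the change persists across reboots
update-grub
</code>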
  
===== Troubleshooting =====
''/var/log/vdsm/vdsm.log'' is a useful place to check for errors on the hypervisors.  To check for errors relating to the HostedEngine VM, see ''/var/log/ovirt-hosted-engine-ha/agent.log''.

The ''vdsmd'', ''ovirt-ha-agent'', and ''ovirt-ha-broker'' services can be restarted on hypervisors without affecting running VMs.
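
For example, on an affected hypervisor:

<code>
systemctl restart vdsmd ovirt-ha-broker ovirt-ha-agent
</code>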
  
==== Emergency RHEV Web UI Access w/o VPN ====
In the event the OpenVPN gateway VM is inaccessible/locked up/whatever, you can open an SSH tunnel (''ssh -D 9999 $YOURUSER@8.43.84.133'') and set your browser's proxy settings to SOCKS5 localhost:9999 to get at the RHEV web UI.  That public IP is on store01 and is a leftover artifact from when store01 ran OpenVPN.

==== GFIDs listed in ''gluster volume heal ssdstorage info'' forever ====
This is https://bugzilla.redhat.com/show_bug.cgi?id=1361518.

As long as the unsynced entries are GFIDs only and they only appear under the arbiter (senta01) server, you can paste **just** the GFIDs into a ''/tmp/gfids'' file for review and then run the following script on the arbiter (it re-extracts the GFID list from the ''gluster volume heal'' output itself):

<code>
#!/bin/bash
set -ex

VOLNAME=ssdstorage
# Pull the GFIDs (full 8-4-4-4-12 UUIDs) out of the heal info output
for id in $(gluster volume heal $VOLNAME info | egrep '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}' -o); do
  # Locate the corresponding file on the arbiter brick
  file=$(find /gluster/arbiter/.glusterfs -name $id -not -path '/gluster/arbiter/.glusterfs/indices/*' -type f)
  # Only clear the changelog xattrs if both afr counters are already zero
  if [ $(getfattr -d -m . -e hex $file | grep trusted.afr.$VOLNAME* | grep "0x000000" | wc -l) == 2 ]; then
    echo "deleting xattr for gfid $id"
    for i in $(getfattr -d -m . -e hex $file | grep trusted.afr.$VOLNAME* | cut -f1 -d'='); do
      setfattr -x $i $file
    done
  else
    echo "not deleting xattr for gfid $id"
  fi
done
</code>
  
----

==== Updating Hypervisors ====
hv{01..04} are running production RHEL7 and registered with RHSM.

I used to have a summary of steps here but it's safer to just follow the [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/|Red Hat docs]].
  
==== VM has paused due to no storage space error ====
We started seeing this issue on VMs like teuthology.  It looks like a known bug; we updated ''/etc/vdsm/vdsm.conf.d/99-local.conf'' and ran ''systemctl restart vdsmd'' as described here:
  
https://access.redhat.com/solutions/130843
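
For reference, the drop-in ends up looking something like the sketch below.  The option names are vdsm's thin-provisioning watermark settings; the values here are illustrative and the exact ones should be taken from the linked solution.

<code>
# /etc/vdsm/vdsm.conf.d/99-local.conf (illustrative values)
[irs]
# Extend thin-provisioned disks earlier and in larger chunks so guests
# don't get paused waiting for an LV extension
volume_utilization_percent = 25
volume_utilization_chunk_mb = 2048
</code>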
==== Growing a VM's virtual disk ====
  - Log into the [[https://mgr01.front.sepia.ceph.com|Web UI]]
  
If anything goes wrong, you can shut the VM down, select the snapshot you made, click **Preview** and boot back to it to revert your changes.  Click **Commit** if you want to permanently revert to the snapshot.
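
After the virtual disk has been extended, the guest usually still needs to grow its partition and filesystem.  A minimal sketch for a typical Linux guest (the device name and filesystem type are assumptions; adjust for the VM in question):

<code>
growpart /dev/vda 1      # grow partition 1 (package: cloud-utils-growpart / cloud-guest-utils)
resize2fs /dev/vda1      # for ext4
# or, for an xfs root filesystem:
xfs_growfs /
</code>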

==== Onlining Hot-Plugged CPU/RAM ====
https://askubuntu.com/a/764621

<code>
#!/bin/bash
# Based on script by William Lam - http://engineering.ucsb.edu/~duonglt/vmware/

# Bring CPUs online
for CPU_DIR in /sys/devices/system/cpu/cpu[0-9]*
do
    CPU=${CPU_DIR##*/}
    echo "Found cpu: '${CPU_DIR}' ..."
    CPU_STATE_FILE="${CPU_DIR}/online"
    if [ -f "${CPU_STATE_FILE}" ]; then
        if grep -qx 1 "${CPU_STATE_FILE}"; then
            echo -e "\t${CPU} already online"
        else
            echo -e "\t${CPU} is new cpu, onlining cpu ..."
            echo 1 > "${CPU_STATE_FILE}"
        fi
    else
        echo -e "\t${CPU} already configured prior to hot-add"
    fi
done

# Bring all new Memory online
for RAM in $(grep line /sys/devices/system/memory/*/state)
do
    echo "Found ram: ${RAM} ..."
    if [[ "${RAM}" == *":offline" ]]; then
        echo "Bringing online"
        echo $RAM | sed "s/:offline$//" | sed "s/^/echo online > /" | source /dev/stdin
    else
        echo "Already online"
    fi
done
</code>