RHEV

Summary

We have a RHEV instance running with hv{01..04} as the main hypervisor nodes. They're listed as Hosts in the RHEV Manager.

The RHEV Manager is a Self-Hosted VM inside the cluster. The username for logging in is admin and the password is our standard root password.

The purpose of the cluster is to provide High Availability for our critical services such as OpenVPN and DNS.


Storage

Two new storage chassis, ssdstore{01,02}.front.sepia.ceph.com, are used as the storage nodes. They are populated with 8x 1.5TB NVMe drives in a software RAID6 configuration.

A third host, senta01, is configured as the arbiter node for the Gluster volume. 2x 240GB SSD drives are in a software RAID1 and mounted at /gluster.


Gluster

All VMs (except the Hosted Engine which is on the hosted-engine volume) are backed by a sharded Gluster volume, ssdstorage. A sharded volume was chosen to decrease the time needed for the volume to heal after a storage failure. This should reduce VM downtime in the event of a storage node failure.

If there is a storage node failure, RHEV will use the remaining Gluster node and Gluster will automatically heal as part of the recovery process. It's possible a VM will be paused if its VM disk image changed while one of the storage nodes was down. Run gluster volume heal ssdstorage info to see heal status.
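
To get a single number out of the heal status, the output can be summed per brick. This is a minimal sketch, assuming the heal info output contains one "Number of entries: N" line per brick; the function name pending_heal_entries is made up for this page.

```shell
# Sum the "Number of entries:" lines from `gluster volume heal <vol> info`
# output read on stdin. A total of 0 means no files are waiting to heal.
# Assumption: each brick reports a line of the form "Number of entries: N".
pending_heal_entries() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage on a storage node (volume name taken from this page):
#   gluster volume heal ssdstorage info | pending_heal_entries
```

If the total stays above 0 for a long time after the failed node is back, that is a hint to look at the individual entries in the full heal info output.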

A single software RAID6 was decided upon as the most redundant and reliable storage configuration. See the graph below comparing the old storage as well as tests of various RAID5 and RAID6 configurations.


Backups

The HostedEngine VM (mgr01) has a crontab entry that backs up the RHEV Manager every day. gitbuilder.ceph.com (gitbuilder-archive) pulls that backup file during its daily backup routine as long as mgr01 is reachable via the VPN tunnel.

If needed, the HostedEngine VM can be restored using one of these backup files. The backup includes a copy of the PostgreSQL database containing all metadata for the RHEV cluster.

Here's the cronjob that gets run on the RHEV-Manager VM:

@daily rm -f /root/backups/backup* && engine-backup --mode=backup --scope=all --file=/root/backups/backup.tar.gz --log=/root/backups/backup.log
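
Because the cronjob deletes the previous backup before writing a new one, a failed engine-backup run leaves no file behind. A quick check like the following can confirm that a recent, non-empty backup exists. It is only a sketch; backup_is_fresh is a hypothetical helper, not something installed on mgr01.

```shell
# Succeed only if the backup file exists, is non-empty, and was modified
# within the last 24 hours.
backup_is_fresh() {
    [ -s "$1" ] && [ -n "$(find "$1" -mtime -1 2>/dev/null)" ]
}

# Usage on the RHEV-Manager VM (path taken from the cronjob above):
#   backup_is_fresh /root/backups/backup.tar.gz || echo "backup is missing or stale"
```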

RHEV-Manager VM

If the HostedEngine VM dies, high availability of the RHEV VMs is lost. Running VMs will stay up, however. In other words, if the HostedEngine VM dies and a hypervisor host fails as well, the VMs that were running on the downed hypervisor will not automatically migrate to another hypervisor.

The backend storage for the VM was moved from an NFS export on store01 to a separate gluster volume using ssdstore{01,02}.

Run hosted-engine --vm-status on any hypervisor to check if the VM is running.
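
Since any hypervisor can answer, the check can be tried against each one in turn until one responds. The sketch below assumes passwordless SSH to the hypervisors and that the short names hv01 through hv04 resolve from wherever it is run. The function names and the RUN override are invented for illustration.

```shell
# Print the short hostnames hv01..hv04.
hv_hosts() {
    for hv_i in 1 2 3 4; do printf 'hv%02d\n' "$hv_i"; done
}

# Ask each hypervisor in turn for the HostedEngine status and stop at the
# first one where the command exits successfully.
# RUN defaults to ssh; it is overridable so the loop can be dry-run with a stub.
engine_status() {
    es_run="${RUN:-ssh}"
    for es_host in $(hv_hosts); do
        if "$es_run" "$es_host" hosted-engine --vm-status; then
            return 0
        fi
    done
    return 1
}

# Usage:
#   engine_status || echo "no hypervisor answered"
```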


Other Notes

The Hypervisors (hv{01..04}) and Storage nodes (ssdstore{01..02}) have entries in /etc/hosts in case of DNS failure.
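
After rebuilding or adding a node, it is worth confirming those entries are still present. A rough sketch, assuming the entries use either the short name or an FQDN whose first label is the short name; missing_host_entries is a made-up helper.

```shell
# Print every expected short hostname that has no entry in the given hosts
# file (default /etc/hosts). Lines starting with '#' are ignored.
missing_host_entries() {
    mhe_file="${1:-/etc/hosts}"
    for mhe_name in hv01 hv02 hv03 hv04 ssdstore01 ssdstore02; do
        awk -v n="$mhe_name" '
            /^[ \t]*#/ { next }
            {
                # Field 1 is the address; the rest are names and aliases.
                for (i = 2; i <= NF; i++) {
                    split($i, p, ".")
                    if ($i == n || p[1] == n) found = 1
                }
            }
            END { exit found ? 0 : 1 }
        ' "$mhe_file" || echo "$mhe_name"
    done
}

# Usage on a hypervisor or storage node:
#   missing_host_entries            # prints nothing when all entries exist
```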


Creating New VMs

ovirt-guest-agent

After installing a new VM, be sure to install the VM guest agent. This, at the very least, allows a VM's FQDN and IP address(es) to show up in the RHEV Web UI.

git clone https://github.com/djgalloway/sepia.git
cd ansible-playbooks
ansible-playbook ovirt-guest-agent.yml --limit="$NEW_VM"

Migrating libvirt/KVM disk to oVirt

Ubuntu

# Get the latest import-to-ovirt.pl from http://git.annexia.org/?p=import-to-ovirt.git;a=summary

# From the baremetal host running the VM to migrate,
apt-get install libxml-writer-perl libguestfs-perl nfs-common libguestfs-tools
update-guestfs-appliance
mkdir /export
mount -o v3 store01.front.sepia.ceph.com:/srv/rhev_export /export
import-to-ovirt.pl /path/to/disk.img /export

# Follow instructions in import-to-ovirt.pl output
# Once imported, adjust vCPUs and Memory as needed
# Also, add a NIC.  You can re-use the same MAC address

Troubleshooting

/var/log/vdsm/vdsm.log is a useful place to check for errors on the hypervisors. To check for errors relating to the HostedEngine VM, see /var/log/ovirt-hosted-engine-ha/agent.log.

The vdsmd, ovirt-ha-agent, and ovirt-ha-broker services can be restarted on hypervisors without affecting running VMs.
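
Both logs are noisy, so it helps to pull out only the most recent error lines. A small sketch, assuming the log level appears as the whole word ERROR on each line; recent_errors is a hypothetical helper.

```shell
# Print the last N (default 20) lines logged at ERROR level in a log file.
# Assumption: the level appears as the standalone word ERROR.
recent_errors() {
    grep -w 'ERROR' "$1" | tail -n "${2:-20}"
}

# Usage on a hypervisor (paths taken from this page):
#   recent_errors /var/log/vdsm/vdsm.log
#   recent_errors /var/log/ovirt-hosted-engine-ha/agent.log 50
```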

Emergency RHEV Web UI Access w/o VPN

In the event the OpenVPN gateway VM is inaccessible/locked up/whatever, you can open an SSH tunnel (ssh -D 9999 $YOURUSER@8.43.84.133) and set your browser's proxy settings to SOCKS5 localhost:9999 to get at the RHEV web UI. That public IP is on store01 and is a leftover artifact from when store01 ran OpenVPN.


Maintenance

Quirks Encountered

FIXED: See https://bugzilla.redhat.com/show_bug.cgi?id=1363926#c2

If the HostedEngine VM goes offline, it may not come back up. After a long night of troubleshooting, it was discovered that the XML for the VM stored on the NFS “rhevstor” export is invalid.

To work around this, or in case the RHEV-Manager VM fails to start for whatever reason, try this first:

hosted-engine --vm-start --vm-conf=/etc/ovirt-hosted-engine/vm.conf
OR
hosted-engine --vm-start --vm-conf=/var/run/ovirt-hosted-engine-ha/vm.conf
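
The two attempts can be wrapped in a loop that moves on to the next vm.conf when a start attempt fails. This is a sketch, not a tested recovery procedure. The function name and the HE_CMD override are assumptions made for illustration, and it assumes hosted-engine exits non-zero when the start fails.

```shell
# Try each candidate vm.conf in order until one start attempt succeeds.
# Prints the conf file that worked. Unreadable files are skipped.
# HE_CMD defaults to hosted-engine; it is overridable for a dry run.
start_hosted_engine() {
    she_cmd="${HE_CMD:-hosted-engine}"
    for she_conf in "$@"; do
        [ -r "$she_conf" ] || continue
        if "$she_cmd" --vm-start --vm-conf="$she_conf"; then
            echo "$she_conf"
            return 0
        fi
    done
    return 1
}

# Usage on a hypervisor (paths taken from the commands above):
#   start_hosted_engine /etc/ovirt-hosted-engine/vm.conf \
#                       /var/run/ovirt-hosted-engine-ha/vm.conf
```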

Updating Hypervisors

hv{01..04} are running production RHEL7 and registered with RHSM.

I used to have a summary of steps here but it's safer to just follow the Red Hat docs.

Growing a VM's virtual disk

  1. Log into the Web UI
  2. Go to Virtual Machines tab
  3. Shut down the VM
  4. Highlight the VM then open the Snapshots tab at the bottom
  5. Create a snapshot in case anything goes wrong with growing the disk
  6. Under the Disks tab, highlight the disk and click Edit
  7. Update Extend size by(GB) and save
  8. Now you'll need to grow the partition
    1. Right-click the VM and click Edit
    2. Change the boot order under Boot Options, if necessary, to PXE then HDD
    3. (Or just boot to the gparted live CD and it'll handle growing partition and ISO for you)
  9. Bring the VM back up and open a console
  10. Choose inktank-rescue from the Cobbler PXE menu (Assumes not managed by Cobbler)
  11. Log in as root
    1. If the filesystem is xfs instead of ext,
      1. mount /dev/vda3 /mnt/vda3
      2. xfs_growfs /mnt/vda3
  12. Reboot to the OS

If anything goes wrong, you can shut the VM down, select the snapshot you made, click Preview and boot back to it to revert your changes. Click Commit if you want to permanently revert to the snapshot.
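
For step 11, the right grow command depends on the filesystem type. The sketch below only prints the command so it can be reviewed before running. The xfs branch follows the steps above, using the mount point that xfs_growfs is documented to take. The ext branch using resize2fs is an assumption, since this page only spells out the xfs case. grow_fs_cmd is a made-up name.

```shell
# Print the command that grows a filesystem to fill its already-enlarged
# partition. Arguments: filesystem type, device, mount point.
grow_fs_cmd() {
    case "$1" in
        xfs)            echo "xfs_growfs $3" ;;   # xfs is grown via its mount point
        ext2|ext3|ext4) echo "resize2fs $2" ;;    # assumption: ext is grown via the device
        *)              echo "unsupported filesystem: $1" >&2; return 1 ;;
    esac
}

# Usage from the rescue environment (device and mount point from the steps above):
#   grow_fs_cmd "$(blkid -o value -s TYPE /dev/vda3)" /dev/vda3 /mnt/vda3
```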

Onlining Hot-Plugged CPU/RAM

services/rhev.txt · Last modified: 2018/03/21 13:56 by djgalloway