===== Summary =====
We have a RHEV instance running on [[hardware:infrastructure#hv_0103|hv{01..04}]] as the main hypervisor nodes. They're listed as **Hosts** in the RHEV Manager.

Currently, the RHEV Hosts are running version 4.3.5-1.

The [[http://mgr01.front.sepia.ceph.com|RHEV Manager]] is a [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/self-hosted_engine_guide/|Self-Hosted VM]] inside the cluster. The username for logging in is ''admin'' and the password is our standard root password.
The Hypervisors (hv{01..04}) and Storage nodes (ssdstore{01..02}) have entries in ''/etc/hosts'' in case of DNS failure.

Note: it is important that the version of the glusterfs packages on the hypervisors does not exceed the version on the storage nodes (i.e., the client must be older than or equal to the server).
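A quick way to verify this is to compare the installed glusterfs package versions on a hypervisor and a storage node. A minimal sketch (exact package set on each node may differ):

<code>
# On a hypervisor (gluster client)
ssh hv01 rpm -q glusterfs glusterfs-fuse

# On a storage node (gluster server)
ssh ssdstore01 rpm -q glusterfs glusterfs-server
</code>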
----
===== Creating New VMs =====
==== How-To ====
  - Log in
  - Go to the **Virtual Machines** tab
  - Click **New VM**
  - General Settings
    - **Cluster:** ''hv_cluster''
    - **Operating System:** ''Linux''
    - **Optimized for:** ''Server''
    - **Name:** Whatever you want
      - Descriptions are also nice
  - System
    - **Memory Size:** Up to you (it can take ''4GB'' as input and will convert)
    - **Total Virtual CPUs:** Also up to you
  - High Availability
    - **Highly Available:** Checked (if desired)
    - Set the **Priority**
  - Boot Options
    - Probably PXE then Hard Disk. This will boot to our Cobbler menu.
    - You could also do CD-ROM then Hard Disk. Just check **Attach CD** and select the ISO (these are on ''store01.front.sepia.ceph.com:/srv/isos/67ff9a5d-b5da-4a2f-b5ce-2286bc82e3e4/images/11111111-1111-1111-1111-111111111111'' if you want to add one; see the copy example after this list)
  - **OK**
  - Now highlight your new VM
  - At the bottom, go to the **Disks** tab
    - **New**
      - Set the **Size**
      - **Storage Domain:** ''ssdstorage''
      - **Allocation Policy:** ''Preallocated'' if IO performance is important
        - ''Preallocated'' takes longer to create the disk, but IO performance in the VM will be faster
        - ''Thin Provision'' is almost immediate during VM creation but may slow down VM IO performance
      - **OK**
  - At the bottom, go to the **Network Interfaces** tab
    - **New**
      - The **Profile** should be ''front'' or ''wan'' (or both, with one of them on a second NIC, if desired)
      - **OK**
  - Now power the VM up (green arrow) and open the console (little computer monitor icon)
  - You can either:
    - Select an entry from the Cobbler PXE menu (if the new VM is **NOT** in the ansible inventory)
      - Make sure you press ''[Tab]'' and delete almost all of the kickstart parameters (the ''ks='' parameter most importantly)
    - Add the host to the ansible inventory (and thus Cobbler, DNS, and DHCP), then set a kickstart in the Cobbler Web UI (see below)
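If you need to make a new ISO available in that **Attach CD** list, one way is to copy it into the ISO domain directory on store01. This is a sketch only, assuming you have SSH access to store01 and that the ISO domain path from the step above is still current; the ISO filename is just an example:

<code>
# Copy the ISO into the ISO storage domain on store01 (path taken from the step above)
scp ubuntu-22.04-live-server-amd64.iso \
    store01.front.sepia.ceph.com:/srv/isos/67ff9a5d-b5da-4a2f-b5ce-2286bc82e3e4/images/11111111-1111-1111-1111-111111111111/

# oVirt expects ISO domain files to be owned by vdsm:kvm (uid/gid 36)
ssh store01.front.sepia.ceph.com \
    'chown 36:36 /srv/isos/67ff9a5d-b5da-4a2f-b5ce-2286bc82e3e4/images/11111111-1111-1111-1111-111111111111/*.iso'
</code>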
+ | |||
+ | === Using a Kickstart with Cobbler === | ||
+ | The Sepia Cobbler instance has some kickstart profiles that will automate RHV VM installation. I think in order to use these, you'd have to get the MAC from the **Network Interfaces** tab in RHV, then put your new VM in the ''ceph-sepia-secrets'' [[https://github.com/ceph/ceph-sepia-secrets/blob/master/ansible/inventory/sepia|ansible inventory]]. | ||
+ | |||
+ | Then run: | ||
+ | - ''%%ansible-playbook cobbler.yml --tags systems%%'' | ||
+ | - ''%%ansible-playbook dhcp-server.yml%%'' | ||
+ | - ''%%ansible-playbook nameserver.yml --tags records%%'' | ||
+ | |||
+ | (See https://wiki.sepia.ceph.com/doku.php?id=tasks:adding_new_machines for more info) | ||
+ | |||
+ | In cobbler, you can browse to the system and set the ''Profile'' and ''Kickstart'': | ||
+ | * ''dgalloway-ubuntu-vm'' - Installs a basic Ubuntu installation using the entire disk and ''ext4'' filesystem. I couldn't get ''xfs'' working. | ||
+ | * ''dgalloway-rhel-vm'' - I don't remember if this one works but you can try. | ||
+ | |||
+ | === A note about installing RHEL/CentOS === | ||
+ | You need to specify the URL for the installation repo as a kernel parameter. So in the Cobbler PXE menu, when you hit ''[Tab]'', add ''%%ksdevice=link inst.repo=http://172.21.0.11/cobbler/ks_mirror/CentOS-X.X-x86_64%%'' replacing X.X with the appropriate version. | ||
+ | |||
Otherwise you'll end up with an error like ''dracut initqueue timeout'' and the installer dies.
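For example, the full kernel command line at the PXE prompt might end up looking something like this (an illustration only; the exact vmlinuz/initrd names and CentOS version will differ):

<code>
vmlinuz initrd=initrd.img ksdevice=link inst.repo=http://172.21.0.11/cobbler/ks_mirror/CentOS-7.9-x86_64
</code>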
+ | |||
==== ovirt-guest-agent ==== | ==== ovirt-guest-agent ==== | ||
After installing a new VM, be sure to install VM guest agent. This, at the very least, allows a VM's FQDN and IP address(es) to show up in the RHEV Web UI. | After installing a new VM, be sure to install VM guest agent. This, at the very least, allows a VM's FQDN and IP address(es) to show up in the RHEV Web UI. | ||
ansible-playbook ovirt-guest-agent.yml --limit="$NEW_VM"
</code>
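If you'd rather not run the playbook, the agent can also be installed by hand inside the guest. A sketch, assuming an EL- or Debian-based guest with the stock ''ovirt-guest-agent'' package available:

<code>
# RHEL/CentOS guests
yum install -y ovirt-guest-agent
systemctl enable --now ovirt-guest-agent

# Ubuntu/Debian guests
apt-get install -y ovirt-guest-agent
systemctl enable --now ovirt-guest-agent
</code>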
+ | |||
+ | ==== Migrating from Openstack to oVirt ==== | ||
+ | Taken from https://docs.fuga.cloud/how-to-migrate-a-volume-from-one-openstack-provider-to-another | ||
+ | |||
+ | I think these steps are only applicable if the instance has a standalone volume for the root disk (which I try to do for most Openstack instances) | ||
+ | |||
+ | <code> | ||
+ | # Find UUID of the instance's root drive | ||
+ | openstack server list | ||
+ | # Create a snapshot of the volume | ||
+ | openstack volume snapshot create --volume $UUID_OF_ROOT_VOLUME --force telemetry-snapshot | ||
+ | # Create a separate *volume* from the snapshot | ||
+ | openstack volume create --snapshot telemetry-snapshot --size 50 telemetry-volume | ||
+ | # Create an image from the volume (this will take a long time) | ||
+ | openstack image create --volume telemetry-volume telemetry-image | ||
+ | # Download the image (this will take a long time) | ||
+ | openstack image save --file snapshot.raw telemetry-image | ||
+ | </code> | ||
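Before importing it anywhere, it's worth sanity-checking the downloaded file. A quick check, assuming ''qemu-img'' is installed locally (the ''snapshot.raw'' name matches the command above):

<code>
# Confirm the image format and virtual size before importing it into oVirt
qemu-img info snapshot.raw
</code>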
+ | |||
+ | Now proceed with the steps below. | ||
==== Migrating libvirt/KVM disk to oVirt ====
----
==== Fixes to VMs imported from Openstack ====
  - In RHV, boot into a system rescue ISO (set the boot order for the VM to CD then HDD)
  - Mount the root disk and modify ''/etc/fstab'' if needed
  - Edit ''/boot/grub/grub.cfg'', removing any ''console='' options from the default boot entry
    - (Make sure you make these changes persistent for subsequent reboots. See https://askubuntu.com/a/921830/906620, for example)
  - Edit the network config
  - It's also probably beneficial to remove cloud-init: e.g., ''apt-get purge cloud-init'' (see the sketch after this list)
    - Even though cloud-init is purged, its grub.d settings still get read.
    - It might work to just delete ''/etc/default/grub.d/50-cloudimg-settings.cfg'', but otherwise:
      - Modify it and get rid of any ''console='' parameters
      - Run ''update-grub''
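A rough sequence for the cloud-init/grub cleanup above (a sketch, assuming an Ubuntu guest and that you're already chrooted into the imported root disk):

<code>
# Remove cloud-init entirely
apt-get purge -y cloud-init
rm -rf /etc/cloud /var/lib/cloud

# Drop the cloud-image grub overrides that keep re-adding console= parameters
rm -f /etc/default/grub.d/50-cloudimg-settings.cfg

# Regenerate grub.cfg so the change is persistent across reboots
update-grub
</code>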
===== Troubleshooting =====
==== Emergency RHEV Web UI Access w/o VPN ====
In the event the OpenVPN gateway VM is inaccessible/locked up/whatever, you can open an SSH tunnel (''ssh -D 9999 $YOURUSER@8.43.84.133'') and set your browser's proxy settings to SOCKS5 localhost:9999 to get at the RHEV web UI. That public IP is on store01 and is a leftover artifact from when store01 ran OpenVPN.
+ | |||
+ | ==== GFIDs listed in ''gluster volume heal ssdstorage info'' forever ==== | ||
+ | This is https://bugzilla.redhat.com/show_bug.cgi?id=1361518. | ||
+ | |||
+ | As long as the unsynced entries are GFIDs only and they only appear under the arbiter (senta01) server, you can paste **just** the GFIDs into a ''/tmp/gfids'' file and run the following script: | ||
+ | |||
<code>
#!/bin/bash
set -ex

VOLNAME=ssdstorage
# For each GFID still listed in heal info (GFIDs are UUIDs: 8-4-4-4-12 hex digits)...
for id in $(gluster volume heal $VOLNAME info | egrep '[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}' -o); do
  # ...find the corresponding file on the arbiter brick
  file=$(find /gluster/arbiter/.glusterfs -name $id -not -path '/gluster/arbiter/.glusterfs/indices/*' -type f)
  # Only clear the afr xattrs if both of them are all-zero (i.e. nothing is actually pending)
  if [ $(getfattr -d -m . -e hex $(echo $file) | grep trusted.afr.$VOLNAME* | grep "0x000000" | wc -l) == 2 ]; then
    echo "deleting xattr for gfid $id"
    for i in $(getfattr -d -m . -e hex $(echo $file) | grep trusted.afr.$VOLNAME* | cut -f1 -d'='); do
      setfattr -x $i $(echo $file)
    done
  else
    echo "not deleting xattr for gfid $id"
  fi
done
</code>
----
I used to have a summary of steps here but it's safer to just follow the [[https://access.redhat.com/documentation/en-us/red_hat_virtualization/|Red Hat docs]].
==== VM has paused due to no storage space error ====
We started seeing this issue on VMs like teuthology, and it looks like it's a known bug. I updated ''/etc/vdsm/vdsm.conf.d/99-local.conf'' and restarted vdsmd (''systemctl restart vdsmd'') as described here:

https://access.redhat.com/solutions/130843
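For reference, the change amounts to dropping a small override into ''/etc/vdsm/vdsm.conf.d/'' on the hypervisors and restarting vdsmd. The keys and values below are only an illustration; use the settings recommended in the Red Hat solution above:

<code>
cat > /etc/vdsm/vdsm.conf.d/99-local.conf <<'EOF'
[irs]
# Illustrative values only -- see the linked Red Hat solution for the recommended settings.
# Extend thin-provisioned disks earlier and by a larger chunk.
volume_utilization_percent = 25
volume_utilization_chunk_mb = 2048
EOF

systemctl restart vdsmd
</code>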
==== Growing a VM's virtual disk ====
- Log into the [[https://mgr01.front.sepia.ceph.com|Web UI]]
If anything goes wrong, you can shut the VM down, select the snapshot you made, click **Preview** and boot back to it to revert your changes. Click **Commit** if you want to permanently revert to the snapshot.
+ | |||
+ | ==== Onlining Hot-Plugged CPU/RAM ==== | ||
+ | https://askubuntu.com/a/764621 | ||
+ | |||
<code>
#!/bin/bash
# Based on script by William Lam - http://engineering.ucsb.edu/~duonglt/vmware/

# Bring CPUs online
for CPU_DIR in /sys/devices/system/cpu/cpu[0-9]*
do
    CPU=${CPU_DIR##*/}
    echo "Found cpu: '${CPU_DIR}' ..."
    CPU_STATE_FILE="${CPU_DIR}/online"
    if [ -f "${CPU_STATE_FILE}" ]; then
        if grep -qx 1 "${CPU_STATE_FILE}"; then
            echo -e "\t${CPU} already online"
        else
            echo -e "\t${CPU} is new cpu, onlining cpu ..."
            echo 1 > "${CPU_STATE_FILE}"
        fi
    else
        echo -e "\t${CPU} already configured prior to hot-add"
    fi
done

# Bring all new Memory online
for RAM in $(grep line /sys/devices/system/memory/*/state)
do
    echo "Found ram: ${RAM} ..."
    if [[ "${RAM}" == *":offline" ]]; then
        echo "Bringing online"
        echo $RAM | sed "s/:offline$//" | sed "s/^/echo online > /" | source /dev/stdin
    else
        echo "Already online"
    fi
done
</code>