services:fog

Differences

This shows you the differences between two versions of the page.

services:fog [2017/08/23 19:50]
djgalloway
services:fog [2023/07/14 22:41] (current)
dmick Added description of reimage flow for both fog and cobbler
Line 1: Line 1:
-====== FOG - WIP ======+====== FOG ======
 ===== Summary ===== ===== Summary =====
-We're currently testing ​[[https://​fogproject.org/​|FOG]].  ​These are just some quick how-tos and notes.+[[https://​fogproject.org/​|FOG]] ​is in use in the Sepia lab.  ​It enables us to reimage baremetal testnodes before every job.
  
-The deployment of a FOG server could be mostly automated via ansible.  I wrote a playbook to take the ansible inventory and make it into a FOG-consumable CSV, but it still has to be manually uploaded via the web UI.+Our instance is currently hosted at http://fog.front.sepia.ceph.com/fog.
  
-Zack is currently working on a teuthology branch that'​ll allow us to run a proof-of-concept teuthology run where the testnodes got reimaged after every job. +**Username:** fog\\  
- +**Password:** Standard root password
-Our instance is currently hosted at http://​172.21.13.245/​fog. ​ Login is fog:​password.+
  
 ===== How-Tos ===== ===== How-Tos =====
-==== Capturing ​an OS image ==== +==== Adding a new distro ==== 
-Each testnode machine type will need an image for each distro version, e.g., mira_ubuntu_16.04, smithi_ubuntu_14.04, etc.+
 +The [[https://​github.com/​ceph/​ceph-build/​pull/​1706|Jenkins job]] does this automatically now. 
 +==== Capturing OS images ​==== 
 +This can be done manually by basically deciphering the bash monster in https://​github.com/​ceph/​ceph-build/​tree/​master/​sepia-fog-images 
 + 
 +  - Navigate to https://​jenkins.ceph.com/​job/​sepia-fog-images/​build?​delay=0sec 
 +  - Choose which machine ​types and distros you want to capture images ​for 
 +  - Click **Build** 
 +  - ... 
 +  - Profit 
 + 
 +If capturing any image fails, the job is configured to cancel the OS capture and leave the testnodes locked so you can debug/investigate.
 + 
 +==== Image capture control flow ====
  
-To capture an image: +The Jenkins job sepia-fog-images runs on the teuthology host and:
-  ​Create the image in FOG +
-    - http://​172.21.13.245/​fog/​management/​index.php?​node=image&​sub=add +
-    ​Name should be in a $MACHINETYPE_$OS_$VERSION format +
-      - e.g., mira_ubuntu_16.04,​ smithi_ubuntu_14.04,​ etc. +
-    - Set **Operating System** to Linux +
-    - The rest of the defaults are fine +
-  - Reimage a testnode with the OS image you want to capture +
-    - Do this using [[:​testnodereimage|cobbler]] +
-  - Once the testnode has been reimaged and you've confirmed ceph-cm-ansible ran successfully, update DHCP +
-    - ''​ssh store01.front.sepia.ceph.com''​ +
-    - ''​sudo -i''​ +
-    - Edit ''/​etc/​dhcp/​dhcpd.front.conf''​ +
-    - Find the entry for the testnode you just reimaged ​and add<​code>​ +
-next-server 172.21.13.245;​ +
-filename "/​undionly.kpxe";​+
  
-Example: +  - clones/​updates and sets up teuthology to use teuthology-lock 
-    host mira082 { +  - clones/​updates ceph-cm-ansible 
-      ​hardware ethernet 00:​25:​90:​09:​e2:​0a;​ +  - locks a machine of the requested type(s) (or uses hosts passed in as arguments), setting their descriptions to "​Locked to capture FOG image for Jenkins build ###" 
-      fixed-address 172.21.7.120; +  - uses /​usr/​local/​sbin/​set-next-server.sh on the store01 DHCP server to set the targets to PXE boot from cobbler (rather than fog) and restarts the dhcpd 
-      next-server 172.21.13.245; +  - sshes to ubuntu@cobbler.front to set the right cobbler profile for the host and enable netboot 
-      filename "/undionly.kpxe";​ +  - powercycles the hosts in question 
-    } +  - while the hosts are rebooting, uses curl and the FOG API to GET an image id or POST an image template to create the image, and then sets up FOG for image capture
-</​code>​ +  - sleeps for 10s to allow the hosts to become inaccessible so it can.. 
-  - Restart dhcpd +  - ..start polling for the sentinel file /ceph-qa-ready which is created at the very end of the process (The cobbler install flow is documented below) 
-    - ''service dhcpd restart'' +  - If there's an error or /ceph-qa-ready isn't present, retry for up to 2 hours.  If normal completion is seen, set DHCP back to PXE-from-fog, then run ansible-playbook (from teuthology, against the host) with tools/prep-fog-capture.yml, which removes some files from the prior installation:
-  - Back in FOG, open the host entry (Host Management -> search for hostname) +    /etc/udev/rules.d/70-persistent-net.rules
-  ​Set the **Host Image** to the image you want to capture +    ​/​.cephlab_net_configured 
-  - Click **UPDATE** +    ​/​ceph-qa-ready 
-  - On the left sidebar, click **Basic Tasks** +  - disables network configuration, kills the /var/lib/ceph mount and removes it from fstab, removes any ssh host keys, unsubscribes from RHEL, removes a katello.facts file, disables periodic dnf makecache jobs, cleans the dnf cache, stops ntp/chrony, and sets the hwclock
-  - Click **Capture** +  - restarts dhcpd 
-  - Now power cycle the testnode +  - waits for any in-progress fog images to complete, pausing the teuthology queue if there are any
 +  - powercycles the targets to boot into FOG and capture, and waits for FOG task completion 
 +  - teuthology-lock --unlock'​s any locked hosts and unpauses ​the queue if needed
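
The polling-and-retry step in the flow above can be sketched in Python. This is only an illustration, not the code in sepia-fog-images (which is bash). The function names, the 30-second interval, and the exact ssh invocation are assumptions; only the /ceph-qa-ready path and the 2-hour limit come from the description above.

```python
import subprocess
import time
from typing import Callable

SENTINEL = "/ceph-qa-ready"   # sentinel path named in the flow above
TIMEOUT_S = 2 * 60 * 60       # "retry for up to 2 hours"
INTERVAL_S = 30               # assumption: the real polling interval is not documented


def sentinel_present(host: str) -> bool:
    """Return True if the sentinel exists on host (hypothetical helper).

    Any ssh failure (host still installing, rebooting, unreachable)
    is treated as "not ready yet" rather than as a fatal error.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=10", host, "test", "-e", SENTINEL],
            capture_output=True,
            timeout=30,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def wait_for(
    check: Callable[[], bool],
    timeout_s: float = TIMEOUT_S,
    interval_s: float = INTERVAL_S,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller might use `wait_for(lambda: sentinel_present("testnode.example"))`, where the host name is a placeholder. The clock and sleep functions are injectable so the loop can be exercised without real waiting.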
  
-The testnode will reboot and PXE boot to the FOG server.  It will automatically run its LiveOS, which will capture the image and upload it to FOG.+==== Cobbler install flow for reimaging process ====
 +  - do a normal preseed/​kickstart install with cobbler-defined preseed/​kickstart files. ​ Some extra definitions:​ 
 +    * a smallish set of packages to install 
 +    * grub serial console setup 
 +    * install an /​etc/​rc.local to run once on first reboot 
 +    * install with ext4 on the appropriate drive, without swap 
 +    * set up subscription manager 
 +    * add the cm user with the admin_users'​ keys and passwordless sudo 
 +    * turn off cobbler ​PXE boot 
 +  - after rebooting ​to the fresh install, /etc/rc.local runs: 
 +    * search the nics for any active interfaces, and set them up for DHCP; if they receive no DHCP address, unconfigure them, assumption being they'​re not on any network we should configure 
 +    * touch /​.cephlab_net_configured when done 
 +    * try to get a hostname from reverse DNS and configure it 
 +    * generate SSH host keys 
 +    * ping the cobbler host to make sure it's reachable 
 +    * curl the cblr/​svc/​op/​trig/​mode/​post/​system/<​hostname>,​ which will run the /​var/​lib/​cobbler/​triggers/​install/​post/​cephlab_ansible.sh script from cobbler to the target host 
 +  -  cephlab_ansible.sh ​will (running on the cobbler host): 
 +    * use scl on CentOS 7 to get python 3.8
 +    * clone/​update ceph-cm-ansible ​and ceph-sepia-secrets (root'​s ssh key allows access to the latter on github) 
 +    * look for port 22 to be open; there is apparently a way this trigger might run before the install is done, and if so, 22 won't be available; when it's run from the /​etc/​rc.local in step 2 above, it will find 22 open 
 +    * create a /​var/​log/​ansible and put log output there in a file named <​hostname>​ 
 +    * for CentOS 8 Stream, run tools/​convert-to-centos-stream.yml 
 +    * if the Cobbler profile is named '​*-stock',​ stop there 
 +    * run ansible cephlab.yml,​ skipping users,​pubkeys,​zap 
 +  - cephlab.yml runs  
 +    * teuthology.yml for teuthology 
 +    * testnodes.yml for testnodes 
 +    * container-host.yml for docker/​podman installation 
 +    * cobbler.yml for cobbler hosts 
 +    * same with paddles and pulpito 
 +    * finally, for testnodes, touches the /​ceph-qa-ready sentinel, used by the fog capture process above to notice that the installation is finished and proceed with the capture process.
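
Two of the checks that cephlab_ansible.sh is described as making (waiting for port 22, and stopping early for '*-stock' profiles) can be sketched as follows. This is a hedged Python illustration, not the trigger script itself; the function names and the example profile names are hypothetical.

```python
import fnmatch
import socket


def port_open(host: str, port: int = 22, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable, or timed out: the install is probably not finished.
        return False


def is_stock_profile(profile: str) -> bool:
    """Return True for Cobbler profiles named '*-stock' (no cephlab.yml run)."""
    return fnmatch.fnmatchcase(profile, "*-stock")
```

For a stock profile, the flow above stops before running cephlab.yml, so a script following this sketch would check `is_stock_profile(...)` after logging is set up and return early.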
  
-Repeat this process for each distro and machine type you want to capture an image of.  Be sure to comment out the changes in the DHCP config each time so you can PXE boot and reimage via Cobbler.
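
The $MACHINETYPE_$OS_$VERSION naming convention, applied across every machine type and distro, can be sketched like this. The helper names are hypothetical; the example values are the ones used on this page.

```python
from itertools import product
from typing import Iterable, List, Tuple


def image_name(machine_type: str, os_name: str, version: str) -> str:
    """Build an image name in the $MACHINETYPE_$OS_$VERSION format."""
    return f"{machine_type}_{os_name}_{version}"


def image_matrix(
    machine_types: Iterable[str], distros: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return one image name per (machine type, distro) combination."""
    return [
        image_name(machine, os_name, version)
        for machine, (os_name, version) in product(machine_types, distros)
    ]
```

For example, `image_matrix(["mira", "smithi"], [("ubuntu", "16.04")])` yields `["mira_ubuntu_16.04", "smithi_ubuntu_16.04"]`.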
services/fog.1503517838.txt.gz · Last modified: 2017/08/23 19:50 by djgalloway