tasks:scheduled-maintenance

Differences

This shows you the differences between two versions of the page.

tasks:scheduled-maintenance [2018/04/03 21:22]
djgalloway [Table]
tasks:scheduled-maintenance [2018/04/03 21:30] (current)
djgalloway
Line 42: Line 42:
 Updating the dev chacra nodes ({1..5}.chacra.ceph.com) is unlikely to affect upstream teuthology testing except while the chacra service is redeployed or a host is rebooted. Because of this, it's relatively safe to perform CI maintenance separately from Sepia lab maintenance. To be extra safe, you could pause the Sepia queue and wait ~30min to make sure no package manager processes get run against a chacra node.
  
 +  - Notify ceph-devel@
   - Log into each Jenkins instance, **Manage Jenkins** -> **Prepare for Shutdown**
   - Again in Jenkins, go to **Manage Jenkins** -> **Manage Plugins**
Line 158: Line 159:
  
 Check how many running workers there should be in ''/home/teuthworker/bin/worker_start'' and start **1/4** of them at a time. If too many start at once, they can overwhelm the teuthology VM with ansible processes or overwhelm FOG with Deploy tasks.
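The quarter-at-a-time restart described above can be sketched as a small batching calculation. This is only an illustration under assumptions: the helper name ''quarter_batches'' and the total of 40 workers are hypothetical, and the real count has to be read from ''worker_start''.

```python
import math


def quarter_batches(total_workers):
    """Split total_workers into at most four batches of near-equal size."""
    if total_workers <= 0:
        return []
    # Round up so that four batches always cover every worker.
    batch_size = math.ceil(total_workers / 4)
    batches = []
    remaining = total_workers
    while remaining > 0:
        count = min(batch_size, remaining)
        batches.append(count)
        remaining -= count
    return batches


# Hypothetical total; take the real number from worker_start.
print(quarter_batches(40))
```

Starting one batch, waiting for the teuthology VM and FOG to settle, and only then starting the next keeps the load from the ansible and Deploy tasks bounded.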
 +
 +===== Boilerplate Outage Notices =====
 +==== CI ====
 +<code>
 +Hi All,
 +
 +A scheduled maintenance of the CI Infrastructure is planned for YYYY-MM-DD at HH:MM UTC.
 +
 +We will be updating and rebooting the following hosts:
 +jenkins.ceph.com
 +2.jenkins.ceph.com
 +chacra.ceph.com
 +{1..5}.chacra.ceph.com
 +shaman.ceph.com
 +1.shaman.ceph.com
 +2.shaman.ceph.com
 +
 +This means:
 +  - Jenkins will be paused and stop processing new jobs so PR checks will be delayed
 +  - Once there are no jobs running, all hosts will be updated and rebooted
 +  - Repos on chacra nodes will be temporarily unavailable
 +
 +Let me know if you have any questions/concerns.
 +
 +Thanks,
 +</code>
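The placeholders in the notice above (''YYYY-MM-DD'', ''HH:MM'', and the shell-style ''{1..5}'' host range) can be filled in mechanically before sending. The following is a minimal sketch, not part of the documented procedure: the function names ''expand_braces'' and ''fill_notice'' and the example date are hypothetical.

```python
import re
from datetime import datetime, timezone


def expand_braces(pattern):
    """Expand one shell-style numeric range such as {1..5}.chacra.ceph.com."""
    match = re.search(r"\{(\d+)\.\.(\d+)\}", pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + str(i) + tail for i in range(start, end + 1)]


def fill_notice(template, when):
    """Replace the YYYY-MM-DD and HH:MM placeholders with a UTC timestamp.

    `when` must be a timezone-aware datetime; it is converted to UTC first.
    """
    when = when.astimezone(timezone.utc)
    return (template
            .replace("YYYY-MM-DD", when.strftime("%Y-%m-%d"))
            .replace("HH:MM", when.strftime("%H:%M")))


# Hypothetical maintenance window, used only for illustration.
window = datetime(2018, 4, 10, 14, 0, tzinfo=timezone.utc)
line = "A scheduled maintenance of the CI Infrastructure is planned for YYYY-MM-DD at HH:MM UTC."
print(fill_notice(line, window))
print("\n".join(expand_braces("{1..5}.chacra.ceph.com")))
```

Expanding the range into explicit host names is optional; the point is only that recipients who do not read shell brace syntax see the full list.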
 +
 +==== Sepia Lab ====
 +<code>
 +Hi All,
 +
 +A scheduled maintenance of the Sepia Lab Infrastructure is planned for YYYY-MM-DD at HH:MM UTC.
 +
 +We will be updating and rebooting the following hosts:
 +teuthology.front.sepia.ceph.com
 +labdashboard.front.sepia.ceph.com
 +circle.front.sepia.ceph.com
 +cobbler.front.sepia.ceph.com
 +conserver.front.sepia.ceph.com
 +fog.front.sepia.ceph.com
 +ns1.front.sepia.ceph.com
 +ns2.front.sepia.ceph.com
 +nsupdate.front.sepia.ceph.com
 +vpn-pub.ovh.sepia.ceph.com
 +satellite.front.sepia.ceph.com
 +sentry.front.sepia.ceph.com
 +pulpito.front.sepia.ceph.com
 +drop.ceph.com
 +git.ceph.com
 +gw.sepia.ceph.com
 +
 +This means:
 +  - teuthology workers will be instructed to die and new jobs will not be started until the maintenance is complete
 +  - DNS may be temporarily unavailable
 +  - All aforementioned hosts will be briefly unavailable
 +  - Your VPN connection will need to be restarted
 +
 +I will send a follow-up "all clear" e-mail as a reply to this one once the maintenance is complete.
 +
 +Let me know if you have any questions/concerns.
 +
 +Thanks,
 +</code>
tasks/scheduled-maintenance.1522790541.txt.gz · Last modified: 2018/04/03 21:22 by djgalloway