https://github.com/ceph/shaman
There are three VMs in the OVH CI region that make up shaman.
User: ubuntu (or infra admin username)
Key: CI private key (or your private key)
Port: 2222
systemctl start|stop|restart|status shaman
Copy deploy/playbooks/examples/deploy_production.yml
to deploy/playbooks/
/opt/shaman/src/shaman/prod.py
on 1.shaman.ceph.com
api_user
api_key
rabbit_host
rabbit_user
rabbit_pw
github_secret
**It is extremely important that the postgres tag is skipped**
Set --limit
to one node at a time to avoid disrupting the CI or lab testing.
ansible-playbook --tags="deploy_app" --skip-tags="postgres,nginx" --extra-vars="master_ip=158.69.71.144 standby_ip=158.69.71.192" deploy_production.yml --limit="1.shaman.ceph.com,2.shaman.ceph.com"
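The one-node-at-a-time advice can also be scripted. The following is a minimal sketch, not something from the shaman repository: `build_deploy_commands` is a hypothetical helper that builds one ansible-playbook invocation per host, reusing the tags, skip-tags, and IPs from the command above. It only prints the commands; running them, and checking each node before moving to the next, is left to the operator.

```python
import shlex


def build_deploy_commands(hosts, master_ip, standby_ip,
                          playbook="deploy_production.yml"):
    """Return one ansible-playbook argv list per host (hypothetical helper)."""
    commands = []
    for host in hosts:
        commands.append([
            "ansible-playbook",
            "--tags=deploy_app",
            # The postgres tag must always be skipped.
            "--skip-tags=postgres,nginx",
            "--extra-vars=master_ip=%s standby_ip=%s" % (master_ip, standby_ip),
            playbook,
            # Restrict each run to a single node.
            "--limit=%s" % host,
        ])
    return commands


if __name__ == "__main__":
    for argv in build_deploy_commands(
            ["1.shaman.ceph.com", "2.shaman.ceph.com"],
            master_ip="158.69.71.144",
            standby_ip="158.69.71.192"):
        # Print a shell-safe command line for each node.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the skip-tags value inside the helper means the postgres tag cannot be dropped by accident when the host list changes.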
I needed to determine what percentage of jobs were running on static vs. ephemeral slaves. Alfredo wrote a Python script to pull this data out of the shaman database. The script totals how many jobs ran on static vs. ephemeral slaves over a two-week period (since that's how long we keep dev builds).
Doing this on 2.shaman.ceph.com ensures you're in a read-only capacity.
ssh 2.shaman.ceph.com
cd /opt/shaman/src/shaman
two_week_stats.py
import datetime

from shaman import models
from shaman.models import Build, Project

models.start_read_only()


def report():
    # Only consider builds completed in roughly the last two weeks.
    two_weeks = datetime.datetime.utcnow() - datetime.timedelta(days=15)
    ceph_project = Project.filter_by(name='ceph').one()
    builds = Build.filter_by(project=ceph_project).filter(Build.completed > two_weeks).all()

    # One dict per group of build nodes, mapping node name -> build count.
    ovh_builds = {}
    irvingi_builds = {}
    braggi_builds = {}
    adami_builds = {}
    rest_of_the_world = {}

    for build in builds:
        node_name = build.extra['node_name']
        if '__' in node_name:
            mapping = ovh_builds
        elif 'slave-' in node_name:
            mapping = irvingi_builds
        elif 'braggi' in node_name:
            mapping = braggi_builds
        elif 'adami' in node_name:
            mapping = adami_builds
        else:
            mapping = rest_of_the_world
        try:
            mapping[node_name] += 1
        except KeyError:
            mapping[node_name] = 1

    # rest_of_the_world is collected but not reported.
    for mapping in [ovh_builds, irvingi_builds, braggi_builds, adami_builds]:
        count = 0
        for key, value in mapping.items():
            print("%s %s" % (key, value))
            count += value
        print("TOTAL: %s" % count)
        print("=" * 60)
        print("")
/opt/shaman/bin/pecan shell --shell ipython prod.py
In [1]: from shaman import models
In [2]: models.start_read_only()
In [3]: import two_week_stats
In [4]: two_week_stats.report()
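The node-name grouping used by the script can be tried out without a database connection. This is a minimal sketch that assumes only the substring checks shown in two_week_stats.py; `classify_node` and `tally` are hypothetical names, and the sample node names are made up for illustration.

```python
from collections import Counter


def classify_node(node_name):
    """Map a node name to a group, using the same substring checks,
    in the same order, as two_week_stats.py."""
    if "__" in node_name:
        return "ovh"
    if "slave-" in node_name:
        return "irvingi"
    if "braggi" in node_name:
        return "braggi"
    if "adami" in node_name:
        return "adami"
    return "rest_of_the_world"


def tally(node_names):
    """Count builds per group for an iterable of node names."""
    return Counter(classify_node(name) for name in node_names)


if __name__ == "__main__":
    # Made-up node names, for illustration only.
    sample = ["1.2.3.4__abc", "slave-ubuntu01", "braggi05", "braggi05", "other"]
    for group, total in sorted(tally(sample).items()):
        print("%s: %d" % (group, total))
```

Because the checks run in order, a name that matches more than one pattern is counted in the first matching group, exactly as in the if/elif chain of the original script.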
HYPOTHETICALLY, if a repo or build containing an embargoed security fix gets pushed to shaman, you can delete its entries from shaman's DB. The packages will still be on the chacra servers, but shaman won't know about them. You can also delete them from chacra if necessary.
ssh 1.shaman.ceph.com
sudo su - postgres
postgres@1:~$ psql -d shaman
psql (9.5.23)
Type "help" for help.

shaman=# \dt
            List of relations
 Schema |      Name       | Type  | Owner
--------+-----------------+-------+--------
 public | alembic_version | table | shaman
 public | archs           | table | shaman
 public | builds          | table | shaman
 public | nodes           | table | shaman
 public | projects        | table | shaman
 public | repos           | table | shaman
(6 rows)

shaman=# delete from public.builds where sha1 = 'f73b19678311b996984c30e7c0eb96a22ffa29ce';
DELETE 6
shaman=# select id from public.repos where sha1 = 'f73b19678311b996984c30e7c0eb96a22ffa29ce';
   id
--------
 197001
 197010
 197011
 197012
 197030
 196999

shaman=# delete from public.archs where repo_id = '197001';
DELETE 1
shaman=# delete from public.archs where repo_id = '197010';
DELETE 2
shaman=# delete from public.archs where repo_id = '197011';
DELETE 2
shaman=# delete from public.archs where repo_id = '197012';
DELETE 2
shaman=# delete from public.archs where repo_id = '197030';
DELETE 2
shaman=# delete from public.archs where repo_id = '196999';
DELETE 1
shaman=# delete from public.repos where sha1 = 'f73b19678311b996984c30e7c0eb96a22ffa29ce';
DELETE 6
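The per-repo deletes from archs can be collapsed into a single statement with a subquery. The following is a sketch only, assuming (as the session suggests) that archs.repo_id refers to repos.id. `purge_statements` is a hypothetical helper; it only builds the statements and does not connect to the database.

```python
def purge_statements(sha1):
    """Return (sql, params) pairs that remove every row for one sha1.

    Order matters: archs rows are removed before the repos rows they
    point at. The %s placeholders follow the psycopg2 parameter style,
    so the sha1 is never spliced into the SQL text.
    """
    if len(sha1) != 40 or any(c not in "0123456789abcdef" for c in sha1):
        raise ValueError("expected a full 40-character lowercase hex sha1")
    params = (sha1,)
    return [
        ("DELETE FROM public.builds WHERE sha1 = %s", params),
        ("DELETE FROM public.archs WHERE repo_id IN "
         "(SELECT id FROM public.repos WHERE sha1 = %s)", params),
        ("DELETE FROM public.repos WHERE sha1 = %s", params),
    ]


if __name__ == "__main__":
    for sql, params in purge_statements(
            "f73b19678311b996984c30e7c0eb96a22ffa29ce"):
        print(sql, params)
```

Rejecting anything that is not a full sha1 is a deliberate guard: a truncated or empty value should fail loudly rather than produce a delete that matches nothing, or the wrong rows.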
There's a custom Nagios check in place that queries the /api/nodes/next
endpoint.
This check is in place to make sure the postgres database is writeable. An incident occurred in 2019 where OVH rebooted all 3 shaman-related VMs at the same time and the DB was read-only for an unknown reason.
root@nagios:~# cat /usr/lib/nagios/plugins/check_shaman
#!/bin/bash
# Checks shaman /api/nodes/next endpoint

if curl -s -I -u XXXXX:XXXXX https://${1}/api/nodes/next | grep -q "200 OK"; then
  echo "OK - Shaman /api/nodes/next endpoint healthy"
  exit 0
else
  echo "CRITICAL - Shaman /api/nodes/next endpoint failed"
  exit 2
fi
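For comparison, the same check can be expressed in Python. This is a hedged sketch rather than the plugin that is deployed: `nagios_result` and `fetch_status` are hypothetical names, and the HOST USER PASSWORD command-line arguments are an assumption (the shell version takes only the host and has the credentials inline). It compares the numeric status code instead of grepping for the text "200 OK".

```python
import base64
import sys
import urllib.error
import urllib.request

OK, CRITICAL = 0, 2  # Nagios plugin exit codes used by the shell check


def nagios_result(status):
    """Translate an HTTP status (or None on failure) into (exit_code, message)."""
    if status == 200:
        return OK, "OK - Shaman /api/nodes/next endpoint healthy"
    return CRITICAL, "CRITICAL - Shaman /api/nodes/next endpoint failed"


def fetch_status(host, user, password, timeout=10):
    """Send a HEAD request with basic auth; return the status code or None."""
    url = "https://%s/api/nodes/next" % host
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request = urllib.request.Request(
        url, method="HEAD", headers={"Authorization": "Basic " + token})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, TLS error, ...
        return None


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical invocation: check_shaman.py HOST USER PASSWORD
    code, message = nagios_result(fetch_status(*sys.argv[1:4]))
    print(message)
    sys.exit(code)
```

Any failure to reach the endpoint is reported as CRITICAL, which matches the shell version, where a failed curl produces no "200 OK" line for grep to find.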