This is an old revision of the document!


shaman.ceph.com

Summary

https://github.com/ceph/shaman

There are three VMs in the OVH CI region that make up shaman.

  • shaman.ceph.com is the load-balancing VM
  • 1.shaman.ceph.com is the primary shaman node; it hosts the postgres DB with all the repo information
  • 2.shaman.ceph.com is a read-only standby that serves requests if 1.shaman.ceph.com goes down

User Access

User: ubuntu (or infra admin username)
Key: CI private key (or your private key)
Port: 2222
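
As a rough sketch, the settings above translate into an ssh invocation like the one built below. The helper name shaman_ssh_cmd and the default key path are placeholders made up for illustration; substitute the CI private key or your own.

```shell
#!/bin/bash
# Hypothetical helper: prints the ssh command for a shaman host.
# KEY is a placeholder path, not a location documented on this page.
shaman_ssh_cmd() {
    local host="$1"
    local user="${2:-ubuntu}"               # or your infra admin username
    local key="${KEY:-$HOME/.ssh/id_rsa}"   # assumed default key path
    printf 'ssh -p 2222 -i %s %s@%s\n' "$key" "$user" "$host"
}

# Print the command for the primary node; review it, then run it by hand.
shaman_ssh_cmd 1.shaman.ceph.com
```

Passing a second argument overrides the ubuntu user, and setting KEY in the environment overrides the key path.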

Ops Tasks

Starting/Restarting service

systemctl start|stop|restart|status shaman

Updating/Redeploying shaman

  1. If needed, copy deploy/playbooks/examples/deploy_production.yml to deploy/playbooks/
    1. Get and set the following credentials. These can be found in /opt/shaman/src/shaman/prod.py on 1.shaman.ceph.com
      1. api_user
      2. api_key
      3. rabbit_host
      4. rabbit_user
      5. rabbit_pw
      6. github_secret
  2. Run the playbook (see below)
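
One possible way to look up the six credentials listed above is to grep for their names in the settings file on 1.shaman.ceph.com. This is only a sketch: the helper name show_shaman_creds is hypothetical, and it assumes each credential name appears literally in prod.py. The output contains secrets, so avoid running it in a shared or logged terminal.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of a settings file that mention the
# six credential names from the list above.
# Assumption: the names appear verbatim in the file.
show_shaman_creds() {
    local settings="${1:-/opt/shaman/src/shaman/prod.py}"
    grep -E 'api_user|api_key|rabbit_host|rabbit_user|rabbit_pw|github_secret' "$settings"
}

# Usage on 1.shaman.ceph.com (prints secrets to the terminal):
#   show_shaman_creds
```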

**It is extremely important that the postgres tag is skipped.**

Set --limit to one node at a time to avoid disrupting the CI or lab testing. The example below lists both nodes for reference; pass only one of them per run, and repeat the command for the other node.

ansible-playbook --tags="deploy_app" --skip-tags="postgres,nginx" --extra-vars="master_ip=158.69.71.144 standby_ip=158.69.71.192" deploy_production.yml --limit="1.shaman.ceph.com,2.shaman.ceph.com"
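
The one-node-at-a-time advice can be sketched as a small loop that prints a separate command per host. The tags, skip-tags, and IPs are copied from the command above; the helper name deploy_cmd and the order of the nodes are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: build one ansible-playbook command per node so that each run uses a
# single-host --limit. Nothing is executed; the commands are only printed.
deploy_cmd() {
    local node="$1"
    printf 'ansible-playbook --tags="deploy_app" --skip-tags="postgres,nginx" '
    printf -- '--extra-vars="master_ip=158.69.71.144 standby_ip=158.69.71.192" '
    printf 'deploy_production.yml --limit="%s"\n' "$node"
}

# Node order here is an assumption. Run each printed command by hand and
# confirm the node is healthy before moving on to the next one.
for node in 1.shaman.ceph.com 2.shaman.ceph.com; do
    deploy_cmd "$node"
done
```

Printing instead of executing keeps the postgres skip-tag visible for review before anything touches production.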

Nagios Checks

A custom Nagios check queries the /api/nodes/next endpoint.

The check exists to confirm that the postgres database is writeable. In a 2019 incident, OVH rebooted all three shaman-related VMs at the same time, and afterwards the DB was read-only for an unknown reason.

root@nagios:~# cat /usr/lib/nagios/plugins/check_shaman 
#!/bin/bash
# Checks shaman /api/nodes/next endpoint

if curl -s -I -u XXXXX:XXXXX https://${1}/api/nodes/next | grep -q "200 OK"; then
    echo "OK - Shaman /api/nodes/next endpoint healthy"
    exit 0
else
    echo "CRITICAL - Shaman /api/nodes/next endpoint failed"
    exit 2
fi
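
A possible variant of the plugin above compares the numeric status code instead of grepping for "200 OK". This may matter if the endpoint is ever served over HTTP/2, where the status line is "HTTP/2 200" with no reason phrase. It is a sketch, not the deployed check: the function names nagios_result and check_shaman_status are hypothetical, and the credentials remain redacted as in the original.

```shell
#!/bin/bash
# Sketch of a status-code variant of check_shaman.

# Map an HTTP status code to a Nagios message and return code (0 OK, 2 CRITICAL).
nagios_result() {
    if [ "$1" = "200" ]; then
        echo "OK - Shaman /api/nodes/next endpoint healthy"
        return 0
    fi
    echo "CRITICAL - Shaman /api/nodes/next endpoint returned HTTP ${1:-none}"
    return 2
}

# Send the same HEAD request as the plugin, but read the numeric code through
# curl's --write-out (-w) option rather than matching the status line text.
check_shaman_status() {
    local code
    code="$(curl -s -I -o /dev/null -w '%{http_code}' -u XXXXX:XXXXX "https://${1}/api/nodes/next")"
    nagios_result "$code"
}

# Usage as a plugin body:
#   check_shaman_status "$1"; exit $?
```

When curl cannot connect at all, %{http_code} is 000, which the sketch reports as CRITICAL.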

production/shaman.ceph.com.1548264731.txt.gz · Last modified: 2019/01/23 17:32 by djgalloway