LONG_RUNNING_CLUSTER

Summary

A small subset of mira systems and all of the reesi and ivan systems are used in a permanent Ceph cluster.

It is managed using cephadm.

Here's a rundown of what this cluster stores

Topology

  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 5h)
    mgr: reesi005.xxyjcw(active, since 2w), standbys: reesi006.erytot, reesi004.tplfrt
    mds: 3/3 daemons up, 5 standby
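
The snippet above is the services block from ceph status. To reproduce it, or to see exactly which host cephadm has placed each daemon on, something along these lines should work from any admin/mon node:

ceph -s         # cluster health plus the mon/mgr/mds summary above
ceph orch ps    # every cephadm-managed daemon and the host it runs on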

Retired hosts

mira{019,021,049,070,087,099,116,120} had all of their daemons removed and OSDs evacuated, and were reclaimed as testnodes in February 2020. The apama hosts were retired entirely as well.

ceph.conf

This file (along with the admin keyring) can be saved on your workstation so you can use it as an admin node.

# minimal ceph.conf for 28f7427e-5558-4ffd-ae1a-51ec3042759a
[global]
        fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
        mon_host = [v2:172.21.2.201:3300/0,v1:172.21.2.201:6789/0] [v2:172.21.2.202:3300/0,v1:172.21.2.202:6789/0] [v2:172.21.2.203:3300/0,v1:172.21.2.203:6789/0] [v2:172.21.2.204:3300/0,v1:172.21.2.204:6789/0] [v2:172.21.2.205:3300/0,v1:172.21.2.205:6789/0]
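
As a rough sketch of what that looks like (reesi001 is just an example mon host, and the standard /etc/ceph paths are assumed), drop both files into /etc/ceph/ on your workstation and any ceph command should then work against the LRC:

# example mon host; any of the mons will do
scp root@reesi001.front.sepia.ceph.com:/etc/ceph/ceph.conf /etc/ceph/
scp root@reesi001.front.sepia.ceph.com:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
ceph -s    # should report fsid 28f7427e-5558-4ffd-ae1a-51ec3042759a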

Upgrading the Cluster

As of this writing, the luminous branch is the repo defined in /etc/apt/sources.list.d/ceph.list on the LRC nodes. The Ceph docs can be followed for this procedure but, basically, update and reboot one host at a time, starting with the MONs, then the MGRs and MDSs, and finally the OSD hosts.
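
Since the cluster is now managed with cephadm (see Summary), the package-based procedure above is largely historical; a cephadm-driven upgrade would look roughly like this (the version is a placeholder):

ceph orch upgrade start --ceph-version 16.2.10   # placeholder version
ceph orch upgrade status                          # check progress
ceph -W cephadm                                   # follow the cephadm event log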

MONs run out of disk space

I sadly got too-small disks for the reesi when we purchased them, so they occasionally run out of space in /var/log/ceph before logrotate gets a chance to run (even though it runs 4x a day). The process below will get you back up and running again but will wipe out all logs.

ansible 'reesi*' -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/*/ceph*.gz'"
ansible 'reesi*' -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-*'"
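
It's worth a quick check that the space actually came back afterwards (plain df over ansible, nothing cluster-specific):

ansible 'reesi*' -m shell -a "df -h /var/log"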

Replace LRC Host's root drive

On non-mon hosts

  1. ceph osd set noout on admin host
    1. ceph osd set noscrub; ceph osd set nodeep-scrub to avoid unnecessary I/O
  2. Stop ceph services on OSD host
    1. stop ceph-osd-all on Ubuntu
    2. service ceph stop osd.# on RHEL
  3. Back up /etc/ceph
    1. scp root@mira###.front.sepia.ceph.com:/etc/ceph/ceph.conf .
  4. umount /var/lib/ceph/osd/*
  5. Back up /var/lib/ceph/osd
    1. scp -r root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/ .
  6. Reimage the machine
  7. Install ceph packages
    1. If needed, set up the repo file
    2. Also if needed, import repo GPG key wget -qO - http://download.ceph.com/keys/release.asc | sudo apt-key add -
    3. apt-get install ceph ceph-base ceph-common ceph-osd ceph-test libcephfs1 python-cephfs ceph-deploy
  8. Make sure ntpd is configured and enabled
    1. Manually run ntpdate $ntpserver for one-time sync
  9. Configure or disable firewall
  10. Replace /etc/ceph and /var/lib/ceph/osd structures
    1. scp ceph.conf root@mira###.front.sepia.ceph.com:/etc/ceph/
    2. scp -r osd/* root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/
  11. Set permissions
    1. chown -R ceph:ceph /var/lib/ceph/osd/
    2. chown ceph:ceph /etc/ceph/ceph.conf
  12. Create an ssh key, copy the pubkey to /root/.ssh/authorized_keys on a monhost and run ceph-deploy gatherkeys $mon where $mon is a mon host
  13. Copy keys to their appropriate places
    1. For the bootstrap and admin keyrings,
      1. mv ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
      2. mv ceph.client.admin.keyring /etc/ceph/
      3. chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
  14. reboot
  15. Unset the flags from step 1 (see the sketch below)

See Ceph Docs - Stopping without rebalancing
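
For reference, steps 1 and 15 boil down to toggling the same three flags from any admin/mon host; a minimal sketch:

# before starting work on the host (step 1)
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

# once the host is back and its OSDs have rejoined (step 15)
ceph osd unset noout
ceph osd unset noscrub
ceph osd unset nodeep-scrub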

Add blank disk as OSD

disk=sdX
ceph-disk zap /dev/$disk
ceph-disk prepare /dev/$disk
ceph-disk activate /dev/${disk}1
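
ceph-disk has since been removed from Ceph; now that the cluster is managed by cephadm, the rough modern equivalent would be something like (host and device are placeholders):

ceph orch device ls                          # confirm the disk shows up as available
ceph orch daemon add osd reesi001:/dev/sdX   # placeholder host:device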

Replace Failing OSD disk

Evacuating OSD data

If the disk is still relatively healthy and you think it can survive a while longer, you should evacuate the data off it slowly.

  1. On a mon node, ceph osd reweight $osdnum 0.75, or drop the current weight by 0.25 (see the sketch after this list)
  2. Wait until recovery I/O is done and keep doing this until the OSD is reweighted to 0
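
A minimal sketch of that loop, using osd.12 as a placeholder ID and run from a mon/admin host:

id=12    # placeholder OSD id
ceph osd reweight $id 0.75
# wait for recovery I/O to finish (watch ceph -s), then...
ceph osd reweight $id 0.5
# ...keep stepping down by 0.25 until:
ceph osd reweight $id 0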

Taking the OSD out of the cluster

  1. On a mon node, ceph osd out $id. This makes sure all 3 replicas of each PG are re-created on other OSDs before this one is removed.
    1. If any recovery I/O occurs, wait for it to finish
  2. On the OSD host, stop ceph-osd id=$id
    1. Some recovery I/O will occur. This is just the cluster rebalancing. It's fine.
  3. Back on the mon host,
    ceph osd crush remove osd.$id
    ceph osd down osd.$id  # may not be needed as long as osd service is stopped
    ceph osd rm osd.$id
    ceph auth del osd.$id
  4. Unmount the disk from the OSD host
    1. umount /var/lib/ceph/osd/ceph-$id
    2. rm -rf /var/lib/ceph/osd/ceph-$id
  5. Replace the disk
  6. On the OSD host,
    disk=sdX
    ceph-disk zap /dev/$disk
    ceph-disk prepare /dev/$disk
    mkdir /mnt/tmp
    mount /dev/${disk}1 /mnt/tmp
    mkdir /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    chown ceph:ceph /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    umount /mnt/tmp
    ceph-disk activate /dev/${disk}1
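
Once the new OSD activates, it's worth confirming it rejoined and that backfill is progressing (standard status commands, nothing LRC-specific):

ceph osd tree    # the new OSD should show up and be 'up'
ceph -s          # watch recovery/backfill until the cluster returns to HEALTH_OK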

One-liners

Most of the stuff above is no longer valuable since Ceph has evolved over time. Here are some one-liners that were useful at the time I posted them.

Restart mon service

systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
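
If you're not sure of the exact unit name, listing the cluster's units first should turn it up (the fsid prefix is the one from ceph.conf above):

systemctl list-units 'ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@*'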

Watch logs for a mon

podman logs -f $(podman ps | grep "\-mon" | awk '{ print $1 }')
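
cephadm can pull the same logs out of the journal without having to look up the container ID, assuming the standard daemon naming:

cephadm logs --name mon.$(hostname -s)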