LONG_RUNNING_CLUSTER

Summary

A small subset of the mira systems and all of the reesi systems make up a Ceph cluster used primarily to store teuthology logs and files submitted via ceph-post-file.

Topology

Current as of 2020/03/10. OSDs are also colocated on all MON hosts.

MONs

reesi{001..005}

MGRs

reesi{004..006}

MDSs

reesi{001..003} ???

OSD hosts

mira055
mira060
mira093
reesi{001..006}
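
Daemon placement drifts over time, so the lists above are only a snapshot. They can be cross-checked against the live cluster from any host with an admin keyring using standard ceph CLI queries:

ceph mon stat     # MON membership and quorum
ceph mgr dump     # active and standby MGRs
ceph fs status    # MDS daemons and their ranks
ceph osd tree     # OSDs grouped by host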

Retired hosts

mira{019,021,049,070,087,099,116,120} had all daemons removed and their OSDs evacuated, and were reclaimed as testnodes in February 2020. The apama hosts were retired entirely as well.

ceph.conf

This file can be saved on your workstation so you can use it as an admin node.

Current as of 2018/04/03 21:03

[global]
fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
mon_host = 172.21.6.140, 172.21.6.108, 172.21.2.201, 172.21.2.202, 172.21.2.203, 172.21.2.204, 172.21.2.205
public_network = 172.21.0.0/20

# Setting below for cephmetrics.sepia.ceph.com dashboard use - dgalloway
mon_health_preluminous_compat = true

# ick, we have too many pgs on this cluster.
mon_max_pg_per_osd = 400

[mon]
debug ms = 1
debug mon = 10

[osd]
debug_ms = 1
debug_osd = 10
debug_filestore = 10
setuser_match_path = $osd_data
bluestore cache size = 512000000

[mds]
mds cache size = 500000
mds session timeout = 120
mds session autoclose = 600
debug mds = 4

[mgr]
debug mgr = 20
debug ms = 1

[mon.mira070]
public addr = 172.21.6.108
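
To actually run ceph commands from a workstation, the admin keyring needs to sit next to this ceph.conf. A minimal sketch, assuming root ssh access to a mon host and the keyring living at the default /etc/ceph path there (the hostname below is just an example following the lab's .front.sepia.ceph.com naming):

sudo mkdir -p /etc/ceph
# save the config above as /etc/ceph/ceph.conf, then fetch the admin keyring
sudo scp root@reesi001.front.sepia.ceph.com:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
ceph -s    # should report cluster status once the conf and keyring are in place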

Upgrading the Cluster

As of this writing, the luminous branch is the repo defined in /etc/apt/sources.list.d/ceph.list on the LRC nodes. The Ceph docs can be followed for this procedure but, basically, update and reboot one host at a time, starting with the MONs, then MGRs, MDSs, and finally the OSD hosts.
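
A rough per-host sketch, assuming Ubuntu hosts with the repo already pointed at the desired release:

# on each host, in the order above
sudo apt-get update
sudo apt-get -y dist-upgrade
sudo reboot
# then, from an admin node, before moving on to the next host
ceph -s    # wait for the daemons to rejoin and health to return to HEALTH_OK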

MONs run out of disk space

Sadly, the disks I got for the reesi when we purchased them are too small, so they occasionally run out of space in /var/log/ceph before logrotate gets a chance to run (even though it runs 4x a day). The process below will get you back up and running again but will wipe out all logs.

ansible  -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/ceph*.gz'" reesi*
ansible  -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-common'" reesi*
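
To check how full the log partitions actually are before and after forcing the rotation:

ansible -m shell -a "df -h /var/log" reesi*
ansible -m shell -a "sudo du -sh /var/log/ceph" reesi*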

Replace LRC Host's root drive

On non-mon hosts

  1. ceph osd set noout on admin host
    1. ceph osd set noscrub; ceph osd set nodeep-scrub to avoid unnecessary I/O
  2. Stop ceph services on OSD host
    1. stop ceph-osd-all on Ubuntu
    2. service ceph stop osd.# on RHEL
  3. Back up /etc/ceph
    1. scp root@mira###.front.sepia.ceph.com:/etc/ceph/ceph.conf .
  4. umount /var/lib/ceph/osd/*
  5. Back up /var/lib/ceph/osd
    1. scp -r root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/ .
  6. Reimage the machine
  7. Install ceph packages
    1. If needed, set up the repo file
    2. Also if needed, import the repo GPG key: wget -qO - http://download.ceph.com/keys/release.asc | sudo apt-key add -
    3. apt-get install ceph ceph-base ceph-common ceph-osd ceph-test libcephfs1 python-cephfs ceph-deploy
  8. Make sure ntpd is configured and enabled
    1. Manually run ntpdate $ntpserver for one-time sync
  9. Configure or disable firewall
  10. Replace /etc/ceph and /var/lib/ceph/osd structures
    1. scp ceph.conf root@mira###.front.sepia.ceph.com:/etc/ceph/
    2. scp -r osd/* root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/
  11. Set permissions
    1. chown -R ceph:ceph /var/lib/ceph/osd/
    2. chown ceph:ceph /etc/ceph/ceph.conf
  12. Create an ssh key, copy the pubkey to /root/.ssh/authorized_keys on a mon host, and run ceph-deploy gatherkeys $mon where $mon is a mon host (see the sketch after this list)
  13. Copy keys to their appropriate places
    1. For the bootstrap key,
      1. mv ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
      2. mv ceph.client.admin.keyring /etc/ceph/
      3. chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
  14. reboot
  15. Unset flags from step 1

See Ceph Docs - Stopping without rebalancing
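
A sketch of steps 12-13 above, run as root on the reimaged host ($mon is any mon host, as above):

ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
# append /root/.ssh/id_rsa.pub to /root/.ssh/authorized_keys on $mon, then:
ceph-deploy gatherkeys $mon
mv ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
mv ceph.client.admin.keyring /etc/ceph/
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring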

Add blank disk as OSD

disk=sdX
ceph-disk zap /dev/$disk
ceph-disk prepare /dev/$disk
ceph-disk activate /dev/${disk}1
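
ceph-disk activate should create and start the new OSD on its own; to confirm it joined:

ceph osd tree                     # the new osd.N should be up and in under this host
mount | grep /var/lib/ceph/osd    # its data partition should now be mounted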

Replace Failing OSD disk

Evacuating OSD data

If the disk is still relatively healthy and you think it can survive a while longer, you should evacuate the data off it slowly.

  1. On a mon node, run ceph osd reweight $osdnum 0.75 (or lower the current weight by 0.25)
  2. Wait until recovery I/O is done and keep doing this until the OSD is reweighted to 0
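
A rough scripted version of this loop, assuming $osdnum is already set; each pass waits for recovery/backfill to drain before lowering the weight again:

for w in 0.75 0.50 0.25 0.00; do
    ceph osd reweight $osdnum $w
    sleep 60
    # crude wait-for-recovery: loop while PGs are still backfilling or recovering
    while ceph -s | grep -qE 'backfill|recover'; do sleep 60; done
done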

Taking the OSD out of the cluster

  1. On a mon node, ceph osd out $id. This makes sure each PG on the OSD is re-replicated elsewhere so 3 replicas remain.
    1. If any recovery I/O occurs, wait for it to finish
  2. On the OSD host, stop ceph-osd id=$id
    1. Some recovery I/O will occur. This is just the cluster rebalancing. It's fine.
  3. Back on the mon host,
    ceph osd crush remove osd.$id
    ceph osd down osd.$id  # may not be needed as long as osd service is stopped
    ceph osd rm osd.$id
    ceph auth del osd.$id
  4. Unmount the disk from the OSD host
    1. umount /var/lib/ceph/osd/ceph-$id
    2. rm -rf /var/lib/ceph/osd/ceph-$id
  5. Replace the disk
  6. On the OSD host,
    disk=sdX
    ceph-disk zap /dev/$disk
    ceph-disk prepare /dev/$disk
    mkdir /mnt/tmp
    mount /dev/${disk}1 /mnt/tmp
    mkdir /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    chown ceph:ceph /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    umount /mnt/tmp
    ceph-disk activate /dev/${disk}1
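
Once activated, the replacement OSD gets a new id and backfill starts on its own; to verify:

ceph osd tree    # the replacement OSD should be up and in under this host
ceph -w          # watch until backfill finishes and health returns to HEALTH_OK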