LONG_RUNNING_CLUSTER

Summary

A small subset of the mira systems and all of the reesi systems make up a Ceph cluster used primarily to store teuthology logs and files submitted via ceph-post-file.

Topology

Current as of 2020/03/10. OSDs are also colocated on all MON hosts.

MONs

reesi{001..005}

MGRs

reesi{004..006}

MDSs

reesi{001..003} ???

OSD hosts

mira055
mira060
mira093
reesi{001..006}
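
Daemon placement drifts over time, so the lists above are only a snapshot. They can be cross-checked against the live cluster from any host with an admin keyring using standard ceph CLI queries:

ceph mon stat     # MON membership and quorum
ceph mgr dump     # active and standby MGRs
ceph fs status    # MDS daemons and their ranks
ceph osd tree     # OSDs grouped by host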

Retired hosts

mira{019,021,049,070,087,099,116,120} had all daemons removed and their OSDs evacuated, and were reclaimed as testnodes in February 2020. The apama hosts were retired entirely as well.

ceph.conf

This file can be saved on your workstation so you can use it as an admin node.

Current as of 2018/04/03 21:03

[global]
fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
mon_host = 172.21.6.140, 172.21.6.108, 172.21.2.201, 172.21.2.202, 172.21.2.203, 172.21.2.204, 172.21.2.205
public_network = 172.21.0.0/20

# Setting below for cephmetrics.sepia.ceph.com dashboard use - dgalloway
mon_health_preluminous_compat = true

# ick, we have too many pgs on this cluster.
mon_max_pg_per_osd = 400

[mon]
debug ms = 1
debug mon = 10

[osd]
debug_ms = 1
debug_osd = 10
debug_filestore = 10
setuser_match_path = $osd_data
bluestore cache size = 512000000

[mds]
mds cache size = 500000
mds session timeout = 120
mds session autoclose = 600
debug mds = 4

[mgr]
debug mgr = 20
debug ms = 1

[mon.mira070]
public addr = 172.21.6.108
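
To actually run ceph commands from a workstation, the admin keyring needs to sit next to this ceph.conf. A minimal sketch, assuming root ssh access to a mon host and the keyring living at the default /etc/ceph path there (the hostname below is just an example following the lab's .front.sepia.ceph.com naming):

sudo mkdir -p /etc/ceph
# save the config above as /etc/ceph/ceph.conf, then fetch the admin keyring
sudo scp root@reesi001.front.sepia.ceph.com:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
ceph -s    # should report cluster status once the conf and keyring are in place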

Upgrading the Cluster

As of this writing, the luminous branch is the repo defined in /etc/apt/sources.list.d/ceph.list on the LRC nodes. The Ceph docs can be followed for this procedure but, basically, update and reboot one host at a time, starting with the MONs, then MGRs, MDSs, and finally the OSD hosts.
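
A rough per-host sketch, assuming Ubuntu hosts with the repo already pointed at the desired release:

# on each host, in the order above
sudo apt-get update
sudo apt-get -y dist-upgrade
sudo reboot
# then, from an admin node, before moving on to the next host
ceph -s    # wait for the daemons to rejoin and health to return to HEALTH_OK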

MONs run out of disk space

Sadly, the disks I got for the reesi when we purchased them are too small, so they occasionally run out of space in /var/log/ceph before logrotate gets a chance to run (even though it runs 4x a day). The process below will get you back up and running again but will wipe out all logs.

ansible  -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/ceph*.gz'" reesi*
ansible  -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-common'" reesi*
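
To check how full the log partitions actually are before and after forcing the rotation:

ansible -m shell -a "df -h /var/log" reesi*
ansible -m shell -a "sudo du -sh /var/log/ceph" reesi*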

Replace LRC Host's root drive

On non-mon hosts

  1. ceph osd set noout on admin host
    1. ceph osd set noscrub; ceph osd set nodeep-scrub to avoid unnecessary I/O
  2. Stop ceph services on OSD host
    1. stop ceph-osd-all on Ubuntu
    2. service ceph stop osd.# on RHEL
  3. Back up /etc/ceph
    1. scp root@mira###.front.sepia.ceph.com:/etc/ceph/ceph.conf .
  4. umount /var/lib/ceph/osd/*
  5. Back up /var/lib/ceph/osd
    1. scp -r root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/ .
  6. Reimage the machine
  7. Install ceph packages
    1. If needed, set up the repo file
    2. Also if needed, import the repo GPG key: wget -qO - http://download.ceph.com/keys/release.asc | sudo apt-key add -
    3. apt-get install ceph ceph-base ceph-common ceph-osd ceph-test libcephfs1 python-cephfs ceph-deploy
  8. Make sure ntpd is configured and enabled
    1. Manually run ntpdate $ntpserver for one-time sync
  9. Configure or disable firewall
  10. Replace /etc/ceph and /var/lib/ceph/osd structures
    1. scp ceph.conf root@mira###.front.sepia.ceph.com:/etc/ceph/
    2. scp -r osd/* root@mira###.front.sepia.ceph.com:/var/lib/ceph/osd/
  11. Set permissions
    1. chown -R ceph:ceph /var/lib/ceph/osd/
    2. chown ceph:ceph /etc/ceph/ceph.conf
  12. Create an ssh key, copy the pubkey to /root/.ssh/authorized_keys on a mon host, and run ceph-deploy gatherkeys $mon where $mon is a mon host (see the sketch after this list)
  13. Copy keys to their appropriate places
    1. For the bootstrap key,
      1. mv ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
      2. mv ceph.client.admin.keyring /etc/ceph/
      3. chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
  14. reboot
  15. Unset flags from step 1

See Ceph Docs - Stopping without rebalancing
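
A sketch of steps 12-13 above, run as root on the reimaged host ($mon is any mon host, as above):

ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
# append /root/.ssh/id_rsa.pub to /root/.ssh/authorized_keys on $mon, then:
ceph-deploy gatherkeys $mon
mv ceph.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
mv ceph.client.admin.keyring /etc/ceph/
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring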

Add blank disk as OSD

disk=sdX
ceph-disk zap /dev/$disk
ceph-disk prepare /dev/$disk
ceph-disk activate /dev/${disk}1
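
ceph-disk activate should create and start the new OSD on its own; to confirm it joined:

ceph osd tree                     # the new osd.N should be up and in under this host
mount | grep /var/lib/ceph/osd    # its data partition should now be mounted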

Replace Failing OSD disk

Evacuating OSD data

If the disk is still relatively healthy and you think it can survive a while longer, you should evacuate the data off it slowly.

  1. On a mon node, run ceph osd reweight $osdnum 0.75 (or lower the current weight by 0.25)
  2. Wait until recovery I/O is done and keep doing this until the OSD is reweighted to 0
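
A rough scripted version of this loop, assuming $osdnum is already set; each pass waits for recovery/backfill to drain before lowering the weight again:

for w in 0.75 0.50 0.25 0.00; do
    ceph osd reweight $osdnum $w
    sleep 60
    # crude wait-for-recovery: loop while PGs are still backfilling or recovering
    while ceph -s | grep -qE 'backfill|recover'; do sleep 60; done
done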

Taking the OSD out of the cluster

  1. On a mon node, ceph osd out $id. This makes sure each PG on the OSD is re-replicated elsewhere so 3 replicas remain.
    1. If any recovery I/O occurs, wait for it to finish
  2. On the OSD host, stop ceph-osd id=$id
    1. Some recovery I/O will occur. This is just the cluster rebalancing. It's fine.
  3. Back on the mon host,
    ceph osd crush remove osd.$id
    ceph osd down osd.$id  # may not be needed as long as osd service is stopped
    ceph osd rm osd.$id
    ceph auth del osd.$id
  4. Unmount the disk from the OSD host
    1. umount /var/lib/ceph/osd/ceph-$id
    2. rm -rf /var/lib/ceph/osd/ceph-$id
  5. Replace the disk
  6. On the OSD host,
    disk=sdX
    ceph-disk zap /dev/$disk
    ceph-disk prepare /dev/$disk
    mkdir /mnt/tmp
    mount /dev/${disk}1 /mnt/tmp
    mkdir /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    chown ceph:ceph /var/lib/ceph/osd/ceph-$(cat /mnt/tmp/whoami)
    umount /mnt/tmp
    ceph-disk activate /dev/${disk}1
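
Once activated, the replacement OSD gets a new id and backfill starts on its own; to verify:

ceph osd tree    # the replacement OSD should be up and in under this host
ceph -w          # watch until backfill finishes and health returns to HEALTH_OK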