LONG_RUNNING_CLUSTER

Summary

A small subset of mira systems and all of the reesi and ivan systems are used in a permanent Ceph cluster.

It is managed using cephadm.
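Since cephadm manages the daemons, day-to-day inspection goes through the orchestrator module. A minimal sketch using the standard `ceph orch` subcommands (guarded so it is a harmless no-op on a machine without the ceph CLI):

```shell
# Inspect the cephadm-managed cluster via the orchestrator module.
# Guarded: prints a note instead of failing where the ceph CLI is absent.
if command -v ceph >/dev/null 2>&1; then
    ceph orch status   # confirm the cephadm backend is active
    ceph orch ls       # service specs (mon, mgr, mds, osd, ...)
    ceph orch ps       # every daemon and the host it runs on
else
    echo "ceph CLI not installed on this machine"
fi
```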

Below is a rundown of how this cluster is set up and maintained.

Cluster dashboard

https://reesi006.front.sepia.ceph.com:8443/
https://reesi005.front.sepia.ceph.com:8443/

Topology

  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 5h)
    mgr: reesi005.xxyjcw(active, since 2w), standbys: reesi006.erytot, reesi004.tplfrt
    mds: 3/3 daemons up, 5 standby

Retired hosts

mira{019,021,049,070,087,099,116,120} had all daemons removed and OSDs evacuated, and were reclaimed as testnodes in February 2020. The apama hosts were retired entirely as well.

ceph.conf

This file (along with the admin keyring) can be saved on your workstation so you can use it as an admin node.

# minimal ceph.conf for 28f7427e-5558-4ffd-ae1a-51ec3042759a
[global]
        fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
        mon_host = [v2:172.21.2.201:3300/0,v1:172.21.2.201:6789/0] [v2:172.21.2.202:3300/0,v1:172.21.2.202:6789/0] [v2:172.21.2.203:3300/0,v1:172.21.2.203:6789/0] [v2:172.21.2.204:3300/0,v1:172.21.2.204:6789/0] [v2:172.21.2.205:3300/0,v1:172.21.2.205:6789/0]
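A sketch of installing these files on a workstation so the `ceph` CLI can reach the LRC. The source paths below are assumptions; the files themselves come from any cluster host (the minimal conf above matches the output of `ceph config generate-minimal-conf` run there):

```shell
# Sketch: set up a workstation as an admin node. Source paths are assumptions;
# copy ceph.conf and the admin keyring from a cluster host first.
conf_src=./ceph.conf
keyring_src=./ceph.client.admin.keyring
if [ -f "$conf_src" ] && [ -f "$keyring_src" ]; then
    sudo install -m 644 "$conf_src" /etc/ceph/ceph.conf
    sudo install -m 600 "$keyring_src" /etc/ceph/ceph.client.admin.keyring
    ceph -s    # should now report the LRC's status
else
    echo "copy ceph.conf and the admin keyring here first"
fi
```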

Upgrading the Cluster

The LRC is a testbed we use to test a release candidate before it is announced.

For example:

ceph orch upgrade start quay.ceph.io/ceph-ci/ceph:da36d2c9a106ed5231aa923e6c04a2485c89ef4b

watch "ceph -s; ceph orch upgrade status; ceph versions"

MONs run out of disk space

The reesi were unfortunately purchased with disks that are too small, so they occasionally run out of space in /var/log/ceph before logrotate gets a chance to run (even though it runs 4x a day). The process below will get you back up and running again, but it wipes out all logs.

ansible  -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/*/ceph*.gz'" reesi*
ansible  -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-*'" reesi*
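Before and after the cleanup it is worth checking how full the log partition actually is. A minimal local check (no assumptions beyond standard coreutils; across the reesi it can be wrapped in the same ansible pattern used above):

```shell
# Check log-partition usage. /var/log/ceph may not be a separate mount,
# in which case fall back to /var/log.
df -h /var/log/ceph 2>/dev/null || df -h /var/log
```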

One-liners

Most of the material above is no longer accurate since Ceph has evolved over time. Here are some one-liners that were useful at the time I posted them.

Restart mon service

systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
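cephadm names its systemd units `ceph-<fsid>@<daemon>.service`, which is where the long unit name above comes from. A sketch that derives the unit name for the local mon (the fsid is the LRC's, from ceph.conf above):

```shell
# Build the systemd unit name for this host's mon.
# cephadm convention: ceph-<fsid>@<daemon>.service
FSID=28f7427e-5558-4ffd-ae1a-51ec3042759a
UNIT="ceph-${FSID}@mon.$(hostname -s).service"
echo "$UNIT"
# Then, on the mon host itself:
#   sudo systemctl restart "$UNIT"
#   sudo systemctl status "$UNIT"
```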

Watch logs for a mon

podman logs -f $(podman ps | grep "\-mon" | awk '{ print $1 }')
Last modified: 2022/12/26 15:05 by akraitman