====== LONG_RUNNING_CLUSTER ======

===== Summary =====
A small subset of [[hardware:mira]] systems and all of the [[hardware:reesi]] and [[hardware:ivan]] systems are used in a permanent Ceph cluster. It is managed using ''cephadm''.

Here's a rundown of what this cluster stores:
  * teuthology run logs
  * quay.ceph.io containers
  * chacra.ceph.com packages
  * drop.ceph.com
  * Files sent via [[https://docs.ceph.com/en/latest/man/8/ceph-post-file/|ceph-post-file]]
  * [[production:signer.front.sepia.ceph.com]]

Cluster dashboard: https://reesi004.front.sepia.ceph.com:8443/

===== Topology =====

<code>
  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 5h)
    mgr: reesi005.xxyjcw(active, since 2w), standbys: reesi006.erytot, reesi004.tplfrt
    mds: 3/3 daemons up, 5 standby
</code>

=== Retired hosts ===
mira{019,021,049,070,087,099,116,120} had all daemons removed and OSDs evacuated, and were reclaimed as testnodes in February 2020. The apama hosts were retired entirely as well.

===== ceph.conf =====
This file (along with the admin keyring) can be saved on your workstation so you can use it as an admin node.

<code>
# minimal ceph.conf for 28f7427e-5558-4ffd-ae1a-51ec3042759a
[global]
        fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
        mon_host = [v2:172.21.2.201:3300/0,v1:172.21.2.201:6789/0] [v2:172.21.2.202:3300/0,v1:172.21.2.202:6789/0] [v2:172.21.2.203:3300/0,v1:172.21.2.203:6789/0] [v2:172.21.2.204:3300/0,v1:172.21.2.204:6789/0] [v2:172.21.2.205:3300/0,v1:172.21.2.205:6789/0]
</code>

===== Upgrading the Cluster =====
The LRC is a testbed we use to test a release candidate before announcing. For example:

<code>
ceph orch upgrade start quay.ceph.io/ceph-ci/ceph:da36d2c9a106ed5231aa923e6c04a2485c89ef4b
watch "ceph -s; ceph orch upgrade status; ceph versions"
</code>

===== MONs run out of disk space =====
Sadly, the disks we purchased for the reesi are too small, so they occasionally run out of space in ''/var/log/ceph'' before logrotate gets a chance to run (even though it runs four times a day).
The process below will get you back up and running again, but it will wipe out all logs.

<code>
ansible -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/*/ceph*.gz'" reesi*
ansible -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-*'" reesi*
</code>

===== One-liners =====
Most of the stuff above is no longer valuable since Ceph has evolved over time. Here are some one-liners that were useful at the time I posted them.

=== Restart mon service ===

<code>
systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
</code>

=== Watch logs for a mon ===

<code>
podman logs -f $(podman ps | grep "\-mon" | awk '{ print $1 }')
</code>

=== LRC iSCSI volume for the RHEV cluster ===
In November 2022 we started seeing data corruption on our main gluster volume, where we have all our critical VMs, so we connected an iSCSI volume from the LRC. These are the steps to connect an iSCSI volume to a RHEV cluster, according to this doc: https://docs.google.com/document/d/1GYwv5y4T5vy-1oeAzw-zoLgQs0I3y5v_xD1wXscAA7M/edit

First, make sure you have configured the iSCSI clients (the RHEV hypervisor hosts in our case) according to this doc: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index#configuring-the-iscsi-initiator-for-rhel_block

Copy the iSCSI initiator name located in ''/etc/iscsi/initiatorname.iscsi'' on each host (we will need it for step 11, when we create the hosts on the LRC).

Also configure CHAP on each RHEV host by adding this to ''/etc/iscsi/iscsid.conf'':

<code>
node.session.auth.authmethod = CHAP
node.session.auth.username = 
node.session.auth.password = 
</code>

ssh to one of the reesi hosts (I configured it from reesi005) and follow the next steps to configure iSCSI and create a volume on the LRC.

1. Create an rbd pool (the pool used below is called ''lrc''):

<code>
ceph osd pool create lrc
ceph osd pool application enable lrc rbd
</code>

2. Deploy iSCSI on at least four hosts. Create a yaml file:

<code>
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - reesi002
    - reesi003
    - reesi004
    - reesi005
spec:
  pool: lrc
  api_secure: false
</code>

3. Connect to the iSCSI container on one of the deployed hosts. To find the exact container ID, run ''podman ps'' and look for the iSCSI container with the word "tcmu" at the end.

<code>
podman exec -it <container> /bin/bash
</code>

For example:

<code>
podman exec -it ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a-iscsi-iscsi-reesi005-luegfv-tcmu /bin/bash
</code>

4. Enter the gwcli:

<code>
gwcli
</code>

5. Go to the iscsi-targets:

<code>
cd iscsi-targets/
</code>

6. Go to the storage iqn:

<code>
cd iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1/
</code>

7. Go to gateways:

<code>
cd gateways
</code>

8. Create all four gateways as you specified in the yaml file in step 2:

<code>
create reesi002.front.sepia.ceph.com 172.21.2.202
create reesi003.front.sepia.ceph.com 172.21.2.203
create reesi004.front.sepia.ceph.com 172.21.2.204
create reesi005.front.sepia.ceph.com 172.21.2.205
</code>

9. Go to disks and create an RBD image with the name "vol1" in the "lrc" pool (the general syntax is ''create pool=<pool> image=<image> size=<size>'', e.g. ''image=rbdimage size=50g''):

<code>
cd ..
cd disks/
create pool=lrc image=vol1 size=20T
</code>

10. Go to hosts:

<code>
cd ..
cd hosts/
</code>

11. Create the hosts (the RHEV hosts; if you have four RHEV hosts, you will need to run this four times, once for each iqn):

<code>
create client_iqn=
</code>

12. cd to each iqn you created in step 11 and enable CHAP:

<code>
auth username= password=
</code>

13. cd to each iqn you added in step 11 and add the RBD image created in step 9:

<code>
disk add lrc/vol1
</code>

14. Set discovery auth to CHAP on the iscsi-targets:

<code>
cd ../../
discovery_auth username= password=
</code>

The final step is to mount this RBD image/LUN in the RHEV-M Dashboard. Go to https://mgr01.front.sepia.ceph.com/ovirt-engine/webadmin/?locale=en_US#storage and create a new storage domain. Choose the iSCSI storage type, fill out the discovery targets section with the IP of one of the iSCSI gateways you configured in the yaml in step 2, and fill out the auth with the CHAP username & password you configured in step 14.
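The spec file from step 2 can also be written and applied from the shell rather than by hand. A minimal sketch, assuming you are on a host with the admin keyring; the filename ''iscsi-spec.yaml'' is an arbitrary choice for this sketch, and the apply command is left commented out so nothing is changed by accident:

```shell
# Write the iSCSI service spec from step 2 to a file.
cat > iscsi-spec.yaml <<'EOF'
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - reesi002
    - reesi003
    - reesi004
    - reesi005
spec:
  pool: lrc
  api_secure: false
EOF

# Sanity-check the spec before handing it to cephadm.
grep -c 'reesi' iscsi-spec.yaml    # 4 gateway hosts

# On a host with the admin keyring, cephadm would deploy it with:
# ceph orch apply -i iscsi-spec.yaml
```

Keeping the spec in a file like this makes it easy to re-apply (e.g. to add a gateway host) instead of reconstructing it from the wiki each time.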
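Since the log wipe in the "MONs run out of disk space" section destroys all logs, it may be worth guarding it behind a disk-usage check so it only fires when the partition is actually close to full. A minimal sketch; the 90% threshold is an assumption, not lab policy:

```shell
# Percent of space used on the filesystem holding $1, as a bare number.
# Relies on GNU df's --output option.
usage() {
  df --output=pcent "$1" | tail -1 | tr -dc '0-9'
}

# Only suggest the destructive cleanup when /var/log is nearly full.
if [ "$(usage /var/log)" -ge 90 ]; then
  echo "log partition nearly full; run the ansible cleanup commands above"
fi
```

On the reesi themselves, the two ansible commands from that section could replace the ''echo'' inside the guard.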