====== LONG_RUNNING_CLUSTER ======
===== Summary =====
A small subset of [[hardware:mira]] systems and all of the [[hardware:reesi]] and [[hardware:ivan]] systems are used in a permanent Ceph cluster.

It is managed using ''cephadm''.

Here's a rundown of what this cluster stores:
  * teuthology run logs
  * quay.ceph.io containers
  * chacra.ceph.com packages
  * drop.ceph.com
  * Files sent via [[https://docs.ceph.com/en/latest/man/8/ceph-post-file/|ceph-post-file]]
  * [[production:signer.front.sepia.ceph.com]]

Cluster dashboard:

https://reesi004.front.sepia.ceph.com:8443/
===== Topology =====
<code>
  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 5h)
    mgr: reesi005.xxyjcw(active, since 2w), standbys: reesi006.erytot, reesi004.tplfrt
    mds: 3/3 daemons up, 5 standby
</code>
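To see the live state rather than this snapshot, the usual status/orchestrator commands work from any admin node (nothing LRC-specific here):

<code>
ceph -s              # overall cluster status, including the service map above
ceph orch host ls    # hosts managed by cephadm
ceph orch ps         # every daemon and which host it runs on
</code>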
  
=== Retired hosts ===
  
===== ceph.conf =====
This file (along with the admin keyring) can be saved on your workstation so you can use it as an admin node.
  
<code>
# minimal ceph.conf for 28f7427e-5558-4ffd-ae1a-51ec3042759a
[global]
        fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
        mon_host = [v2:172.21.2.201:3300/0,v1:172.21.2.201:6789/0] [v2:172.21.2.202:3300/0,v1:172.21.2.202:6789/0] [v2:172.21.2.203:3300/0,v1:172.21.2.203:6789/0] [v2:172.21.2.204:3300/0,v1:172.21.2.204:6789/0] [v2:172.21.2.205:3300/0,v1:172.21.2.205:6789/0]
</code>
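One way to pull both files down, assuming you have SSH and sudo on one of the reesi hosts (reesi001 below is just an example host; the block above is exactly what ''ceph config generate-minimal-conf'' prints):

<code>
# Generate a minimal ceph.conf and export the admin keyring from a mon host
ssh reesi001.front.sepia.ceph.com "sudo cephadm shell -- ceph config generate-minimal-conf" > ceph.conf
ssh reesi001.front.sepia.ceph.com "sudo cephadm shell -- ceph auth get client.admin" > ceph.client.admin.keyring

# Install them where the ceph CLI looks by default on your workstation
sudo install -m 644 ceph.conf /etc/ceph/ceph.conf
sudo install -m 600 ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
</code>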
  
===== Upgrading the Cluster =====
The LRC is a testbed we use to test a release candidate before announcing it.

For example:
<code>
ceph orch upgrade start quay.ceph.io/ceph-ci/ceph:da36d2c9a106ed5231aa923e6c04a2485c89ef4b

watch "ceph -s; ceph orch upgrade status; ceph versions"
</code>
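If a test upgrade misbehaves, the orchestrator can also pause or abort it. These are standard ''ceph orch upgrade'' subcommands, nothing LRC-specific:

<code>
ceph orch upgrade status   # check progress and the target image
ceph orch upgrade pause    # temporarily stop upgrading daemons
ceph orch upgrade resume   # continue a paused upgrade
ceph orch upgrade stop     # abort the upgrade entirely
</code>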
===== MONs run out of disk space =====
Sadly, the disks we purchased for the reesi hosts are too small, so they occasionally run out of space in ''/var/log/ceph'' before logrotate gets a chance to run (even though it runs 4x a day). The process below will get you back up and running again, but it will wipe out all logs.
  
<code>
ansible -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/*/ceph*.gz'" reesi*
ansible -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-*'" reesi*
</code>
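Before wiping logs it can be worth checking which hosts are actually full; assuming the same ansible inventory the commands above use, something like this works:

<code>
ansible -m shell -a "df -h /var/log" reesi*
du -sh /var/log/ceph/*   # run on an individual host to see which daemon is the offender
</code>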
  
===== One-liners =====
Most of the stuff above is no longer valuable since Ceph has evolved over time.  Here are some one-liners that were useful at the time I posted them.
  
=== Restart mon service ===
<code>
systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
</code>
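With cephadm you can also restart daemons through the orchestrator instead of systemd. The daemon name below is just an example; use whatever ''ceph orch ps'' reports:

<code>
ceph orch ps | grep '^mon'              # list mon daemons and their exact names
ceph orch daemon restart mon.reesi001   # restart one of them by name
</code>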
  
  
=== Watch logs for a mon ===
<code>
podman logs -f $(podman ps | grep "\-mon" | awk '{ print $1 }')
</code>
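Since cephadm daemons also log to journald, journalctl on the mon host is an alternative that survives container restarts. The unit name follows the same ''ceph-<fsid>@<daemon>'' pattern as the restart one-liner above:

<code>
journalctl -u ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service -f
</code>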
  
=== LRC iscsi volume for the RHEV cluster ===

In Nov 2022 we started seeing data corruption on our main gluster volume, where we keep all of our critical VMs, so we connected an iscsi volume from the LRC. These are the steps to connect an iscsi volume to a RHEV cluster, according to this doc:
https://docs.google.com/document/d/1GYwv5y4T5vy-1oeAzw-zoLgQs0I3y5v_xD1wXscAA7M/edit

First, make sure you have configured the iscsi clients (the RHEV hypervisor hosts in our case) according to the doc below, and copy the iscsi initiator name from /etc/iscsi/initiatorname.iscsi on each host (we will need it for step 12, when we create the hosts on the LRC):
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index#configuring-the-iscsi-initiator-for-rhel_block

Also configure CHAP on each RHEV host by adding this to /etc/iscsi/iscsid.conf:

node.session.auth.authmethod = CHAP
node.session.auth.username = <username>
node.session.auth.password = <password>

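For example, on each RHEV hypervisor, collecting the initiator name and applying the CHAP change looks roughly like this (a sketch; restarting ''iscsid'' to pick up the edited config is an assumption):

<code>
cat /etc/iscsi/initiatorname.iscsi   # note the InitiatorName= value for step 12
sudo vi /etc/iscsi/iscsid.conf       # add the three node.session.auth.* lines above
sudo systemctl restart iscsid
</code>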
SSH to one of the reesi hosts (I configured it from reesi005) and follow the next steps to configure iscsi and create a volume on the LRC.

1. Create an rbd pool:
<code>
ceph osd pool create <poolname>
ceph osd pool application enable <poolname> rbd
</code>
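To double-check the pool exists and carries the rbd application tag (just a verification sketch; substitute whatever pool name you used):

<code>
ceph osd pool ls detail | grep <poolname>
ceph osd pool application get <poolname>
</code>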
  
2. Deploy iscsi on at least four hosts - create a yaml file:
<code>
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - reesi002
    - reesi003
    - reesi004
    - reesi005
spec:
  pool: lrc
  api_secure: false
</code>
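The original notes don't spell out applying the spec, so for completeness: feed the yaml to the orchestrator (the file name is just an example) and confirm the daemons come up:

<code>
ceph orch apply -i iscsi.yaml
ceph orch ls iscsi            # confirm the service is scheduled
ceph orch ps | grep iscsi     # wait for the iscsi daemons to start
</code>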
  
3. Connect to the iscsi container on one of the deployed hosts. To find the exact container ID, run "podman ps" and look for the iscsi container whose name ends with "tcmu".
<code>
podman exec -it <iscsi container id> /bin/bash
</code>
  
For example:
<code>
podman exec -it ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a-iscsi-iscsi-reesi005-luegfv-tcmu /bin/bash
</code>
  
4. Enter the gwcli:
<code>
gwcli
</code>
  
5. Go to the iscsi-targets:
<code>
cd iscsi-targets/
</code>
  
6. Go to the storage iqn:
<code>
cd iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1/
</code>

7. Go to gateways:
<code>
cd gateways
</code>
  
8. Create all four gateways as you specified in the yaml file in step 2:
<code>
create reesi002.front.sepia.ceph.com 172.21.2.202
create reesi003.front.sepia.ceph.com 172.21.2.203
create reesi004.front.sepia.ceph.com 172.21.2.204
create reesi005.front.sepia.ceph.com 172.21.2.205
</code>
  
9. Go to disks:
<code>
cd ..
cd disks/
</code>
  
10. Create an RBD image with the name "vol1" in the "lrc" pool:
<code>
create pool=lrc image=vol1 size=20T
</code>
  
11. Go to hosts:
<code>
cd ..
cd hosts/
</code>

12. Create the hosts (the RHEV hosts). If you have four RHEV hosts, you will need to run this four times, once for each iqn:
<code>
create client_iqn=<iqn from the rhev host>
</code>
 +
13. cd to each iqn you created in step 12 and enable CHAP:
<code>
auth username=<username> password=<password>
</code>
 +
14. cd to each iqn you added in step 12 and add the RBD image created in step 10:
<code>
disk add <pool_name>/<RBD image name>
</code>
 +
15. Set discovery auth to CHAP on the iscsi-targets:
<code>
cd ../../
discovery_auth username=<username> password=<password>
</code>
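A quick sanity check from one of the RHEV hosts is a sendtargets discovery against one of the gateway IPs from step 8 (not part of the original procedure; note that with discovery auth enabled, the matching ''discovery.sendtargets.auth.*'' CHAP settings also need to be present in /etc/iscsi/iscsid.conf):

<code>
sudo iscsiadm -m discovery -t sendtargets -p 172.21.2.202
</code>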
 +
The final step is to mount this RBD image/LUN in the RHEV-M dashboard.

Go to https://mgr01.front.sepia.ceph.com/ovirt-engine/webadmin/?locale=en_US#storage

Create a new storage domain, choose the iscsi storage type, fill out the discovery targets section with the IP of one of the iscsi gateways you configured in the yaml in step 2, and fill out the auth section with the CHAP username & password you configured in step 15.