====== LONG_RUNNING_CLUSTER ======
===== Summary =====
A small subset of [[hardware:mira]] systems and all of the [[hardware:reesi]] and [[hardware:ivan]] systems are used in a permanent Ceph cluster.
  
It is managed using ''cephadm''.
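
For ad-hoc administration you can also run commands through a cephadm shell on any cluster host that carries the admin keyring (a quick sketch; the host is just an example):
<code>
ssh reesi001.front.sepia.ceph.com
sudo cephadm shell -- ceph -s
</code>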
  
Here's a rundown of what this cluster stores:
  * teuthology run logs
  * quay.ceph.io containers
  * chacra.ceph.com packages
  * drop.ceph.com
  * Files sent via [[https://docs.ceph.com/en/latest/man/8/ceph-post-file/|ceph-post-file]]
  * [[production:signer.front.sepia.ceph.com]]
  
  
Cluster dashboard:
  
https://reesi004.front.sepia.ceph.com:8443/

===== Topology =====
<code>
  services:
    mon: 5 daemons, quorum reesi003,reesi002,reesi001,ivan02,ivan01 (age 5h)
    mgr: reesi005.xxyjcw(active, since 2w), standbys: reesi006.erytot, reesi004.tplfrt
    mds: 3/3 daemons up, 5 standby
</code>

=== Retired hosts ===
mira{019,021,049,070,087,099,116,120} had all of their daemons removed and OSDs evacuated, and were reclaimed as testnodes in February 2020.
The apama hosts were retired entirely as well.
  
===== ceph.conf =====
This file (along with the admin keyring) can be saved on your workstation so you can use it as an admin node.
  
<code>
# minimal ceph.conf for 28f7427e-5558-4ffd-ae1a-51ec3042759a
[global]
        fsid = 28f7427e-5558-4ffd-ae1a-51ec3042759a
        mon_host = [v2:172.21.2.201:3300/0,v1:172.21.2.201:6789/0] [v2:172.21.2.202:3300/0,v1:172.21.2.202:6789/0] [v2:172.21.2.203:3300/0,v1:172.21.2.203:6789/0] [v2:172.21.2.204:3300/0,v1:172.21.2.204:6789/0] [v2:172.21.2.205:3300/0,v1:172.21.2.205:6789/0]
  
</code>
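
If you need to regenerate this file and the keyring, both can be pulled from the cluster itself on an existing admin node (a sketch):
<code>
ceph config generate-minimal-conf > ceph.conf
ceph auth get client.admin > ceph.client.admin.keyring
</code>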
  
===== Upgrading the Cluster =====
The LRC is a testbed we use to test a release candidate before announcing it.
  
For example:
<code>
ceph orch upgrade start quay.ceph.io/ceph-ci/ceph:da36d2c9a106ed5231aa923e6c04a2485c89ef4b
  
watch "ceph -s; ceph orch upgrade status; ceph versions"
</code>
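
If an upgrade misbehaves, it can be paused, resumed, or cancelled through the orchestrator as well:
<code>
ceph orch upgrade status
ceph orch upgrade pause
ceph orch upgrade resume
ceph orch upgrade stop
</code>
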
===== MONs run out of disk space =====
I sadly got disks that are too small for the reesi when we purchased them, so they occasionally run out of space in ''/var/log/ceph'' before logrotate gets a chance to run (even though it runs 4x a day).  The process below will get you back up and running again but will wipe out all logs.
  
<code>
ansible -m shell -a "sudo /bin/sh -c 'rm -vf /var/log/ceph/*/ceph*.gz'" reesi*
ansible -m shell -a "sudo /bin/sh -c 'logrotate -f /etc/logrotate.d/ceph-*'" reesi*
</code>
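
A quick ad-hoc check of how full the log partitions are (a sketch, assuming the same ansible inventory as the one-liners above):
<code>
ansible -m shell -a "df -h /var/log" reesi*
</code>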
  
===== One-liners =====
Most of the stuff above is no longer valuable since Ceph has evolved over time.  Here are some one-liners that were useful at the time I posted them.
  
=== Restart mon service ===
<code>
systemctl restart ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
</code>
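
Since the cluster is managed by cephadm, the same thing can usually be done through the orchestrator from an admin node (a sketch; the daemon name is an example, check ''ceph orch ps'' for the real ones):
<code>
ceph orch ps | grep mon
ceph orch daemon restart mon.reesi001
</code>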
  
  
=== Watch logs for a mon ===
<code>
podman logs -f $(podman ps | grep "\-mon" | awk '{ print $1 }')
</code>
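
The same mon logs are also available through journald; the unit name is the same one used by the restart one-liner above:
<code>
journalctl -fu ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a@mon.$(hostname -s).service
</code>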

=== LRC iscsi volume for the RHEV cluster ===

In Nov 2022 we started seeing data corruption on our main gluster volume, where all of our critical VMs live, so we connected an iscsi volume from the LRC. These are the steps to connect an iscsi volume to a RHEV cluster, according to this doc:
https://docs.google.com/document/d/1GYwv5y4T5vy-1oeAzw-zoLgQs0I3y5v_xD1wXscAA7M/edit

First, make sure the iscsi clients (the RHEV hypervisor hosts in our case) are configured according to this doc, and copy the iscsi initiator name from /etc/iscsi/initiatorname.iscsi on each host (we will need it for step 12 when we create the hosts on the LRC):
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index#configuring-the-iscsi-initiator-for-rhel_block

Also configure CHAP on each RHEV host by adding this in /etc/iscsi/iscsid.conf:

node.session.auth.authmethod = CHAP
node.session.auth.username = <username>
node.session.auth.password = <password>

ssh to one of the reesi hosts (I configured it from reesi005) and follow the next steps to configure iscsi and create a volume on the LRC:
  
1. Create an rbd pool
<code>
ceph osd pool create <poolname>
ceph osd pool application enable <poolname> rbd
</code>
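
To double-check the pool exists and is tagged for rbd before moving on (a sketch):
<code>
ceph osd pool ls detail | grep <poolname>
</code>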
  
2. Deploy iscsi on at least four hosts - create a yaml file
<code>
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - reesi002
    - reesi003
    - reesi004
    - reesi005
spec:
  pool: lrc
  api_secure: false
</code>
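
The spec then needs to be applied with the orchestrator (assuming it was saved as iscsi.yaml on an admin node; the filename is just an example):
<code>
ceph orch apply -i iscsi.yaml
</code>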
  
3. Connect to the iscsi container on one of the deployed hosts. To find the exact container id, run "podman ps" and look for the iscsi container with the word "tcmu" at the end.
<code>
podman exec -it <iscsi container id> /bin/bash
</code>
  
For example:
<code>
podman exec -it ceph-28f7427e-5558-4ffd-ae1a-51ec3042759a-iscsi-iscsi-reesi005-luegfv-tcmu /bin/bash
</code>
  
4. Enter the gwcli
<code>
gwcli
</code>

5. Go to the iscsi-targets
<code>
cd iscsi-targets/
</code>

6. Go to the storage iqn
<code>
cd iqn.2003-01.com.redhat.iscsi-gw:lrc-iscsi1/
</code>

7. Go to gateways
<code>
cd gateways
</code>

8. Create all four gateways as you specified in the yaml file in step 2
<code>
create reesi002.front.sepia.ceph.com 172.21.2.202
create reesi003.front.sepia.ceph.com 172.21.2.203
create reesi004.front.sepia.ceph.com 172.21.2.204
create reesi005.front.sepia.ceph.com 172.21.2.205
</code>

9. Go to disks
<code>
cd ..
cd disks/
</code>

10. Create an RBD image with the name "vol1" in the "lrc" pool
<code>
create pool=lrc image=vol1 size=20T
</code>
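
Optionally, confirm the backing RBD image from any admin node (a sketch):
<code>
rbd ls lrc
rbd info lrc/vol1
</code>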

11. Go to hosts
<code>
cd ..
cd hosts/
</code>

12. Create the hosts (the RHEV hosts; if you have four RHEV hosts you will need to run this four times, once for each iqn)
<code>
create client_iqn=<iqn from the rhev host>
</code>

13. cd to each iqn you created in step 12 and enable CHAP
<code>
auth username=<username> password=<password>
</code>

14. cd to each iqn you added in step 12 and add the RBD image created in step 10
<code>
disk add <pool_name>/<RBD image name>
</code>

15. Set discovery auth to CHAP on the iscsi-targets
<code>
cd ../../
discovery_auth username=<username> password=<password>
</code>

The final step is to mount this RBD image/LUN in the RHEV-M Dashboard.

Go to https://mgr01.front.sepia.ceph.com/ovirt-engine/webadmin/?locale=en_US#storage
Create a new Storage Domain, choose the iscsi storage type, fill out the discovery targets section with the IP of one of the iscsi gateways you configured in the yaml in step 2, and fill out the auth with the CHAP username & password you configured in step 15.
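
To sanity-check the target from the client side, discovery can also be run manually from one of the RHEV hosts (a sketch; the portal IP is one of the gateways created in step 8, and discovery CHAP needs to be configured on the client):
<code>
iscsiadm -m discovery -t sendtargets -p 172.21.2.202:3260
</code>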