====== mira ======
===== Summary =====
We have around 100 mira hosts.

Some of them were previously used as [[services:vpshosts]] hypervisors and as OSD nodes in the [[services:longrunningcluster]]; however, they are nearly 10 years old and severely behind spec-wise.  We're slowly phasing them out to make room for new hardware.  The remaining systems are used as baremetal testnodes.  A small subset are still in the [[services:longrunningcluster]].
===== Hardware Specs =====
|            ^ Count    ^ Manufacturer  ^ Model                              ^ Capacity  ^ Notes                                                        ^
^ RAM        | 4 DIMMs  | Kingston      | 9965434-017.A00LF                  | 4GB       | 16GB total.  PC3-8500R DDR3-1066 REGISTERED ECC CL7 240 PIN  |
^ HDD        | 8x       | WD/HGST       |                                    | 1TB       | For VPSHOSTS and testnodes                                   |
^ HDD        | Asst     | WD/HGST       |                                    | 1TB/4TB   | LRC hosts have a mixture of 1TB and 4TB disks                |
^ NIC        | 2 ports  | Intel         | 82574L Gigabit Network Connection  | 1Gb       |                                                              |
^ RAID       | 1        | Areca         | Mix of ARC-{1222,1880}             | 8 disks   | JBOD Mode                                                    |
  
  
===== E-Waste =====
As these machines age, they hit machine check exceptions (MCEs) and lock up at increasingly higher rates.  To make room for new LRC hosts, we've begun e-wasting miras.

^ Hostname        ^ Date E-Wasted  ^ Ticket Number(s)  ^
| mira005         |                | PNT0146880        |
| mira009         |                | PNT0146880        |
| mira091         |                | PNT0146880        |
| mira095         |                | PNT0146880        |
| mira113         |                | PNT0146880        |
| mira{030..039}  |                | PNT0766680        |
  
===== Areca RAID Controllers =====
==== Flashing Firmware ====
**UPDATE:** This can now be done simply by running ''%%ansible-playbook firmware.yml --limit="miraXXX*" --tags="areca"%%''.

The latest firmware for ARC-1222 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip|here]].

The latest firmware for ARC-1880 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1880_1213_1223.zip|here]].
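For example, fetching and unpacking the ARC-1222 image (a minimal sketch; assumes ''wget'' and ''unzip'' are available and any working directory is fine):
<code>
wget http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip
unzip ARC1212_1222.zip   # should yield the ARC1212*.BIN files used in the steps below
</code>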
  
My process for flashing ARC-1222 firmware manually is below.  This assumes you've downloaded and extracted the firmware zip.  The same process can be used for other Areca controllers; just use the proper firmware BIN files.
<code>
# Copy the extracted firmware BINs to the target host
scp /home/dgalloway/BIOS/areca/ARC1212_1222/ARC1212* ubuntu@$host.front.sepia.ceph.com:/home/ubuntu/
ssh $host
sudo -s
# Flash each firmware BIN, then clean up
for file in /home/ubuntu/ARC1212*.BIN; do cli64 sys updatefw path=$file; done
rm /home/ubuntu/ARC1212*.BIN
</code>
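
After flashing (and typically a reboot so the controller loads the new image), you can spot-check the running version; a minimal check, assuming the standard Areca CLI ''sys info'' command (the exact field name may differ by firmware):
<code>
cli64 sys info | grep -i firm   # look for the Firmware Version field
</code>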
  
==== Other Common Tasks ====
**Erasing a RAID and setting controller to JBOD mode**
<code>
cli64 set password=0000   # authenticate to the controller
cli64 vsf delete vol=1    # delete volume set #1
cli64 rsf delete raid=1   # delete RAID set #1
cli64 sys mode p=1        # switch the controller to JBOD mode
</code>
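
To sanity-check the result, the Areca CLI's ''disk info'' and ''rsf info'' commands should show every drive as JBOD and no remaining RAID sets (a quick check; column names vary a bit between firmware versions):
<code>
cli64 disk info   # each drive's Usage column should now read JBOD
cli64 rsf info    # should list no remaining RAID sets
</code>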
  
**Stop Beeper**

''Parameter: <p=<0(mute)|1(disabled)|2(enabled)>>''
<code>
cli64 set password=0000
cli64 sys beeper p=0
</code>
  
===== Replacing failed/failing drives =====
This process is a bit annoying.  Depending on the order in which the HDD backplane is connected to the RAID controller, the drive bays on these machines will be numbered either:

<code>
1 2 3 4
5 6 7 8

OR

5 6 7 8
1 2 3 4
</code>
 + 
To add to the annoyingness, it's not possible to manually light up the red "failed" LED on the drive sleds.  So when working with the labs team, it's easiest to have the admin in front of the machine while you either light up the failing drive or light up drive 1 and have them count over to the right bay.

To light up a drive, I typically just run ''dd if=/dev/sda of=/dev/null'' (that example lights up drive 1).

If a drive just has failing sectors but is still readable, it's easiest to light up that drive itself (smart.sh will tell you which drive letter to point ''dd'' at).  If the drive has completely failed, light up drive 1 (usually /dev/sda) and have the admin count up to it.
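
As a concrete sketch (''/dev/sdX'' below is a placeholder for whichever device smart.sh flagged; ''smartctl'' is shown only as a stand-in for smart.sh's output):
<code>
# Confirm the suspect drive is still readable before lighting it up
sudo smartctl -H /dev/sdX

# Read from the drive to blink its activity LED; Ctrl-C once the admin finds the bay
sudo dd if=/dev/sdX of=/dev/null
</code>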