====== mira ======
===== Summary =====
We have around 100 mira hosts.

Some of them previously served as [[services:vpshosts]] hypervisors and [[services:longrunningcluster]] OSD nodes; however, they are nearly 10 years old and severely behind spec-wise.  We're slowly phasing them out to make room for new hardware.  The systems that remain are used as testnodes.  A small subset are still in the [[services:longrunningcluster]].
===== Hardware Specs =====
|            ^ Count    ^ Manufacturer  ^ Model                              ^ Capacity  ^ Notes                                                         ^
^ Chassis    | N/A      | Supermicro    | 2U Unmodel                         | N/A       |                                                               |
^ Mainboard  | N/A      | Supermicro    | X8SIL                              | N/A       |                                                               |
^ CPU        | 1        | Intel         | Xeon(R) CPU X3440 @ 2.53GHz        | N/A       | [[http://ark.intel.com/products/64614/Intel-Xeon-Processor-E5-2407-10M-Cache-2_20-GHz-6_40-GTs-Intel-QPI|ARK]]  |
^ RAM        | 4 DIMMs  | Kingston      | 9965434-017.A00LF                  | 4GB       | 16GB total.  PC3-8500R DDR3-1066 REGISTERED ECC CL7 240 PIN   |
^ HDD        | 8x       | WD/HGST       |                                    | 1TB       | For VPSHOSTS and testnodes                                    |
^ HDD        | Asst     | WD/HGST       |                                    | 1TB/4TB   | LRC hosts have a mixture of 1TB and 4TB disks                 |
^ NIC        | 2 ports  | Intel         | 82574L Gigabit Network Connection  | 1Gb       |                                                               |
^ RAID       | 1        | Areca         | Mix of ARC-{1222,1880}             | 8 disks   | JBOD Mode                                                     |
^ BMC        | 1        | Supermicro    | N/A                                | N/A       | Reachable at $host.ipmi.sepia.ceph.com                        |
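Since every node's BMC follows the $host.ipmi.sepia.ceph.com naming convention, its IPMI interface can be reached by name with standard tooling.  A minimal sketch (the node name, user, and password below are placeholders, not real credentials):

<code>
host=mira021                         # hypothetical node name
bmc="${host}.ipmi.sepia.ceph.com"    # BMC hostname per the convention above
echo "$bmc"
# With valid IPMI credentials, power state can be queried over the network:
#   ipmitool -I lanplus -H "$bmc" -U USER -P PASS chassis power status
</code>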
  
  
===== E-Waste =====
As these machines age, they throw MCEs (machine check errors) and lock up at increasing rates.  To make room for new LRC hosts, we've begun e-wasting miras.
  
^ Hostname        ^ Date E-Wasted  ^ Ticket Number(s)  ^
| mira005         |                | PNT0146880        |
| mira009         |                | PNT0146880        |
| mira091         |                | PNT0146880        |
| mira095         |                | PNT0146880        |
| mira113         |                | PNT0146880        |
| mira{030..039}  |                | PNT0766680        |
  
===== Areca RAID Controllers =====
==== Flashing Firmware ====
**UPDATE** This can now be done simply by running ''%%ansible-playbook firmware.yml --limit="miraXXX*" --tags="areca"%%''
  
The latest firmware for ARC-1222 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip|here]].

The latest firmware for ARC-1880 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1880_1213_1223.zip|here]].
  
-grubset root=(hd0 <tab> +My process for flashing ARC-1222 firmware manually is below. ​ This assumes you've downloaded and extracted the firmware zip.  The same process can be used for other Areca controllers. ​ Just use the proper firmware BIN files. 
-Possible partitions are:+<code> 
 +scp /​home/​dgalloway/​BIOS/​areca/​ARC1212_1222/​ARC1212* ubuntu@$host.front.sepia.ceph.com:/​home/​ubuntu/​ 
 +ssh $host 
 +sudo -i 
 +for file in $(ls /​home/​ubuntu/​ARC1212*.BIN);​ do cli64 sys updatefw path=$file; done 
 +for file in $(ls /​home/​ubuntu/​ARC1212*.BIN);​ do rm $file; done 
 +</code>
  
==== Other Common Tasks ====
**Erasing a RAID and setting controller to JBOD mode**
<code>
cli64 set password=0000
cli64 vsf delete vol=1
cli64 rsf delete raid=1
cli64 sys mode p=1
</code>
  
**Stop Beeper**

''Parameter: <p=<0(mute)|1(disabled)|2(enabled)>>''
<code>
cli64 set password=0000
cli64 sys beeper p=0
</code>
  
===== Replacing failed/failing drives =====
This process is a bit annoying.  Depending on which order the HDD backplane is connected to the RAID controller, the order of drive bays on these machines will be:

<code>
1 2 3 4
5 6 7 8

OR

5 6 7 8
1 2 3 4
</code>
  
To add to the annoyance, it's not possible to manually light up the red "failed" LED on the drive sleds.  So when working with the labs team, it's easiest to have the admin stand in front of the machine while you either light up the failed drive's activity LED, or light up drive 1 and have them count over to the right bay.

To light up a drive, I typically just run ''dd if=/dev/sda of=/dev/null'' (that example lights up drive 1).

If a drive just has failing sectors but is still readable, it's easiest to light up that drive (smart.sh will tell you which drive letter to use with ''dd'').  If the drive has completely failed, light up drive 1 (usually /dev/sda) and have the admin count up to it.
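The identification step above can be sketched as follows (the device name is an example; substitute the drive letter that smart.sh reported):

<code>
dev=/dev/sdb   # example: the failing-but-readable drive reported by smart.sh
echo "blinking $dev"
# Sequential reads keep the drive's activity LED lit so the on-site admin
# can spot the bay; reading is harmless to a drive that is still readable:
#   sudo dd if="$dev" of=/dev/null bs=1M
# If the drive is completely dead, blink drive 1 instead and have the admin
# count over to the failed bay:
#   sudo dd if=/dev/sda of=/dev/null bs=1M
</code>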
hardware/mira.1467216584.txt.gz · Last modified: 2016/06/29 16:09 by dgalloway