hardware:mira [2020/03/11 20:04] (current) djgalloway

====== mira ======
===== Summary =====
We have around 100 mira hosts.

Some of them used to serve as [[services:vpshosts]] hypervisors and [[services:longrunningcluster]] OSD nodes; however, they are nearly 10 years old and severely behind spec-wise, so we're slowly phasing them out to make room for new hardware. The systems that remain are used as baremetal testnodes. A small subset are still in the [[services:longrunningcluster]].
===== Hardware Specs =====
^ ^ Count ^ Manufacturer ^ Model ^ Capacity ^ Notes ^
^ RAM | 4 DIMMs | Kingston | 9965434-017.A00LF | 4GB | 16GB total. PC3-8500R DDR3-1066 REGISTERED ECC CL7 240 PIN |
^ HDD | 8x | WD/HGST | | 1TB | For VPSHOSTS and testnodes |
^ HDD | Asst | WD/HGST | | 1TB/4TB | LRC hosts have a mixture of 1TB and 4TB disks |
^ NIC | 2 ports | Intel | 82574L Gigabit Network Connection | 1Gb | |
^ RAID | 1 | Areca | Mix of ARC-{1222,1880} | 8 disks | JBOD Mode |
^ BMC | 1 | Supermicro | N/A | N/A | Reachable at $host.ipmi.sepia.ceph.com |
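
Since each host's BMC answers at ''$host.ipmi.sepia.ceph.com'' (per the table above), you can drive it with ''ipmitool''. A hedged sketch only: the ''ipmi_cmd'' helper name and the ''ADMIN''/''PASSWORD'' credentials are placeholders, not the lab's real settings.

```shell
#!/bin/sh
# Build the ipmitool invocation for a given mira host's Supermicro BMC.
# The hostname pattern comes from the hardware table; credentials are placeholders.
ipmi_cmd() {
    printf 'ipmitool -I lanplus -H %s.ipmi.sepia.ceph.com -U ADMIN -P PASSWORD chassis power status' "$1"
}

# Print the command to run (fill in real credentials before executing it)
ipmi_cmd mira001
```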
+ | |||
+ | ===== E-Waste ===== | ||
+ | As these machines age, they continue to MCE and lock up at higher rates. To make room for new LRC hosts, we've begun e-wasting miras. | ||
+ | |||
+ | ^ Hostname ^ Date E-Wasted ^ Ticket Number(s) ^ | ||
+ | | mira005 | | PNT0146880 | | ||
+ | | mira009 | | PNT0146880 | | ||
+ | | mira091 | | PNT0146880 | | ||
+ | | mira095 | | PNT0146880 | | ||
+ | | mira113 | | PNT0146880 | | ||
+ | | mira{030..039} | | PNT0766680 | | ||
===== Areca RAID Controllers =====
==== Flashing Firmware ====
**UPDATE:** This can now be done simply by running ''%%ansible-playbook firmware.yml --limit="miraXXX*" --tags="areca"%%''

The latest firmware for ARC-1222 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip|here]].
<code>
cli64 sys beeper p=0
</code>

===== Replacing failed/failing drives =====
This process is a bit annoying. Depending on which order the HDD backplane is connected to the RAID controller, the drive bays on these machines are numbered either:

<code>
1 2 3 4
5 6 7 8

OR

5 6 7 8
1 2 3 4
</code>
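
Since the two orderings only swap the rows, a small helper can tell you where a given controller slot physically sits under each one. This is an illustrative sketch (the ''slot_position'' name and the slot-to-row math are assumptions based on the layouts above); it assumes /dev/sda..sdh map to slots 1..8 in order.

```shell
#!/bin/sh
# Given a controller slot number (1-8), print its physical column and
# which row it sits in under each of the two possible backplane orderings.
slot_position() {
    slot=$1
    col=$(( (slot - 1) % 4 + 1 ))      # both orderings share the same columns
    if [ "$slot" -le 4 ]; then
        rowA=top; rowB=bottom          # ordering A: slots 1-4 on the top row
    else
        rowA=bottom; rowB=top          # ordering B: the rows are swapped
    fi
    echo "slot $slot: column $col, $rowA row (ordering A) / $rowB row (ordering B)"
}

slot_position 6
```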

To add to the annoyance, it's not possible to manually light up the red "failed" LED on the drive sleds. So when working with the labs team, it's easiest to have the admin stand in front of the machine while you either light up the failed drive directly, or light up drive 1 and have them count over to the right bay.

To light up a drive, I typically just do ''dd if=/dev/sda of=/dev/null'' if I want to light up drive 1.

If a drive just has failing sectors but is still readable, it's easiest to light up that drive (smart.sh will tell you which drive letter to use ''dd'' on). If the drive has completely failed, light up drive 1 (usually /dev/sda) and have the admin count up to it.
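
The ''dd'' trick above can be wrapped with a time bound so the LED blinks for a fixed window instead of reading the whole disk. A minimal sketch, assuming coreutils ''timeout'' is available (''blink_drive'' is a made-up helper name, not an existing lab script):

```shell
#!/bin/sh
# Read the given device for N seconds (default 60) so its activity LED
# blinks steadily, then stop. Reads only; nothing is written to the disk.
blink_drive() {
    dev=$1
    secs=${2:-60}
    timeout "$secs" dd if="$dev" of=/dev/null bs=1M 2>/dev/null
}

# e.g. blink_drive /dev/sda 60    # light up drive 1 for a minute
```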