====== mira ======
===== Summary =====
We have around 100 mira hosts.

Some of them previously served as [[services:vpshosts]] hypervisors and [[services:longrunningcluster]] OSD nodes; however, they are nearly 10 years old and severely behind spec-wise.  We're slowly phasing them out to make room for new hardware.  The systems that remain are used as testnodes.  A small subset are still in the [[services:longrunningcluster]].
===== Hardware Specs =====
|            ^ Count    ^ Manufacturer  ^ Model                              ^ Capacity  ^ Notes                                                         ^
^ Chassis    | N/A      | Supermicro    | 2U Unmodel                         | N/A       |                                                               |
^ Mainboard  | N/A      | Supermicro    | X8SIL                              | N/A       |                                                               |
^ CPU        | 1        | Intel         | Xeon(R) CPU X3440 @ 2.53GHz        | N/A       | [[http://ark.intel.com/products/64614/Intel-Xeon-Processor-E5-2407-10M-Cache-2_20-GHz-6_40-GTs-Intel-QPI|ARK]]  |
^ RAM        | 4 DIMMs  | Kingston      | 9965434-017.A00LF                  | 4GB       | 16GB total.  PC3-8500R DDR3-1066 REGISTERED ECC CL7 240 PIN   |
^ HDD        | 8x       | WD/HGST       |                                    | 1TB       | For VPSHOSTS and testnodes                                    |
^ HDD        | Asst     | WD/HGST       |                                    | 1TB/4TB   | LRC hosts have a mixture of 1TB and 4TB disks                 |
^ NIC        | 2 ports  | Intel         | 82574L Gigabit Network Connection  | 1Gb       |                                                               |
^ RAID       | 1        | Areca         | Mix of ARC-{1222,1880}             | 8 disks   | JBOD Mode                                                     |
^ BMC        | 1        | Supermicro    | N/A                                | N/A       | Reachable at $host.ipmi.sepia.ceph.com                        |
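Since every node's BMC follows the $host.ipmi.sepia.ceph.com naming convention, its IPMI interface can be reached by name with standard tooling.  A minimal sketch (the node name, user, and password below are placeholders, not real credentials):

<code>
host=mira021                         # hypothetical node name
bmc="${host}.ipmi.sepia.ceph.com"    # BMC hostname per the convention above
echo "$bmc"
# With valid IPMI credentials, power state can be queried over the network:
#   ipmitool -I lanplus -H "$bmc" -U USER -P PASS chassis power status
</code>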
  
  
===== E-Waste =====
As these machines age, they throw MCEs (machine check errors) and lock up at increasing rates.  To make room for new LRC hosts, we've begun e-wasting miras.
  
^ Hostname        ^ Date E-Wasted  ^ Ticket Number(s)  ^
| mira005         |                | PNT0146880        |
| mira009         |                | PNT0146880        |
| mira091         |                | PNT0146880        |
| mira095         |                | PNT0146880        |
| mira113         |                | PNT0146880        |
| mira{030..039}  |                | PNT0766680        |
  
===== Areca RAID Controllers =====
==== Flashing Firmware ====
**UPDATE** This can now be done simply by running ''%%ansible-playbook firmware.yml --limit="miraXXX*" --tags="areca"%%''
  
The latest firmware for ARC-1222 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip|here]].

The latest firmware for ARC-1880 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1880_1213_1223.zip|here]].
  
-grubset root=(hd0 <tab> +My process for flashing ARC-1222 firmware manually is below. ​ This assumes you've downloaded and extracted the firmware zip.  The same process can be used for other Areca controllers. ​ Just use the proper firmware BIN files. 
-Possible partitions are:+<code> 
 +scp /​home/​dgalloway/​BIOS/​areca/​ARC1212_1222/​ARC1212* ubuntu@$host.front.sepia.ceph.com:/​home/​ubuntu/​ 
 +ssh $host 
 +sudo -i 
 +for file in $(ls /​home/​ubuntu/​ARC1212*.BIN);​ do cli64 sys updatefw path=$file; done 
 +for file in $(ls /​home/​ubuntu/​ARC1212*.BIN);​ do rm $file; done 
 +</code>
  
==== Other Common Tasks ====
**Erasing a RAID and setting controller to JBOD mode**
<code>
cli64 set password=0000
cli64 vsf delete vol=1
cli64 rsf delete raid=1
cli64 sys mode p=1
</code>
  
**Stop Beeper**

''Parameter: <p=<0(mute)|1(disabled)|2(enabled)>>''
<code>
cli64 set password=0000
cli64 sys beeper p=0
</code>
  
===== Replacing failed/failing drives =====
This process is a bit annoying.  Depending on which order the HDD backplane is connected to the RAID controller, the order of drive bays on these machines will be:

<code>
1 2 3 4
5 6 7 8

OR

5 6 7 8
1 2 3 4
</code>
  
To add to the annoyance, it's not possible to manually light up the red "failed" LED on the drive sleds.  So when working with the labs team, it's easiest to have the admin stand in front of the machine while you either light up the failed drive's activity LED, or light up drive 1 and have them count over to the right bay.

To light up a drive, I typically just run ''dd if=/dev/sda of=/dev/null'' (that example lights up drive 1).

If a drive just has failing sectors but is still readable, it's easiest to light up that drive (smart.sh will tell you which drive letter to use with ''dd'').  If the drive has completely failed, light up drive 1 (usually /dev/sda) and have the admin count up to it.
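The identification step above can be sketched as follows (the device name is an example; substitute the drive letter that smart.sh reported):

<code>
dev=/dev/sdb   # example: the failing-but-readable drive reported by smart.sh
echo "blinking $dev"
# Sequential reads keep the drive's activity LED lit so the on-site admin
# can spot the bay; reading is harmless to a drive that is still readable:
#   sudo dd if="$dev" of=/dev/null bs=1M
# If the drive is completely dead, blink drive 1 instead and have the admin
# count over to the failed bay:
#   sudo dd if=/dev/sda of=/dev/null bs=1M
</code>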
hardware/mira.1467216584.txt.gz · Last modified: 2016/06/29 16:09 by dgalloway