====== mira ======
===== Summary =====
We have around 100 mira hosts.

Some of them were previously used as [[services:vpshosts]] hypervisors and as OSD nodes in the [[services:longrunningcluster]]; however, they are nearly 10 years old and severely behind spec-wise.  We're slowly phasing them out to make room for new hardware.  The remaining systems are used as baremetal testnodes.  A small subset are still in the [[services:longrunningcluster]].
===== Hardware Specs =====
|            ^ Count    ^ Manufacturer  ^ Model                              ^ Capacity  ^ Notes                                                        ^
^ RAM        | 4 DIMMs  | Kingston      | 9965434-017.A00LF                  | 4GB       | 16GB total.  PC3-8500R DDR3-1066 REGISTERED ECC CL7 240 PIN  |
^ HDD        | 8x       | WD/HGST       |                                    | 1TB       | For VPSHOSTS and testnodes                                   |
^ HDD        | Asst     | WD/HGST       |                                    | 1TB/4TB   | LRC hosts have a mixture of 1TB and 4TB disks                |
^ NIC        | 2 ports  | Intel         | 82574L Gigabit Network Connection  | 1Gb       |                                                              |
^ RAID       | 1        | Areca         | Mix of ARC-{1222,1880}             | 8 disks   | JBOD Mode                                                    |
  
  
===== E-Waste =====
As these machines age, they hit machine check exceptions (MCEs) and lock up at increasingly higher rates.  To make room for new LRC hosts, we've begun e-wasting miras.

^ Hostname        ^ Date E-Wasted  ^ Ticket Number(s)  ^
| mira005         |                | PNT0146880        |
| mira009         |                | PNT0146880        |
| mira091         |                | PNT0146880        |
| mira095         |                | PNT0146880        |
| mira113         |                | PNT0146880        |
| mira{030..039}  |                | PNT0766680        |
  
===== Areca RAID Controllers =====
==== Flashing Firmware ====
**UPDATE:** This can now be done simply by running ''%%ansible-playbook firmware.yml --limit="miraXXX*" --tags="areca"%%''.

The latest firmware for ARC-1222 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip|here]].

The latest firmware for ARC-1880 controllers can be obtained from [[http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1880_1213_1223.zip|here]].
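For example, fetching and unpacking the ARC-1222 image (a minimal sketch; assumes ''wget'' and ''unzip'' are available and any working directory is fine):
<code>
wget http://www.areca.us/support/download/RaidCards/BIOS_Firmware/ARC1212_1222.zip
unzip ARC1212_1222.zip   # should yield the ARC1212*.BIN files used in the steps below
</code>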
  
My process for flashing ARC-1222 firmware manually is below.  This assumes you've downloaded and extracted the firmware zip.  The same process can be used for other Areca controllers; just use the proper firmware BIN files.
<code>
# Copy the extracted firmware BINs to the target host
scp /home/dgalloway/BIOS/areca/ARC1212_1222/ARC1212* ubuntu@$host.front.sepia.ceph.com:/home/ubuntu/
ssh $host
sudo -s
# Flash each firmware BIN, then clean up
for file in /home/ubuntu/ARC1212*.BIN; do cli64 sys updatefw path=$file; done
rm /home/ubuntu/ARC1212*.BIN
</code>
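
After flashing (and typically a reboot so the controller loads the new image), you can spot-check the running version; a minimal check, assuming the standard Areca CLI ''sys info'' command (the exact field name may differ by firmware):
<code>
cli64 sys info | grep -i firm   # look for the Firmware Version field
</code>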
  
==== Other Common Tasks ====
**Erasing a RAID and setting controller to JBOD mode**
<code>
cli64 set password=0000   # authenticate to the controller
cli64 vsf delete vol=1    # delete volume set #1
cli64 rsf delete raid=1   # delete RAID set #1
cli64 sys mode p=1        # switch the controller to JBOD mode
</code>
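
To sanity-check the result, the Areca CLI's ''disk info'' and ''rsf info'' commands should show every drive as JBOD and no remaining RAID sets (a quick check; column names vary a bit between firmware versions):
<code>
cli64 disk info   # each drive's Usage column should now read JBOD
cli64 rsf info    # should list no remaining RAID sets
</code>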
  
**Stop Beeper**

''Parameter: <p=<0(mute)|1(disabled)|2(enabled)>>''
<code>
cli64 set password=0000
cli64 sys beeper p=0
</code>
  
===== Replacing failed/failing drives =====
This process is a bit annoying.  Depending on the order in which the HDD backplane is connected to the RAID controller, the drive bays on these machines will be numbered either:

<code>
1 2 3 4
5 6 7 8

OR

5 6 7 8
1 2 3 4
</code>
 + 
To add to the annoyingness, it's not possible to manually light up the red "failed" LED on the drive sleds.  So when working with the labs team, it's easiest to have the admin in front of the machine while you either light up the failing drive or light up drive 1 and have them count over to the right bay.

To light up a drive, I typically just run ''dd if=/dev/sda of=/dev/null'' (that example lights up drive 1).

If a drive just has failing sectors but is still readable, it's easiest to light up that drive itself (smart.sh will tell you which drive letter to point ''dd'' at).  If the drive has completely failed, light up drive 1 (usually /dev/sda) and have the admin count up to it.
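
As a concrete sketch (''/dev/sdX'' below is a placeholder for whichever device smart.sh flagged; ''smartctl'' is shown only as a stand-in for smart.sh's output):
<code>
# Confirm the suspect drive is still readable before lighting it up
sudo smartctl -H /dev/sdX

# Read from the drive to blink its activity LED; Ctrl-C once the admin finds the bay
sudo dd if=/dev/sdX of=/dev/null
</code>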