Returning failed disks on Exadata dbnodes and cell nodes to a usable state
By design, Exadata does not allow disks in states such as failed or predictive failure to be made usable again with reenable. This restriction keeps a degraded disk from hurting system performance. Occasionally, however, a disk is marked failed even though nothing is wrong with it. If checks show that the disk is healthy, the disk that was flagged as faulty can be returned to normal status with the method below.
To allow reenable on a failed disk, the internal parameter “_cell_allow_reenable_predfail” must be set to “true” in the cellinit.ora file, and the services must then be restarted.
The steps on the dbnode and the cell node are as follows.
Steps for the dbnode:
First, check the system disks on the dbnodes to confirm that a disk went into the failed state without an actual fault. If, in the output of the commands below, Predictive Failure Count and Last Predictive Failure Event Seq Number are 0 (zero) and Firmware state is “Online, Spun Up”, the disks can be considered healthy.
[root@exadb01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A2A8

[root@exadb01 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
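The health criteria described above (both predictive failure values at 0 and a firmware state of “Online, Spun Up”) can also be checked mechanically. The following is only a minimal sketch: the function name check_pdlist is made up for illustration, and it assumes the output keeps the line layout shown above.

```shell
# check_pdlist (hypothetical helper): reads MegaCli64 -pdlist output on stdin
# and prints one line for every slot that does not meet the criteria:
#   Predictive Failure Count = 0, Last Predictive Failure Event Seq Number = 0,
#   Firmware state = "Online, Spun Up".
# Prints nothing when every slot looks healthy.
check_pdlist() {
  awk '
    function report() {
      if (slot != "" && (count != "0" || seq != "0" || state != "Online, Spun Up"))
        printf "slot %s: count=%s seq=%s state=%s\n", slot, count, seq, state
    }
    /^Slot Number:/ { report(); slot = $3; count = seq = state = "?" }
    /^Predictive Failure Count:/ { count = $4 }
    /^Last Predictive Failure Event Seq Number:/ { seq = $NF }
    /^Firmware state:/ {
      state = $0
      sub(/^Firmware state:[ \t]*/, "", state)   # drop the label
      sub(/[ \t\r]+$/, "", state)                # drop trailing whitespace
    }
    END { report() }
  '
}

# Usage on a node (not executed here):
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | check_pdlist
```

Empty output corresponds to the "no problem with the disks" case; any printed slot should be investigated before reenable is considered.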
If the output of the commands above shows anything different, the disk needs to be replaced. If everything looks normal, the “_cell_allow_reenable_predfail” parameter can be changed. To find the directory of the cellinit.ora file that is in use, first determine the image version.
[root@exadb01 ~]# imageinfo -ver
12.2.1.1.1.170419
Based on the output above, the cellinit.ora directory is /opt/oracle/cell12.2.1.1.1_LINUX.X64_170419/cellsrv/deploy/config. Add the line “_cell_allow_reenable_predfail=true” to the cellinit.ora file in this directory.
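The mapping from the image version to the directory can be sketched as a small shell function. The name cellinit_dir is hypothetical, and the function simply assumes that the directory naming follows the same pattern as in this example (release, then _LINUX.X64_, then the build date); confirm that the directory exists on your own system.

```shell
# cellinit_dir (hypothetical helper): build the cellinit.ora directory from an
# "imageinfo -ver" string such as 12.2.1.1.1.170419.
# Assumption: the last dot-separated field is the build date and the rest is
# the release, as in the example in this article.
cellinit_dir() {
  local ver=$1
  local release=${ver%.*}    # everything before the last dot, e.g. 12.2.1.1.1
  local build=${ver##*.}     # everything after the last dot, e.g. 170419
  printf '/opt/oracle/cell%s_LINUX.X64_%s/cellsrv/deploy/config\n' "$release" "$build"
}

# Usage on a node (not executed here):
#   cd "$(cellinit_dir "$(imageinfo -ver)")"
```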
[root@exadb01 ~]# cd /opt/oracle/cell12.2.1.1.1_LINUX.X64_170419/cellsrv/deploy/config
[root@exadb01 config]# vi cellinit.ora
[root@exadb01 config]# cat cellinit.ora
#CELL Initialization Parameters
ipaddress1=192.168.10.7/22
_cell_allow_reenable_predfail=true
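As an alternative to editing the file by hand in vi, the line could be appended by a script. The sketch below is illustrative only: the function name ensure_reenable_param and the .bak backup suffix are assumptions, and it does not handle the case where the parameter is already present with a different value.

```shell
# ensure_reenable_param (hypothetical helper): append the parameter line to the
# given cellinit.ora file unless that exact line is already there.
# A copy of the original file is kept as <file>.bak (suffix chosen here only
# for illustration).
ensure_reenable_param() {
  local file=$1
  local line='_cell_allow_reenable_predfail=true'

  # Nothing to do if the exact line already exists.
  if grep -qxF -- "$line" "$file"; then
    return 0
  fi

  cp -p -- "$file" "$file.bak" || return 1

  # If the file does not end with a newline, add one so the new
  # parameter starts on its own line.
  if [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Usage (not executed here):
#   ensure_reenable_param cellinit.ora
```

Running it a second time leaves the file unchanged, so it is safe to repeat before the services are restarted.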
Then restart all services.
[root@exadb01 config]# dbmcli -e alter dbserver restart services all
Stopping the RS and MS services...
The SHUTDOWN of services was successful.
Starting the RS and MS services...
Getting the state of RS services... running
Starting MS services...
The STARTUP of MS services was successful.
After these steps, the failed disk can be made usable again with reenable.
First, find the ID of the failed disk. As root on the relevant dbnode:
[root@exadb01 config]# dbmcli
DBMCLI: Release - Production on Thu Jan 30 16:14:13 EET 2020

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

DBMCLI> list physicaldisk
252:0 DJNMBE normal
252:1 DJHW7E normal
252:2 DJ7T3E failed
252:3 DJ7TTE normal
The output above shows that the disk to be re-enabled is 252:2. This disk can be made usable again with reenable, as shown below. Resynchronization of the disk starts automatically once the operation completes.
DBMCLI> alter physicaldisk 252:2 reenable force
Physical disk 252:2 was reenabled.

DBMCLI> list physicaldisk
252:0 DJNMBE normal
252:1 DJHW7E normal
252:2 DJ7T3E normal
252:3 DJ7TTE normal

DBMCLI> exit
quitting
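Finding the failed disk in a list physicaldisk listing, done by eye above, can also be scripted. This is a rough sketch: failed_disks is a hypothetical name, and it assumes the three-column layout shown in this article with the status in the last column.

```shell
# failed_disks (hypothetical helper): reads "list physicaldisk" output on stdin
# and prints the ID (first column) of every disk whose status, taken from the
# last column, is exactly "failed".
failed_disks() {
  awk '$NF == "failed" { print $1 }'
}

# Usage on a dbnode (not executed here):
#   dbmcli -e list physicaldisk | failed_disks
```

Running the same filter after the reenable should print nothing if the disk returned to normal.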
Steps for the cell node:
The steps on the cell node are similar. The differences are the command used to restart the services and the directory of the cellinit.ora file, and cellcli must be used instead of dbmcli.
[root@exacel05 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 1
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 4
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 5
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 6
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 7
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 8
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 9
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 10
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25
Slot Number: 11
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B25

[root@exacel05 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 3 (Target Id: 3)
State : Optimal
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 4 (Target Id: 4)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 5 (Target Id: 5)
State : Optimal
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 6 (Target Id: 6)
State : Optimal
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 7 (Target Id: 7)
State : Optimal
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 8 (Target Id: 8)
State : Optimal
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 9 (Target Id: 9)
State : Optimal
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 10 (Target Id: 10)
State : Optimal
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 11 (Target Id: 11)
State : Optimal
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None

[root@exacel05 ]# imageinfo -ver
12.2.1.1.1.170419

[root@exacel05 ]# cd /opt/oracle/cell12.2.1.1.1_LINUX.X64_170419/cellsrv/deploy/config
[root@exacel05 config]# vi cellinit.ora
[root@exacel05 config]# cat cellinit.ora
#CELL Initialization Parameters
bbuTempThreshold=60
ipaddress1=192.168.10.22/22
bbuChargeThreshold=800
_cell_allow_reenable_predfail=true

[root@exacel05 config]# cellcli -e alter cell restart services all
Stopping RS services...
The SHUTDOWN of RS services was successful.
Starting the RS services...
Getting the state of RS services... running
Restarting CELLSRV services...
The RESTART of CELLSRV services was successful.
Restarting MS services...
The RESTART of MS services was successful.

[root@exacel05 config]# cellcli
CellCLI: Release 12.2.1.1.1 - Production on Fri Jan 31 10:21:18 EET 2020

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.

CellCLI> list physicaldisk
20:0 E0MVDK normal
20:1 E0HVCK normal
20:2 E0HVEV failed
20:3 E6ECFR normal
20:4 E57ZDW normal
20:5 E0HV9H normal
20:6 E4GX7X normal
20:7 E0122V normal
20:8 E0HV3G normal
20:9 E60E6Q normal
20:10 E1DAFH normal
20:11 E1NM94 normal
FLASH_1_0 1219M0DW59 normal
FLASH_1_1 1039M04E3Y normal
FLASH_1_2 1041M04JPP normal
FLASH_1_3 1039M04E3X normal
FLASH_2_0 1027M03LPP normal
FLASH_2_1 1031M03YCF normal
FLASH_2_2 1031M03XJ9 normal
FLASH_2_3 1031M043EL normal
FLASH_4_0 1122M09MGG normal
FLASH_4_1 1122M09MGW normal
FLASH_4_2 1122M09M2X normal
FLASH_4_3 1122M09M2W normal
FLASH_5_0 1031M041K7 normal
FLASH_5_1 1031M040CN normal
FLASH_5_2 1031M0400E normal
FLASH_5_3 1031M03YPY normal

CellCLI> alter physicaldisk 20:2 reenable force
Physical disk 20:2 was reenabled.

CellCLI> list physicaldisk
20:0 E0MVDK normal
20:1 E0HVCK normal
20:2 E0HVEV normal
20:3 E6ECFR normal
20:4 E57ZDW normal
20:5 E0HV9H normal
20:6 E4GX7X normal
20:7 E0122V normal
20:8 E0HV3G normal
20:9 E60E6Q normal
20:10 E1DAFH normal
20:11 E1NM94 normal
FLASH_1_0 1219M0DW59 normal
FLASH_1_1 1039M04E3Y normal
FLASH_1_2 1041M04JPP normal
FLASH_1_3 1039M04E3X normal
FLASH_2_0 1027M03LPP normal
FLASH_2_1 1031M03YCF normal
FLASH_2_2 1031M03XJ9 normal
FLASH_2_3 1031M043EL normal
FLASH_4_0 1122M09MGG normal
FLASH_4_1 1122M09MGW normal
FLASH_4_2 1122M09M2X normal
FLASH_4_3 1122M09M2W normal
FLASH_5_0 1031M041K7 normal
FLASH_5_1 1031M040CN normal
FLASH_5_2 1031M0400E normal
FLASH_5_3 1031M03YPY normal
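On a cell, each data disk is its own virtual drive, so the -LdPdInfo output is long and a drive that is not Optimal is easy to overlook. A small filter along the following lines could help. It is only a sketch: non_optimal_vds is a hypothetical name, and it assumes "Virtual Drive:" and "State" lines shaped like the ones shown in this article.

```shell
# non_optimal_vds (hypothetical helper): reads MegaCli64 -LdPdInfo output on
# stdin and prints every virtual drive whose State is not "Optimal".
# Only lines that START with "State" are considered, so "Firmware state" and
# "Foreign State" lines are ignored.
non_optimal_vds() {
  awk '
    /^Virtual Drive:/ { vd = $3 }
    /^State/ {
      state = $0
      sub(/^State[ \t]*:[ \t]*/, "", state)   # drop the label and colon
      sub(/[ \t\r]+$/, "", state)             # drop trailing whitespace
      if (state != "Optimal")
        printf "virtual drive %s: %s\n", vd, state
    }
  '
}

# Usage on a cell (not executed here):
#   /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | non_optimal_vds
```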
In some cases, re-enabling a disk on a cell may fail with an error like the one below. In that case the disk must be replaced.
CELL-04608: Re-enabling physical disks was not successful:
CELL-04609: An error was encountered while re-enabling physical disk 20:1. Received error: CELL-04615: An error was encountered while re-enabling LUN on physical disk 20:1
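When the reenable output is captured in a log, the disks that hit this error can be pulled out with a short filter. The sketch below assumes the CELL-04609 message has exactly the wording shown above; reenable_failed_disks is a hypothetical name.

```shell
# reenable_failed_disks (hypothetical helper): reads captured reenable output
# on stdin and prints the ID of each physical disk reported in a CELL-04609
# message, i.e. the disks that should be replaced.
reenable_failed_disks() {
  sed -n 's/^CELL-04609:.*re-enabling physical disk \([^ .]*\)\..*/\1/p'
}

# Usage (not executed here), assuming the output was saved to reenable.log:
#   reenable_failed_disks < reenable.log
```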