Engineering Release Notice |
Component: | SAS_FW_Image |
Release Date: | 03-28-2007 |
OEM: | LSI |
Version: | SAS_FW_Image_APP-1.03.20-0225_MPT-01.18.74.00-IT_MPT2-01.18.74.00-IT_BB-R.2.3.12_BIOS-MT30_WEBBIOS-1.03-08_02_CTRLR-1.04-017A_2007_03_28 |
Package: | 5.1.1-0047 |
FW_MPT_1068 | 01.18.74.00-IT |
FW_SAS | 1.03.20-0225 |
FW_MPT_1068_b1 | 01.18.74.00-IT |
Component: | FW_MPT_1068 |
Stream: | FW_MPT_1068_Proj_Integration |
Version: | 01.18.74.00-IT |
Baseline From: | FW_MPT_1068_Release-MPTFW-01.18.73.00-IT-2007_01_26 |
Baseline To: | FW_MPT_1068_Release-MPTFW-01.18.74.00-IT-2007_03_26 |
LSID100066810 | (TASK) | Release MPT 01.18.74 |
LSID100066174 | (DFCT) | Unable to reliably flash daisy-chained encl.
DFCT ID: | LSID100066174 |
Headline: | Unable to reliably flash daisy-chained encl.
Description: | Flashing enclosures through an 8480E SAS card with varied configurations is unreliable. Flashing one enclosure works; flashing four or five may or may not pass. Whether the enclosure is populated or unpopulated does not appear to matter.
Version of Bug Reported: | 211 |
Steps to Reproduce: | The utility was provided by the OEM; the ISO image is attached to this defect. The enclosures need to be set to a particular OEM. Create a CD from the image, boot from the CD, and flash up and down.
Child Tasks: | LSID100066810 |
Task ID: | LSID100066810 |
Headline: | Release MPT 01.18.74 |
Description: | MPT 01.18.74 code release |
State: | Open |
Change Set Files: | 0 |
References: | LSID100066174(DFCT) |
Component: | FW_SAS |
Stream: | SAS_1.0_Dev |
Version: | 1.03.20-0225 |
Baseline From: | FW_SAS_Release_Dobson-1.03.20-0220_2007_03_06 |
Baseline To: | FW_SAS_Release_Dobson-1.03.20-0225_2007_03_16 |
LSID100066361 | (TASK) | update version.c |
LSID100066139 | (TASK) | Limit Spinupdelay to maximum 15 for MPT |
LSID100066154 | (TASK) | Intermittent link failure causes HDD marked dead |
LSID100066364 | (TASK) | FW_SAS Release Version: 1.03.20-0225 |
LSID100065823 | (TASK) | FW_SAS Release Version: 1.03.21-0221 |
LSID100066143 | (TASK) | Flush Cache before making a Rebuilt drive online |
LSID100056527 | (DFCT) | HDD Spin-up setting values are not possible. |
LSID100065371 | (DFCT) | Stop error after HotRebuild |
LSID100065366 | (DFCT) | Intermittent link failure causes HDD marked dead |
DFCT ID: | LSID100056527 |
Customer DFCT No: | 12319699 |
Headline: | HDD Spin-up setting values are not possible. |
Description: | 06/27/06: still open with latest FW.
6/16/06: SCM 34550 is closed. Need to reopen if still an issue.
2/16/06: retest fail? Setting is "don't care".
12/22/05: according to notes in SCM 34550, this issue was verified fixed and closed. Retest with FW 71.
12/15/05: retest with FW 69 failed; the setting in WebBIOS seems to be "don't care".
11/03/05: retest with .66.
10/14/05: Please add support for spinup delay. Number of drives per spinup is working. Fixed in next FW.
10/3/05: duplicated. Issue is with the delay setting.
9/20/05: I can modify the HDD spin-up setting values via WebBIOS and GAM, but these settings are not applied. If I change these parameters, spin-up behaves as follows:
Test 1 (controller default settings): disks per spin: 2; delay between spins: 6 sec. Behaviour: spin-up for targets 1, 2, 4 and 5, then spin-up for targets 0 and 3; spin-up for all 6 disks completes within 12 seconds.
Test 2 (settings changed): disks per spin: 1; delay between spins: 30 sec. Behaviour: spin-up for targets 1, 2, 4 and 5, then spin-up for targets 0 and 3; spin-up for all 6 disks completes within 12 seconds. This behaviour is the same as Test 1; the adapter's behaviour should depend on the settings.
Note: FW drop .0055 does not have this problem.
Version of Bug Reported: | 96 |
Version of Bug Fixed: | 1.03.20-0225 |
Steps to Reproduce: | I can modify the HDD spin-up setting values via WebBIOS and GAM, but these settings are not applied. If I change these parameters, spin-up behaves as follows:
Test 1 (controller default settings): disks per spin: 2; delay between spins: 6 sec. Behaviour: spin-up for targets 1, 2, 4 and 5, then spin-up for targets 0 and 3; spin-up for all 6 disks completes within 12 seconds.
Test 2 (settings changed): disks per spin: 1; delay between spins: 30 sec. Behaviour: spin-up for targets 1, 2, 4 and 5, then spin-up for targets 0 and 3; spin-up for all 6 disks completes within 12 seconds. This behaviour is the same as Test 1; the adapter's behaviour should depend on the settings.
Note: FW drop .0055 does not have this problem.
Resolution: | Fixed Indirectly |
Resolution Description: | The maximum value allowed for Spinup Delay is 15, as MPT allows only 4 bits in its spinup delay field. Previously, the original 8-bit value from the utility was simply truncated and used in the 4-bit field. Now, if the value from the utility is greater than 15, it is clamped to 15, the maximum 4-bit value.
Customer Defect Track No: | 12319699 |
Customer List: | FSC -- FSC |
Child Tasks: | LSID100066139 |
DFCT ID: | LSID100065371 |
Customer DFCT No: | 520PR067 |
Headline: | Stop error after HotRebuild |
Description: | A stop error occurs when rebooting the system after performing a hot rebuild using PCP.
Version of Bug Reported: | 1N41 |
Steps to Reproduce: | [Configuration] H/W: MegaRAID SCSI; FW: 1N41; RAID: RAID 5 with 3 HDDs; OS: W2K; DB: Oracle 10G
[Steps]
1. Set the HDD at ID0 "Offline" via PCP, then run HotRebuild.
2. Set the HDD at ID1 "Offline" via PCP, then run HotRebuild.
3. Set the HDD at ID2 "Offline" via PCP, then run HotRebuild.
4. Reboot the system.
5. A stop error occurs at OS boot. (No application is running.)
Customer Defect Track No: | 520PR067 |
Customer List: | NEC -- NEC |
Child Tasks: | LSID100066143 |
DFCT ID: | LSID100065366 |
Customer DFCT No: | HSA0086 |
Headline: | Intermittent link failure causes HDD marked dead |
Description: | Intermittent link failure causes HDD marked dead |
Version of Bug Reported: | 1.03.00-0177 |
Steps to Reproduce: | 1. Make a RAID 1 array using SATA HDDs on ports 0 and 1.
2. Remove the port 1 HDD.
3. Re-insert the removed drive very slowly to simulate an "intermittent link failure".
4. Link failure is detected intermittently, and a single Device Removed (Link Failure) event causes the drive to be marked dead.
[System] Platform (OS): Windows Server 2003 R2 x86; Processors: Xeon 3.60GHz; BIOS: Phoenix BIOS 9IVDTH-E16; Memory: 1GB; Driver Name & Version: Msas2k3.sys 1.21.0.32; Utility Name & Version: MegaRAID Storage Manager 1.18-00; RAID Adapter-1: MegaRAID SAS 8308ELP & ROMB; Series #: MegaRAID SAS 8308ELP & ROMB; Channels: 8 Port; BIOS: MT28; Firmware: 1.03.00-0177
[PD] (A C T Manufacturer Model FW Rev. Size): 1 0 HGST KUROFUNE 500GB; 1 1 HGST KUROFUNE 500GB
[LD] (Adapter Logical Array Size FST Vol. Name RAID SS WP RP CP VS E): 1 RAID 1 500GB NTFS - 1 - WT RA DIO 500GB
Resolution: | Fixed |
Resolution Description: | Increased the device missing delay to 15 seconds.
Customer Defect Track No: | HSA0086 |
Customer List: | Hitachi -- Hitachi |
Child Tasks: | LSID100066154 |
Task ID: | LSID100066361 |
Headline: | update version.c |
Description: | VER_MAINTENANCE_BOARD 3 |
State: | Open |
Change Set Files: | 0 |
References: |
Task ID: | LSID100066139 |
Headline: | Limit Spinupdelay to maximum 15 for MPT |
Description: | Defect 56527: HDD Spin-up setting values are not possible.
Analysis:
=======
The spinup delay is handled by the MPT chip, and in the MPT interface the spinup delay value is only 4 bits wide, so the maximum possible value is 15. However, the FW was directly assigning the 8-bit value from the utility to the 4-bit field, so the value was truncated: a value of 30 was truncated to 14 and a value of 40 was truncated to 8.
Fix:
===
Limit the spinup delay to 15, the maximum possible value, so out-of-range requests are clamped instead of truncated. The user still has to know the maximum supported value to set the delay appropriately. A sketch of the clamping is shown after this task entry.
State: | Completed |
Change Set Files: | 0 |
References: | LSID100056527(DFCT) |
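Below is a minimal sketch, in C, of the clamping described in this task. The macro and function names are illustrative only (this notice does not show the firmware's actual symbols); the example exists just to show why straight truncation of the 8-bit utility value misbehaves and why clamping to 15 is the safer mapping.

    #include <stdint.h>
    #include <stdio.h>

    #define MPT_SPINUP_DELAY_MAX  15U   /* 4-bit spinup delay field in the MPT interface */

    /* Old behavior: the 8-bit value from the utility was assigned straight into
     * the 4-bit field, so only the low 4 bits survived (30 -> 14, 40 -> 8). */
    static uint8_t spinup_delay_truncated(uint8_t requested)
    {
        return requested & 0x0FU;
    }

    /* New behavior: anything above 15 is clamped to the maximum the field can hold. */
    static uint8_t spinup_delay_clamped(uint8_t requested)
    {
        return (requested > MPT_SPINUP_DELAY_MAX) ? (uint8_t)MPT_SPINUP_DELAY_MAX : requested;
    }

    int main(void)
    {
        const uint8_t samples[] = { 6, 15, 30, 40 };
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
            printf("requested %2u: truncated -> %2u, clamped -> %2u\n",
                   (unsigned)samples[i],
                   (unsigned)spinup_delay_truncated(samples[i]),
                   (unsigned)spinup_delay_clamped(samples[i]));
        }
        return 0;
    }

Running this prints 30 mapping to 14 and 40 mapping to 8 under the old truncation, versus both mapping to 15 under the clamp, which is the behavior change this task describes.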
Task ID: | LSID100066154 |
Headline: | Intermittent link failure causes HDD marked dead |
Description: | Fix DF65366: For the Hitachi OEM, we set MPT_DEVICE_PORT_MISSING_DELAY to 15 seconds and MPT_IO_DEVICE_MISSING_DELAY to 15 seconds. A sketch of this OEM-conditional setting is shown after this task entry.
State: | Open |
Change Set Files: | 0 |
References: | LSID100065366(DFCT) |
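Below is a minimal sketch, in C, of the OEM-conditional missing-delay setting described in this task. Only the two delay names come from this notice; the enum, struct, and function names are hypothetical stand-ins for the firmware's configuration path.

    #include <stdint.h>
    #include <stdio.h>

    #define MPT_DEVICE_PORT_MISSING_DELAY_SEC  15U  /* delay before a missing port is acted on */
    #define MPT_IO_DEVICE_MISSING_DELAY_SEC    15U  /* delay before outstanding I/O is failed  */

    /* Hypothetical OEM identifiers and configuration structure. */
    enum oem_id { OEM_GENERIC, OEM_FSC, OEM_NEC, OEM_HITACHI };

    struct mpt_missing_delay_cfg {
        uint8_t device_port_missing_delay;  /* seconds */
        uint8_t io_device_missing_delay;    /* seconds */
    };

    /* Apply the longer missing delays only for the Hitachi OEM build, so a slowly
     * re-inserted drive is not marked dead on a brief intermittent link failure. */
    static void apply_missing_delay(enum oem_id oem, struct mpt_missing_delay_cfg *cfg)
    {
        if (oem == OEM_HITACHI) {
            cfg->device_port_missing_delay = MPT_DEVICE_PORT_MISSING_DELAY_SEC;
            cfg->io_device_missing_delay   = MPT_IO_DEVICE_MISSING_DELAY_SEC;
        }
        /* Other OEM builds keep their existing default delays. */
    }

    int main(void)
    {
        struct mpt_missing_delay_cfg cfg = { 0, 0 };
        apply_missing_delay(OEM_HITACHI, &cfg);
        printf("port missing delay: %u s, I/O missing delay: %u s\n",
               (unsigned)cfg.device_port_missing_delay,
               (unsigned)cfg.io_device_missing_delay);
        return 0;
    }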
Task ID: | LSID100066364 |
Headline: | FW_SAS Release Version: 1.03.20-0225 |
Description: | FW_SAS Release Version: 1.03.20-0225 |
State: | Open |
Change Set Files: | 0 |
References: |
Task ID: | LSID100065823 |
Headline: | FW_SAS Release Version: 1.03.21-0221 |
Description: | FW_SAS Release Version: 1.03.21-0221 |
State: | Completed |
Change Set Files: | 0 |
References: |
Task ID: | LSID100066143 |
Headline: | Flush Cache before making a Rebuilt drive online |
Description: | Issue Title:
=========
Potential for data corruption after a 2nd drive failure following completion of a drive rebuild on RAID 5 Logical Drives configured with Write-Back caching.
Products Affected:
===============
All LSI MegaRAID adapters, excluding MegaRAID SAS 1.1 FW.
Background on RAID 5 Rebuilds:
===========================
A RAID 5 Logical Drive (LD) can survive single drive failures by maintaining redundant parity data across all drive members of the LD. When any single drive fails, missing data for the failed drive can be reconstructed as needed from the surviving drive members. The RAID adapter can later return the LD to full redundancy by performing a complete "rebuild" operation, which entails reconstructing the entire data set of the failed/missing drive onto a replacement drive. Returning the LD to full redundancy allows it to survive another (subsequent) single-drive failure.
Issue Description:
==============
MegaRAID performs rebuilds sequentially from start to finish, on a per-stripe-row basis. For a given row, missing data (or parity) for the rebuilding drive is reconstructed using a bitwise XOR of data read from the surviving drives. If none of the rebuilding drive's data is dirty in cache for a given row, the reconstructed data is immediately written to the rebuilding disk, making the data and parity consistent on all disks for that row. If dirty host data exists in cache for the rebuilding strip on a given row, special care is taken to regenerate only data that is not dirty, since dirty data represents newer data written by the host that supersedes existing (regenerated) data. After the rebuild logic has reconstructed all non-dirty sectors within a dirty strip, FW marks the entire strip as dirty AND DOES NOT WRITE THE DATA IMMEDIATELY TO DISK. FW instead relies on the write-back flusher to write the data to disk later. Deferral of this write is necessary because the dirty host data for the rebuilding strip must be made consistent with the parity drive for that row, an operation implemented only in the write-back flusher. When the rebuild operation completes, the rebuilding drive is marked ONLINE, signaling to the user that the LD is fully redundant and ready to survive another single-drive failure. Even though the rebuild is complete and the LD is marked consistent, MegaRAID's cache may still contain a number of dirty rebuilt lines that have yet to be flushed to disk. Until these lines are flushed, the data on the rebuilt disk for these dirty rows is undefined (not yet written) and inconsistent with parity. If the LD suffers a 2nd drive failure before these lines have been flushed, data on the 2nd failed drive for these rows will be unrecoverable, because the rebuilt disk has not yet been made consistent with the parity, making reconstruction of the failed drive's data impossible for the affected dirty rows.
Resolution:
========
The fix for this issue is to flush the entire contents of the write-back cache following the completion of a rebuild, waiting for this flush to complete before allowing the drive's state to be changed from REBUILD to ONLINE. (A sketch of this ordering is shown after this task entry.)
Workaround:
==========
In lieu of corrected FW, there is an alternate procedure available to users that will ensure the flushing of all dirty data following completion of a rebuild. After a rebuild has completed, the user should switch the write caching mode of the LD from Write Back to Write Through, which will trigger flushing of all dirty data for the LD. The user can then immediately switch the write mode back to Write Back.
Probability of Data Loss:
===================
This issue affects only RAID 5 LDs configured with Write-Back caching, since Write-Through LDs never contain dirty cache data. The probability of data loss resulting from this issue correlates to the volume of dirty rebuilt data remaining in cache after completion of the rebuild, in relation to the probability of experiencing a 2nd drive failure before the dirty data has been flushed to disk. There are specific factors which affect both of these probabilities.
Probability of Dirty Data:
===================
The likelihood of the rebuild process encountering dirty data for any given row depends on the level of write activity from the host, the amount of adapter memory available for write caching on the specific LD, the user-configured write flush time, and the speed at which the adapter is able to flush dirty data to disk. Aside from these factors, there is a specific I/O load scenario which produces the highest probability of dirty data. When the host is continually (re)writing primarily a small subset of blocks within the LD, the cache lines associated with those blocks will remain perpetually dirty in cache. The reasons are twofold: 1) if the amount of unique data written by the host fits entirely in cache, the cache will never experience the forced flushing needed to make room for new, unique data, and 2) whenever a write request is received from the host, MegaRAID resets the interval flush timer, so the typical periodic "sweep flush" will not be triggered. In this scenario we have observed dirty data remaining in cache for very long durations, sometimes exceeding 10 minutes.
Probability of 2nd Drive Failure:
=======================
The likelihood of a 2nd drive failure in the interim between a completed rebuild and the flushing of dirty rebuilt cache data is very small and depends on the number of disks in the LD in relation to the MTBF of those disks. A more likely scenario leading to 2nd drive unavailability is a user-initiated manual "copy back" operation. In MegaRAID configurations containing hotspares, the failed drive's data is automatically rebuilt onto an available hotspare drive. Once this rebuild is complete, some users are inclined to relocate the rebuilt data back into the failing drive's "slot" within the enclosure. This can be accomplished by shutting down the system and moving the hotspare into the slot occupied by the failed drive. To avoid the need for a shutdown, some users will replace the failed drive with a fresh drive, then manually fail the hotspare drive after its rebuild has completed, triggering a rebuild operation onto the fresh drive placed in the original failed slot.
State: | Completed |
Change Set Files: | 0 |
References: | LSID100065371(DFCT) |
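Below is a minimal sketch, in C, of the corrected ordering described in the Resolution of this task: flush the write-back cache, wait for the flush to drain, and only then move the rebuilt drive from REBUILD to ONLINE. The state names and helper functions are hypothetical stand-ins, not the firmware's actual interfaces.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical drive states and cache helpers -- illustrative names only. */
    enum pd_state { PD_REBUILD, PD_ONLINE };

    static void flush_write_back_cache(int ld_id)
    {
        printf("LD %d: flushing all dirty write-back lines\n", ld_id);
    }

    static bool wait_for_flush_complete(int ld_id)
    {
        printf("LD %d: flush complete\n", ld_id);
        return true;
    }

    static void set_pd_state(int pd_id, enum pd_state st)
    {
        printf("PD %d: state -> %s\n", pd_id, st == PD_ONLINE ? "ONLINE" : "REBUILD");
    }

    /* Corrected ordering from the Resolution: flush first, wait for the flush to
     * drain, and only then move the rebuilt drive from REBUILD to ONLINE. */
    static void on_rebuild_complete(int ld_id, int rebuilt_pd_id)
    {
        flush_write_back_cache(ld_id);
        if (wait_for_flush_complete(ld_id))
            set_pd_state(rebuilt_pd_id, PD_ONLINE);
        /* If the flush did not complete, the drive stays in REBUILD so the LD is
         * never reported as fully redundant while dirty rebuilt data remains. */
    }

    int main(void)
    {
        on_rebuild_complete(0, 1);
        return 0;
    }

The point of the ordering is that ONLINE is only reported once the rebuilt disk is actually consistent with parity, which is what closes the window for the 2nd-failure data loss described above.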
Component: | FW_MPT_1068_b1 |
Stream: | FW_MPT_1068_B1_Integration |
Version: | 01.18.74.00-IT |
Baseline From: | FW_MPT_1068_b1_Release-MPTFW-01.18.73.00-IT-2007_01_26 |
Baseline To: | FW_MPT_1068_b1_Release-MPTFW-01.18.74.00-IT-2007_03_26 |
LSID100066818 | (TASK) | Release MPT 01.18.74 for B1 |
Task ID: | LSID100066818 |
Headline: | Release MPT 01.18.74 for B1 |
Description: | MPT 01.18.74 code release |
State: | Open |
Change Set Files: | 0 |
References: |