Closed Bug 189778 Opened 23 years ago Closed 23 years ago

Mozilla crash breaks Promise controller IDE mirror.

Categories

(Core :: Networking: Cache, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED WORKSFORME

People

(Reporter: mjennings.usa, Assigned: gordon)

Details

(Keywords: crash, stackwanted)

Attachments

(2 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021216
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20021216

This extremely serious failure has happened twice, once while typing into a browser window (version 1.2.1), and another time while trying to access a Mozilla Mail spellchecker that is apparently not installed correctly (version 1.0.2). Each time the Promise Technology RAID controller hard disk mirror was broken, and became "critical". Normally this would mean that there was a hard drive failure. However, in both cases, there was nothing wrong with the hard drives. Apparently the Mozilla crash is destroying information about the mirror stored by the Promise FastTrak TX2 controller in the hidden sectors at the beginning of the drives.

In both cases all instances of Mozilla Mail and Mozilla browser crashed. The crash is caused by rapid keyboard or mouse input, apparently. Then there is intense disk access. Then the TalkBack application appears. (See the TalkBack report on Sunday, January 19, 2003. It is the only one with this email address: Microsoft-BUGS at myrealbox dot com. The crash reported then, Moz 1.0.2, was the second that destroyed the Promise Technology RAID mirror.)

In both cases there were several instances of Mozilla browser open with several tabs in each. In both cases the Mozilla Mail application was running and there was one or more email messages being composed. (The user has numerous duties that often require switching to another subject before a former one is resolved.) In both cases the Mozilla crash immediately preceded the FastCheck monitoring utility reporting that the RAID mirror was critical.

Operating system: Windows XP with SP1. No other problems except for the normal quirkiness of Windows XP.

The controller is configured with two 40 GB Western Digital hard drives. The driver date is 06/11/2002. The driver version is 2.0.0.26. The controller is installed in PCI Slot 2 (PCI bus 1, device 10, function 0).

Controller Info:
IRQ: 9
BusMaster Base: 0xDF90
ROM Base Addr: 0xFD9E0000
Hardware Type: FastTrak-100 TX/LP (6268/6270)

Note: Contrary to what is reported by the controller utility above, the controller is a FastTrak-100 TX2. The motherboard is an Intel 815EEA2 with an 866 MHz Pentium 3 processor.

Promise Technology
http://www.promise.com/
Promise Technology FastTrak100 TX2 RAID controller
http://www.promise.com/support/download/download2_eng.asp?productId=8&category=All&os=100

Reproducible: Couldn't Reproduce

Steps to Reproduce:
This is a major crash. I don't know how to make it happen other than stressing Mozilla with many instances open. (The crash happens regularly then, but these are the only times the crash broke the mirror.)
Could you give the precise talkback ID? (run talkback.exe in mozilla/components/)
URL: Any. Any.
This JPEG image shows the TalkBack IDs of all the crashes of Mozilla that were logged. The latest crash shown (Sunday, January 19, 2003) was the second of the two crashes that destroyed the Promise Technology RAID mirror. The first crash that destroyed the RAID mirror caused an intense amount of disk access that continued for more than 30 seconds. The user, fearing a virus infection, turned off the power. There was therefore no TalkBack report. Other than the mirror being broken, no data seems to have been lost in the second crash that destroyed the RAID mirror. During the first crash that affected the RAID mirror, all files and new folders created over two days were lost.
Latest Talkback incident: TB16361837M. This is the second crash that destroyed the RAID mirror. [It is unfortunate that the TalkBack incident numbers must be laboriously copied by hand. They cannot be selected and copied to the clipboard.] See the attached .JPG image file which shows all the Mozilla crashes that have been logged by TalkBack. The JPEG image only shows the TalkBack IDs of the crashes of Mozilla that were logged. The latest crash shown (Sunday, January 19, 2003) was the second of the two crashes that destroyed the Promise Technology RAID mirror. The first crash that destroyed the RAID mirror caused an intense amount of disk access that continued for more than 30 seconds. The user, fearing a virus infection, turned off the power. There was therefore no TalkBack report. Other than the mirror being broken, no data seems to have been lost in the second crash that destroyed the RAID mirror. During the first crash that affected the RAID mirror, all files and new folders created over two days were lost.
I can't possibly see how Mozilla can cause a RAID array to go critical.... Maybe Mozilla is crashing as a *result* of the array going critical? Stack should help
Keywords: crash, stackwanted
my guess :)
Assignee: asa → gordon
URL: Any.
Component: Browser-General → Networking: Cache
QA Contact: asa → tever
silly mid-air collisions... :)
Whiteboard: TB16361837M
WFM 20030120 Linux with a Promise FastTrak 100. I would try it under XP, but I don't want my mirror to get killed. ;)
Comment #4:
>Maybe Mozilla is crashing as a *result* of the array going critical?
I think this is correct.
I'm not going to dupe this myself, but this looks very much like bug 185251, the spell check error. The stack in the talkback instance doesn't have line numbers, but the pattern is the same; this one ends in spellchk.dll.
spellchk.dll + 0x126a (0x0628126a)
spellchk.dll + 0x102a (0x0628102a)
xpcom.dll + 0x3120a (0x6118120a)
composer.dll + 0x73af (0x601373af)
xpcom.dll + 0x377c1 (0x611877c1)
xpc3250.dll + 0x12634 (0x60d62634)
xpc3250.dll + 0x15b87 (0x60d65b87)
The odd thing is that I don't see how such a crash could cause a driver to fail. The driver should be protected. But I've seen stranger things. Just for good measure, if you haven't already, check to see if there's a new version of the driver. Also odd is that the TB listed a jmp instruction as the crash site. This is somewhat unusual; the address looked ordinary. It's possible that under stress your system is having problems. When the jmp was executed, it may have failed to swap in the code due to a disk error and thus generated the exception in Mozilla code.
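For anyone who wants to sanity-check the frames, here is a minimal Python sketch. It assumes each Talkback frame has the form "module + offset (absolute address)", as in the stack quoted in this comment; the names FRAMES, FRAME_RE and module_bases are made up for illustration and are not part of Talkback. Subtracting the offset from the absolute address gives the address each module was loaded at, and every frame of the same module should agree.

```python
import re

# Frames copied from the Talkback stack in this bug:
# "module + offset (absolute address)".
FRAMES = """\
spellchk.dll + 0x126a (0x0628126a)
spellchk.dll + 0x102a (0x0628102a)
xpcom.dll + 0x3120a (0x6118120a)
composer.dll + 0x73af (0x601373af)
xpcom.dll + 0x377c1 (0x611877c1)
xpc3250.dll + 0x12634 (0x60d62634)
xpc3250.dll + 0x15b87 (0x60d65b87)
"""

# Hypothetical pattern for one frame; hex digits are lowercase in the report.
FRAME_RE = re.compile(r"(\S+) \+ 0x([0-9a-f]+) \(0x([0-9a-f]+)\)")


def module_bases(text):
    """Map each module name to the set of load addresses implied by its frames."""
    bases = {}
    for name, offset, absolute in FRAME_RE.findall(text):
        base = int(absolute, 16) - int(offset, 16)
        bases.setdefault(name, set()).add(base)
    return bases


bases = module_bases(FRAMES)
for name, addrs in bases.items():
    print(name, [hex(a) for a in sorted(addrs)])
```

Each module resolves to a single base (spellchk.dll at 0x6280000, xpcom.dll at 0x61150000, composer.dll at 0x60130000, xpc3250.dll at 0x60d50000), so the frames are at least internally consistent. That says nothing about which module, if any, contains the faulting jmp.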
I meant to strike the first part of the message before I posted, but forgot. I really don't think this is a spell check issue, given the jmp instruction crash. My money is on a flaky driver/controller/memory causing code to fail to be paged back in.
First, the problem would not occur if Mozilla did not crash. Or, if a RAID mirror failure did occur, there would be no suspicion it was caused by Mozilla if Mozilla did not crash at the same time that the break of the mirror occurred. At present, the crashing causes us not to want to use Mozilla on all our machines. It's a serious issue. The crashing seems to happen because keyboard or mouse input is overloaded somehow. We have plenty of hard drives, computers, and controllers here. We can run any tests. The biggest cause of hardware failure is bad contacts. This problem is eliminated from consideration because the contacts had been renewed a few days before. Contact renewal is accomplished by pulling all cards and cables out about a millimeter and pushing them back on again. There does not seem to be a hardware failure. Both drives have passed Western Digital's Diagnostic Quick Check. (W.D. is the manufacturer of the drives.) One of the drives of the mirror passed W.D.'s Rigorous check. The other drive of the mirror booted perfectly and is being used to write this email message. Promise Technology Technical Support says they know of no instances in which software is causing a break of a mirror. But, it is possible if something writes to the hidden sectors on the hard drive, where the mirror information is stored. The latest drivers were being used before the crash. The RAID controller BIOS was upgraded from 2.00.0.2 to 2.00.0.24 today.
Ideally, with a properly written driver, software running in user mode should never cause a fault in a driver, even in a crashing situation. The instruction that caused the failure is a jmp, and it is to a hard-coded address:
60437524 e875260000 call 60439b9e
60437529 e95c020000 jmp 6043778a
6043752e 817f042d010000 cmp dword ptr [edi+0x4],0x12d
For Windows to fault on the jmp instruction, one of two things had to happen: either the memory was swapped out and it was unable to read it from disk, or the CPU had trouble reading the memory itself. Given the close proximity of the addresses, my bet would be on the CPU being unable to read that address in memory. I'd also check your event log to see if you have any errors in there that might help identify the source of the problems.
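As a cross-check on the disassembly, the raw bytes can be decoded by hand. The sketch below is Python, assuming the standard x86 encoding for a near call (opcode E8) and near jmp (opcode E9): one opcode byte followed by a signed 32-bit little-endian displacement measured from the address of the next instruction. The helper name rel32_target is made up for illustration.

```python
import struct


def rel32_target(address, raw_hex):
    """Return the destination of a 5-byte x86 near call (E8) or near jmp (E9)."""
    raw = bytes.fromhex(raw_hex)
    if len(raw) != 5 or raw[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte E8/E9 instruction")
    # Signed 32-bit little-endian displacement follows the opcode byte.
    (disp,) = struct.unpack("<i", raw[1:])
    # The displacement is relative to the next instruction; wrap to 32 bits.
    return (address + len(raw) + disp) & 0xFFFFFFFF


# Addresses and bytes copied from the Talkback disassembly in this bug.
call_target = rel32_target(0x60437524, "e875260000")
jmp_target = rel32_target(0x60437529, "e95c020000")
print(hex(call_target), hex(jmp_target))
```

Both results match the targets Talkback printed (60439b9e and 6043778a), so the bytes at the crash site decode as well-formed instructions. That is consistent with real code rather than data being executed, though it does not prove it.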
Running Memtest for more than 2 hours showed no errors. Thanks to Will Dormann for suggesting Memtest. There is every indication that the hardware is healthy.
This bug report includes a .JPG showing numerous crashes. Is anyone else reporting crashes of Mozilla? David Bradley said, "The instruction that caused the failure is jmp and it is to a hard coded address:
60437524 e875260000 call 60439b9e
60437529 e95c020000 jmp 6043778a
6043752e 817f042d010000 cmp dword ptr [edi+0x4],0x12d"
Is there any easy way to find what is loaded there?
Additional Comment #13 should have said, "I don't see any similar crashes in the bug reports. Does anyone know of one?" The event log showed nothing near the time of the crash. There may be bad drivers. There are many unsigned drivers. I can try another system. The problem is reproducible on this system. If I open 20 instances of Mozilla, each with 5 to 20 tabs, there is (virtual memory) disk access that seems unreasonable and dysfunctional. The system becomes less responsive to keyboard input. The jmp instruction target is Hex 6043778a. This is decimal 1,615,034,250. There is 256 MB in the system.
The address is just an address and no indication of how much memory may be being used. Stats at time of crash:
Virtual memory: 215,834,624
Working Set: 110,997,504
Peak Working Set: 120,541,184
Also note, virtual memory is not a great statistic, as you can allocate 1 GB on a system with 128 MB of RAM and a 128 MB swap file and it will return just fine. Address space is not the same as memory usage. If Talkback is to be believed, the fault occurred on the jump, and that means for whatever reason the CPU wasn't able to jump to that location. It's not impossible that Talkback didn't record the data properly. The other incidents listed no longer exist in the talkback database.
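To put numbers on the address-versus-memory point, here is a small Python sketch. It uses the figures quoted in this bug (jmp target, 256 MB of RAM, working set at crash time) and assumes the default 2 GB user-mode address range of 32-bit Windows; the constant names are made up for illustration.

```python
JMP_TARGET = 0x6043778A            # jmp target from the Talkback disassembly
PHYSICAL_RAM = 256 * 1024 * 1024   # 256 MB reported for this machine
WORKING_SET = 110_997_504          # Talkback "Working Set" at crash time
# Assumption: default 32-bit Windows split, user mode gets the low 2 GB.
USER_SPACE_LIMIT = 0x80000000

# The address is a position in the process's virtual address space,
# not a count of bytes in use.
beyond_ram = JMP_TARGET > PHYSICAL_RAM
in_user_space = JMP_TARGET < USER_SPACE_LIMIT
fits_in_ram = WORKING_SET < PHYSICAL_RAM

print(f"jmp target (decimal): {JMP_TARGET:,}")
print("beyond physical RAM size:", beyond_ram)
print("inside user-mode range:", in_user_space)
print("working set fits in RAM:", fits_in_ram)
```

The target is numerically far larger than installed RAM yet sits well inside the user-mode range, which is ordinary for a DLL mapped high in a 32-bit process. The working set, not the address, is the figure to compare against 256 MB.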
One last thing, is the spellchk.dll supposed to work with Mozilla 1.0.2? I thought it was only out for 1.3a or something like that. Still seems strange that if this was the cause it would crash at a jmp instruction.
If the Jmp instruction is real, it is a location in virtual memory. There is the possibility that the Jmp instruction is just data being interpreted as instructions. I am preparing to do a thorough test of Mozilla 1.2.1 on a machine with a clean installation of Windows XP and Mozilla. If the problem does not occur on the new machine, it seems reasonable to believe that it is caused by a faulty driver. Certainly I stress Mozilla considerably. I often open 10 or more instances of Mozilla, each with 5 or more tabs. The crash only occurs under these conditions, and during hurried input. Doesn't TalkBack store the crash information on the user's computer? I am surprised that TalkBack would throw information away. (Programmers have difficulty loving themselves, it seems.)
>If the Jmp instruction is real, it is a location in virtual memory.
>There is the possibility that the Jmp instruction is just data being interpreted
>as instructions.
I've seen this, where a function returns to some erroneous place because the stack was trashed. I dismissed that possibility because the assembler for several instructions before and after seemed reasonable. I'm posting the info from Talkback in case anyone else might have ideas. The stack isn't all that useful, which is why I didn't post it originally, but the other stuff might spark some ideas.
There appears to be an updated driver at: http://www.promise.com/support/download/download2_eng.asp?productId=8&category=All&os=100 Reporter could you please install it? If you already have installed it, could you please reinstall your driver? This smells like a driver issue.
To Andrew Hagen: The system had the latest Promise Technology driver installed at the time of the crash. Re-installing the driver made no difference in the file date. I will do a new and thorough test using a clean install of Windows XP on another machine, using the latest driver. It seems reasonable to guess that this is a bug in the Windows XP virtual memory, but that is only one of several plausible guesses.
Mozilla Bugzilla Bug 189778
http://bugzilla.mozilla.org/show_bug.cgi?id=189778

I seem to have found the problem. Thanks to Gary Hegan on the microsoft.public.windowsxp.general newsgroup, I found that the Intel Application Accelerator software was failing. I uninstalled this software, which is supposed to be used with Intel motherboards, and the problem disappeared.

The problem caused hard disk channel parity errors that were listed in Windows XP's System Event Viewer. These caused Mozilla to crash, but no other obvious problems:

1) A parity error was detected on \Device\Ide\IdeChnDr0.
2) An error was detected on device \Device\Harddisk0\D during a paging operation.

At first, I did not think to look in the System Event log, because I saw no other problems than those with Mozilla.

The Intel Application Accelerator is MUCH worse than Intel says. I found that Intel technical support is very poorly trained. They have very little knowledge of major issues, not just this one.

Uninstalling Intel Application Accelerator fixed the problem, and seems to have had NO bad effects. The system is faster than before. The normal drivers seem to be fine. There is no need for an "Application Accelerator", which is a bad name for a hard disk driver.

The Intel Application Accelerator is VERY trouble-prone:

Intel(R) Application Accelerator - Top Technical Issues
http://www.intel.com/support/chipsets/iaa/

Intel Application Accelerator Known Compatibility Issues
http://www.intel.com/support/chipsets/iaa/compat.htm
ftp://download.intel.com/support/chipsets/iaa/iaa_compat2.pdf

If you move a hard drive to another computer, there are problems:
http://www.intel.com/support/chipsets/iaa/harddrive.htm
(Note broken link.)

If you upgrade to Windows XP, there are problems:
http://www.intel.com/support/chipsets/iaa/xptable.htm
http://www.intel.com/support/chipsets/iaa/tti001.htm

This web page:
http://www.intel.com/support/chipsets/iaa/ident.htm
has only one purpose: To link to this web page:
http://www.intel.com/support/chipsets/iaa/reasons.htm

Humorous Microsoft Support article: Device Settings Are Hard to Find in Windows XP:
http://support.microsoft.com/default.aspx?scid=kb;en-us;Q310751

The Intel Application Accelerator won't work with some Intel motherboards:
http://support.intel.com/support/chipsets/iaa/sb/CS-001410-prd663.htm

The Intel Application Accelerator won't work with some Intel motherboards:
http://support.intel.com/support/chipsets/iaa/sb/CS-001448-prd663.htm

Windows XP System Event Viewer:
start C:\WINDOWS\SYSTEM32\eventvwr.msc

__________________

microsoft.public.windowsxp.general
Re: Hard Disk: Parity errors, unusual Disk activity
Subject: Intel Application Accelerator failure

Gary,

You are correct. Wow, I lost 20 hours to Intel being flaky. I NEVER would have found this without your help.

Michael

__________________

Gary Hegan wrote:
> It is an error caused by Intel's Application Accelerator being installed. I
> had the same problem. I removed it and no longer get the warnings.
>
> --
> Regards
> (-: Gary Hegan :-)
>
> Michael Jennings wrote in message news: e4KMEI8xCHA.1420@TK2MSFTNGP12...
>> I have been receiving these messages in System Event Viewer:
>>
>> A parity error was detected on \Device\Ide\IdeChnDr0.
>>
>> An error was detected on device \Device\Harddisk0\D during a paging operation.
>>
>> This is on an Intel D815EEA2 motherboard. This is the second D815EEA2
>> motherboard that has failed with IDE controller problems in the last two
>> months. It seems very unlikely that two identical motherboards would fail in
>> an identical way, after years of error-free use.
>>
>> The hard disks are Western Digital WD400BB, 40 Gigabytes. W.D. Diagnostics
>> show they are error free.
>>
>> Question #1: Is this a real error, or an error in Windows XP?
>>
>> Question #2: Is this error specific to drive D? Is that what the second error
>> message says?
>>
>> _______________________________________
>>
>> System Event Viewer:
>> start C:\WINDOWS\SYSTEM32\eventvwr.msc
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
That's good news. Thank you for posting that informative follow-up.
V. WFM
Status: RESOLVED → VERIFIED
Attachment #9402764 - Attachment is obsolete: true