Closed Bug 588873 Opened 15 years ago Closed 11 years ago

Firefox 4.0b Crash [@ NS_AccumulateFastLoadChecksum(unsigned int*, unsigned char const*, unsigned int, int) ]

Categories

(Core :: XPCOM, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- -

People

(Reporter: chofmann, Unassigned)

References

Details

(Keywords: crash, regression, Whiteboard: not-ready)

Crash Data

Attachments

(1 file, 3 obsolete files)

we crested 400k user on b4 and starting to see some new signatures turn up. this one in fast load code. Correlation to startup or time of session 43 total crashes for NS_AccumulateFastLoadChecksum on 20100818-crashdata.csv 40 startup crashes inside 30 sec. 27 repeated crashes inside 3 min. of last crash Stacks look like http://crash-stats.mozilla.com/report/index/aa934437-6d71-4da5-8d60-8e6e82100819 Frame Module Signature [Expand] Source 0 xul.dll NS_AccumulateFastLoadChecksum xpcom/io/nsFastLoadFile.cpp:172 1 xul.dll nsFastLoadFileReader::ComputeChecksum xpcom/io/nsFastLoadFile.cpp:628 2 xul.dll nsFastLoadFileReader::Open xpcom/io/nsFastLoadFile.cpp:908 3 xul.dll NS_NewFastLoadFileReader xpcom/io/nsFastLoadFile.cpp:1167 4 xul.dll nsFastLoadService::NewInputStream xpcom/io/nsFastLoadService.cpp:171 5 xul.dll mozJSComponentLoader::StartFastLoad 6 xul.dll mozJSComponentLoader::GlobalForLocation js/src/xpconnect/loader/mozJSComponentLoader.cpp:1274 7 xul.dll mozJSComponentLoader::LoadModuleImpl js/src/xpconnect/loader/mozJSComponentLoader.cpp:793 8 xul.dll mozJSComponentLoader::LoadModule js/src/xpconnect/loader/mozJSComponentLoader.cpp:728 9 xul.dll nsComponentManagerImpl::KnownModule::Load xpcom/components/nsComponentManager.cpp:953 10 xul.dll nsFactoryEntry::GetFactory xpcom/components/nsComponentManager.cpp:1942 11 xul.dll nsComponentManagerImpl::CreateInstanceByContractID xpcom/components/nsComponentManager.cpp:1304 12 xul.dll nsComponentManagerImpl::GetServiceByContractID xpcom/components/nsComponentManager.cpp:1670 13 xul.dll CallGetService obj-firefox/xpcom/build/nsComponentManagerUtils.cpp:94 14 xul.dll nsCOMPtr_base::assign_from_gs_contractid_with_error obj-firefox/xpcom/build/nsCOMPtr.cpp:141 15 xul.dll nsAppStartupNotifier::Observe embedding/components/appstartup/src/nsAppStartupNotifier.cpp:100 16 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:3465 17 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:120 18 firefox.exe __tmainCRTStartup obj-firefox/memory/jemalloc/crtsrc/crtexe.c:591 19 kernel32.dll BaseProcessStart more at http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=NS_AccumulateFastLoadChecksum%28unsigned&date=08%2F19%2F2010%2011%3A21%3A00&range_value=1&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=NS_AccumulateFastLoadChecksum%28unsigned%20int*%2C%20unsigned%20char%20const*%2C%20unsigned%20int%2C%20int%29 all 289490 43 0.000148537 4.0b3 15642 31 0.00198184 4.0b2 2871 11 0.00383142 4.0b1 1802 1 0.000554939 os breakdown NS_AccumulateFastLoadChecksumTotal 43 Win5.1 0.91 Win6.0 0.02 Win6.1 0.07
blocking2.0: --- → ?
Keywords: crash, regression
OS: Mac OS X → Windows XP
Severity: normal → critical
I stared at this for a while, can't see what's causing it. The error means that we are somehow reading outside of the mmapped area, but given the existing bounds checks in there, I don't see how that can happen.
Depends on: 598416
Attached patch fix (obsolete) — Splinter Review
This should make this crash go away
Attachment #477697 - Flags: review?(bhsieh)
Comment on attachment 477697 [details] [diff] [review] fix Per IRC discussion, we're worried about allowing ComputeChecksum() to throw an exception. It'd be better to catch the exception inside the method, especially since it's exposed through the .idl
Attachment #477697 - Flags: review?(bhsieh) → review-
Assignee: nobody → tglek
blocking2.0: ? → betaN+
This is still happening in beta7 - lower on the list though ie: #200. I haven't found the signature in any of the latest crash reports on the trunk.
Attached patch use descriptor for checksum (obsolete) — Splinter Review
This adds exception_in_page guards to mmap-io and switches to descriptor io in ComputeChecksum which avoids this exception and provides faster startup performance.
Attachment #477697 - Attachment is obsolete: true
Attachment #494491 - Flags: review?
Attachment #494491 - Flags: review? → review?(ben.hsieh)
>+ if (rem) { >+ NS_AccumulateFastLoadChecksum(&checksum, >+ reinterpret_cast<PRUint8*>(buf), >+ rem, >+ PR_TRUE); Is this last block necessary after the loop? Doesn't seems like other callsites worry about remainder left from NS_AFLC(). >+#ifdef XP_WIN >+ nsAutoString name; >+ rv = mFile->GetPath(name); >+ NS_ENSURE_SUCCESS(rv, rv); >+ HANDLE winFD = ::CreateFileW(name.get(), GENERIC_READ, FILE_SHARE_READ, >+ NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL); >+ if (winFD == INVALID_HANDLE_VALUE) >+ return NS_ERROR_FAILURE; >+ fd = PR_ImportFile((PROsfd) winFD); >+#else How come OpenNSPRFileDesc doesn't work here? Maybe put a comment here. r+ with above
Attachment #494491 - Flags: review?(ben.hsieh) → review+
(In reply to comment #7) > >+ if (rem) { > >+ NS_AccumulateFastLoadChecksum(&checksum, > >+ reinterpret_cast<PRUint8*>(buf), > >+ rem, > >+ PR_TRUE); > Is this last block necessary after the loop? Doesn't seems like other callsites > worry about remainder left from NS_AFLC(). I brought back old reading logic here. Rest of the callsites don't have to worry about Read() underfill as they read in big enough chunks. > > >+#ifdef XP_WIN > >+ nsAutoString name; > >+ rv = mFile->GetPath(name); > >+ NS_ENSURE_SUCCESS(rv, rv); > >+ HANDLE winFD = ::CreateFileW(name.get(), GENERIC_READ, FILE_SHARE_READ, > >+ NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL); > >+ if (winFD == INVALID_HANDLE_VALUE) > >+ return NS_ERROR_FAILURE; > >+ fd = PR_ImportFile((PROsfd) winFD); > >+#else > How come OpenNSPRFileDesc doesn't work here? Maybe put a comment here. > > r+ with above Ok, FILE_FLAG_SEQUENTIAL_SCAN is special sauce that turns on aggressive readahead in windows
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 616704
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Based on the backouts, either this or bug 605672 caused bug 617048. (My money is on this one, but not all *that* strongly.) If you think it's not this one, feel free to try relanding *after* beta8 branches or is tagged, but make sure not to reland on the same day as bug 605672, and watch the crash stats the next day (e.g., on http://dbaron.org/mozilla/crashes-by-build ).
also looking at crash stats its not clear that the fix had impact in solving the original crash bug sort this query by build id to see b8pre crashes on dec 7 with build id 20101207030325 which should have been inside the range with the patch from comment 9 was landed and wend comment 10 backed out the patch. http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=NS_AccumulateFastLoadChecksum%28unsigned&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=NS_AccumulateFastLoadChecksum%28unsigned%20int*%2C%20unsigned%20char%20const*%2C%20unsigned%20int%2C%20int%29
(In reply to comment #12) > also looking at crash stats its not clear that the fix had impact in solving > the original crash bug > > sort this query by build id to see b8pre crashes on dec 7 with build id > 20101207030325 which should have been inside the range with the patch from > comment 9 was landed and wend comment 10 backed out the patch. > > http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=NS_AccumulateFastLoadChecksum%28unsigned&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=NS_AccumulateFastLoadChecksum%28unsigned%20int*%2C%20unsigned%20char%20const*%2C%20unsigned%20int%2C%20int%29 It did, but looks it did uncover another issue in the code. If look at the new crash, it's a different (and more rare) exception...the same one as present in firefox 3.6(ie before fasl was rewritten). I guess restoring the old algorithm also restored some old bug. I doubt that this caused bug 617048 because it changed a low level detail in how fasl is read, the resulting structures should be identical, with and without this patch.
The only thing that I can tell is wrong with the old code is using memcpy instead of memmove(memory shouldn't overlap in memcpy). Wonder if that could cause issues.
The code was also storing return value of PR_Read() in a PRUint32
Attachment #494491 - Attachment is obsolete: true
Attachment #497382 - Flags: review+
blocking2.0: betaN+ → -
Attachment #497382 - Flags: approval2.0?
Please don't carry over review flags, it makes bugs confusing.
Attachment #497382 - Flags: approval2.0? → approval2.0+
http://hg.mozilla.org/mozilla-central/rev/30e1cfee3545 hopefully my revised patch takes care of the crash, closing for now.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
had a += typo instead of = in the previous patch :(
Attachment #497382 - Attachment is obsolete: true
Attachment #505204 - Flags: approval2.0?
Comment on attachment 505204 [details] [diff] [review] use descriptor for checksum(fixed) This bounced once and I don't think it's critical for the release.
Attachment #505204 - Flags: approval2.0? → approval2.0-
Comment on attachment 497382 [details] [diff] [review] use descriptor for checksum(fixed) (bookkeeping)
Attachment #497382 - Flags: approval2.0+
I don't feel comfortable taking this on the cedar branch.
Whiteboard: not-ready
Crash Signature: [@ NS_AccumulateFastLoadChecksum(unsigned int*, unsigned char const*, unsigned int, int) ]
Assignee: tglek → nobody
This seems to not happen in versions past 6.0.x - have we fixed this in some other way?
Crash Signature: [@ NS_AccumulateFastLoadChecksum(unsigned int*, unsigned char const*, unsigned int, int) ] → [@ NS_AccumulateFastLoadChecksum(unsigned int*, unsigned char const*, unsigned int, int) ] [@ NS_AccumulateFastLoadChecksum ]
this code was removed.
Fixed by code removal per comment 24.
Status: REOPENED → RESOLVED
Closed: 15 years ago11 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: